Hive中的集合數據類型

Hive系列文章

  1. Hive表的基本操作
  2. Hive中的集合數據類型
  3. Hive動態分區詳解
  4. hive中orc格式表的數據導入
  5. Java通過jdbc連接hive
  6. 通過HiveServer2訪問Hive
  7. SpringBoot連接Hive實現自助取數
  8. hive關聯hbase表
  9. Hive udf 使用方法
  10. Hive基于UDF進行文本分詞
  11. Hive窗口函數row number的用法
  12. 數據倉庫之拉鏈表

除了使用礎的數據類型string等,Hive中的列支持使用struct, map, array集合數據類型。

1. Array的使用

創建數據庫表,以array作為數據類型

create table  person(name string,work_locations array<string>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY ',';
SQL

數據

biansutao beijing,shanghai,tianjin,hangzhou
linan changchu,chengdu,wuhan

入庫數據

LOAD DATA LOCAL INPATH '/home/hadoop/person.txt' OVERWRITE INTO TABLE person;
SQL

查詢

hive> select * from person;
biansutao       ["beijing","shanghai","tianjin","hangzhou"]
linan   ["changchu","chengdu","wuhan"]
Time taken: 0.355 seconds
hive> select name from person;
linan
biansutao
Time taken: 12.397 seconds
hive> select work_locations[0] from person;
changchu
beijing
Time taken: 13.214 seconds
hive> select work_locations from person;   
["changchu","chengdu","wuhan"]
["beijing","shanghai","tianjin","hangzhou"]
Time taken: 13.755 seconds
hive> select work_locations[3] from person;
NULL
hangzhou
Time taken: 12.722 seconds
hive> select work_locations[4] from person;
NULL
NULL
Time taken: 15.958 seconds

2. Map 的使用

創建數據庫表

create table score(name string, score map<string,int>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY ','
MAP KEYS TERMINATED BY ':';
SQL

要入庫的數據

biansutao '數學':80,'語文':89,'英語':95
jobs '語文':60,'數學':80,'英語':99

入庫數據

LOAD DATA LOCAL INPATH '/home/hadoop/score.txt' OVERWRITE INTO TABLE score;
SQL

查詢

hive> select * from score;
biansutao       {"數學":80,"語文":89,"英語":95}
jobs    {"語文":60,"數學":80,"英語":99}
Time taken: 0.665 seconds
hive> select name from score;
jobs
biansutao
Time taken: 19.778 seconds
hive> select t.score from score t;
{"語文":60,"數學":80,"英語":99}
{"數學":80,"語文":89,"英語":95}
Time taken: 19.353 seconds
hive> select t.score['語文'] from score t;
60
89
Time taken: 13.054 seconds
hive> select t.score['英語'] from score t;
99
95
Time taken: 13.769 seconds

修改map字段的分隔符

Storage Desc Params:         
    colelction.delim        ##                  
    field.delim             \t                  
    mapkey.delim            =                   
    serialization.format    \t                  

可以通過desc formatted tableName查看表的屬性。
hive-2.1.1中,可以看出colelction.delim,這里是colelction而不是collection,hive里面這個單詞寫錯了,所以還是要按照錯誤的來。

alter table t8 set serdepropertyes('colelction.delim'=',');
SQL

3. Struct 的使用

創建數據表

CREATE TABLE test(id int,course struct<course:string,score:int>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY ',';
SQL

數據

1 english,80
2 math,89
3 chinese,95

入庫

LOAD DATA LOCAL INPATH '/home/hadoop/test.txt' OVERWRITE INTO TABLE test;
SQL

查詢

hive> select * from test;
OK
1       {"course":"english","score":80}
2       {"course":"math","score":89}
3       {"course":"chinese","score":95}
Time taken: 0.275 seconds
hive> select course from test;
{"course":"english","score":80}
{"course":"math","score":89}
{"course":"chinese","score":95}
Time taken: 44.968 seconds
select t.course.course from test t; 
english
math
chinese
Time taken: 15.827 seconds
hive> select t.course.score from test t;
80
89
95
Time taken: 13.235 seconds

4. 不支持組合的復雜數據類型

我們有時候可能想建一個復雜的數據集合類型,比如下面的a字段,本身是一個Map,它的key是string類型的,value是Array集合類型的。

建表

create table test1(id int,a MAP<STRING,ARRAY<STRING>>)
row format delimited fields terminated by '\t' 
collection items terminated by ','
MAP KEYS TERMINATED BY ':';
SQL

導入數據

1 english:80,90,70
2 math:89,78,86
3 chinese:99,100,82

LOAD DATA LOCAL INPATH '/home/hadoop/test1.txt' OVERWRITE INTO TABLE test1;

這里查詢出數據:

hive> select * from test1;
OK
1   {"english":["80"],"90":null,"70":null}
2   {"math":["89"],"78":null,"86":null}
3   {"chinese":["99"],"100":null,"82":null}
SQL

可以看到,已經出問題了,我們意圖是想"english":["80", "90", "70"],實際上把90和70也當作Map的key了,value值都是空的。分析一下我們的建表語句,collection items terminated by ','制定了集合類型(map, struct, array)數據元素之間分隔符是", ",實際上map也是屬于集合的,那么也會按照逗號分出3個key-value對;由于MAP KEYS TERMINATED BY ':'定義了map中key-value的分隔符是":",第一個“english”可以準確識別,后面的直接把value置為"null"了。




作者:柯廣的網絡日志

微信公眾號:Java大數據與數據倉庫