Hive中Map函数的应用

当一个用户拥有多个标签(私有属性)或一个标签拥有多个用户时,可以使用map来维护复杂的关系。

建表

drop table if exists hive_map;
CREATE TABLE hive_map(
id INT,
name MAP < STRING, STRING >
)
STORED AS PARQUET;

Hive插入

只能使用MR引擎,否则报错。Impala无法使用str_to_map函数。

insert into table hive_map
select 1,str_to_map(concat_ws(',',collect_list(concat("a",":","a1"))));

insert into table hive_map
select 2,str_to_map(concat_ws(',',collect_list(concat("b",":","a1"))));

insert into table hive_map
select
id,str_to_map(concat_ws(',',collect_list(concat(key,":",value)))) as name
from (
select 3 as id,'aa' as key,'aa1' as value
union ALL
select 3 as id,'bb' as key,'bb2' as value
union ALL
select 3 as id,'cc' as key,'cc3' as value
)t1
group by id;

Hive查询

hive 能直接获取对应的kv关系,但不能直接判断key/value存在。

select id,name["a"] from hive_map where name["a"] = "a1";

Hive应用

假设有一张表,表名为t,其中字段params的数据类型是map,其map的具体k-v对如下:

{'k0':'abc','k1':'01,02,03','k2':'456'}
  1. size(Map)函数:可得map的长度。返回值类型:int

    select size(t.params);
    >> 3
  2. map_keys(Map)函数:可得map中所有的key; 返回值类型: array

    select map_keys(t.params);
    >> ["k0","k1","k2"]
  3. map_values(Map)函数:可得map中所有的value; 返回值类型: array

    select map_value(t.params);
    >> ["abc","01,02,03","456"]
  4. 判断map中是否包含某个key值:

    select array_contains(map_keys(t.params),'k0');
    >> true
  5. 在k-v对中,若value有多个值的情况,如 {**‘k1’:’01,02,03’} ,如果要用 ‘k1’ 中 ‘02’作为过滤条件,则语句如下:

    (这里用到split来处理)

    select * 
    from t
    where split(t.params['k1'],',')[1]
    >> 02
  6. 如果过滤条件为:k2的值必须为’45’开头,则语句如下:

    (这里用到substr方法来处理,这里注明一下,1和2分别表示起始位置和长度)

    select * 
    from t
    where substr(t.params['k2'],1,2) = '45'
Author: Tunan
Link: http://yerias.github.io/2021/06/03/hive/25/
Copyright Notice: All articles in this blog are licensed under CC BY-NC-SA 4.0 unless stating additionally.