纯净、安全、绿色的下载网站

首页|软件分类|下载排行|最新软件|IT学院

当前位置:首页IT学院IT技术

Impala的count(distinct QUESTION_ID) 与ndv(QUESTION_ID)

张江英   2020-12-29 我要评论

在impala中,一个select执行多个count(distinct col)会报错,举例:

select C_DEPT2,
         count(distinct QUESTION_BUSI_ID) as wo_num,
         count(distinct CREATOR_ID) as creator_num
  from pdm.kudu_q_basic
 where substr(CREATE_DATE, 1, 7) = '2020-10'
 group by C_DEPT2

报错信息:

ERROR: AnalysisException: all DISTINCT aggregate functions need to have the same set of parameters as count(DISTINCT QUESTION_BUSI_ID); deviating function: count(DISTINCT CREATOR_ID)
Consider using NDV() instead of COUNT(DISTINCT) if estimated counts are acceptable. Enable the APPX_COUNT_DISTINCT query option to perform this rewrite automatically.

这时候,可通过以下方法解决:

1、得到的是近似值,数据量越大越不准确:

(1)SQL运行前,先运行命令:set APPX_COUNT_DISTINCT=true;

set APPX_COUNT_DISTINCT=true;
select C_DEPT2,
       count(distinct QUESTION_BUSI_ID) as wo_num,
       count(distinct CREATOR_ID) as creator_num
  from pdm.kudu_q_basic
 where substr(CREATE_DATE, 1, 7) = '2020-10'
 group by C_DEPT2
 order by C_DEPT2

 

 

(2)将count(distinct col)用函数ndv(col)代替

select C_DEPT2,
       ndv(QUESTION_BUSI_ID) as wo_num,
       ndv(CREATOR_ID) as creator_num
  from pdm.kudu_q_basic
 where substr(CREATE_DATE, 1, 7) = '2020-10'
 group by C_DEPT2
 order by C_DEPT2

需要注意的是,在set APPX_COUNT_DISTINCT=true;的情况下,使用count(distinct col)会自动转化成ndv(col),得到的是近似值,所以以上两种方法的结果数据一致。

2、精确值。拆分为子查询,再关联,如下:

set APPX_COUNT_DISTINCT = false; -- 将参数置为false,使用count(distinct col),确保不会转化成ndv(col)
select a.C_DEPT2, a.wo_num, b.creator_num from (select C_DEPT2, count(distinct QUESTION_BUSI_ID) as wo_num from pdm.kudu_q_basic where substr(CREATE_DATE, 1, 7) = '2020-10' group by C_DEPT2) a left join (select C_DEPT2, count(distinct CREATOR_ID) as creator_num from pdm.kudu_q_basic where substr(CREATE_DATE, 1, 7) = '2020-10' group by C_DEPT2) b on a.C_DEPT2 = b.C_DEPT2 order by a.C_DEPT2

 

 

 

验证:

select C_DEPT2, count(*)
  from pdm.kudu_q_basic -- 表中无重复数据
 where substr(CREATE_DATE, 1, 7) = '2020-10'
 group by C_DEPT2
 order by C_DEPT2

 

 

总结:解决在impala中一个select执行多个count(distinct col)报错问题,可以用过设置参数set APPX_COUNT_DISTINCT = true;或将count(distinct col)用ndv(col)解决,但得到的是近似值,不准确。还可以通过分别在子查询中进行count(distinct col)再关联得到准确值,但要注意参数 APPX_COUNT_DISTINCT = false,不然会自动转化为ndv(col)得到的还是近似值。

 


相关文章

猜您喜欢

  • pandas 数据类型转换的实现

    本文着重讲解了pandas 数据类型转换的实现,文中通过代码实例讲解的非常细致,对大家的工作和学习具有一定的参考学习价值,需要的朋友们下面随着小编来一起学习学习吧..
  • vue+element UI实现树形表格

    本文着重为大家仔细讲解了vue+element UI实现树形表格,文中代码实例讲解的非常细致,希望能够帮助到您,欢迎大家阅读和收藏..

网友评论

Copyright 2020 www.kgesoft.com 【K歌软件站】 版权所有 软件发布

声明:所有软件和文章来自软件开发商或者作者 如有异议 请与本站联系 点此查看联系方式