hive运算时类型自动转化问题_资讯

hive运算时类型自动转化问题

创始人

2025-05-31 04:42:48

比如table id int ,name string

我们可以

select * from table where id='1'

select * from table where name=1

select * from table where name=cast(1 as decimal)

这些都不会报错，因为涉及到了类型的自动转化。

但是当我们有 nvl(1,'1')时结果类型时什么样呢？研究下。

以 +好运算符为例。

打开hive源码发现两个类

GenericUDFOPPlus

GenericUDFOPNumericPlus

一时间我还分不清我们平常使用 +好运算符是哪个类找了半天发现是

GenericUDFBaseNumeric

@Override
public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {if (arguments.length != 2) {throw new UDFArgumentException(opName + " requires two arguments.");}for (int i = 0; i < 2; i++) {Category category = arguments[i].getCategory();if (category != Category.PRIMITIVE) { --只能是基础的类型throw new UDFArgumentTypeException(i, "The "+ GenericUDFUtils.getOrdinal(i + 1)+ " argument of " + opName + "  is expected to a "+ Category.PRIMITIVE.toString().toLowerCase() + " type, but "+ category.toString().toLowerCase() + " is found");}}// During map/reduce tasks, there may not be a valid HiveConf from the SessionState.// So lookup and save any needed conf information during query compilation in the Hive conf// (where there should be valid HiveConf from SessionState).  Plan serialization will ensure// we have access to these values in the map/reduce tasks.if (confLookupNeeded) {CompatLevel compatLevel = HiveCompat.getCompatLevel(SessionState.get().getConf());ansiSqlArithmetic = compatLevel.ordinal() > CompatLevel.HIVE_0_12.ordinal();confLookupNeeded = false;}--获取左右两边的值leftOI = (PrimitiveObjectInspector) arguments[0];rightOI = (PrimitiveObjectInspector) arguments[1];--获取结果类型 继续看这个方法resultOI = PrimitiveObjectInspectorFactory.getPrimitiveWritableObjectInspector(deriveResultTypeInfo());converterLeft = ObjectInspectorConverters.getConverter(leftOI, resultOI);converterRight = ObjectInspectorConverters.getConverter(rightOI, resultOI);--注意这里 是根据resultOi取将leftOi和rightOi转化为这个类型return resultOI;
}

deriveResultTypeInfo

private PrimitiveTypeInfo deriveResultTypeInfo() throws UDFArgumentException {PrimitiveTypeInfo left = (PrimitiveTypeInfo) TypeInfoUtils.getTypeInfoFromObjectInspector(leftOI);--左边是什么类型PrimitiveTypeInfo right = (PrimitiveTypeInfo) TypeInfoUtils.getTypeInfoFromObjectInspector(rightOI);--右边是什么类型if (!FunctionRegistry.isNumericType(left) || !FunctionRegistry.isNumericType(right)) { --如果左右两边有个不是number类型就报错List argTypeInfos = new ArrayList(2);argTypeInfos.add(left);argTypeInfos.add(right);throw new NoMatchingMethodException(this.getClass(), argTypeInfos, null);}
--注意这里说了据 如果左右两边有个不准确的类型(str,dou,float) 就是resultOI=double// If any of the type isn't exact, double is chosen.if (!FunctionRegistry.isExactNumericType(left) || !FunctionRegistry.isExactNumericType(right)) {
--这个方法是说 如果左右两边有string 我们先转化为double
--如果有个null 我们就返回另外一个类型。也就是double或float(string上一步已经是double)
--如果此时 是float 和double 那就看谁的level高 可以看到float高于doublereturn deriveResultApproxTypeInfo();}
--这里就是准确类型相加 一般是int decimal long
--首先看有无null 有null就取另外一遍类型。
--再看类型相等就取同样的类型
--再看类型级别。byte>short>int>long>decimal 。不一样取 高级别return deriveResultExactTypeInfo();
}

deriveResultApproxTypeInfo

protected PrimitiveTypeInfo deriveResultApproxTypeInfo() {PrimitiveTypeInfo left = (PrimitiveTypeInfo) TypeInfoUtils.getTypeInfoFromObjectInspector(leftOI);PrimitiveTypeInfo right = (PrimitiveTypeInfo) TypeInfoUtils.getTypeInfoFromObjectInspector(rightOI);// string types get converted to doubleif (PrimitiveObjectInspectorUtils.getPrimitiveGrouping(left.getPrimitiveCategory())== PrimitiveGrouping.STRING_GROUP) {left = TypeInfoFactory.doubleTypeInfo;}if (PrimitiveObjectInspectorUtils.getPrimitiveGrouping(right.getPrimitiveCategory())== PrimitiveGrouping.STRING_GROUP) {right = TypeInfoFactory.doubleTypeInfo;}    // Use type promotionPrimitiveCategory commonCat = FunctionRegistry.getPrimitiveCommonCategory(left, right);if (commonCat == PrimitiveCategory.DECIMAL) {// Hive 0.12 behavior where double * decimal -> decimal is gone.return TypeInfoFactory.doubleTypeInfo;} else if (commonCat == null) {return TypeInfoFactory.doubleTypeInfo;} else {return left.getPrimitiveCategory() == commonCat ? left : right;}
}

deriveResultExactTypeInfo

/*** Default implementation for getting the exact type info for the operator result. It worked for all* but divide operator.** @return*/
protected PrimitiveTypeInfo deriveResultExactTypeInfo() {PrimitiveTypeInfo left = (PrimitiveTypeInfo) TypeInfoUtils.getTypeInfoFromObjectInspector(leftOI);PrimitiveTypeInfo right = (PrimitiveTypeInfo) TypeInfoUtils.getTypeInfoFromObjectInspector(rightOI);// Now we are handling exact types. Base implementation handles type promotion.PrimitiveCategory commonCat = FunctionRegistry.getPrimitiveCommonCategory(left, right);if (commonCat == PrimitiveCategory.DECIMAL) {return deriveResultDecimalTypeInfo();} else {return left.getPrimitiveCategory() == commonCat ? left : right;}
}

deriveResultDecimalTypeInfo

/*** Derive the object inspector instance for the decimal result of the operator.*/
protected DecimalTypeInfo deriveResultDecimalTypeInfo() {int prec1 = leftOI.precision();int prec2 = rightOI.precision();int scale1 = leftOI.scale();int scale2 = rightOI.scale();return deriveResultDecimalTypeInfo(prec1, scale1, prec2, scale2);
}

计算精度可以详见我之前的文章。hive3.1decimal计算详细逻辑_hive decimal_cclovezbf的博客-CSDN博客

public static PrimitiveCategory getPrimitiveCommonCategory(TypeInfo a, TypeInfo b) {if (a.getCategory() != Category.PRIMITIVE || b.getCategory() != Category.PRIMITIVE) {return null;}--PrimitiveCategory pcA = ((PrimitiveTypeInfo)a).getPrimitiveCategory();PrimitiveCategory pcB = ((PrimitiveTypeInfo)b).getPrimitiveCategory();
--先看是不是都是基本类型if (pcA == pcB) {// Same primitive categoryreturn pcA;}if (pcA == PrimitiveCategory.VOID) {// Handle NULL, we return the type of pcBreturn pcB;}if (pcB == PrimitiveCategory.VOID) {// Handle NULL, we return the type of pcAreturn pcA;}PrimitiveGrouping pgA = PrimitiveObjectInspectorUtils.getPrimitiveGrouping(pcA);PrimitiveGrouping pgB = PrimitiveObjectInspectorUtils.getPrimitiveGrouping(pcB);
--看看左右类型是不是同一个组 int/long/decimal/float/doub 都是number组
--string varchar 属于string组if (pgA == pgB) {// Equal groups, return what we can handleswitch (pgA) {case NUMERIC_GROUP: {Integer ai = TypeInfoUtils.numericTypes.get(pcA);Integer bi = TypeInfoUtils.numericTypes.get(pcB);return (ai > bi) ? pcA : pcB; --这里就看level了}case DATE_GROUP: {Integer ai = TypeInfoUtils.dateTypes.get(pcA);Integer bi = TypeInfoUtils.dateTypes.get(pcB);return (ai > bi) ? pcA : pcB;}case STRING_GROUP: {// handle string types properlyreturn PrimitiveCategory.STRING;}default:break;}}

isNumericType long decimal float double string varchar 都算number类型

public static boolean isNumericType(PrimitiveTypeInfo typeInfo) {switch (typeInfo.getPrimitiveCategory()) {case BYTE:case SHORT:case INT:case LONG:case DECIMAL:case FLOAT:case DOUBLE:case STRING: // String or string equivalent is considered numeric when used in arithmetic operator.  -- 这里说了string在 op 运算时可以看做 numbercase VARCHAR:case CHAR:case VOID: // NULL is considered numeric type for arithmetic operators.return true;default:return false;}
}

isExactNumericType

/*** Check if a type is exact (not approximate such as float and double). String is considered as* double, thus not exact.*  这里说 float double string 都不准确。前两个精度丢失，string 是不确定。* @param typeInfo* @return*/
public static boolean isExactNumericType(PrimitiveTypeInfo typeInfo) {switch (typeInfo.getPrimitiveCategory()) {case BYTE:case SHORT:case INT:case LONG:case DECIMAL:return true;default:return false;}
}

group的概念

case BOOLEAN:return PrimitiveObjectInspectorUtils.PrimitiveGrouping.BOOLEAN_GROUP;
case BYTE:
case SHORT:
case INT:
case LONG:
case FLOAT:
case DOUBLE:
case DECIMAL:return PrimitiveObjectInspectorUtils.PrimitiveGrouping.NUMERIC_GROUP;
case STRING:
case CHAR:
case VARCHAR:return PrimitiveObjectInspectorUtils.PrimitiveGrouping.STRING_GROUP;
case DATE:
case TIMESTAMP:
case TIMESTAMPLOCALTZ:return PrimitiveObjectInspectorUtils.PrimitiveGrouping.DATE_GROUP;
case INTERVAL_YEAR_MONTH:
case INTERVAL_DAY_TIME:return PrimitiveObjectInspectorUtils.PrimitiveGrouping.INTERVAL_GROUP;
case BINARY:return PrimitiveObjectInspectorUtils.PrimitiveGrouping.BINARY_GROUP;
case VOID:return PrimitiveObjectInspectorUtils.PrimitiveGrouping.VOID_GROUP;
default:return PrimitiveObjectInspectorUtils.PrimitiveGrouping.UNKNOWN_GROUP;
}

level

registerNumericType(PrimitiveCategory.BYTE, 1);
registerNumericType(PrimitiveCategory.SHORT, 2);
registerNumericType(PrimitiveCategory.INT, 3);
registerNumericType(PrimitiveCategory.LONG, 4);
registerNumericType(PrimitiveCategory.DECIMAL, 5);
registerNumericType(PrimitiveCategory.FLOAT, 6);
registerNumericType(PrimitiveCategory.DOUBLE, 7);
registerNumericType(PrimitiveCategory.STRING, 8);

到这里我们总结下。 int long float double decimal string void 。(byte varchar char date暂时不考虑)。

1先看左右两边又没有不准确的类型(float double string void)。

1.1如果哪边有类型属于group_string，那么这个category->category.double

1.2左右两边全都是int double dec，属于精确数字计算

2.再比较左右两边类型

2.1 category_left或者category_right有void，那就返回另外一边类型，不用比较group

2.2 category_left == category_right ，那就返回这个类型，不用比较group

2.3 category_left != category_right 看他们属于哪个group (number_group,string_group)。

再比较左右两边的group

2.3.1group_left == group_right，

2.3.1.1如果是number_group 那么就看谁的level高。(string>double>floatdec>long>int)

2.3.1.2如果是string_group 那么就返回commonCatory(double/float)

2.3.2 group_left != group_right

2.3.2.1此时一般都是一边是group_number 一边是group_string，那就返回group_string的commonCatory(double float)

2.3.2.2 如果一边为null一边为group_number 返回commonCatory=null

3. 根据左右两边group 和category比较后返回的类型commonCatory

3.1 如果你是1.1 非精确数字计算

3.1.1 最后两个比较后的resultOi=catetoy_decimal 就返回 double(// Hive 0.12 behavior where double * decimal -> decimal is gone.)

3.1.2commonCatory=null 返回 double，

3.1.3commonCatory前两个，那么就看你的返回类型和leftOi是否一样，一样就返回left 否则返回rightOi

3.2 如果你是精确数字计算

3.2.1 返回结果是 decimla 还需要计算看第5步

3.2.1返回类型不是dec，那么就看你的返回类型和leftOi是否一样，一样就返回left 否则返回rightOi

4.如果是4.2.1 也就是准确数字计算，resultOi 返回decimal，还需要将leftOi和rigthOi转化一下。然后再运算

好了直接实战。

select str + na, -- double null
str + integ, -- double 2
str + lon, -- double 2
str + doub, -- double 2
str + floa, -- double 2.200000047683716
str + dec1, -- double 2
str + dec2, -- double 2
na + integ, -- integer null
na + lon, -- bigint null
na + doub, -- double null
na + floa, -- float null
na + dec1, -- double null
na + dec2, -- double null
lon + doub, -- double 2
lon + floa, -- float 2.2
lon + dec1, -- decimal(38,17) 2.00000000000000000
lon + dec1, -- decimal(38,17) 2.00000000000000000
doub + floa, -- double 2.200000047683716
doub + dec1, -- double 2
doub + dec2, -- double 2
dec1 + dec2 -- decimal(38.17) 2.00000000000000000
from (
select 1 as integ,
1L as lon,
float(1.2) as floa,
'1' as str,
double(1.0) as doub,
cast(1 as decimal(38, 18)) as dec1,
cast(1 as decimal(10, 5)) as dec2,
null as na
) t

先详细解释下几个

string +na ，string根据 1.1 转化为double ，根据2.1返回double

string + float, string根据1.1转化为double，根据 2.3接着2.3.1.1，此时dou和floa都是number_group，判断level double高，返回double，

这里简单来说，string+任何其他数字都是double。

na+dec ，走2.1返回decimal 走 3.1非精确计算，走3.1.1返回double

累了。。。反正第四步decimla转化和后面计算有点累了不看了。

上一篇：EdgerOS之深入解析高效可靠的轻量级服务VSOA

下一篇：土耳其总统：支持俄乌代表团的谈判继续推进

hive运算时类型自动转化问题

相关内容

热门资讯