hadoop - Extract individual column from a HIVE table -


below select query hive table:

select *  test_aviation limit 5;                                                    ok 2015    1   1   1   4   2015-01-01  aa  19805   aa  n787aa  1   jfk new york     ny ny  36  new york    22  lax los angeles  ca ca  06  california  91  0900    0855    -5.00   0.00    0.00    -1  0900-0959   17.00   0912    1230    7.00    1230    1237    7.00    7.00    0.00    0   1200-1259   0.00        0.00    390.00  402.00  378.00  1.00    2475.00 10           2015    1   1   2   5   2015-01-02  aa  19805   aa  n795aa  1   jfk new york     ny ny  36  new york    22  lax los angeles  ca ca  06  california  91  0900    0850    -10.00  0.00    0.00    -1  0900-0959   15.00   0905    1202    9.00    1230    1211    -19.00  0.00    0.00    -2  1200-1259   0.00        0.00    390.00  381.00  357.00  1.00    2475.00 10           2015    1   1   3   6   2015-01-03  aa  19805   aa  n788aa  1   jfk new york     ny ny  36  new york    22  lax los angeles  ca ca  06  california  91  0900    0853    -7.00   0.00    0.00    -1  0900-0959   15.00   0908    1138    13.00   1230    1151    -39.00  0.00    0.00    -2  1200-1259   0.00        0.00    390.00  358.00  330.00  1.00    2475.00 10           2015    1   1   4   7   2015-01-04  aa  19805   aa  n791aa  1   jfk new york     ny ny  36  new york    22  lax los angeles  ca ca  06  california  91  0900    0853    -7.00   0.00    0.00    -1  0900-0959   14.00   0907    1159    19.00   1230    1218    -12.00  0.00    0.00    -1  1200-1259   0.00        0.00    390.00  385.00  352.00  1.00    2475.00 10           2015    1   1   5   1   2015-01-05  aa  19805   aa  n783aa  1   jfk new york     ny ny  36  new york    22  lax los angeles  ca ca  06  california  91  0900    0853    -7.00   0.00    0.00    -1  0900-0959   27.00   0920    1158    24.00   1230    1222    -8.00   0.00    0.00    -1  1200-1259   0.00        0.00    390.00  389.00  338.00  1.00    2475.00 10           time taken: 0.067 seconds, fetched: 5 row(s) 

structure of hive table

hive> describe test_aviation; ok col_value               string                                       time taken: 0.221 seconds, fetched: 1 row(s) 

i want segregate entire table in different columns.i have written query below extract 12th column:

select regexp_extract(col_value, '^(?:([^,]*)\,?){1}', 12)  test_aviation; 

output:

total jobs = 1 launching job 1 out of 1 number of reduce tasks set 0 since there's no reduce operator starting job = job_1437067221195_0008, tracking url = http://localhost:8088/proxy/application_1437067221195_0008/ kill command = /usr/local/hadoop/bin/hadoop job  -kill job_1437067221195_0008 hadoop job information stage-1: number of mappers: 1; number of reducers: 0 2015-07-17 02:46:56,215 stage-1 map = 0%,  reduce = 0% 2015-07-17 02:47:27,650 stage-1 map = 100%,  reduce = 0% ended job = job_1437067221195_0008 errors error during job, obtaining debugging information... job tracking url: http://localhost:8088/proxy/application_1437067221195_0008/ examining task id: task_1437067221195_0008_m_000000 (and more) job job_1437067221195_0008  task failures(4):  ----- task id:   task_1437067221195_0008_m_000000  url:   http://localhost:8088/taskdetails.jsp?jobid=job_1437067221195_0008&tipid=task_1437067221195_0008_m_000000 ----- diagnostic messages task: error: java.lang.runtimeexception: org.apache.hadoop.hive.ql.metadata.hiveexception: hive runtime error while processing row {"col_value":"2015\t1\t1\t1\t4\t2015-01-01\taa\t19805\taa\tn787aa\t1\tjfk\tnew york\t ny\tny\t36\tnew york\t22\tlax\tlos angeles\t ca\tca\t06\tcalifornia\t91\t0900\t0855\t-5.00\t0.00\t0.00\t-1\t0900-0959\t17.00\t0912\t1230\t7.00\t1230\t1237\t7.00\t7.00\t0.00\t0\t1200-1259\t0.00\t\t0.00\t390.00\t402.00\t378.00\t1.00\t2475.00\t10\t\t\t"}     @ org.apache.hadoop.hive.ql.exec.mr.execmapper.map(execmapper.java:195)     @ org.apache.hadoop.mapred.maprunner.run(maprunner.java:54)     @ org.apache.hadoop.mapred.maptask.runoldmapper(maptask.java:450)     @ org.apache.hadoop.mapred.maptask.run(maptask.java:343)     @ org.apache.hadoop.mapred.yarnchild$2.run(yarnchild.java:163)     @ java.security.accesscontroller.doprivileged(native method)     @ javax.security.auth.subject.doas(subject.java:415)     @ org.apache.hadoop.security.usergroupinformation.doas(usergroupinformation.java:1628)     @ org.apache.hadoop.mapred.yarnchild.main(yarnchild.java:158) caused by: org.apache.hadoop.hive.ql.metadata.hiveexception: hive runtime error while processing row {"col_value":"2015\t1\t1\t1\t4\t2015-01-01\taa\t19805\taa\tn787aa\t1\tjfk\tnew york\t ny\tny\t36\tnew york\t22\tlax\tlos angeles\t ca\tca\t06\tcalifornia\t91\t0900\t0855\t-5.00\t0.00\t0.00\t-1\t0900-0959\t17.00\t0912\t1230\t7.00\t1230\t1237\t7.00\t7.00\t0.00\t0\t1200-1259\t0.00\t\t0.00\t390.00\t402.00\t378.00\t1.00\t2475.00\t10\t\t\t"}     @ org.apache.hadoop.hive.ql.exec.mapoperator.process(mapoperator.java:550)     @ org.apache.hadoop.hive.ql.exec.mr.execmapper.map(execmapper.java:177)     ... 8 more caused by: org.apache.hadoop.hive.ql.metadata.hiveexception: unable execute method public java.lang.string org.apache.hadoop.hive.ql.udf.udfregexpextract.evaluate(java.lang.string,java.lang.string,java.lang.integer)  on object org.apache.hadoop.hive.ql.udf.udfregexpextract@4def4616 of class org.apache.hadoop.hive.ql.udf.udfregexpextract arguments {2015  1   1   1   4   2015-01-01  aa  19805   aa  n787aa  1   jfk new york     ny ny  36  new york    22  lax los angeles  ca ca  06  california  91  0900    0855    -5.00   0.00    0.00    -1  0900-0959   17.00   0912    1230    7.00    1230    1237    7.00    7.00    0.00    0   1200-1259   0.00        0.00    390.00  402.00  378.00  1.00    2475.00 10          :java.lang.string, ^(?:([^,]*),?){1}:java.lang.string, 12:java.lang.integer} of size 3     @ org.apache.hadoop.hive.ql.exec.functionregistry.invoke(functionregistry.java:1243)     @ org.apache.hadoop.hive.ql.udf.generic.genericudfbridge.evaluate(genericudfbridge.java:182)     @ org.apache.hadoop.hive.ql.exec.exprnodegenericfuncevaluator._evaluate(exprnodegenericfuncevaluator.java:166)     @ org.apache.hadoop.hive.ql.exec.exprnodeevaluator.evaluate(exprnodeevaluator.java:77)     @ org.apache.hadoop.hive.ql.exec.exprnodeevaluator.evaluate(exprnodeevaluator.java:65)     @ org.apache.hadoop.hive.ql.exec.selectoperator.processop(selectoperator.java:79)     @ org.apache.hadoop.hive.ql.exec.operator.forward(operator.java:793)     @ org.apache.hadoop.hive.ql.exec.tablescanoperator.processop(tablescanoperator.java:92)     @ org.apache.hadoop.hive.ql.exec.operator.forward(operator.java:793)     @ org.apache.hadoop.hive.ql.exec.mapoperator.process(mapoperator.java:540)     ... 9 more caused by: java.lang.reflect.invocationtargetexception     @ sun.reflect.nativemethodaccessorimpl.invoke0(native method)     @ sun.reflect.nativemethodaccessorimpl.invoke(nativemethodaccessorimpl.java:57)     @ sun.reflect.delegatingmethodaccessorimpl.invoke(delegatingmethodaccessorimpl.java:43)     @ java.lang.reflect.method.invoke(method.java:606)     @ org.apache.hadoop.hive.ql.exec.functionregistry.invoke(functionregistry.java:1219)     ... 18 more caused by: java.lang.indexoutofboundsexception: no group 12     @ java.util.regex.matcher.group(matcher.java:487)     @ org.apache.hadoop.hive.ql.udf.udfregexpextract.evaluate(udfregexpextract.java:56)     ... 23 more 

please me extract different columns hive table.

try this:

select split(col_value,' ')[11] column_12 test_aviation; 

assuming have space delimiters.

'\\t' if tab  '\\|' pipe...  ':'  

and on


Comments

Popular posts from this blog

php - Zend Framework / Skeleton-Application / Composer install issue -

c# - Better 64-bit byte array hash -

python - PyCharm Type error Message -