hadoop - Extract individual column from a HIVE table -
below select query hive table:
select * test_aviation limit 5; ok 2015 1 1 1 4 2015-01-01 aa 19805 aa n787aa 1 jfk new york ny ny 36 new york 22 lax los angeles ca ca 06 california 91 0900 0855 -5.00 0.00 0.00 -1 0900-0959 17.00 0912 1230 7.00 1230 1237 7.00 7.00 0.00 0 1200-1259 0.00 0.00 390.00 402.00 378.00 1.00 2475.00 10 2015 1 1 2 5 2015-01-02 aa 19805 aa n795aa 1 jfk new york ny ny 36 new york 22 lax los angeles ca ca 06 california 91 0900 0850 -10.00 0.00 0.00 -1 0900-0959 15.00 0905 1202 9.00 1230 1211 -19.00 0.00 0.00 -2 1200-1259 0.00 0.00 390.00 381.00 357.00 1.00 2475.00 10 2015 1 1 3 6 2015-01-03 aa 19805 aa n788aa 1 jfk new york ny ny 36 new york 22 lax los angeles ca ca 06 california 91 0900 0853 -7.00 0.00 0.00 -1 0900-0959 15.00 0908 1138 13.00 1230 1151 -39.00 0.00 0.00 -2 1200-1259 0.00 0.00 390.00 358.00 330.00 1.00 2475.00 10 2015 1 1 4 7 2015-01-04 aa 19805 aa n791aa 1 jfk new york ny ny 36 new york 22 lax los angeles ca ca 06 california 91 0900 0853 -7.00 0.00 0.00 -1 0900-0959 14.00 0907 1159 19.00 1230 1218 -12.00 0.00 0.00 -1 1200-1259 0.00 0.00 390.00 385.00 352.00 1.00 2475.00 10 2015 1 1 5 1 2015-01-05 aa 19805 aa n783aa 1 jfk new york ny ny 36 new york 22 lax los angeles ca ca 06 california 91 0900 0853 -7.00 0.00 0.00 -1 0900-0959 27.00 0920 1158 24.00 1230 1222 -8.00 0.00 0.00 -1 1200-1259 0.00 0.00 390.00 389.00 338.00 1.00 2475.00 10 time taken: 0.067 seconds, fetched: 5 row(s)
structure of hive table
hive> describe test_aviation; ok col_value string time taken: 0.221 seconds, fetched: 1 row(s)
i want segregate entire table in different columns.i have written query below extract 12th column:
select regexp_extract(col_value, '^(?:([^,]*)\,?){1}', 12) test_aviation;
output:
total jobs = 1 launching job 1 out of 1 number of reduce tasks set 0 since there's no reduce operator starting job = job_1437067221195_0008, tracking url = http://localhost:8088/proxy/application_1437067221195_0008/ kill command = /usr/local/hadoop/bin/hadoop job -kill job_1437067221195_0008 hadoop job information stage-1: number of mappers: 1; number of reducers: 0 2015-07-17 02:46:56,215 stage-1 map = 0%, reduce = 0% 2015-07-17 02:47:27,650 stage-1 map = 100%, reduce = 0% ended job = job_1437067221195_0008 errors error during job, obtaining debugging information... job tracking url: http://localhost:8088/proxy/application_1437067221195_0008/ examining task id: task_1437067221195_0008_m_000000 (and more) job job_1437067221195_0008 task failures(4): ----- task id: task_1437067221195_0008_m_000000 url: http://localhost:8088/taskdetails.jsp?jobid=job_1437067221195_0008&tipid=task_1437067221195_0008_m_000000 ----- diagnostic messages task: error: java.lang.runtimeexception: org.apache.hadoop.hive.ql.metadata.hiveexception: hive runtime error while processing row {"col_value":"2015\t1\t1\t1\t4\t2015-01-01\taa\t19805\taa\tn787aa\t1\tjfk\tnew york\t ny\tny\t36\tnew york\t22\tlax\tlos angeles\t ca\tca\t06\tcalifornia\t91\t0900\t0855\t-5.00\t0.00\t0.00\t-1\t0900-0959\t17.00\t0912\t1230\t7.00\t1230\t1237\t7.00\t7.00\t0.00\t0\t1200-1259\t0.00\t\t0.00\t390.00\t402.00\t378.00\t1.00\t2475.00\t10\t\t\t"} @ org.apache.hadoop.hive.ql.exec.mr.execmapper.map(execmapper.java:195) @ org.apache.hadoop.mapred.maprunner.run(maprunner.java:54) @ org.apache.hadoop.mapred.maptask.runoldmapper(maptask.java:450) @ org.apache.hadoop.mapred.maptask.run(maptask.java:343) @ org.apache.hadoop.mapred.yarnchild$2.run(yarnchild.java:163) @ java.security.accesscontroller.doprivileged(native method) @ javax.security.auth.subject.doas(subject.java:415) @ org.apache.hadoop.security.usergroupinformation.doas(usergroupinformation.java:1628) @ org.apache.hadoop.mapred.yarnchild.main(yarnchild.java:158) caused by: org.apache.hadoop.hive.ql.metadata.hiveexception: hive runtime error while processing row {"col_value":"2015\t1\t1\t1\t4\t2015-01-01\taa\t19805\taa\tn787aa\t1\tjfk\tnew york\t ny\tny\t36\tnew york\t22\tlax\tlos angeles\t ca\tca\t06\tcalifornia\t91\t0900\t0855\t-5.00\t0.00\t0.00\t-1\t0900-0959\t17.00\t0912\t1230\t7.00\t1230\t1237\t7.00\t7.00\t0.00\t0\t1200-1259\t0.00\t\t0.00\t390.00\t402.00\t378.00\t1.00\t2475.00\t10\t\t\t"} @ org.apache.hadoop.hive.ql.exec.mapoperator.process(mapoperator.java:550) @ org.apache.hadoop.hive.ql.exec.mr.execmapper.map(execmapper.java:177) ... 8 more caused by: org.apache.hadoop.hive.ql.metadata.hiveexception: unable execute method public java.lang.string org.apache.hadoop.hive.ql.udf.udfregexpextract.evaluate(java.lang.string,java.lang.string,java.lang.integer) on object org.apache.hadoop.hive.ql.udf.udfregexpextract@4def4616 of class org.apache.hadoop.hive.ql.udf.udfregexpextract arguments {2015 1 1 1 4 2015-01-01 aa 19805 aa n787aa 1 jfk new york ny ny 36 new york 22 lax los angeles ca ca 06 california 91 0900 0855 -5.00 0.00 0.00 -1 0900-0959 17.00 0912 1230 7.00 1230 1237 7.00 7.00 0.00 0 1200-1259 0.00 0.00 390.00 402.00 378.00 1.00 2475.00 10 :java.lang.string, ^(?:([^,]*),?){1}:java.lang.string, 12:java.lang.integer} of size 3 @ org.apache.hadoop.hive.ql.exec.functionregistry.invoke(functionregistry.java:1243) @ org.apache.hadoop.hive.ql.udf.generic.genericudfbridge.evaluate(genericudfbridge.java:182) @ org.apache.hadoop.hive.ql.exec.exprnodegenericfuncevaluator._evaluate(exprnodegenericfuncevaluator.java:166) @ org.apache.hadoop.hive.ql.exec.exprnodeevaluator.evaluate(exprnodeevaluator.java:77) @ org.apache.hadoop.hive.ql.exec.exprnodeevaluator.evaluate(exprnodeevaluator.java:65) @ org.apache.hadoop.hive.ql.exec.selectoperator.processop(selectoperator.java:79) @ org.apache.hadoop.hive.ql.exec.operator.forward(operator.java:793) @ org.apache.hadoop.hive.ql.exec.tablescanoperator.processop(tablescanoperator.java:92) @ org.apache.hadoop.hive.ql.exec.operator.forward(operator.java:793) @ org.apache.hadoop.hive.ql.exec.mapoperator.process(mapoperator.java:540) ... 9 more caused by: java.lang.reflect.invocationtargetexception @ sun.reflect.nativemethodaccessorimpl.invoke0(native method) @ sun.reflect.nativemethodaccessorimpl.invoke(nativemethodaccessorimpl.java:57) @ sun.reflect.delegatingmethodaccessorimpl.invoke(delegatingmethodaccessorimpl.java:43) @ java.lang.reflect.method.invoke(method.java:606) @ org.apache.hadoop.hive.ql.exec.functionregistry.invoke(functionregistry.java:1219) ... 18 more caused by: java.lang.indexoutofboundsexception: no group 12 @ java.util.regex.matcher.group(matcher.java:487) @ org.apache.hadoop.hive.ql.udf.udfregexpextract.evaluate(udfregexpextract.java:56) ... 23 more
please me extract different columns hive table.
try this:
select split(col_value,' ')[11] column_12 test_aviation;
assuming have space delimiters.
'\\t' if tab '\\|' pipe... ':'
and on
Comments
Post a Comment