Hadoop - Importing compressed (gzip) data from S3 to Hive
I have a bunch of .gzip files in s3://mybucket/file/*.gzip.
I am loading the table using:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions.pernode=1000;
set hive.enforce.bucketing=true;
set hive.exec.compress.output=true;
set io.seqfile.compression.type=BLOCK;
set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;

create external table db.tablename (
  col1 datatype,
  col2 datatype,
  col3 datatype,
  col4 datatype)
partitioned by (col datatype)
clustered by (col2) sorted by (col1, col2) into 200 buckets
row format delimited
  fields terminated by '\t'
  lines terminated by '\n'
location 's3://mybucket/file';
It creates the table, but it doesn't load any data from S3 into Hive/HDFS.
Any help is appreciated.
Thanks, Sanjeev
I think the files present in s3://mybucket/file/ are not organized in the directory structure Hive expects for partitions. I suggest you create an external table without partitions and buckets over s3://mybucket/file/, and then write a Hive query that reads from that table and writes into the partitioned/bucketed table, as sketched below.
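A rough sketch of that two-step approach, assuming the files are tab-delimited gzipped text; the staging table name, column names, and datatypes below are placeholders for the real schema:

-- 1. Staging external table that simply points at the files on S3,
--    with no partitions or buckets.
create external table db.tablename_staging (
  col1 string,  -- placeholder columns/types; use the real schema
  col2 string,
  col3 string,
  col4 string,
  col  string   -- the future partition column is just a regular column here
)
row format delimited
  fields terminated by '\t'
  lines terminated by '\n'
location 's3://mybucket/file/';

-- 2. Populate the partitioned/bucketed table from the staging table
--    using dynamic partitioning (the partition column goes last in the select).
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.enforce.bucketing=true;

insert overwrite table db.tablename partition (col)
select col1, col2, col3, col4, col
from db.tablename_staging;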