HBase performance with a large number of scans
I have a table with hundreds of millions of records. The table contains data about servers and the events generated on them. The row key of the table is as follows:
rowkey = md5(serverid) + timestamp [32 hex characters + 10 digits = 42 characters]
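As a sketch of how such a row key can be built (the helper name generateKey and the exact encoding are my assumptions based on the layout described above, i.e. a 32-character lower-case MD5 hex digest followed by a 10-digit timestamp):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class RowKeys {
    // Hypothetical helper: md5(serverId) as 32 hex chars + 10-digit
    // zero-padded timestamp = 42-character row key, per the description.
    static String generateKey(String serverId, long epochSeconds) throws Exception {
        byte[] digest = MessageDigest.getInstance("MD5")
                .digest(serverId.getBytes(StandardCharsets.UTF_8));
        StringBuilder sb = new StringBuilder(42);
        for (byte b : digest) {
            sb.append(String.format("%02x", b));
        }
        sb.append(String.format("%010d", epochSeconds));
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        String key = generateKey("server-42", 1444398443L);
        System.out.println(key.length() + " chars: " + key);
    }
}
```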
One of the use cases is to list all events between times t1 and t2. For this, a normal scan takes too long. To speed things up, I have done the following:
- Fetch the list of unique serverids from the table (really fast).
- Divide the above list into 256 buckets based on the first 2 hex characters of the md5 of each serverid.
- For each bucket, call a co-processor (parallel requests) with the list of serverids, the start time, and the end time.
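The bucketing step above can be sketched like this (bucketByMd5Prefix and md5Hex are hypothetical helper names, not from the original post):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class Buckets {
    // Hex-encode the MD5 of a server id (32 lower-case hex characters).
    static String md5Hex(String serverId) throws Exception {
        byte[] d = MessageDigest.getInstance("MD5")
                .digest(serverId.getBytes(StandardCharsets.UTF_8));
        StringBuilder sb = new StringBuilder(32);
        for (byte b : d) sb.append(String.format("%02x", b));
        return sb.toString();
    }

    // Group server ids into up to 256 buckets keyed by the first two
    // hex characters of md5(serverId), i.e. "00" .. "ff".
    static Map<String, List<String>> bucketByMd5Prefix(Collection<String> serverIds)
            throws Exception {
        Map<String, List<String>> buckets = new TreeMap<>();
        for (String id : serverIds) {
            String prefix = md5Hex(id).substring(0, 2);
            buckets.computeIfAbsent(prefix, k -> new ArrayList<>()).add(id);
        }
        return buckets;
    }

    public static void main(String[] args) throws Exception {
        Map<String, List<String>> buckets =
                bucketByMd5Prefix(Arrays.asList("server-1", "server-2", "server-3"));
        System.out.println(buckets.size() + " non-empty buckets");
    }
}
```

Each bucket can then be dispatched to the co-processor as its own parallel request.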
The co-processor scans the table as follows:
for (String serverId : serverIds) {
    byte[] startKey = generateKey(serverId, startTime);
    byte[] endKey = generateKey(serverId, endTime);
    Scan scan = new Scan(startKey, endKey);
    InternalScanner scanner = env.getRegion().getScanner(scan);
    // ...
}
I am able to get results quickly with this approach. My concern is the large number of scans: if the table has 20,000 serverids, the above code makes 20,000 scans. Will this impact the overall performance and scalability of HBase?
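One alternative worth evaluating for the many-scans concern (my suggestion, not part of the original setup) is HBase's MultiRowRangeFilter (available since HBase 1.1), which lets a single Scan cover many row ranges so the region server seeks between ranges instead of the client issuing one scan per serverid. A sketch, assuming the generateKey helper from the question and an already-opened Table:

```java
// Sketch only: requires the HBase client library and a running cluster.
List<MultiRowRangeFilter.RowRange> ranges = new ArrayList<>();
for (String serverId : serverIds) {
    ranges.add(new MultiRowRangeFilter.RowRange(
            generateKey(serverId, startTime), true,    // start row, inclusive
            generateKey(serverId, endTime), false));   // stop row, exclusive
}
Scan scan = new Scan();
scan.setFilter(new MultiRowRangeFilter(ranges));
ResultScanner scanner = table.getScanner(scan);
```

Whether this beats the parallel co-processor approach depends on how the ranges are distributed across regions, so it is worth benchmarking both on real data.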
Try using a TimestampsFilter. The following syntax was tested in the HBase shell:

import java.util.ArrayList
import org.apache.hadoop.hbase.filter.TimestampsFilter

list = ArrayList.new()
list.add(1444398443674) # start timestamp
list.add(1444457737937) # end timestamp
scan 'eventLogTable', {FILTER => TimestampsFilter.new(list)}
The same API exists in Java and other languages too.
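For reference, the equivalent Java sketch (again requiring the HBase client library; the Table instance is a placeholder). One caveat worth knowing: TimestampsFilter matches cells whose timestamps exactly equal the listed values, not everything in between, so for a true time range Scan.setTimeRange(min, max) is usually the better fit:

```java
// Sketch only: requires the HBase client library and a running cluster.
List<Long> timestamps = Arrays.asList(1444398443674L, 1444457737937L);
Scan scan = new Scan();
scan.setFilter(new TimestampsFilter(timestamps)); // matches these exact cell timestamps
// For an actual time range, use instead:
// scan.setTimeRange(1444398443674L, 1444457737937L); // [min, max)
ResultScanner scanner = table.getScanner(scan);
```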