This repository has been archived by the owner on Feb 12, 2022. It is now read-only.
Our current stats gathering is way too simplistic - it only keeps a per-client-connection cache of the min and max key of each table. Instead, we should:
- have a system table that stores the stats
- create a coprocessor that updates the stats during compaction (i.e. using the `preCompactSelection`, `postCompactSelection`, `preCompact`, and `postCompact` methods)
- keep a kind of histogram: the key boundary at every N bytes within a region. Perhaps we can do a delta update on minor compaction and a complete update on major compaction.
- keep the min key/max key of a table in the stats table too
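The histogram idea above can be sketched independently of HBase: a collector that consumes key/values in sorted key order (as a compaction scanner would) and records a guidepost key each time roughly N bytes have passed, while also tracking the min/max key for the stats table. The class and method names here (`GuidepostCollector`, `add`) are illustrative, not an existing API.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the equal-depth "key boundary every N bytes" idea.
public class GuidepostCollector {
    private final long bytesPerGuidepost;
    private long bytesSinceLastGuidepost = 0;
    private final List<byte[]> guideposts = new ArrayList<>();
    private byte[] minKey = null;
    private byte[] maxKey = null;

    public GuidepostCollector(long bytesPerGuidepost) {
        this.bytesPerGuidepost = bytesPerGuidepost;
    }

    // Called once per key/value, in sorted key order.
    public void add(byte[] key, int valueSize) {
        if (minKey == null) minKey = key;
        maxKey = key;
        bytesSinceLastGuidepost += key.length + valueSize;
        if (bytesSinceLastGuidepost >= bytesPerGuidepost) {
            guideposts.add(key);          // boundary key for this N-byte chunk
            bytesSinceLastGuidepost = 0;
        }
    }

    public List<byte[]> getGuideposts() { return guideposts; }
    public byte[] getMinKey() { return minKey; }
    public byte[] getMaxKey() { return maxKey; }

    public static void main(String[] args) {
        GuidepostCollector c = new GuidepostCollector(100);
        for (int i = 0; i < 50; i++) {
            byte[] key = String.format("row%05d", i).getBytes(StandardCharsets.UTF_8);
            c.add(key, 12);   // 8-byte key + 12-byte value = 20 bytes per row
        }
        System.out.println(c.getGuideposts().size()); // prints 10
    }
}
```

On a minor compaction only the compacted files' guideposts would need recomputing (the delta update), while a major compaction sees every key/value and can rebuild the whole list.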
Wow, that HyperLogLog is pretty interesting - thanks for the pointer. For stats, we're calculating them at major compaction, where a full pass is made through the data anyway, so I don't think it'll help there. But for COUNT DISTINCT and SELECT DISTINCT, it could definitely be useful.
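For reference, here is a toy HyperLogLog in plain Java (no Phoenix/HBase code) showing why it only yields an approximate cardinality - the registers keep the maximum leading-zero rank per bucket, so the original values are not recoverable. The hash function and precision choice are illustrative, not what any real library uses.

```java
// Toy HyperLogLog: estimates the number of distinct values seen.
public class SimpleHyperLogLog {
    private final int p;            // precision: 2^p registers
    private final int m;
    private final byte[] registers;

    public SimpleHyperLogLog(int p) {
        this.p = p;
        this.m = 1 << p;
        this.registers = new byte[m];
    }

    // 64-bit FNV-1a followed by a finalizer mix; illustrative only.
    private static long hash(String s) {
        long h = 0xcbf29ce484222325L;
        for (int i = 0; i < s.length(); i++) {
            h ^= s.charAt(i);
            h *= 0x100000001b3L;
        }
        h ^= h >>> 33; h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33; h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    public void add(String value) {
        long x = hash(value);
        int idx = (int) (x >>> (64 - p));                // top p bits pick a register
        long rest = x << p;                              // remaining bits
        int rank = Long.numberOfLeadingZeros(rest) + 1;  // position of first 1-bit
        if (rank > registers[idx]) registers[idx] = (byte) rank;
    }

    public long cardinality() {
        double alpha = 0.7213 / (1.0 + 1.079 / m);
        double sum = 0;
        int zeros = 0;
        for (byte r : registers) {
            sum += Math.pow(2.0, -r);
            if (r == 0) zeros++;
        }
        double est = alpha * m * m / sum;
        if (est <= 2.5 * m && zeros > 0) {
            est = m * Math.log((double) m / zeros);      // linear counting for small counts
        }
        return Math.round(est);
    }

    public static void main(String[] args) {
        SimpleHyperLogLog hll = new SimpleHyperLogLog(12);
        for (int i = 0; i < 10000; i++) hll.add("value-" + i);
        System.out.println(hll.cardinality()); // close to 10000, within a few percent
    }
}
```

With 2^12 registers the sketch is 4 KB regardless of how many distinct values it has seen, which is what makes it attractive for per-column stats.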
It will only give the cardinality, not the unique values themselves. I'm wondering whether we can implement a combination of HyperLogLog and a Bloom filter over the column values to determine the strategy for aggregating the data. If so, that would be awesome.
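The combination suggested here could pair a per-column HyperLogLog (approximate cardinality) with a Bloom filter (approximate membership). A minimal Bloom filter sketch, again with illustrative names and parameters rather than any existing API:

```java
import java.util.BitSet;

// Toy Bloom filter: answers "possibly present" / "definitely absent".
public class SimpleBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    public SimpleBloomFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // 64-bit finalizer mix to spread the input hash bits.
    private static long mix(long h) {
        h ^= h >>> 33; h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33; h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    // Derive the i-th bit index via double hashing.
    private int index(long h, int i) {
        int h1 = (int) h;
        int h2 = (int) (h >>> 32);
        return Math.floorMod(h1 + i * h2, numBits);
    }

    public void add(String value) {
        long h = mix(value.hashCode());
        for (int i = 0; i < numHashes; i++) bits.set(index(h, i));
    }

    public boolean mightContain(String value) {
        long h = mix(value.hashCode());
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(h, i))) return false;  // definitely absent
        }
        return true;                                   // possibly present
    }

    public static void main(String[] args) {
        SimpleBloomFilter bf = new SimpleBloomFilter(1 << 16, 3);
        bf.add("col-value-a");
        System.out.println(bf.mightContain("col-value-a")); // prints true
    }
}
```

A negative answer from the filter is definite while a positive may be a false positive, which is exactly complementary to HyperLogLog's cardinality-only answer: together they could inform an aggregation strategy without storing the values themselves.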