HBase client should use a single Connection object #322

Closed
busbey opened this issue Jun 25, 2015 · 6 comments

@busbey
Collaborator

busbey commented Jun 25, 2015

Per the Javadocs for HBase's Connection, we should create a single Connection, share it across the application, and then create a Table instance per DB instance. As is, we create a ZK connection per thread and tons of threads.

@bijugs
Contributor

bijugs commented Jan 23, 2016

In the current core/Client.java code, _db.cleanup() is called on each thread, which prevents sharing a single DB connection object across multiple threads. As a result, a DB like HBase cannot use a single connection object across threads. Using a single connection object is good practice and also matches how applications are usually written. This may apply to other databases as well.

If we move the cleanup() call so that it runs after all threads have finished executing in Client.java (here), then DBs like HBase can work with a single connection object.

Note that the quick hack referred to in the write-up is only there to make the conversation easier.
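
To make the proposal above concrete, here is a minimal, self-contained sketch of deferring cleanup until every worker thread has finished. It is not the actual core/Client.java code: `SketchDb`, `DeferredCleanupClient`, and the static `sharedOpen` flag are hypothetical stand-ins for a DB binding, the client driver, and a shared connection.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a DB binding; all instances share one "connection". */
class SketchDb {
    // Stands in for a single shared connection object (assumption for illustration).
    static volatile boolean sharedOpen = true;
    static final AtomicInteger completedOps = new AtomicInteger();
    static final AtomicInteger failedOps = new AtomicInteger();

    /** One operation; it fails if the shared connection was already closed. */
    void doWork() {
        if (sharedOpen) {
            completedOps.incrementAndGet();
        } else {
            failedOps.incrementAndGet();
        }
    }

    /** Closes the shared connection. */
    void cleanup() {
        sharedOpen = false;
    }
}

/** Hypothetical driver: runs one worker per DB instance, then cleans up. */
class DeferredCleanupClient {
    static void run(List<SketchDb> dbs, int opsPerThread) {
        List<Thread> threads = new ArrayList<>();
        for (SketchDb db : dbs) {
            Thread t = new Thread(() -> {
                for (int i = 0; i < opsPerThread; i++) {
                    db.doWork();
                }
                // Note: no cleanup() here; a per-thread cleanup could close
                // the shared connection while other workers still need it.
            });
            threads.add(t);
            t.start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        // Every worker has finished, so releasing the shared resource is safe.
        for (SketchDb db : dbs) {
            db.cleanup();
        }
    }
}
```

With cleanup moved after the joins, no worker can observe a closed connection, whichever thread finishes first.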

@bijugs
Contributor

bijugs commented Jan 24, 2016

@busbey, regarding your query in PR #591 about stats for a single HBase connection versus multiple connections, here are some from an hbase10 load and run with 10 threads and a recordcount/operationcount of 1 million. The test was run on a 4-RS VM cluster running HBase 0.98.

| hbase10 load | Multi-Conn | Single-Conn |
| --- | --- | --- |
| [OVERALL] RunTime(ms) | 935029 | 915693 |
| [OVERALL] Throughput(ops/sec) | 1069.485545 | 1092.069067 |
| [CLEANUP] Operations | 20 | 20 |
| [CLEANUP] AverageLatency(us) | 93268.8 | 19950.7 |
| [CLEANUP] MinLatency(us) | 5 | 5296 |
| [CLEANUP] MaxLatency(us) | 359679 | 290815 |
| [CLEANUP] 95thPercentileLatency(us) | 218367 | 6911 |
| [CLEANUP] 99thPercentileLatency(us) | 359679 | 290815 |
| [INSERT] Operations | 1000000 | 1000000 |
| [INSERT] AverageLatency(us) | 9294.284818 | 9103.472805 |
| [INSERT] MinLatency(us) | 2256 | 2154 |
| [INSERT] MaxLatency(us) | 487423 | 796159 |
| [INSERT] 95thPercentileLatency(us) | 14031 | 13751 |
| [INSERT] 99thPercentileLatency(us) | 21791 | 20639 |
| [INSERT] Return=OK | 1000000 | 1000000 |

| hbase10 run | Multi-Conn | Single-Conn |
| --- | --- | --- |
| [OVERALL] RunTime(ms) | 615661 | 610723 |
| [OVERALL] Throughput(ops/sec) | 1624.2705 | 1637.403536 |
| [CLEANUP] Operations | 20 | 20 |
| [CLEANUP] AverageLatency(us) | 86529.55 | 19059 |
| [CLEANUP] MinLatency(us) | 6 | 4844 |
| [CLEANUP] MaxLatency(us) | 315647 | 282111 |
| [CLEANUP] 95thPercentileLatency(us) | 179071 | 8839 |
| [CLEANUP] 99thPercentileLatency(us) | 315647 | 282111 |
| [READ] Operations | 499167 | 500393 |
| [READ] AverageLatency(us) | 3239.730739 | 3367.402448 |
| [READ] MinLatency(us) | 844 | 841 |
| [READ] MaxLatency(us) | 406271 | 389119 |
| [READ] 95thPercentileLatency(us) | 5367 | 5635 |
| [READ] 99thPercentileLatency(us) | 9671 | 8951 |
| [READ] Return=OK | 499167 | 500393 |
| [UPDATE] Operations | 500833 | 499607 |
| [UPDATE] AverageLatency(us) | 8941.107647 | 8746.065097 |
| [UPDATE] MinLatency(us) | 2306 | 2352 |
| [UPDATE] MaxLatency(us) | 467455 | 381951 |
| [UPDATE] 95thPercentileLatency(us) | 14431 | 13871 |
| [UPDATE] 99thPercentileLatency(us) | 20271 | 18895 |
| [UPDATE] Return=OK | 500833 | 499607 |

@cmatser
Collaborator

cmatser commented Jan 25, 2016

I believe what you are asking for can be done without updating core code. You can look at the Cassandra binding as an example, since it uses a single database connection instance. It keeps track of the number of init() calls so that cleanup() closes the connection only on the last thread.

While I'm not opposed to updating core code in general, and I do find your solution elegant, I like the current method because it lets the DB binding implementer choose the most appropriate style for their particular database. My 2 pennies.
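
The init()/cleanup() counting approach described above can be sketched roughly as follows. This is an illustration of the general reference-counting pattern, not the Cassandra binding's actual code; `SharedConnection` and `RefCountedBinding` are hypothetical names.

```java
/** Hypothetical stand-in for an expensive, thread-safe connection object. */
class SharedConnection {
    private volatile boolean open = true;

    boolean isOpen() {
        return open;
    }

    void close() {
        open = false;
    }
}

/**
 * Hypothetical binding: each client thread owns one instance,
 * and all instances share a single static connection.
 */
class RefCountedBinding {
    private static final Object LOCK = new Object();
    private static int users = 0;               // number of init() calls not yet cleaned up
    private static SharedConnection connection; // created by the first init()

    /** The first caller creates the connection; later callers just register. */
    void init() {
        synchronized (LOCK) {
            if (users == 0) {
                connection = new SharedConnection();
            }
            users++;
        }
    }

    /** Only the last caller actually closes the connection. */
    void cleanup() {
        synchronized (LOCK) {
            if (users == 0) {
                return; // cleanup() without a matching init(); nothing to release
            }
            users--;
            if (users == 0) {
                connection.close();
                connection = null;
            }
        }
    }

    /** Returns the shared connection, or null if none is open. */
    static SharedConnection current() {
        synchronized (LOCK) {
            return connection;
        }
    }
}
```

Because the counter lives in the binding, per-thread cleanup() calls in core stay as they are, and each binding decides for itself whether to share a connection.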

@bijugs
Contributor

bijugs commented Jan 26, 2016

Thanks @cmatser for your comments. The thought was that if all supported DBs/datastores would benefit from the change in core, it would save each binding from implementing the single-connection logic individually. Since that may not be the case, I will include the logic in the hbase module, as is done in cassandra.

@bijugs
Contributor

bijugs commented Jan 27, 2016

@busbey I have updated PR #591 with changes to use a single HBase connection in [hbase098] and [hbase10].

@busbey
Collaborator Author

busbey commented Jan 30, 2016

Fixed in #591

@busbey busbey closed this as completed Jan 30, 2016