[cassandra] Update CassandraCQLClient to use PreparedStatement for better performance #1051
Conversation
@robertpang Could you please review #930 ? |
@Cricket007 Reviewed #930 with feedback. |
A gentle reminder. May I ask for a review? Thanks. |
I'd say your PR is more focused than #930. You only concern yourself with the core workload value for …. Because of this, I still don't think your code is very different from #930, if you only focus on this conditional:

```java
if (readallfields || fields == null) {
  ps = statement;
}
```

In #930, there is no re-creation of any …. You implement update, which I'm fine with implementing as well. You mention differences with INSERT for TTL and setting non-key fields to null, which I'm not even sure YCSB does, at least when all the fields are used by default.
The same amount of changes is required whether done in init() or in the operation body, no? You mention concerns about duplicative QueryBuilder calls, yet taking your delete method as an example, you are re-building that query on every call to that method. Matter of fact, I see you kept the …. Additionally, I think ….
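For readers following along, the on-demand caching pattern both PRs circle around can be sketched in plain Java. This is a stand-alone sketch, not the binding's actual code: `prepareOnServer` is a hypothetical stand-in for the driver's `Session.prepare`, and a plain `String` stands in for `PreparedStatement`:

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class StatementCache {
  // One cached "prepared statement" per distinct field set; stands in for
  // the binding's ConcurrentMap<Set<String>, PreparedStatement>.
  private final ConcurrentMap<Set<String>, String> readStmts = new ConcurrentHashMap<>();
  int prepareCalls = 0;

  // Hypothetical stand-in for Session.prepare(query) against a live cluster.
  private String prepareOnServer(Set<String> fields) {
    prepareCalls++;
    return "SELECT " + String.join(", ", fields) + " FROM usertable WHERE y_id = ?";
  }

  // Prepare at most once per field combination, no matter how many reads run.
  public String readStatement(Set<String> fields) {
    return readStmts.computeIfAbsent(new TreeSet<>(fields), this::prepareOnServer);
  }

  public static void main(String[] args) {
    StatementCache cache = new StatementCache();
    Set<String> fields = new TreeSet<>(Set.of("field0", "field1"));
    String first = cache.readStatement(fields);
    String second = cache.readStatement(fields); // cache hit; no re-preparation
    if (!first.equals(second)) {
      throw new AssertionError("cache returned different statements");
    }
    System.out.println(cache.prepareCalls);
  }
}
```

The point of contention is only where preparation happens (lazily per field set, as here, versus eagerly in init()); either way each distinct statement is prepared once.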
Thanks for your review. I think there is a misunderstanding. My PR does not assume any specific workload. That is the reason why it does not depend on any of the core workload properties. On the other hand, your PR #930 depends heavily on those properties and pre-prepares the statements in init() specifically for core workload. For other 3rd-party workloads, say one that updates 2 fields in one UPDATE statement (possible by writing their own Java class instead of using CoreWorkload.java), PR #930 will fall back to preparing the statement on-the-fly. So your PR is essentially special-casing for core workloads.
Comparing the amount of code changes in #930 and #1051, I am unsure about that.
I gave your approach of pre-preparing the statements in the init() call some consideration. In order to avoid special-casing for core workload like your PR #930 does, it will require pre-preparing all possible combinations of fields in SELECT, SCAN and UPDATE which is not a good idea. I am unsure if special-casing is right or how much additional performance gain is possible by preparing in init() vs. on-demand.
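The combinatorial concern here can be made concrete. If a workload may request any non-empty subset of the fields, the number of distinct SELECT column lists grows as 2^n - 1. A quick sketch (fieldcount = 10 is CoreWorkload's default):

```java
public class FieldCombos {
  public static void main(String[] args) {
    int fieldcount = 10; // CoreWorkload's default field count
    // Non-empty subsets of the field set: 2^n - 1 distinct column lists
    // a binding would have to pre-prepare to cover every possible SELECT.
    long combos = (1L << fieldcount) - 1;
    System.out.println(combos);
  }
}
```

Pre-preparing over a thousand statements per table in init() to cover every case would clearly be worse than preparing the handful actually used on demand.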
Actually the existing YCSB tests for database bindings should cover that sufficiently because the prepared statements will be exercised identically by the existing tests. |
I updated my PR to avoid rebuilding the query in each call to look up the prepared statement. I will resist pre-preparing the statements in the init() call just for the core workload while leaving other workloads behind. In typical perf tests, the workloads will run long enough to amortize the statement preparation overhead. Also, there is already an existing feature request for a warm-up step, and PR #423 implements it. I think a warm-up step is a better solution and the right thing to do in general.
I agree a warm-up step makes sense. Without knowing what the reported YCSB time is truly supposed to track, having the operation metrics collect the time for preparing a statement, even if just once, personally doesn't seem beneficial, especially when there are a few extra cycles to build a query and then check if it exists. It's fair to say I'm not actively committed to maintaining #930, and as stated in there, it's a subset of changes pulled out of #98 anyway, so I don't take full ownership of the changes applied or feel too sentimental about the logic for whether or not the queries are pre-prepared. It just doesn't make sense to me to use the QueryBuilder any more than necessary. The efforts on improving Cassandra performance from #98 were two years ago, and #930 was stalled because I moved off a Cassandra project, but now this one exists... My point is that the idea is all the same: we just want PreparedStatements. I don't think it's a competition about whose PR is the most feature-complete, or smallest diff, or even runs the fastest, so I don't know if it makes sense to have conflicting PRs. I was curious if you'd attempted to run #930 on your own Cassandra installation; let us know your runtime observations. I'd say it's only fair, since I've attempted to run this code on mine. Good news, though: after restarting my cluster and ensuring my run operation count is the same as the load count, I no longer get failures for any of the workloads. In any case, do you have the metrics before and after you avoided rebuilding the query? |
To get relatively comparable results, I reran YCSB back-to-back again for both cases for comparison. With "rebuilding of query string" each time: … Avoiding "rebuilding of query string" each time: … There is marginal improvement from avoiding rebuilding the query string, but it could be within the margin of "noise". I will say the use of prepared statements produces the most noticeable improvement; the rest is much less significant. |
@Cricket007 I had a chance to run your changes and here are the results: Workload : Throughput(ops/sec) - [OpType]: AverageLatency(us) … From what I see, the numbers are very similar. IMO, the benefit of pre-preparing the statements in init() specifically for the core workload is small, but it marginalizes other non-core workloads. I think it is better for the database bindings to stay workload-agnostic to be useful generally. |
A gentle reminder again. May I ask for a review and approval to merge? Thank you. |
@busbey ^^ |
Another gentle reminder again. May I ask for a review and approval to merge? Thank you. @busbey ?? |
@Cricket007 and @robertpang should reviewers be checking this issue as well as #930? Or just one of them? I'll work through these. To set expectations my time is currently very restricted (on the order of 1 hour a week); it'd be faster if another maintainer could take a look. |
Thanks @busbey. I think we've determined both branches work, just with a different methodology about how PreparedStatements should be maintained. To sum up the differences, #930 accounts for the … |
So we'll have to pick one or the other, correct? |
Unfortunately, yes, or merge the non-conflicting changes. |
@busbey Thanks for your review. Yes, both PR #930 and #1051 work. I have used #1051 successfully to benchmark Apache Cassandra (results here). Apart from implementing UPDATE properly, my goal in #1051 is to keep the code change concise for easy review and merge. I avoided accounting for the … |
Hi there, I can testify that we've been running this branch under heavy load multiple times on multiple platforms (GCE, AWS) and it's very stable. I hope it will get merged into master as soon as possible, especially since it passed all of the checks. Dor (@scylladb) |
Thanks, @dorlaor. Hi @busbey and other YCSB maintainers, just pinging again for the approval to merge. The use of PreparedStatement is crucial for us and others to get meaningful YCSB performance numbers against a Cassandra datastore. And since Dor (@scylladb) and we (@yugabyte) have used this branch to run YCSB benchmarks against our own products successfully, we would be grateful if it can be merged into master soon. Thank you. |
Hi @busbey and other YCSB maintainers, just pinging yet again for the approval to merge. This PR has been ready for merge for 4 months and we would appreciate it if someone could review and give the approval soon. Thank you. |
+1 for the need and I can attest it works fine
Hi @robertpang! Brief status update: this is in my review backlog. I get the level of improvement and I want to see the Cassandra binding improved as much as anyone else does. I appreciate the level of effort that's come in thus far on getting prepared statements for cassandra ready and reviewed. It looks like I started a review back in November before my dev time went to 0 for ~3 months for personal reasons. IIRC, when I last looked at things there were two different approaches to adding prepared statement support for the Cassandra bindings and the two authors didn't have consensus on which to use. That means I have to review both of them and then make a call on one, the other, or a merge of the two. ATM, my priorities for what time I have on the project are centered on picking up the stale release cadence. (#981 tracks the 0.13.0 release, which has been in RC status since Jul 2017. Our last stable release was December 2016.) Things that will help me review this faster:
Any of those would free up time for me (and any other maintainer who comes around with some time to invest), which makes it more likely we have time to spend on getting this merged. |
If Scylla and others in the Cassandra space are thoroughly testing this PR, then I'll let this continue on here and close my branch. |
Thanks, @Cricket007. Appreciate it. @busbey and other YCSB maintainers, appreciate your time and attention on this. We can now move forward with one PR (this). I will be here for any questions regarding this PR. |
Hey guys, any progress? We just have folks who want to run YCSB and I push them elsewhere since there are no prepared statements in the master branch |
so far it looks like the first of my bullet points got done. Thanks for that everyone. 0.13.0 went out and #1117 covers the upcoming 0.14.0 release. Unfortunately, we still don't even have a solid enumeration of the testing needed there. How about this: I'll prioritize getting this merged into master prior to the 0.14.0 release moving forward so long as someone commits to the needed cassandra testing for 0.14.0 RCs. any takers? |
@busbey What do the Cassandra tests involve, and how frequently do you foresee them being needed? |
@busbey are you referring to these tests: … We tested this branch under non-stop traffic at a rate of multiple 100k ops (multiple YCSB processes and threads), in parallel with several other databases that had their own load. |
When we do a release, we include the tested/supported state of each datastore binding (the docs on "writing release notes" in our release manager guide should probably go into more detail). Basically, they're all listed as "experimental and untested" unless a) someone tests running them against a live instance and documents specific versions and instructions followed (preferably the README from the binding), or b) someone has previously done testing against a live instance and nothing for the specific binding has changed since then. See, for example, the release notes for v0.12.0 and the sections "tested datastores" vs. "untested datastores". Compare to comments on issue #873, where folks stated that they had tested the various things. How often this testing is needed depends on how often we put up RCs and how often the code changes. Our last tested release was 0.12.0 in December 2016. My goal is to get us back on a regular cadence, and hopefully it would be more frequent than once every 18 months. Cassandra hasn't been in the tested-for-release group since v0.8.0 in April 2016. I'd love to see that change, which is why I'd be willing to delay 0.14 further to make it happen. |
@busbey Thanks for explaining. That being so, I will be glad to volunteer to do the testing for Apache Cassandra. I have run YCSB against Apache Cassandra on Google Cloud Platform a couple of times recently (see https://forum.yugabyte.com/t/ycsb-benchmark-results-for-yugabyte-and-apache-cassandra-again-with-p99-latencies/99) and will have no problem with that. After all, that is how I came to add support for prepared statements in the data binding and submit this PR for it to be included in YCSB. While on this, I would like to ask to include our @yugabyte DB as a tested datastore for YCSB as well. We support the Apache Cassandra client driver, and therefore its existing data binding works identically for us. We have done a couple of test runs above to confirm that and can do another run. Will that be OK? |
We test workloads of 1 billion keys (1TB-3TB) with this branch and everything seems rock solid. Let us know what you wish us to focus on.

YCSB commands - Scylla (x8 YCSB clients):

Population:

```
nohup ~/YCSB/bin/ycsb load cassandra-cql -p hosts=[IPs] -p recordcount=1000000000 -p insertstart=0 -p operationcount=1000000000 -p insertcount=125000000 -p cassandra.writeconsistencylevel=QUORUM -s -P workloads/workloada -threads 50 > 1load_scylla_1TB &
```

Workload A (uniform):

```
nohup ~/YCSB/bin/ycsb run cassandra-cql -p hosts=[IPs] -p recordcount=1000000000 -p insertstart=0 -p operationcount=125000000 -p insertcount=125000000 -p cassandra.writeconsistencylevel=QUORUM -p cassandra.readconsistencylevel=QUORUM -p maxexecutiontime=5400 -s -P workloads/workloada -threads 35 > 1scylla_1TB_workloada &
```

Workload A (zipfian):

```
nohup ~/YCSB/bin/ycsb run cassandra-cql -p hosts=[IPs] -p recordcount=1000000000 -p insertstart=0 -p operationcount=125000000 -p insertcount=125000000 -p cassandra.writeconsistencylevel=QUORUM -p cassandra.readconsistencylevel=QUORUM -p maxexecutiontime=5400 -s -P workloads/zipfian_workloada -threads 40 > 1scylla_1TB_zipfian_workloada &
```
Yeah that sounds fine. IIRC we've had commercial vendor offerings listed before. |
Thanks for working on this. Overall the changes look really good.
I've requested one minor change and there's a nice-to-have nit.
```java
if (debug) {
  System.out.println(stmt.getQueryString());
  System.out.println("key = " + key);
```
nit: Mind switching this to a logging framework so long as we're retooling?
Incorporated. Changed to use the logging framework.
```java
// @@ -195,6 +208,14 @@ public void init() throws DBException {
session = cluster.connect(keyspace);

readStmts = new ConcurrentHashMap<Set<String>, PreparedStatement>();
```
Since these don't rely on any information that's available at init time, could we move it into a static initializer? It'll simplify reasoning about them being used safely.
Or even just do object allocation where they're declared.
Done. Allocated the objects in static initializers.
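The suggestion above can be illustrated with a minimal, self-contained sketch. The map value here is a plain String standing in for PreparedStatement, and the map name mirrors the binding's field:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class StaticInitDemo {
  // Allocated once when the class loads, before any thread's init() runs,
  // so every client thread sees the same fully constructed map.
  private static final ConcurrentMap<Set<String>, String> READ_STMTS =
      new ConcurrentHashMap<>();

  public static void main(String[] args) {
    READ_STMTS.putIfAbsent(Set.of("field0"), "SELECT field0 FROM usertable WHERE y_id = ?");
    READ_STMTS.putIfAbsent(Set.of("field0"), "duplicate"); // ignored: key already present
    System.out.println(READ_STMTS.size());
  }
}
```

Because the map exists before any init() call, there is no window where one thread could observe a partially initialized (or null) map created by another.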
Any progress here? |
@busbey Sorry for the delay as I was occupied with a major release. The couple of suggestions look good and I will take care of them right away. |
Hi @busbey, You can find the change with your comments fixed here: https://github.com/haaawk/YCSB/tree/i458 @robertpang please have a look. Best Regards, |
… to @haaawk for incorporating the changes.
thanks for the quick turn around!
```java
// @@ -184,13 +209,13 @@ public void init() throws DBException {
}

Metadata metadata = cluster.getMetadata();
// was: System.err.printf("Connected to cluster: %s\n",
logger.error("Connected to cluster: {}\n",
```
nit: info level maybe?
Done.
```java
metadata.getClusterName());

for (Host discoveredHost : metadata.getAllHosts()) {
  // was: System.out.printf("Datacenter: %s; Host: %s; Rack: %s\n",
  logger.info("Datacenter: {}; Host: {}; Rack: {}\n", new Object[] {
```
you shouldn't need the array creation here.
Unfortunately, slf4j's info() API has overloads for 1 and 2 arguments but not more, so the Object array is needed. https://www.slf4j.org/api/org/slf4j/Logger.html#info(java.lang.String,%20java.lang.Object...)
The Object... arguments version is varargs; you can pass it as many parameters as you want.
There's no info(String format, Object...args) in the version of slf4j used in the project. That's why I had to use new Object[] {...}.
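The distinction being discussed is easy to demonstrate outside slf4j. The sketch below defines a hypothetical info(String, Object...) in the same shape as slf4j's varargs overload; with such an overload available, passing three arguments directly and passing an explicit Object[] (the pre-varargs workaround) are equivalent calls:

```java
public class VarargsDemo {
  // Hypothetical formatter mirroring the shape of slf4j's
  // Logger.info(String format, Object... args): each {} is replaced in order.
  static String info(String format, Object... args) {
    StringBuilder out = new StringBuilder();
    int from = 0, arg = 0, idx;
    while ((idx = format.indexOf("{}", from)) >= 0 && arg < args.length) {
      out.append(format, from, idx).append(args[arg++]);
      from = idx + 2;
    }
    out.append(format.substring(from));
    return out.toString();
  }

  public static void main(String[] args) {
    // Three arguments passed directly through varargs...
    String direct = info("Datacenter: {}; Host: {}; Rack: {}", "dc1", "10.0.0.1", "r1");
    // ...and an explicit Object[] array: the compiler treats them identically.
    String wrapped = info("Datacenter: {}; Host: {}; Rack: {}",
        new Object[] {"dc1", "10.0.0.1", "r1"});
    System.out.println(direct.equals(wrapped));
  }
}
```

Older slf4j releases declared only one- and two-argument overloads plus info(String, Object[]), which is why the explicit array wrapping was required before the dependency upgrade.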
oh! well that's easy enough to fix. #1149 updates us to current stable, rather than some version from 2012.
Great. Will rebase and make the change after #1149 is merged.
Rebased and made the suggested change.
```java
  return Status.OK;
} catch (Exception e) {
  e.printStackTrace();
```
this should be a logger call that passes the exception as a final unbound argument.
Done.
```java
logger.debug(stmt.getQueryString());
logger.debug("key = {}", key);

session.execute(stmt.bind(key));

  return Status.OK;
} catch (Exception e) {
  e.printStackTrace();
```
this should be replaced by passing the exception as a final argument to the logger call below
Done.
```java
// @@ -287,7 +330,7 @@ public Status read(String table, String key, Set<String> fields,
} catch (Exception e) {
  e.printStackTrace();
```
this should be replaced by passing the exception as a final argument to the logger call below
Done.
```java
// @@ -381,7 +434,7 @@ public Status scan(String table, String startkey, int recordcount,
} catch (Exception e) {
  e.printStackTrace();
```
this should be replaced by passing the exception as a final argument to the logger call below
Done.
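The reviewer's pattern relies on slf4j's convention that a trailing Throwable argument is treated as the exception to log (with its stack trace) rather than as a formatting parameter. A minimal stand-in, where error() is a hypothetical method mimicking that rule and not the real Logger API:

```java
public class ThrowableArgDemo {
  // Mimics slf4j's rule: if the last argument is a Throwable, it is logged
  // alongside the formatted message instead of being spliced into a {}.
  static String error(String format, Object... args) {
    int n = args.length;
    Throwable t = null;
    if (n > 0 && args[n - 1] instanceof Throwable) {
      t = (Throwable) args[--n];
    }
    for (int i = 0; i < n; i++) {
      int idx = format.indexOf("{}");
      if (idx >= 0) {
        format = format.substring(0, idx) + args[i] + format.substring(idx + 2);
      }
    }
    return t == null ? format : format + " | " + t;
  }

  public static void main(String[] args) {
    Exception e = new IllegalStateException("read failed");
    // Instead of e.printStackTrace(), pass the exception as the final argument.
    System.out.println(error("Error reading key {}", "user42", e));
  }
}
```

With this convention, the e.printStackTrace() calls flagged above can simply be dropped and the exception appended as the last argument of the existing logger call.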
Rebased master with upstream and fixed the … |
Just waiting on the Travis build. Thanks for pushing on this! |
Addressing issue #458. The PreparedStatement is created and cached following the same approach as in JdbcDBClient. Here are the numbers before and after the use of PreparedStatement:
Apache Cassandra version: 3.11.1, replication_factor = 3, QUORUM consistency level
Server: 3-node Google Cloud Platform (GCP) n1-standard-16 machine, 16 VCPU's, 60GB memory, 2 x 375 GB direct attached SSD
YCSB parameter: recordcount=1000000, operationcount=10000000, maxexecutiontime=90
Before:
Workload : Throughput(ops/sec) - OpType & AverageLatency(us)
workload a : 64956.256567473 - [READ]: 4282.700504426243 [UPDATE]: 3346.0443676539962
workload b : 63345.63794747335 - [READ]: 3949.758514786955 [UPDATE]: 3189.5031351661996
workload c : 64365.03672249063 - [READ]: 3849.324261325788
workload d : 66134.2080457653 - [READ]: 3777.5643821626963 [INSERT]: 3162.619993576045
workload e : 7939.354223757893 - [INSERT]: 16607.25331648768 [SCAN]: 31967.764074558985
workload f : 42042.92384862897 - [READ]: 4218.613333769682 [READ-MODIFY-WRITE]: 7568.5240789843865 [UPDATE]: 3351.4569375155224
After:
Workload : Throughput(ops/sec) - OpType & AverageLatency(us)
workload a : 74710.34153952844 - [READ]: 3762.9415808058598 [UPDATE]: 2873.006871596396
workload b : 70607.19238937287 - [READ]: 3542.8850799849365 [UPDATE]: 2852.0433860133608
workload c : 72872.62719944527 - [READ]: 3399.9530996221742
workload d : 80464.90059049785 - [READ]: 3111.0480158195774 [INSERT]: 2492.7417853628635
workload e : 8376.560197625033 - [INSERT]: 15429.41884128906 [SCAN]: 30362.55064932235
workload f : 43846.903949293024 - [READ]: 4074.0924531425676 [READ-MODIFY-WRITE]: 7230.656645026845 [UPDATE]: 3152.440234449529
Also changed the update() method to use the Cassandra UPDATE statement. While it may not matter to YCSB, there are subtle semantic and on-disk differences between INSERT and UPDATE statements, so update() now uses the statement type that matches the action taken.
Suggestions and comments are welcome.