Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added high-content benchmarks and fixed starvation bugs #5257

Merged
merged 1 commit into from
Mar 17, 2016

Conversation

spencerkimball
Copy link
Member

The bank benchmark was never doing anything before now as there
was no money in any of the accounts, so nothing would transfer.

Added high-contention workloads to the benchmarks and in the
process discovered starvation scenarios and an outright bug.

The starvation scenario was encountered when a batch containing
writes and an end transaction failed due to a serialized txn
having Timestamp != OrigTimestamp. In that event, we discard
the entire batch engine, meaning the intents for the writes are
discarded with everything else.

This change commits the results of a batch which fails with
EndTransaction.

There was also a bug when restarting a transaction. The replay
of the BeginTransaction statement was incorrectly using the
value of the transaction record instead of the transaction
passed in with the batch request. In the event that the txn
was pushed or aborted during the restart, this would cause
various problems when commit time comes.

The 2, 4, and 8 account benchmarks are pretty slow, but
optimizing those is the next step.


This change is Reviewable

@petermattis
Copy link
Collaborator

@tschottdorf should look at the storage changes. The change to the bank benchmark looks fine modulo the account selection.


Review status: 0 of 7 files reviewed at latest revision, 2 unresolved discussions.


sql/bank_test.go, line 65 [r1] (raw file):
I usually do this on the command line via the -cpu flag. I think specifying it like this prohibits using the -cpu flag to change the setting.


sql/bank_test.go, line 69 [r1] (raw file):
Why did you change this? The code was previously selecting to random accounts, guaranteeing that they were different. Now it is possible for to == from after randomly selecting to.


Comments from the review on Reviewable.io

@tamird
Copy link
Contributor

tamird commented Mar 15, 2016

Reviewed 7 of 7 files at r1.
Review status: all files reviewed at latest revision, 7 unresolved discussions.


sql/bank_test.go, line 89 [r1] (raw file):
can you make this an argument passed to the function?


storage/replica.go, line 1514 [r1] (raw file):
cc #3037


storage/replica_command.go, line 325 [r1] (raw file):
this comment is inaccurate now


storage/replica_test.go, line 2859 [r1] (raw file):
nit: print the detail here rather than pErr


storage/replica_test.go, line 2881 [r1] (raw file):
ditto


Comments from the review on Reviewable.io

@tbg
Copy link
Member

tbg commented Mar 15, 2016

Reviewed 7 of 7 files at r1.
Review status: all files reviewed at latest revision, 10 unresolved discussions.


sql/bank_test.go, line 28 [r1] (raw file):
Time to nuke this flag as well? Seems like a wrinkle we don't really use - should pick the amount randomly based on the average available per account.


storage/replica.go, line 1510 [r1] (raw file):
This needs a test.


storage/replica_command.go, line 325 [r1] (raw file):
Since this is what fixed the previous behavior, should also add a comment as to why we don't allow the write.
Actually, the right thing to do should be updating our txn from the persisted proto, and then writing the new entry.


storage/replica_test.go, line 2826 [r1] (raw file):
s/.$/, preventing regression of a bug in which which blindly overwrote the transaction record on BeginTransaction./


storage/replica_test.go, line 2859 [r1] (raw file):
The detail doesn't actually contain the whole message, you want pErr.


Comments from the review on Reviewable.io

@spencerkimball
Copy link
Member Author

Review status: all files reviewed at latest revision, 8 unresolved discussions.


sql/bank_test.go, line 65 [r1] (raw file):
If the parallelism is set to 1, for example, nothing makes sense in terms of the benchmark. You'd get zero contention. It's crucial we have a set level of parallelism here. Should I just assume it will be run with -cpu 4?


sql/bank_test.go, line 69 [r1] (raw file):
The previous code was biased towards selecting *numAccounts - 1, which is not correct if trying to select a uniformly random account. It's not possible to select to == from with this code. The for loop condition prevents it.


sql/bank_test.go, line 89 [r1] (raw file):
OK


storage/replica.go, line 1510 [r1] (raw file):
Added


storage/replica_command.go, line 325 [r1] (raw file):
Fixed comment and updated request txn from extant txn record and am now rewriting it.


Comments from the review on Reviewable.io

@spencerkimball spencerkimball force-pushed the spencerkimball/high-contention-benchmark branch from 0d26695 to 72bc578 Compare March 15, 2016 16:01
@tamird
Copy link
Contributor

tamird commented Mar 15, 2016

Reviewed 3 of 3 files at r2.
Review status: all files reviewed at latest revision, 8 unresolved discussions.


sql/bank_test.go, line 65 [r1] (raw file):
I believe the default is already the number of CPUs.


storage/replica_test.go, line 1668 [r2] (raw file):
looks like this happens on a not write intent error?


Comments from the review on Reviewable.io

@petermattis
Copy link
Collaborator

Review status: all files reviewed at latest revision, 8 unresolved discussions, some commit checks failed.


sql/bank_test.go, line 65 [r1] (raw file):
-cpu defaults to the runtime.NumCPU which seems fine. I think it is useful to be able to set -cpu 1 on the command line to see how fast the benchmark runs without contention.


sql/bank_test.go, line 69 [r1] (raw file):
It is subtle, but I think the probabilities work out for the previous code without the for-loop. You'll be setting to to numAccounts - 1 with a probability of ((numAccounts - 1) / numAccounts) * (1 / (numAccounts - 1)) = 1 / numAccounts.


Comments from the review on Reviewable.io

@tbg
Copy link
Member

tbg commented Mar 15, 2016

Review status: all files reviewed at latest revision, 9 unresolved discussions, some commit checks failed.


storage/replica_command.go, line 325 [r1] (raw file):
We shouldn't change the args, and we should pass the new txn out of BeginTxn:

h.Txn = h.Txn.Clone()
h.Txn.Update(&txn)
reply.Txn = h.Txn // rest of the batch should use the new txn

storage/replica_test.go, line 1666 [r2] (raw file):
How do you guarantee that this doesn't just push the intent?


storage/replica_test.go, line 2871 [r2] (raw file):
should also test what I suggested in the comment above: the retrying batch uses the updated txn returned by BeginTransaction.


Comments from the review on Reviewable.io

@tbg
Copy link
Member

tbg commented Mar 16, 2016

Reviewed 3 of 3 files at r2.
Review status: all files reviewed at latest revision, 10 unresolved discussions, some commit checks failed.


storage/replica_command.go, line 325 [r2] (raw file):
s/, updated/ updated/


Comments from the review on Reviewable.io

@spencerkimball
Copy link
Member Author

Review status: all files reviewed at latest revision, 9 unresolved discussions, some commit checks failed.


sql/bank_test.go, line 69 [r1] (raw file):
Ah, I didn't pay attention to the *numAccounts - 1 part there.


storage/replica_test.go, line 1666 [r2] (raw file):
Good point, Changed sender to the range itself instead of using the store. The range won't try any sort of push txn or resolve intent.


storage/replica_test.go, line 1668 [r2] (raw file):
Fixed error message


storage/replica_test.go, line 2871 [r2] (raw file):
OK, think I understood what you're asking for and added a clause to the test.


Comments from the review on Reviewable.io

@spencerkimball spencerkimball force-pushed the spencerkimball/high-contention-benchmark branch from 72bc578 to e6d0a16 Compare March 16, 2016 17:20
@tbg
Copy link
Member

tbg commented Mar 16, 2016

Reviewed 3 of 3 files at r3.
Review status: all files reviewed at latest revision, 9 unresolved discussions.


storage/replica_test.go, line 2929 [r3] (raw file):
https://xkcd.com/859/


storage/replica_test.go, line 2930 [r3] (raw file):
a missing !?


Comments from the review on Reviewable.io

@tbg
Copy link
Member

tbg commented Mar 16, 2016

:lgtm:


Review status: all files reviewed at latest revision, 9 unresolved discussions.


Comments from the review on Reviewable.io

@tamird
Copy link
Contributor

tamird commented Mar 16, 2016

Reviewed 3 of 3 files at r3.
Review status: all files reviewed at latest revision, 8 unresolved discussions, all commit checks successful.


Comments from the review on Reviewable.io

@spencerkimball
Copy link
Member Author

Review status: all files reviewed at latest revision, 8 unresolved discussions, all commit checks successful.


storage/replica_test.go, line 2930 [r3] (raw file):
Fixed


Comments from the review on Reviewable.io

The bank benchmark was never doing anything before now as there
was no money in any of the accounts, so nothing would transfer.

Added high-contention workloads to the benchmarks and in the
process discovered starvation scenarios and an outright bug.

The starvation scenario was encountered when a batch containing
writes and an end transaction failed due to a serialized txn
having `Timestamp` != `OrigTimestamp`. In that event, we discard
the entire batch engine, meaning the intents for the writes are
discarded with everything else.

This change commits the results of a batch which fails with
EndTransaction.

There was also a bug when restarting a transaction. The replay
of the `BeginTransaction` statement was incorrectly using the
value of the transaction record instead of the transaction
passed in with the batch request. In the event that the txn
was pushed or aborted during the restart, this would cause
various problems when commit time comes.

The 2, 4, and 8 account benchmarks are pretty slow, but
optimizing those is the next step.
@spencerkimball spencerkimball force-pushed the spencerkimball/high-contention-benchmark branch from e6d0a16 to 21701f3 Compare March 17, 2016 04:12
spencerkimball added a commit that referenced this pull request Mar 17, 2016
…ion-benchmark

Added high-content benchmarks and fixed starvation bugs
@spencerkimball spencerkimball merged commit aedb0e9 into master Mar 17, 2016
@spencerkimball spencerkimball deleted the spencerkimball/high-contention-benchmark branch March 17, 2016 04:12
@tamird
Copy link
Contributor

tamird commented Mar 17, 2016

Reviewed 1 of 1 files at r4.
Review status: all files reviewed at latest revision, 7 unresolved discussions.


Comments from the review on Reviewable.io

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants