Added high-content benchmarks and fixed starvation bugs #5257

spencerkimball · 2016-03-15T05:35:18Z

The bank benchmark was never doing anything before now as there
was no money in any of the accounts, so nothing would transfer.

Added high-contention workloads to the benchmarks and in the
process discovered starvation scenarios and an outright bug.

The starvation scenario was encountered when a batch containing
writes and an end transaction failed due to a serialized txn
having Timestamp != OrigTimestamp. In that event, we discard
the entire batch engine, meaning the intents for the writes are
discarded with everything else.

This change commits the results of a batch which fails with
EndTransaction.

There was also a bug when restarting a transaction. The replay
of the BeginTransaction statement was incorrectly using the
value of the transaction record instead of the transaction
passed in with the batch request. In the event that the txn
was pushed or aborted during the restart, this would cause
various problems when commit time comes.

The 2, 4, and 8 account benchmarks are pretty slow, but
optimizing those is the next step.

This change is

petermattis · 2016-03-15T13:02:08Z

@tschottdorf should look at the storage changes. The change to the bank benchmark looks fine modulo the account selection.

Review status: 0 of 7 files reviewed at latest revision, 2 unresolved discussions.

sql/bank_test.go, line 65 [r1] (raw file):
I usually do this on the command line via the -cpu flag. I think specifying it like this prohibits using the -cpu flag to change the setting.

sql/bank_test.go, line 69 [r1] (raw file):
Why did you change this? The code was previously selecting to random accounts, guaranteeing that they were different. Now it is possible for to == from after randomly selecting to.

Comments from the review on Reviewable.io

tamird · 2016-03-15T13:57:24Z

Reviewed 7 of 7 files at r1.
Review status: all files reviewed at latest revision, 7 unresolved discussions.

sql/bank_test.go, line 89 [r1] (raw file):
can you make this an argument passed to the function?

storage/replica.go, line 1514 [r1] (raw file):
cc #3037

storage/replica_command.go, line 325 [r1] (raw file):
this comment is inaccurate now

storage/replica_test.go, line 2859 [r1] (raw file):
nit: print the detail here rather than pErr

storage/replica_test.go, line 2881 [r1] (raw file):
ditto

Comments from the review on Reviewable.io

tbg · 2016-03-15T15:20:21Z

Reviewed 7 of 7 files at r1.
Review status: all files reviewed at latest revision, 10 unresolved discussions.

sql/bank_test.go, line 28 [r1] (raw file):
Time to nuke this flag as well? Seems like a wrinkle we don't really use - should pick the amount randomly based on the average available per account.

storage/replica.go, line 1510 [r1] (raw file):
This needs a test.

storage/replica_command.go, line 325 [r1] (raw file):
Since this is what fixed the previous behavior, should also add a comment as to why we don't allow the write.
Actually, the right thing to do should be updating our txn from the persisted proto, and then writing the new entry.

storage/replica_test.go, line 2826 [r1] (raw file):
s/.$/, preventing regression of a bug in which which blindly overwrote the transaction record on BeginTransaction./

storage/replica_test.go, line 2859 [r1] (raw file):
The detail doesn't actually contain the whole message, you want pErr.

Comments from the review on Reviewable.io

spencerkimball · 2016-03-15T15:58:46Z

Review status: all files reviewed at latest revision, 8 unresolved discussions.

sql/bank_test.go, line 65 [r1] (raw file):
If the parallelism is set to 1, for example, nothing makes sense in terms of the benchmark. You'd get zero contention. It's crucial we have a set level of parallelism here. Should I just assume it will be run with -cpu 4?

sql/bank_test.go, line 69 [r1] (raw file):
The previous code was biased towards selecting *numAccounts - 1, which is not correct if trying to select a uniformly random account. It's not possible to select to == from with this code. The for loop condition prevents it.

sql/bank_test.go, line 89 [r1] (raw file):
OK

storage/replica.go, line 1510 [r1] (raw file):
Added

storage/replica_command.go, line 325 [r1] (raw file):
Fixed comment and updated request txn from extant txn record and am now rewriting it.

Comments from the review on Reviewable.io

tamird · 2016-03-15T16:09:19Z

Reviewed 3 of 3 files at r2.
Review status: all files reviewed at latest revision, 8 unresolved discussions.

sql/bank_test.go, line 65 [r1] (raw file):
I believe the default is already the number of CPUs.

storage/replica_test.go, line 1668 [r2] (raw file):
looks like this happens on a not write intent error?

Comments from the review on Reviewable.io

petermattis · 2016-03-15T16:16:36Z

Review status: all files reviewed at latest revision, 8 unresolved discussions, some commit checks failed.

sql/bank_test.go, line 65 [r1] (raw file):
-cpu defaults to the runtime.NumCPU which seems fine. I think it is useful to be able to set -cpu 1 on the command line to see how fast the benchmark runs without contention.

sql/bank_test.go, line 69 [r1] (raw file):
It is subtle, but I think the probabilities work out for the previous code without the for-loop. You'll be setting to to numAccounts - 1 with a probability of ((numAccounts - 1) / numAccounts) * (1 / (numAccounts - 1)) = 1 / numAccounts.

Comments from the review on Reviewable.io

tbg · 2016-03-15T18:48:35Z

Review status: all files reviewed at latest revision, 9 unresolved discussions, some commit checks failed.

storage/replica_command.go, line 325 [r1] (raw file):
We shouldn't change the args, and we should pass the new txn out of BeginTxn:

h.Txn = h.Txn.Clone()
h.Txn.Update(&txn)
reply.Txn = h.Txn // rest of the batch should use the new txn

storage/replica_test.go, line 1666 [r2] (raw file):
How do you guarantee that this doesn't just push the intent?

storage/replica_test.go, line 2871 [r2] (raw file):
should also test what I suggested in the comment above: the retrying batch uses the updated txn returned by BeginTransaction.

Comments from the review on Reviewable.io

tbg · 2016-03-16T03:49:31Z

Reviewed 3 of 3 files at r2.
Review status: all files reviewed at latest revision, 10 unresolved discussions, some commit checks failed.

storage/replica_command.go, line 325 [r2] (raw file):
s/, updated/ updated/

Comments from the review on Reviewable.io

spencerkimball · 2016-03-16T17:14:56Z

Review status: all files reviewed at latest revision, 9 unresolved discussions, some commit checks failed.

sql/bank_test.go, line 69 [r1] (raw file):
Ah, I didn't pay attention to the *numAccounts - 1 part there.

storage/replica_test.go, line 1666 [r2] (raw file):
Good point, Changed sender to the range itself instead of using the store. The range won't try any sort of push txn or resolve intent.

storage/replica_test.go, line 1668 [r2] (raw file):
Fixed error message

storage/replica_test.go, line 2871 [r2] (raw file):
OK, think I understood what you're asking for and added a clause to the test.

Comments from the review on Reviewable.io

tbg · 2016-03-16T17:45:10Z

Reviewed 3 of 3 files at r3.
Review status: all files reviewed at latest revision, 9 unresolved discussions.

storage/replica_test.go, line 2929 [r3] (raw file):
https://xkcd.com/859/

storage/replica_test.go, line 2930 [r3] (raw file):
a missing !?

Comments from the review on Reviewable.io

tbg · 2016-03-16T17:45:24Z

Review status: all files reviewed at latest revision, 9 unresolved discussions.

Comments from the review on Reviewable.io

tamird · 2016-03-16T17:53:49Z

Reviewed 3 of 3 files at r3.
Review status: all files reviewed at latest revision, 8 unresolved discussions, all commit checks successful.

Comments from the review on Reviewable.io

spencerkimball · 2016-03-17T04:12:22Z

Review status: all files reviewed at latest revision, 8 unresolved discussions, all commit checks successful.

storage/replica_test.go, line 2930 [r3] (raw file):
Fixed

Comments from the review on Reviewable.io

The bank benchmark was never doing anything before now as there was no money in any of the accounts, so nothing would transfer. Added high-contention workloads to the benchmarks and in the process discovered starvation scenarios and an outright bug. The starvation scenario was encountered when a batch containing writes and an end transaction failed due to a serialized txn having `Timestamp` != `OrigTimestamp`. In that event, we discard the entire batch engine, meaning the intents for the writes are discarded with everything else. This change commits the results of a batch which fails with EndTransaction. There was also a bug when restarting a transaction. The replay of the `BeginTransaction` statement was incorrectly using the value of the transaction record instead of the transaction passed in with the batch request. In the event that the txn was pushed or aborted during the restart, this would cause various problems when commit time comes. The 2, 4, and 8 account benchmarks are pretty slow, but optimizing those is the next step.

…ion-benchmark Added high-content benchmarks and fixed starvation bugs

tamird · 2016-03-17T13:55:19Z

Reviewed 1 of 1 files at r4.
Review status: all files reviewed at latest revision, 7 unresolved discussions.

Comments from the review on Reviewable.io

spencerkimball assigned tbg Mar 15, 2016

spencerkimball force-pushed the spencerkimball/high-contention-benchmark branch from 0d26695 to 72bc578 Compare March 15, 2016 16:01

tbg mentioned this pull request Mar 16, 2016

kv: must handle idempotence for BeginTransaction #4443

Closed

spencerkimball force-pushed the spencerkimball/high-contention-benchmark branch from 72bc578 to e6d0a16 Compare March 16, 2016 17:20

spencerkimball force-pushed the spencerkimball/high-contention-benchmark branch from e6d0a16 to 21701f3 Compare March 17, 2016 04:12

spencerkimball added a commit that referenced this pull request Mar 17, 2016

Merge pull request #5257 from cockroachdb/spencerkimball/high-content…

aedb0e9

…ion-benchmark Added high-content benchmarks and fixed starvation bugs

spencerkimball merged commit aedb0e9 into master Mar 17, 2016

spencerkimball deleted the spencerkimball/high-contention-benchmark branch March 17, 2016 04:12

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Added high-content benchmarks and fixed starvation bugs #5257

Added high-content benchmarks and fixed starvation bugs #5257

spencerkimball commented Mar 15, 2016

petermattis commented Mar 15, 2016

tamird commented Mar 15, 2016

tbg commented Mar 15, 2016

spencerkimball commented Mar 15, 2016

tamird commented Mar 15, 2016

petermattis commented Mar 15, 2016

tbg commented Mar 15, 2016

tbg commented Mar 16, 2016

spencerkimball commented Mar 16, 2016

tbg commented Mar 16, 2016

tbg commented Mar 16, 2016

tamird commented Mar 16, 2016

spencerkimball commented Mar 17, 2016

tamird commented Mar 17, 2016

Added high-content benchmarks and fixed starvation bugs #5257

Added high-content benchmarks and fixed starvation bugs #5257

Conversation

spencerkimball commented Mar 15, 2016

petermattis commented Mar 15, 2016

tamird commented Mar 15, 2016

tbg commented Mar 15, 2016

spencerkimball commented Mar 15, 2016

tamird commented Mar 15, 2016

petermattis commented Mar 15, 2016

tbg commented Mar 15, 2016

tbg commented Mar 16, 2016

spencerkimball commented Mar 16, 2016

tbg commented Mar 16, 2016

tbg commented Mar 16, 2016

tamird commented Mar 16, 2016

spencerkimball commented Mar 17, 2016

tamird commented Mar 17, 2016