range-local intent resolution #1916

tbg · 2015-08-03T19:48:55Z

this passes all tests except for one (which I've disabled; maybe
@bdarnell has an idea?). Basically carried out a lot of the work announced in
RFC #1873:

- coordinator sends intents along with EndTransaction
- range resolves all it possibly can in the EndTransaction batch, and hands
  everything else off to an async resolver.
- "skipped intents" passed up by the range_command functions are now executed
  even if the command returns an error. This is required for EndTransaction
  which finds out that it's been aborted.

I addressed all of the TODOs I wanted to deal with before merging. The
transaction record cleanup mechanism isn't there yet: it's the natural candidate
for the next PR. I think this is ready to merge after adding some more tests,
so it's ready for a closer look.

bdarnell · 2015-08-03T21:00:15Z

proto/data.go

+// IsPrev is a more efficient version of k.Next().Equal(m).
+func (k Key) IsPrev(m Key) bool {
+	l := len(m) - 1
+	return l == len(k) && m[l] == 0


This isn't enough. We still need to check that k and m[:l] are equal (consider foo and bar\x00).

geez. Good point.
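
A minimal sketch of the check with the missing comparison added. It assumes `Next()` simply appends a zero byte to the key (an assumption here; the thread does not show its body), and the `main` function is only a demonstration.

```go
package main

import (
	"bytes"
	"fmt"
)

// Key is a byte-slice key, mirroring the type in the diff above.
type Key []byte

// Next is assumed (for this sketch) to return k with a zero byte appended.
func (k Key) Next() Key {
	return Key(append(append([]byte(nil), k...), 0))
}

// IsPrev reports whether m is k followed by a single zero byte.
// Besides the length and trailing-byte checks from the diff, it also
// compares k against the prefix m[:l], as requested in the review.
func (k Key) IsPrev(m Key) bool {
	l := len(m) - 1
	return l == len(k) && m[l] == 0 && bytes.Equal(k, m[:l])
}

func main() {
	fmt.Println(Key("foo").IsPrev(Key("foo\x00"))) // true
	fmt.Println(Key("foo").IsPrev(Key("bar\x00"))) // false: same length, different prefix
	// Sanity check against the slower formulation.
	fmt.Println(bytes.Equal(Key("foo").Next(), Key("foo\x00"))) // true
}
```

The short-circuit on `l == len(k)` also covers an empty `m`: `l` is then -1, so `m[l]` is never evaluated.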

bdarnell · 2015-08-03T21:21:55Z

LGTM so far.

mrtracy · 2015-08-03T22:10:12Z

Very well commented; everything appears to do what it claims, by my eye. LGTM

tamird · 2015-08-03T23:56:44Z

storage/store.go

+		// need-to-resolve-some-intents-or-a-split-may-never-finish thing
+		// going on on during node shutdown.
+		if !r.ContainsKey(intent.Key) {
+			b := &client.Batch{}


is this using a separate batch for each intent?

yes, see the comment.

tbg · 2015-08-04T09:31:14Z

storage/store.go

+		// TODO(tschottdorf): Also no support for key-range intents yet. We
+		// need to support those because the EndTransaction path will throw
+		// them our way. And we need a test that does exactly that.
+		// TODO(tschottdorf): currently synchronous. See if we still have that


ping @tamird re: comment below

even if it's synchronous, it needs to be tied to the stopper. Is this being invoked inside a RunWorker?

it's not important now. This is provisional code until the TODOs above are addressed. We'll want to do this async eventually.

I think you're asking for a deadlock in a test if you don't RunTask here, but ok

tbg · 2015-08-05T17:58:34Z

PTAL, just rebased and added another test for the intent resolution.
The test is still skipped (or should it be fixed now? @bdarnell), I'll open an issue to fix it up if we don't have one yet.

bdarnell · 2015-08-05T18:08:51Z

The test still needs to be fixed; go ahead and open another issue.

bdarnell · 2015-08-05T20:22:30Z

kv/txn_coord_sender.go

 	aHeader.User = bHeader.User
 	aHeader.UserPriority = bHeader.UserPriority
-	aHeader.Txn = bHeader.Txn
+	// Only allow individual transactions in a batch if


s/individual transactions/transactions on the individual requests of a batch/

From discussion with @spencerkimball: It's weird that ResolveIntent even uses the header transaction, since it is not really a part of that transaction. Instead of allowing requests from multiple transactions to appear in the same batch, it's probably better to just add a Transaction field to ResolveIntentRequest so it doesn't use the header transaction at all (and then we can preserve the invariant that all requests in a batch have the same transaction)

done. I had the same thoughts about header usage by Resolve and Push (and even carried it out in unpushed code) but decided it's worth its own PR. I'll file an issue.

bdarnell · 2015-08-05T20:36:54Z

storage/replica.go

+			continue
+		}
+
+		// If it is local, it goes directly into Raft.


How much does this split save us? If it's going through raft it already involves non-local communication; it would be simpler to just treat everything as if it were remote.

Well, the thing is that we actually want to make sure local intents are resolved synchronously. If we grouped these all into a single batch and invoked it through the non-local dist-sender, we'd need to wait for both the fast local intents and the slow non-local intents before finishing the txn.

We're not resolving them synchronously - we're waiting for the commands to be proposed to raft, and then moving on before they are committed (at least if the comments are still accurate). Some local intents are resolved completely synchronously in EndTransaction, though. The ones that make it here are just the ones that need some other transaction to be pushed before they can be resolved.

Scratch this. I see that this is only used from the Store.resolveWriteIntentError execution path.

> How much does this split save us?

good question. The optimization here is to put things into Raft but not wait for them, which means almost certainly that they'll commit before our retrying client gets its next attempt in. So it definitely saves us from having to wait for Raft, which is a couple of round-trips.
I've added a TODO checking this for premature optimization. I imagine we'll refactor this part anyways when we have range-local batch support.
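
The local/non-local split discussed in this thread can be sketched roughly as follows. All names here (`span`, `intent`, `partitionIntents`) are hypothetical and simplified for illustration; the only thing taken from the diff is the idea of testing each intent key against the range's bounds, as `r.ContainsKey(intent.Key)` does.

```go
package main

import (
	"bytes"
	"fmt"
)

// span is a hypothetical stand-in for a range's key bounds [start, end).
type span struct {
	start, end []byte
}

// containsKey reports whether key lies in the half-open interval [start, end).
func (s span) containsKey(key []byte) bool {
	return bytes.Compare(key, s.start) >= 0 && bytes.Compare(key, s.end) < 0
}

// intent is a hypothetical, simplified write intent: a key and its owner txn.
type intent struct {
	key []byte
	txn string
}

// partitionIntents separates intents whose keys fall inside the range
// (candidates for proposing directly to Raft) from those that must be
// sent elsewhere for resolution.
func partitionIntents(s span, intents []intent) (local, remote []intent) {
	for _, in := range intents {
		if s.containsKey(in.key) {
			local = append(local, in)
		} else {
			remote = append(remote, in)
		}
	}
	return local, remote
}

func main() {
	r := span{start: []byte("b"), end: []byte("m")}
	local, remote := partitionIntents(r, []intent{
		{key: []byte("a"), txn: "t1"},
		{key: []byte("c"), txn: "t1"},
		{key: []byte("m"), txn: "t2"}, // end key is exclusive, so this is remote
	})
	fmt.Println(len(local), len(remote)) // 1 2
}
```

Treating everything as remote, as suggested earlier in the thread, would amount to skipping the partition and sending the whole slice down the remote path.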

bdarnell · 2015-08-05T20:43:43Z

LGTM

bdarnell · 2015-08-05T21:24:15Z

storage/replica.go

+		action := func() {
+			// Trace this under the ID of the intent owner.
+			ctx := tracer.ToCtx(ctx, r.rm.Tracer().NewTrace(resolveArgs.Header().Txn))
+			if _, err := r.addWriteCmd(ctx, resolveArgs, &wg); err != nil && log.V(0) {


This part isn't in the raft command processing - it's called from the Store that originally proposed the EndTransactionCommand after it has been applied.

I don't understand this comment.
Changed V(0) (back) to V(1) above.

I think I was replying to a comment from @spencerkimball which he has since deleted.

tbg · 2015-08-06T10:24:05Z

I'm ready to merge but going to wait for @bdarnell to explain the one comment I didn't understand.


this passes all tests except for one (which I've disabled; maybe @bdarnell has an idea?). Basically carried out a lot of the work announced in RFC cockroachdb#1873:

- coordinator sends intents along with EndTransaction
- range resolves all it possibly can in the EndTransaction batch, and hands everything else off to an async resolver.
- "skipped intents" passed up by the range_command functions are now executed even if the command returns an error. This is required for EndTransaction which finds out that it's been aborted.

There are a lot of TODOs documented in the code, and I intend to address all of those that are relevant for correctness. Some parts of the RFC are also not yet implemented (gc'ing Txn entries), but the road should be very clear and I'll add those before merging. I did want to get this out early because it's a fairly complex refactoring and will only get bigger from here.


tamird · 2015-08-21T17:25:56Z

storage/replica.go

+	if !r.rm.Stopper().RunAsyncTask(action) {
+		// As with local intents, try async to not keep the caller waiting, but
+		// when draining just go ahead and do it synchronously. See #1684.
+		action()


this line appears to have resulted in a deadlock: https://circleci.com/gh/cockroachdb/cockroach/6250

maybe not. #2206

tbg added the PTAL label Aug 3, 2015

tbg force-pushed the intents_range branch 2 times, most recently from 035e7b0 to 861eda1 Compare August 3, 2015 20:25

bdarnell reviewed Aug 3, 2015

tamird reviewed Aug 3, 2015

tbg force-pushed the intents_range branch 5 times, most recently from f837200 to 5caed68 Compare August 4, 2015 07:46

tbg reviewed Aug 4, 2015

tbg force-pushed the intents_range branch 5 times, most recently from c891ce6 to e3bc0ef Compare August 4, 2015 12:54

tbg changed the title ~~WIP: range-local intent resolution~~ range-local intent resolution Aug 4, 2015

tbg force-pushed the intents_range branch 2 times, most recently from 39db2aa to d1525f6 Compare August 5, 2015 17:55

bdarnell mentioned this pull request Aug 5, 2015

TestRaftAfterRemoveRange fails with CPUS=4 #1980

Closed

bdarnell reviewed Aug 5, 2015

tbg mentioned this pull request Aug 6, 2015

Txn on Batch and ResolveIntent #1988

Closed

tbg mentioned this pull request Aug 6, 2015

execution errors with attached engine.Batch (for EndTransaction) #1989

Closed

tbg force-pushed the intents_range branch from 5dae9d8 to eaf3843 Compare August 6, 2015 10:40

tbg added 2 commits August 6, 2015 10:52

allow some txns in non-txn batch

b1370d4

... if they are "intent-less" operations and the batch itself is non-transactional. this allows sending InternalResolveIntent requests for multiple transactions' intents in (*Replica).resolveIntents().

tbg force-pushed the intents_range branch from eaf3843 to 00c4e52 Compare August 6, 2015 14:53

tbg added 2 commits August 6, 2015 11:07

add intent key range handling at Replica level

a244988

this addresses most of the comments in `(*Replica).resolveIntents`. tests for this functionality are next.

test for local intent resolution

20d9bfe

this fairly high-level test makes sure that only non-local intents are dispatched for resolution via a new RPC. It also documents potential for future optimizations. also added a test for verifying range-local key ranges and edited some comments.

tbg force-pushed the intents_range branch from 00c4e52 to 20d9bfe Compare August 6, 2015 15:07

tbg added a commit that referenced this pull request Aug 6, 2015

Merge pull request #1916 from tschottdorf/intents_range

0c71a20

range-local intent resolution

tbg merged commit 0c71a20 into cockroachdb:master Aug 6, 2015

tbg removed the PTAL label Aug 6, 2015

tbg deleted the intents_range branch August 6, 2015 15:25

tamird reviewed Aug 21, 2015

cucaroach mentioned this pull request Nov 4, 2021

sql: goroutines seemingly not cancelling #72445

Closed
