Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not return TransactionAbortedError when rolling back an aborted txn #5885

Merged
merged 1 commit into from
Apr 7, 2016

Conversation

kkaneda
Copy link
Contributor

@kkaneda kkaneda commented Apr 6, 2016

(I was curious if we can fix #3037 (and #1989). The change introduces some complexity and I'm not totally sure we should fix these issues.)

This commit changes EndTransaction to return nil error on txn abort so that the we can commit a batch and update the transaction record.

Change txnSender.Send so that it convert a nil error to TransactionAbortedError so that txn.Exec will initiate a restart. This change is hacky, but I couldn't come up with a better approach.

We still return TransactionAbortedError in other places and keep the code path for handling TransactionAbortedError.


This change is Reviewable

@tbg
Copy link
Member

tbg commented Apr 6, 2016

I like the idea of letting an EndTransaction{Commit:false} succeed even if it finds an already aborted transaction. Assuming there are no weird caveats, this can help "bring the intents home". Plus, we potentially double-abort transactions in various places, and making this not error out can allow us to start logging errors which were previously too noisy. I think that part of the change should be safe unless there's code out there which derives from a successful abort the information that the transaction was previously alive.

On the other hand, your change seems to go further and is half-way to getting rid of TransactionAbortedError, which I don't think is a change we want to make at the moment - changing semantics in that way is brittle. What's the reason you made those changes?


Reviewed 7 of 7 files at r1.
Review status: all files reviewed at latest revision, 2 unresolved discussions.


storage/replica_test.go, line 1771 [r1] (raw file):
s/Error/Fatal/


storage/replica_test.go, line 2097 [r1] (raw file):
if test.expErrRegexp == "


Comments from Reviewable

@tamird
Copy link
Contributor

tamird commented Apr 6, 2016

Not getting an error in the deadline exceeded case seems particularly problematic, since the caller (or grand-caller) of the transaction is unlikely to inspect the txn proto (or even have access to it) after EndTransaction.


Reviewed 7 of 7 files at r1.
Review status: all files reviewed at latest revision, 3 unresolved discussions.


client/txn.go, line 83 [r1] (raw file):
use GetArg?


Comments from Reviewable

@kkaneda
Copy link
Contributor Author

kkaneda commented Apr 6, 2016

Ah, I see. The idea was briefly mentioned in #3037:

b) if the commit only fails because of the deadline, you return this information not as an error but encoded in the response. This is similar to the recent discussions about ConditionalPut. A client would not receive a TransactionAbortedError but instead find out that its proto isn't PENDING any more. Semantically that should actually work out well, but needs a bunch of refactoring (TransactionAbortedError would just go away, we don't really need it - a BatchResponse carries a transaction proto as well).

If I understand correctly, this allows us to commit a batch and update a transaction record. We can instead tweak applyRaftCommandInBatch to write on TransactionAbortedError like TransactionRetryError, but I thought just dropping TransactionAbortedError would be simpler (and I had fewer test failures with that approach than the other approach).

I thought a caller can still inspect an error since we convert a nil-error to NewTransactionAbortedError in txnSender.Send.

Having said, I totally agree that the change is (very) brittle. Let me try getting rid of the part and see if just making the change to the rollback part works. Thanks!

@tamird
Copy link
Contributor

tamird commented Apr 6, 2016

I thought a caller can still inspect an error since we convert a nil-error to NewTransactionAbortedError in txnSender.Send.

Ah, I missed that - that seems right. We just need to adjust the tests so we can demonstrate how a user of a transaction with a deadline would actually see an error (currently the tests are changed to inspect the proto).

@tbg
Copy link
Member

tbg commented Apr 6, 2016

I'm against changing the deadline semantics at this point. Let's keep it simple and swallow errors only if the intention was to abort and there is indeed a transaction aborted proto. That way the only change in semantic is that you don't fail to abort if someone else did it for you already. If you're attempting to commit with a deadline constraint, I think you should get the error - what you wanted to happen did not happen.

@kkaneda kkaneda force-pushed the kaneda/no_txn_abort_error branch from d33847d to 3b8d91f Compare April 7, 2016 02:08
@kkaneda
Copy link
Contributor Author

kkaneda commented Apr 7, 2016

Sure, updated the PR. It is now simple and just changes Replica.EndTransation to return a nil-error when it finds a transaction that has already been aborted, but the request is requesting a rollback (Commit = false). Could you take another look?

@kkaneda kkaneda changed the title Write a txn status instead of returning TransactionAbortedError Do not return TransactionAbortedError when rolling back an aborted txn Apr 7, 2016
@tbg
Copy link
Member

tbg commented Apr 7, 2016

:lgtm:


Reviewed 7 of 7 files at r2.
Review status: all files reviewed at latest revision, 7 unresolved discussions.


storage/replica_command.go, line 371 [r2] (raw file):
Add to the comment that rolling back an already rolled-back txn is ok.


storage/replica_command.go, line 372 [r2] (raw file):
Could you remove that TODO?


storage/replica_command.go, line 416 [r2] (raw file):
add to the TODO:

Add external intents to txn proto?


storage/replica_test.go, line 2128 [r2] (raw file):
From here on it's like above, so you could for i := range []int{0,1} { ... }.


Comments from Reviewable

Change Replica.EndTransation to return a nil-error when it finds a
transaction that has already been aborted, but the request is
requesting a rollback (Commit = false).
@kkaneda kkaneda force-pushed the kaneda/no_txn_abort_error branch from 3b8d91f to 066b1f9 Compare April 7, 2016 13:47
@kkaneda
Copy link
Contributor Author

kkaneda commented Apr 7, 2016

Thanks for the review!


Review status: 0 of 2 files reviewed at latest revision, 7 unresolved discussions.


storage/replica_command.go, line 371 [r2] (raw file):
Done.


storage/replica_command.go, line 372 [r2] (raw file):
Done.


storage/replica_command.go, line 416 [r2] (raw file):
Done.


storage/replica_test.go, line 2128 [r2] (raw file):
Done.


Comments from Reviewable

@kkaneda kkaneda merged commit c0a3004 into master Apr 7, 2016
@kkaneda kkaneda deleted the kaneda/no_txn_abort_error branch April 7, 2016 15:26
kkaneda added a commit that referenced this pull request Apr 8, 2016
This is a follow-up of #5885. Resolve local intents and update the
transaction record when we attempt to roll back the transaction and
find that it has already been aborted.

Define helper method resolveExplicitIntents and call it.
kkaneda added a commit that referenced this pull request Apr 9, 2016
This is a follow-up of #5885. Resolve local intents and update the
transaction record when we attempt to roll back the transaction and
find that it has already been aborted.

Define helper method resolveExplicitIntents and call it.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

storage: untangle aborting transactions from cancelling batches
3 participants