execution errors with attached engine.Batch (for EndTransaction) #1989

tbg · 2015-08-06T10:35:58Z

all replica commands (EndTransaction, Put, ...) operate on an engine.Batch which is discarded by the caller if an error is returned.
That worked well so far, but some commands actually fail and need something written to the engine: In #1916, EndTransaction resolves local intents in the same batch, but on an error, there's little it can do locally: it needs to return them as skipped intents so that they can be resolved indirectly.

Even worse is when the error is related to the Txn itself: in case of an epoch or timestamp regression, you'd reasonably expect to have the transaction aborted - but you can't, because it's returning an error. This means that any open intents anywhere in the system cannot resolve until they abort the transaction or it times out.

A carefully set up system to return something like a

type ErrorWithBatch{
  error
  ms *engine.MVCCStats
  batch engine.Batch
}

could be the best way to deal with the above. In case of errors, we create a new batch anyways to update the response cache: it's possible that the change will be relatively easy.

Once that (or a related mechanism - after all, the above is only my initial suggestion) is in place, we can clean up in EndTransaction on error.

The text was updated successfully, but these errors were encountered:

bdarnell · 2015-08-06T15:41:00Z

There's no reason that all the methods in replica_command.go have the same signature; they already differ because only some of them return intents. Rather than trying to combine an error and other data in the same object, we could make EndTransaction return an engine.Batch in addition to its other return values. Replica.executeCmd could return this batch to Replica.applyRaftCommandInBatch, since the latter method is already prepared to commit a batch whether there was an error or not.

tbg · 2015-08-06T16:35:13Z

but Replica.applyRaftCommandInBatch discards the batch on error, which means r.executeCmd needs to get two more return parameters (stats and batch). My suggestion doesn't look so bad, I think. I'm impartial though. Whatever gets the job done and is most legible.

    // Execute the command.
    reply, intents, rErr := r.executeCmd(batch, ms, args)
    if proto.IsWrite(args) {
    if rErr == nil {
        // ...
    } else {
        batch.Close()
        if bErr, ok := rErr.(*errWithBatch); ok {
            batch = // ...
            // call MergeMVCCStats, set ms = ...
        } else {
            batch = r.rm.Engine().NewBatch()
        }
    }
    // ...
}

spencerkimball · 2015-08-06T17:14:17Z

I'm unclear on the specific use cases where this is necessary. Could you elaborate a little?

tbg · 2015-08-06T17:18:56Z

EndTransaction which fails with, for example, timestamp regression still needs to abort the transaction.

spencerkimball · 2015-08-06T17:21:52Z

That's not a use case which justifies this kind of a change. That's a sanity check--not something that is expected to ever occur (certainly not in the normal course of events).

tbg · 2015-08-06T17:25:26Z

Ok, but then we should panic there.

Another case is EndTransaction which tries to commit but has been aborted by someone else (who obviously didn't know anything about the intents). Will still be good to resolve those intents which are local right away.

spencerkimball · 2015-08-06T18:57:58Z

I'm not sure it's panic-worthy. It's not going to affect the correctness of the system or lead to corruption.

For the second case, doesn't seem worthwhile to introduce the complexity of returning a batch just so we end up cleaning up local intents synchronously.

tbg · 2015-08-11T16:58:38Z

Going to close for now and address some of the above issues in upcoming work towards completing #1821.

(I was curious if we can fix #3037 (and #1989). The change introduces some complexity and I'm not totally sure we should fix these issues.) This commit changes EndTransaction to return nil error on txn abort so that the we can commit a batch and update the transaction record. Change txnSender.Send so that it convert a nil error to TransactionAbortedError so that txn.Exec will initiate a restart. This change is hacky, but I couldn't come up with a better approach. We still return TransactionAbortedError in other places and keep the code path for handling TransactionAbortedError.

tbg mentioned this issue Aug 6, 2015

range-local intent resolution #1916

Merged

tbg closed this as completed Aug 11, 2015

This was referenced Nov 7, 2015

client/Txn: support CommitBy with a deadline #3028

Merged

storage: untangle aborting transactions from cancelling batches #3037

Closed

kkaneda mentioned this issue Apr 6, 2016

Do not return TransactionAbortedError when rolling back an aborted txn #5885

Merged

bdarnell mentioned this issue Jun 13, 2016

sql: TestTxnObeysLeaseExpiration is disabled #7031

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

execution errors with attached engine.Batch (for EndTransaction) #1989

execution errors with attached engine.Batch (for EndTransaction) #1989

tbg commented Aug 6, 2015

bdarnell commented Aug 6, 2015

tbg commented Aug 6, 2015

spencerkimball commented Aug 6, 2015

tbg commented Aug 6, 2015

spencerkimball commented Aug 6, 2015

tbg commented Aug 6, 2015

spencerkimball commented Aug 6, 2015

tbg commented Aug 11, 2015

execution errors with attached engine.Batch (for EndTransaction) #1989

execution errors with attached engine.Batch (for EndTransaction) #1989

Comments

tbg commented Aug 6, 2015

bdarnell commented Aug 6, 2015

tbg commented Aug 6, 2015

spencerkimball commented Aug 6, 2015

tbg commented Aug 6, 2015

spencerkimball commented Aug 6, 2015

tbg commented Aug 6, 2015

spencerkimball commented Aug 6, 2015

tbg commented Aug 11, 2015