Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: splitPostApply can see tombstone for RHS #40470

Closed
knz opened this issue Sep 4, 2019 · 9 comments
Closed

storage: splitPostApply can see tombstone for RHS #40470

knz opened this issue Sep 4, 2019 · 9 comments
Assignees
Labels
A-disaster-recovery S-2-temp-unavailability Temp crashes or other availability problems. Can be worked around or resolved by restarting.

Comments

@knz
Copy link
Contributor

knz commented Sep 4, 2019

Describe the problem

node fails with

F190904 14:33:35.276986 138 storage/store.go:2148  [n1,s1,r313/4:/Table/58/1/{2525/9…-3750/1…}] split trigger found right-hand side with tombstone {NextReplicaID:5}: [n1,s1,r316/?:{-}]

To Reproduce

  1. roachprod create knz-gce -u knz -c gce --geo -n 10
  2. roachprod start $CLUSTER:1,4,7-8
  3. roachprod run $CLUSTER:1 -- "./cockroach workload fixtures import tpcc --warehouses=5000 --db=tpcc --experimental-direct-ingestion"

This fails within 1-2 minutes.

Relevant log lines:

I190904 14:33:33.261425 180 server/status/runtime.go:498  [n1] runtime stats: 4.7 GiB RSS, 407 goroutines, 2.6 GiB/718 MiB/3.4 GiB GO alloc/idle/total, 1.1 GiB/1.3 GiB CGO alloc/total, 210805.4 CGO/sec, 381.5/16.5 %(u/s)time, 0.0 %gc (3x),
 223 MiB/85 MiB (r/w)net
I190904 14:33:33.964062 11702 storage/replica_raft.go:291  [n1,s1,r261/1:/Table/56{-/1}] proposing REMOVE_REPLICA[(n4,s4):3]: after=[(n1,s1):1 (n3,s3):2 (n2,s2):5] next=6
W190904 14:33:34.137913 11847 storage/replica_raft.go:105  [n1,s1,r323/1:/Table/60/1/2{633/2/…-766/4/…}] context canceled before proposing: 1 HeartbeatTxn
I190904 14:33:34.489669 12200 storage/replica_command.go:1521  [n1,replicate,s1,r247/1:/{Table/61/3-Max}] change replicas (add [] remove [(n3,s3):2]): existing descriptor r247:/{Table/61/3-Max} [(n1,s1):1, (n3,s3):2, (n2,s2):3, (n4,s4):5,
next=6, gen=20]
I190904 14:33:34.669612 11610 storage/replica_raftstorage.go:793  [n1,s1,r313/4:{-}] applying LEARNER snapshot [id=faf20096 index=15]
I190904 14:33:34.944105 11610 storage/replica_raftstorage.go:814  [n1,s1,r313/4:/Table/58/1/{2525/9…-3750/1…}] applied LEARNER snapshot [total=274ms ingestion=4@217ms id=faf20096 index=15]
I190904 14:33:35.053208 12088 storage/split_queue.go:149  [n1,split,s1,r307/1:/Table/54/1/125{3/119…-5/522…}] split saw concurrent descriptor modification; maybe retrying
W190904 14:33:35.202730 12090 storage/replica_raft.go:105  [n1,s1,r304/1:/Table/53/1/250{3/7/-…-5/3/-…}] context canceled before proposing: 1 HeartbeatTxn
I190904 14:33:35.232699 12190 storage/replica_command.go:395  [n1,split,s1,r307/1:/Table/54/1/125{3/119…-5/522…}] initiating a split of this range at key /Table/54/1/1254/11837 [r329] (77 MiB above threshold size 64 MiB)
F190904 14:33:35.276986 138 storage/store.go:2148  [n1,s1,r313/4:/Table/58/1/{2525/9…-3750/1…}] split trigger found right-hand side with tombstone {NextReplicaID:5}: [n1,s1,r316/?:{-}]
goroutine 138 [running]:
github.com/cockroachdb/cockroach/pkg/util/log.getStacks(0xc000448301, 0xc000448360, 0x0, 0x7c662a)
        /go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:1016 +0xb1
github.com/cockroachdb/cockroach/pkg/util/log.(*loggingT).outputLogEntry(0x7c04a40, 0xc000000004, 0x73d2595, 0x10, 0x864, 0xc0006e43c0, 0x89)
        /go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:874 +0x93e
github.com/cockroachdb/cockroach/pkg/util/log.addStructured(0x4e5e460, 0xc03ccc99e0, 0x4, 0x2, 0x4563081, 0x3a, 0xc003394730, 0x2, 0x2)
        /go/src/github.com/cockroachdb/cockroach/pkg/util/log/structured.go:66 +0x2cc
github.com/cockroachdb/cockroach/pkg/util/log.logDepth(0x4e5e460, 0xc03ccc99e0, 0x1, 0xc000000004, 0x4563081, 0x3a, 0xc003394730, 0x2, 0x2)
        /go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:69 +0x8c
github.com/cockroachdb/cockroach/pkg/util/log.Fatalf(...)
        /go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:180
github.com/cockroachdb/cockroach/pkg/storage.splitPostApply(0x4e5e460, 0xc03ccc99e0, 0x0, 0x15c142ce810fb293, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:2148 +0xb3e
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).handleSplitResult(...)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_application_result.go:233
github.com/cockroachdb/cockroach/pkg/storage.(*replicaStateMachine).handleNonTrivialReplicatedEvalResult(0xc003e298c0, 0x4e5e460, 0xc03ccc99e0, 0x0, 0x0, 0xc0003e1e40, 0x0, 0x0, 0x0, 0x0, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_application_state_machine.go:943 +0x852
github.com/cockroachdb/cockroach/pkg/storage.(*replicaStateMachine).ApplySideEffects(0xc003e298c0, 0x4eacee0, 0xc063842008, 0x0, 0x0, 0x0, 0x0)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_application_state_machine.go:856 +0x72d
github.com/cockroachdb/cockroach/pkg/storage/apply.mapCheckedCmdIter(0x7f378dc4b0c8, 0xc003e29ad8, 0xc0033953d8, 0x0, 0x0, 0x0, 0x0)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/apply/cmd.go:182 +0x11b
github.com/cockroachdb/cockroach/pkg/storage/apply.(*Task).applyOneBatch(0xc003395800, 0x4e5e460, 0xc03ccc99e0, 0x4eacfa0, 0xc003e29a78, 0x0, 0x0)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/apply/task.go:276 +0x228
github.com/cockroachdb/cockroach/pkg/storage/apply.(*Task).ApplyCommittedEntries(0xc003395800, 0x4e5e460, 0xc03ccc99e0, 0x0, 0x0)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/apply/task.go:242 +0xcf
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).handleRaftReadyRaftMuLocked(0xc003e29800, 0x4e5e460, 0xc03ccc99e0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_raft.go:759 +0xd87
github.com/cockroachdb/cockroach/pkg/storage.(*Store).processRequestQueue.func1(0x4e5e460, 0xc03ccc99e0, 0xc003e29800, 0x4e5e460)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:3599 +0x131
github.com/cockroachdb/cockroach/pkg/storage.(*Store).withReplicaForRequest(0xc000adc000, 0x4e5e460, 0xc03ccc99e0, 0xc009c02200, 0xc074b31e98, 0x0)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:3352 +0x150

Expected behavior

Import succeeds

Context

kena@knz-gce-0001:~$ ./cockroach  version
Build Tag:    v19.2.0-alpha.20190606-2012-g58d0fc3
Build Time:   2019/09/04 11:31:21
Distribution: CCL
Platform:     linux amd64 (x86_64-unknown-linux-gnu)
Go Version:   go1.12.5
C Compiler:   gcc 6.3.0
Build SHA-1:  58d0fc3676726c7fa3ebaf41e99f54f305f25fa0
Build Type:   release
@knz
Copy link
Contributor Author

knz commented Sep 4, 2019

cc @dt for triage
(I was trying --experimental-direct-ingestion as suggested offline - without the flag, the import seems OK)

@knz knz added A-disaster-recovery S-2-temp-unavailability Temp crashes or other availability problems. Can be worked around or resolved by restarting. labels Sep 4, 2019
@dt
Copy link
Member

dt commented Sep 4, 2019

--experimental-direct-ingestion is the default in 19.2 so adding the flag should do nothing.

Is this BulkIO? that stack trace like it is coming from core, no? Or does core think that IMPORT is mis-behaving?

@knz
Copy link
Contributor Author

knz commented Sep 4, 2019

ok so you're saying this failure is not deterministic, and I was jut (un)lucky to see the panic when I passed the flag explicitly and not see the panic when I omitted the flag?

If so that's a bummer because it will make the investigation a bit lengthy.

@knz
Copy link
Contributor Author

knz commented Sep 4, 2019

@danhhz how do you think this should be triaged?

@tbg
Copy link
Member

tbg commented Sep 4, 2019

Will be fixed by #39796.

@tbg
Copy link
Member

tbg commented Sep 4, 2019

Actually if you haven't removed this cluster yet, could you grab the logs from all nodes?

@tbg
Copy link
Member

tbg commented Sep 4, 2019

The crash here indicates that a split trigger found a tombstone for the right-hand side replica. We assert against that because the thinking is that this should never happen: we can't replicaGC a replica that is uninitialized (=has never received a snapshot), and (or so we thought) the RHS is always uninitialized before the snapshot because snapshots are blocked by the LHS until the split trigger applies. But I think there's a way in which this can be violated if the LHS comes into existence late on the crashing store. Something like this:

  1. using preemptive snapshots still
  2. r1 is on store s1 only
  3. r1 gets added to s2 and gets split (creating r2) in rapid succession
  4. preemptive snap on s2 gets replicaGC'ed
  5. right hand side r2 manages to place a preemptive snap on s2 (LHS is not there to block it)
  6. preemptive snap on s2/r2 gets replicaGC'ed, leaving a tombstone for actualDesc.NextReplicaID
  7. s2/r1 catches up and executes the split trigger, finding the tombstone on the RHS

Some variant of this basically has to be it. Raft tombstones are only written in two places: when a range is subsumed and during replicaGC. On merges, we use math.MaxInt32 as NextReplicaID but the example shows a 5. So it must be ReplicaGC. ReplicaGC can only happen to initialized replicas (which includes preemptive snaps), so the RHS has to have been init'ed at some point. And this implies that the pre-split LHS wasn't there then.

An alternative could work like this

  1. add s2/r1, split, remove s2/r1 again before it has executed the trigger
  2. r1/s2 replicagc'ed
  3. r2/s2 gets fully added (r1/s2 is gone so snapshot goes through)
  4. r2/s2 gets removed again and replicaGC'ed, leaving tombstone
  5. r1/s2 gets re-added, so it catches up across the split trigger seeing the tombstone.

I think the second scenario is more likely since @knz was running 19.2-alpha which never uses preemptive snaps.

@tbg tbg changed the title storage+tpcc: import fixtures with direct ingestion triggers assertion failure storage: splitPostApply can see tombstone for RHS Sep 6, 2019
tbg added a commit to tbg/cockroach that referenced this issue Sep 6, 2019
This is the next band-aid on top of cockroachdb#39658 and cockroachdb#39571.
The descriptor lookup I added sometimes fails because replicas can
process a split trigger in which they're not a member of the range:

> F190821 15:14:28.241623 312191 storage/store.go:2172
> [n2,s2,r21/3:/{Table/54-Max}] replica descriptor of local store not
> found in right hand side of split

I saw this randomly in `make test PKG=./pkg/ccl/partitionccl`.

There is one more bug here that is not addressed by this change,
namely the fact that there might be a tombstone for the right
hand side. This continues to be tracked in cockroachdb#40470.

See also
cockroachdb#40367 (comment)
for more work we need to do to establish sanity.

Release note: None
@dt
Copy link
Member

dt commented Sep 6, 2019

Also seen in import/tpcc/warehouses=1000/nodes=32 run on 9/2 (node 30's logs) in https://teamcity.cockroachdb.com/viewLog.html?buildId=1463583&buildTypeId=Cockroach_Nightlies_WorkloadNightly

tbg added a commit to tbg/cockroach that referenced this issue Sep 6, 2019
This is the next band-aid on top of cockroachdb#39658 and cockroachdb#39571.
The descriptor lookup I added sometimes fails because replicas can
process a split trigger in which they're not a member of the range:

> F190821 15:14:28.241623 312191 storage/store.go:2172
> [n2,s2,r21/3:/{Table/54-Max}] replica descriptor of local store not
> found in right hand side of split

I saw this randomly in `make test PKG=./pkg/ccl/partitionccl`.

There is one more bug here that is not addressed by this change,
namely the fact that there might be a tombstone for the right
hand side. This continues to be tracked in cockroachdb#40470.

See also
cockroachdb#40367 (comment)
for more work we need to do to establish sanity.

Release note: None
@nvanbenschoten
Copy link
Member

I still see this with #39796.

@ajwerner ajwerner self-assigned this Sep 12, 2019
ajwerner added a commit to ajwerner/cockroach that referenced this issue Sep 23, 2019
We've seen instability recently due to invariants being violated as
replicas catch up across periods of being removed and re-added to a range.
Due to learner replicas and their rollback behavior this is now a relatively
common case. Rather than handle all of these various scenarios this PR prevents
them from occuring by actively removing replicas when we determine that they
must have been removed.

Here's a high level overview of the change:

 * Once a Replica object has a non-zero Replica.mu.replicaID it will not
   change.
   * In this commit however, if a node crashes it may forget that it learned
     about a replica ID.
 * If a raft message or snapshot addressed to a higher replica ID is received
   the current replica will be removed completely.
 * If a replica sees a ChangeReplicasTrigger which removes it then it
   completely removes itself while applying that command.
 * Replica.mu.destroyStatus is used to meaningfully signify the removal state
   of a Replica. Replicas about to be synchronously removed are in
   destroyReasonRemovalPending.

This hopefully gives us some new invariants:

 * There is only ever at most 1 Replica which IsAlive() for a range on a Store
   at a time.
 * Once a Replica has a non-zero ReplicaID is never changes.
   * This applies only to the in-memory object, not the store itself.
 * Once a Replica applies a command as a part of the range descriptor it will
   never apply another command as a different Replica ID or outside of the
   Range.
   * Corrolary: a Replica created as a learner will only ever apply commands
     while that replica is in the range.

The change also introduces some new complexity. Namely we now allow removal of
uninitialized replicas, including their hard state. This allows us to catch up
across a split even when we know the RHS must have been removed.

Fixes cockroachdb#40367.

Issue cockroachdb#38772 (comment)
manifests itself as the RHS not being found for a merge. This happens because
the Replica is processing commands to catch itself up while it is not in the
range. This is no longer possible.

Fixes cockroachdb#40257.

Issue cockroachdb#40257 is another case of a replica processing commands while it is not
in the range.

Fixes cockroachdb#40470.

Issue cockroachdb#40470 is caused by a RHS learning about its existence and removal
prior to a LHS processing a split. This case is now handled properly and is
tested.

Release justification: This commit is safe for 19.2 because it fixes release
blockers.

Release note (bug fix): Fix crashes by preventing replica ID change.
ajwerner added a commit to ajwerner/cockroach that referenced this issue Sep 23, 2019
We've seen instability recently due to invariants being violated as
replicas catch up across periods of being removed and re-added to a range.
Due to learner replicas and their rollback behavior this is now a relatively
common case. Rather than handle all of these various scenarios this PR prevents
them from occuring by actively removing replicas when we determine that they
must have been removed.

Here's a high level overview of the change:

 * Once a Replica object has a non-zero Replica.mu.replicaID it will not
   change.
   * In this commit however, if a node crashes it may forget that it learned
     about a replica ID.
 * If a raft message or snapshot addressed to a higher replica ID is received
   the current replica will be removed completely.
 * If a replica sees a ChangeReplicasTrigger which removes it then it
   completely removes itself while applying that command.
 * Replica.mu.destroyStatus is used to meaningfully signify the removal state
   of a Replica. Replicas about to be synchronously removed are in
   destroyReasonRemovalPending.

This hopefully gives us some new invariants:

 * There is only ever at most 1 Replica which IsAlive() for a range on a Store
   at a time.
 * Once a Replica has a non-zero ReplicaID is never changes.
   * This applies only to the in-memory object, not the store itself.
 * Once a Replica applies a command as a part of the range descriptor it will
   never apply another command as a different Replica ID or outside of the
   Range.
   * Corrolary: a Replica created as a learner will only ever apply commands
     while that replica is in the range.

The change also introduces some new complexity. Namely we now allow removal of
uninitialized replicas, including their hard state. This allows us to catch up
across a split even when we know the RHS must have been removed.

Fixes cockroachdb#40367.

Issue cockroachdb#38772 (comment)
manifests itself as the RHS not being found for a merge. This happens because
the Replica is processing commands to catch itself up while it is not in the
range. This is no longer possible.

Fixes cockroachdb#40257.

Issue cockroachdb#40257 is another case of a replica processing commands while it is not
in the range.

Fixes cockroachdb#40470.

Issue cockroachdb#40470 is caused by a RHS learning about its existence and removal
prior to a LHS processing a split. This case is now handled properly and is
tested.

Release justification: This commit is safe for 19.2 because it fixes release
blockers.

Release note (bug fix): Fix crashes by preventing replica ID change.
ajwerner added a commit to ajwerner/cockroach that referenced this issue Sep 23, 2019
We've seen instability recently due to invariants being violated as
replicas catch up across periods of being removed and re-added to a range.
Due to learner replicas and their rollback behavior this is now a relatively
common case. Rather than handle all of these various scenarios this PR prevents
them from occuring by actively removing replicas when we determine that they
must have been removed.

Here's a high level overview of the change:

 * Once a Replica object has a non-zero Replica.mu.replicaID it will not
   change.
   * In this commit however, if a node crashes it may forget that it learned
     about a replica ID.
 * If a raft message or snapshot addressed to a higher replica ID is received
   the current replica will be removed completely.
 * If a replica sees a ChangeReplicasTrigger which removes it then it
   completely removes itself while applying that command.
 * Replica.mu.destroyStatus is used to meaningfully signify the removal state
   of a Replica. Replicas about to be synchronously removed are in
   destroyReasonRemovalPending.

This hopefully gives us some new invariants:

 * There is only ever at most 1 Replica which IsAlive() for a range on a Store
   at a time.
 * Once a Replica has a non-zero ReplicaID is never changes.
   * This applies only to the in-memory object, not the store itself.
 * Once a Replica applies a command as a part of the range descriptor it will
   never apply another command as a different Replica ID or outside of the
   Range.
   * Corrolary: a Replica created as a learner will only ever apply commands
     while that replica is in the range.

The change also introduces some new complexity. Namely we now allow removal of
uninitialized replicas, including their hard state. This allows us to catch up
across a split even when we know the RHS must have been removed.

Fixes cockroachdb#40367.

Issue cockroachdb#38772 (comment)
manifests itself as the RHS not being found for a merge. This happens because
the Replica is processing commands to catch itself up while it is not in the
range. This is no longer possible.

Fixes cockroachdb#40257.

Issue cockroachdb#40257 is another case of a replica processing commands while it is not
in the range.

Fixes cockroachdb#40470.

Issue cockroachdb#40470 is caused by a RHS learning about its existence and removal
prior to a LHS processing a split. This case is now handled properly and is
tested.

Release justification: This commit is safe for 19.2 because it fixes release
blockers.

Release note (bug fix): Fix crashes by preventing replica ID change.
ajwerner added a commit to ajwerner/cockroach that referenced this issue Sep 23, 2019
We've seen instability recently due to invariants being violated as
replicas catch up across periods of being removed and re-added to a range.
Due to learner replicas and their rollback behavior this is now a relatively
common case. Rather than handle all of these various scenarios this PR prevents
them from occuring by actively removing replicas when we determine that they
must have been removed.

Here's a high level overview of the change:

 * Once a Replica object has a non-zero Replica.mu.replicaID it will not
   change.
   * In this commit however, if a node crashes it may forget that it learned
     about a replica ID.
 * If a raft message or snapshot addressed to a higher replica ID is received
   the current replica will be removed completely.
 * If a replica sees a ChangeReplicasTrigger which removes it then it
   completely removes itself while applying that command.
 * Replica.mu.destroyStatus is used to meaningfully signify the removal state
   of a Replica. Replicas about to be synchronously removed are in
   destroyReasonRemovalPending.

This hopefully gives us some new invariants:

 * There is only ever at most 1 Replica which IsAlive() for a range on a Store
   at a time.
 * Once a Replica has a non-zero ReplicaID is never changes.
   * This applies only to the in-memory object, not the store itself.
 * Once a Replica applies a command as a part of the range descriptor it will
   never apply another command as a different Replica ID or outside of the
   Range.
   * Corrolary: a Replica created as a learner will only ever apply commands
     while that replica is in the range.

The change also introduces some new complexity. Namely we now allow removal of
uninitialized replicas, including their hard state. This allows us to catch up
across a split even when we know the RHS must have been removed.

Fixes cockroachdb#40367.

Issue cockroachdb#38772 (comment)
manifests itself as the RHS not being found for a merge. This happens because
the Replica is processing commands to catch itself up while it is not in the
range. This is no longer possible.

Fixes cockroachdb#40257.

Issue cockroachdb#40257 is another case of a replica processing commands while it is not
in the range.

Fixes cockroachdb#40470.

Issue cockroachdb#40470 is caused by a RHS learning about its existence and removal
prior to a LHS processing a split. This case is now handled properly and is
tested.

Release justification: This commit is safe for 19.2 because it fixes release
blockers.

Release note (bug fix): Avoid internal re-use of Replica objects to fix the following crashes:

    cockroachdb#38772 "found rXXX:{-} [, next=0, gen=0?] in place of the RHS"
    cockroachdb#39796 "replica descriptor of local store not found in right hand side of split"
    cockroachdb#40470 "split trigger found right-hand side with tombstone"
    cockroachdb#40257 "snapshot widens existing replica, but no replica exists for subsumed key"
ajwerner added a commit to ajwerner/cockroach that referenced this issue Sep 23, 2019
We've seen instability recently due to invariants being violated as
replicas catch up across periods of being removed and re-added to a range.
Due to learner replicas and their rollback behavior this is now a relatively
common case. Rather than handle all of these various scenarios this PR prevents
them from occuring by actively removing replicas when we determine that they
must have been removed.

Here's a high level overview of the change:

 * Once a Replica object has a non-zero Replica.mu.replicaID it will not
   change.
   * In this commit however, if a node crashes it may forget that it learned
     about a replica ID.
 * If a raft message or snapshot addressed to a higher replica ID is received
   the current replica will be removed completely.
 * If a replica sees a ChangeReplicasTrigger which removes it then it
   completely removes itself while applying that command.
 * Replica.mu.destroyStatus is used to meaningfully signify the removal state
   of a Replica. Replicas about to be synchronously removed are in
   destroyReasonRemovalPending.

This hopefully gives us some new invariants:

 * There is only ever at most 1 Replica which IsAlive() for a range on a Store
   at a time.
 * Once a Replica has a non-zero ReplicaID is never changes.
   * This applies only to the in-memory object, not the store itself.
 * Once a Replica applies a command as a part of the range descriptor it will
   never apply another command as a different Replica ID or outside of the
   Range.
   * Corrolary: a Replica created as a learner will only ever apply commands
     while that replica is in the range.

The change also introduces some new complexity. Namely we now allow removal of
uninitialized replicas, including their hard state. This allows us to catch up
across a split even when we know the RHS must have been removed.

Fixes cockroachdb#40367.

Issue cockroachdb#38772 (comment)
manifests itself as the RHS not being found for a merge. This happens because
the Replica is processing commands to catch itself up while it is not in the
range. This is no longer possible.

Fixes cockroachdb#40257.

Issue cockroachdb#40257 is another case of a replica processing commands while it is not
in the range.

Fixes cockroachdb#40470.

Issue cockroachdb#40470 is caused by a RHS learning about its existence and removal
prior to a LHS processing a split. This case is now handled properly and is
tested.

Release justification: This commit is safe for 19.2 because it fixes release
blockers.

Release note (bug fix): Avoid internal re-use of Replica objects to fix the following crashes:

    cockroachdb#38772 "found rXXX:{-} [, next=0, gen=0?] in place of the RHS"
    cockroachdb#39796 "replica descriptor of local store not found in right hand side of split"
    cockroachdb#40470 "split trigger found right-hand side with tombstone"
    cockroachdb#40257 "snapshot widens existing replica, but no replica exists for subsumed key"
ajwerner added a commit to ajwerner/cockroach that referenced this issue Sep 23, 2019
We've seen instability recently due to invariants being violated as
replicas catch up across periods of being removed and re-added to a range.
Due to learner replicas and their rollback behavior this is now a relatively
common case. Rather than handle all of these various scenarios this PR prevents
them from occuring by actively removing replicas when we determine that they
must have been removed.

Here's a high level overview of the change:

 * Once a Replica object has a non-zero Replica.mu.replicaID it will not
   change.
   * In this commit however, if a node crashes it may forget that it learned
     about a replica ID.
 * If a raft message or snapshot addressed to a higher replica ID is received
   the current replica will be removed completely.
 * If a replica sees a ChangeReplicasTrigger which removes it then it
   completely removes itself while applying that command.
 * Replica.mu.destroyStatus is used to meaningfully signify the removal state
   of a Replica. Replicas about to be synchronously removed are in
   destroyReasonRemovalPending.

This hopefully gives us some new invariants:

 * There is only ever at most 1 Replica which IsAlive() for a range on a Store
   at a time.
 * Once a Replica has a non-zero ReplicaID is never changes.
   * This applies only to the in-memory object, not the store itself.
 * Once a Replica applies a command as a part of the range descriptor it will
   never apply another command as a different Replica ID or outside of the
   Range.
   * Corrolary: a Replica created as a learner will only ever apply commands
     while that replica is in the range.

The change also introduces some new complexity. Namely we now allow removal of
uninitialized replicas, including their hard state. This allows us to catch up
across a split even when we know the RHS must have been removed.

Fixes cockroachdb#40367.

Issue cockroachdb#38772 (comment)
manifests itself as the RHS not being found for a merge. This happens because
the Replica is processing commands to catch itself up while it is not in the
range. This is no longer possible.

Fixes cockroachdb#40257.

Issue cockroachdb#40257 is another case of a replica processing commands while it is not
in the range.

Fixes cockroachdb#40470.

Issue cockroachdb#40470 is caused by a RHS learning about its existence and removal
prior to a LHS processing a split. This case is now handled properly and is
tested.

Release justification: This commit is safe for 19.2 because it fixes release
blockers.

Release note (bug fix): Avoid internal re-use of Replica objects to fix the following crashes:

    cockroachdb#38772 "found rXXX:{-} [, next=0, gen=0?] in place of the RHS"
    cockroachdb#39796 "replica descriptor of local store not found in right hand side of split"
    cockroachdb#40470 "split trigger found right-hand side with tombstone"
    cockroachdb#40257 "snapshot widens existing replica, but no replica exists for subsumed key"
ajwerner added a commit to ajwerner/cockroach that referenced this issue Sep 23, 2019
We've seen instability recently due to invariants being violated as
replicas catch up across periods of being removed and re-added to a range.
Due to learner replicas and their rollback behavior this is now a relatively
common case. Rather than handle all of these various scenarios this PR prevents
them from occuring by actively removing replicas when we determine that they
must have been removed.

Here's a high level overview of the change:

 * Once a Replica object has a non-zero Replica.mu.replicaID it will not
   change.
   * In this commit however, if a node crashes it may forget that it learned
     about a replica ID.
 * If a raft message or snapshot addressed to a higher replica ID is received
   the current replica will be removed completely.
 * If a replica sees a ChangeReplicasTrigger which removes it then it
   completely removes itself while applying that command.
 * Replica.mu.destroyStatus is used to meaningfully signify the removal state
   of a Replica. Replicas about to be synchronously removed are in
   destroyReasonRemovalPending.

This hopefully gives us some new invariants:

 * There is only ever at most 1 Replica which IsAlive() for a range on a Store
   at a time.
 * Once a Replica has a non-zero ReplicaID is never changes.
   * This applies only to the in-memory object, not the store itself.
 * Once a Replica applies a command as a part of the range descriptor it will
   never apply another command as a different Replica ID or outside of the
   Range.
   * Corrolary: a Replica created as a learner will only ever apply commands
     while that replica is in the range.

The change also introduces some new complexity. Namely we now allow removal of
uninitialized replicas, including their hard state. This allows us to catch up
across a split even when we know the RHS must have been removed.

Fixes cockroachdb#40367.

Issue cockroachdb#38772 (comment)
manifests itself as the RHS not being found for a merge. This happens because
the Replica is processing commands to catch itself up while it is not in the
range. This is no longer possible.

Fixes cockroachdb#40257.

Issue cockroachdb#40257 is another case of a replica processing commands while it is not
in the range.

Fixes cockroachdb#40470.

Issue cockroachdb#40470 is caused by a RHS learning about its existence and removal
prior to a LHS processing a split. This case is now handled properly and is
tested.

Release justification: This commit is safe for 19.2 because it fixes release
blockers.

Release note (bug fix): Avoid internal re-use of Replica objects to fix the following crashes:

    cockroachdb#38772 "found rXXX:{-} [, next=0, gen=0?] in place of the RHS"
    cockroachdb#39796 "replica descriptor of local store not found in right hand side of split"
    cockroachdb#40470 "split trigger found right-hand side with tombstone"
    cockroachdb#40257 "snapshot widens existing replica, but no replica exists for subsumed key"
ajwerner added a commit to ajwerner/cockroach that referenced this issue Sep 24, 2019
We've seen instability recently due to invariants being violated as
replicas catch up across periods of being removed and re-added to a range.
Due to learner replicas and their rollback behavior this is now a relatively
common case. Rather than handle all of these various scenarios this PR prevents
them from occuring by actively removing replicas when we determine that they
must have been removed.

Here's a high level overview of the change:

 * Once a Replica object has a non-zero Replica.mu.replicaID it will not
   change.
   * In this commit however, if a node crashes it may forget that it learned
     about a replica ID.
 * If a raft message or snapshot addressed to a higher replica ID is received
   the current replica will be removed completely.
 * If a replica sees a ChangeReplicasTrigger which removes it then it
   completely removes itself while applying that command.
 * Replica.mu.destroyStatus is used to meaningfully signify the removal state
   of a Replica. Replicas about to be synchronously removed are in
   destroyReasonRemovalPending.

This hopefully gives us some new invariants:

 * There is only ever at most 1 Replica which IsAlive() for a range on a Store
   at a time.
 * Once a Replica has a non-zero ReplicaID is never changes.
   * This applies only to the in-memory object, not the store itself.
 * Once a Replica applies a command as a part of the range descriptor it will
   never apply another command as a different Replica ID or outside of the
   Range.
   * Corrolary: a Replica created as a learner will only ever apply commands
     while that replica is in the range.

The change also introduces some new complexity. Namely we now allow removal of
uninitialized replicas, including their hard state. This allows us to catch up
across a split even when we know the RHS must have been removed.

Fixes cockroachdb#40367.

Issue cockroachdb#38772 (comment)
manifests itself as the RHS not being found for a merge. This happens because
the Replica is processing commands to catch itself up while it is not in the
range. This is no longer possible.

Fixes cockroachdb#40257.

Issue cockroachdb#40257 is another case of a replica processing commands while it is not
in the range.

Fixes cockroachdb#40470.

Issue cockroachdb#40470 is caused by a RHS learning about its existence and removal
prior to a LHS processing a split. This case is now handled properly and is
tested.

Release justification: This commit is safe for 19.2 because it fixes release
blockers.

Release note (bug fix): Avoid internal re-use of Replica objects to fix the following crashes:

    cockroachdb#38772 "found rXXX:{-} [, next=0, gen=0?] in place of the RHS"
    cockroachdb#39796 "replica descriptor of local store not found in right hand side of split"
    cockroachdb#40470 "split trigger found right-hand side with tombstone"
    cockroachdb#40257 "snapshot widens existing replica, but no replica exists for subsumed key"
ajwerner added a commit to ajwerner/cockroach that referenced this issue Sep 24, 2019
We've seen instability recently due to invariants being violated as
replicas catch up across periods of being removed and re-added to a range.
Due to learner replicas and their rollback behavior this is now a relatively
common case. Rather than handle all of these various scenarios this PR prevents
them from occuring by actively removing replicas when we determine that they
must have been removed.

Here's a high level overview of the change:

 * Once a Replica object has a non-zero Replica.mu.replicaID it will not
   change.
   * In this commit however, if a node crashes it may forget that it learned
     about a replica ID.
 * If a raft message or snapshot addressed to a higher replica ID is received
   the current replica will be removed completely.
 * If a replica sees a ChangeReplicasTrigger which removes it then it
   completely removes itself while applying that command.
 * Replica.mu.destroyStatus is used to meaningfully signify the removal state
   of a Replica. Replicas about to be synchronously removed are in
   destroyReasonRemovalPending.

This hopefully gives us some new invariants:

 * There is only ever at most 1 Replica which IsAlive() for a range on a Store
   at a time.
 * Once a Replica has a non-zero ReplicaID is never changes.
   * This applies only to the in-memory object, not the store itself.
 * Once a Replica applies a command as a part of the range descriptor it will
   never apply another command as a different Replica ID or outside of the
   Range.
   * Corrolary: a Replica created as a learner will only ever apply commands
     while that replica is in the range.

The change also introduces some new complexity. Namely we now allow removal of
uninitialized replicas, including their hard state. This allows us to catch up
across a split even when we know the RHS must have been removed.

Fixes cockroachdb#40367.

Issue cockroachdb#38772 (comment)
manifests itself as the RHS not being found for a merge. This happens because
the Replica is processing commands to catch itself up while it is not in the
range. This is no longer possible.

Fixes cockroachdb#40257.

Issue cockroachdb#40257 is another case of a replica processing commands while it is not
in the range.

Fixes cockroachdb#40470.

Issue cockroachdb#40470 is caused by a RHS learning about its existence and removal
prior to a LHS processing a split. This case is now handled properly and is
tested.

Release justification: This commit is safe for 19.2 because it fixes release
blockers.

Release note (bug fix): Avoid internal re-use of Replica objects to fix the following crashes:

    cockroachdb#38772 "found rXXX:{-} [, next=0, gen=0?] in place of the RHS"
    cockroachdb#39796 "replica descriptor of local store not found in right hand side of split"
    cockroachdb#40470 "split trigger found right-hand side with tombstone"
    cockroachdb#40257 "snapshot widens existing replica, but no replica exists for subsumed key"
ajwerner added a commit to ajwerner/cockroach that referenced this issue Sep 24, 2019
We've seen instability recently due to invariants being violated as
replicas catch up across periods of being removed and re-added to a range.
Due to learner replicas and their rollback behavior this is now a relatively
common case. Rather than handle all of these various scenarios this PR prevents
them from occuring by actively removing replicas when we determine that they
must have been removed.

Here's a high level overview of the change:

 * Once a Replica object has a non-zero Replica.mu.replicaID it will not
   change.
   * In this commit however, if a node crashes it may forget that it learned
     about a replica ID.
 * If a raft message or snapshot addressed to a higher replica ID is received
   the current replica will be removed completely.
 * If a replica sees a ChangeReplicasTrigger which removes it then it
   completely removes itself while applying that command.
 * Replica.mu.destroyStatus is used to meaningfully signify the removal state
   of a Replica. Replicas about to be synchronously removed are in
   destroyReasonRemovalPending.

This hopefully gives us some new invariants:

 * There is only ever at most 1 Replica which IsAlive() for a range on a Store
   at a time.
 * Once a Replica has a non-zero ReplicaID is never changes.
   * This applies only to the in-memory object, not the store itself.
 * Once a Replica applies a command as a part of the range descriptor it will
   never apply another command as a different Replica ID or outside of the
   Range.
   * Corrolary: a Replica created as a learner will only ever apply commands
     while that replica is in the range.

The change also introduces some new complexity. Namely we now allow removal of
uninitialized replicas, including their hard state. This allows us to catch up
across a split even when we know the RHS must have been removed.

Fixes cockroachdb#40367.

Issue cockroachdb#38772 (comment)
manifests itself as the RHS not being found for a merge. This happens because
the Replica is processing commands to catch itself up while it is not in the
range. This is no longer possible.

Fixes cockroachdb#40257.

Issue cockroachdb#40257 is another case of a replica processing commands while it is not
in the range.

Fixes cockroachdb#40470.

Issue cockroachdb#40470 is caused by a RHS learning about its existence and removal
prior to a LHS processing a split. This case is now handled properly and is
tested.

Release justification: This commit is safe for 19.2 because it fixes release
blockers.

Release note (bug fix): Avoid internal re-use of Replica objects to fix the following crashes:

    cockroachdb#38772 "found rXXX:{-} [, next=0, gen=0?] in place of the RHS"
    cockroachdb#39796 "replica descriptor of local store not found in right hand side of split"
    cockroachdb#40470 "split trigger found right-hand side with tombstone"
    cockroachdb#40257 "snapshot widens existing replica, but no replica exists for subsumed key"
ajwerner added a commit to ajwerner/cockroach that referenced this issue Sep 24, 2019
We've seen instability recently due to invariants being violated as
replicas catch up across periods of being removed and re-added to a range.
Due to learner replicas and their rollback behavior this is now a relatively
common case. Rather than handle all of these various scenarios this PR prevents
them from occuring by actively removing replicas when we determine that they
must have been removed.

Here's a high level overview of the change:

 * Once a Replica object has a non-zero Replica.mu.replicaID it will not
   change.
   * In this commit however, if a node crashes it may forget that it learned
     about a replica ID.
 * If a raft message or snapshot addressed to a higher replica ID is received
   the current replica will be removed completely.
 * If a replica sees a ChangeReplicasTrigger which removes it then it
   completely removes itself while applying that command.
 * Replica.mu.destroyStatus is used to meaningfully signify the removal state
   of a Replica. Replicas about to be synchronously removed are in
   destroyReasonRemovalPending.

This hopefully gives us some new invariants:

 * There is only ever at most 1 Replica which IsAlive() for a range on a Store
   at a time.
 * Once a Replica has a non-zero ReplicaID is never changes.
   * This applies only to the in-memory object, not the store itself.
 * Once a Replica applies a command as a part of the range descriptor it will
   never apply another command as a different Replica ID or outside of the
   Range.
   * Corrolary: a Replica created as a learner will only ever apply commands
     while that replica is in the range.

The change also introduces some new complexity. Namely we now allow removal of
uninitialized replicas, including their hard state. This allows us to catch up
across a split even when we know the RHS must have been removed.

Fixes cockroachdb#40367.

Issue cockroachdb#38772 (comment)
manifests itself as the RHS not being found for a merge. This happens because
the Replica is processing commands to catch itself up while it is not in the
range. This is no longer possible.

Fixes cockroachdb#40257.

Issue cockroachdb#40257 is another case of a replica processing commands while it is not
in the range.

Fixes cockroachdb#40470.

Issue cockroachdb#40470 is caused by a RHS learning about its existence and removal
prior to a LHS processing a split. This case is now handled properly and is
tested.

Release justification: This commit is safe for 19.2 because it fixes release
blockers.

Release note (bug fix): Avoid internal re-use of Replica objects to fix the following crashes:

    cockroachdb#38772 "found rXXX:{-} [, next=0, gen=0?] in place of the RHS"
    cockroachdb#39796 "replica descriptor of local store not found in right hand side of split"
    cockroachdb#40470 "split trigger found right-hand side with tombstone"
    cockroachdb#40257 "snapshot widens existing replica, but no replica exists for subsumed key"
ajwerner added a commit to ajwerner/cockroach that referenced this issue Sep 24, 2019
We've seen instability recently due to invariants being violated as
replicas catch up across periods of being removed and re-added to a range.
Due to learner replicas and their rollback behavior this is now a relatively
common case. Rather than handle all of these various scenarios this PR prevents
them from occuring by actively removing replicas when we determine that they
must have been removed.

Here's a high level overview of the change:

 * Once a Replica object has a non-zero Replica.mu.replicaID it will not
   change.
   * In this commit however, if a node crashes it may forget that it learned
     about a replica ID.
 * If a raft message or snapshot addressed to a higher replica ID is received
   the current replica will be removed completely.
 * If a replica sees a ChangeReplicasTrigger which removes it then it
   completely removes itself while applying that command.
 * Replica.mu.destroyStatus is used to meaningfully signify the removal state
   of a Replica. Replicas about to be synchronously removed are in
   destroyReasonRemovalPending.

This hopefully gives us some new invariants:

 * There is only ever at most 1 Replica which IsAlive() for a range on a Store
   at a time.
 * Once a Replica has a non-zero ReplicaID is never changes.
   * This applies only to the in-memory object, not the store itself.
 * Once a Replica applies a command as a part of the range descriptor it will
   never apply another command as a different Replica ID or outside of the
   Range.
   * Corrolary: a Replica created as a learner will only ever apply commands
     while that replica is in the range.

The change also introduces some new complexity. Namely we now allow removal of
uninitialized replicas, including their hard state. This allows us to catch up
across a split even when we know the RHS must have been removed.

Fixes cockroachdb#40367.

Issue cockroachdb#38772 (comment)
manifests itself as the RHS not being found for a merge. This happens because
the Replica is processing commands to catch itself up while it is not in the
range. This is no longer possible.

Fixes cockroachdb#40257.

Issue cockroachdb#40257 is another case of a replica processing commands while it is not
in the range.

Fixes cockroachdb#40470.

Issue cockroachdb#40470 is caused by a RHS learning about its existence and removal
prior to a LHS processing a split. This case is now handled properly and is
tested.

Release justification: This commit is safe for 19.2 because it fixes release
blockers.

Release note (bug fix): Avoid internal re-use of Replica objects to fix the following crashes:

    cockroachdb#38772 "found rXXX:{-} [, next=0, gen=0?] in place of the RHS"
    cockroachdb#39796 "replica descriptor of local store not found in right hand side of split"
    cockroachdb#40470 "split trigger found right-hand side with tombstone"
    cockroachdb#40257 "snapshot widens existing replica, but no replica exists for subsumed key"
ajwerner added a commit to ajwerner/cockroach that referenced this issue Sep 24, 2019
We've seen instability recently due to invariants being violated as
replicas catch up across periods of being removed and re-added to a range.
Due to learner replicas and their rollback behavior this is now a relatively
common case. Rather than handle all of these various scenarios this PR prevents
them from occuring by actively removing replicas when we determine that they
must have been removed.

Here's a high level overview of the change:

 * Once a Replica object has a non-zero Replica.mu.replicaID it will not
   change.
   * In this commit however, if a node crashes it may forget that it learned
     about a replica ID.
 * If a raft message or snapshot addressed to a higher replica ID is received
   the current replica will be removed completely.
 * If a replica sees a ChangeReplicasTrigger which removes it then it
   completely removes itself while applying that command.
 * Replica.mu.destroyStatus is used to meaningfully signify the removal state
   of a Replica. Replicas about to be synchronously removed are in
   destroyReasonRemovalPending.

This hopefully gives us some new invariants:

 * There is only ever at most 1 Replica which IsAlive() for a range on a Store
   at a time.
 * Once a Replica has a non-zero ReplicaID is never changes.
   * This applies only to the in-memory object, not the store itself.
 * Once a Replica applies a command as a part of the range descriptor it will
   never apply another command as a different Replica ID or outside of the
   Range.
   * Corrolary: a Replica created as a learner will only ever apply commands
     while that replica is in the range.

The change also introduces some new complexity. Namely we now allow removal of
uninitialized replicas, including their hard state. This allows us to catch up
across a split even when we know the RHS must have been removed.

Fixes cockroachdb#40367.

Issue cockroachdb#38772 (comment)
manifests itself as the RHS not being found for a merge. This happens because
the Replica is processing commands to catch itself up while it is not in the
range. This is no longer possible.

Fixes cockroachdb#40257.

Issue cockroachdb#40257 is another case of a replica processing commands while it is not
in the range.

Fixes cockroachdb#40470.

Issue cockroachdb#40470 is caused by a RHS learning about its existence and removal
prior to a LHS processing a split. This case is now handled properly and is
tested.

Release justification: This commit is safe for 19.2 because it fixes release
blockers.

Release note (bug fix): Avoid internal re-use of Replica objects to fix the following crashes:

    cockroachdb#38772 "found rXXX:{-} [, next=0, gen=0?] in place of the RHS"
    cockroachdb#39796 "replica descriptor of local store not found in right hand side of split"
    cockroachdb#40470 "split trigger found right-hand side with tombstone"
    cockroachdb#40257 "snapshot widens existing replica, but no replica exists for subsumed key"
@craig craig bot closed this as completed in 2020115 Sep 24, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-disaster-recovery S-2-temp-unavailability Temp crashes or other availability problems. Can be worked around or resolved by restarting.
Projects
None yet
Development

No branches or pull requests

5 participants