Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mvcc: restore unsynced watchers #9281

Merged
merged 1 commit into from
Feb 6, 2018
Merged

Conversation

yudai
Copy link
Contributor

@yudai yudai commented Feb 6, 2018

For #9086

In case syncWatchersLoop() starts before Restore() is called,
watchers already added by that moment are moved to s.synced by the loop.
However, there is a broken logic that moves watchers from s.synced
to s.uncyned without setting keyWatchers of the watcherGroup.
Eventually syncWatchers() fails to pickup those watchers from s.unsynced
and no events are sent to the watchers, because newWatcherBatch() called
in the function uses wg.watcherSetByKey() internally that requires
a proper keyWatchers value.

This might be a cause of missing update issues that happen on node failures.

@yudai yudai force-pushed the fix_unrestored_watchers branch from 36cfb33 to 438c7eb Compare February 6, 2018 01:36
@xiang90
Copy link
Contributor

xiang90 commented Feb 6, 2018

lgtm.

@yudai can you add a test for this fix?

@xiang90
Copy link
Contributor

xiang90 commented Feb 6, 2018

/cc @gyuho @jpbetz this needs to be backported to both 3.2 and 3.3

@codecov-io
Copy link

Codecov Report

Merging #9281 into master will decrease coverage by 0.22%.
The diff coverage is 0%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #9281      +/-   ##
==========================================
- Coverage      76%   75.78%   -0.23%     
==========================================
  Files         364      364              
  Lines       30231    30231              
==========================================
- Hits        22978    22911      -67     
- Misses       5657     5716      +59     
- Partials     1596     1604       +8
Impacted Files Coverage Δ
internal/mvcc/watchable_store.go 85.23% <0%> (-1.11%) ⬇️
proxy/grpcproxy/register.go 69.44% <0%> (-25%) ⬇️
clientv3/namespace/watch.go 87.87% <0%> (-12.13%) ⬇️
etcdserver/api/v3rpc/lease.go 78.37% <0%> (-8.11%) ⬇️
internal/auth/simple_token.go 87.03% <0%> (-6.49%) ⬇️
pkg/adt/interval_tree.go 87.08% <0%> (-4.51%) ⬇️
proxy/grpcproxy/watch.go 90.06% <0%> (-3.11%) ⬇️
etcdserver/api/v3rpc/watch.go 83.4% <0%> (-2.98%) ⬇️
etcdserver/api/v3election/election.go 64.7% <0%> (-2.95%) ⬇️
clientv3/concurrency/election.go 79.52% <0%> (-2.37%) ⬇️
... and 9 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3903385...438c7eb. Read the comment docs.

In case syncWatchersLoop() starts before Restore() is called,
watchers already added by that moment are moved to s.synced by the loop.
However, there is a broken logic that moves watchers from s.synced
to s.uncyned without setting keyWatchers of the watcherGroup.
Eventually syncWatchers() fails to pickup those watchers from s.unsynced
and no events are sent to the watchers, because newWatcherBatch() called
in the function uses wg.watcherSetByKey() internally that requires
a proper keyWatchers value.
@yudai yudai force-pushed the fix_unrestored_watchers branch from 438c7eb to fcab10b Compare February 6, 2018 18:30
@yudai
Copy link
Contributor Author

yudai commented Feb 6, 2018

@xiang90 Added a test case as a sub test. Let me know in case you like simpler test cases.

}

t.Run("Normal", test(0))
t.Run("RunSyncWatchLoopBeforeRestore", test(time.Millisecond*120)) // longer than default waitDuration
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yudai this test will 100% fail without this patch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xiang90 It's hard to guarantee because it's a timing issue, but it fails at least 99% on my env.
The existing test case was failing sometimes because of random delay caused by the runtime here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good enough.

@xiang90
Copy link
Contributor

xiang90 commented Feb 6, 2018

I prefer to move the sub test func out of the test func. Not a huge fan of defining func scope functions. but not a big deal.

LGTM

@xiang90
Copy link
Contributor

xiang90 commented Feb 6, 2018

defer to @gyuho

Copy link
Contributor

@gyuho gyuho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. Thanks @yudai!

/cc @jpbetz Probably we can do patch release together with Golang security release https://groups.google.com/forum/#!topic/golang-announce/lGkem2e5WyQ. Probably this or next week.

@gyuho gyuho merged commit 63183f8 into etcd-io:master Feb 6, 2018
jpbetz added a commit to jpbetz/etcd that referenced this pull request Feb 7, 2018
jpbetz added a commit that referenced this pull request Feb 8, 2018
@jpbetz
Copy link
Contributor

jpbetz commented Feb 13, 2018

cc @wojtek-t

monopole pushed a commit to monopole/kubernetes that referenced this pull request Mar 6, 2018
…rade

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Bump etcd server patch version to 3.2.16

etcd 3.2.16 contains a critical fix for HA clusters: etcd-io/etcd#9281

Also, update newly added tests to use `REGISTRY` make variable.

Release note:
```release-note
Upgrade the default etcd server version to 3.2.16
```
k8s-publishing-bot added a commit to kubernetes/sample-apiserver that referenced this pull request Mar 6, 2018
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Bump etcd server patch version to 3.2.16

etcd 3.2.16 contains a critical fix for HA clusters: etcd-io/etcd#9281

Also, update newly added tests to use `REGISTRY` make variable.

Release note:
```release-note
Upgrade the default etcd server version to 3.2.16
```

Kubernetes-commit: 9aae9b58a5e2b78d27f2bd093fc1c4299bf63c36
k8s-publishing-bot added a commit to kubernetes/apiextensions-apiserver that referenced this pull request Mar 6, 2018
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Bump etcd server patch version to 3.2.16

etcd 3.2.16 contains a critical fix for HA clusters: etcd-io/etcd#9281

Also, update newly added tests to use `REGISTRY` make variable.

Release note:
```release-note
Upgrade the default etcd server version to 3.2.16
```

Kubernetes-commit: 9aae9b58a5e2b78d27f2bd093fc1c4299bf63c36
k8s-publishing-bot added a commit to kubernetes/sample-apiserver that referenced this pull request Mar 7, 2018
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Bump etcd server patch version to 3.2.16

etcd 3.2.16 contains a critical fix for HA clusters: etcd-io/etcd#9281

Also, update newly added tests to use `REGISTRY` make variable.

Release note:
```release-note
Upgrade the default etcd server version to 3.2.16
```

Kubernetes-commit: 9aae9b58a5e2b78d27f2bd093fc1c4299bf63c36
jpbetz added a commit to jpbetz/etcd that referenced this pull request Mar 8, 2018
sttts pushed a commit to sttts/sample-apiserver that referenced this pull request Mar 16, 2018
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Bump etcd server patch version to 3.2.16

etcd 3.2.16 contains a critical fix for HA clusters: etcd-io/etcd#9281

Also, update newly added tests to use `REGISTRY` make variable.

Release note:
```release-note
Upgrade the default etcd server version to 3.2.16
```

Kubernetes-commit: 9aae9b58a5e2b78d27f2bd093fc1c4299bf63c36
sttts pushed a commit to sttts/sample-apiserver that referenced this pull request Mar 16, 2018
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Bump etcd server patch version to 3.2.16

etcd 3.2.16 contains a critical fix for HA clusters: etcd-io/etcd#9281

Also, update newly added tests to use `REGISTRY` make variable.

Release note:
```release-note
Upgrade the default etcd server version to 3.2.16
```

Kubernetes-commit: 9aae9b58a5e2b78d27f2bd093fc1c4299bf63c36
sttts pushed a commit to sttts/apiextensions-apiserver that referenced this pull request Mar 16, 2018
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Bump etcd server patch version to 3.2.16

etcd 3.2.16 contains a critical fix for HA clusters: etcd-io/etcd#9281

Also, update newly added tests to use `REGISTRY` make variable.

Release note:
```release-note
Upgrade the default etcd server version to 3.2.16
```

Kubernetes-commit: 9aae9b58a5e2b78d27f2bd093fc1c4299bf63c36
openshift-publish-robot pushed a commit to openshift/kubernetes-sample-apiserver that referenced this pull request Jan 14, 2019
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Bump etcd server patch version to 3.2.16

etcd 3.2.16 contains a critical fix for HA clusters: etcd-io/etcd#9281

Also, update newly added tests to use `REGISTRY` make variable.

Release note:
```release-note
Upgrade the default etcd server version to 3.2.16
```

Kubernetes-commit: 9aae9b58a5e2b78d27f2bd093fc1c4299bf63c36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

5 participants