Leases may be auto-renewed indefinitely due to leader elections #9888
Comments
As I suggested in the k8s issue, a better approach is to persist the remaining TTL of long leases lazily. It won't hurt performance :)
@xiang90 Sounds good, I'll update the description to match. What do you mean by "lazily"?
If we persist the deadline of all the leases through raft for every keepalive request, then after a new leader is elected it won't need to refresh all leases (since it already knows the deadlines). But this can be super expensive when we have a lot of keepalives for short leases (I believe the Chubby paper mentioned this too). Yes, some users will send keepalives at the second level. However, if we persist nothing then we have to do a refresh, which leads to the problem you just described. We can make a tradeoff by persisting the deadline of a long lease every X minutes (or only when it is refreshed by a keepalive, at most once every X minutes).
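To make that tradeoff concrete, here is a minimal sketch of the lazy-checkpoint idea. All names (`lessor`, `persistCheckpoint`, the thresholds) are made up for illustration and are not etcd's actual internals; the point is only that a keepalive on a long lease persists the deadline at most once per interval, so the common short-lease path pays nothing extra.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Illustrative thresholds: only leases longer than longLeaseTTL are
// checkpointed, and at most once per checkpointInterval.
const (
	longLeaseTTL       = 5 * time.Minute
	checkpointInterval = 5 * time.Minute
)

type lease struct {
	ttl            time.Duration // TTL the client asked for at grant time
	expiry         time.Time     // current deadline on this member
	lastCheckpoint time.Time     // when the remaining TTL was last persisted
}

type lessor struct {
	mu     sync.Mutex
	leases map[int64]*lease
	// persistCheckpoint stands in for proposing the remaining TTL through
	// raft so every member (and any future leader) learns the deadline.
	persistCheckpoint func(id int64, remaining time.Duration) error
}

// keepAlive refreshes a lease and lazily checkpoints long leases.
func (le *lessor) keepAlive(id int64) error {
	le.mu.Lock()
	defer le.mu.Unlock()

	l, ok := le.leases[id]
	if !ok {
		return fmt.Errorf("lease %d not found", id)
	}
	now := time.Now()
	l.expiry = now.Add(l.ttl)

	// Short leases are refreshed too often to persist every deadline, so
	// only long leases are checkpointed, and at most once per interval.
	if l.ttl > longLeaseTTL && now.Sub(l.lastCheckpoint) >= checkpointInterval {
		if err := le.persistCheckpoint(id, time.Until(l.expiry)); err != nil {
			return err
		}
		l.lastCheckpoint = now
	}
	return nil
}

func main() {
	le := &lessor{
		leases: map[int64]*lease{
			1: {ttl: time.Hour}, // a long lease, e.g. a Kubernetes event lease
		},
		persistCheckpoint: func(id int64, remaining time.Duration) error {
			fmt.Printf("checkpoint lease %d: %v remaining\n", id, remaining)
			return nil
		},
	}
	_ = le.keepAlive(1) // first keepalive persists a checkpoint
	_ = le.keepAlive(1) // later keepalives within the interval do not
}
```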
Got it. Thanks @xiang90!
+1 to what Xiang described - this sounds really reasonable. |
Great, I'll write up a short "design doc" for how we plan to implement and circulate it for review. |
Per the etcd ops guide: "Once a majority of members works, the etcd cluster elects a new leader automatically and returns to a healthy state. The new leader extends timeouts automatically for all leases. This mechanism ensures no lease expires due to server side unavailability."
Kubernetes event leases are 1hr by default and are never renewed. If they are not revoked according to their original TTL, the volume of events can eventually exceed the etcd storage space limit, or some other limit can be hit (e.g. the lease count grows until revoke operations become excessively expensive).
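As a rough illustration of why this matters, here is a toy sketch of the refresh behavior described above (the types and function are hypothetical, not etcd code): a newly elected leader has no persisted deadline, so it can only restart every lease at its full TTL.

```go
package main

import (
	"fmt"
	"time"
)

// lease is a minimal stand-in for an etcd lease (hypothetical type).
type lease struct {
	ttl    time.Duration // TTL requested at grant time
	expiry time.Time     // deadline tracked by the current leader
}

// promoteAll models today's behavior on leader election: with no record of
// how much time each lease had left, the new leader extends every lease by
// its full original TTL again.
func promoteAll(leases []*lease, now time.Time) {
	for _, l := range leases {
		l.expiry = now.Add(l.ttl)
	}
}

func main() {
	now := time.Now()
	// A 1h event lease that is 5s from expiring...
	l := &lease{ttl: time.Hour, expiry: now.Add(5 * time.Second)}
	// ...gets a fresh hour after a leader change, and again after the next
	// one, so frequent elections can keep it alive indefinitely.
	promoteAll([]*lease{l}, now)
	fmt.Println("time left after election:", time.Until(l.expiry).Round(time.Minute))
}
```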
Would it be possible to either:
a) Persist the remaining TTL for long leases (say, TTLs longer than 5 min) so they expire on time after leader changes
b) Allow auto-renewal to be disabled when creating leases
From @xiang90's comment on kubernetes/kubernetes#65497 it sounds like etcd might need to persist remaining durations to support either option; a rough sketch of what (a) could look like follows below.
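A sketch of how option (a) could change the election path (illustrative only, with hypothetical types, not a concrete proposal for etcd's lessor): if a remaining TTL was checkpointed before the election, the new leader restores that deadline instead of handing out the full TTL again.

```go
package main

import (
	"fmt"
	"time"
)

// lease is a hypothetical stand-in for an etcd lease.
type lease struct {
	ttl    time.Duration
	expiry time.Time
}

// promoteWithCheckpoints models option (a): leases with a checkpointed
// remaining TTL are restored to roughly their original deadline; only
// leases without a checkpoint fall back to the full TTL as today.
func promoteWithCheckpoints(leases map[int64]*lease,
	checkpoints map[int64]time.Duration, now time.Time) {
	for id, l := range leases {
		if remaining, ok := checkpoints[id]; ok {
			l.expiry = now.Add(remaining) // expire roughly on time
			continue
		}
		l.expiry = now.Add(l.ttl) // today's behavior, kept for short leases
	}
}

func main() {
	now := time.Now()
	leases := map[int64]*lease{
		1: {ttl: time.Hour},        // long lease with a checkpoint
		2: {ttl: 10 * time.Second}, // short lease, never checkpointed
	}
	checkpoints := map[int64]time.Duration{1: 3 * time.Minute}
	promoteWithCheckpoints(leases, checkpoints, now)
	fmt.Println(leases[1].expiry.Sub(now), leases[2].expiry.Sub(now)) // 3m0s 10s
}
```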
xref: kubernetes/kubernetes#65497