tighten up default backpressure settings #1409

Closed
Tracked by #2028
stepashka opened this issue Mar 25, 2022 · 7 comments
Labels: c/compute (Component: compute, excluding postgres itself), c/storage/pageserver (Component: storage: pageserver)
Milestone: 0.6 Towards Tech Prev

@stepashka (Member)

It makes sense to configure backpressure in seconds instead of bytes.
Configuration in seconds would untie the backpressure mechanics from the WAL redo speed.
This relates to #1343 and #1361.

stepashka added the c/storage/pageserver and c/cloud/compute labels on Mar 25, 2022
stepashka added this to the 0.6 Towards Tech Prev milestone on Mar 25, 2022
@stepashka (Member, Author)

WDYT, @knizhnik, @petuhovskiy, @arssher?

@knizhnik (Contributor)

We cannot calculate a precise time lag: we can only estimate it.
What we know is the current LSN position at the compute node and the current LSN position at the safekeeper/pageserver/S3, so we can measure the lag between them. Backpressure allows us to limit this lag.
Right now this is done in a very crude way: the backend is simply blocked until the backpressure limit is satisfied.
I think that in the future we will need a more flexible policy.

Backpressure is used for two purposes:

  1. Reduce query latency caused by wait_at_lsn. We want this latency not to exceed some reasonable value (e.g. 1 second). The average speed of applying WAL is about 10 Mb/sec, which leads to a max write lag of about 1 Mb (see the sketch below). Of course, the real speed depends on many conditions (not only on the type of WAL record), so it can vary several times. This is why specifying the lag in seconds rather than bytes seems to be a bad idea.
  2. Avoid storage overflow caused by keeping WAL at the compute node and the safekeepers. Here a time criterion is not applicable at all.
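
A back-of-envelope sketch of the conversion behind point 1: derive a byte lag limit from an assumed average apply rate and a latency budget. The rate, the budgets, and the function name are illustrative assumptions, not Neon's actual defaults.

```rust
// Back-of-envelope sketch (not Neon code): a byte-based lag limit derived from
// an assumed average WAL apply rate and a target wait latency. Both inputs are
// illustrative assumptions and can vary several times in practice.
fn lag_limit_bytes(apply_rate_bytes_per_sec: f64, target_latency_sec: f64) -> u64 {
    (apply_rate_bytes_per_sec * target_latency_sec) as u64
}

fn main() {
    let assumed_rate = 10.0 * 1024.0 * 1024.0; // assumed apply rate, bytes/sec
    for budget in [0.1_f64, 0.5, 1.0] {
        println!(
            "latency budget {budget:.1}s -> max write lag ~ {} bytes",
            lag_limit_bytes(assumed_rate, budget)
        );
    }
}
```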

@hlinnaka (Contributor)

Good idea, if we can pull it off. How easily and how reliably can we estimate the time lag?

@stepashka (Member, Author)

@kelvich, adding you to the conversation here, WDYT?

> We cannot calculate a precise time lag: we can only estimate it.

> How easily and how reliably can we estimate the time lag?

@knizhnik (Contributor) commented Apr 4, 2022

IMHO this is the wrong direction.
Let's talk about write_replication_lag, which seems to be the most critical one.
It causes wait_for_lsn to delay until the pageserver receives the corresponding LSN.
So what happens on the way of a WAL record from the compute node to the pageserver?

  1. The record is inserted into the WAL write buffers.
  2. The record is written and flushed to disk by some backend (Postgres may combine several transactions and flush their changes with one fsync).
  3. The walsender is notified.
  4. The walsender reads the record from the WAL file.
  5. The walsender broadcasts this part of the WAL to the safekeepers.
  6. The safekeepers receive the message from the walsender and write it to disk.
  7. A safekeeper sends the WAL records to the pageserver.
  8. The pageserver receives the WAL records and parses them.
  9. The pageserver puts the received WAL records into in-memory layers (actually they are appended to an ephemeral file).
  10. If the checkpoint distance is reached, the in-memory layer is frozen and written to disk by a separate thread.
  11. After processing all received WAL records, the pageserver advances last_written_lsn and notifies threads waiting for that LSN.

So, as you can see, there are a lot of steps, and the execution time of each step depends on many factors, e.g. network and system load. It is almost impossible to "theoretically" predict this time. We can only measure the average speed and use it to predict how fast N bytes of write lag will reach the pageserver and be processed by it (see the sketch below).
But since it mostly depends on system load and the background activity of the pageserver (compaction, GC, ...), such an estimate actually makes very little sense. It is like the most "reliable" weather forecast: tomorrow's weather will be the same as today's.
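
To make the "measure average speed and predict" idea concrete, here is a minimal sketch of such an estimator: an exponential moving average of observed apply throughput, with the time lag guessed as byte lag divided by that average. The struct, field names, and smoothing factor are my own assumptions, not Neon's implementation.

```rust
/// Minimal sketch of a time-lag estimator (not Neon code): keep an exponential
/// moving average of recently observed apply throughput and divide the current
/// byte lag by it. Names and the smoothing factor are illustrative assumptions.
struct ApplyRateEstimator {
    ewma_bytes_per_sec: f64,
    alpha: f64, // smoothing factor in (0, 1]
}

impl ApplyRateEstimator {
    fn new(initial_rate: f64) -> Self {
        Self { ewma_bytes_per_sec: initial_rate, alpha: 0.2 }
    }

    /// Feed one observation: `bytes` of WAL applied over `secs` seconds.
    fn observe(&mut self, bytes: f64, secs: f64) {
        let sample = bytes / secs.max(1e-9);
        self.ewma_bytes_per_sec =
            self.alpha * sample + (1.0 - self.alpha) * self.ewma_bytes_per_sec;
    }

    /// Predicted seconds until the pageserver catches up on `lag_bytes` of WAL,
    /// assuming the recent average rate holds -- exactly the "weather forecast"
    /// caveat from the comment above.
    fn estimated_time_lag_sec(&self, lag_bytes: u64) -> f64 {
        lag_bytes as f64 / self.ewma_bytes_per_sec.max(1.0)
    }
}

fn main() {
    let mut est = ApplyRateEstimator::new(10.0 * 1024.0 * 1024.0);
    est.observe(4.0 * 1024.0 * 1024.0, 0.5); // 4 MiB applied in 0.5 s
    println!("~{:.2}s behind", est.estimated_time_lag_sec(15 * 1024 * 1024));
}
```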

IMHO it is better to concentrate on providing some kind of "adaptive" behavior: if the write lag increases, we may try to reduce pageserver background activity (delay or suspend compaction/GC, use larger delays, ...). An increased write lag may also be a reason to move the tenant to another pageserver.
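
A sketch of the shape such an "adaptive" policy could take; the thresholds, action names, and the migration trigger are hypothetical, not an existing Neon mechanism.

```rust
/// Hypothetical sketch of the adaptive policy suggested above (not Neon code):
/// react to a growing write lag by backing off pageserver background work
/// instead of only blocking the backend. Thresholds and actions are made up.
enum LagAction {
    None,
    ThrottleCompactionAndGc, // delay or suspend background jobs
    ConsiderTenantMigration, // lag keeps growing: candidate for another pageserver
}

fn react_to_write_lag(lag_bytes: u64, soft_limit: u64, hard_limit: u64) -> LagAction {
    if lag_bytes >= hard_limit {
        LagAction::ConsiderTenantMigration
    } else if lag_bytes >= soft_limit {
        LagAction::ThrottleCompactionAndGc
    } else {
        LagAction::None
    }
}

fn main() {
    let mib: u64 = 1024 * 1024;
    match react_to_write_lag(600 * mib, 256 * mib, 1024 * mib) {
        LagAction::None => println!("keep going"),
        LagAction::ThrottleCompactionAndGc => println!("back off compaction/GC"),
        LagAction::ConsiderTenantMigration => println!("consider moving the tenant"),
    }
}
```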

@shanyp (Contributor) commented Jul 19, 2023

@hlinnaka @knizhnik any follow-ups?
Is this something where we should first monitor the time_lag and then see if it's still relevant?

@jcsp (Collaborator) commented Mar 11, 2024

Stale.

jcsp closed this as completed on Mar 11, 2024
stepashka added the c/compute label on Jun 21, 2024