tighten up default backpressure settings #1409

Closed
Tracked by #2028
stepashka opened this issue Mar 25, 2022 · 7 comments
Labels: c/compute (Component: compute, excluding postgres itself), c/storage/pageserver (Component: storage: pageserver)
Milestone: 0.6 Towards Tech Prev

@stepashka (Member)

It makes sense to configure backpressure in seconds instead of bytes.
Configuration in seconds would untie the backpressure mechanics from the WAL redo speed.
This relates to #1343 and #1361.

stepashka added the c/storage/pageserver and c/cloud/compute labels on Mar 25, 2022
stepashka added this to the 0.6 Towards Tech Prev milestone on Mar 25, 2022
@stepashka (Member, Author)

WDYT, @knizhnik, @petuhovskiy, @arssher?

@knizhnik (Contributor)

We cannot calculate a precise time lag: we can only estimate it.
What we know is the current LSN position at the compute node and the current LSN position at the safekeeper/pageserver/S3, so we can measure the lag between them. Backpressure allows us to limit this lag.
Right now this is done in a very crude way: the backend is simply blocked until the backpressure limit is satisfied.
I think that in the future we will need a more flexible policy.

Backpressure is used for two purposes:

  1. Reduce query latency caused by wait_at_lsn. We want this latency not to exceed some reasonable value (e.g. 1 second). The average speed of applying WAL is about 10 Mb/sec, which leads to a max write lag of about 1 Mb (see the sketch below). Of course, the real speed depends on many conditions (not only on the type of WAL record), so it can vary several times. This is why specifying the lag in seconds rather than bytes seems to be a bad idea.
  2. Avoid storage overflow caused by keeping WAL at the compute node and the safekeepers. Here a time criterion is not applicable at all.
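
A back-of-envelope sketch of the conversion behind point 1: derive a byte lag limit from an assumed average apply rate and a latency budget. The rate, the budgets, and the function name are illustrative assumptions, not Neon's actual defaults.

```rust
// Back-of-envelope sketch (not Neon code): a byte-based lag limit derived from
// an assumed average WAL apply rate and a target wait latency. Both inputs are
// illustrative assumptions and can vary several times in practice.
fn lag_limit_bytes(apply_rate_bytes_per_sec: f64, target_latency_sec: f64) -> u64 {
    (apply_rate_bytes_per_sec * target_latency_sec) as u64
}

fn main() {
    let assumed_rate = 10.0 * 1024.0 * 1024.0; // assumed apply rate, bytes/sec
    for budget in [0.1_f64, 0.5, 1.0] {
        println!(
            "latency budget {budget:.1}s -> max write lag ~ {} bytes",
            lag_limit_bytes(assumed_rate, budget)
        );
    }
}
```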

@hlinnaka (Contributor)

Good idea, if we can pull it off. How easily and how reliably can we estimate the time lag?

@stepashka (Member, Author)

@kelvich, adding you to the conversation here, WDYT?

> We cannot calculate a precise time lag: we can only estimate it.

> How easily and how reliably can we estimate the time lag?

@knizhnik (Contributor) commented Apr 4, 2022

IMHO this is the wrong direction.
Let's talk about write_replication_lag, which seems to be the most critical one.
It causes wait_for_lsn to delay until the pageserver receives the corresponding LSN.
So what happens on the way of a WAL record from the compute node to the pageserver?

  1. The record is inserted into the WAL write buffers.
  2. The record is written and flushed to disk by some backend (Postgres may combine several transactions and flush their changes with one fsync).
  3. The walsender is notified.
  4. The walsender reads the record from the WAL file.
  5. The walsender broadcasts this part of the WAL to the safekeepers.
  6. The safekeepers receive the message from the walsender and write it to disk.
  7. A safekeeper sends the WAL records to the pageserver.
  8. The pageserver receives the WAL records and parses them.
  9. The pageserver puts the received WAL records into in-memory layers (actually they are appended to an ephemeral file).
  10. If the checkpoint distance is reached, the in-memory layer is frozen and written to disk by a separate thread.
  11. After processing all received WAL records, the pageserver advances last_written_lsn and notifies threads waiting for that LSN.

So, as you can see, there are a lot of steps, and the execution time of each step depends on many factors, e.g. network and system load. It is almost impossible to "theoretically" predict this time. We can only measure the average speed and use it to predict how fast N bytes of write lag will reach the pageserver and be processed by it (see the sketch below).
But since it mostly depends on system load and the background activity of the pageserver (compaction, GC, ...), such an estimate actually makes very little sense. It is like the most "reliable" weather forecast: tomorrow's weather will be the same as today's.
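
To make the "measure average speed and predict" idea concrete, here is a minimal sketch of such an estimator: an exponential moving average of observed apply throughput, with the time lag guessed as byte lag divided by that average. The struct, field names, and smoothing factor are my own assumptions, not Neon's implementation.

```rust
/// Minimal sketch of a time-lag estimator (not Neon code): keep an exponential
/// moving average of recently observed apply throughput and divide the current
/// byte lag by it. Names and the smoothing factor are illustrative assumptions.
struct ApplyRateEstimator {
    ewma_bytes_per_sec: f64,
    alpha: f64, // smoothing factor in (0, 1]
}

impl ApplyRateEstimator {
    fn new(initial_rate: f64) -> Self {
        Self { ewma_bytes_per_sec: initial_rate, alpha: 0.2 }
    }

    /// Feed one observation: `bytes` of WAL applied over `secs` seconds.
    fn observe(&mut self, bytes: f64, secs: f64) {
        let sample = bytes / secs.max(1e-9);
        self.ewma_bytes_per_sec =
            self.alpha * sample + (1.0 - self.alpha) * self.ewma_bytes_per_sec;
    }

    /// Predicted seconds until the pageserver catches up on `lag_bytes` of WAL,
    /// assuming the recent average rate holds -- exactly the "weather forecast"
    /// caveat from the comment above.
    fn estimated_time_lag_sec(&self, lag_bytes: u64) -> f64 {
        lag_bytes as f64 / self.ewma_bytes_per_sec.max(1.0)
    }
}

fn main() {
    let mut est = ApplyRateEstimator::new(10.0 * 1024.0 * 1024.0);
    est.observe(4.0 * 1024.0 * 1024.0, 0.5); // 4 MiB applied in 0.5 s
    println!("~{:.2}s behind", est.estimated_time_lag_sec(15 * 1024 * 1024));
}
```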

IMHO it is better to concentrate on providing some kind of "adaptive" behavior: if the write lag increases, we may try to reduce pageserver background activity (delay or suspend compaction/GC, use larger delays, ...). An increased write lag may also be a reason to move the tenant to another pageserver.
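
A sketch of the shape such an "adaptive" policy could take; the thresholds, action names, and the migration trigger are hypothetical, not an existing Neon mechanism.

```rust
/// Hypothetical sketch of the adaptive policy suggested above (not Neon code):
/// react to a growing write lag by backing off pageserver background work
/// instead of only blocking the backend. Thresholds and actions are made up.
enum LagAction {
    None,
    ThrottleCompactionAndGc, // delay or suspend background jobs
    ConsiderTenantMigration, // lag keeps growing: candidate for another pageserver
}

fn react_to_write_lag(lag_bytes: u64, soft_limit: u64, hard_limit: u64) -> LagAction {
    if lag_bytes >= hard_limit {
        LagAction::ConsiderTenantMigration
    } else if lag_bytes >= soft_limit {
        LagAction::ThrottleCompactionAndGc
    } else {
        LagAction::None
    }
}

fn main() {
    let mib: u64 = 1024 * 1024;
    match react_to_write_lag(600 * mib, 256 * mib, 1024 * mib) {
        LagAction::None => println!("keep going"),
        LagAction::ThrottleCompactionAndGc => println!("back off compaction/GC"),
        LagAction::ConsiderTenantMigration => println!("consider moving the tenant"),
    }
}
```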

@shanyp (Contributor) commented Jul 19, 2023

@hlinnaka @knizhnik any follow-ups?
Is this something where we should first monitor the time_lag and then see if it's still relevant?

@jcsp (Collaborator) commented Mar 11, 2024

Stale.

jcsp closed this as completed on Mar 11, 2024
stepashka added the c/compute label on Jun 21, 2024