Enable dumping corrupt WAL segments #145

Merged
merged 2 commits into from
Apr 1, 2022

antons-antons

Add ability to dump WAL segments with corrupt page headers and records:
skips over missing/broken page headers
skips over misformatted log records
allows dumping log records from a particular file (without the need for
carefully crafted input), starting from an optional offset

specifically allows dumping safekeeper log file like this:

pg_waldump -i -F ~/workspace/zenith02/zenith/.zenith/safekeepers/single/b27555b7b65c060781acb33cf5d2050e/e871b666c23e437c49d2a60ebeee3b7b/000000010000000000000001.partial

or

pg_waldump -i -F test.wal

The file doesn't need to be complete (but then pg_waldump will report a bogus error at the end):

rmgr: XLOG        len (rec/tot):     54/    54, tx:          0, offset: 0x63CFF8, prev 0/0163CF80, desc: PARAMETER_CHANGE max_connections=100 max_worker_processes=8 max_wal_senders=10 max_prepared_xacts=0 max_locks_per_xact=64 wal_level=replica wal_log_hints=on track_commit_timestamp=off
rmgr: Standby     len (rec/tot):     50/    50, tx:          0, offset: 0x63D030, prev 0/0163CFF8, desc: RUNNING_XACTS nextXid 1024 latestCompletedXid 1023 oldestRunningXid 1024
rmgr: Standby     len (rec/tot):     50/    50, tx:          0, offset: 0x63D068, prev 0/0163D030, desc: RUNNING_XACTS nextXid 1024 latestCompletedXid 1023 oldestRunningXid 1024
rmgr: XLOG        len (rec/tot):    114/   114, tx:          0, offset: 0x63D0A0, prev 0/0163D068, desc: CHECKPOINT_ONLINE redo 0/163D068; tli 1; prev tli 1; fpw true; xid 0:1024; oid 12975; multi 1; offset 0; oldest xid 726 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 1024; online
rmgr: Standby     len (rec/tot):     50/    50, tx:          0, offset: 0x63D118, prev 0/0163D0A0, desc: RUNNING_XACTS nextXid 1024 latestCompletedXid 1023 oldestRunningXid 1024
pg_waldump: fatal: error in WAL record at 0/63D118: invalid record length at 0/FFFFF8: wanted 24, got 0

@knizhnik knizhnik left a comment

I wonder why we need to change pg_waldump to be able to dump the first WAL segment on a compute node. It seems to me that explicitly specifying the first valid record (its address is that of the checkpoint record, which is printed in pageserver.log and pg.log) also works.

Also, vanilla pg_waldump supports skipping WAL from a specified position until the first valid record:

pg_waldump -s 0/016BC330 000000010000000000000001
first record is after 0/16BC330, at 0/16BC340, skipping over 16 bytes
rmgr: Transaction len (rec/tot):     34/    34, tx:       1025, lsn: 0/016BC340, prev 0/016BC300, desc: COMMIT 2022-03-22 12:09:59.594747 MSK

But if we really want to change the pg_waldump utility, then I think we should also add handling of files with the *.partial suffix (added to the current segment by the pageserver), because right now files with such a suffix are not recognized by pg_waldump (even if the file is explicitly specified) and it is necessary to rename the file.

@@ -729,7 +786,10 @@ usage(void)
printf(_(" -b, --bkp-details output detailed information about backup blocks\n"));
printf(_(" -e, --end=RECPTR stop reading at WAL location RECPTR\n"));
printf(_(" -f, --follow keep retrying after reaching end of WAL\n"));
printf(_(" -F, --file=FNAME dump log records from a single file\n"));

I wonder why we need the --file option when it is possible to specify the log file with standard pg_waldump:

pg_waldump  000000010000000000000001 

Author

There are implications for the file name (something this PR addresses); try running POPS pg_waldump on 000000010000000000000001.partial

But maybe just restrict log file name checking when the --skip options are present?

printf(_(" -n, --limit=N number of records to display\n"));
printf(_(" -o, --offset=OFFSET offset of the first record to in a file to dump\n"));

Why do we need the --offset option if it is possible to specify the starting point using the -s option:

pg_waldump -s 0/016BC340 000000010000000000000001

Author

-s sets an LSN; an LSN is an offset within the whole WAL, while --offset is specific to a file.
Changing the code to let -s serve a dual purpose would make the code more confusing.

Yes, but it seems quite easy to map a relative offset to a global LSN and vice versa, if that is somehow needed...
How do you know the offset of a valid record? The LSN of a valid record, on the other hand, you can get from the logs.
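
As a rough illustration of the mapping discussed here, the sketch below converts between a file-relative offset and an LSN. It assumes the default 16 MB WAL segment size and that the segment number is already known; the function names are made up for this example and are not part of pg_waldump.

```c
#include <stdint.h>

/* Assumption: default segment size; a cluster may be initialized with another. */
#define WAL_SEGMENT_SIZE ((uint64_t) 16 * 1024 * 1024)

/* LSN of the byte located at `offset` inside segment number `segno`. */
uint64_t
offset_to_lsn(uint64_t segno, uint64_t offset)
{
    return segno * WAL_SEGMENT_SIZE + offset;
}

/* Number of the segment that contains `lsn`. */
uint64_t
lsn_to_segno(uint64_t lsn)
{
    return lsn / WAL_SEGMENT_SIZE;
}

/* Position of `lsn` relative to the start of its segment file. */
uint64_t
lsn_to_offset(uint64_t lsn)
{
    return lsn % WAL_SEGMENT_SIZE;
}
```

With segment number 1, the offset 0x63CFF8 from the sample output in the PR description corresponds to LSN 0/0163CFF8, which matches the `prev` pointer printed for the following record.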

@@ -279,6 +279,14 @@ XLogReadRecord(XLogReaderState *state, char **errormsg)
bool gotheader;
int readOff;

#define SKIP_INVALID_RECORD(rec_ptr) do { \

I wonder if rec_ptr should be aligned here.
Is there any guarantee that it is maxaligned (8-byte aligned) in all places where SKIP_INVALID_RECORD is called?

Author

There's a sort of de facto guarantee: page headers (both short and long) are multiples of 8 bytes, and log records are maxaligned explicitly. It is possible that log records in a corrupt or alternatively formatted file are not aligned, so I'm a little on the fence about how I want to advance, but records from a normally running system are always maxaligned.

Yes, each WAL record is 8-byte aligned. But if parsing of a record is terminated because of an error somewhere in the middle, there is no guarantee that RecPtr is aligned on 8.
I have not checked the code precisely, so I am not sure whether such a scenario is really possible.
But if for some reason RecPtr is really not aligned, then going to the skip_invalid: label can cause an unaligned access fault.
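
One general way to avoid an unaligned access fault when probing arbitrary positions is to copy the bytes into an aligned local variable instead of dereferencing a cast pointer. The sketch below is not taken from the patch; it only assumes that a record begins with a 4-byte total-length field, and the function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/*
 * Read the 4-byte length field that starts at byte offset `off` in `buf`.
 * memcpy makes this safe for any offset, aligned or not.
 * The caller must ensure that off + 4 does not exceed the buffer length.
 */
uint32_t
read_tot_len(const unsigned char *buf, size_t off)
{
    uint32_t tot_len;

    memcpy(&tot_len, buf + off, sizeof(tot_len));
    return tot_len;
}
```

The value is interpreted in the byte order of the machine running the code, which matches how a WAL file written on the same architecture would be read.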

@antons-antons
Author

> I wonder why we need to change pg_waldump to be able to dump the first WAL segment on a compute node. It seems to me that explicitly specifying the first valid record (its address is that of the checkpoint record, which is printed in pageserver.log and pg.log) also works.

That requires knowing where the first log record starts (which is not necessarily hard, it just adds an unnecessary extra step) and having valid segment and page headers (a much more involved step, as the page header includes a "position" that depends on the file name).

> Also, vanilla pg_waldump supports skipping WAL from a specified position until the first valid record:
>
> pg_waldump -s 0/016BC330 000000010000000000000001
> first record is after 0/16BC330, at 0/16BC340, skipping over 16 bytes
> rmgr: Transaction len (rec/tot):     34/    34, tx:       1025, lsn: 0/016BC340, prev 0/016BC300, desc: COMMIT 2022-03-22 12:09:59.594747 MSK

  1. the POPS version requires segment and page headers to be present and valid in the file (not true for Safekeeper's initial segment);
  2. the file name is parsed to guess the segment number;
  3. pg_waldump has a hard time if the data is mangled.
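
To make point 2 concrete, here is a minimal sketch of how a segment number can be derived from a WAL file name (8 hex digits of timeline followed by two 8-digit halves of the segment position), tolerating an optional ".partial" suffix. The helper is hypothetical and is not the code from this PR or from PostgreSQL; it expects a bare file name without a directory part.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse "TTTTTTTTXXXXXXXXYYYYYYYY" with an optional ".partial" suffix.
 * Returns 1 and fills *tli and *segno on success, 0 if the name does not match.
 */
int
parse_wal_file_name(const char *fname, uint64_t segsz,
                    uint32_t *tli, uint64_t *segno)
{
    unsigned int t, log, seg;

    if (segsz == 0)
        return 0;
    /* Exactly 24 leading uppercase hex digits are required. */
    if (strspn(fname, "0123456789ABCDEF") != 24)
        return 0;
    /* Either nothing follows, or the ".partial" suffix. */
    if (fname[24] != '\0' && strcmp(fname + 24, ".partial") != 0)
        return 0;
    if (sscanf(fname, "%8X%8X%8X", &t, &log, &seg) != 3)
        return 0;

    *tli = t;
    /* Each value of the middle field covers 4 GB worth of segments. */
    *segno = (uint64_t) log * (UINT64_C(0x100000000) / segsz) + seg;
    return 1;
}
```

A file such as test.wal is rejected by this parser, which is the kind of case where an explicit file option with a file-relative offset avoids depending on the name at all.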

> But if we really want to change the pg_waldump utility, then I think we should also add handling of files with the *.partial suffix (added to the current segment by the pageserver), because right now files with such a suffix are not recognized by pg_waldump (even if the file is explicitly specified) and it is necessary to rename the file.

I had a change to allow .partial files (it still requires most of the changes from this PR); I removed it because the 'follow' option makes the code pretty ugly.

@antons-antons antons-antons requested a review from knizhnik March 22, 2022 16:07
@knizhnik

knizhnik commented Mar 22, 2022

Sorry, what do you mean by the "POPS" version?
Do you mean that we have already changed something in xlogreader?
Because (as you can see above), if I specify the starting point explicitly, pg_waldump does not check the page header.

In any case.
I am not against this patch.
It can be really useful.
I just wonder why you prefer to add extra options if they do not seem necessary.
Also, I do not think it is really important to be able to specify which kinds of errors we want to ignore.
There is a single -i (ignore_format_errors) option. Why, then, do we need three skip_* flags?

Concerning proposing this patch to the community, I am afraid it will be difficult to explain why somebody needs to analyze a corrupted WAL segment. Maybe I am wrong, but the problem we are solving, a WAL segment that is not completely filled, is very Zenith-specific.

@hlinnaka
Contributor

First impression: This is a pretty invasive patch. I hoped we could come up with something with a much smaller diff footprint :-(.

How do you use this? I tried this, but didn't work:

$ ./tmp_install/bin/pg_waldump  .zenith/pgdatadirs/tenants/4aaf8293b7ff871767b9f10af45b6bc8/main/pg_wal/000000010000000000000001 
pg_waldump: fatal: could not find a valid record after 0/1000000
$ ./tmp_install/bin/pg_waldump -F .zenith/pgdatadirs/tenants/4aaf8293b7ff871767b9f10af45b6bc8/main/pg_wal/000000010000000000000001 
pg_waldump: fatal: error in WAL record at 0/0: invalid record length at 0/28: wanted 24, got 0

@hlinnaka
Contributor

How about something like this: https://github.com/zenithdb/postgres/tree/heikki-pg_waldump ? Much smaller diff footprint

@antons-antons
Author

> First impression: This is a pretty invasive patch. I hoped we could come up with something with a much smaller diff footprint :-(.
>
> How do you use this? I tried this, but didn't work:
>
> $ ./tmp_install/bin/pg_waldump  .zenith/pgdatadirs/tenants/4aaf8293b7ff871767b9f10af45b6bc8/main/pg_wal/000000010000000000000001 
> pg_waldump: fatal: could not find a valid record after 0/1000000
> $ ./tmp_install/bin/pg_waldump -F .zenith/pgdatadirs/tenants/4aaf8293b7ff871767b9f10af45b6bc8/main/pg_wal/000000010000000000000001 
> pg_waldump: fatal: error in WAL record at 0/0: invalid record length at 0/28: wanted 24, got 0

Yeah, the patch had to be invasive in order to integrate properly with the Postgres way of reading WAL.

Regarding parameters, -i was added specifically to handle files with format errors (like missing page headers and misformatted log records) and is independent of the -F parameter.

try $ ./tmp_install/bin/pg_waldump -i -F .zenith/pgdatadirs/tenants/4aaf8293b7ff871767b9f10af45b6bc8/main/pg_wal/000000010000000000000001

@antons-antons
Author

> How about something like this: https://github.com/zenithdb/postgres/tree/heikki-pg_waldump ? Much smaller diff footprint

Less intrusive, but it covers only some of the intended functionality:

  1. doesn't handle ".partial" files (dumping the latest segment on Safekeeper)
  2. skips log records before the first page (from initdb we do not get page-aligned records)
  3. covers only scenarios where we need to skip zero padding; doesn't allow skipping corrupt log records or headers
  4. doesn't correctly set up the reader (e.g. the code is not sufficient to dump the log records):
./tmp_install/build/src/bin/pg_waldump/pg_waldump  .zenith/safekeepers/single/**/000000010000000000000001
pg_waldump: fatal: WAL segment size must be a power of two between 1 MB and 1 GB, but the WAL file "000000010000000000000001" header specifies 0 bytes  
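
The error above reports that the segment size read from the file header is 0, which fails the sanity check described in the message. A minimal sketch of such a check, with the limits taken from the error text; the function and constant names are chosen for this example, and the check inside PostgreSQL may be written differently:

```c
#include <stdint.h>

#define WAL_SEG_MIN_SIZE ((uint64_t) 1024 * 1024)         /* 1 MB */
#define WAL_SEG_MAX_SIZE ((uint64_t) 1024 * 1024 * 1024)  /* 1 GB */

/* Returns 1 if `size` is a power of two between 1 MB and 1 GB, else 0. */
int
is_valid_wal_seg_size(uint64_t size)
{
    /* A power of two has exactly one bit set. */
    int power_of_two = size > 0 && (size & (size - 1)) == 0;

    return power_of_two &&
           size >= WAL_SEG_MIN_SIZE &&
           size <= WAL_SEG_MAX_SIZE;
}
```

A zero-filled header therefore fails the check before any record is read, regardless of what follows in the file.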

@@ -279,6 +279,14 @@ XLogReadRecord(XLogReaderState *state, char **errormsg)
bool gotheader;
int readOff;

#define SKIP_INVALID_RECORD(rec_ptr) do { \
rec_ptr += MAXALIGN(1); \

shouldn't we first add 1, and then MAXALIGN here?

Author

de facto it will be equivalent, but I agree that would make the code a little more robust
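
The difference between the two forms can be seen in a small sketch. It assumes 8-byte maximum alignment, as mentioned earlier in the thread, and uses stand-in names rather than the PostgreSQL macros:

```c
#include <stdint.h>

#define MAX_ALIGNMENT 8  /* assumed value of the platform's maximum alignment */
#define ALIGN_UP(x) \
    (((uint64_t) (x) + (MAX_ALIGNMENT - 1)) & ~((uint64_t) (MAX_ALIGNMENT - 1)))

/* Form in the patch: always advance by one full alignment unit. */
uint64_t
skip_fixed_step(uint64_t rec_ptr)
{
    return rec_ptr + ALIGN_UP(1);
}

/* Form suggested in review: step past the current byte, then round up. */
uint64_t
skip_then_align(uint64_t rec_ptr)
{
    return ALIGN_UP(rec_ptr + 1);
}
```

For an aligned pointer such as 0x63D118 both forms yield 0x63D120. For an unaligned pointer such as 0x63D11B the first form yields 0x63D123 and stays unaligned, while the second yields 0x63D120.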

@arssher

arssher commented Mar 28, 2022

I agree it is more invasive than what we really need. It allows dumping whatever valid records the file contains, whether they come after a hole of zeros, after corrupted records, and so on, by trying each (maxaligned) position to detect a valid record. We only need to skip zeros at the beginning and probably adjust for the initial header. But since the code is already there... well. But let's explicitly document what it does.
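
The narrower behavior described here, skipping only leading zeros at maxaligned steps, could look roughly like the following. This is an illustrative sketch with a hypothetical function name, assuming 8-byte alignment; it is not code from either branch.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_ALIGNMENT 8  /* assumed maximum alignment */

/*
 * Return the first MAX_ALIGNMENT-aligned offset at or after `start` whose
 * aligned chunk contains a non-zero byte, or `len` if the rest is all zeros.
 */
size_t
skip_zero_padding(const unsigned char *buf, size_t len, size_t start)
{
    /* Round the starting position up to the next aligned offset. */
    size_t off = (start + MAX_ALIGNMENT - 1) & ~(size_t) (MAX_ALIGNMENT - 1);

    for (; off < len; off += MAX_ALIGNMENT)
    {
        size_t end = off + MAX_ALIGNMENT < len ? off + MAX_ALIGNMENT : len;

        for (size_t i = off; i < end; i++)
        {
            if (buf[i] != 0)
                return off;     /* candidate start of a record */
        }
    }
    return len;
}
```

The returned offset is only a candidate; a reader would still have to validate the record header found there.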

I also don't see value in separate --file and --offset options. They confusingly conflict with existing ones (specifying segment name as positional argument -- yes, we should teach it to accept the .partial suffix -- for the former and -s for the latter). If I specify both -s and --offset, which offset is used? And really, you more often know the LSN than the offset.

It would be nice to have such resilient waldump of course.

@antons-antons
Author

antons-antons commented Mar 28, 2022

> I also don't see value in separate --file and --offset options. They confusingly conflict with existing ones
> (specifying segment name as positional argument -- yes, we should teach it to accept the .partial suffix -- for the former and -s for the latter).

Supporting .partial makes the pg_waldump code even uglier (check https://github.com/zenithdb/postgres/blob/main/src/bin/pg_waldump/pg_waldump.c#L299) and would require invasive code changes to file name handling (partially https://github.com/zenithdb/postgres/blob/main/src/include/access/xlog_internal.h#L165)

> If I specify both -s and --offset, which offset is used?

The code will error out at parameter validation. -s (start) applies to the WAL stream, while the offset is only meaningful within a file (in addition to the difference in format).

> And really, you more often know the LSN than the offset.

With Safekeeper files, our lowest LSN (after initdb) is currently 0/01696620, which lands at 0x0696620 in segment 000000010000000000000001;
do you want a debugging workflow to rely on file name parsing?
For the limited case we have today, we may claim that LSN and offset are the same, but this change solves more than current needs (as I suspect we will be debugging corrupt transaction logs in the future).

This change is invasive (and I'm not crazy happy about that either) mostly due to the peculiar design of WAL in Postgres (and the fact that pg_waldump is not an integral part of the infrastructure)

(edit: GitHub ate part of the response)

@antons-antons antons-antons merged commit a260728 into main Apr 1, 2022
@antons-antons antons-antons deleted the antons_pg_waldump branch April 1, 2022 19:44
MMeent pushed a commit that referenced this pull request Jul 7, 2022
* Enable dumping corrupt WAL segments

 Add ability to dump WAL segment with corrupt page headers and records
 skips over missing/broken page headers
 skips over misformatted log records
 allows dumping log record from a particular file starting from an
optional offset
 (without a need of carefully crafted input)
MMeent pushed a commit that referenced this pull request Aug 18, 2022
lubennikovaav pushed a commit that referenced this pull request Nov 21, 2022
MMeent pushed a commit that referenced this pull request Feb 10, 2023
MMeent pushed a commit that referenced this pull request Feb 10, 2023
MMeent pushed a commit that referenced this pull request May 11, 2023
tristan957 pushed a commit that referenced this pull request Aug 10, 2023
tristan957 pushed a commit that referenced this pull request Nov 8, 2023
tristan957 pushed a commit that referenced this pull request Nov 8, 2023
tristan957 pushed a commit that referenced this pull request Feb 5, 2024
tristan957 pushed a commit that referenced this pull request Feb 5, 2024
tristan957 pushed a commit that referenced this pull request Feb 6, 2024
tristan957 pushed a commit that referenced this pull request May 10, 2024

4 participants