Segmentation fault #79

Open
Barre opened this issue Jan 21, 2025 · 7 comments

Comments

Barre commented Jan 21, 2025

  1. Initiated a process performing bulk deletes on public.table
  2. Interrupted the delete process by SIGKILLing it
  3. Immediately executed a squeeze operation via psql:

database=# SELECT squeeze.squeeze_table('public', 'table');

PostgreSQL is version 17, and pg_squeeze was installed through apt install postgresql-17-squeeze from the official Postgres repos.

WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
The connection to the server was lost. Attempting reset: Failed.
!?> 
\q

Postgres logs:

2025-01-21 22:06:24.049 CET [3085609] user@database LOG:  could not send data to client: Broken pipe
2025-01-21 22:06:24.049 CET [3085609] user@database FATAL:  connection to client lost
2025-01-21 22:06:24.162 CET [3091451] DETAIL:  Waiting for transactions (approximately 28) older than 1223432421 to end.
2025-01-21 22:06:24.182 CET [3091451] DETAIL:  There are no old transactions anymore.
2025-01-21 22:06:24.197 CET [3091451] LOG:  starting logical decoding for slot "pg_squeeze_slot_24577_3091451"
2025-01-21 22:06:24.197 CET [3091451] DETAIL:  Streaming transactions committing after 2458/E56E8EA8, reading WAL from 2458/C5619990.
2025-01-21 22:06:24.200 CET [1041796] LOG:  background worker "squeeze worker" (PID 3091451) was terminated by signal 11: Segmentation fault
2025-01-21 22:06:24.200 CET [1041796] LOG:  terminating any other active server processes
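
For reference, the sequence boils down to roughly this shell sketch (the table name, connection details, and delete predicate below are illustrative, not the real ones):

# 1. start a bulk delete in the background (illustrative table and predicate)
psql -d database -c "DELETE FROM public.some_table WHERE created_at < now() - interval '30 days'" &
DELETE_PID=$!

# 2. SIGKILL the deleting client mid-flight; the backend then aborts the transaction
sleep 5
kill -9 "$DELETE_PID"

# 3. immediately squeeze the same table
psql -d database -c "SELECT squeeze.squeeze_table('public', 'some_table');"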

Barre commented Jan 22, 2025

Right after server recovery, I was able to properly squeeze that table using the same command.

database=# SELECT squeeze.squeeze_table('public', 'table');

ahouska commented Jan 22, 2025

Do you happen to have a core dump? A backtrace of that segfault might be useful.

Barre commented Jan 22, 2025

It doesn't look like Postgres was configured to produce core dumps, unfortunately. I will configure it to do so and try to reproduce.
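
Roughly what I plan to do, assuming the stock Debian/Ubuntu 17/main cluster from the apt packages:

# let the PostgreSQL service write unlimited-size core files
sudo systemctl edit postgresql@17-main    # add:  [Service]  LimitCORE=infinity
sudo systemctl restart postgresql@17-main

# write cores to a known, writable location
sudo sysctl -w kernel.core_pattern=/var/tmp/core.%e.%p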

Barre commented Jan 22, 2025

Some extra logs that may be useful:

[Tue Jan 21 22:06:29 2025] postgres[3091451]: segfault at 63e52675cdcc ip 00007c6cea9a1486 sp 00007fff5cb13668 error 4 in libc.so.6[7c6cea828000+188000] likely on CPU 92 (core 42, socket 0)
[Tue Jan 21 22:06:29 2025] Code: 00 00 00 48 3b 15 ca 8d 06 00 0f 87 f4 01 00 00 4c 8d 04 11 49 31 c8 49 c1 e8 3f 81 e1 00 0f 00 00 44 01 c1 0f 84 af 00 00 00 <62> e1 fe 48 6f 6c 16 ff 62 e1 fe 48 6f 74 16 fe 48 89 f9 48 83 cf
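
If it helps, the kernel line does pin down the offset of the faulting instruction inside libc (instruction pointer minus the mapping base); with libc debug symbols installed, that offset could be resolved to a function name. A sketch, assuming the usual Debian/Ubuntu libc path:

# 0x7c6cea9a1486 (ip) - 0x7c6cea828000 (libc base) = 0x179486
printf '%x\n' $(( 0x7c6cea9a1486 - 0x7c6cea828000 ))    # prints 179486

# with the libc6-dbg package installed, resolve the offset to a symbol
addr2line -f -e /lib/x86_64-linux-gnu/libc.so.6 0x179486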

ahouska commented Jan 22, 2025

It could perhaps help if we knew at which address the pg_squeeze.so library was loaded at the time of the crash, which we don't. (Maybe /proc/<the worker PID>/maps of an existing PG process would help, but I'm not sure the addresses stay the same after a restart.) And even if we knew which function triggered the segfault, we would still miss the context (i.e. the calling functions and the values of variables).

Can you please try to arrange for the server to produce core dumps? (https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD)
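
Once a core file exists, something like this should print the backtrace (the binary path assumes the apt packaging):

gdb -batch -ex 'bt full' /usr/lib/postgresql/17/bin/postgres /path/to/core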

If not, I need a test case that crashes the server after a "reasonable" number of tries. I tried what you described above several times, but with no luck.
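
For the record, the kind of brute-force harness I have in mind (table name and timing are made up; the sleep has to be tuned so the kill lands mid-delete):

#!/bin/bash
# repeat the delete / SIGKILL / squeeze cycle until the server goes down
for i in $(seq 1 100); do
    psql -d database -c "DELETE FROM public.some_table WHERE id % 100 = $((i % 100))" &
    sleep 1                  # let the delete get going
    kill -9 $! 2>/dev/null   # SIGKILL the client mid-delete
    psql -d database -c "SELECT squeeze.squeeze_table('public', 'some_table');" \
        || { echo "server gone after $i iterations"; break; }
done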

Barre commented Jan 23, 2025

So, I did the required configuration, but I was unable to reproduce the crash; this is going to be a hard one.
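
(A quick way to double-check that the running postmaster actually picked up the new core limit; the path assumes the Debian layout:)

PID=$(sudo head -1 /var/lib/postgresql/17/main/postmaster.pid)
grep core /proc/"$PID"/limits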

ahouska commented Jan 23, 2025

Thanks. I'm trying to enhance my test application to increase the chance that I can reproduce the problem myself.
