Commit
Merge pull request #154 from supabase/db-deletes-wal
fix: drop replication slot when db deletes wal segment
w3b6x9 authored May 22, 2021
2 parents c293c76 + df39135 commit 48edd9e
Showing 2 changed files with 50 additions and 0 deletions.
7 changes: 7 additions & 0 deletions README.md
@@ -62,6 +62,13 @@ A few reasons:
2. Decoupling. For example, if you want to send a new slack message every time someone makes a new purchase you might build that functionality directly into your API. This allows you to decouple your async functionality from your API.
3. This is built with Phoenix, an [extremely scalable Elixir framework](https://www.phoenixframework.org/blog/the-road-to-2-million-websocket-connections).

### Does this server guarantee delivery of every data change?

Not yet, because of the following limitations:

1. The Postgres database can run out of disk space due to Write-Ahead Logging (WAL) buildup, which can crash the database and prevent the Realtime server from streaming replication and broadcasting changes.
2. The Realtime server can crash when replication lag exceeds available memory, which forces the creation of a new replication slot and resets streaming replication to read from the latest WAL data.
3. When the Realtime server falls too far behind for any reason (for example, it disconnects from the database while WAL continues to build up), the database can delete WAL segments that the server still needs to read, for example after it reconnects.
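
To make "falling too far behind" concrete, here is a minimal sketch of how replication lag could be measured in bytes. It assumes you have already fetched two LSN strings from Postgres (10 or later), for example the result of `pg_current_wal_lsn()` and the slot's `restart_lsn` from `pg_replication_slots`. The module and function names are hypothetical and are not part of this server.

```elixir
defmodule WalLagSketch do
  @moduledoc false
  # Hypothetical helper for illustration only; not part of the Realtime server.

  # Converts a textual LSN such as "0/16B3748" into an absolute byte position.
  # An LSN is written as two hexadecimal numbers: the high 32 bits, a slash,
  # then the low 32 bits.
  def lsn_to_bytes(lsn) when is_binary(lsn) do
    [high, low] = String.split(lsn, "/")
    String.to_integer(high, 16) * 0x1_0000_0000 + String.to_integer(low, 16)
  end

  # Number of WAL bytes between the consumer's position and the current
  # write position. A steadily growing value means WAL is building up.
  def lag_bytes(current_lsn, restart_lsn) do
    lsn_to_bytes(current_lsn) - lsn_to_bytes(restart_lsn)
  end
end
```

For example, `WalLagSketch.lag_bytes("0/2000000", "0/1000000")` is 16,777,216 bytes, which is the size of one WAL segment under the default 16 MB segment size.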

## Quick start

43 changes: 43 additions & 0 deletions server/lib/adapters/postgres/epgsql_server.ex
@@ -170,6 +170,49 @@ defmodule Realtime.Adapters.Postgres.EpgsqlServer do
    {:stop, msg, state}
  end

  @doc """
  Removes the existing replication slot when epgsql replication process crashes due to
  database deleting WAL segment when Realtime server has fallen too far behind.
  ## Example process exit message
      {:EXIT, #PID<0.2324.0>,
        {:error,
          {:error, :error, "58P01", :undefined_file,
            "requested WAL segment 00000001000000000000007F has already been removed",
            [file: "walsender.c", line: "2447", routine: "XLogRead", severity: "ERROR"]}}}
  """
  @impl true
  def handle_info(
        {:EXIT, _pid,
         {:error,
          {:error, :error, "58P01", :undefined_file, error_msg,
           [file: "walsender.c", line: _line, routine: "XLogRead", severity: "ERROR"]}}} = msg,
        %{
          replication_epgsql_pid: replication_epgsql_pid,
          select_epgsql_pid: select_epgsql_pid
        } = state
      )
      when is_binary(error_msg) do
    :ok = :epgsql.close(replication_epgsql_pid)

    stop_msg =
      case String.split(error_msg) do
        ["requested", "WAL", "segment", _, "has", "already", "been", "removed"] ->
          :ok = maybe_drop_replication_slot(state)
          {:error, {error_msg, :replication_slot_dropped}}

        _ ->
          msg
      end

    :ok = :epgsql.close(select_epgsql_pid)

    {:stop, stop_msg, state}
  end

  @impl true
  def handle_info(
        msg,
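
The new clause decides whether to drop the replication slot by splitting the error message on whitespace and matching the resulting word list. The sketch below isolates that matching step so it can be tried without a database. `WalErrorSketch` and `classify/1` are hypothetical names used only for illustration, and unlike the commit, this version also returns the segment name instead of discarding it.

```elixir
defmodule WalErrorSketch do
  @moduledoc false
  # Hypothetical helper for illustration only; not part of the commit.

  # Splits the message on whitespace and matches the fixed words of the
  # "WAL segment removed" error, binding the one variable word (the segment).
  def classify(error_msg) when is_binary(error_msg) do
    case String.split(error_msg) do
      ["requested", "WAL", "segment", segment, "has", "already", "been", "removed"] ->
        {:wal_segment_removed, segment}

      _ ->
        :other
    end
  end
end
```

Matching on the word list rather than on the full string means the segment name can vary freely, but any change to the wording of the Postgres message would make the clause fall through to the catch-all branch.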
