"Interprocess alerts in transit" grows. Publishing hangs. #578

Open
gazugafan opened this issue Jul 6, 2020 · 4 comments
@gazugafan

Running into some strange behavior with a new server setup. Everything seemed fine at first, but sometimes trying to publish a message hangs, and then "interprocess alerts in transit" keeps growing. Once this happens, it becomes impossible to publish to any channel. For example...

sudo systemctl restart nginx
curl --request POST --data "testing" http://127.0.0.1:8080/nchan_stub_status
total published messages: 0
stored messages: 0
shared memory used: 20K
shared memory limit: 1048576K
channels: 3
subscribers: 8
redis pending commands: 0
redis connected servers: 0
total interprocess alerts received: 14
interprocess alerts in transit: 3
interprocess queued alerts: 3
total interprocess send delay: 0
total interprocess receive delay: 0
nchan version: 1.2.7
curl --request POST --data "testing" http://127.0.0.1:8080/pub/test
*** no response here; need to Ctrl+C

curl --request POST --data "testing" http://127.0.0.1:8080/nchan_stub_status
total published messages: 1
stored messages: 0
shared memory used: 24K
shared memory limit: 1048576K
channels: 3
subscribers: 8
redis pending commands: 0
redis connected servers: 0
total interprocess alerts received: 14
interprocess alerts in transit: 5
interprocess queued alerts: 5
total interprocess send delay: 0
total interprocess receive delay: 0
nchan version: 1.2.7

Here's the NCHAN portion of the NGINX config...

nchan_shared_memory_size 1G;
nchan_message_buffer_length 500;

server {
    listen 127.0.0.1:8080;

    location ~ /pub/(.*)$ {
        nchan_publisher;
        nchan_channel_id "$1";
        nchan_channel_id_split_delimiter ",";
    }

    location /nchan_stub_status {
        nchan_stub_status;
    }
}

server {
    listen 443 ssl; # managed by Certbot
    ssl_certificate /etc/letsencrypt/live/mydomain.com/fullchain.pem; # managed by Certbot
    ssl_certificate_key /etc/letsencrypt/live/mydomain.com/privkey.pem; # managed by Certbot
    include /etc/letsencrypt/options-ssl-nginx.conf; # managed by Certbot
    ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # managed by Certbot

    server_name  mydomain.com;
    root /var/www/public;
    index index.php index.html index.htm;

    location ~ /sub/(.*)$ {
        nchan_subscriber;
        #nchan_authorize_request /_nchan_auth;
        nchan_channel_id "$1";
        nchan_channel_id_split_delimiter ",";
        nchan_subscriber_first_message oldest;
    }
}

NGINX Version: nginx/1.17.10 (CentOS)

I think this might have something to do with the fact that we migrated to a completely new server with a fresh install of NGINX and NCHAN. The issue only seems to happen after first subscribing to a channel using a ?last_event_id= query parameter and then trying to publish a message on that channel. I suspect clients are sending event IDs saved from the OLD server, which do not exist at all in the new server's NCHAN store.

Do you think this could lead to the issue I'm describing? I can't imagine that's really it, as that would mean the whole pub/sub system could be brought down by one bad subscription request. Any thoughts?
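To make the scenario concrete, this is roughly the sequence that seems to trigger it (just a sketch; the channel name and event ID below are placeholders, with the ID standing in for one saved from the OLD server that doesn't exist in the new store):

# Subscribe with an event ID carried over from the old server.
# The exact ID value here is only an illustration.
curl --no-buffer -H "Accept: text/event-stream" \
     "https://mydomain.com/sub/test?last_event_id=1593982490:0"

# From another shell, publish to the same channel -- this is the request that hangs.
curl --request POST --data "testing" http://127.0.0.1:8080/pub/test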

@himulawang

I've got the same issue. Anyone have any thoughts?

@gazugafan
Author

I've since migrated from NCHAN to Centrifugo, which hasn't given me any such trouble. It was a mostly painless migration. Pretty close to a drop-in replacement.
https://github.com/centrifugal/centrifugo

@gazugafan
Author

... it was missing one feature, which I've decided to live without for now. Looks like they're implementing it, though!
centrifugal/centrifugo#446

@tpneumat

Same issue. We pass a last_event_id that does not exist because we want to start BEFORE our first known message ID, so we can get all the data from the start (if you pass the first actual last_event_id, it skips over that message). But doing this causes some channels to eventually lock up while other channels keep going. It also produces the warning "Missed message for websocket subscriber". Please fix.
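For reference, this is the kind of subscribe we do (a sketch only; the channel name is a placeholder, and the ID is simply one that is older than anything real so the full buffer is replayed; we actually connect over websockets, but the same idea applies to an EventSource-style subscribe):

# Deliberately pass an event ID that predates every real message,
# so the subscriber gets the whole history instead of skipping the first message.
curl --no-buffer -H "Accept: text/event-stream" \
     "https://mydomain.com/sub/test?last_event_id=0:0"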
