"Interprocess alerts in transit" grows. Publishing hangs. #578

Open
gazugafan opened this issue Jul 6, 2020 · 4 comments
@gazugafan

Running into some strange behavior with a new server setup. Everything seemed fine at first, but sometimes trying to publish a message hangs, and then "interprocess alerts in transit" keeps growing. Once this happens, it becomes impossible to publish to any channel. For example...

sudo systemctl restart nginx
curl --request POST --data "testing" http://127.0.0.1:8080/nchan_stub_status
total published messages: 0
stored messages: 0
shared memory used: 20K
shared memory limit: 1048576K
channels: 3
subscribers: 8
redis pending commands: 0
redis connected servers: 0
total interprocess alerts received: 14
interprocess alerts in transit: 3
interprocess queued alerts: 3
total interprocess send delay: 0
total interprocess receive delay: 0
nchan version: 1.2.7
curl --request POST --data "testing" http://127.0.0.1:8080/pub/test
*** no response here; need to Ctrl+C

curl --request POST --data "testing" http://127.0.0.1:8080/nchan_stub_status
total published messages: 1
stored messages: 0
shared memory used: 24K
shared memory limit: 1048576K
channels: 3
subscribers: 8
redis pending commands: 0
redis connected servers: 0
total interprocess alerts received: 14
interprocess alerts in transit: 5
interprocess queued alerts: 5
total interprocess send delay: 0
total interprocess receive delay: 0
nchan version: 1.2.7

Here's the NCHAN portion of the NGINX config...

nchan_shared_memory_size 1G;
nchan_message_buffer_length 500;

server {
    listen 127.0.0.1:8080;

    location ~ /pub/(.*)$ {
        nchan_publisher;
        nchan_channel_id "$1";
        nchan_channel_id_split_delimiter ",";
    }

    location /nchan_stub_status {
        nchan_stub_status;
    }
}

server {
    listen 443 ssl; # managed by Certbot
    ssl_certificate /etc/letsencrypt/live/mydomain.com/fullchain.pem; # managed by Certbot
    ssl_certificate_key /etc/letsencrypt/live/mydomain.com/privkey.pem; # managed by Certbot
    include /etc/letsencrypt/options-ssl-nginx.conf; # managed by Certbot
    ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # managed by Certbot

    server_name  mydomain.com;
    root /var/www/public;
    index index.php index.html index.htm;

    location ~ /sub/(.*)$ {
        nchan_subscriber;
        #nchan_authorize_request /_nchan_auth;
        nchan_channel_id "$1";
        nchan_channel_id_split_delimiter ",";
        nchan_subscriber_first_message oldest;
    }
}

NGINX Version: nginx/1.17.10 (CentOS)

I think this might have something to do with the fact that we migrated to a completely new server with a fresh install of NGINX and NCHAN. The issue only seems to happen after first subscribing to a channel using a ?last_event_id= query parameter and then trying to publish a message on that channel. I suspect clients are sending event IDs saved from the OLD server, which do not exist at all in the new server's NCHAN store.

Do you think this could lead to the issue I'm describing? I can't imagine that's really it, as that would mean the whole pub/sub system could be brought down by one bad subscription request. Any thoughts?
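To make the scenario concrete, this is roughly the sequence that seems to trigger it (just a sketch; the channel name and event ID below are placeholders, with the ID standing in for one saved from the OLD server that doesn't exist in the new store):

# Subscribe with an event ID carried over from the old server.
# The exact ID value here is only an illustration.
curl --no-buffer -H "Accept: text/event-stream" \
     "https://mydomain.com/sub/test?last_event_id=1593982490:0"

# From another shell, publish to the same channel -- this is the request that hangs.
curl --request POST --data "testing" http://127.0.0.1:8080/pub/test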

@himulawang

I've got the same issue. Anyone have any thoughts?

@gazugafan
Author

I've since migrated from NCHAN to Centrifugo, which hasn't given me any such trouble. It was a mostly painless migration. Pretty close to a drop-in replacement.
https://github.com/centrifugal/centrifugo

@gazugafan
Author

... it was missing one feature, which I've decided to live without for now. Looks like they're implementing it, though!
centrifugal/centrifugo#446

@tpneumat

Same issue. We pass a last_event_id that does not exist because we want to start BEFORE our first known message ID, so we can get all the data from the start (if you pass the first actual last_event_id, it skips over that message). But doing this causes some channels to eventually lock up while other channels keep going. It also produces the warning "Missed message for websocket subscriber". Please fix.
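For reference, this is the kind of subscribe we do (a sketch only; the channel name is a placeholder, and the ID is simply one that is older than anything real so the full buffer is replayed; we actually connect over websockets, but the same idea applies to an EventSource-style subscribe):

# Deliberately pass an event ID that predates every real message,
# so the subscriber gets the whole history instead of skipping the first message.
curl --no-buffer -H "Accept: text/event-stream" \
     "https://mydomain.com/sub/test?last_event_id=0:0"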
