JetStream is discarding messages it likely shouldn't #5638
Comments
We'll deploy the latest nightly build and see how it behaves. I'll report back ASAP.
Hi all.
@chrisdelachal can you share the
Two awesome colleagues of mine monitored the situation today. So as it seems, the behavior has changed a little bit with We still saw messages being It also looks like this happens suspiciously often when pull consumers hit their pull timeout (at least the timestamps seemed to indicate that the
FYI, in case it helps: as this was a major issue for us in production, we have downgraded to release 2.10.9 and hopefully the issue is temporarily fixed.
Thanks :) Thankfully the only work queue we have (at the moment) is backed by another database (don't ask, please 🙄 ), so we can mitigate by having a cron job re-schedule suspiciously old work items every few minutes.
I have the same or similar issue. I also tried the latest RC3 candidate. In essence: 1 stream, many consumers with non-wildcard subjects. When I push messages at a rate of about 10k, I see dropped messages. I tried several configurations (file vs. memory storage, single replica vs. 3 replicas), but none changed the outcome. The machine is a Google n2 one and I see about 500-700 micro cores of CPU usage, so it's busy but not exhausted. I cannot reproduce this myself on my Mac, only within Kubernetes. Is there any information I can provide that might help to reproduce?
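A minimal publishing loop along these lines might serve as a starting point for a reproduction. This is only a sketch using the Java client (the SDK mentioned later in this issue); the server URL, subject pattern, subject count, and message count are illustrative assumptions, not the reporter's actual setup.

```java
import io.nats.client.Connection;
import io.nats.client.JetStream;
import io.nats.client.Nats;

public class LoadPublisher {
    public static void main(String[] args) throws Exception {
        // Sketch only: one stream, many non-wildcard subjects,
        // published at a sustained rate to mimic the load described above.
        try (Connection nc = Nats.connect("nats://localhost:4222")) {
            JetStream js = nc.jetStream();
            for (int i = 0; i < 100_000; i++) {
                // Spread messages across e.g. 250 distinct subjects (assumed count).
                String subject = "work.task-" + (i % 250);
                js.publish(subject, ("payload-" + i).getBytes());
            }
        }
    }
}
```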
@plucked But are we sure in your case that there is no limit on the max number of redeliveries? Because even if a message is not explicitly ack'ed, if it reaches the max number of redeliveries it will get removed. As for providing information to help reproduce: you mentioned that you are not able to reproduce on your Mac, but do you have a test case (some code) that simulates what you are doing? This could help us figure out the chain of events and either get lucky in reproducing it, or at least check the code paths and see if we find something.
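To rule that out quickly, something like the following sketch (Java client; the stream and consumer names are placeholders) can confirm whether a consumer actually has a redelivery limit configured:

```java
import io.nats.client.Connection;
import io.nats.client.JetStreamManagement;
import io.nats.client.Nats;
import io.nats.client.api.ConsumerInfo;

public class CheckMaxDeliver {
    public static void main(String[] args) throws Exception {
        try (Connection nc = Nats.connect("nats://localhost:4222")) {
            JetStreamManagement jsm = nc.jetStreamManagement();
            // "WORK" and "worker" are placeholder stream/consumer names.
            ConsumerInfo info = jsm.getConsumerInfo("WORK", "worker");
            long maxDeliver = info.getConsumerConfiguration().getMaxDeliver();
            // A value of -1 generally indicates no redelivery limit is configured,
            // so messages should not be removed for exceeding max deliveries.
            System.out.println("max_deliver = " + maxDeliver);
        }
    }
}
```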
Update from my side: version 2.10.18 fixes it for us. I do not see any messages dropped anymore under load. If something comes up, I will create a ticket, or update here if this one is not closed.
For us it's also still happening. We ran 2.10.18 for a few days and still saw terminated messages on our stream:
This one means a limit (max age, size, etc.) was hit for a message that was not yet processed by your app, so I believe this is different.
@ripienaar as you can see in the stream info posted in the bug report, the stream doesn't have a retry limit. The message is far younger than the 7d limit (only a few minutes) and the size of the stream is not exceeded either. There is no reason for JetStream to discard the message.
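For anyone hitting the same advisory, a quick sanity check of the configured limits against the stream's current state helps distinguish a genuine limit hit from the behavior discussed here. A sketch with the Java client; the stream name is a placeholder:

```java
import io.nats.client.Connection;
import io.nats.client.JetStreamManagement;
import io.nats.client.Nats;
import io.nats.client.api.StreamInfo;

public class CheckStreamLimits {
    public static void main(String[] args) throws Exception {
        try (Connection nc = Nats.connect("nats://localhost:4222")) {
            JetStreamManagement jsm = nc.jetStreamManagement();
            StreamInfo si = jsm.getStreamInfo("WORK"); // placeholder stream name
            // Configured limits ...
            System.out.println("max_age   = " + si.getConfiguration().getMaxAge());
            System.out.println("max_bytes = " + si.getConfiguration().getMaxBytes());
            System.out.println("max_msgs  = " + si.getConfiguration().getMaxMsgs());
            // ... versus what the stream currently holds.
            System.out.println("bytes     = " + si.getStreamState().getByteCount());
            System.out.println("messages  = " + si.getStreamState().getMsgCount());
        }
    }
}
```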
@aksdb @ripienaar @neilalexander I wonder then if this could be related to the open PR #5697?
Yes, this should be fixed by #5697 in the next version.
You would want to try the nightly build from main, which has the fix. This will be pulled into 2.10.19 as well.
Never mind, I found it; it's under Synadia's Docker Hub: https://hub.docker.com/r/synadia/nats-server/tags Unfortunately it looks like there isn't a nightly for Alpine.
The first impression with the nightly is good. At least we didn't see a
We have seen the issue cease on the nightly; the stream messages aren't being cleared anymore! However (not sure if it's related), the Deleted count on our stream continues to rise, then randomly resets. We aren't deleting any messages manually, and all limit parameters (Max*) are set extremely high.
So after a week of monitoring, it seems the initial problem is gone with the nightly build. We didn't get any unexpected
Fixed via #5697, which is part of v2.10.19.
Observed behavior
We have a work queue stream that is consumed by about 250 pull consumers and which requires ACKs. Recently (in the past weeks) we noticed that a small fraction of messages get lost. Watching the ADVISORY subjects, it seems like JetStream drops the messages itself. Example:

The stream doesn't have a MaxDeliver set, and as you can see, it was delivered just once. So there should be no reason to drop it while an ACK was still pending. (We also saw messages being dropped with that reason that had deliveries: 0.) From what I can tell, limits were not exceeded either: since the work queue only carries very small messages (basically just headers without payload), the stream limit of 500 MB was certainly not reached, and the message in question was far younger than the maximum age (which is 7d; the message was barely a few minutes old).

Even more suspicious: our workers log the processing of messages, and we never saw this message hit any worker. We also sniffed the $JS-API and the _INBOX subjects, and from what we can tell, the messages in question were never delivered at all. We basically only see them being ACK'ed by JetStream as received, and soon after we see them being terminated on the advisory subject.
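For reference, a minimal sketch of the kind of advisory watcher described above, assuming the Java client and a local server URL (both assumptions). It simply logs every JetStream advisory, which includes the termination events mentioned in this report:

```java
import io.nats.client.Connection;
import io.nats.client.Dispatcher;
import io.nats.client.Nats;

import java.nio.charset.StandardCharsets;

public class AdvisoryWatcher {
    public static void main(String[] args) throws Exception {
        Connection nc = Nats.connect("nats://localhost:4222");
        // Log every JetStream advisory; terminated-message events show up here
        // alongside the other consumer and stream advisories.
        Dispatcher d = nc.createDispatcher(msg ->
                System.out.println(msg.getSubject() + " -> "
                        + new String(msg.getData(), StandardCharsets.UTF_8)));
        d.subscribe("$JS.EVENT.ADVISORY.>");
        Thread.sleep(Long.MAX_VALUE); // keep the process alive while watching
    }
}
```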
Here is the stream config of the stream in question:
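As an illustrative sketch only (not the reporter's actual config), a stream with the properties mentioned above — work-queue retention, a 500 MB size limit, and a 7-day max age — could be created via the Java client along these lines; the name, subjects, and storage type are assumptions:

```java
import io.nats.client.Connection;
import io.nats.client.JetStreamManagement;
import io.nats.client.Nats;
import io.nats.client.api.RetentionPolicy;
import io.nats.client.api.StorageType;
import io.nats.client.api.StreamConfiguration;

import java.time.Duration;

public class CreateWorkStream {
    public static void main(String[] args) throws Exception {
        try (Connection nc = Nats.connect("nats://localhost:4222")) {
            JetStreamManagement jsm = nc.jetStreamManagement();
            // Rough approximation of a stream like the one described:
            // work-queue retention, small messages, 500 MB / 7 d limits.
            StreamConfiguration sc = StreamConfiguration.builder()
                    .name("WORK")                       // placeholder name
                    .subjects("work.>")                 // placeholder subject
                    .retentionPolicy(RetentionPolicy.WorkQueue)
                    .storageType(StorageType.File)      // assumed storage type
                    .maxBytes(500L * 1024 * 1024)
                    .maxAge(Duration.ofDays(7))
                    .build();
            jsm.addStream(sc);
        }
    }
}
```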
Just in case, this is what our pull consumers look like:
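Likewise, an illustrative sketch (not the actual consumer setup) of a durable pull consumer with explicit acks and no MaxDeliver limit, as described in the report; the durable name, subject, ack wait, and batch size are assumptions:

```java
import io.nats.client.Connection;
import io.nats.client.JetStream;
import io.nats.client.JetStreamSubscription;
import io.nats.client.Message;
import io.nats.client.Nats;
import io.nats.client.PullSubscribeOptions;
import io.nats.client.api.AckPolicy;
import io.nats.client.api.ConsumerConfiguration;

import java.time.Duration;
import java.util.List;

public class Worker {
    public static void main(String[] args) throws Exception {
        try (Connection nc = Nats.connect("nats://localhost:4222")) {
            JetStream js = nc.jetStream();
            // Durable pull consumer, explicit acks, no MaxDeliver limit set.
            ConsumerConfiguration cc = ConsumerConfiguration.builder()
                    .durable("worker")                // placeholder durable name
                    .ackPolicy(AckPolicy.Explicit)
                    .ackWait(Duration.ofSeconds(30))  // assumed ack wait
                    .build();
            PullSubscribeOptions opts = PullSubscribeOptions.builder()
                    .configuration(cc)
                    .build();
            JetStreamSubscription sub = js.subscribe("work.>", opts); // placeholder subject
            while (true) {
                // Fetch a small batch, process each message, then ack it.
                List<Message> batch = sub.fetch(10, Duration.ofSeconds(5));
                for (Message m : batch) {
                    // ... process the work item ...
                    m.ack();
                }
            }
        }
    }
}
```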
Expected behavior
The messages should not be dropped.
Server and client version
Server: 2.10.16 and 2.10.17
Client: Java SDK 2.19.1
Host environment
Kubernetes on Azure
Steps to reproduce
No response