[BUG] Azure Event hub | One partition suddenly stops receiving messages #15164
Comments
code.zip |
We recently released newer versions of the Event Hubs libraries that contain a fix for this issue. Could you please try updating the version and see if you still have this issue? azure-messaging-eventhubs - This issue is related to #13785 |
Thanks! Srnagar. We will try this version SDK. |
Hello @srnagar, before migrating to the new version, I would request a confirmation on the subject from the relevant product team, so that we can move ahead with confidence and |
@yuhaii we had similar issues on our setup and for now it seems resolved. So the update worked for us. |
Got it. Thanks for your confirmation. Vinceve |
Thanks for the confirmation @vinceve! Closing this issue. |
@srnagar last night it stopped working for us. I guess the bug still persists. The blue line is incoming messages, and the orange line is outgoing messages after a reboot. I will send you the logs. |
@vinceve, any updates on that? The same issue just occurred in one of my consumers. We are using version 5.2.0. |
Happy new year, @srnagar. This issue reproduced again on 5.2.0. We observed that checkpointing for the partition was stuck for a couple of days until we reset it manually. Please find the attached screenshot of the metric. With the old SDK, we could break the lease on the checkpoint file to mitigate the issue. With the new SDK, the checkpoint file is already unleased, so we have to restart the application. This is our production application; is there a good workaround if you can't fix this issue immediately? We don't want to restart the production application each time this happens. Could you please help double-check this issue? Thanks in advance. |
@yuhaii as discussed offline, please use version 5.3.1 as it contains a fix for this issue. |
understand, let us try v5.3.1. thanks for your confirmation, @srnagar! |
Hello @srnagar, good day. Our customer reported that they did load testing with the latest Event Hubs SDK version and still faced a checkpoint-related issue. compile group: 'com.azure', name: 'azure-messaging-eventhubs', version: '5.4.0' The checkpoint and ownership blobs are not getting updated. PFB details for reference: Could you please help us double-check this issue? Thank you. |
@yuhaii could you please share logs when this issue happened? This is not the case when partitions stopped receiving events. In this case, the ownership is not updated which requires logs for further investigation. |
We started seeing the same issue with the Java SDK (azure-eventhubs-eph v2.1.0). Has this been addressed in the EPH library too? @srnagar |
Describe the bug
We use the following SDK versions to receive messages from the event hub: com.azure:azure-messaging-eventhubs 5.1.1 and com.azure:azure-messaging-eventhubs-checkpointstore-blob 1.1.1.
But partition #3 suddenly stopped receiving messages at 9/11 1:22 UTC. We can see that its checkpoint stopped updating.
The outgoing message count dropped accordingly.
It recovered at 9/11 5:02 UTC. We can see that the partition #3 checkpoint resumed updating at that time.
We checked the sender side and confirmed that messages continued to be sent to event hub partition #3 from 9/11 1:22 to 5:02 UTC. But the logs in the customer code show that the receive callback processContext was not called for partition #3 during this time range.
public EventProcessorClient eventProcessorClientBuilder(
@Autowired CheckpointStore checkpointStore,
@Autowired EventHubRecordProcessor eventHubRecordProcessor) {
}
public void processContext(EventBatchContext eventContext) {
}
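Since no exception is raised when a partition goes quiet, one way to pinpoint the moment it happens is to record the time of the last callback per partition and periodically check for gaps. The following is a minimal sketch, not part of the Azure SDK: the class name `PartitionStallWatchdog`, its methods, and the threshold are hypothetical and chosen only for illustration.

```java
import java.time.Duration;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;
import java.util.stream.Collectors;

/** Hypothetical helper (not an SDK class): remembers when each partition last invoked the callback. */
public class PartitionStallWatchdog {
    private final Map<String, Long> lastSeenMillis = new ConcurrentHashMap<>();
    private final long thresholdMillis;
    private final LongSupplier clock; // injectable so the logic can be tested without waiting

    public PartitionStallWatchdog(Duration threshold, LongSupplier clock) {
        this.thresholdMillis = threshold.toMillis();
        this.clock = clock;
    }

    public PartitionStallWatchdog(Duration threshold) {
        this(threshold, System::currentTimeMillis);
    }

    /** Call this at the top of the receive callback with the partition id. */
    public void recordCallback(String partitionId) {
        lastSeenMillis.put(partitionId, clock.getAsLong());
    }

    /** Returns partitions seen at least once whose last callback is older than the threshold. */
    public List<String> stalledPartitions() {
        long now = clock.getAsLong();
        return lastSeenMillis.entrySet().stream()
                .filter(e -> now - e.getValue() > thresholdMillis)
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }
}
```

In processContext, the partition id could be taken from `eventContext.getPartitionContext().getPartitionId()` and passed to `recordCallback`, with a scheduled task logging `stalledPartitions()` every few minutes. Note that a partition with no incoming traffic is also reported as stalled, so the result needs to be compared against the sender-side metrics, and a partition that never received a callback is not tracked at all.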
We tried updating the SDK to the following latest beta versions, but the issue still exists.
https://mvnrepository.com/artifact/com.azure/azure-messaging-eventhubs/5.2.0-beta.2
https://mvnrepository.com/artifact/com.azure/azure-messaging-eventhubs-checkpointstore-blob/1.2.0-beta.2
Exception or Stack Trace
No exception. While messages were being sent to the hub, the receiver callback processContext was not called for the affected partition. The affected partition is random; in the latest reproduction it was partition #3.
To Reproduce
Steps to reproduce the behavior: run the attached code for 2-3 days and the issue will reproduce.
Code Snippet
I attached the code snippet for reference.
Expected behavior
The callback should be called normally for all partitions.
Screenshots
See the screenshots in the description.
Setup (please complete the following information):