Sentry is missing some of cron check-ins when used with sentry-python #2617
Comments
Thanks for creating the issue, @IevgeniiB!
We plan to add proper Airflow support to our Crons product soon. The goal is that you will not need to create check-ins by hand; the Airflow integration will handle this.
@antonpirker This is great news! I have noticed issues like this outside of Airflow too. Do you think it may be related to Airflow?
I am not sure if it is related to Airflow. The tasks finish without errors, just the check-ins are not sent? Could you turn on |
Anton, thank you for the suggestion. I've tried it and saw the following:
The logs before disabling the integrations, init checkin:
The logs before disabling the integrations, success checkin:
The logs after disabling the integrations, init checkin:
The logs after disabling the integrations, success checkin:
This looks all good. So the init checkin (setting status |
Everything is correct:
Is there something else that I should try? Maybe use the HTTP API instead of sentry-python?
Hey @IevgeniiB thanks for all the info, I asked our server side guys if they have an idea what can cause this.
Hi @antonpirker, is it possible that this is related to the fix in #2598? I don't have enough knowledge about Sentry to say so myself. But I noticed that apparently there hasn't been a release that contains this fix yet.
In our case I would expect more or less 100% green everywhere. I could see a few tasks being missed when they are scheduled to run exactly when we deploy a new revision of the app or something similar, but mostly all the tasks seem to be running as they should, and cron monitoring is just mistaken in reporting errors. I created a debug task today and deployed it to our dev environment only. It runs once per minute, sleeps for 10 seconds during execution, then returns. I deployed it with debug=True on sentry-sdk 1.40 first and we had three misses that you can see in the screenshot. Then I deployed the bump to 1.41 and the socket options, and it seems to have improved somewhat, but we did get another miss later. Each miss seems to be accompanied by an exception during the send_envelope request. I notice there is some NewRelic instrumentation that affects the HTTP requests; I will have to dig a little to see whether that has anything to do with this.
Update: It does seem much more stable on the dev environment after the 1.41 upgrade with the socket options, so I will be making that upgrade to our staging and prod envs as well. The only task that had errors in that environment post-bump was the debug task that I added, so if not for that I would have read this as the problem being 100% solved. I guess the error happens much less with these connection options and the high frequency of the debug task is what reveals it, but it would eventually happen to other tasks as well. |
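The exact socket options used are not shown in this thread. As a rough illustration of what TCP keep-alive options look like at the socket level, here is a standard-library sketch. The helper names `keepalive_options` and `apply_options` and the timing values are assumptions made up for this example; they are not taken from the SDK or from the comment above.

```python
import socket


def keepalive_options(idle_s=45, interval_s=10, probes=6):
    """Build (level, optname, value) tuples that enable TCP keep-alive.

    The default timings are illustrative assumptions, not recommended values.
    """
    options = [(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)]
    # The fine-grained knobs are platform-specific, so add each one only
    # if the running platform's socket module exposes it.
    for name, value in (
        ("TCP_KEEPIDLE", idle_s),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval_s),  # seconds between probes
        ("TCP_KEEPCNT", probes),        # failed probes before giving up
    ):
        if hasattr(socket, name):
            options.append((socket.IPPROTO_TCP, getattr(socket, name), value))
    return options


def apply_options(sock, options):
    """Apply each (level, optname, value) tuple to the given socket."""
    for level, optname, value in options:
        sock.setsockopt(level, optname, value)
```

Keep-alive probes can stop intermediaries from silently dropping an idle connection, which is one plausible reason a reused connection fails on the next send. Whether that is the cause here is not established by the thread.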
Thanks @mathiasose. I see two follow-ups for us here: making it easier to enable the alternative connection options (just having a single option, something like |
Now that we've had the new settings applied for a few days I wanted to confirm here that things seem to work much better now 🙏 There's been a couple hiccups that I still don't completely understand (especially the longish period of red in dev and production 6-ish days ago), but as long as this is pretty rare still then Crons is a much more helpful tool for us to view now 👍 These errors might be legitimate application errors that we need to investigate, and now they're not drowning in false alerts. The last row in the screenshot for example was red because of a database configuration issue and Crons reported the failure accurately, which pointed us towards the issue and we made a fix and got the monitor back to green. |
Awesome @mathiasose, thanks for following up. Starting with 1.43.0 you can swap the If you find out more about the remaining hiccups please let us know if they also look like SDK/network issues. |
getsentry/sentry-python#2617 (comment) Hoping enabling this will help cron monitors have fewer time-outs
Over the last couple of days we have also experienced an increased rate of missed cron check-ins.
Hi @timothegenzmer, please use A retry likely would not help here; the server typically immediately responds with status 200 to any requests from the Sentry SDK, processing is done later in the event ingestion pipeline, so errors would only be discovered after the server has issued the 200 response to the SDK. There is not really any way for the SDK to know whether it needs to retry a request. |
Based on @sentrivana's previous comment, it seems we should have already closed this issue earlier. We have introduced the @gaprl, have you checked whether there is anything you can do on the server-side to improve the user experience here? |
I don't get the response to be honest. We are experiencing the same TCP error that @mathiasose has posted. Which the This is a TCP error that can be caught on the client and a new TCP connection can be established for a retry, or am I missing something? |
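The client-side retry described in the comment above could look roughly like the sketch below. It is not something the SDK is known to do; `send_with_retry` and its parameters are hypothetical names, and `send` stands in for whatever function performs the request. Real HTTP libraries often wrap low-level connection errors in their own exception types, so the `except` clause would need adjusting for a specific client.

```python
import time


def send_with_retry(send, payload, attempts=3, backoff_s=0.5, sleep=time.sleep):
    """Call send(payload), retrying only on connection-level failures.

    Retries use exponential backoff; the last failure is re-raised.
    """
    for attempt in range(1, attempts + 1):
        try:
            return send(payload)
        except ConnectionError:
            # ConnectionError covers ConnectionResetError, BrokenPipeError
            # and ConnectionAbortedError in the standard library.
            if attempt == attempts:
                raise
            sleep(backoff_s * 2 ** (attempt - 1))
```

This only addresses failures that surface on the client as connection errors. It does not help with the case described earlier in the thread, where the server has already answered 200 and the problem appears later in ingestion.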
How do you use Sentry?
Sentry Saas (sentry.io)
Version
1.39.1
Steps to Reproduce
I'm using this task to send check-ins from airflow:
I use this task in my Airflow DAG. It is set up to run on every execution: once before the main logic and once after.
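The task itself is not shown above. A minimal sketch of the pattern described (one check-in before the main logic, one after) might look like this. `run_with_checkins` and `send_checkin` are hypothetical names for illustration. With sentry-python, `sentry_sdk.crons.capture_checkin` takes `monitor_slug`, `check_in_id`, `status`, and `duration` keyword arguments and returns the check-in ID, so it should be usable as `send_checkin`, but that is an assumption about how the reporter wired it up.

```python
import time


def run_with_checkins(job, send_checkin, monitor_slug):
    """Send an opening check-in, run job(), then send a closing check-in."""
    # Opening check-in; the returned ID ties the closing check-in to it.
    check_in_id = send_checkin(monitor_slug=monitor_slug, status="in_progress")
    start = time.monotonic()
    try:
        result = job()
    except Exception:
        # Report the failure, then let the original exception propagate.
        send_checkin(
            monitor_slug=monitor_slug,
            check_in_id=check_in_id,
            status="error",
            duration=time.monotonic() - start,
        )
        raise
    send_checkin(
        monitor_slug=monitor_slug,
        check_in_id=check_in_id,
        status="ok",
        duration=time.monotonic() - start,
    )
    return result
```

Passing the sender in as a parameter keeps the sketch independent of any particular SDK version and makes it easy to log exactly which check-ins were attempted.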
I used to have celery and django integrations included because I needed them in the past. Removing them improved the results.
Expected Result
All check-ins to sentry are visible in the cron tab in sentry UI. There are no missed checkins when the tasks are working as expected.
The job runs every 10 minutes, I expect evenly spaced successful check-ins.
Actual Result
Both initial check-ins and completion check-ins may be missing from time to time.
With celery and django integrations the history of check-ins looks like this:

Without celery and django integrations:
