-
Notifications
You must be signed in to change notification settings - Fork 14.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Align timers and timing metrics (ms) across all metrics loggers #39908
Conversation
b567276
to
9a5b95e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's likely very wrong as it is heavily backwards compatible. It will suddenly change almost all dashboards and alerts people built using those metrics.
Metrics are definitely "public" API of Airflow and we cannot change them without breaking compatibility.
100% -- this will break all the operational dashboards for companies, users and service providers |
Yes. But making things consistent with docs is one thing, while breaking existing deployments that baase on "actual" behaviour is another. In this case changing the metrics is a problem and if we want to make documentattion and implementation consistent, we should only update the documentation (even if it means that some metrics will have different and inconsistent behaviour). In this case - preserving current behaviour trumps consistency. I think we could have a new parameter that will say "consistent" metrics (disabled by default) where this might get fixed and we should swap the default value with Airflow 3 (to be consistent by default). Simply metrics is one of those things that is undervalued by most people who develop Airlfow but have the potential of wreakiing huge havoc if we change their behaviour, it's heavily public API when you think of breaking effects it might have and probability it will cause problems if you change metric values effectively by 3 orders of magnitude. |
Yeah, I'm with @potiuk and @kaxil here. We already ran around in circles on this topic in the previous seconds/milliseconds metrics PR just a couple weeks ago. It's a mess, but breaking everyone's dashboards isn't a viable solution IMHO. To add to that, even if we do standardize them, why break two (Datadog and Otel) to match one (StatsD)? If we were going to do this, shouldn't we change the one that doesn't match the other two? This feels like an Airflow3 "big breaking thing" to me. |
9a5b95e
to
6bed534
Compare
I think if we want to implement this consistency, it should be Airlfow 3 feature to have it on by default and possibly we should raise a future deprecation warning informing that it will change in Airflow 3 and that user can switch to consistent metrics by flipping that config. @ferruzzi - WDYT ? Also if we do that - I think that should be added to our Airflow 3 discussions as a topic to discuss - https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+3+Dev+call%3A+Meeting+Notes - > maybe you can add it it as command and make sure this is included in our Airflow 3 call at some point in time @dirrao ? |
WHY: This is generally a I think it would be good to have someone who will be "championing" that - i.e. making sure it's remembered, properly docuemented, good warnings raised and people are aware of this as incompatibility - each of the "workstreams" there will have a lead/champion so having someone who will be advocating and leading that through Airlfow 3 transition (with all the consequences/documentation/making sure that it's properly communicated to our users and that there is a transition plam, including some guidelines of what our users should do) is a good idea. |
Good idea. I have updated the warning. So, users can take a call on it when they switching to Airflow 3. warning: "Timer and timing metrics publish in seconds were deprecated. It is enabled by default from Airflow 3 onwards. Enable metrics consistency to publish all the timer and timing metrics in milliseconds." |
I don't have the access to wiki. I will remind you on this. |
If you want access, send me your email in Slack |
looks gpod (sorry for delay) |
@potiuk
|
Hmm.. seing it - I am not sure if warning is a good idea. The thing with warnings is that you should have an "easy" way to get rid of the warning - and changing your dashboard/metrics is likely not an easy task - and I think one of the worst cases with warnings is to not allow people to silence them easily (by fixing them) . I am just thiking whether we should allow a third state of the switch ("legacy" ?) to allow the users to deliberately set it and silence the warning ? BTW. The way how to treat warnings in tests is described in the docs: https://github.com/apache/airflow/blob/main/contributing-docs/testing/unit_tests.rst#handling-warnings |
573fc7f
to
d2d190b
Compare
I have updated the unit tests for the changes. I tried adding the ignore warning in |
I'd say with Airflow 3 work started - we should hold on with "fixing" it in Airflow 2 and simply follow whatever we decide on for Airflow 3. That needs someone to take a lead on driviing to consensus and agreeing our metrics/telemetry strategy for Airflow 3:
I think what @shubham22 mentioned at last dev call they are going to verify if that is something that they heavily would rely on and it will be discussed further at the dev calls. |
I'm fine with millis. My main problem is standardizing on the statsd and breaking everyone else. In fact, after the last Airflow 3 dev call where they were discussing dropping statsd entirely in favor of OTel, I'll double down on that objection. We should not be basing this decision on how statsd handles it, especially if we may not even be using statsd in the codebase in a few months. |
See discussions we have about OTEL at devlist - both separate discussion by Dennis and summary of the devcal discussion where we discussed it. |
and see this one as well. #40800 OTel will be the default starting in 3.0, StatsD will be included but secondary/backup/legacy. Given that context, Here's what I think: This can be added to the 3.0 changes and:
|
@ferruzzi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
I think in 4.0 - but also I think we will get to much faster "MAJOR" release cycle. We have not decided it yet, but we might want to do just that and say release new Airflow MAJOR release every year ? |
version_added: 2.10.0 | ||
type: string | ||
example: ~ | ||
default: "True" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
!!!! @potiuk I guess no one noticed, cos this made it out with a default of on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, except this hasn't made it to a release yet, and it wasn't marked for backporting. (Lucky miss there)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't slip through, it was discussed above and intentional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replied on #43966 as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We agreed to enable it directly in Airflow 3 without backporting. However, updating this was overlooked.
Align timers and timing metrics (ms) across all metrics loggers
closed: #39818