Dataflow streaming metrics are delta metrics, unlike batch metrics, which are cumulative. In every periodic update, a Dataflow streaming worker sends only the change (delta) in each metric since its last report.
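To make the distinction concrete, here is a minimal sketch of the two reporting modes for a set-of-strings metric. The `StringSetCell` class and its method names are hypothetical and written only for illustration; they are not the Beam SDK or Dataflow worker implementation.

```python
class StringSetCell:
    """Hypothetical worker-side cell holding a set of strings (not the Beam class)."""

    def __init__(self):
        self._values = set()

    def add(self, value):
        self._values.add(value)

    def get_cumulative(self):
        # Cumulative reporting: return everything seen so far and keep it.
        return frozenset(self._values)

    def get_and_reset(self):
        # Delta reporting: return what was added since the last report, then clear.
        delta, self._values = self._values, set()
        return frozenset(delta)


def run(report):
    """Add two strings across three reporting periods and collect each report."""
    cell = StringSetCell()
    reports = []
    for added in (["topic-a"], ["topic-b"], []):
        for value in added:
            cell.add(value)
        reports.append(report(cell))
    return reports


cumulative_reports = run(StringSetCell.get_cumulative)
delta_reports = run(StringSetCell.get_and_reset)
```

In this sketch the cumulative mode resends `topic-a` in every period, while the delta mode sends each string once and sends an empty set when nothing new was added.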
StringSet metrics (used for lineage tracking) are currently reported as cumulative metrics in streaming, which causes the following issues:
Every periodic report (sent every 10 seconds) resends the full cumulative value, so the metric is included in every report. Batch job reporting, by contrast, filters reports down to the metrics that have changed (tracked by a dirty bit).
Because the metrics are never reset, they remain in worker memory forever and increase memory usage.
In the backend, this leads to large memory consumption when tracking active work item counters.
Reporting them as cumulative resets the counter's timestamp in the backend, because the counter is overwritten by every report. This is troublesome because when the backend polls counters to dump them to the monitoring state store, it uses this timestamp to determine whether a counter has changed. As a result, the counters are dumped more often than they should be.
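The timestamp problem can be illustrated with a small simulation. The Dataflow backend is not public, so `CounterStore`, `apply_report`, `changed_since`, and `count_dumps` below are hypothetical names, and the logic is an assumption based only on the description above: each non-empty report rewrites the counter and its timestamp, and a poller dumps any counter whose timestamp is newer than the previous poll.

```python
class CounterStore:
    """Hypothetical stand-in for the backend's counter table."""

    def __init__(self):
        self.values = {}
        self.updated_at = {}

    def apply_report(self, name, strings, now):
        if not strings:
            return  # nothing was reported, so the timestamp is untouched
        self.values.setdefault(name, set()).update(strings)
        self.updated_at[name] = now  # every write refreshes the timestamp

    def changed_since(self, last_poll):
        return sorted(n for n, t in self.updated_at.items() if t > last_poll)


def count_dumps(reports):
    """Apply one report per period, poll after each, and count the dumps."""
    store = CounterStore()
    dumps = 0
    last_poll = -1
    for now, strings in enumerate(reports):
        store.apply_report("lineage", strings, now)
        if store.changed_since(last_poll):
            dumps += 1
        last_poll = now
    return dumps


# The same two strings, observed in periods 0 and 2, reported both ways.
cumulative_stream = [{"a"}, {"a"}, {"a", "b"}, {"a", "b"}]
delta_stream = [{"a"}, set(), {"b"}, set()]

cumulative_dumps = count_dumps(cumulative_stream)
delta_dumps = count_dumps(delta_stream)
```

Under these assumptions the cumulative stream triggers a dump in every period, while the delta stream triggers a dump only in the periods where the set actually changed.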
Issue Priority
Priority: 2 (default / most bugs should be filed as P2)
Issue Components
Component: Python SDK
Component: Java SDK
Component: Go SDK
Component: Typescript SDK
Component: IO connector
Component: Beam YAML
Component: Beam examples
Component: Beam playground
Component: Beam katas
Component: Website
Component: Infrastructure
Component: Spark Runner
Component: Flink Runner
Component: Samza Runner
Component: Twister2 Runner
Component: Hazelcast Jet Runner
Component: Google Cloud Dataflow Runner