Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: fix that metrics aggregation @data_points.clear will cause hdps cleared #1532

Merged

Conversation

xuan-cao-swi
Copy link
Contributor

Description

In metrics aggregation explicit_bucket_histogram and sum, if the aggregation_temporality is delta, @data_points will be cleared at the end of appending time (L39), this will also clear the hdps since they are referenced (i.e. reference variables)

h = { "a" => 1, "b" => 2, "c" => 3 }
hdps = h.each_value do |hdp|
  #...
end

puts hdps # hdps => { "a" => 1, "b" => 2, "c" => 3 }
h.clear   # h => {}
puts hdps # hdps => {}

Changing the hdps = @data_points.each_value do |hdp| ... end to hdps = @data_points.values.map do |hdp| ... end also changes the data structure returned from collect (i.e. ExplicitBucketHistogram.collect) if aggregation_temporality is delta. The first one return the original @data_points structure and second one returns array of HistogramDataPoint).

I think changing the data structure should be fine because when aggregation_temporality is not delta, collect will also return array of HistogramDataPoint (code); furthermore, the test case for metrics sdk also expecting array of HistogramDataPoint from collect (test case).

I checked the specification, but couldn't find anything related to data structure of return object from collect; but if there is something, then @data_points needs deep copy, also the test case needs change.

# with hdps = @data_points.each_value do |hdp| ... end
{{}=>#<struct OpenTelemetry::SDK::Metrics::Aggregation::HistogramDataPoint attributes={}, start_time_unix_nano=1697815054853487387, time_unix_nano=1697815114853499679, count=5, sum=22, bucket_counts=[1, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0], explicit_bounds=[0, 5, 10, 25, 50, 75, 100, 250, 500, 1000], exemplars=nil, min=0, max=10>, {"foo"=>"bar"}=>#<struct OpenTelemetry::SDK::Metrics::Aggregation::HistogramDataPoint attributes={"foo"=>"bar"}, start_time_unix_nano=1697815054853487387, time_unix_nano=1697815114853499679, count=5, sum=148, bucket_counts=[1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0], explicit_bounds=[0, 5, 10, 25, 50, 75, 100, 250, 500, 1000], exemplars=nil, min=-10, max=80>}

# with hdps = @data_points.values.map do |hdp| ... end
[#<struct OpenTelemetry::SDK::Metrics::Aggregation::HistogramDataPoint attributes={}, start_time_unix_nano=1697815557892022051, time_unix_nano=1697815617892032384, count=5, sum=22, bucket_counts=[1, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0], explicit_bounds=[0, 5, 10, 25, 50, 75, 100, 250, 500, 1000], exemplars=nil, min=0, max=10>, #<struct OpenTelemetry::SDK::Metrics::Aggregation::HistogramDataPoint attributes={"foo"=>"bar"}, start_time_unix_nano=1697815557892022051, time_unix_nano=1697815617892032384, count=5, sum=148, bucket_counts=[1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0], explicit_bounds=[0, 5, 10, 25, 50, 75, 100, 250, 500, 1000], exemplars=nil, min=-10, max=80>]

Test

  1. default test suite
  2. ad-hoc test: simple sinatra app that sends metrics to otlp collector
require 'opentelemetry/sdk'
require 'opentelemetry-metrics-sdk'
require 'opentelemetry/exporter/otlp'

OpenTelemetry::SDK.configure do |c|
 c.service_name = 'sinatra-sample-2'
end

otlp_metric_exporter = OpenTelemetry::SDK::Metrics::Export::OTLPMetricExporter.new # this is experimental exporter that sends metrics to otlp collector through http
OpenTelemetry.meter_provider.add_metric_reader(otlp_metric_exporter)

meter = OpenTelemetry.meter_provider.meter('test')

histogram = meter.create_histogram('histogram', unit: 'smidgen', description: 'a small amount of something')
histogram.record(5, attributes: { 'foo' => 'bar' })

otlp_metric_exporter.pull

@robbkidd
Copy link
Member

robbkidd commented Dec 7, 2023

Failing lint test in jaeger exporter has a fix in #1550.

Copy link
Member

@robbkidd robbkidd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I confirmed locally that this change resolves the metrics_sdk test suite failures that appear in the PR adding the test suite to CI.

I can vouch for that.

bash-5.1$ git rev-parse --abbrev-ref HEAD
main

bash-5.1$ rake test
Run options: --seed 35051

# Running:

.......S.........S.................S....

Finished in 0.007356s, 5438.1075 runs/s, 19577.1871 assertions/s.

40 runs, 144 assertions, 0 failures, 0 errors, 3 skips

You have skipped tests. Run with --verbose for details.
The Diff Applied
diff --git a/metrics_sdk/lib/opentelemetry/sdk/metrics/aggregation/explicit_bucket_histogram.rb b/metrics_sdk/lib/opentelemetry/sdk/metrics/aggregation/explicit_bucket_histogram.rb
index 641fa51e..58d87cca 100644
--- a/metrics_sdk/lib/opentelemetry/sdk/metrics/aggregation/explicit_bucket_histogram.rb
+++ b/metrics_sdk/lib/opentelemetry/sdk/metrics/aggregation/explicit_bucket_histogram.rb
@@ -32,9 +32,10 @@ module OpenTelemetry
           def collect(start_time, end_time)
             if @aggregation_temporality == :delta
               # Set timestamps and 'move' data point values to result.
-              hdps = @data_points.each_value do |hdp|
+              hdps = @data_points.values.map! do |hdp|
                 hdp.start_time_unix_nano = start_time
                 hdp.time_unix_nano = end_time
+                hdp
               end
               @data_points.clear
               hdps
diff --git a/metrics_sdk/lib/opentelemetry/sdk/metrics/aggregation/sum.rb b/metrics_sdk/lib/opentelemetry/sdk/metrics/aggregation/sum.rb
index a4bb50f7..31f8ac66 100644
--- a/metrics_sdk/lib/opentelemetry/sdk/metrics/aggregation/sum.rb
+++ b/metrics_sdk/lib/opentelemetry/sdk/metrics/aggregation/sum.rb
@@ -19,9 +19,10 @@ module OpenTelemetry
           def collect(start_time, end_time)
             if @aggregation_temporality == :delta
               # Set timestamps and 'move' data point values to result.
-              ndps = @data_points.each_value do |ndp|
+              ndps = @data_points.values.map! do |ndp|
                 ndp.start_time_unix_nano = start_time
                 ndp.time_unix_nano = end_time
+                ndp
               end
               @data_points.clear
               ndps

@robertlaurin robertlaurin merged commit dc215ff into open-telemetry:main Dec 7, 2023
51 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants