[processor/transform] Add Function to convert Exponential Histograms to normal Histograms #33827

Closed
daidokoro opened this issue Jul 1, 2024 · 10 comments · Fixed by #33824
Labels
enhancement New feature or request processor/transform Transform processor

Comments

@daidokoro
Contributor

daidokoro commented Jul 1, 2024

Component(s)

processor/transform

Is your feature request related to a problem? Please describe.

The Coralogix platform does not currently support ingesting metrics in the form of Exponential Histograms. We have clients facing this limitation while ingesting metrics from receivers that only support generating Exponential Histograms, for example the statsdreceiver.

Describe the solution you'd like

We have created a solution which adds a custom conversion function to the transform processor, which handles converting exponential histograms to normal histograms.

A brief description of the key features of this function:

  • Takes a single argument, the user-defined Explicit Boundaries to be used in the conversion.
  • Non-Exponential Histogram metrics passed to this function are ignored.
  • Only works within the metric context

Describe alternatives you've considered

We considered addressing the issue in the statsdreceiver, potentially by adding support for normal histograms there; however, this would only fix the issue for one receiver.

Having a dedicated function in the transform processor allows us to mitigate the issue for all receivers and external metric sources.

Additional context

We've created a PR for this potential change: #33824

@daidokoro daidokoro added enhancement New feature or request needs triage New item requiring triage labels Jul 1, 2024
@github-actions github-actions bot added the processor/transform Transform processor label Jul 1, 2024
Contributor

github-actions bot commented Jul 1, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@daidokoro daidokoro changed the title [transformprocessor] Add Function to convert Exponential Histograms to normal Histograms [processor/transform] Add Function to convert Exponential Histograms to normal Histograms Jul 1, 2024
@kentquirk
Member

I think some people will find this very useful, although I think it should be covered in warnings that the conversion is lossy and should only be used when there is no alternative. The results will not be identical.

I took a quick look at the draft PR, and it seems plausible but it needs:

  • More comments on the algorithm used (for example, a note about how the bucket values are converted). I believe that all values in a bucket are assumed to be the max value of the bucket -- is that a good assumption, or would it be better to use the arithmetic or even geometric mean?
  • Many more test cases. A single test case with two values is definitely not sufficient to show this is correct.

@crobert-1
Member

Removing needs triage based on code owner's response.

@crobert-1 crobert-1 removed the needs triage New item requiring triage label Jul 1, 2024
@daidokoro
Contributor Author

daidokoro commented Jul 1, 2024

Hey @kentquirk

Thanks for your response and for having an initial look at the draft.

I'm currently working on adding more test cases. I've also updated the transform processor README.md in the draft to reflect your recommendation to add a usage warning.

To clarify the approach/algorithm:

Buckets are calculated based on a combination of the Explicit Boundaries that are passed to the function and the upper boundary of each exponential bucket.

The calculateBucketCounts function calculates the bucket counts for a given exponential histogram data point. The algorithm is inspired by the logExponentialHistogramDataPoints function used to print Exponential Histograms in OTel.

  • factor is calculated as
    $factor = \ln(2) \times 2^{-\text{scale}}$
    math.Ldexp(math.Ln2, -scale)

  • Next, we iterate over the bucket counts and positions (pos) in the exponential histogram data point.

    • the index is calculated by adding the exponential offset to the positive bucket position (pos)
      $index = offset + pos$

    • the factor is then used to calculate the upper bound of the bucket which is calculated as
      upper = math.Exp((index+1) * factor)

At this point we know that the upper bound represents the highest value that can be in this bucket. We compare it against each of the explicit boundaries provided by the user, in order, until we find the first boundary that fits, that is, the first boundary for which upper bound <= explicit boundary.

For example:

If we have explicit boundaries of [0, 10, 20, 30] and an upper bound of 11, the count would be added to the explicit bucket bounded by 20, as 20 is the first explicit boundary for which upper bound <= explicit boundary holds.

Technically, the exact values recorded in the histogram are never known in this conversion; we only calculate the upper boundaries and use them to determine the bucket based on the Explicit Boundaries defined by the user.

If the user provides Explicit Boundaries that do not fit the datapoints, this will result in imprecise conversions.
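The steps described in this comment can be sketched in Go as follows. This is a minimal illustration under stated assumptions, not the code from the PR: the helper names `upperBound`, `bucketIndex`, and `convertPositive` are hypothetical, only positive buckets are handled, and the explicit boundaries are assumed to be sorted in ascending order. The sample input assumes that the buckets (32, 64], (64, 128], (128, 256] correspond to scale 0 and offset 5.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// upperBound returns the upper bound of the exponential bucket at index for
// the given scale, i.e. exp((index+1) * ln(2) * 2^-scale).
func upperBound(scale, index int) float64 {
	factor := math.Ldexp(math.Ln2, -scale)
	return math.Exp(float64(index+1) * factor)
}

// bucketIndex returns the position of the first explicit boundary b with
// upper <= b, or len(bounds) (the overflow bucket) if no boundary fits.
// bounds must be sorted in ascending order.
func bucketIndex(bounds []float64, upper float64) int {
	return sort.SearchFloat64s(bounds, upper)
}

// convertPositive (hypothetical helper) assigns the whole count of each
// positive exponential bucket to the explicit bucket selected by the
// exponential bucket's upper bound.
func convertPositive(scale, offset int, counts []uint64, bounds []float64) []uint64 {
	out := make([]uint64, len(bounds)+1) // the last slot is the overflow bucket
	for pos, c := range counts {
		index := offset + pos
		out[bucketIndex(bounds, upperBound(scale, index))] += c
	}
	return out
}

func main() {
	// The example from the comment: an upper bound of 11 lands on the boundary 20.
	fmt.Println(bucketIndex([]float64{0, 10, 20, 30}, 11)) // 2

	bounds := []float64{10, 20, 30, 40, 50, 60, 70, 80, 90, 100}
	fmt.Println(convertPositive(0, 5, []uint64{10, 22, 12}, bounds))
	// [0 0 0 0 0 0 10 0 0 0 34]
}
```

With this upper-bound rule, every value in an exponential bucket is treated as if it were the bucket's maximum, so counts shift toward higher explicit buckets; the total count is preserved.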

@daidokoro
Contributor Author

daidokoro commented Jul 4, 2024

/label processor/transform needs-triage

Hey @kentquirk,

I've done the following:

  • Added a few more test cases to the PR. If you believe more should be added, could you recommend additional cases?
  • Added a description of the function to the README as well as a warning as you've described.
  • Tweaked the algorithm used to convert exponential histograms and added comments in the code describing how it is done.

The PR has been set to active from draft.

Let me know if anything else is required.

Thanks.

Contributor

github-actions bot commented Jul 4, 2024

Pinging code owners for processor/transform: @TylerHelmuth @kentquirk @bogdandrutu @evan-bradley. See Adding Labels via Comments if you do not have permissions to add labels yourself.

Contributor

github-actions bot commented Sep 3, 2024

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Sep 3, 2024
@kentquirk
Member

This is not stale, just waiting for the PR to be approved and merged.

@kentquirk
Member

/label -Stale

@oertl

oertl commented Sep 3, 2024

Maybe helpful:

In DynaHist, we have implemented a generic mapping between histograms with different bucket layouts by first considering the reconstruction of the value. We have implemented 4 strategies (lower, upper, midpoint, and uniform) there. We don't have any random reconstruction, as we require reproducibility. We first create an object describing the reconstructed values in ascending order depending on the chosen strategy. This can then be used to efficiently insert the reconstructed values (without explicitly calculating all of them) into the other histogram with a different layout. In this way, we could nicely decouple the reconstruction of values from the insertion code (see https://github.com/dynatrace-oss/dynahist/blob/94608772e16a1dbaed4594eeb38eaa240a89fbce/src/main/java/com/dynatrace/dynahist/AbstractMutableHistogram.java#L120).

The reconstruction strategy can also be used to specify quantile estimation. In DynaHist, quantile estimation is defined as a combination of a reconstruction strategy and a sample quantile estimation method (see https://github.com/dynatrace-oss/dynahist/blob/94608772e16a1dbaed4594eeb38eaa240a89fbce/src/main/java/com/dynatrace/dynahist/AbstractHistogram.java#L233). This enables better reproducibility of the reported quantiles as quantile estimation based on individual values (not to mention histograms) is already ambiguous enough (cf. https://en.wikipedia.org/wiki/Quantile#Estimating_quantiles_from_a_sample).
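To make the idea of decoupling value reconstruction from insertion concrete, here is a rough Go sketch. It is not DynaHist's implementation and not the PR's code: the functions `reconstruct` and `rebucket` are hypothetical, and unlike the approach described above, the values are materialized explicitly for clarity. The "uniform" strategy is assumed here to place values at the centers of equal-width slices of the bucket.

```go
package main

import (
	"fmt"
	"sort"
)

// reconstruct (hypothetical) returns count representative values for a
// source bucket (lower, upper] under the named strategy.
func reconstruct(strategy string, lower, upper float64, count int) ([]float64, error) {
	vals := make([]float64, count)
	for i := range vals {
		switch strategy {
		case "lower":
			vals[i] = lower
		case "upper":
			vals[i] = upper
		case "midpoint":
			vals[i] = (lower + upper) / 2
		case "uniform":
			// Spread the values evenly: the i-th value sits at the center
			// of the i-th of count equal-width slices of the bucket.
			vals[i] = lower + (upper-lower)*(float64(i)+0.5)/float64(count)
		default:
			return nil, fmt.Errorf("unknown strategy %q", strategy)
		}
	}
	return vals, nil
}

// rebucket (hypothetical) inserts the values into a histogram with the given
// ascending explicit bounds; the last slot is the overflow bucket.
func rebucket(bounds, vals []float64) []uint64 {
	out := make([]uint64, len(bounds)+1)
	for _, v := range vals {
		out[sort.SearchFloat64s(bounds, v)]++
	}
	return out
}

func main() {
	bounds := []float64{40, 50, 60}
	for _, s := range []string{"lower", "upper", "midpoint", "uniform"} {
		vals, err := reconstruct(s, 32, 64, 4)
		if err != nil {
			panic(err)
		}
		fmt.Println(s, rebucket(bounds, vals))
	}
	// lower [4 0 0 0]
	// upper [0 0 0 4]
	// midpoint [0 4 0 0]
	// uniform [1 1 2 0]
}
```

Because the insertion step only sees reconstructed values, a new strategy can be added without touching `rebucket`, and every strategy here is deterministic, which keeps results reproducible.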

@crobert-1 crobert-1 removed the Stale label Sep 3, 2024
TylerHelmuth added a commit that referenced this issue Sep 21, 2024
… Histo --> Histogram (#33824)

## Description

This PR adds a custom metric function to the transformprocessor to
convert exponential histograms to explicit histograms.

Link to tracking issue: Resolves #33827

**Function Name**
```
convert_exponential_histogram_to_explicit_histogram
```

**Arguments:**

- `distribution` (_upper, midpoint, uniform, random_)
- `ExplicitBoundaries: []float64`

**Usage example:**

```yaml
processors:
  transform:
    error_mode: propagate
    metric_statements:
    - context: metric
      statements:
        - convert_exponential_histogram_to_explicit_histogram("random", [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0]) 
```

**Converts:**

```
Resource SchemaURL: 
ScopeMetrics #0
ScopeMetrics SchemaURL: 
InstrumentationScope  
Metric #0
Descriptor:
     -> Name: response_time
     -> Description: 
     -> Unit: 
     -> DataType: ExponentialHistogram
     -> AggregationTemporality: Delta
ExponentialHistogramDataPoints #0
Data point attributes:
     -> metric_type: Str(timing)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2024-07-31 09:35:25.212037 +0000 UTC
Count: 44
Sum: 999.000000
Min: 40.000000
Max: 245.000000
Bucket (32.000000, 64.000000], Count: 10
Bucket (64.000000, 128.000000], Count: 22
Bucket (128.000000, 256.000000], Count: 12
        {"kind": "exporter", "data_type": "metrics", "name": "debug"}
```

**To:**

```
Resource SchemaURL: 
ScopeMetrics #0
ScopeMetrics SchemaURL: 
InstrumentationScope  
Metric #0
Descriptor:
     -> Name: response_time
     -> Description: 
     -> Unit: 
     -> DataType: Histogram
     -> AggregationTemporality: Delta
HistogramDataPoints #0
Data point attributes:
     -> metric_type: Str(timing)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2024-07-30 21:37:07.830902 +0000 UTC
Count: 44
Sum: 999.000000
Min: 40.000000
Max: 245.000000
ExplicitBounds #0: 10.000000
ExplicitBounds #1: 20.000000
ExplicitBounds #2: 30.000000
ExplicitBounds #3: 40.000000
ExplicitBounds #4: 50.000000
ExplicitBounds #5: 60.000000
ExplicitBounds #6: 70.000000
ExplicitBounds #7: 80.000000
ExplicitBounds #8: 90.000000
ExplicitBounds #9: 100.000000
Buckets #0, Count: 0
Buckets #1, Count: 0
Buckets #2, Count: 0
Buckets #3, Count: 2
Buckets #4, Count: 5
Buckets #5, Count: 0
Buckets #6, Count: 3
Buckets #7, Count: 7
Buckets #8, Count: 2
Buckets #9, Count: 4
Buckets #10, Count: 21
        {"kind": "exporter", "data_type": "metrics", "name": "debug"}
```

### Testing

- Several unit tests have been created. We have also tested by ingesting
and converting exponential histograms from the `statsdreceiver` as well
as directly via the `otlpreceiver` over grpc over several hours with a
large amount of data.

- We have clients that have been running this solution in production for
a number of weeks.

### Readme description:

### convert_exponential_hist_to_explicit_hist

`convert_exponential_hist_to_explicit_hist([ExplicitBounds])`

The `convert_exponential_hist_to_explicit_hist` function converts an
ExponentialHistogram to an Explicit (_normal_) Histogram.

`ExplicitBounds` represents the list of bucket boundaries for the new
histogram. This argument is __required__ and __cannot be empty__.

__WARNING:__

The process of converting an ExponentialHistogram to an Explicit
Histogram is not perfect and may result in a loss of precision. It is
important to define an appropriate set of bucket boundaries to minimize
this loss. For example, selecting Boundaries that are too high or too
low may result in histogram buckets that are too wide or too narrow,
respectively.

---------

Co-authored-by: Kent Quirk <kentquirk@gmail.com>
Co-authored-by: Tyler Helmuth <12352919+TylerHelmuth@users.noreply.github.com>
jriguera pushed a commit to springernature/opentelemetry-collector-contrib that referenced this issue Oct 4, 2024
… Histo --> Histogram (open-telemetry#33824)
