New component: statefulaggregator #31097

Closed
2 tasks
djluck opened this issue Feb 7, 2024 · 7 comments
djluck commented Feb 7, 2024

The purpose and use-cases of the new component

A known restriction of the metricstransformprocessor is that label aggregation happens only within a single batch of metrics, and each batch is submitted by a single instance of a service. This means that it's impossible to aggregate over the service.instance.id label!

Why is this useful? Aggregating away the instance label allows us to avoid cardinality explosions. In large-scale deployments where a service can run thousands of instances, it's expensive to store per-instance metrics. This is especially true if the metric has additional labels that have a large set of values.

While keeping per-instance metrics is often useful (e.g. CPU, memory, disk), there are times when the per-instance breakdown is not helpful (e.g. user adoption metrics, business KPI metrics), and so storing this information is redundant. This component will allow users to aggregate out the instance label and control the cost of expensive metrics.

Example configuration for the component

processors:
  statefulaggregator:
    metrics:
      # Support regex matching
      - match: my_metric.*
        aggregate_labels: [ service.instance.id ]
        # Controls how frequently the resulting aggregated metric is published
        interval: 30s
        # TODO: enable different types of aggregation in the future
        # aggregation: sum

And given these two separate batches of metrics:

1.

my_metric{service.instance.id="inst_1", value="a"} 1
my_metric{service.instance.id="inst_1", value="b"} 3

2.

my_metric{service.instance.id="inst_2", value="a"} 2
my_metric{service.instance.id="inst_2", value="b"} 4

It would produce the following values:

my_metric{value="a"} 3
my_metric{value="b"} 7
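
To make the intended behaviour concrete, here is a minimal Python sketch of the idea. It is not the prototype mentioned in this issue. The class name `StatefulAggregator` and its `consume` and `flush` methods are hypothetical, values are assumed to be summed (the only aggregation the example config implies), and the regex `match` and the `interval` timer are left out: `flush` stands in for whatever would run once per interval.

```python
from collections import defaultdict


class StatefulAggregator:
    """Hypothetical sketch: sums points across batches after dropping labels."""

    def __init__(self, aggregate_labels):
        # Labels to aggregate away, e.g. ["service.instance.id"].
        self.aggregate_labels = set(aggregate_labels)
        # Running sums keyed by (metric name, remaining labels).
        self._sums = defaultdict(int)

    def consume(self, batch):
        """Accumulate one batch of (name, labels, value) tuples."""
        for name, labels, value in batch:
            kept = tuple(sorted(
                (k, v) for k, v in labels.items()
                if k not in self.aggregate_labels
            ))
            self._sums[(name, kept)] += value

    def flush(self):
        """Return the aggregated sums and reset state (one call per interval)."""
        result = dict(self._sums)
        self._sums.clear()
        return result


# The four data points from the example, grouped by the instance that sent them.
batch_1 = [
    ("my_metric", {"service.instance.id": "inst_1", "value": "a"}, 1),
    ("my_metric", {"service.instance.id": "inst_1", "value": "b"}, 3),
]
batch_2 = [
    ("my_metric", {"service.instance.id": "inst_2", "value": "a"}, 2),
    ("my_metric", {"service.instance.id": "inst_2", "value": "b"}, 4),
]

aggregator = StatefulAggregator(["service.instance.id"])
aggregator.consume(batch_1)
aggregator.consume(batch_2)
for (name, labels), total in sorted(aggregator.flush().items()):
    print(name, dict(labels), total)
```

The state has to outlive a single batch, which is the part a per-batch processor cannot do. Running the sketch prints totals of 3 and 7 for value="a" and value="b", matching the values above.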

Telemetry data types supported

Metrics

Is this a vendor-specific component?

  • This is a vendor-specific component
  • If this is a vendor-specific component, I am proposing to contribute and support it as a representative of the vendor.

Code Owner(s)

No response

Sponsor (optional)

No response

Additional context

I have prototyped a solution that has shown promise. This issue is to understand whether the OpenTelemetry project would be keen to adopt it. The code is rough and needs work, but I would be happy to drive the development of this feature.


atoulme commented Apr 5, 2024

OK I'm game. I'll sponsor this.

atoulme added the Accepted Component label and removed the Sponsor Needed and needs triage labels Apr 5, 2024

github-actions bot commented Jun 5, 2024

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

github-actions bot added the Stale label Jun 5, 2024

strue36 commented Jun 5, 2024

What needs to happen next with this? Does @djluck have a prototype that can be used as a starting point?

github-actions bot removed the Stale label Jun 6, 2024

github-actions bot commented Aug 6, 2024

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

github-actions bot added the Stale label Aug 6, 2024

djluck commented Aug 6, 2024

Hey, I'm unfortunately very short on time over the next month. Hopefully I'll be able to clean up and contribute the prototype in September.

github-actions bot removed the Stale label Aug 7, 2024

github-actions bot commented Oct 7, 2024

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

github-actions bot added the Stale label Oct 7, 2024

github-actions bot commented Dec 6, 2024

This issue has been closed as inactive because it has been stale for 120 days with no activity.

github-actions bot closed this as not planned Dec 6, 2024