[Connector/Spanmetrics] Spanmetrics connector producing high datapoints ingestion rate to backend victoria metrics #35449

Open
VijayPatil872 opened this issue Sep 27, 2024 · 4 comments
Labels
bug Something isn't working connector/spanmetrics

Comments

@VijayPatil872

Component(s)

connector/spanmetrics

What happened?

Description

We use the spanmetrics connector to generate metrics from spans, and we observe that it produces a high volume of metrics. This results in a high datapoint ingestion rate on our VictoriaMetrics backend.

To test this, we compared it with the Tempo metrics generator. We fed the same data through both pipelines in parallel and ran the same query against each. For the same timestamp, the Tempo metrics generator returns 6706 series for "traces_spanmetrics_calls_total", whereas the spanmetrics connector returns 13363 series for "calls_total".

The datapoint ingestion rate into VictoriaMetrics differs even more: the Tempo metrics generator produces a mean of 7.18K, whereas the spanmetrics connector produces a mean of 53.6K. These figures are for a mean received span rate of 14.2K, at which the spanmetrics connector produces a mean metric point rate of 16K.

Are we missing some configuration, or is some internal configuration needed?
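
For reference, the figures above can be put side by side with a little arithmetic. This is only a sketch using the numbers quoted in this report; the variable names are made up for illustration, and the rates are kept in the same "K" units as reported.

```python
# Figures quoted in the description above (rates in thousands, as reported).
tempo_series = 6706            # series for traces_spanmetrics_calls_total
spanmetrics_series = 13363     # series for calls_total
tempo_ingest_k = 7.18          # mean datapoint ingestion rate, Tempo metrics generator
spanmetrics_ingest_k = 53.6    # mean datapoint ingestion rate, spanmetrics connector
span_rate_k = 14.2             # mean received span rate

# How much larger the spanmetrics side is for each measurement.
series_ratio = spanmetrics_series / tempo_series
ingest_ratio = spanmetrics_ingest_k / tempo_ingest_k

print(f"series ratio:    {series_ratio:.2f}x")
print(f"ingestion ratio: {ingest_ratio:.2f}x")
print(f"datapoints per span (spanmetrics): {spanmetrics_ingest_k / span_rate_k:.2f}")
print(f"datapoints per span (tempo):       {tempo_ingest_k / span_rate_k:.2f}")
```

The series count differs by roughly 2x while ingestion differs by roughly 7.5x, so the "calls_total" series count alone does not account for the gap. Note that the series comparison covers a single metric, whereas the ingestion rate presumably covers everything each pipeline writes.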

Steps to Reproduce

Expected Result

The metrics generated by the spanmetrics connector should match those of the Tempo metrics generator, or at least be close to what the Tempo metrics generator produces with the same setup.

Actual Result

Span rate received
image

Metric points rate
image

Datapoints ingestion rate to victoria metrics
image

Collector version

0.104.0

Environment information

No response

OpenTelemetry Collector configuration

mode: "statefulset"
config:         
  exporters:
    otlphttp/spanmetrics:
      endpoint: 
      compression: gzip
      encoding: proto
      timeout: 30s
      tls:
        insecure: true

    otlphttp/servicegraph:
      endpoint: 
      compression: gzip
      encoding: proto
      timeout: 30s
      tls:
        insecure: true
  
  extensions:
    health_check:
      endpoint: ${env:MY_POD_IP}:***********

  connectors:
    spanmetrics:
      histogram:
        explicit:
          buckets: [10ms, 100ms, 250ms]
      aggregation_temporality: "AGGREGATION_TEMPORALITY_CUMULATIVE"
      metrics_flush_interval: 15s
      metrics_expiration: 5m
      dimensions_cache_size: 200 
      exemplars:
        enabled: true
        max_per_data_point: 5
      dimensions:
        - name: http.method
        - name: http.status_code
      events:
        enabled: true
        dimensions:
          - name: exception.type
      resource_metrics_key_attributes:
        - service.name
        - telemetry.sdk.language
        - telemetry.sdk.name
    servicegraph:
      latency_histogram_buckets: [100ms, 500ms, 1s, 5s, 10s]
      store:
        ttl: 2s
        max_items: 10

  processors:
    batch: {}

  receivers:
    otlp:
      protocols:
        http:
          endpoint: ${env:MY_POD_IP}:******
        grpc:
          endpoint: ${env:MY_POD_IP}:******
  service:
    extensions:
      - health_check

    pipelines:
      traces/connector-pipeline:
        exporters:
          - spanmetrics
          - servicegraph
        processors:
          - batch       
        receivers:
          - otlp    
      
      metrics/spanmetrics:
        exporters:
          - debug
          - otlphttp/spanmetrics
        processors:
          - batch
        receivers:
          - spanmetrics

      metrics/servicegraph:
        exporters:
          - debug
          - otlphttp/servicegraph
        processors:
          - batch
        receivers:
          - servicegraph        
                 
    telemetry:
      metrics:
        address: ${env:MY_POD_IP}:******

Log output

No response

Additional context

No response

@VijayPatil872 VijayPatil872 added bug Something isn't working needs triage New item requiring triage labels Sep 27, 2024
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@Frapschen
Contributor

@VijayPatil872 You can check the calls_total label set, maybe some labels have high cardinality.
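
One way to do that check, as a rough sketch: export the label sets of the calls_total series (for example from the backend's Prometheus-compatible series API, if it is available in your setup) and count the distinct values per label. The helper name and the sample data below are hypothetical and only illustrate the idea.

```python
from collections import defaultdict


def label_cardinality(series):
    """Return (label_name, distinct_value_count) pairs, highest count first."""
    values = defaultdict(set)
    for labels in series:
        for name, value in labels.items():
            values[name].add(value)
    counts = {name: len(vals) for name, vals in values.items()}
    # Sort by count descending, then by name so the order is stable.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical label sets, one dict per series.
sample_series = [
    {"__name__": "calls_total", "service_name": "checkout",
     "span_name": "GET /cart/1", "http_method": "GET", "http_status_code": "200"},
    {"__name__": "calls_total", "service_name": "checkout",
     "span_name": "GET /cart/2", "http_method": "GET", "http_status_code": "200"},
    {"__name__": "calls_total", "service_name": "checkout",
     "span_name": "GET /cart/3", "http_method": "GET", "http_status_code": "500"},
    {"__name__": "calls_total", "service_name": "payments",
     "span_name": "POST /charge", "http_method": "POST", "http_status_code": "200"},
]

for name, count in label_cardinality(sample_series):
    print(f"{name}: {count} distinct values")
```

In this made-up sample, span_name stands out because IDs are embedded in the span names. Labels whose distinct-value count keeps growing with traffic are the usual suspects.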

@atoulme atoulme removed the needs triage New item requiring triage label Oct 12, 2024
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Dec 12, 2024
@portertech
Contributor

To add to what @Frapschen already mentioned, I suspect it's an 🍎 to 🍊 comparison due to the configuration of dimensions and resource attributes. I would also check to make sure tempo (and the metric generator) isn't dropping traces which could reduce counts etc.
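
A quick way to see whether the two sides are even comparable is to diff the label names each metric carries. The sketch below assumes you have exported the label sets for both metrics. The function names are hypothetical, the spanmetrics sample mirrors the dimensions and resource attributes in the config above, and the Tempo sample is only a guess at what a default setup might emit.

```python
def label_keys(series):
    """Collect every label name that appears on at least one series."""
    keys = set()
    for labels in series:
        keys.update(labels)
    return keys


def compare_label_keys(left, right):
    """Report which label names exist on only one side, and which are shared."""
    left_keys, right_keys = label_keys(left), label_keys(right)
    return {
        "only_left": sorted(left_keys - right_keys),
        "only_right": sorted(right_keys - left_keys),
        "shared": sorted(left_keys & right_keys),
    }


# Hypothetical samples; real data would come from the metrics backend.
spanmetrics_sample = [
    {"service_name": "checkout", "span_name": "GET /cart", "span_kind": "SPAN_KIND_SERVER",
     "status_code": "STATUS_CODE_UNSET", "http_method": "GET", "http_status_code": "200",
     "telemetry_sdk_language": "java", "telemetry_sdk_name": "opentelemetry"},
]
tempo_sample = [
    {"service": "checkout", "span_name": "GET /cart", "span_kind": "SPAN_KIND_SERVER",
     "status_code": "STATUS_CODE_UNSET"},
]

diff = compare_label_keys(spanmetrics_sample, tempo_sample)
for side, names in diff.items():
    print(f"{side}: {names}")
```

Any label that appears on only one side splits series on that side alone, so the series counts should only be expected to line up once both generators are configured with the same dimensions.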

@github-actions github-actions bot removed the Stale label Jan 15, 2025