
[Datadog] Payload too large #1925

Closed

cemo opened this issue Jan 2, 2021 · 19 comments

Comments

@cemo

cemo commented Jan 2, 2021

Describe the bug
We are hitting "Payload too large" errors.

What did you see instead?

2021-01-02T19:46:41.072Z	info	datadogexporter/traces_exporter.go:155	failed to send traces	{"component_kind": "exporter", "component_type": "datadog", "component_name": "datadog", "error": "failed to send trace payload to trace edge: Post \"https://trace.agent.datadoghq.eu/api/v0.2/traces\": write tcp 192.168.234.159:34974->35.241.39.98:443: write: broken pipe"}
2021-01-02T19:46:43.041Z	info	datadogexporter/traces_exporter.go:155	failed to send traces	{"component_kind": "exporter", "component_type": "datadog", "component_name": "datadog", "error": "failed to send trace payload to trace edge: request to https://trace.agent.datadoghq.eu/api/v0.2/traces responded with 413 Payload too large"}
2021-01-02T19:46:45.027Z	info	datadogexporter/traces_exporter.go:155	failed to send traces	{"component_kind": "exporter", "component_type": "datadog", "component_name": "datadog", "error": "failed to send trace payload to trace edge: request to https://trace.agent.datadoghq.eu/api/v0.2/traces responded with 413 Payload too large"}
2021-01-02T19:46:47.602Z	info	datadogexporter/traces_exporter.go:155	failed to send traces	{"component_kind": "exporter", "component_type": "datadog", "component_name": "datadog", "error": "failed to send trace payload to trace edge: request to https://trace.agent.datadoghq.eu/api/v0.2/traces responded with 413 Payload too large"}

What version did you use?
0.17.0

We keep hitting these errors. How can we configure the exporter to avoid them?

@cemo cemo added the bug Something isn't working label Jan 2, 2021
@ericmustin
Contributor

@cemo 👋 hello there again, thanks for creating this issue. Would you be able to provide a copy of your configuration YAML? I believe the batch processor can be configured via its send_batch_size and send_batch_max_size options and added to your trace pipeline to prevent payload size issues, although in the long term we'd like to add payload splitting to the exporter. The API payload size limit is 10 MB, I believe, so it may be necessary to reduce send_batch_size from its default (8192).
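
A minimal collector configuration along these lines might look like the following sketch (the numeric values and the API key placeholder are illustrative assumptions, not Datadog-recommended limits):

```
receivers:
  otlp:
    protocols:
      grpc:

processors:
  batch:
    # Batches are capped by span count, not bytes; lower these if the
    # exporter still reports 413 Payload too large.
    send_batch_size: 1000
    send_batch_max_size: 2000

exporters:
  datadog:
    api:
      key: ${DD_API_KEY}
      site: datadoghq.eu   # matches the trace.agent.datadoghq.eu endpoint in the logs above

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [datadog]
```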

@cemo
Author

cemo commented Jan 5, 2021

We reduced the payload size with the batch processor and it is working now. However, if the payload limit is 10 MB, why was the default send_batch_size of 8 MB causing errors?

@ericmustin
Contributor

@cemo I believe the send_batch_size default is based merely on span count (8192 spans), not on payload size: https://github.com/open-telemetry/opentelemetry-collector/blob/8b82e953a3140f79ab14758c46f48a8f11eb3525/processor/batchprocessor/batch_processor.go#L138.

Long term we're looking to improve this within the Datadog exporter.

@mx-psi
Member

mx-psi commented Jan 7, 2021

Hi, generally the same comments as in #1909 (comment) apply here. Traces issues are (for the most part) handled by Eric.

@ericmustin
Contributor

I think for this specific user the issue is solved, but it will continue to pop up for future users until addressed, and the above mitigation is imprecise. Let's keep this open until the work for

long term we'd like to add payload splitting to the exporter

is complete.

@grzn
Contributor

grzn commented Aug 24, 2021

We encountered this issue as well. With the default send_batch_size/send_batch_max_size of 8192/0 we're hitting this error; with values of 256/512 everything works, but we're not sure whether these are the optimal values for Datadog.

What are the recommended / maximum values that Datadog supports?

@ericmustin
Contributor

@grzn I think at this time the recommendation is: because send_batch_size is based on span count while the Datadog API intake limit is based on payload size, you will need to tune these settings for your environment and the size of the spans emitted by your instrumentation. There is some roadmap work to split payloads automatically, and we'll update this issue when it is being worked on, but there is nothing to share at the moment; tuning the batch settings to fit your specific span sizes is the best course of action.
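
As a rough sketch only, a batch configuration using the values grzn reports above could look like this; the per-span size in the comment is an assumption, and the right numbers depend on your own spans:

```
processors:
  batch:
    # send_batch_size/send_batch_max_size count spans, not bytes. Assuming
    # roughly 5 KB per serialized span, a max batch of 512 spans is about
    # 2.5 MB per request; adjust for your own average span size.
    send_batch_size: 256
    send_batch_max_size: 512
```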

@etiennejournet

Hello!

Could you include in the error message the number of events/metrics the exporter tried to send? That would make fine-tuning easier ;)

Thanks,

@github-actions
Contributor

github-actions bot commented Nov 7, 2022

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

@gvangeel

gvangeel commented Dec 3, 2022

I'm still hitting this issue. Is a fix in the making? The reconfiguration suggested in #1925 (comment) is not helping in my case.

@gvangeel

gvangeel commented Dec 3, 2022

After some quick version testing, I found that the bug got introduced in the 0.64.0 version of the collector. Version 0.63.1 seems to work just fine.

@mx-psi
Member

mx-psi commented Dec 5, 2022

After some quick version testing, I found that the bug got introduced in the 0.64.0 version of the collector. Version 0.63.1 seems to work just fine.

@gvangeel could you open a separate issue for this? Please also include the signal type you are experiencing this on (metrics, logs, traces) in the new issue. Thanks!

@ThijSlim

Datadog has documented this:
https://www.datadoghq.com/blog/opentelemetry-logs-datadog-exporter

The max payload size is 3.2 MB, and with appropriate batch settings this limit should never be reached.
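
A hedged sketch of a logs pipeline with the batch processor configured with that limit in mind (the values and API key placeholder are assumptions, not official Datadog guidance):

```
receivers:
  otlp:
    protocols:
      http:

processors:
  batch:
    # send_batch_size counts log records, not bytes; keep batches small
    # enough that the serialized payload stays under ~3.2 MB.
    send_batch_size: 100
    send_batch_max_size: 1000
    timeout: 10s

exporters:
  datadog:
    api:
      key: ${DD_API_KEY}

service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [datadog]
```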

@p-csrni

p-csrni commented Mar 3, 2023

We started using otelcol-contrib 0.72.0 (in collector-gateway mode) to ingest metrics into Datadog, and we are getting the errors below with a minimal number of metrics.

error exporterhelper/queued_retry.go:367 Exporting failed. Try enabling retry_on_failure config option to retry on retryable errors {"kind": "exporter", "data_type": "metrics", "name": "datadog", "error": "Permanent error: 413 Payload too large"}
clientutil/retrier.go:83 Request failed. Will retry the request after interval. {"kind": "exporter", "data_type": "metrics", "name": "datadog", "error": "502 Bad Gateway", "interval": "4.607845748s"}

```
processors:
  memory_limiter:
    check_interval: 10s
    limit_percentage: 70
    spike_limit_percentage: 10
  batch:
    send_batch_max_size: 1000
    send_batch_size: 100
    timeout: 10s

service:
  pipelines:
    metrics:
      receivers: [otlp, prometheus]
      processors: [memory_limiter, batch]
      exporters: [datadog]
```

@github-actions github-actions bot removed the Stale label Mar 14, 2023
@github-actions
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

@github-actions github-actions bot added the Stale label May 15, 2023
@github-actions
Contributor

This issue has been closed as inactive because it has been stale for 120 days with no activity.

@github-actions github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) Jul 14, 2023
@asweet-confluent

asweet-confluent commented Mar 27, 2024

@cemo I believe the send_batch_size default is based merely on span count (8192 spans), not on payload size: https://github.com/open-telemetry/opentelemetry-collector/blob/8b82e953a3140f79ab14758c46f48a8f11eb3525/processor/batchprocessor/batch_processor.go#L138.

Long term we're looking to improve this within the Datadog exporter.

@ericmustin @mx-psi Given the nature of the API limits (on both max message size and max decompressed message size), I'm guessing this work should be done directly in datadog-agent itself, essentially re-implementing this?

@ericmustin
Contributor

Hey @asweet-confluent, sorry to hear you're experiencing issues here. This is an older issue iirc, and unfortunately I'm no longer at Datadog nor am I involved with support for this component currently, so I can't provide answers here. @mx-psi may be able to chime in.

I would encourage you to supplement this with a ticket via your org's traditional support channels if you have not already done so, especially if you're referencing components outside the CNCF. Anecdotally, that will help you get eyes/resources on your issue fastest.

Also, in this way, all stakeholders will be aligned 😀

Hope that helps, cheers.

@mx-psi
Member

mx-psi commented Apr 1, 2024

Thanks Eric :) @asweet-confluent I would also suggest filing a ticket besides commenting here. There is some upstream work on open-telemetry/opentelemetry-collector/issues/8122 that we are following and may be able to leverage to address this, but it's still being tested.
