Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DataDog APM Trace Agent Receiver Support #1852

Closed
boostchicken opened this issue Dec 16, 2020 · 33 comments · Fixed by #5836
Closed

DataDog APM Trace Agent Receiver Support #1852

boostchicken opened this issue Dec 16, 2020 · 33 comments · Fixed by #5836
Labels
Accepted Component New component has been sponsored enhancement New feature or request

Comments

@boostchicken
Copy link
Member

boostchicken commented Dec 16, 2020

Is your feature request related to a problem? Please describe.
I need to intake traces from Datadog tracing libraries. This format is defined and implemented in the Datadog APM Trace Agent.

Describe the solution you'd like
Create a Datadog Receiver that translates the Datadog v0.3, v0.4, and v0.5 API traces to OpenTelemetry format that is suitable for any exporter

Describe alternatives you've considered
Asking Datadog to add OTEL support to their Client libraries

Additional context
Will be using Datadog Go library for API support. This makes serialization of their protocols easy.

EDIT: Updates for clarity of what I am asking for.

@ericmustin
Copy link
Contributor

👋 Hey @boostchicken, thanks for filing this. glad to see folks interested in OTEL. A few quick thoughts here, as we're running a bit light over the holidays but I didn't want to leave you hanging.

  • I'm an IC over at datadog but i'll try to speak from the position of datadog as an org here. Apologies in advance if I sound like a politician, not my intention. Anyway.

  • Datadog definitely understands the value of having interoperability between Vendor specific tracing clients (like Datadog's) and OpenTelemetry. We're currently discussing approaches here to make this easier for current ddog users, and it's more a question of timing (when can we get this shipped and what will have the greatest impact) than it is a question of posture (does datadog want it's existing users to use the most active open source tracing formats and have flexibility in choosing their backend? 💯 yes) . These approaches may include

    • OTEL Export from DD Clients
    • a Datadog Receiver in Otel-Collector-Contrib that converts traces emitted by a datadog client into otlp
    • an OTEL Receiver in the Datadog-Agent
  • We're discussing which of these approaches would be optimal here for the broadest number of users, and should have more concrete action items here ...soon?(very early Q12021 ideally). Of these options, I don't know if the one suggested by this FR ( DD-Client => Datadog-Agent => OTEL Collector) makes the most sense, since that would require users to run multiple agents, and the Agent has some tight coupling with Datadog as a vendor (valid api keys would be needed to run the agent, etc) that may make this approach impractical long term for vendor agnostic OpenTelemetry users.

  • I don't want to discourage you from contributing to this project, of course, and I think i'd be keen to hear more about your use case. I'm just not sure the above FR is something that Datadog as an organization would want to maintain in specifically the form you've described. That being said, is there a link to this project you mention publicly available? I think being able to leverage any pre-existing work here would help greatly in being able to contribute one of the above options mentioned, and any feedback you have on use case, preferred options of the above, etc, is definitely something I'd love to hear.

Anyway, hope that helps, I will try to update this issue when I have more here regarding next steps.

@boostchicken
Copy link
Member Author

boostchicken commented Dec 25, 2020

@ericmustin it's pretty critical our use case. Would it be valuable to reach out through my Account Rep to get this conversation started?

I don't want to go DD-client-> DD Agent-> OTEL. I would like to go DD-Client -> OTEL -> Any Supported exporter (DD exporter to your SaaS). The client speaks a variety of APIs (v0.3, v0.4, v0.5) so the OTEL receiver needs to support them all.

My timeline on this is Q1 2021 wrap (CY not FY), I have some of it started already but certainly would rather leverage something supported or get tips from you all directly.

So far I have....

  1. Created a new base Receiver on port 8126
  2. Ripped off your trace ingest API and have v0.3, 0.4, and 0.5 deserializing
  3. Converts to OTEL Format
  4. Sends to new consumer

The one big "problem" I have now is moving your 64bit TraceIds to 128bit. I have considered.....

  1. Padding with 0s. Based on decodeAPMId in the exporter this may get the same result?
  2. Just duplicating it with concatenation (64bit+64bit). This may have the same effect as above since it just throws away the first 16 bytes
  3. Doing a 128bit hash of the original TraceId

The only one I like is 1, if it works. If someone logged a Trace ID they should be able to find that same Trace ID in your SaaS. How are you handling this?

The other small problem is just the responses. I am probably just going to rip off more APM agent code to fake the format. But since the request isn't sent to the SaaS until it hits the exporter, I wont have a proper result. Right now I just output "OK"

Also, there are other API's in your Trace Agent. The Datadog client config only accepts one URL for the Trace Agent. From everything I can tell those API's are really not that important and certainly wont prevent Trace ingest if they are not present. If they are missing how does this impact the app though?

Other than that, not much is going to stop me from getting a very basic, unconfigurable setup going here once we are back from break. Certainly it will work for my requirements, it will not be ready for open source

I was intending on open sourcing it and contributing it back to the contrib repo here after polish, unit tets, and load tests. However, if you are already writing the code for this I don't feel the need to compete with you. It's just a question of timing. I may need to finish this in a couple weeks, then adopt the official library later.

Also, moving a little beyond this. I have found some of the DD libraries use the DD APM Trace Agent API (Java), while others connect directly to the SaaS like your exporter (AWS Lambda Layer). It seems we will also need to support that API for ingestion. For my current requirement, this is not as urgent and can come a few months later.

@boostchicken
Copy link
Member Author

https://github.com/boostchicken/opentelemetry-collector-contrib/tree/datadog-receiver

Feel free to poke around, it's super raw right now, the code needs lots of organization and tests, however this works. It will take DDAPM and handle it appropriately.

@boostchicken
Copy link
Member Author

boostchicken commented Dec 25, 2020

Also, note your pb/exportable pkg does not include the functions needed to deserialize v0.5 format requests. This led to me copy and pasting that code and slightly modifying it. Would be great to see an adjustment on your side with that.

@boostchicken
Copy link
Member Author

Also an other question, will all the trace.* metrics still get made if it is not going through an APM agent?

@ericmustin
Copy link
Contributor

@boostchicken thanks for the additional info, will try to get you full answers to everything when I’m back from holidays and back at laptop (am on mobile so apologies for any typos).

Thanks for sending over the repo, I’ll take a look. And also yes please do chat with ur rep here, having that additional context helps us as an org understand client needs and prioritize this work accordingly.

real quickly, some partial answers to above Q’s:

  1. Our otel-collector-contrib datadog exporter generates the trace.* metrics and sends them to our stats api intake endpoint. Shipping them dd-agent => otel-dd exporter may dupe metrics.
  2. I’m not 100% sure v0.5 is something we’ve GA’d for all clients, I have to sync internally on where that stands.
  3. Re 64bit 128bit, I think the first option is the suggested approach from the larger otel community, though I’ll have to take a look at what other 64bit receivers are doing here (jaeger/zipkin receivers would probably have similar logic)
  4. I think what you’re saying regarding preferred trace export flow makes sense I just don’t want to give a misleading answer here regarding what we plan to prioritize , so bear with me there as I try to get some consensus internally and follow up. What I can say is our timeline around additional work here is similar (shipped in q1 2021)

Hope that helps, feel free to reference this issue or myself on any comms with your rep. And merry Christmas and happy holidays as well.

cheers,
Eric

@boostchicken
Copy link
Member Author

boostchicken commented Dec 29, 2020

Eric,

Merry Christmas and Happy Holidays. Thank you for the response so fast. Your answers are extremely helpful. I will begin tackling this again in the 2nd week in Jan. I will be reaching out to my Datadog account rep also to start a more formal discussion, do you mind if I mention you by name and reference this issue?

Thanks!

John

@ericmustin
Copy link
Contributor

@boostchicken feel free to reference me and link to this issue as well.

@boostchicken
Copy link
Member Author

@ericmustin will do! I am going to reach out today as we are back at work. Thanks!

@andrewhsu andrewhsu added enhancement New feature or request and removed feature request labels Jan 6, 2021
dyladan referenced this issue in dynatrace-oss-contrib/opentelemetry-collector-contrib Jan 29, 2021
…from partitions (#1858)

Collect additional labels (fs.type, mount.mode, mount.point) in the filesystems scraper. Currently the receiver does not honor multiple mount points for a single device. This change will capture all mount points in such cases and also collects the mode (ro or rw). This PR also adds a label called fs.type to capture filesystem type.

In a followup PR, will add ability to filter by mount points as well.

Link to tracking Issue: #1852
@ericmustin
Copy link
Contributor

@boostchicken i apologize for the delay here. This fell through the cracks for me, as a few other more urgent issues arose this quarter. i'm sorry about the drop in communication though, that's just a mistake on my part.

Circling back, unfortunately at this time I don't believe we plan to prioritise contributing or maintaining a receiver in the opentelemetry-collector. I believe our next focus will be on an OTLP receiver in the datadog-agent, but I don't have a firm timeline on when that will be available.

@boostchicken
Copy link
Member Author

@ericmustin No worries! We ended up having to put this on the backburner, but now we are picking it up again. Would love some eyes on from any of your engineers if they can for a code-review

https://github.com/boostchicken/opentelemetry-collector-contrib/tree/datadog-receiver/receiver/datadogreceiver

The code is functional complete with all APIs, just needs unit-tests finished, integration testing, benchmarking and then I will be sending the PR

@mx-psi mx-psi assigned ericmustin and unassigned mx-psi May 14, 2021
@boostchicken
Copy link
Member Author

@ericmustin I changed the datadogexporter to treat span names correctly, also I updated the resourceName generation to work with your libraries

@ericmustin
Copy link
Contributor

ericmustin commented May 16, 2021

@boostchicken This is great work and I think it's a great start. It sounds like this is being used by your organization already in production?

  • I can find some time this week/sprint to review your branch and offer some feedback. I did a quick pass, seems like a reasonable implementation. I generally think, if this is working for you on your branch, while there's a few bits getting dropped in the span translation, it's fine, you can define that how you feel appropriate for your org, it's your data, etc. Some things like error metadata, and container info, probably are getting lost in translation right now. Like i said, I'll try to schedule some time and add some comments and suggestions where appropriate.

  • For an upstream pr, I can try to define and share a more formal mapping between trace payload formats, we have something internally so working on making that shareable. Also there would be details around the receiver component itself that I think an approver or maintainer on the collector-contrib repo would need to review. Per my last comment, while I don't think a trace receiver component is something datadog has immediate plans to maintain, I don't think that precludes you from getting code review and feedback from contrib maintainers, either by making a PR to the contrib repo or to the registry. Are you a member of the OpenTelemetry org/want to maintaining this component? I don't recall off the top of my head how that process works, a (draft?) pull request can be a good way to move those conversations forward though.

@boostchicken
Copy link
Member Author

boostchicken commented May 17, 2021

@ericmustin quick question, why do you always generate the span name in the datadogexporter, if one is provided why not just use it? I ask because let's say I point dd-trace-java at my reciever and have it going to a datadogexporter, the span name (which was perfectly final to begin with) is lost.

Also yes, I am willing to maintain this, also I am working on getting hooked in with the group.

@ericmustin
Copy link
Contributor

@boostchicken There are a few reasons, this thread is about a slightly different subject but has some overlap you may find relevant. That being said, I'm not 100% sure what the answer is because you're describing a currently unsupported use case and I haven't reviewed your fork (like i said, scheduling time for that 😄 ). At a high level, a datadog span operation name and an opentelemetry span name have differences in semantic meaning and use that makes them non interoperable, but your receiver and a datadog exporter could perhaps leverage some features within opentelemetry, like traceState or instrumentation Library Name, to convey some signals to a downstream exporter about where the span originated, and specialised handling if so.

Hope that helps.

@ericmustin
Copy link
Contributor

@boostchicken hey, just circling back as I still owe you a review here. Sorry for the delay 😓 , bandwidth has been stretched a bit thinner than i anticipated. That branch you linked to still good to leave comments on?

@ericmustin
Copy link
Contributor

following up here, apologies for not updating things earlier, the fork that's linked to here seems serviceable and i think this issue should get left open should folks want to make an upstream contribution to the -contrib repo. Otherwise I would say the best course of action is to reach out to your CSM or support@datadoghq.com if you have interest in this feature so folks here can prioritise appropriately.

@alolita alolita added comp:aws AWS components exporter/datadog Datadog components and removed comp:aws AWS components labels Sep 2, 2021
@ericmustin
Copy link
Contributor

@boostchicken Hey, I don't want to mislead you on next steps as I am no longer a Datadog employee, so I would need to defer to the Datadog team on this. I'm forwarding this to my former colleagues there and I'm sure they will be able to reply promptly. All the best.

@codeboten
Copy link
Contributor

@mx-psi is this something you could help with?

@mx-psi
Copy link
Member

mx-psi commented Oct 29, 2021

@codeboten thanks for the ping, I believe @boostchicken already opened a PR for this over at #5836; I will review once they sign the CLA

@boostchicken
Copy link
Member Author

@mx-psi I am working on the CLA. I was going to do individual but some of it was on company time and resources. FYI, I work over at Sony Interactive Entertainment (PlayStation). I finalized the request to our OSS board today and will keep you posted. If any more review comments come in the PR I will update them.

@mx-psi
Copy link
Member

mx-psi commented Nov 5, 2021

Thanks for letting me know @boostchicken! I will give a look to the PR in the near future then

@boostchicken
Copy link
Member Author

@mx-psi it's all done and all checks are passing, please give the PR some love and we can merge it in :)

@github-actions
Copy link
Contributor

github-actions bot commented Nov 7, 2022

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Nov 7, 2022
@moonheart
Copy link
Member

moonheart commented Jan 8, 2023

Sorry for interrupt. Since the original PR #5836 got closed by boostchicken, will this repo accept another PR that add Datadog receiver support?

@jpkrohling jpkrohling added the Sponsor Needed New component seeking sponsor label Jan 24, 2023
@jpkrohling
Copy link
Member

On behalf of the OpenTelemetry Squad at Grafana Labs, I would like to volunteer as a sponsor for this component. In addition to the component's author, the code owners (as sponsors) will be:

While we would very much prefer the affected vendor to support this receiver, we understand they might not be interested in translating their proprietary formats into OTLP, only the other way around.

We believe that the community wants this receiver, and if community members are willing to contribute code for this, we are willing to support the community.

@jpkrohling jpkrohling added Accepted Component New component has been sponsored and removed priority:p2 Medium spec:trace release:allowed-for-ga exporter/datadog Datadog components Sponsor Needed New component seeking sponsor labels Jan 25, 2023
@moonheart
Copy link
Member

I'm currently working on this component at https://github.com/moonheart/opentelemetry-collector-contrib/tree/dev-feat-datadog-receiver, most code is finished expect for some testing code and code that add this component to otel collector . If someone else is working on this too, please let me know, so we can work together.

@boostchicken
Copy link
Member Author

boostchicken commented Jan 26, 2023

@jpkrohling your post above inspired me to open my PR again, more details are over there.

@boostchicken
Copy link
Member Author

boostchicken commented Jan 26, 2023

#5836 (comment)

Let's do this one more time with feeling. I have re-opened my PR

@mvadu
Copy link

mvadu commented Feb 7, 2023

@boostchicken thank you for persevering through this PR and getting it merged. But looking at the PR the documentation link seems to be dead already since you deleted that branch. May be you can post the documentation linked in the original PR (https://github.com/boostchicken/opentelemetry-collector-contrib/blob/datadog-receiver/receiver/datadogreceiver/README.md) as a gist and link in the PR?

@boostchicken
Copy link
Member Author

The documentation is in the main branch now, jut goto the datadogreceiver folder and you should see it just fine https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/datadogreceiver/README.md

@mvadu
Copy link

mvadu commented Feb 7, 2023

@boostchicken ah, i didn't realize the readme was also in the PR! that makes sense.. thank you again. this PR got merged at the very best of times for us.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Accepted Component New component has been sponsored enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants