[datadogexporter] Improve hostname resolution #1285

Merged 13 commits on Oct 20, 2020

Member

@mx-psi mx-psi commented Oct 16, 2020

Description:

Improve system hostname detection for the Datadog exporter.
This PR:

  • Moves config and host code to their own packages to avoid dependency cycles
  • Adds hostname validation
  • Adds fully qualified domain name hostname resolution on some platforms
  • Adds support for caching hostname

Link to tracking Issue: n/a

Testing:
Added unit tests, tested on an end to end test environment with the component activated.

Documentation:
Documentation was added to all public functions.
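
For readers unfamiliar with what "hostname validation" typically involves, here is a minimal sketch. It is not the code from this PR: the `validHostname` name is borrowed from the review snippet further down, while the specific rules (non-empty, at most 255 characters, not `localhost`, RFC 1123 labels) and the `maxHostnameLength` and `rfc1123Label` names are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strings"
)

// maxHostnameLength is an assumed limit (255 characters, as in RFC 1123).
const maxHostnameLength = 255

// rfc1123Label matches one DNS label: 1 to 63 alphanumerics or hyphens,
// neither starting nor ending with a hyphen.
var rfc1123Label = regexp.MustCompile(`^[a-zA-Z0-9]([a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?$`)

// validHostname is a hypothetical validator; the rules are illustrative
// assumptions, not the checks implemented in the PR.
func validHostname(hostname string) error {
	switch {
	case hostname == "":
		return errors.New("hostname is empty")
	case len(hostname) > maxHostnameLength:
		return fmt.Errorf("hostname is longer than %d characters", maxHostnameLength)
	case strings.EqualFold(hostname, "localhost"):
		return errors.New("hostname is a local hostname")
	}
	// Allow a single trailing dot, as in a fully qualified domain name.
	for _, label := range strings.Split(strings.TrimSuffix(hostname, "."), ".") {
		if !rfc1123Label.MatchString(label) {
			return fmt.Errorf("hostname label %q is not RFC 1123 compliant", label)
		}
	}
	return nil
}

func main() {
	for _, h := range []string{"web-01.example.com", "", "localhost", "bad_host"} {
		fmt.Printf("%q -> %v\n", h, validHostname(h))
	}
}
```

Returning an `error` rather than a `bool` lets the caller log why a candidate hostname was rejected before falling back to another source.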

mx-psi force-pushed the improve-host-detection branch from 5095cc3 to a5fae93 on October 16, 2020 12:06
mx-psi marked this pull request as ready for review on October 16, 2020 12:22
mx-psi requested a review from a team on October 16, 2020 12:22
Member

@pmcollins pmcollins left a comment

Hi Pablo :-)

}

if err := validHostname(cfg.Hostname); err == nil {
    cache.SetNoExpire(cache.CanonicalHostnameKey, &cfg.Hostname)
Member

Does CanonicalHostnameKey always just refer to the host name specified in the config? If so, what do you think about naming it something like ConfiguredHostNameKey?

Member Author

@mx-psi mx-psi Oct 20, 2020

> Does CanonicalHostnameKey always just refer to the host name specified in the config?

Only if it is valid. If it is invalid (for example, it is unset), the canonical host name will be the one fetched from the system (either the FQDN or the OS hostname) and that will be the one saved in the cache (we always save to the cache before returning):

cache.SetNoExpire(cache.CanonicalHostnameKey, &hostname)
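
The precedence described above (cached value, then the configured hostname if valid, then the system hostname, always saving before returning) can be sketched as follows. This is a simplified illustration and not the PR's code: `hostCache`, `getHost`, the `system` callback, and the placeholder `validHostname` are hypothetical; only the idea of a non-expiring entry under a canonical-hostname key comes from the snippet above.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// canonicalHostnameKey plays the role of cache.CanonicalHostnameKey (name assumed).
const canonicalHostnameKey = "canonical_hostname"

// hostCache is a hypothetical stand-in for the exporter's cache package.
type hostCache struct {
	mu    sync.Mutex
	items map[string]string
}

func newHostCache() *hostCache { return &hostCache{items: map[string]string{}} }

func (c *hostCache) get(key string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	v, ok := c.items[key]
	return v, ok
}

// setNoExpire stores a value that is never evicted.
func (c *hostCache) setNoExpire(key, value string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.items[key] = value
}

// validHostname is a deliberately minimal placeholder check (assumption).
func validHostname(h string) error {
	if h == "" {
		return errors.New("hostname is empty")
	}
	return nil
}

// getHost returns the cached hostname if present; otherwise it prefers the
// configured hostname when valid, falls back to the system one, and caches
// whichever it picked before returning.
func getHost(c *hostCache, configured string, system func() string) string {
	if h, ok := c.get(canonicalHostnameKey); ok {
		return h
	}
	hostname := configured
	if err := validHostname(configured); err != nil {
		hostname = system()
	}
	c.setNoExpire(canonicalHostnameKey, hostname)
	return hostname
}

func main() {
	c := newHostCache()
	system := func() string { return "system-host.example.com" }
	fmt.Println(getHost(c, "", system))                // unset config: system hostname
	fmt.Println(getHost(c, "configured-host", system)) // cached value wins on later calls
}
```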

exporter/datadogexporter/metadata/system/host.go (outdated, resolved)
exporter/datadogexporter/metadata/host.go (outdated, resolved)
}

// Get system hostname
hostInfo := system.GetHostInfo(logger)
Member

I'm wondering if, since the current function, GetHost, is the top level function that returns the hostname to the caller, it could be made smaller, with a lot of the fallback logic encapsulated at a lower level, such as methods on HostInfo. Maybe GetHost could then just 1) get the cached value 2) if there's no value, get the hostname 3) put the hostname into the cache if necessary and 4) return it? The configured hostname could also be passed into the HostInfo so it could do the full fallback of configuredValue -> systemValue -> osValue? Just a thought.

Member Author

It makes sense to me to have a method on HostInfo that decides which hostname to return (I will change that), but I think passing the configured hostname in will become cumbersome in the future. The end goal is to have other metadata sources (like EC2 or GCE metadata) that are considered in a certain order (a much-simplified version of the corresponding code in our Datadog Agent), and I would prefer to avoid passing the configured hostname to each of them.

Member Author

I reorganized the code so that some of the fallback logic is handled in a HostInfo method:

// GetHostname returns the FQDN if it is a valid hostname
// and falls back to the OS hostname otherwise.
func (hi *HostInfo) GetHostname(logger *zap.Logger) string {
    if err := valid.ValidHostname(hi.FQDN); err != nil {
        logger.Info("FQDN is not valid", zap.Error(err))
        return hi.OS
    }
    return hi.FQDN
}

exporter/datadogexporter/utils/cache/cache.go (outdated, resolved)
Member Author

mx-psi commented Oct 20, 2020

Hi Pablo 😄 ! I tried to address all your comments, feel free to have a second look

Member

@pmcollins pmcollins left a comment

LGTM Pablo :-)

I do think the global cache could be made a field of the exporter and then passed in to GetHost just to make the dependency more explicit, but not a requirement for this PR from my perspective. 👍
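
To make that suggestion concrete, here is a rough sketch of a cache held as an exporter field and passed into the hostname lookup. Everything in it is hypothetical (`hostnameCache`, `mapCache`, the `exporter` fields, the key name) and it does not reflect the exporter's actual types; `mapCache` is also not safe for concurrent use.

```go
package main

import "fmt"

// hostnameCache is a hypothetical interface; in the PR the cache is a
// package-level global instead.
type hostnameCache interface {
	Get(key string) (string, bool)
	Set(key, value string)
}

// mapCache is a toy, non-thread-safe implementation used only for this sketch.
type mapCache map[string]string

func (m mapCache) Get(key string) (string, bool) {
	v, ok := m[key]
	return v, ok
}

func (m mapCache) Set(key, value string) { m[key] = value }

// exporter owns its cache, so the dependency is visible in the type.
type exporter struct {
	cache   hostnameCache
	resolve func() string // hypothetical hostname resolver
}

// getHost receives the cache explicitly instead of reaching for a global one.
func getHost(c hostnameCache, resolve func() string) string {
	const key = "canonical_hostname" // assumed key name
	if h, ok := c.Get(key); ok {
		return h
	}
	h := resolve()
	c.Set(key, h)
	return h
}

func (e *exporter) hostname() string { return getHost(e.cache, e.resolve) }

func main() {
	// Two exporters with separate caches do not see each other's entries.
	a := &exporter{cache: mapCache{}, resolve: func() string { return "host-a" }}
	b := &exporter{cache: mapCache{}, resolve: func() string { return "host-b" }}
	fmt.Println(a.hostname(), b.hostname())
}
```

The main practical difference from a global is isolation: tests or multiple exporter instances can each use their own cache without resetting shared state.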

Member Author

mx-psi commented Oct 20, 2020

Awesome, thanks!

> I do think the global cache could be made a field of the exporter and then passed in to GetHost just to make the dependency more explicit, but not a requirement for this PR from my perspective. 👍

Thanks for the feedback. I will consider changing that in a future PR. First I want to see the whole picture of where we use the cache once we add the metadata features that are still missing, so I think it is best to leave the cache as it is for now.

tigrannajaryan merged commit 83a77ab into open-telemetry:master on Oct 20, 2020
mx-psi deleted the improve-host-detection branch on October 21, 2020 07:41
kohrapha referenced this pull request in hdj630/opentelemetry-collector-contrib Oct 26, 2020
* Add API key validation (#1216)

Adds API key validation to the Datadog metrics exporter.
When created, the Datadog metrics exporter now sends a request to the `/api/v1/validate` endpoint of the Datadog backend to check that the configured API key is valid. If it is not, a warning log is emitted.

Tests were amended to take into account that validation call. Test utils were added to mock an HTTP server that performs validation.

* sapmexporter: make span source attribute and destination dimension names configurable (#1286)

If dimension names are being translated in the signalfxexporter then the map values
should be set to the signalfx names. Ideally we can sync to OT dimension names
with translation being done on the backend (the default).

* Update README (#1294)

* Release v0.13.0 (#1295)

* Remove duplicate definition of cloud providers with core conventions (#1288)

* Remove duplicate definition of cloud providers

Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>

* Fix more duplicate usage of the cloud providers semconv

Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>

* Splunkhec receiver metrics (#1276)

Adds the ability for Splunk HEC to ingest metrics.
This is a follow up to #1268 which adds the ability to ingest logs.

* Add jpkrohling as an approver (#1296)

* Remove pjanotti from maintainers (#1300)

* Auto assign approver and maintainers to PRs (#1301)

Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>

* Add codeowners to ensure components are assigned to the appropriate reviewers (#1304)

This is the initial list extracted from README.

* Moved the groupbytrace processor to contrib (#1179)

Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>

Co-authored-by: Bogdan Drutu <bogdandrutu@gmail.com>

* Add codeowners for internal components (#1307)

Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>

* Small fixes to CODEOWNERS (#1312)

Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>

**Description:** This PR changes the CODEOWNERS in a couple of aspects:

1. Fixed the order of the directories, so that 'internal' comes after 'extension'
1. Fixed the name of a few components
1. Added missing components and directories

Verified with:

```
for component in exporter extension processor receiver; 
do 
  ls ${component}/ -1 > /tmp/${component}.txt
  grep ${component} .github/CODEOWNERS | awk -F\/ '{print $2}' > /tmp/${component}-codeowners.txt
  diff /tmp/${component}.txt /tmp/${component}-codeowners.txt
done
```

Result of the script before this PR:

```diff
11d10
< loadbalancingexporter
2c2
< jmxmetricsextension
---
> jmxmetrics
1d0
< groupbytraceprocessor
5c4
< routingprocessor
---
> routing
```

* Update collector version in groupbytraceprocessor (#1309)

Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>

* Update dependabot to ensure all projects are added (#1303)

* Update dependabot to ensure all projects are added

Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>

* Update dependabot.yml

* Do not run tests/lint/etc for all component tags (e.g. tag testbed/v0.13.0) (#1298)

Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>

* tests: increase TestTrace10kSPS memory limits (#1314)

* Bump k8s.io/client-go in /receiver/k8sclusterreceiver (#1323)

Bumps [k8s.io/client-go](https://github.com/kubernetes/client-go) from 0.19.2 to 0.19.3.
- [Release notes](https://github.com/kubernetes/client-go/releases)
- [Changelog](https://github.com/kubernetes/client-go/blob/master/CHANGELOG.md)
- [Commits](kubernetes/client-go@v0.19.2...v0.19.3)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump github.com/aliyun/aliyun-log-go-sdk (#1321)

Bumps [github.com/aliyun/aliyun-log-go-sdk](https://github.com/aliyun/aliyun-log-go-sdk) from 0.1.13 to 0.1.14.
- [Release notes](https://github.com/aliyun/aliyun-log-go-sdk/releases)
- [Commits](aliyun/aliyun-log-go-sdk@v0.1.13...v0.1.14)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump go.opencensus.io in /processor/groupbytraceprocessor (#1320)

Bumps [go.opencensus.io](https://github.com/census-instrumentation/opencensus-go) from 0.22.4 to 0.22.5.
- [Release notes](https://github.com/census-instrumentation/opencensus-go/releases)
- [Commits](census-instrumentation/opencensus-go@v0.22.4...v0.22.5)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump k8s.io/client-go from 0.19.2 to 0.19.3 in /internal/k8sconfig (#1318)

Bumps [k8s.io/client-go](https://github.com/kubernetes/client-go) from 0.19.2 to 0.19.3.
- [Release notes](https://github.com/kubernetes/client-go/releases)
- [Changelog](https://github.com/kubernetes/client-go/blob/master/CHANGELOG.md)
- [Commits](kubernetes/client-go@v0.19.2...v0.19.3)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Add more @elastic folks to codeowners (#1313)

In answer to open-telemetry#1304 (comment)
add two more codeowners for the Elastic exporter.

* Add contrib approvers as owners to all the components. (#1325)

Without this change, if there is a listed owner with write permission in the
component owners list, the contrib approvers will lose their power; see #1316.

Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>

* Bump github.com/aws/aws-sdk-go in /internal/awsxray (#1316)

Bumps [github.com/aws/aws-sdk-go](https://github.com/aws/aws-sdk-go) from 1.35.9 to 1.35.10.
- [Release notes](https://github.com/aws/aws-sdk-go/releases)
- [Changelog](https://github.com/aws/aws-sdk-go/blob/master/CHANGELOG.md)
- [Commits](aws/aws-sdk-go@v1.35.9...v1.35.10)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump github.com/aws/aws-sdk-go in /internal/awsxray/testdata/sampleapp (#1317)

Bumps [github.com/aws/aws-sdk-go](https://github.com/aws/aws-sdk-go) from 1.35.9 to 1.35.10.
- [Release notes](https://github.com/aws/aws-sdk-go/releases)
- [Changelog](https://github.com/aws/aws-sdk-go/blob/master/CHANGELOG.md)
- [Commits](aws/aws-sdk-go@v1.35.9...v1.35.10)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Clarify PR reviewing and facilitating (#1315)

We recently introduced the automatic assignments of PRs to reviewers
and to facilitators. This change explains the process.

* Handle nil references from the kubelet API (#1326)

* Update to latest collector, update deprecated calls (#1308)

Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>

* signalfx Receiver: Better Pipeline Error Handling (#1329)

If logs aren't configured and events are sent, return a clear error response
instead of panicking. Vice versa for metrics.

* [datadogexporter] Improve hostname resolution (#1285)

Improve system hostname detection for the Datadog exporter.
This PR:

- Moves config and host code to their own packages to avoid dependency cycles
- Adds hostname validation
- Adds fully qualified domain name hostname resolution on some platforms
- Adds support for caching hostname

Added unit tests, tested on an end to end test environment with the component activated.

Documentation was added to all public functions.

* Temporarily remove dmitryax from PR facilitators (#1330)

dmitryax will be unavailable for a while, removing him from the list of PR facilitators.

* Update otel collector, fix breaking change for renaming TracesConsumer (#1328)

* Update otel collector, fix breaking change for renaming TracesConsumer

Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>

* More fixes of usages

Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>

* Add batchpertrace library (#1257)

Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>

Adds a library that will split the incoming batch into several batches, one per trace.

**Link to tracking Issue:** Closes #1235.

* Fix the link to the release notes (#1327)

* Datadog trace flushing/export (#1266)

This PR adds flushing and export of traces and trace-related statistics to the `datadogexporter`, as well as some very minor changes to the translation of internal traces into Datadog format. It is the second of two PRs for the work contained in open-telemetry#1203. It builds on top of the current master branch and follows up on the work [done here](open-telemetry#1208).

The final PR, which explicitly enables the Datadog exporter, will follow and will allow users to export traces to Datadog's API Intake.

This PR split was requested by @tigrannajaryan and should make code review a bit less cumbersome. However, if there are any questions or changes to the PR format needed, please let me know.

**Testing:** There are unit tests for the different methods and helper methods within the export code.
 
**Documentation:**  Appropriate usage, including best practices for which processors to also enable, has been documented in the README, `testdata/config.yaml` and `example/config.yaml` samples.

**Notes**: This PR includes a trace exporter for non-Windows environments only (metrics work on Windows; only traces are affected), for the reasons explained in open-telemetry#1274. In short, our trace export code for Windows would rely on CGO for now, which is not permitted in the collector.

* Logzio exporter impl (#1161)

Added a logz.io traces exporter

**Link to tracking Issue**: #686

**Testing**: Added test for each of the components in the new exporter

**Documentation**: Added a readme specifying how to use the exporter and its parameters with an example.

* Add the notion of unstable components and unstable executable (#1299)

The list of experimental components is defined in unstable_components_enabled.go.
These components are only enabled when enable_unstable build tag is defined.
We define this tag and produce an executable named otelcontribcol_unstable_$(GOOS)_$(GOARCH)$(EXTENSION)
when `make otelcontribcol-unstable` is invoked.

For now the new executable is not used anywhere. Next I will look into modifying
the testbed to call the new unstable executable for certain tests.

To verify that the unstable build functionality is enabled I added
stanzareceiver to the list of unstable components and manually verified
that it is indeed enabled in the unstable executable but is not available
in the regular otelcontribcol executable.

Contributes to open-telemetry#873

* JMX Metric Extension: Initial implementation (#1182)

* Add JMX Metric Extension implementation

* rename package to jmxmetricextension

* jmxmetricextension s/metrics/metric

* jmx metrics: fix prometheus typo

* jmx metrics: capitalize acronyms

* jmx metrics: clarify interval

* Enable stale PR action (#1341)

To help reviewers and authors remember to make progress on PRs,
this action will mark a PR as stale after 7 days of inactivity
and will close it after 7 more days of inactivity.

* [datadogexporter] Enable traces on Windows (#1340)

* Re-enable traces code on Windows
Use a custom-made version of the Datadog Agent repository
that greatly reduces the number of dependencies needed and removes
the osext one that depends on CGo

* Address linter issue

* Empty commit to retrigger CI

* Build traces flush/export code on Windows

* Add kind type to root span to fix the empty parentID problem (#1338)

* Add kind type to root span to fix the empty parentID problem

* Set kind type for root span in Xray receiver

* Update receiver/awsxrayreceiver/internal/translator/translator.go

Co-authored-by: Anuraag Agrawal <anuraaga@gmail.com>

Co-authored-by: Bogdan Drutu <lazy@splunk.com>
Co-authored-by: Anuraag Agrawal <anuraaga@gmail.com>

* [awsecscontainermetrics] receiver- Update README (#1358)

* [awsecscontainermetrics] receiver- Update README

Signed-off-by: Rayhan Hossain <hossain.rayhan@outlook.com>

* Use full form of metric units

Signed-off-by: Rayhan Hossain <hossain.rayhan@outlook.com>

* Add timer support for statsD receiver (#1335)

* [datadogexporter] Add Datadog exporter to the otelcontribcol binary (#1352)

* Add datadogexporter to the binary

* Disable environment variables
They don't work; we will revisit it in the future

* [datadogexporter] Update go-datadog-api.v2 dependency to v2.30.0 (#1365)

* [signalfx_correlation] Add signalfx_correlation exporter skeleton (#1332)

* [signalfx_correlation] Add signalfx_correlation exporter skeleton

This is for moving the correlation out of sapmexporter into a dedicated
exporter so that the correlation can be used even when sapm isn't (for example,
on an agent that is exporting in otlp to a gateway instead of sapm.)

* fix readme

* [awsemfexporter] Restructure Metric Translator Logic (#1353)

* Restructure buildCWMetric logic (#1)

* Restructure code to remove duplicated logic

* Update format

* Improve function and variable names

* Extract logic for dimension creation and add test

* Implement minor fixes

* Remove changes to go.sum

* Implement tests for getCWMetrics

* Implement tests for buildCWMetric

* Format metric_translator_test.go

* Run with gofmt -s

* Disregard ordering of dimensions in test case

* Perform dimension equality checking as a helper function

* Setting the tlsconfig InsecureSkipVerify using NoVerifySSL (#1350)

Co-authored-by: Kylian Serrania <kylian.serrania@datadoghq.com>
Co-authored-by: Jay Camp <jay.r.camp@gmail.com>
Co-authored-by: Steve Flanders <sflanders@splunk.com>
Co-authored-by: Jeff Cheng <jcheng@signalfx.com>
Co-authored-by: Bogdan Drutu <bogdandrutu@gmail.com>
Co-authored-by: Antoine Toulme <atoulme@users.noreply.github.com>
Co-authored-by: Tigran Najaryan <4194920+tigrannajaryan@users.noreply.github.com>
Co-authored-by: Paulo Janotti <pjanotti@splunk.com>
Co-authored-by: Juraci Paixão Kröhling <juraci@kroehling.de>
Co-authored-by: Eric Mustin <mustin.eric@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Andrew Wilkins <axw@elastic.co>
Co-authored-by: Pablo Collins <pablo.collins@gmail.com>
Co-authored-by: Ben Keith <benkeith@splunk.com>
Co-authored-by: Pablo Baeyens <pablo.baeyens@datadoghq.com>
Co-authored-by: Yogev Mets <yogev.metzuyanim@logz.io>
Co-authored-by: Ryan Fitzpatrick <rmfitzpatrick@users.noreply.github.com>
Co-authored-by: John <59711343+JohnWu20@users.noreply.github.com>
Co-authored-by: Bogdan Drutu <lazy@splunk.com>
Co-authored-by: Anuraag Agrawal <anuraaga@gmail.com>
Co-authored-by: Rayhan Hossain (Mukla.C) <hossain.rayhan@outlook.com>
Co-authored-by: Gavin Zhang (Kunyuan Zhang) <31523962+gavindoudou@users.noreply.github.com>
Co-authored-by: shreyas Darwhatkar <67069455+darwhs@users.noreply.github.com>
kohrapha referenced this pull request in hdj630/opentelemetry-collector-contrib Oct 29, 2020
dyladan referenced this pull request in dynatrace-oss-contrib/opentelemetry-collector-contrib Jan 29, 2021
Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>
3 participants