forked from open-telemetry/opentelemetry-collector-contrib
Release/v0.91.x #11
Merged
Conversation
…ixes (open-telemetry#28682) **Description:** Part 2 of open-telemetry#28679: these are the tests that can be re-enabled without requiring any code changes once open-telemetry#28680 is merged. **Link to tracking Issue:** Related to open-telemetry#28679
…trol (open-telemetry#29095) **Description:** Added support for more control over TTL configuration. Previously the TTL could only be expressed in whole days; it can now also be expressed in hours, minutes, and seconds. **Link to tracking Issue:** [28675](open-telemetry#28675)
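For illustration, a minimal sketch of what the finer-grained TTL might look like in a collector config, assuming the exporter accepts a Go-style duration for its TTL setting (the key name and endpoint below are assumptions, not taken from the exporter README):

```yaml
exporters:
  clickhouse:
    endpoint: tcp://127.0.0.1:9000   # placeholder ClickHouse endpoint
    # Assumed key: previously only whole days were expressible; a duration
    # such as 12h or 30m can now be used.
    ttl: 72h
```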
…leased in contrib (open-telemetry#29275) An issue was opened recently wondering why this processor was not available in the contrib release. Since this processor is experimental and temporary, there's no plan to support it long term, so I've added a note that makes it clear why it's not included in the contrib distribution releases. Fixes open-telemetry#29150
…s and contexts (open-telemetry#29241) **Description:** Updates the OTTL readme to make it easier to find functions and paths, which is what most people are looking for. --------- Co-authored-by: Curtis Robert <crobert@splunk.com> Co-authored-by: Evan Bradley <11745660+evan-bradley@users.noreply.github.com>
…negotiation (open-telemetry#29153) The code needs some basic tests that can later be expanded with tests for native histogram use cases. Changes: Refactored the `testComponent` function to make it easier to customize the scrape configuration. Expanded `compareHistogram` to assert on the explicit boundaries as well. Added the `prometheusMetricFamilyToProtoBuf` helper to serialize a Prometheus metric family into Protobuf. Added a simple test of Protobuf-based scraping of counters, gauges, summaries and histograms. open-telemetry#26555 Followup to open-telemetry#27030 Related to open-telemetry#28663 **Testing:** Adding a simple e2e test for scraping over Protobuf. **Documentation:** Not applicable. --------- Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com> Co-authored-by: David Ashpole <dashpole@google.com>
**Description:** This is a namedpipe input operator, which will read from a named pipe and send the data to the pipeline. It pretty closely mimics the file input operator, but with a few differences. In particular, named pipes have an interesting property that they receive EOFs when a writer closes the pipe, but that _doesn't_ mean that the pipe is closed. To solve this issue, we crib from existing `tail -f` implementations and use an inotify watcher to detect whenever the pipe receives new data, and then read it using the standard `bufio.Scanner` reader. **Link to tracking Issue:** open-telemetry#27234 **Testing:** We add a couple of tests for the new operator. The first simply tests the creation of the named pipe - checking that it's created as a pipe, with the right permissions. The second goes further by inserting logs over several different `Open`s into the pipe, testing that the logs are read, and that the operator is able to handle the named pipe behavior of skipping over EOFs. **Documentation:** None, at the moment /cc @djaglowski --------- Signed-off-by: sinkingpoint <colin@quirl.co.nz>
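A hedged sketch of what using the operator inside a stanza operator pipeline might look like; the operator type name and field names below are assumptions based on the description above, not confirmed against the operator docs:

```yaml
# Hypothetical stanza operator block
- type: namedpipe            # assumed operator type name
  path: /var/run/myapp.pipe  # pipe to read from; created if it does not exist
  mode: 0666                 # assumed field for the pipe's permissions
```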
This is the Part 1 PR for the Failover Connector (split according to the CONTRIBUTING.md doc) Link to tracking Issue: open-telemetry#20766 Testing: Added factory test Note: Full functionality PR exists [here](open-telemetry#27641) and will likely be refactored to serve as the part 2 PR cc: @djaglowski @sethallen @MovieStoreGuy
…coder when nop encoding is defined (open-telemetry#28901) **Description:** Enhancement - In the udp receiver (stanza operator), change handleMessage not to call the decode method when nop encoding is defined, as it's unnecessary. This improves performance in high-scale scenarios by reducing memory allocations. **Link to tracking Issue:** 28899 **Testing:** Ran existing unit tests. Ran stress tests (sending 250k UDP packets per second) - memory allocations reduced by 10-20%. **Documentation:** None
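As an illustration of the case this optimizes, a minimal udplog receiver config with nop encoding (the listen address is an arbitrary example):

```yaml
receivers:
  udplog:
    listen_address: "0.0.0.0:54525"
    # With nop encoding the payload is passed through unchanged, so the
    # decode call this change skips was pure overhead.
    encoding: nop
```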
This updates the githubgen allowlist to reflect membership of current codeowners.
Signed-off-by: Alex Boten <aboten@lightstep.com>
…exporters. (open-telemetry#29284) This relates to open-telemetry#27849
open-telemetry#29116) **Description:** As originally proposed in open-telemetry#26991 (before I got distracted), this exposes the duration of generated spans as a command-line parameter. It uses a `DurationVar` flag so units can be easily provided and are automatically applied. Example usage: ```bash telemetrygen traces --traces 100 --otlp-insecure --span-duration 10ns # nanoseconds telemetrygen traces --traces 100 --otlp-insecure --span-duration 10us # microseconds telemetrygen traces --traces 100 --otlp-insecure --span-duration 10ms # milliseconds telemetrygen traces --traces 100 --otlp-insecure --span-duration 10s # seconds ``` **Testing:** Ran without the argument provided (`telemetrygen traces --traces 1 --otlp-insecure`) and saw spans published with the default value. Ran again with the argument provided: `telemetrygen traces --traces 1 --otlp-insecure --span-duration 1s` And observed the expected output: ``` Resource SchemaURL: https://opentelemetry.io/schemas/1.4.0 Resource attributes: -> service.name: Str(telemetrygen) ScopeSpans #0 ScopeSpans SchemaURL: InstrumentationScope telemetrygen Span #0 Trace ID : 8b441587ffa5820688b87a6b511d634c Parent ID : 39faad428638791b ID : 88f0886894bd4ee2 Name : okey-dokey Kind : Server Start time : 2023-11-12 02:05:07.97443 +0000 UTC End time : 2023-11-12 02:05:08.97443 +0000 UTC Status code : Unset Status message : Attributes: -> net.peer.ip: Str(1.2.3.4) -> peer.service: Str(telemetrygen-client) Span #1 Trace ID : 8b441587ffa5820688b87a6b511d634c Parent ID : ID : 39faad428638791b Name : lets-go Kind : Client Start time : 2023-11-12 02:05:07.97443 +0000 UTC End time : 2023-11-12 02:05:08.97443 +0000 UTC Status code : Unset Status message : Attributes: -> net.peer.ip: Str(1.2.3.4) -> peer.service: Str(telemetrygen-server) {"kind": "exporter", "data_type": "traces", "name": "debug"} ``` **Documentation:** No documentation added. --------- Co-authored-by: Pablo Baeyens <pbaeyens31+github@gmail.com>
…n-telemetry#29121) **Description:** Link to tracking Issue: Fixes open-telemetry#29120 Fixing a bug - a panic happens during the stop method in async mode only (it didn't affect the default non-async mode). When stop is called, it closes the messageQueue channel, signaling to processMessagesAsync to stop running. However, readMessagesAsync sometimes tries to write into the closed channel (depending on whether the method is currently reading from the closed connection or currently trying to write to the channel), and as a result a panic happens. Separated the existing wg (waitGroup that serves the non-async code and processMessagesAsync) from the new wg_reader (waitGroup serving readMessagesAsync only). This allows us to first stop readMessagesAsync and wait for it to finish before closing the channel. Stop (in async mode) now does the following: 1. Close the connection - signaling readMessagesAsync to stop - the messageQueue channel will remain open until that method is done, so there's no risk of panic (due to writing to a closed channel). 2. Wait for readMessagesAsync to finish (wait for the new wg_reader). 3. Close the messageQueue channel (signaling processMessagesAsync to stop). 4. Wait for processMessagesAsync to finish (wait for wg). **Link to tracking Issue:** 29120 **Testing:** Ran unit tests. Ran concrete strato, stopped & restarted multiple times, didn't see any panic (and stop completed successfully as expected). **Documentation:** None.
**Description:** This adds logic to filter logs based on log conditions and send the desired logs as event markers to the Honeycomb Marker API. **Link to tracking Issue:** open-telemetry#27666 **Testing:** Unit testing for the log exporter and config. Added component testing to `otelcontribcol`. **Documentation:** README describing component usage. Screenshot of exported markers showing up in Honeycomb: <img width="1225" alt="Screenshot 2023-11-14 at 1 27 49 PM" src="https://github.com/open-telemetry/opentelemetry-collector-contrib/assets/35741033/128d689a-cf1e-4959-9df3-6c88248a7fdb"> --------- Co-authored-by: Tyler Helmuth <12352919+TylerHelmuth@users.noreply.github.com>
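A rough sketch of the kind of configuration this enables; the field names (`markers`, `rules`, `log_conditions`) and marker type are assumptions inferred from the description above, so check the component README before relying on them:

```yaml
exporters:
  honeycombmarker:
    api_key: ${env:HONEYCOMB_API_KEY}
    markers:
      - type: deploy                      # marker type shown in Honeycomb (example value)
        rules:
          log_conditions:                 # OTTL conditions selecting which logs become markers
            - body == "deploy completed"
```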
…n-telemetry#29022) **Description:** - Added a new watch to the k8s_observer extension for k8s services, which can be enabled using a new flag "observe_services". - Discovered entities are transformed into a new endpoint type `k8s.service`. - Adjusted the receivercreator to support the new type `k8s.service` **Link to tracking Issue:** [open-telemetry#29021](open-telemetry#29021) **Testing:** Added unit tests analogous to the existing tests **Documentation:** Adjusted the READMEs of k8s_observer and receivercreator. Added descriptions of the new flags and types. **Note:** The current implementation works as described in the linked ticket. Please check the potential discussion points mentioned in the ticket: open-telemetry#29021 (comment) --------- Co-authored-by: Antoine Toulme <antoine@toulme.name>
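A sketch of how the new flag and endpoint type might be wired together with the receivercreator; the `redis` receiver and port are placeholders, and the exact rule syntax should be verified against the receivercreator README:

```yaml
extensions:
  k8s_observer:
    auth_type: serviceAccount
    observe_services: true                 # new flag: also watch Kubernetes Service objects

receivers:
  receiver_creator:
    watch_observers: [k8s_observer]
    receivers:
      redis:                               # placeholder receiver
        rule: type == "k8s.service" && name == "redis"   # new k8s.service endpoint type
        config:
          endpoint: '`endpoint`:6379'
```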
**Description:** Adds a new `IsDouble` function to facilitate type checking. Most useful when checking the type of a body to determine if it needs to be parsed or not. **Link to tracking Issue:** open-telemetry#27895 **Testing:** Added unit test **Documentation:** Updated the func readme. Co-authored-by: Tyler Helmuth <12352919+TylerHelmuth@users.noreply.github.com>
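A small illustrative use of the new function inside the transform processor; the attribute name is made up for the example:

```yaml
processors:
  transform:
    log_statements:
      - context: log
        statements:
          # Mark records whose body is already a numeric double so later
          # statements can skip re-parsing them.
          - set(attributes["body.is_double"], true) where IsDouble(body)
```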
…try#28894) **Description:** I have observed some behavior on a personal collector deployment where the EMF Exporter is still returning errors for `NaN` json marshalling. This was in a prometheus -> emf exporter metrics pipeline. I could not find the specific NaN value in the metrics when troubleshooting the error. I curled the `/metrics` endpoint and also tried using the logging exporter to try to get more information. I could not find where the NaN value was coming from, so I took another look at the unit tests and found some possible code paths in which NaNs could slip through. **Link to tracking Issue:** Original issue open-telemetry#26267 **Testing:** Added more unit tests. The summary unit tests got a slight refactor for two reasons: so I could get rid of the unnecessary typecasting, and so that we could more easily test different combinations of quantile values. I have also added a few more histogram unit tests to verify that all combinations of NaN values are checked on their own.
) **Description:** * Update AAD documentation to use connection string instead of instrumentation key. Follow up to open-telemetry#28854 * Modified the ingestion version from 2.0 to 2.1 **Testing:** Existing tests. Output from manual run ``` json --------- Transmitting 30 items --------- {"kind": "exporter", "data_type": "logs", "name": "azuremonitor"} 2023-11-13T10:50:23.886-0800 debug azuremonitorexporter@v0.88.0/factory.go:139 Telemetry transmitted in 378.439395ms {"kind": "exporter", "data_type": "logs", "name": "azuremonitor"} 2023-11-13T10:50:23.886-0800 debug azuremonitorexporter@v0.88.0/factory.go:139 Response: 200 {"kind": "exporter", "data_type": "logs", "name": "azuremonitor"} 2023-11-13T10:50:23.886-0800 debug azuremonitorexporter@v0.88.0/factory.go:139 Items accepted/received: 30/30 {"kind": "exporter", "data_type": "logs", "name": "azuremonitor"} ``` **Documentation:** * Updated Authentication.md
…emetry#29309) **Description:** Fixes an issue with an incorrect default URL. Also fixes an issue where the dataset slug was required. **Link to tracking Issue:** Related to open-telemetry#27666 **Testing:** Added new tests and tested manually. **Documentation:** Updated the README
**Description:** Update Honeycomb Marker Exporter to alpha status **Link to tracking Issue:** open-telemetry#27666 --------- Co-authored-by: Tyler Helmuth <12352919+TylerHelmuth@users.noreply.github.com>
…telemetry#28651) **Description:** This fixes an inconsistency introduced with the creation of this package. In open-telemetry#25096 @cparkins was added as a code owner in the [metadata.yaml](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/pkg/translator/azure/metadata.yaml) but not in the top-level `CODEOWNERS` file. Co-authored-by: Alex Boten <aboten@lightstep.com>
When InfluxDB v1 compatibility is enabled AND username&password are set, the exporter panics. Not any more! Fixes open-telemetry#27084 **Testing:** I've added one regression test.
open-telemetry/opentelemetry-collector#8939 Co-authored-by: Alex Boten <aboten@lightstep.com>
Workflows have been failing and then trying to use `issuegenerator` to create issues, but the path to the tool was incorrect. See https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/6895702499/job/18761957296 as an example. Signed-off-by: Alex Boten <aboten@lightstep.com>
…metry#28866) **Description:** This feature adds a Project Config for the metrics to filter by project name and/or clusters. **Link to tracking Issue:** open-telemetry#28865 **Testing:** - Added test for cluster filtering - Tested project name alone, project name with IncludeClusters and project name with ExcludeClusters on a live environment with success. **Documentation:** Added optional project config fields to README --------- Co-authored-by: Daniel Jaglowski <jaglows3@gmail.com>
…one function (open-telemetry#28886) If no functions are exposed, exit with no error. This change makes it possible to remove `extension/encoding` from the allowlist.
**Description:** Using the mysqlreceiver, we were getting the following error as our MySQL server on AWS RDS requires secure transport for all connections by setting `require_secure_transport=ON` per https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/mysql-ssl-connections.html#mysql-ssl-connections.require-ssl **Example log message** `2023-10-31T10:53:30.239Z error scraperhelper/scrapercontroller.go:200 Error scraping metrics {"kind": "receiver", "name": "mysql", "data_type": "metrics", "error": "Error 3159 (HY000): Connections using insecure transport are prohibited while --require_secure_transport=ON.; ", "scraper": "mysql"}`
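For context, a hedged sketch of pointing the receiver at an RDS instance over TLS, assuming the change wires the collector's standard TLS client settings into the receiver (the hostname and CA path are placeholders, not values from this PR):

```yaml
receivers:
  mysql:
    endpoint: my-instance.xxxxxx.us-east-1.rds.amazonaws.com:3306  # placeholder host
    username: otel
    password: ${env:MYSQL_PASSWORD}
    tls:
      insecure: false
      ca_file: /etc/ssl/certs/rds-combined-ca-bundle.pem  # assumed RDS CA bundle location
```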
**Description:** We now have pdata 1.0.0 🎉. After open-telemetry/opentelemetry-collector/pull/8975, we decided not to have RC releases, so there is no need to have the RC block.
**Description:** Drawing inspiration from https://github.com/bazelbuild/starlark#design-principles and https://github.com/google/cel-spec/blob/master/doc/langdef.md#overview, add a brief section about design principles. The aim of this is to ensure OTTL is and remains safe for execution of untrusted programs in multi-tenant systems, where tenants can provide their own OTTL programs. --------- Co-authored-by: Evan Bradley <11745660+evan-bradley@users.noreply.github.com>
**Link to tracking Issue:** Fixes open-telemetry#29568 **Testing:** the demo-client and demo-server services show up in Jaeger
…ry#29692) @marctc is the original component proposer and author, and is now a [member of the OpenTelemetry community](open-telemetry/community#1761). He also [expressed interest in being a code owner](open-telemetry#24409 (comment)) when I asked.
) Fixes open-telemetry#28647 After this is merged contributors can finally use go workspaces in this repo. Fixes open-telemetry#26567 --------- Signed-off-by: Alex Boten <aboten@lightstep.com> Signed-off-by: Yuri Shkuro <github@ysh.us> Co-authored-by: Yuri Shkuro <github@ysh.us>
…try#29658) I need to resign from a few components, as I'm not doing a good job in keeping track of what needs to be done for them. Asking around, @yurishkuro volunteered to take over the Jaeger related ones. --------- Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>
**Description:** The configuration for logs can be seen as stable, as we have many users using the coralogix exporter in production with logs. The last changes to it were mainly documentation updates (open-telemetry@d5d6480). Skipping changelog as this is a documentation update. **Documentation:** - Update docs
…st metrics. (open-telemetry#27299) **Description:** The `node_<cpu|memory>_request` metrics and metrics derived from them (`node_<cpu|memory>_reserved_capacity`) differ from the output of `kubectl describe node <node_name>`. This is because kubectl [filters out terminated pods](https://github.com/kubernetes/kubectl/blob/302f330c8712e717ee45bbeff27e1d3008da9f00/pkg/describe/describe.go#L3624). See linked issue for more details. Adds a filter for terminated (succeeded/failed state) pods. **Link to tracking Issue:** open-telemetry#27262 **Testing:** Added unit test to validate pod state filtering. Built and deployed changes to cluster. Deployed `cpu-test` pod. ![image](https://github.com/amazon-contributing/opentelemetry-collector-contrib/assets/84729962/b557be2d-e14e-428a-895a-761f7724d9bd) The gap is when the change was deployed. The metric drops after the deployment due to the filter. The metric can be seen spiking up while the `cpu-test` pod is running (~19:15) and then returns to the previous request size after it has terminated. **Documentation:** N/A
This file is not referenced by any tests.
) The prometheus exporter hit a panic when accumulating `Delta` metrics into `Cumulative` sums. This is because the exporter does not enable mutating data in its declared capabilities. This change enables the exporter to mutate data in a safe and supported way. Fixes open-telemetry#29574 **Testing:** There are existing tests that hit the logic that was panicking, but the metrics are set to `StateMutable` in testing (which is the only way they can be created and set up for testing). I believe that means that before this change the tests were invalid (didn't represent reality), but after this change they'll properly represent the exporter's functionality.
…y#29625) **Description:** Logstash format compatibility. Traces or Logs data can be written into an index in logstash format. <!--Ex. Fixing a bug - Describe the bug and how this fixes the issue. Ex. Adding a feature - Explain what this achieves.--> **Link to tracking Issue:** <Issue number if applicable> close open-telemetry#29624 **Documentation:** added some descriptions for `logstash_format ` configurations. 1. otel-col.yaml ```yaml receivers: otlp: protocols: grpc: filelog: include: [ ./examples/kubernetes/varlogpods/containerd_logs-0_000011112222333344445555666677778888/logs/0.log ] start_at: beginning operators: # Find out which format is used by kubernetes - type: router id: get-format routes: - output: parser-docker expr: 'body matches "^\\{"' - output: parser-crio expr: 'body matches "^[^ Z]+ "' - output: parser-containerd expr: 'body matches "^[^ Z]+Z"' # Parse CRI-O format - type: regex_parser id: parser-crio regex: '^(?P<time>[^ Z]+) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$' output: extract_metadata_from_filepath timestamp: parse_from: attributes.time layout_type: gotime layout: '2006-01-02T15:04:05.999999999Z07:00' # Parse CRI-Containerd format - type: regex_parser id: parser-containerd regex: '^(?P<time>[^ ^Z]+Z) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$' output: extract_metadata_from_filepath timestamp: parse_from: attributes.time layout: '%Y-%m-%dT%H:%M:%S.%LZ' # Parse Docker format - type: json_parser id: parser-docker output: extract_metadata_from_filepath timestamp: parse_from: attributes.time layout: '%Y-%m-%dT%H:%M:%S.%LZ' # Extract metadata from file path - type: regex_parser id: extract_metadata_from_filepath regex: '^.*\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<uid>[a-f0-9\-]{36})\/(?P<container_name>[^\._]+)\/(?P<restart_count>\d+)\.log$' parse_from: attributes["log.file.path"] cache: size: 128 # default maximum amount of Pods per Node is 110 # Update body field after finishing all parsing - type: move from: attributes.log to: body # Rename attributes - type: move from: attributes.stream to: attributes["log.iostream"] - type: move from: attributes.container_name to: resource["k8s.container.name"] - type: move from: attributes.namespace to: resource["k8s.namespace.name"] - type: move from: attributes.pod_name to: resource["k8s.pod.name"] - type: move from: attributes.restart_count to: resource["k8s.container.restart_count"] - type: move from: attributes.uid to: resource["k8s.pod.uid"] exporters: prometheus: endpoint: "0.0.0.0:8889" const_labels: label1: value1 elasticsearch/log: tls: insecure: false endpoints: [http://localhost:9200] logs_index: otlp-logs logstash_format: enabled: true timeout: 2m flush: bytes: 10485760 retry: max_requests: 5 sending_queue: enabled: true elasticsearch/traces: tls: insecure: false endpoints: [http://localhost:9200] traces_index: otlp-traces logstash_format: enabled: true timeout: 2m flush: bytes: 10485760 retry: max_requests: 5 sending_queue: enabled: true debug: processors: batch: extensions: health_check: pprof: endpoint: :1888 zpages: endpoint: :55679 service: extensions: [pprof, zpages, health_check] pipelines: logs: receivers: [otlp,filelog] processors: [batch] exporters: [debug, elasticsearch/log] traces: receivers: [otlp] processors: [batch] exporters: [debug, elasticsearch/traces] ``` 3. 
ES indices created when `otel-col` writes traces and logs: <img width="913" alt="image" src="https://github.com/open-telemetry/opentelemetry-collector-contrib/assets/12468337/0ede0fd7-ed85-4fd4-b843-093c13edc1e3"> 4. Query index data: <img width="743" alt="image" src="https://github.com/open-telemetry/opentelemetry-collector-contrib/assets/12468337/1e89a44c-cead-4aab-8b3a-284a8b573d3b"> <img width="817" alt="image" src="https://github.com/open-telemetry/opentelemetry-collector-contrib/assets/12468337/429c25bc-336e-4850-9d83-ed7423f38e90"> --------- Signed-off-by: Jared Tan <jian.tan@daocloud.io>
) There were some linting failures introduced in open-telemetry#27247. These are Windows and any non-Linux OS-specific linting failures.
…n-telemetry#29573) Fixes a regression introduced in open-telemetry#29095: the name of the `timeField` in `generateTTLExpr` was ignored and defaulted to `Timestamp`. The problem is that different tables have different names for this field. Now it is specified at each table creation. --------- Co-authored-by: Alex Boten <aboten@lightstep.com>
**Description:** Adds a new ErrorMode, `silent`, that `StatementSequence` and `ConditionSequence` can use to disable logging when ignoring errors. **Link to tracking Issue:** Closes open-telemetry#22743 **Testing:** Updated unit tests **Documentation:** Updated READMEs and godoc comments.
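For example, a transform processor config using the new mode might look like this once the processor exposes it (the statement itself is just a placeholder):

```yaml
processors:
  transform:
    # "ignore" drops failing statements but logs every error;
    # "silent" drops them without logging anything.
    error_mode: silent
    trace_statements:
      - context: span
        statements:
          - set(attributes["environment"], resource.attributes["deployment.environment"])
```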
Signed-off-by: Dmitrii Anoshin <anoshindx@gmail.com>
…tibility (open-telemetry#29662) **Description:** This PR supplements the receiver `influxdbreceiver` with an implementation of the `/ping` [endpoint](https://docs.influxdata.com/influxdb/v2/api/#operation/GetPing). Various third-party applications use this to check the availability of the receiver before sending metrics, e.g. checkmk. **Link to tracking Issue:** open-telemetry#29594 **Testing:** Basic tests and end to end testing with the third party application [checkmk](https://docs.checkmk.com/latest/en/metrics_exporter.html). **Documentation:** No additional documentation has been added. - The user does not interact directly with this endpoint. - There are no configuration options.
**Description:** @bryan-aguilar has been showing good judgement while helping out as a triager, codeowner, and community member. He has [authored](https://github.com/open-telemetry/opentelemetry-collector-contrib/pulls/bryan-aguilar) and [reviewed](https://github.com/open-telemetry/opentelemetry-collector-contrib/pulls?q=is%3Apr+is%3Aopen+reviewed-by%3Abryan-aguilar+) lots of PRs and would be a big help as an Approver. @bryan-aguilar please approve this PR if you'd like to be an Approver for Collector Contrib
…wn (open-telemetry#29707) **Description:** This change allows validation to pass even when some of the K8s APIs are down; we look through the groups and resources for the ones that are available. **Link to tracking Issue:** open-telemetry#29706 **Testing:** - manually in a kind cluster with metrics-server being down.
Regenerate CODEOWNERS manually: * add @braydonk to allowlist * remove @eedorenko from allowlist since he is now a member * fix typos, remove extra text that is no longer valid as the file is generated.
Same description as in open-telemetry/opentelemetry-collector#9022 This PR enables the HTTP2 health check to work around the issue described here open-telemetry/opentelemetry-collector#9022 As to why I chose 10 seconds for `HTTP2ReadIdleTimeout` and 10 seconds for `HTTP2PingTimeout`: those values have been tested in production and they will result, in an active env (with default http timeout of 10 seconds and default retry settings), in a single export failure (2 max) before the health check detects the corrupted tcp connection and closes it. The only drawback is that if the connection was not used for over 10 seconds, we might end up sending unnecessary ping frames, which should not be an issue; if it becomes an issue, we can tune those settings. The SFX exporter has multiple http clients: - Metric client, Trace client and Event client. Those clients will have the http2 health check enabled by default as they share the same default config - Correlation client and Dimension client will NOT have the http2 health check enabled. We can revisit this if needed. **Testing:** - Run OTEL with one of the exporters that uses an HTTP/2 client, for example the `signalfx` exporter - For simplicity use a single pipeline/exporter - In a different shell, run this to watch the tcp state of the established connection ``` while (true); do echo date; sudo netstat -anp | grep -E '<endpoint_ip_address(es)>' | sort -k 5; sleep 2; done ``` - From the netstat, take a note of the source port and the source IP address - replace <> from the previous step `sudo iptables -A OUTPUT -s <source_IP> -p tcp --sport <source_Port> -j DROP` - Note how the OTEL exporter export starts timing out Expected Result: - A new connection should be established, similarly to http/1, and exports should succeed Actual Result: - The exports keep failing for ~ 15 minutes or for whatever the OS `tcp_retries2` is configured to - After 15 minutes, a new tcp connection is created and exports start working **Documentation:** Readme is updated Signed-off-by: Dani Louca <dlouca@splunk.com>
Fixes open-telemetry#29723 Signed-off-by: Robbie Lankford <robert.lankford@grafana.com>
**Description:** This PR enables the HTTP2 health check to work around the issue described here open-telemetry/opentelemetry-collector#9022 As to why I chose 10 seconds for `HTTP2ReadIdleTimeout` and ~~5 seconds~~ 10 seconds (see review comment) for `HTTP2PingTimeout`: those values have been tested in production and they will result, in an active env (with default http timeout of 10 seconds and default retry settings), in a single export failure at most before the health check detects the corrupted tcp connection and closes it. The only drawback is that if the connection was not used for over 10 seconds, we might end up sending unnecessary ping frames, which should not be an issue; if it becomes an issue, we can tune those settings. The SFX exporter has multiple http clients: - Metric client, Trace client and Event client. Those clients will have the http2 health check enabled by default as they share the same default config - Correlation client and Dimension client will NOT have the http2 health check enabled. We can revisit this if needed. **Testing:** - Run OTEL with one of the exporters that uses an HTTP/2 client, for example the `signalfx` exporter - For simplicity use a single pipeline/exporter - In a different shell, run this to watch the tcp state of the established connection ``` while (true); do echo date; sudo netstat -anp | grep -E '<endpoint_ip_address(es)>' | sort -k 5; sleep 2; done ``` - From the netstat, take a note of the source port and the source IP address - replace <> from the previous step `sudo iptables -A OUTPUT -s <source_IP> -p tcp --sport <source_Port> -j DROP` - Note how the OTEL exporter export starts timing out Expected Result: - A new connection should be established, similarly to http/1, and exports should succeed Actual Result: - The exports keep failing for ~ 15 minutes or for whatever the OS `tcp_retries2` is configured to - After 15 minutes, a new tcp connection is created and exports start working **Documentation:** Readme is updated **Disclaimer:** Not all HTTP/2 servers support H2 Ping; however, this should not be a concern as our ingest servers do support H2 ping. But if you are routing, you can check if H2 ping is supported using this script golang/go#60818 (comment) Signed-off-by: Dani Louca <dlouca@splunk.com>
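A hedged sketch of what those defaults correspond to in exporter configuration, assuming the confighttp-style keys `http2_read_idle_timeout` and `http2_ping_timeout` are exposed on the exporter (verify the exact names against the confighttp documentation for your collector version; the token and realm are placeholders):

```yaml
exporters:
  signalfx:
    access_token: ${env:SFX_TOKEN}
    realm: us0
    # Assumed keys matching HTTP2ReadIdleTimeout / HTTP2PingTimeout above.
    http2_read_idle_timeout: 10s
    http2_ping_timeout: 10s
```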
…metry#29725) Adds the extension remotetapextension to cmd/otelcontribcol.
…#29735) This change is part of open-telemetry#27849
Preparing for 0.91.0 release --------- Signed-off-by: Dmitrii Anoshin <anoshindx@gmail.com>
The following commands were run to prepare this release:
- make chlog-update VERSION=v0.91.0
- sed -i.bak s/0.90.1/0.91.0/g versions.yaml
- make multimod-prerelease
- make multimod-sync