Release/v0.91.x #11

Merged
merged 513 commits into from
Feb 7, 2024

Conversation

lokesh-balla

Description:

Link to tracking Issue:

Testing:

Documentation:

pjanotti and others added 30 commits November 15, 2023 14:09
…ixes (open-telemetry#28682)

**Description:**
Part 2 of open-telemetry#28679: these are the tests that can be re-enabled without
requiring any code changes once open-telemetry#28680 is merged.

**Link to tracking Issue:**
Related to open-telemetry#28679
…trol (open-telemetry#29095)

**Description:**
Added support for more control over the TTL configuration. Previously it
supported TTL only in days; now it also supports hours, minutes, and
seconds.

**Link to tracking Issue:**
[28675](open-telemetry#28675)
…leased in contrib (open-telemetry#29275)

An issue was opened recently wondering why this processor was not
available in the contrib release. Since this processor is experimental
and temporary, there's no plan to support it long term, so I've added a
note that makes it clear why it's not included in the contrib
distribution releases.

Fixes open-telemetry#29150
…s and contexts (open-telemetry#29241)

**Description:** 
Updates the OTTL readme to make it easier to find functions and paths,
which is what most people are looking for.

---------

Co-authored-by: Curtis Robert <crobert@splunk.com>
Co-authored-by: Evan Bradley <11745660+evan-bradley@users.noreply.github.com>
…negotiation (open-telemetry#29153)

The code needs some basic tests that can be later expanded with tests
for native histograms use cases.

Changes:
- Refactored the `testComponent` function to make it easier to customize the scrape configuration.
- Expanded `compareHistogram` to also assert on the explicit bucket boundaries.
- Added the helper function `prometheusMetricFamilyToProtoBuf` to serialize a Prometheus metric family into Protobuf.
- Added a simple test of Protobuf-based scraping of counters, gauges, summaries, and histograms.

open-telemetry#26555  

Followup to open-telemetry#27030 
Related to open-telemetry#28663 
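
For reference, a helper of that shape could look like the following sketch, which assumes the `prometheus/client_model` and `prometheus/common/expfmt` packages; it illustrates the idea rather than the exact helper added here.

```go
package main

import (
	"bytes"
	"fmt"

	dto "github.com/prometheus/client_model/go"
	"github.com/prometheus/common/expfmt"
)

// prometheusMetricFamilyToProtoBuf sketches the kind of test helper described
// above: serialize a Prometheus metric family into the delimited Protobuf
// exposition format, i.e. what a Protobuf-negotiated scrape carries on the wire.
func prometheusMetricFamilyToProtoBuf(buf *bytes.Buffer, mf *dto.MetricFamily) error {
	enc := expfmt.NewEncoder(buf, expfmt.FmtProtoDelim)
	return enc.Encode(mf)
}

func main() {
	// Build a tiny counter family by hand; names and values are illustrative.
	name, mtype := "scrape_samples_total", dto.MetricType_COUNTER
	value := 42.0
	mf := &dto.MetricFamily{
		Name:   &name,
		Type:   &mtype,
		Metric: []*dto.Metric{{Counter: &dto.Counter{Value: &value}}},
	}
	var buf bytes.Buffer
	if err := prometheusMetricFamilyToProtoBuf(&buf, mf); err != nil {
		panic(err)
	}
	fmt.Printf("encoded %d bytes\n", buf.Len())
}
```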

**Testing:**

Added a simple e2e test for scraping over Protobuf.

**Documentation:** 

Not applicable.

---------

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Co-authored-by: David Ashpole <dashpole@google.com>
**Description:** 

This is a namedpipe input operator, which will read from a named pipe
and send the data to the pipeline. It pretty closely mimics the file
input operator, but with a few differences.

In particular, named pipes have an interesting property that they
receive EOFs when a writer closes the pipe, but that _doesn't_ mean that
the pipe is closed. To solve this issue, we crib from existing `tail -f`
implementations and use an inotify watcher to detect whenever the pipe
receives new data, and then read it using the standard `bufio.Scanner`
reader.
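
To make the approach concrete, here is a hedged, minimal sketch of the `tail -f` pattern using an fsnotify watcher together with `bufio.Scanner`; the path and structure are illustrative and not the operator's actual code.

```go
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"

	"github.com/fsnotify/fsnotify"
)

// tailNamedPipe keeps reading a named pipe: when the reader hits EOF (a
// writer closed its end), it waits for an inotify write event before scanning
// again, because the pipe itself still exists.
func tailNamedPipe(path string) error {
	watcher, err := fsnotify.NewWatcher()
	if err != nil {
		return err
	}
	defer watcher.Close()
	if err := watcher.Add(path); err != nil {
		return err
	}

	f, err := os.Open(path) // on Linux this blocks until a writer opens the pipe
	if err != nil {
		return err
	}
	defer f.Close()

	for {
		scanner := bufio.NewScanner(f)
		for scanner.Scan() {
			fmt.Println(scanner.Text()) // forward the log line to the pipeline
		}
		if err := scanner.Err(); err != nil {
			return err
		}
		// EOF: the writer closed the pipe. Block until inotify reports new
		// data, then resume scanning on the same file descriptor.
		select {
		case ev := <-watcher.Events:
			if ev.Op&fsnotify.Write == 0 {
				continue
			}
		case err := <-watcher.Errors:
			return err
		}
	}
}

func main() {
	if err := tailNamedPipe("/tmp/example.pipe"); err != nil {
		log.Fatal(err)
	}
}
```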

**Link to tracking Issue:** open-telemetry#27234

**Testing:**

We add a couple of tests for the new operator. The first simply tests
the creation of the named pipe, checking that it's created as a pipe
with the right permissions. The second goes further by inserting logs
over several different `Open`s of the pipe, testing that the logs are
read and that the operator is able to handle the named pipe behavior of
skipping over EOFs.

**Documentation:**

None, at the moment

/cc @djaglowski

---------

Signed-off-by: sinkingpoint <colin@quirl.co.nz>
This is the Part 1 PR for the Failover Connector (split according to the
CONTRIBUTING.md doc)

Link to tracking Issue: open-telemetry#20766 

Testing: Added factory test

Note: Full functionality PR exists
[here](open-telemetry#27641)
and will likely be refactored to serve as the part 2 PR

cc: @djaglowski @sethallen @MovieStoreGuy
…coder when nop encoding is defined (open-telemetry#28901)

**Description:** Enhancement - in the UDP receiver (stanza operator), change
`handleMessage` so that it does not call the decode method when the nop
encoding is defined, since decoding is unnecessary in that case.
This improves performance in high-scale scenarios by reducing memory
allocations.
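
As a rough illustration (with hypothetical names, assuming the `golang.org/x/text/encoding` package), the idea is to short-circuit the decode step when the nop encoding is configured:

```go
package main

import (
	"fmt"

	"golang.org/x/text/encoding"
)

// handleMessage is an illustrative stand-in, not the operator's actual code:
// with the nop encoding the raw bytes are already the final payload, so the
// decode step (and its allocations) can be skipped entirely.
func handleMessage(enc encoding.Encoding, buf []byte) ([]byte, error) {
	if enc == encoding.Nop {
		return buf, nil // nop encoding: pass bytes through without decoding
	}
	return enc.NewDecoder().Bytes(buf)
}

func main() {
	out, _ := handleMessage(encoding.Nop, []byte("hello"))
	fmt.Println(string(out))
}
```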

**Link to tracking Issue:** 28899

**Testing:** Ran existing unit tests.
Ran stress tests (sending 250k UDP packets per second) - memory
allocations were reduced by 10-20%.

**Documentation:** None
This updates the githubgen allowlist to reflect membership of current
codeowners.
Signed-off-by: Alex Boten <aboten@lightstep.com>
open-telemetry#29116)

**Description:** 

As originally proposed in open-telemetry#26991 before I got distracted

Exposes the duration of generated spans as a command line parameter. It
uses a `DurationVar` flag so units can be easily provided and are
automatically applied.

Example usage:

```bash
telemetrygen traces --traces 100 --otlp-insecure --span-duration 10ns # nanoseconds
telemetrygen traces --traces 100 --otlp-insecure --span-duration 10us # microseconds
telemetrygen traces --traces 100 --otlp-insecure --span-duration 10ms # milliseconds
telemetrygen traces --traces 100 --otlp-insecure --span-duration 10s # seconds
```
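
For context, `DurationVar` parses Go duration strings, which is why the unit suffixes above work without extra handling; a minimal, hypothetical sketch of the mechanism (not telemetrygen's actual flag wiring):

```go
package main

import (
	"flag"
	"fmt"
	"time"
)

func main() {
	// The default value and flag name here are illustrative assumptions.
	var spanDuration time.Duration
	flag.DurationVar(&spanDuration, "span-duration", 123*time.Microsecond, "duration of each generated span")
	flag.Parse()
	fmt.Println("span duration:", spanDuration) // e.g. --span-duration 10ms prints 10ms
}
```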

**Testing:** 

Ran without the argument provided (`telemetrygen traces --traces 1
--otlp-insecure`) and saw spans published with the default value.

Ran again with the argument provided: `telemetrygen traces --traces 1
--otlp-insecure --span-duration 1s`

And observed the expected output:

```
Resource SchemaURL: https://opentelemetry.io/schemas/1.4.0
Resource attributes:
     -> service.name: Str(telemetrygen)
ScopeSpans #0
ScopeSpans SchemaURL: 
InstrumentationScope telemetrygen 
Span #0
    Trace ID       : 8b441587ffa5820688b87a6b511d634c
    Parent ID      : 39faad428638791b
    ID             : 88f0886894bd4ee2
    Name           : okey-dokey
    Kind           : Server
    Start time     : 2023-11-12 02:05:07.97443 +0000 UTC
    End time       : 2023-11-12 02:05:08.97443 +0000 UTC
    Status code    : Unset
    Status message : 
Attributes:
     -> net.peer.ip: Str(1.2.3.4)
     -> peer.service: Str(telemetrygen-client)
Span #1
    Trace ID       : 8b441587ffa5820688b87a6b511d634c
    Parent ID      : 
    ID             : 39faad428638791b
    Name           : lets-go
    Kind           : Client
    Start time     : 2023-11-12 02:05:07.97443 +0000 UTC
    End time       : 2023-11-12 02:05:08.97443 +0000 UTC
    Status code    : Unset
    Status message : 
Attributes:
     -> net.peer.ip: Str(1.2.3.4)
     -> peer.service: Str(telemetrygen-server)
	{"kind": "exporter", "data_type": "traces", "name": "debug"}
```

**Documentation:** No documentation added.

---------

Co-authored-by: Pablo Baeyens <pbaeyens31+github@gmail.com>
…n-telemetry#29121)

**Description:** Fixes open-telemetry#29120.
Fixing a bug - a panic happens during the stop method in async mode only
(it didn't affect the default non-async mode).
When stop is called, it closes the messageQueue channel, signaling
processMessagesAsync to stop running. However, readMessagesAsync
sometimes tries to write into the closed channel (depending on whether
the method is currently reading from the closed connection or currently
trying to write to the channel), and as a result a panic occurs.

Separated the wait groups: wg (the waitGroup that serves the non-async code
and processMessagesAsync) and the new wg_reader (the waitGroup serving
readMessagesAsync only). This allows us to stop readMessagesAsync first
and wait for it to finish before closing the channel.
Stop (in async mode) now does the following (see the sketch below):
1. Close the connection - signaling readMessagesAsync to stop - the
messageQueue channel remains open until that method is done, so
there's no risk of a panic (due to writing to a closed channel).
2. Wait for readMessagesAsync to finish (wait for the new wg_reader).
3. Close the messageQueue channel (signaling processMessagesAsync to stop).
4. Wait for processMessagesAsync to finish (wait for wg).
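
A hedged sketch of that ordering, with hypothetical names standing in for the receiver's actual fields and goroutines:

```go
package main

import (
	"fmt"
	"net"
	"sync"
)

// udpReceiverSketch is an illustrative stand-in for the stanza UDP input
// operator, showing only the shutdown ordering that avoids a send on a
// closed channel.
type udpReceiverSketch struct {
	conn         net.PacketConn
	messageQueue chan []byte
	wgReader     sync.WaitGroup // serves readMessagesAsync only
	wg           sync.WaitGroup // serves processMessagesAsync
}

func (r *udpReceiverSketch) start() {
	r.wgReader.Add(1)
	go func() { // readMessagesAsync
		defer r.wgReader.Done()
		buf := make([]byte, 64*1024)
		for {
			n, _, err := r.conn.ReadFrom(buf)
			if err != nil { // connection closed: return without touching the channel again
				return
			}
			r.messageQueue <- append([]byte(nil), buf[:n]...)
		}
	}()
	r.wg.Add(1)
	go func() { // processMessagesAsync
		defer r.wg.Done()
		for msg := range r.messageQueue {
			fmt.Printf("processed %d bytes\n", len(msg))
		}
	}()
}

func (r *udpReceiverSketch) stop() {
	r.conn.Close()        // 1. signal the reader to stop; the channel stays open
	r.wgReader.Wait()     // 2. wait for readMessagesAsync to finish
	close(r.messageQueue) // 3. no writer remains, so closing is safe
	r.wg.Wait()           // 4. wait for processMessagesAsync to drain and exit
}

func main() {
	conn, err := net.ListenPacket("udp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	r := &udpReceiverSketch{conn: conn, messageQueue: make(chan []byte, 100)}
	r.start()
	r.stop()
	fmt.Println("stopped without panic")
}
```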

**Link to tracking Issue:** 29120

**Testing:** Unit tests ran. Ran a concrete strato, stopped & restarted
multiple times, and didn't see any panic (stop completed successfully as
expected).

**Documentation:** None.
**Description:** This adds logic to filter logs based on log conditions
and send the desired logs as event markers to the Honeycomb Marker API.

**Link to tracking Issue:**
open-telemetry#27666

**Testing:** Unit testing for log exporter and config. Added component
testing to `otelcontribcol`.

**Documentation:** README describing component usage

Screenshot of exported markers showing up in Honeycomb
<img width="1225" alt="Screenshot 2023-11-14 at 1 27 49 PM"
src="https://github.com/open-telemetry/opentelemetry-collector-contrib/assets/35741033/128d689a-cf1e-4959-9df3-6c88248a7fdb">

---------

Co-authored-by: Tyler Helmuth <12352919+TylerHelmuth@users.noreply.github.com>
…n-telemetry#29022)

**Description:**
- Added a new watch to the k8s_observer extension for k8s services,
which can be enabled using a new flag "observe_services".
- Discovered entities are transformed into a new endpoint type
`k8s.service`.
- Adjusted the receivercreator to support the new type `k8s.service`


**Link to tracking Issue:**
[open-telemetry#29021](open-telemetry#29021)

**Testing:** Added unit tests analogous to the existing tests.

**Documentation:** Adjusted the READMEs of k8s_observer and
receivercreator. Added descriptions of the new flags and types.

**Note:**
Current implementation is working as described in the linked ticket.
Please check the potential discussion points mentioned in the ticket:
open-telemetry#29021 (comment)

---------

Co-authored-by: Antoine Toulme <antoine@toulme.name>
**Description:** Adds a new `IsDouble` function to facilitate type
checking. Most useful when checking the type of a body to determine
whether it needs to be parsed or not.

**Link to tracking Issue:**
open-telemetry#27895

**Testing:** Added unit test

**Documentation:** Updated the func readme.

Co-authored-by: Tyler Helmuth <12352919+TylerHelmuth@users.noreply.github.com>
…try#28894)

**Description:** I have observed some behavior on a personal collector
deployment where the EMF Exporter is still returning errors for `NaN`
json marshalling. This was in a prometheus -> emf exporter metrics
pipeline.

I could not find the specific NaN value in the metrics when
troubleshooting the error. I curled the `/metrics` endpoint and also
tried using the logging exporter to get more information. I could not
find where the NaN value was coming from, so I took another look at the
unit tests and found some possible code paths through which NaNs could
slip.
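
For illustration only (not the exporter's actual code), the guard amounts to checking candidate values with `math.IsNaN` before they reach `json.Marshal`:

```go
package main

import (
	"fmt"
	"math"
)

// dropNaN is an illustrative helper: NaN is not representable in JSON, so
// values must be checked before being marshalled.
func dropNaN(values []float64) []float64 {
	out := make([]float64, 0, len(values))
	for _, v := range values {
		if math.IsNaN(v) {
			continue // skip values that would break JSON marshalling
		}
		out = append(out, v)
	}
	return out
}

func main() {
	fmt.Println(dropNaN([]float64{1.5, math.NaN(), 2.0})) // [1.5 2]
}
```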

**Link to tracking Issue:** Original issue
open-telemetry#26267

**Testing:** Added more unit tests. The summary unit tests got a slight
refactor for two reasons: to get rid of the unnecessary typecasting, and
to more easily test different combinations of quantile values.

I have also added a few more histogram unit tests to just verify that
all combinations of NaN values are being checked on their own.
)

**Description:**
* Update AAD documentation to use connection string instead of
instrumentation key. Follow up to open-telemetry#28854
* Modified the ingestion version from 2.0 to 2.1

**Link to tracking Issue:**

**Testing:**

Existing tests.

Output from manual run

``` json
--------- Transmitting 30 items ---------       {"kind": "exporter", "data_type": "logs", "name": "azuremonitor"}
2023-11-13T10:50:23.886-0800    debug   azuremonitorexporter@v0.88.0/factory.go:139     Telemetry transmitted in 378.439395ms   {"kind": "exporter", "data_type": "logs", "name": "azuremonitor"}
2023-11-13T10:50:23.886-0800    debug   azuremonitorexporter@v0.88.0/factory.go:139     Response: 200   {"kind": "exporter", "data_type": "logs", "name": "azuremonitor"}
2023-11-13T10:50:23.886-0800    debug   azuremonitorexporter@v0.88.0/factory.go:139     Items accepted/received: 30/30 {"kind": "exporter", "data_type": "logs", "name": "azuremonitor"}
```

**Documentation:**
* Updated Authentication.md
…emetry#29309)

**Description:**
Fixes an issue with an incorrect default URL. Also fixes an issue where
the dataset slug was required.

**Link to tracking Issue:**
Related to
open-telemetry#27666

**Testing:**
Added new tests and tested manually.

**Documentation:**
Updated the README.
**Description:** Update Honeycomb Marker Exporter to alpha status

**Link to tracking Issue:** open-telemetry#27666 

**Testing:** 

**Documentation:**

---------

Co-authored-by: Tyler Helmuth <12352919+TylerHelmuth@users.noreply.github.com>
…telemetry#28651)

**Description:**
This fixes an inconsistency introduced with the creation of this package.
In open-telemetry#25096 @cparkins was added as a code owner in the
[metadata.yaml](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/pkg/translator/azure/metadata.yaml)
but not the top level `CODEOWNERS` file.

Co-authored-by: Alex Boten <aboten@lightstep.com>
When InfluxDB v1 compatibility is enabled AND username&password are set,
the exporter panics. Not any more!

Fixes open-telemetry#27084

**Testing:** I've added one regression test.
Workflows have been failing and then trying to use `issuegenerator` to
create issues, but the path to the tool was incorrect. See
https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/6895702499/job/18761957296
as an example.

Signed-off-by: Alex Boten <aboten@lightstep.com>
…metry#28866)

**Description:**
This feature adds a project config for the metrics to filter by project
name and/or clusters.

**Link to tracking Issue:**
open-telemetry#28865

**Testing:**
- Added test for cluster filtering
- Tested project name alone, project name with IncludeClusters and
project name with ExcludeClusters on a live environment with success.

**Documentation:**
Added optional project config fields to README

---------

Co-authored-by: Daniel Jaglowski <jaglows3@gmail.com>
…one function (open-telemetry#28886)

If no functions are exposed, exit with no error.

This change allows removing `extension/encoding` from the allowlist.
**Description:** 
Using the mysqlreceiver, we were getting the following error as our
MySQL server on AWS RDS requires secure transport for all connections by
setting `require_secure_transport=ON` per
https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/mysql-ssl-connections.html#mysql-ssl-connections.require-ssl

**Example log message**
`2023-10-31T10:53:30.239Z error scraperhelper/scrapercontroller.go:200
Error scraping metrics {"kind": "receiver", "name": "mysql",
"data_type": "metrics", "error": "Error 3159 (HY000): Connections using
insecure transport are prohibited while --require_secure_transport=ON.;
", "scraper": "mysql"}`
mx-psi and others added 29 commits December 7, 2023 18:21
**Description:** 

We now have pdata 1.0.0 🎉. After
open-telemetry/opentelemetry-collector/pull/8975, we decided not to have
RC releases, so there is no need to have the RC block.
**Description:**

Drawing inspiration from
https://github.com/bazelbuild/starlark#design-principles and
https://github.com/google/cel-spec/blob/master/doc/langdef.md#overview,
add a brief section about design principles.

The aim of this is to ensure OTTL is and remains safe for execution of
untrusted programs in multi-tenant systems, where tenants can provide
their own OTTL programs.

---------

Co-authored-by: Evan Bradley <11745660+evan-bradley@users.noreply.github.com>
**Link to tracking Issue:**
Fixed
open-telemetry#29568

**Testing:**
The demo-client and demo-server services show up in Jaeger.
…ry#29692)

@marctc is the original component proposer and author, and is now a
[member of the OpenTelemetry
community](open-telemetry/community#1761). He
also [expressed interest in being a code
owner](open-telemetry#24409 (comment))
when I asked.
)

Fixes open-telemetry#28647

After this is merged contributors can finally use go workspaces in this
repo.

Fixes open-telemetry#26567

---------

Signed-off-by: Alex Boten <aboten@lightstep.com>
Signed-off-by: Yuri Shkuro <github@ysh.us>
Co-authored-by: Yuri Shkuro <github@ysh.us>
…try#29658)

I need to resign from a few components, as I'm not doing a good job in
keeping track of what needs to be done for them. Asking around,
@yurishkuro volunteered to take over the Jaeger related ones.

Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>

---------

Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>
**Description:**

The configuration for logs can be seen as stable, as we have many users
using the coralogix exporter in production with logs. The last changes to
it were mainly documentation updates
(open-telemetry@d5d6480)

Skipping changelog as this is documentation update.

**Link to tracking Issue:**

**Testing:**

**Documentation:**

- Update docs
…st metrics. (open-telemetry#27299)

**Description:** The `node_<cpu|memory>_request` metrics and metrics
derived from them (`node_<cpu|memory>_reserved_capacity`) differ from
the output of `kubectl describe node <node_name>`. This is because
kubectl [filters out terminated
pods](https://github.com/kubernetes/kubectl/blob/302f330c8712e717ee45bbeff27e1d3008da9f00/pkg/describe/describe.go#L3624).
See linked issue for more details.

Adds a filter for terminated (succeeded/failed state) pods. 
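
A hedged sketch of the filtering idea, assuming the `k8s.io/api/core/v1` types; it mirrors kubectl's behaviour rather than reproducing the receiver's actual code:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// nonTerminatedPods drops pods in a terminal phase, since terminated pods no
// longer hold node resources and should not count toward node requests.
func nonTerminatedPods(pods []corev1.Pod) []corev1.Pod {
	out := make([]corev1.Pod, 0, len(pods))
	for _, p := range pods {
		if p.Status.Phase == corev1.PodSucceeded || p.Status.Phase == corev1.PodFailed {
			continue
		}
		out = append(out, p)
	}
	return out
}

func main() {
	pods := []corev1.Pod{
		{Status: corev1.PodStatus{Phase: corev1.PodRunning}},
		{Status: corev1.PodStatus{Phase: corev1.PodSucceeded}},
	}
	fmt.Println(len(nonTerminatedPods(pods))) // 1
}
```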

**Link to tracking Issue:**
open-telemetry#27262

**Testing:** Added unit test to validate pod state filtering. Built and
deployed changes to cluster. Deployed `cpu-test` pod.


![image](https://github.com/amazon-contributing/opentelemetry-collector-contrib/assets/84729962/b557be2d-e14e-428a-895a-761f7724d9bd)


The gap is when the change was deployed. The metric drops after the
deployment due to the filter. The metric can be seen spiking up while
the `cpu-test` pod is running (~19:15) and then returns to the previous
request size after it has terminated.

**Documentation:** N/A
This file is not referenced by any tests.
)

The prometheus exporter hit a panic when accumulating `Delta` metrics
into `Cumulative` sums. This is because the exporter did not enable
mutating data in its capabilities. This change enables the exporter to
mutate data in a safe and supported way.

Fixed open-telemetry#29574
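
For context, the capability in question is the collector's `consumer.Capabilities` flag; a minimal, hedged illustration (not the exporter's actual wiring):

```go
package main

import (
	"fmt"

	"go.opentelemetry.io/collector/consumer"
)

// capabilities illustrates the flag involved: a component that modifies
// incoming pdata (e.g. accumulating Delta into Cumulative in place) must
// advertise MutatesData, otherwise it may write to read-only data and panic.
func capabilities() consumer.Capabilities {
	return consumer.Capabilities{MutatesData: true}
}

func main() {
	fmt.Printf("%+v\n", capabilities()) // {MutatesData:true}
}
```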

**Testing**
There are existing tests that hit the logic that was panicking, but the
metrics are set to `StateMutable` in testing (which is the only way they
can be created and set up for testing). I believe that means that before
this change the tests were invalid (they didn't represent reality), but
after this change they'll properly represent the exporter's functionality.
…y#29625)

**Description:** Logstash format compatibility. Traces or Logs data can
be written into an index in logstash format.

**Link to tracking Issue:**
Closes
open-telemetry#29624

**Documentation:** Added some descriptions for the `logstash_format`
configuration options.
1. otel-col.yaml
```yaml
receivers:
  otlp:
    protocols:
      grpc:
  filelog:
    include: [ ./examples/kubernetes/varlogpods/containerd_logs-0_000011112222333344445555666677778888/logs/0.log ]
    start_at: beginning
    operators:
      # Find out which format is used by kubernetes
      - type: router
        id: get-format
        routes:
          - output: parser-docker
            expr: 'body matches "^\\{"'
          - output: parser-crio
            expr: 'body matches "^[^ Z]+ "'
          - output: parser-containerd
            expr: 'body matches "^[^ Z]+Z"'
      # Parse CRI-O format
      - type: regex_parser
        id: parser-crio
        regex: '^(?P<time>[^ Z]+) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$'
        output: extract_metadata_from_filepath
        timestamp:
          parse_from: attributes.time
          layout_type: gotime
          layout: '2006-01-02T15:04:05.999999999Z07:00'
      # Parse CRI-Containerd format
      - type: regex_parser
        id: parser-containerd
        regex: '^(?P<time>[^ ^Z]+Z) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$'
        output: extract_metadata_from_filepath
        timestamp:
          parse_from: attributes.time
          layout: '%Y-%m-%dT%H:%M:%S.%LZ'
      # Parse Docker format
      - type: json_parser
        id: parser-docker
        output: extract_metadata_from_filepath
        timestamp:
          parse_from: attributes.time
          layout: '%Y-%m-%dT%H:%M:%S.%LZ'
      # Extract metadata from file path
      - type: regex_parser
        id: extract_metadata_from_filepath
        regex: '^.*\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<uid>[a-f0-9\-]{36})\/(?P<container_name>[^\._]+)\/(?P<restart_count>\d+)\.log$'
        parse_from: attributes["log.file.path"]
        cache:
          size: 128  # default maximum amount of Pods per Node is 110
      # Update body field after finishing all parsing
      - type: move
        from: attributes.log
        to: body
      # Rename attributes
      - type: move
        from: attributes.stream
        to: attributes["log.iostream"]
      - type: move
        from: attributes.container_name
        to: resource["k8s.container.name"]
      - type: move
        from: attributes.namespace
        to: resource["k8s.namespace.name"]
      - type: move
        from: attributes.pod_name
        to: resource["k8s.pod.name"]
      - type: move
        from: attributes.restart_count
        to: resource["k8s.container.restart_count"]
      - type: move
        from: attributes.uid
        to: resource["k8s.pod.uid"]
exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"
    const_labels:
      label1: value1

  elasticsearch/log:
    tls:
      insecure: false
    endpoints: [http://localhost:9200]
    logs_index: otlp-logs
    logstash_format:
      enabled: true
    timeout: 2m
    flush:
      bytes: 10485760
    retry:
      max_requests: 5
    sending_queue:
      enabled: true
  elasticsearch/traces:
    tls:
      insecure: false
    endpoints: [http://localhost:9200]
    traces_index: otlp-traces
    logstash_format:
      enabled: true
    timeout: 2m
    flush:
      bytes: 10485760
    retry:
      max_requests: 5
    sending_queue:
      enabled: true

  debug:

processors:
  batch:

extensions:
  health_check:
  pprof:
    endpoint: :1888
  zpages:
    endpoint: :55679

service:
  extensions: [pprof, zpages, health_check]
  pipelines:
    logs:
      receivers: [otlp,filelog]
      processors: [batch]
      exporters: [debug, elasticsearch/log]
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug, elasticsearch/traces]

```
3. ES index created when `otel-col` writes traces and logs:
<img width="913" alt="image"
src="https://github.com/open-telemetry/opentelemetry-collector-contrib/assets/12468337/0ede0fd7-ed85-4fd4-b843-093c13edc1e3">

4. query index data:
<img width="743" alt="image"
src="https://github.com/open-telemetry/opentelemetry-collector-contrib/assets/12468337/1e89a44c-cead-4aab-8b3a-284a8b573d3b">
<img width="817" alt="image"
src="https://github.com/open-telemetry/opentelemetry-collector-contrib/assets/12468337/429c25bc-336e-4850-9d83-ed7423f38e90">

---------

Signed-off-by: Jared Tan <jian.tan@daocloud.io>
)

There were some linting failures introduced in open-telemetry#27247. These are Windows
and any non-Linux OS-specific linting failures.
…n-telemetry#29573)

Fixing a regression error introduced in
open-telemetry#29095

The name of the `timeField` in `generateTTLExpr` was ignored and
defaulted to `Timestamp`. The problem is that different tables have
different names for this field. Now it is specified at each table
creation.

---------

Co-authored-by: Alex Boten <aboten@lightstep.com>
**Description:**
Adds a new ErrorMode, `silent`, that `StatementSequence` and
`ConditionSequence` can use to disable logging when ignoring errors.

**Link to tracking Issue:** 

Closes
open-telemetry#22743

**Testing:**
Updated unit tests

**Documentation:** 
Updated READMEs and godoc comments.
Signed-off-by: Dmitrii Anoshin <anoshindx@gmail.com>
…tibility (open-telemetry#29662)

**Description:** This PR supplements the receiver `influxdbreceiver`
with an implementation of the `/ping`
[endpoint](https://docs.influxdata.com/influxdb/v2/api/#operation/GetPing).
Various third-party applications use this to check the availability of
the receiver before sending metrics, e.g. checkmk.
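
A minimal sketch of such an availability endpoint (illustrative port and header, not the receiver's actual implementation): respond with 204 No Content so clients can probe before sending data.

```go
package main

import (
	"log"
	"net/http"
)

func main() {
	// /ping in the spirit of the InfluxDB API: no body, just 204 No Content.
	http.HandleFunc("/ping", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("X-Influxdb-Version", "dev") // illustrative only
		w.WriteHeader(http.StatusNoContent)
	})
	log.Fatal(http.ListenAndServe("127.0.0.1:8086", nil))
}
```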

**Link to tracking Issue:** open-telemetry#29594

**Testing:** Basic tests and end to end testing with the third party
application
[checkmk](https://docs.checkmk.com/latest/en/metrics_exporter.html).

**Documentation:** No additional documentation has been added.
- The user does not interact directly with this endpoint.
- There are no configuration options.
**Description:** 
@bryan-aguilar has been showing good judgement while helping out as a
triager, codeowner, and community member. He has
[authored](https://github.com/open-telemetry/opentelemetry-collector-contrib/pulls/bryan-aguilar)
and
[reviewed](https://github.com/open-telemetry/opentelemetry-collector-contrib/pulls?q=is%3Apr+is%3Aopen+reviewed-by%3Abryan-aguilar+)
lots of PRs and would be a big help as an Approver.

@bryan-aguilar please approve this PR if you'd like to be an Approver
for Collector Contrib
…wn (open-telemetry#29707)

**Description:**

This change allows validation to pass even when some of the K8s APIs are
down; we will look through the groups and resources for the ones that are
available.

**Link to tracking Issue:**


open-telemetry#29706

**Testing:**
- manually in a kind cluster with metrics-server being down.

**Documentation:**
Regenerate CODEOWNERS manually:
* add @braydonk to allowlist
* remove @eedorenko from allowlist since he is now a member
* fix typos, remove extra text that is no longer valid as the file is
generated.
Same description as in
open-telemetry/opentelemetry-collector#9022

This PR enables the HTTP/2 health check to work around the issue described
here: open-telemetry/opentelemetry-collector#9022

As to why I chose 10 seconds for `HTTP2ReadIdleTimeout` and 10 seconds
for `HTTP2PingTimeout`: those values have been tested in production and,
in an active environment (with the default HTTP timeout of 10 seconds and
default retry settings), they will result in a single export failure (2 at
most) before the health check detects the corrupted TCP connection and
closes it.
The only drawback is that if the connection has not been used for over 10
seconds, we might end up sending unnecessary ping frames, which should not
be an issue; if it becomes an issue, we can tune those settings.
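
For reference, the knobs involved map to `ReadIdleTimeout` and `PingTimeout` on the HTTP/2 transport; the sketch below is a standalone illustration with assumed values, not the exporter's confighttp-based wiring.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"time"

	"golang.org/x/net/http2"
)

// newHTTP2Client shows the health-check knobs: ReadIdleTimeout makes the
// client send an HTTP/2 ping when a connection has seen no frames for that
// long, and PingTimeout closes the connection if the ping gets no reply.
func newHTTP2Client() *http.Client {
	tr := &http2.Transport{
		TLSClientConfig: &tls.Config{MinVersion: tls.VersionTLS12},
		ReadIdleTimeout: 10 * time.Second, // ping after 10s without frames
		PingTimeout:     10 * time.Second, // drop the connection if the ping times out
	}
	return &http.Client{Transport: tr, Timeout: 10 * time.Second}
}

func main() {
	client := newHTTP2Client()
	fmt.Printf("transport: %T\n", client.Transport)
}
```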

The SFX exporter has multiple HTTP clients:
- Metric client, Trace client, and Event client. These clients will have
the HTTP/2 health check enabled by default, as they share the same default
config.
- Correlation client and Dimension client will NOT have the HTTP/2 health
check enabled. We can revisit this if needed.

**Link to tracking Issue:**

**Testing:**
- Run OTEL with one of the exporters that uses HTTP/2 client, example
`signalfx` exporter
- For simplicity use a single pipeline/exporter
- In a different shell, run this to watch the tcp state of the
established connection
```
 while (true); do echo date; sudo netstat -anp | grep -E '<endpoint_ip_address(es)>' | sort -k 5; sleep 2; done
 ```  
- From the netstat, take a note of the source port and the source IP address
- replace <> from previous step
`sudo iptables -A OUTPUT -s <source_IP> -p tcp --sport <source_Port> -j DROP`
- Note how the OTEL exporter export starts timing out

Expected Result:
- A new connection should be established, similarly to http/1 and exports should succeed

Actual Result: 
- The exports keep failing for  ~ 15 minutes or for whatever the OS `tcp_retries2` is configured to
- After 15 minutes, a new tcp connection is created and exports start working

**Documentation:**
The README is updated.

Signed-off-by: Dani Louca <dlouca@splunk.com>
Fixes
open-telemetry#29723

Signed-off-by: Robbie Lankford <robert.lankford@grafana.com>
**Description:**
This PR enables the HTTP/2 health check to work around the issue described
here: open-telemetry/opentelemetry-collector#9022

As to why I chose 10 seconds for `HTTP2ReadIdleTimeout` and ~~5
seconds~~ 10 seconds (see review comment) for `HTTP2PingTimeout`: those
values have been tested in production and, in an active environment (with
the default HTTP timeout of 10 seconds and default retry settings), they
will result in a single export failure at most before the health check
detects the corrupted TCP connection and closes it.
The only drawback is that if the connection has not been used for over 10
seconds, we might end up sending unnecessary ping frames, which should not
be an issue; if it becomes an issue, we can tune those settings.

The SFX exporter has multiple HTTP clients:
- Metric client, Trace client, and Event client. These clients will have
the HTTP/2 health check enabled by default, as they share the same default
config.
- Correlation client and Dimension client will NOT have the HTTP/2 health
check enabled. We can revisit this if needed.

**Testing:** 
- Run OTEL with one of the exporters that uses HTTP/2 client, example
`signalfx` exporter
- For simplicity use a single pipeline/exporter
- In a different shell, run this to watch the tcp state of the
established connection
```
 while (true); do echo date; sudo netstat -anp | grep -E '<endpoint_ip_address(es)>' | sort -k 5; sleep 2; done
 ```  
- From the netstat, take a note of the source port and the source IP address
- replace <> from previous step
`sudo iptables -A OUTPUT -s <source_IP> -p tcp --sport <source_Port> -j DROP`
- Note how the OTEL exporter export starts timing out

Expected Result:
- A new connection should be established, similarly to http/1 and exports should succeed

Actual Result: 
- The exports keep failing for  ~ 15 minutes or for whatever the OS `tcp_retries2` is configured to
- After 15 minutes, a new tcp connection is created and exports start working

**Documentation:**
The README is updated.

**Disclaimer:**
Not all HTTP/2 servers support H2 ping; however, this should not be a concern, as our ingest servers do support H2 ping.
But if you are routing you can check whether H2 ping is supported using this script: golang/go#60818 (comment)

Signed-off-by: Dani Louca <dlouca@splunk.com>
…metry#29725)

Adds the extension remotetapextension to cmd/otelcontribcol.
Preparing for 0.91.0 release

---------

Signed-off-by: Dmitrii Anoshin <anoshindx@gmail.com>
The following commands were run to prepare this release:
- make chlog-update VERSION=v0.91.0
- sed -i.bak s/0.90.1/0.91.0/g versions.yaml
- make multimod-prerelease
- make multimod-sync
lokesh-balla merged commit a4e6819 into main Feb 7, 2024
15 of 21 checks passed