
Added timeout to the exporter path #1201

Closed

Conversation

@jpkrohling (Member) commented on Jun 26, 2020

Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>

Description:

  1. Added a context with timeout to the fan-out processor (see the sketch after this list).
  2. Wrapped errors recorded during the consume functions to include context about the error (where it happened and what kind of data failed to be consumed).
  3. Changed the batch processor to stop ignoring errors and instead log them as Warn, in line with the queued processor.
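
To make item 1 concrete, here is a minimal sketch of the idea, not the PR's actual code: the consumer interface and the helper name are simplified stand-ins, and the real fan-out connector in the collector uses its own types.

```go
package fanoutsketch

import (
	"context"
	"fmt"
	"time"
)

// traceConsumer is a simplified stand-in for the collector's trace consumer
// interface; the real interface and payload types differ.
type traceConsumer interface {
	ConsumeTraces(ctx context.Context, td interface{}) error
}

// consumeWithTimeout is a hypothetical helper illustrating items 1 and 2:
// bound each downstream consume call with a context deadline so a stuck
// exporter cannot block the pipeline forever, and wrap any error with
// information about where it happened.
func consumeWithTimeout(ctx context.Context, c traceConsumer, td interface{}, d time.Duration) error {
	ctx, cancel := context.WithTimeout(ctx, d)
	defer cancel()
	if err := c.ConsumeTraces(ctx, td); err != nil {
		return fmt.Errorf("fanout connector: failed to consume trace data: %w", err)
	}
	return nil
}
```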

Link to tracking Issue: Resolves #1193

Testing: manual tests, as per #1193

Documentation: pending

@jpkrohling (Member, Author) commented:

This PR is to start a discussion about the right way to approach this problem. Should the timeout be configurable? If so, how?

@codecov bot commented on Jun 26, 2020

Codecov Report

Merging #1201 into master will decrease coverage by 0.04%.
The diff coverage is 39.47%.


```diff
@@            Coverage Diff             @@
##           master    #1201      +/-   ##
==========================================
- Coverage   87.56%   87.52%   -0.05%
==========================================
  Files         203      203
  Lines       14555    14568      +13
==========================================
+ Hits        12745    12750       +5
- Misses       1370     1376       +6
- Partials      440      442       +2
```
| Impacted Files | Coverage Δ |
|---|---|
| exporter/opencensusexporter/opencensus.go | 45.05% <0.00%> (ø) |
| exporter/otlpexporter/otlp.go | 63.24% <16.66%> (ø) |
| exporter/zipkinexporter/zipkin.go | 72.54% <25.00%> (-2.97%) ⬇️ |
| processor/batchprocessor/batch_processor.go | 95.77% <33.33%> (-1.39%) ⬇️ |
| exporter/jaegerexporter/exporter.go | 86.20% <50.00%> (ø) |
| processor/fanoutconnector.go | 82.50% <66.66%> (-0.36%) ⬇️ |
| service/service.go | 51.37% <0.00%> (-0.69%) ⬇️ |
| translator/internaldata/resource_to_oc.go | 73.52% <0.00%> (ø) |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@jpkrohling force-pushed the jpkrohling/issue1193 branch from cfcc05c to 5c5a2f5 on June 26, 2020 09:30
```diff
@@ -71,7 +72,7 @@ func (s *protoGRPCSender) pushTraceData(
 
 	batches, err := jaegertranslator.InternalTracesToJaegerProto(td)
 	if err != nil {
-		return td.SpanCount(), consumererror.Permanent(err)
+		return td.SpanCount(), consumererror.Permanent(errors.Wrap(err, "failed to push trace data via Jaeger exporter"))
```
A Member commented:
IIRC, fmt.Errorf is now used more within the codebase.

Another Member commented:
Looks like people prefer Wrap; we should converge on that :)

@jpkrohling (Member, Author) commented:
I had fmt.Errorf but noticed that the code base seems to use errors.Wrap. I don't have a preference, to be honest :)

A Contributor commented:
I think we should use fmt.Errorf with %w, since we've moved to Go 1.14 and the new errors package is available now. We can actually get rid of the pkg/errors package.

@nilebox (Member) commented on Jul 7, 2020:
The nice thing about errors.Wrap is that it's type-safe and you don't need to add ": %w" to every single fmt.Errorf call that wraps another error.
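
To illustrate the trade-off being discussed, here is a minimal comparison sketch (hypothetical function names; it assumes pkg/errors v0.9.0+, which implements Unwrap so the standard errors helpers work on both variants):

```go
package wrapsketch

import (
	"errors"
	"fmt"

	pkgerrors "github.com/pkg/errors"
)

var errPush = errors.New("connection refused")

// wrapWithPkgErrors wraps with github.com/pkg/errors, as the PR diff does;
// the message is prepended without writing ": %w" by hand.
func wrapWithPkgErrors() error {
	return pkgerrors.Wrap(errPush, "failed to push trace data via Jaeger exporter")
}

// wrapWithStdlib wraps with fmt.Errorf and the %w verb (Go 1.13+), as
// suggested in the review; no extra dependency, but the ": %w" suffix is
// easy to forget.
func wrapWithStdlib() error {
	return fmt.Errorf("failed to push trace data via Jaeger exporter: %w", errPush)
}

// With pkg/errors v0.9.0+ both variants unwrap the same way:
//
//	errors.Is(wrapWithPkgErrors(), errPush) // true
//	errors.Is(wrapWithStdlib(), errPush)    // true
```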

@bogdandrutu (Member) left a review comment:

LGTM, except for the use of the timeout in fanoutconnector. I think we should probably have this logic in queued_retry or in every exporter.

```diff
@@ -25,6 +26,8 @@ import (
 	"go.opentelemetry.io/collector/internal/data"
 )
 
+const timeout = 5 * time.Second
```
A Member commented:

The fanoutconnector is used in multiple places (between different components), and we should not have the timeout logic here.

@jpkrohling (Member, Author) commented:

> I think we should have this logic probably in queued_retry or in every exporter.

Adding it to every exporter might still allow a misbehaving exporter to block the pipeline. Adding it to the queued retry could be a solution, but that processor isn't applied by default, right? I think the collector needs protection against deadlocks and misbehaving processors/exporters.

Ideally, as someone managing a collector instance, I would probably prefer to set a deadline for the entire pipeline (or processors, or exporters), as opposed to having a timeout per exporter.
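
As an illustration of that preference, and only as a hedged sketch (hypothetical helper, not what the collector implements), a pipeline-wide deadline could be set once where data enters the pipeline and inherited by every stage through the context:

```go
package deadlinesketch

import (
	"context"
	"fmt"
	"time"
)

// runPipeline is a hypothetical illustration of a pipeline-wide deadline:
// the budget is set once, every stage receives the same context, and a
// single slow stage exhausts the shared budget instead of its own timer.
func runPipeline(ctx context.Context, stages []func(context.Context) error, budget time.Duration) error {
	ctx, cancel := context.WithTimeout(ctx, budget)
	defer cancel()

	for i, stage := range stages {
		if err := stage(ctx); err != nil {
			return fmt.Errorf("pipeline stage %d failed: %w", i, err)
		}
		// Even if a stage ignores ctx, the expired deadline is visible here.
		if err := ctx.Err(); err != nil {
			return fmt.Errorf("pipeline deadline exceeded after stage %d: %w", i, err)
		}
	}
	return nil
}
```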

@bogdandrutu (Member) commented:

@jpkrohling let's try to make progress on this:

  1. Split out a first PR (all changes except processor/fanoutconnector.go)
  2. Analyze the possibility of adding a deadline to the context - I will think about the right solution as well.

What do you think?

@jpkrohling (Member, Author) commented:

Done: #1259

@jpkrohling (Member, Author) commented:

@bogdandrutu, any ideas on how to move forward with the deadline?

@bogdandrutu (Member) commented:

@jpkrohling if I add it to the queued retry, is that good enough? Also, currently this is not a deadlock, just a stuck operation :)

@jpkrohling (Member, Author) commented:

> if I add it to the queued retry, is that good enough?

Are all users instructed to use the queued retry? If so, sure. The main idea is to cancel the execution of a pipeline component taking an unreasonable amount of time to complete.

> Also, currently this is not a deadlock, just a stuck operation :)

Right, I meant cases in the future where we might get affected by new bugs ;-)
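
For the "cancel the execution of a pipeline component" idea above, here is a minimal sketch (hypothetical helper, not the collector's actual mechanism) of making a potentially slow operation cancellable:

```go
package cancelsketch

import (
	"context"
	"fmt"
)

// consumeOrCancel runs work in a goroutine and returns as soon as either the
// work finishes or the context is cancelled/times out, so a stuck component
// no longer blocks its caller. The buffered channel lets the goroutine exit
// even when the caller has already given up waiting.
func consumeOrCancel(ctx context.Context, work func() error) error {
	done := make(chan error, 1)
	go func() { done <- work() }()

	select {
	case err := <-done:
		return err
	case <-ctx.Done():
		return fmt.Errorf("operation cancelled: %w", ctx.Err())
	}
}
```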

@bogdandrutu (Member) commented:

@jpkrohling in #1386 I added a timeout (5s) for all the exporters that use the new interface (the only one that does not is zipkin).

@jpkrohling (Member, Author) commented:

Closing in favor of #1386.

@jpkrohling closed this on Jul 17, 2020
Successfully merging this pull request may close these issues:

- gRPC clients need to always set a deadline/timeout