Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TEP-0124] Proposal to add distributed tracing for tekton tasks and pipelines #839

Merged
merged 5 commits into from
Oct 12, 2022
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
add references and requirements
Signed-off-by: Jayadeep KM <kmjayadeep@gmail.com>
  • Loading branch information
kmjayadeep committed Oct 4, 2022
commit b870b694cca49a5bd1de2e050c44e3ab0a3d3557
18 changes: 18 additions & 0 deletions teps/0124-distributed-tracing-for-tasks-and-pipelines.md
Original file line number Diff line number Diff line change
Expand Up @@ -14,10 +14,12 @@ authors:
- [Goals](#goals)
- [Non-Goals](#non-goals)
- [Use Cases](#use-cases)
- [Requirements](#requirements)
- [Proposal](#proposal)
- [PipelineRun controller](#pipelinerun-controller)
- [TaskRun controller](#taskrun-controller)
- [Test Plan](#test-plan)
- [References](#references)
<!-- /toc -->

## Summary
Expand Down Expand Up @@ -50,6 +52,15 @@ Tekton Developer:
* I would like to understand the duration of each reconciliation step, so that I can optimize the code to improve reconciliation performance
* When the pipelines are failing due to a bug, I would like to understand which reconciliation logic caused the issue so that I can easily fix the problem

### Requirements

* Trace all functions in the PipelineRun controller
* Trace all functions in the TaskRun controller
* Support Jaeger backend
* Propagate traces so that subsequent reconciles of the same resource belong to the same trace
* Propagate traces so that reconciles of a resource owned by a parent resource belong a parent span from the parent resource
* Reconcile of different resources must belong to separate traces

## Proposal
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A small architecture diagram of how jaeger integrates with tekton

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how to come up with an architecture diagram for this. There are only two relevant components, Tekton operator and Jaeger. Tekton operator just records the spans during reconcile process and flush them to jaeger on specific intervals.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Just wanted to make sure if there was anything out of the ordinary being introduced.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @pxp928 - a diagram would be nice indeed.
We could add a diagram in the next PR when we propose this as implementable if that's ok


Initialize a tracer provider with jaeger as the backend for each reconciler. The jaeger collector URL can be passed as an argument to the controller.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passing the URL as an argument to the controller should be enough for development environments.
Eventually, it can be an enhancement out of scope for this initial TEP, we might want to implement reading configuration from a config map, so that configuration may be altered at runtime, similar to https://pkg.go.dev/knative.dev/pkg/tracing. knative/pkg relies on OpenCensus for now, and we would like to to start using OpenTelemetry right from the beginning.

Expand All @@ -70,3 +81,10 @@ The spancontext should be also made available as environment variables container
### Test Plan

There must be unit tests for recording of spans and e2e tests for context propogation through custom resources.

## References

[Instrument Tekton resources for tracing](https://github.com/tektoncd/pipeline/issues/2814)
[OpenTelemetry](https://opentelemetry.io/)
[OpenTelemetry instrumentation in GO](https://opentelemetry.io/docs/instrumentation/go/manual/)
[Jaeger Tracing](https://www.jaegertracing.io/)