Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add auth data to pdata resource #3745

Closed

Conversation

jpkrohling
Copy link
Member

This PR resolves #2733 by building on top of a previous PoC by @tigrannajaryan, but adapted a bit to allow the inject/extract of auth data to be performed by the configauth package. Other packages, such as the client ones, could use the same approach to inject/extract data.

I tested this with a configuration like this:

extensions:
  oidc:
    # see the blog post on securing the otelcol for information
    # on how to setup an OIDC server and how to generate the TLS certs
    # required for this example
    # https://medium.com/opentelemetry/securing-your-opentelemetry-collector-1a4f9fa5bd6f
    issuer_url: http://localhost:8080/auth/realms/opentelemetry
    audience: collector
    groups_claim: tenant

  bearertokenauth:
    token: "..."

receivers:
  # this is a receiver for the local agent, no auth required
  otlp/noauth:
    protocols:
      grpc:
        endpoint: localhost:4317

  # this is the remote otelcol, auth required
  otlp/auth:
    protocols:
      grpc:
        endpoint: localhost:4318
        tls_settings:
          cert_file: /home/jpkroehling/Projects/src/github.com/open-telemetry/opentelemetry-collector/local/certs/cert.pem
          key_file: /home/jpkroehling/Projects/src/github.com/open-telemetry/opentelemetry-collector/local/certs/cert-key.pem
        auth:
          authenticator: oidc

processors:

exporters:
  # the exporter for the agent pipeline
  # refer to the blog post mentioned above for more info on the token and TLS config
  otlp/auth:
    endpoint: localhost:4318
    ca_file: /home/jpkroehling/Projects/src/github.com/open-telemetry/opentelemetry-collector/local/certs/ca.pem
    retry_on_failure:
      enabled: false
    auth:
      authenticator: bearertokenauth

  logging:

  jaeger:
    endpoint: localhost:14250
    insecure: true

service:
  extensions:
    - oidc
    - bearertokenauth
  pipelines:
    # this represents an otelcol instance running as an agent, perhaps in a sidecar
    # the agent has a bearer token, to authenticate with the remote collector
    traces/agent:
      receivers:
        - otlp/noauth
      processors: []
      exporters:
        - otlp/auth

    # this represents a remote otelcol, requiring auth
    traces/collector:
      receivers:
        - otlp/auth
      processors: []
      exporters:
        - logging
        - jaeger # if data is visible in Jaeger, auth worked

When sending a trace to the agent pipeline, it eventually reaches the collector pipeline, which prints the following to the console:

2021-07-30T15:10:28.703+0200	INFO	loggingexporter/logging_exporter.go:54	TracesExporter	{"#spans": 2, "sub": "c67217c1-1278-4da7-b272-916624d9aa13", "groups": ["acme"]}

This confirms that the data is propagated down the pipeline to the exporter.

There's still a couple of points to work on:

  • The original proposal by @tigrannajaryan changed files like model/internal/data/protogen/trace/v1/trace.pb.go, which should not be changed. What's the appropriate way to handle this?
  • Tests and documentation. Once the point above is clarified, I'll work on this.

Comment on lines 25 to 29
type AuthContext struct {
sub string
raw string
group []string
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't like that this imposes a rigid structure on what is an authentication information.
In the PoC I gave a an example of a passthrough authenticator that has neither sub, nor raw nor group, but is a header value. Sure, we could shoehorn it into raw, but then virtually anything can be argued to be represented in some raw string form.
IMO, AuthContext should be an interface here and each concrete auth extension will have a specific way to implement it. OIDC auth will have (sub,raw,group) tuple, other auth extensions will have something else instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a contract with basic fields so that processors can use basic auth primitives, like "username" or "group membership". Otherwise, the contract between components will be on keys within the map, which is way weaker. Usernames (or principal, subject) and group membership (or roles) are the basic constructs in auth/z systems.

Copy link
Member

@tigrannajaryan tigrannajaryan Aug 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise, the contract between components will be on keys within the map, which is way weaker.

@jpkrohling actually a map may be a good approach. The interaction between the processors and PDataContext is unlikely to be hard-coded to benefit from strong typing. It is most likely going to be config-driven anyway, with the user specifying in the configuration the name of the PDataContext's attribute to use (e.g. in the example of routing processor you posted, the config refers to "membership" attribute directly).

I think we can define AuthContext interface like this:

type AuthContext interface {
  Equal(other interface{}) bool
  GetAttribute(attrName string) interface{} 
}

oidcauthextension.AuthContext then can define GetAttribute such that it return the right values for the 3 supported special attribute names ("principal", "membership", whatever is the third one).

This will allow processors to query for AuthContext attributes.

What do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds reasonable. Let me give it a try.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does it look now? It does require more work from users to tie the components together, but I don't think it's too bad. We just need to make sure every auth component documents what it produces.

exporter/loggingexporter/logging_exporter.go Outdated Show resolved Hide resolved
@tigrannajaryan
Copy link
Member

  • The original proposal by @tigrannajaryan changed files like model/internal/data/protogen/trace/v1/trace.pb.go, which should not be changed. What's the appropriate way to handle this?

Unfortunately I don't see any other way except patching generated *.pb.go files after Protobuf compiler generates them. Ideally the compiler would have a capability like this (i.e. declare a field that is not serializable), but there is no such thing.

Comment on lines +156 to +158
func (ms ResourceSpans) PDataContext() PDataContext {
return newPDataContext(&(*ms.orig).Context)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be auto-generated.

Comment on lines +197 to 262
// This is expected to be ignored by marshaling/unmarshaling. Double-check that this is true.
Context go_opentelemetry_io_collector_model_internal_data.PDataContext
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be auto-generated

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the part that we don't know how to do nicely. Should we use sed to patch this file as part of make genproto target, after Protobuf compiler does its part of the job? Do we have any other options?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bogdandrutu, I'll need your help here.

Copy link
Member Author

@jpkrohling jpkrohling Aug 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

package data

type PDataContext struct {
List []PDataContextKeyValue
Copy link
Member

@bogdandrutu bogdandrutu Aug 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use the already defined "AnyValue"/Attributes?

Copy link
Member

@tigrannajaryan tigrannajaryan Aug 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for the Key, it needs to be an arbitrary interface{} (and not just a string), like it is done for context.Context, but for the Value it is an interesting idea, since it may make some code that needs to deal with both telemetry attributes and PDataContext easier to write. It is worth exploring.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean pdata.AttributeValue? If so, that would cause an import cycle that is not allowed.

Copy link
Member

@tigrannajaryan tigrannajaryan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jpkrohling !
I think I like it, with some minor comments/questions.

config/configauth/pdatacontext.go Outdated Show resolved Hide resolved
config/configauth/pdatacontext.go Outdated Show resolved Hide resolved
config/configauth/pdatacontext.go Outdated Show resolved Hide resolved
@github-actions
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@jpkrohling
Copy link
Member Author

This is not stale. Some things are in my queue and I need input from @bogdandrutu, which is likely busy with the moving of the components.

@bputt-e
Copy link

bputt-e commented Aug 27, 2021

@jpkrohling how do you plan on making the auth data available to where we can add it to traces/logs/metrics?

We're planning on ingesting multiple tenants from one API endpoint and passing the data off to kafka and taking the auth reference and associating it back to the tenant so we can segregate their data.

Also not sure if relevant or not, but we took inspiration from your medium article and decided to use ory hydra instead, seems to work thus far :)

@jpkrohling
Copy link
Member Author

how do you plan on making the auth data available to where we can add it to traces/logs/metrics?

The current idea is to create a new context within the pdata which will contain a map with the auth data. I haven't had much time to work in this, but this is high priority in my queue.

Also not sure if relevant or not, but we took inspiration from your medium article and decided to use ory hydra instead, seems to work thus far :)

Thanks for letting me know!

@bputt-e
Copy link

bputt-e commented Sep 1, 2021

@jpkrohling also if it helps provide context, our plans are to use otel collector that ships to kafka that has the grafana-agent read from kafka and post the spans to loki, cortex(prometheus), and tempo. We're hoping we can abstract the clientId and use that as the x-scope-orgId to support multitenancy (for loki/cortext) and still share spans in tempo so that each team can see how their system interacts with others in the ecosystem.

@tigrannajaryan tigrannajaryan mentioned this pull request Sep 2, 2021
3 tasks
@github-actions
Copy link
Contributor

github-actions bot commented Sep 9, 2021

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Sep 9, 2021
@bogdandrutu bogdandrutu removed the Stale label Sep 9, 2021
@github-actions
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions
Copy link
Contributor

github-actions bot commented Oct 6, 2021

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Oct 6, 2021
@jpkrohling jpkrohling removed the Stale label Oct 6, 2021
@github-actions
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@tigrannajaryan
Copy link
Member

@jpkrohling do you plan to continue working on this?

@jpkrohling
Copy link
Member Author

Yes, I'm waiting on @bogdandrutu on this one. Whenever he's got some time, I would like to get some pair programming session with him to sort out the questions and problems behind this.

@github-actions
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@CoderPoet
Copy link

Is there any progress on this PR?

@jpkrohling
Copy link
Member Author

@CoderPoet, I'll be doing a pair programming session with @bogdandrutu this Wed to unblock this.

Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>
Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>
Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>
Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>
@jpkrohling jpkrohling force-pushed the jpkrohling/issue2733 branch from 87cde06 to 126164a Compare October 25, 2021 13:07
@codecov
Copy link

codecov bot commented Oct 25, 2021

Codecov Report

Merging #3745 (126164a) into main (b7e2e33) will decrease coverage by 0.26%.
The diff coverage is 29.78%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3745      +/-   ##
==========================================
- Coverage   88.07%   87.80%   -0.27%     
==========================================
  Files         173      176       +3     
  Lines       10188    10235      +47     
==========================================
+ Hits         8973     8987      +14     
- Misses        975     1006      +31     
- Partials      240      242       +2     
Impacted Files Coverage Δ
model/pdata/pdatacontext.go 0.00% <0.00%> (ø)
model/pdata/traces.go 88.88% <0.00%> (-5.23%) ⬇️
internal/otlptext/databuffer.go 98.29% <40.00%> (-1.71%) ⬇️
config/configauth/pdatacontext.go 41.66% <41.66%> (ø)
service/internal/authcontext/authcontext.go 41.66% <41.66%> (ø)
internal/otlptext/traces.go 100.00% <100.00%> (ø)
service/internal/builder/receivers_builder.go 75.12% <100.00%> (+0.12%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b7e2e33...126164a. Read the comment docs.

@jpkrohling
Copy link
Member Author

jpkrohling commented Oct 25, 2021

PR updated, and tested with a branch with the required changes to the OIDC authenticator (as an example, as other server authenticators would need similar changes) with the following config file:

extensions:
  oidc:
    issuer_url: http://localhost:8080/auth/realms/opentelemetry
    audience: account
  oauth2client:
    client_id: agent
    client_secret: 0b5fab57-9b0e-449f-9da7-9ff7e1735df6
    token_url: http://localhost:8080/auth/realms/opentelemetry/protocol/openid-connect/token

receivers:
  otlp:
    protocols:
      grpc:
  otlp/withauth:
    protocols:
      endpoint: localhost:5317
      grpc:
        tls:
          cert_file: /tmp/certs/cert.pem
          key_file: /tmp/certs/cert-key.pem
        auth:
          authenticator: oidc

processors:

exporters:
  logging:
    logLevel: debug
  otlp:
    endpoint: localhost:5317
    ca_file: /tmp/certs/ca.pem
    auth:
      authenticator: oauth2client

service:
  extensions: [oidc, oauth2client]
  pipelines:
    traces/agent:
      receivers: [otlp]
      processors: []
      exporters: [otlp/withauth]
    traces/collector:
      receivers: [otlp/withauth]
      processors: []
      exporters: [logging]

The OAuth server referenced in the OIDC extension is similar to the one used in the blog post.

The logging exporter shows this:

2021-10-25T15:05:06.037+0200	DEBUG	loggingexporter/logging_exporter.go:51	ResourceSpans #0
Pipeline Data Context:
     -> &{eyJhbGciOiJSUzI1NiIsInR5cCIgOiAiSldUIiwia2lkIiA6ICJIb3pxTUYzSGlVX2VPTWEySXBUcHlOSk1ZVWFoWXd4WjNCcy1idGxJaGpnIn0.eyJleHAiOjE2MzUxNjc0MDMsImlhdCI6MTYzNTE2NzEwMywianRpIjoiZDc2MTVhZmMtNGM0Zi00MzMxLTljZjgtZTE0NDlmYjBjNzk1IiwiaXNzIjoiaHR0cDovL2xvY2FsaG9zdDo4MDgwL2F1dGgvcmVhbG1zL29wZW50ZWxlbWV0cnkiLCJhdWQiOlsiY29sbGVjdG9yIiwiYWNjb3VudCJdLCJzdWIiOiI4YmE2MTM3Ny03ZTBkLTQyYmMtYmQwMi0wN2VmNThkYWYxMzAiLCJ0eXAiOiJCZWFyZXIiLCJhenAiOiJhZ2VudCIsImFjciI6IjEiLCJyZWFsbV9hY2Nlc3MiOnsicm9sZXMiOlsib2ZmbGluZV9hY2Nlc3MiLCJ1bWFfYXV0aG9yaXphdGlvbiIsImRlZmF1bHQtcm9sZXMtb3BlbnRlbGVtZXRyeSJdfSwicmVzb3VyY2VfYWNjZXNzIjp7ImFjY291bnQiOnsicm9sZXMiOlsibWFuYWdlLWFjY291bnQiLCJtYW5hZ2UtYWNjb3VudC1saW5rcyIsInZpZXctcHJvZmlsZSJdfX0sInNjb3BlIjoicHJvZmlsZSBlbWFpbCIsImNsaWVudElkIjoiYWdlbnQiLCJjbGllbnRIb3N0IjoiMTAuMC4yLjEwMCIsImVtYWlsX3ZlcmlmaWVkIjpmYWxzZSwicHJlZmVycmVkX3VzZXJuYW1lIjoic2VydmljZS1hY2NvdW50LWFnZW50IiwiY2xpZW50QWRkcmVzcyI6IjEwLjAuMi4xMDAifQ.vOI6e7V2NWxWyjedf818CYUnFyhYWZ42bGPC2zHVPtVWAMxL7pvifKHhktkKF5STsLTFW8enLIbdiIPqNEY8av2anRyVhVdKduJYt3wPWFvLl8VA3tirZthltva5ITIkz0rLCaD09tp-j1wr0i9EyxdUmze3vt0P8tzqJeNxJpMKvOHCWgyGJsVJ3WZxsJkmbecXC1H_dMBxHOqtciu6WwW5jzyKRibfgw69xgCLhWINYg87xcLR-NFVFdAaJ4UW5Dv_7OFswbTlRGRgBTYZQB7DRglu88u9cLFJsxAGZ6m3YviLTKULSzqipBvW2VJGk_7UXaint51tfNxop9XERA 8ba61377-7e0d-42bc-bd02-07ef58daf130 []}

@@ -182,6 +183,7 @@ func attachReceiverToPipelines(
switch dataType {
case config.TracesDataType:
junction := buildFanoutTraceConsumer(builtPipelines)
junction = authcontext.NewTracesInjector(junction)
createdReceiver, err = factory.CreateTracesReceiver(ctx, set, cfg, junction)

case config.MetricsDataType:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Support only TraceDataType?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a first iteration, yes. Once we validate the approach and get it out for testing, we might (will) expand it to other data types.

@CoderPoet
Copy link

Is it necessary to implement a tenantgroupprocessor to implement that each consumer request is tenant-isolated?

@CoderPoet
Copy link

Is it necessary to implement a tenantgroupprocessor to implement that each consumer request is tenant-isolated?

Or to implement a batch processor that supports tenant isolation

@jpkrohling
Copy link
Member Author

Or to implement a batch processor that supports tenant isolation

That is indeed something we talked about in the past. @tigrannajaryan mentioned this problem explicitly, and I think the consensus was that the existing processors that perform grouping (like the batch processor) should at least provide an option that allows users to tell the attribute to use to group the spans.

@CoderPoet
Copy link

Or to implement a batch processor that supports tenant isolation

That is indeed something we talked about in the past. @tigrannajaryan mentioned this problem explicitly, and I think the consensus was that the existing processors that perform grouping (like the batch processor) should at least provide an option that allows users to tell the attribute to use to group the spans.

Is there a requirement for pipeline tenant isolation? In this way, the retry queue is also tenant isolation and does not block each other

@jpkrohling
Copy link
Member Author

jpkrohling commented Oct 27, 2021

Is there a requirement for pipeline tenant isolation?

There's no such requirement as of now. I would really recommend taking a look at the routing processor for more complex multi-tenant requirements, as you can split each tenant into its own pipeline, or even to its own collector instance.

@github-actions
Copy link
Contributor

github-actions bot commented Nov 4, 2021

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Nov 4, 2021
@jpkrohling
Copy link
Member Author

I'm closing this, as it's clear that we will follow the route of #4288/#4319 instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Auth data as part of the resource
5 participants