Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[connector/routing] When matching multiple conditions, build a new consumer each time. #29882

Closed
djaglowski opened this issue Dec 14, 2023 · 17 comments · Fixed by #36824
Closed
Labels
bug Something isn't working connector/routing help wanted Extra attention is needed never stale Issues marked with this label will be never staled and automatically removed

Comments

@djaglowski
Copy link
Member

Related question which can be a separate issue. Is the other (match multiple routes) approach implemented correctly? I am wondering if we are not properly fanning out the data.

For example, if we match routes 1 and 2, we cannot send the same copy of the data to both consumers. We need to use a fanout. In other words, we should build a consumer which contains all the pipelines in both 1 and 2, and send to that.

Originally posted by @djaglowski in #28888 (comment)

@djaglowski djaglowski added bug Something isn't working connector/routing labels Dec 14, 2023
Copy link
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

Copy link
Contributor

Pinging code owners for connector/routing: @jpkrohling @mwear. See Adding Labels via Comments if you do not have permissions to add labels yourself.

@jpkrohling
Copy link
Member

@mwear, are you interested in taking a look at this one?

Copy link
Contributor

github-actions bot commented Apr 1, 2024

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Apr 1, 2024
@djaglowski djaglowski added never stale Issues marked with this label will be never staled and automatically removed help wanted Extra attention is needed and removed Stale labels Apr 1, 2024
@djaglowski
Copy link
Member Author

djaglowski commented Oct 24, 2024

I've been working on the routing connector lately and revisiting this issue I believe it is a very difficult one to solve performantly. When using match_once: false, we currently evaluate all logs against all routes and track the results of the evaluation by grouping matches together as we find them. In order to properly fan out the data, we need to completely rework the tracking mechanism so that it also remembers which set of routes were matched for each log entry.

I'm a bit skeptical that this is really worth it but I'd like to hear from others. Does anyone actually rely on match_once: false? Can we deprecate the option and have the router always act like a proper switch statement?

For what it's worth, I believe it's technically possible to achieve the match_once: false behavior by composing routers with match_once: true. Consider a case where logs are routed based on two independent dimensions:

routing:
  match_once: false
  table:
    - condition: attributes["env"] == "prod"
       pipelines: [ logs/prod ]
    - condition: attributes["env"] == "dev"
       pipelines: [ logs/dev ]
    - condition: attributes["region"] == "east"
       pipelines: [ logs/east ]
    - condition: attributes["region"] == "west"
       pipelines: [ logs/west ]

If there is no default case that needs to be handled, than the user can route the logs to two separate routers:

routing/env:
  match_once: true
  table:
    - condition: attributes["env"] == "prod"
       pipelines: [ logs/prod ]
    - condition: attributes["env"] == "dev"
       pipelines: [ logs/dev ]
routing/region:
  match_once: true
  table:
    - condition: attributes["region"] == "east"
       pipelines: [ logs/east ]
    - condition: attributes["region"] == "west"
       pipelines: [ logs/west ]

If there is a default case, then it is a bit trickier, but the user can enumerate the combination of cases:

routing:
  match_once: true
  table:
    - condition: attributes["env"] == "prod" and attributes["region"] == "east"
       pipelines: [ logs/prod, logs/east ]
    - condition: attributes["env"] == "prod" and attributes["region"] == "west"
       pipelines: [ logs/prod, logs/west ]
    - condition: attributes["env"] == "dev" and attributes["region"] == "east"
       pipelines: [ logs/dev, logs/east ]
    - condition: attributes["env"] == "dev" and attributes["region"] == "west"
       pipelines: [ logs/dev, logs/west ]
  default_pipelines: [ logs/default ]

This may not be scaleable if there are many dimensions, but another way to solve would be to place a "default router" before the handoff to the one-dimension routers. This "default router" would have almost exactly the same config as the the original match_once: false router:

routing:
  match_once: true
  table:
    - condition: attributes["env"] == "prod"
       pipelines: [ logs/env, logs/region ] # forward to both "routing/env" and "routing/region"
    - condition: attributes["env"] == "dev"
       pipelines: [ logs/env, logs/region ]
    - condition: attributes["region"] == "east"
       pipelines: [ logs/env, logs/region ]
    - condition: attributes["region"] == "west"
       pipelines: [ logs/env, logs/region ]
  default_pipelines: [ logs/default ] # handle logs that matched no dimensions
routing/env:
  match_once: true
  table:
    - condition: attributes["env"] == "prod"
       pipelines: [ logs/prod ]
    - condition: attributes["env"] == "dev"
       pipelines: [ logs/dev ]
routing/region:
  match_once: true
  table:
    - condition: attributes["region"] == "east"
       pipelines: [ logs/east ]
    - condition: attributes["region"] == "west"
       pipelines: [ logs/west ]

Let's hear from others first, but if there are no objections I propose we should deprecate the option by approximating a feature gate:

  • vN: Document the deprecation and add a warning log to the connector.
  • vN+1: Switch the default to true.
  • vN+2: Remove the setting and associated logic.

@sirianni
Copy link
Contributor

Does anyone actually rely on match_once: false?

My team relies on match_once: false. We have many use cases for telemetry in our platform and different "sinks" supporting those use cases. We use the routingconnector to selectively send telemetry to those sinks. A single resource may provide telemetry that's useful in multiple sinks.

@djaglowski
Copy link
Member Author

Thanks @sirianni. What do you think about the alternatives that I proposed above? If you want to share an example configuration, I would be interested in trying to convert it to produce the same result as match_once: true.

@sirianni
Copy link
Contributor

sirianni commented Oct 24, 2024

What do you think about the alternatives that I proposed above?

It's doable, but, as you said, forces the user to enumerate all of the possible combinations explicitly which increases configuration complexity.

As you pointed out, the alternative would be to have the code itself build a consumer for each unique set of matched routes which is quite complex on the implementation side.

@djaglowski
Copy link
Member Author

the alternative would be to have the code itself build a consumer for each unique set of matched routes which is quite complex on the implementation side.

It's worse than that actually, because we also need to create a mechanism for determining which of these compound consumers needs to be used.

@jpkrohling
Copy link
Member

For reference, this was added as part of this PR and does NOT exist as part of the processor:

#28888

@djaglowski
Copy link
Member Author

As much as I don't want to break anyone and would like to support the match_once: false use case, I think we have a serious problem with that implementation and no capacity to solve it.

At a minimum, I think we should proceed with deprecation and changing the default to match_once: true before we are able to declare the component beta. However, we can keep the setting in place for quite a while and make a final decision later.

Thoughts?

@jpkrohling
Copy link
Member

I agree that the best course of action is to deprecate/remove that feature, at least for the moment.

djaglowski added a commit that referenced this issue Dec 13, 2024
This PR deprecates the `match_once` parameter. It defines a multi-step
process which hopefully gives users plenty of time to make necessary
changes. It also provides several detailed examples of how to migrate a
configuration.

Resolves #29882
@sirianni
Copy link
Contributor

@djaglowski thanks for your work on this component.

I'm thinking through this and going around in circles a bit 🙂 . Is there a practical difference between
(A) using the routing connector
-- vs --
(B) using the forward connector combined with a filtering processor as the first step in the destination pipeline that filters out telemetry not for itself?

routing/region:
  match_once: true
  table:
    - condition: attributes["region"] == "east"
       pipelines: [ logs/east ]
    - condition: attributes["region"] == "west"
       pipelines: [ logs/west ]

vs.

...
exporters: [ forward/logs ]

logs_east:
  receivers: [ foward/logs ]
  processors: [ filter/logs_east ]
  exporters: [ ... ]
logs_west:
  receivers: [ foward/logs ]
  processors: [ filter/logs_west ]
  exporters: [ ... ]

sbylica-splunk pushed a commit to sbylica-splunk/opentelemetry-collector-contrib that referenced this issue Dec 17, 2024
…36824)

This PR deprecates the `match_once` parameter. It defines a multi-step
process which hopefully gives users plenty of time to make necessary
changes. It also provides several detailed examples of how to migrate a
configuration.

Resolves open-telemetry#29882
@djaglowski
Copy link
Member Author

@sirianni, thanks for understanding the need here.

There is a slight difference between using routing vs forward/filter, in that forward/filter will cause data to be copied. (N-1 copies will be made, where N is the number of pipelines you are forwarding to.)

This may be less of a concern if you are migrating from match_once: false, since that would also have made copies in many cases. But still it's worth noting that forward/filter will always copy the data, even though it is about to be dropped by the filter.

@sirianni
Copy link
Contributor

match_once: false, since that would also have made copies in many cases

🤔 I thought it didn't make copies which is this bug...?

@sirianni
Copy link
Contributor

I guess I'm poking a bit on this part from your README

Listing both routers in the pipeline configuration will result in each receiving an independent handle to the data.

service:
  pipelines:
    logs/in::exporters: [routing/env, routing/region]

Why would this happen given that the routingconnector specifies MutatesData: false?

return consumer.Capabilities{MutatesData: false}

mterhar pushed a commit to mterhar/opentelemetry-collector-contrib that referenced this issue Dec 19, 2024
…36824)

This PR deprecates the `match_once` parameter. It defines a multi-step
process which hopefully gives users plenty of time to make necessary
changes. It also provides several detailed examples of how to migrate a
configuration.

Resolves open-telemetry#29882
@sirianni
Copy link
Contributor

sirianni commented Jan 2, 2025

There is a slight difference between using routing vs forward/filter, in that forward/filter will cause data to be copied. (N-1 copies will be made, where N is the number of pipelines you are forwarding to.)

I'm also confused about this. The forwardconnector also sets MutatesData: false so how does it ensure that each destination pipeline gets its own copy of the data?

AkhigbeEromo pushed a commit to sematext/opentelemetry-collector-contrib that referenced this issue Jan 13, 2025
…36824)

This PR deprecates the `match_once` parameter. It defines a multi-step
process which hopefully gives users plenty of time to make necessary
changes. It also provides several detailed examples of how to migrate a
configuration.

Resolves open-telemetry#29882
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working connector/routing help wanted Extra attention is needed never stale Issues marked with this label will be never staled and automatically removed
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants