Allow users to specify dataset and namespace for rerouting data collected by kubernetes.container_logs datastream #6845

Closed
gsantoro opened this issue Jul 6, 2023 · 1 comment · Fixed by #7118
gsantoro (Contributor) commented Jul 6, 2023

As a follow-up to #5988, we want to avoid adding default routing rules in our integrations. Instead, we would like to let users specify labels on Kubernetes containers that are then used to reroute the traffic.

In contrast to previous considerations, we’re not creating a data stream per service name as this would lead to a lot of overhead for customers with thousands of services.

The reroute processor was introduced in elastic/elasticsearch#76511 and has been available since 8.8.0.

Currently, you can add the reroute processor manually to an ingest pipeline with an if condition to reroute traffic to a destination dataset and namespace.
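
For illustration, a manually added reroute processor might look like the sketch below. It builds the pipeline body as a Python dict and prints it as JSON, which could then be sent to `PUT _ingest/pipeline/<name>`. The label key `app`, the value `nginx`, and the target dataset and namespace are hypothetical placeholders, not values taken from this issue.

```python
import json

# Hypothetical pipeline body: reroute only documents whose pod carries
# the (assumed) label app=nginx. All concrete values are illustrative.
pipeline = {
    "description": "Manually reroute logs from one workload (illustrative)",
    "processors": [
        {
            "reroute": {
                # Painless condition with null-safe navigation
                "if": "ctx.kubernetes?.labels?.app == 'nginx'",
                "dataset": "nginx",
                "namespace": "production",
            }
        }
    ],
}

print(json.dumps(pipeline, indent=2))
```

Every workload that needs its own destination requires another condition like this one, which is why the approach does not scale well.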

This is quite a manual process at the moment.

In order to improve the experience for the end user we could:

  1. Define some standard Kubernetes labels (for example elastic.co/dataset and elastic.co/namespace) that, if present, are used to reroute the traffic automatically, without the user having to define a custom pipeline. The values of those labels would end up in the fields data_stream.dataset and data_stream.namespace, and a default routing rule would use them to reroute the traffic. Since the reroute processor has to be added to an ingest pipeline, integrations that use those Kubernetes labels need an ingest pipeline that checks for the presence of those labels and reroutes if they are present. We will have to evaluate the performance hit of having these extra steps always running. Since the benefits are quite significant, it may be worth it.
  2. We should also extract the fields service.name and service.version from the well-known Kubernetes labels app.kubernetes.io/name and app.kubernetes.io/version. If those labels are not provided, we should infer the service name from the field container.name and leave out the service.version field.
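
The two proposals above can be sketched as plain Python to make the intended precedence explicit. This is only a model of the decision logic, not the ingest pipeline itself. The function name `resolve_routing`, the flat `labels` dict, and the default values passed in are assumptions made for the example.

```python
# Label keys named in the proposal.
DATASET_LABEL = "elastic.co/dataset"
NAMESPACE_LABEL = "elastic.co/namespace"
NAME_LABEL = "app.kubernetes.io/name"
VERSION_LABEL = "app.kubernetes.io/version"


def resolve_routing(labels, container_name, default_dataset, default_namespace):
    """Hypothetical helper: compute the fields an event would carry.

    Labels win when present; otherwise the supplied defaults are kept.
    """
    fields = {
        "data_stream.dataset": labels.get(DATASET_LABEL) or default_dataset,
        "data_stream.namespace": labels.get(NAMESPACE_LABEL) or default_namespace,
        # Fall back to the container name when the name label is missing.
        "service.name": labels.get(NAME_LABEL) or container_name,
    }
    # service.version is left out entirely when the label is absent.
    version = labels.get(VERSION_LABEL)
    if version:
        fields["service.version"] = version
    return fields


example = resolve_routing(
    {"elastic.co/dataset": "checkout", "app.kubernetes.io/name": "checkout-api"},
    container_name="checkout-container",
    default_dataset="kubernetes.container_logs",
    default_namespace="default",
)
print(example)
```

In this example the dataset comes from the label, the namespace keeps its default, and no service.version field is produced because the version label is missing.
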
gsantoro self-assigned this Jul 6, 2023
gsantoro changed the title from "Allow users to add custom routing rules in an integration" to "Allow users to specify dataset and namespace for rerouting data collected by kubernetes.container_logs datastream" Jul 7, 2023
felixbarny (Member) commented

This is unblocked now that specifying local routing rules is supported in the package spec and in Fleet.

The minimum stack version has to be set to 8.10, though, as that's the version where Fleet supports that feature.

felixbarny linked a pull request Jul 28, 2023 that will close this issue
14 tasks
zmoog added a commit to zmoog/integrations that referenced this issue Sep 5, 2023
If available, the reroute processor uses the pod's dataset and
namespace labels, falling back to the values configured in the agent
policy.

refs: elastic#6845
zmoog added a commit to zmoog/integrations that referenced this issue Sep 5, 2023
`service.name` should use the value from the label `app.kubernetes.io/name`
first, and then fall back to `kubernetes.container.name` if it is not
present. I need to double-check whether I can use the container name as is
or need to parse it in some form.

`service.version` uses the value from the label `app.kubernetes.io/version`,
if present.

refs: elastic#6845