Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Filebeat] Select output index based on the source input #14010

Merged
merged 21 commits into from
Nov 7, 2019

Conversation

faec
Copy link
Contributor

@faec faec commented Oct 10, 2019

Adds configuration to Filebeat inputs to override the output index of their events (#13255)

@faec faec requested a review from urso October 10, 2019 21:22
@urso
Copy link

urso commented Oct 11, 2019

Passing a fmtstr via Meta indeed is hacky. Meta should be treated as a document itself, only storing serializable/final values. We send Meta as @metadata to Kafka/Logstash, so users can mimic the behavior Beats->ES in the Beats->LS->ES scenario as well.

About Wrapper. The pipelineConnector in filebeat/channel/connector.go already is a wrapper for the beat.Pipeline. It reimplements ConnectWith, adjusting user settings so to apply common settings. E.g. fields, tags, processors.

The pipelineConnector creates a list of processors, depending on user settings. Next libbeat merges the local processors with global ones. The final processors pipeline is established here: https://github.com/elastic/beats/blob/master/libbeat/publisher/processing/default.go#L223

I think as of now building the processor pipeline is much too complex, and something I'd like to clean up in the future. Settings like fields, tags should be replaced by processors.

But having processors capabilities, I'm thinking to have the pipelineConnector add an index-name-processor if "index" is configured. In this case the index name will be created by the processors and passed along as plain string to each output.

The only limitation would be that global processors follow global processors. Users configuring parsing on a global level might run into issues (but we don't support event fields anyways, and we do not parse timestamps). This is indeed some technical debt, that I hope we can reduce by simplifying the processor setup overall.

Just thinking loud (no need to implement this):
A more generic approach would be a "format" processor (e.g. kafka/file/console output have a format codec), that can create a string based on the event contents. In this case the format.target: "@metadata.index" could be used to configure the target index.

// This helper mimicks applyStaticFmtstr in ilm.go, creating a placeholder
// event for the restricted set of fields we allow here. It might be worth
// making this a shared helper function or otherwise specializing for this case.
func expandIndexPattern(
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have this pattern in a few places. When 7.0 was in the making, devs did forget to adapt one or the other place. Maybe we can provide a common formatter type/instance that can provide these default fields?

@dedemorton dedemorton mentioned this pull request Oct 16, 2019
6 tasks
filebeat/channel/connector.go Outdated Show resolved Hide resolved
filebeat/channel/connector.go Outdated Show resolved Hide resolved
type TimestampFormatString struct {
eventFormatString *EventFormatString
fields common.MapStr
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

if idx, ok := tmp.(string); ok {
return idx
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm... where is alias used? The name looks very specific to some use-case. I guess it was added to work around the timestamp addition when index is used. Should we go with 'alias' or 'raw-index' in general?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know who might be using alias or how, which is why I was reluctant to overload it (that, and because of the potentially confusing name)... my preference is for raw-index or something similar (raw-index-name?) since the functional behavior is independent of whether aliases are being used, though I could be convinced to go with alias instead to avoid proliferating magic Metadata fields...

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we need to grep the source. I guess this comes from Metricbeat or Heartbeat.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is odd I don't remember that?

Copy link

@urso urso Oct 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I copied it from the original ILM PR. It was introduced in 7.0. Not sure if anyone still uses it. Removing could break things. We should at least create a follow up issue (technical debt) to investigate and remove it in the future.

+1 on not relying on alias here.

@faec faec changed the title [WIP] [Filebeat] Select output index based on the source input [Filebeat] Select output index based on the source input Oct 29, 2019
@faec
Copy link
Contributor Author

faec commented Oct 29, 2019

Finished the last pieces including new unit tests confirming the order of application of all the processor settings. I think the only question left is whether to keep the metadata name "raw-index" (or "raw_index"? I'm not sure which is more in line with our conventions), or overload the "alias" field.

filebeat/channel/connector.go Outdated Show resolved Hide resolved
@urso
Copy link

urso commented Oct 30, 2019

I think the only question left is whether to keep the metadata name "raw-index" (or "raw_index"? I'm not sure which is more in line with our conventions), or overload the "alias" field.

Let's not overload "alias". We actually have to figure if it is in use, or some debt.

Everything in Meta should be treated like normal events. We even send Meta to Logstash and others (@metadata namespace).

For events we normally prefer to use namespaces. For example @metadata.index.prefix and @metadata.index.name might be appropriate names. Unfortuntaly index is gone and it's functionality doesn't reflect the actual name.

In case ECS does not create a namespace it uses _ to separate words. To be consistent with ECS we should use raw_index.

@faec
Copy link
Contributor Author

faec commented Oct 30, 2019

Ok, all comments addressed, and docs + changelog updated, so I think I'm just waiting on CI now

@urso
Copy link

urso commented Oct 30, 2019

Looks like the intake job doesn't like the formatting of libbeat/common/fmtstr/formattimestamp_test.go.

@faec faec added Filebeat Filebeat needs_backport PR is waiting to be backported to other branches. labels Nov 7, 2019
@faec faec merged commit 6a03478 into elastic:master Nov 7, 2019
@faec faec deleted the input-index branch November 7, 2019 21:38
faec added a commit to faec/beats that referenced this pull request Nov 20, 2019
faec added a commit to faec/beats that referenced this pull request Nov 20, 2019
@faec faec added v7.5.0 and removed needs_backport PR is waiting to be backported to other branches. labels Nov 20, 2019
faec added a commit to faec/beats that referenced this pull request Nov 20, 2019
faec added a commit that referenced this pull request Nov 20, 2019
…e source input (#14664)

* [Filebeat] Select output index based on the source input (#14010)


(cherry picked from commit 6a03478)

* Delete debug file, fix changelog merge
@faec faec added the v7.6.0 label Dec 13, 2019
faec added a commit to faec/beats that referenced this pull request Dec 13, 2019
faec added a commit that referenced this pull request Dec 13, 2019
@dedemorton dedemorton mentioned this pull request Jan 15, 2020
14 tasks
jorgemarey pushed a commit to jorgemarey/beats that referenced this pull request Jun 8, 2020
leweafan pushed a commit to leweafan/beats that referenced this pull request Apr 28, 2023
…d on the source input (elastic#14664)

* [Filebeat] Select output index based on the source input (elastic#14010)


(cherry picked from commit 82c0bde)

* Delete debug file, fix changelog merge
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants