Codec Replacement Plan #5124
I believe the Tahoe decision to move the decode part of codecs into filters that run after the queue needs a rethink. I would like to keep compatibility with the current config, but all bets are off. We need to understand how the various functions of the input and codec will decompose. These functions are:
All inputs compose some of these functions to work. Before discussing the pragmatics of moving these functions (some or all of them) to after the PQ, I want to establish a guiding principle: we seek to minimise the amount of processing applied to the source data before a generated event is persisted to the PQ, and we defer the work the input previously did until after the PQ. Unfortunately, after looking at the inputs, I feel that the amount of work needed before an event can be persisted varies considerably from input to input.

Faced with this variation, can we design a system that allows for this spectrum, i.e. allows for some inputs to defer work and others not to, where it makes sense to do one or the other? I believe so.

Imagine an event put into the PQ with minimal processing: what would it look like, and what metadata would it need to carry, to enable processing after the PQ? Imagine that this minimal event was generated by a TCP input (a sketch of a post-queue decoder that consumes this metadata follows the notes below):

```json
{
  "message": "<some json>",
  "@metadata": {
    "global-decorator": {
      "add_fields": [["env", "dev"], ["sys", "undef"]],
      "add_tags": ["conf-v1"]
    },
    "local-decorator": {
      "key": "logstash-input-tcp/decorator",
      "host": "<some ip>",
      "port": "<some port>",
      "ssl_enable": true,
      "ssl_subject": "<some subject>"
    },
    "charset": "UTF-8",
    "decoder": {
      "key": "logstash-input-tcp/json",
      "source": "message"
    }
  }
}
```

In any batch, these events may have the same metadata. My initial thoughts are these:
Problems:
Advantages:
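To make the idea concrete, here is a minimal sketch in plain Ruby (not Logstash plugin code; `MinimalEvent`, `DECODERS`, and `apply_decoder` are hypothetical names used only for illustration) of how a post-queue step could consume the `decoder` entry in `@metadata`:

```ruby
require 'json'

# Hypothetical stand-in for a queued Logstash event: a hash of fields plus
# the @metadata hash, shaped like the example event above.
MinimalEvent = Struct.new(:fields, :metadata)

# Registry mapping a decoder key (recorded by the input) to a decode function.
DECODERS = {
  "logstash-input-tcp/json" => ->(raw) { JSON.parse(raw) }
}

# Post-queue step: if the event carries decoder metadata, apply the named
# decode to the source field and merge the result into the event's fields.
def apply_decoder(event)
  decoder = event.metadata["decoder"]
  return event unless decoder

  decode = DECODERS.fetch(decoder["key"])
  event.fields.merge!(decode.call(event.fields[decoder["source"]]))
  event
end

event = MinimalEvent.new(
  { "message" => '{"level":"info","msg":"hello"}' },
  { "decoder" => { "key" => "logstash-input-tcp/json", "source" => "message" } }
)
apply_decoder(event)
# event.fields now holds "level" and "msg" alongside the original "message".
```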
Raw notes on a new plan:
Plan for codecs:
- Figure out if we need Raw event types
- New flow: input -> mill -> charset processor -> queue -> filters -> outputs (the charset processor step is sketched below)
- If they are using multiline codecs or other removed features, display a message:
will edit and reformat soon
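As a rough illustration of the charset processor step in the flow above (a sketch only; the `CharsetProcessor` class and its shape are assumptions, not existing Logstash code), the idea would be to normalise the raw bytes to UTF-8, using the charset the input recorded, before the event reaches the queue:

```ruby
# Minimal sketch of a charset processor: before an event is written to the
# queue, normalise the raw message to UTF-8 using the charset the input
# recorded (compare the "charset" field in the example event above).
class CharsetProcessor
  DEFAULT = "UTF-8"

  def process(fields, metadata)
    charset = metadata["charset"] || DEFAULT
    fields["message"] = fields["message"].force_encoding(charset).encode("UTF-8")
    fields
  end
end

fields = CharsetProcessor.new.process(
  { "message" => "caf\xE9".b },        # raw Latin-1 bytes from the wire
  { "charset" => "ISO-8859-1" }
)
# fields["message"] is now the UTF-8 string "café"
```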
Logstash currently uses codecs. The codec abstraction makes (some) sense externally,
but has a number of problems and limitations:

- Codecs combine two functions: 1.) tokenizing a stream into individual events, and 2.) deserializing data. A codec has no way to declare that it only does one or the other.
- For many inputs, tokenization of events happens before the codec stage.
- Decoding happens inside the input, when this work could be done by the pipeline with greater efficiency.
Moving forward it makes more sense to take each codec and split it up into a filter
and a new Serializer object that encodes data. So, the JSON codec would become:
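The concrete example is not captured here; as a hedged sketch (the `JsonSerializer` class below is illustrative, not an existing Logstash class), the decode half would simply be the existing json filter running in the pipeline, while the encode half becomes a small serializer object:

```ruby
require 'json'

# Sketch of the encode half of today's JSON codec: a small Serializer object
# that only turns an event's fields into bytes. The decode half would be the
# existing json filter, running after the queue.
class JsonSerializer
  def serialize(fields)
    JSON.generate(fields)
  end
end

JsonSerializer.new.serialize("message" => "hello", "env" => "dev")
# => '{"message":"hello","env":"dev"}'
```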
We could keep the existing syntax but have it work with these new internals. The execution model would move from decoding and encoding inside the inputs and outputs to decoding in the pipeline after the queue and encoding through the new Serializer objects.
To keep compatibility with the current config syntax, the 'codec' directive would
do the following: rather than decoding inside the input, it would mark each event as it is created,
asking that a special filter be applied to that event before the normal filter chain.
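A rough sketch of that mapping, assuming the decoder-metadata scheme from the first comment (the `annotate_with_codec` helper and field names are hypothetical, not the proposed implementation):

```ruby
# Sketch: translating a legacy `codec => json` directive on an input into
# decoder metadata on the event, so the actual decoding runs after the queue
# as a special filter applied ahead of the user's normal filter chain.
def annotate_with_codec(fields, metadata, codec_name, plugin: "logstash-input-tcp")
  metadata["decoder"] = {
    "key"    => "#{plugin}/#{codec_name}",
    "source" => "message"
  }
  [fields, metadata]
end

fields, metadata = annotate_with_codec({ "message" => '{"a":1}' }, {}, "json")
# metadata["decoder"] => {"key"=>"logstash-input-tcp/json", "source"=>"message"}
# A pipeline worker would later look up this key and run the matching decode
# before the user's configured filters.
```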
The output would need to implement a new interface as follows:
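The interface itself is not captured in this text; one plausible shape, purely as an assumption for illustration rather than the actual proposal, is that an output is handed a serializer object and calls it for each event:

```ruby
require 'json'

# Hypothetical shape of the new output interface: the output no longer owns a
# codec; it is handed a serializer and calls it to encode each event's fields.
class StdoutOutput
  def initialize(serializer)
    @serializer = serializer
  end

  # Encode the event's fields with the injected serializer and write them out.
  def receive(fields)
    $stdout.puts(@serializer.serialize(fields))
  end
end

# Any object responding to #serialize(fields) will do; for example a JSON
# serializer like the one sketched earlier.
class JsonSerializer
  def serialize(fields)
    JSON.generate(fields)
  end
end

StdoutOutput.new(JsonSerializer.new).receive("message" => "hello")
# prints {"message":"hello"}
```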