-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Correctly handle reads by the input module that are not aligned to a newline #38
Conversation
…newline The input plugin might not align the reads by the newlines (i.e. stdin). For example stdin reads 16384 bytes and passes them to the codec, regardless if that read ends with a newline. This commit fixes this by keeping the last 'partial' line and adds it with the next decode call. In our environment this change didn't have any noticable effect on the performance. Fixes Issue logstash-plugins#37.
@retoo Can you please perform step 2 of https://github.com/elasticsearch/logstash/blob/master/CONTRIBUTING.md#contribution-steps |
I feel this should use BufferedTokenizer since it solve this problem and is On Tuesday, June 14, 2016, Reto Schüttel notifications@github.com wrote:
|
This has to take auto-flush into account. I would prefer not to do this as I am sure that Event Milling will take this into account. |
Yes, the autoflush part is completely broken today. On the other hand the codec is completely useless for us without the fix. Is stdin the only input plugin having this problem? Yesterday I briefly tried to fix the tests with something simple, but failed. I'll give it another shot sometime today. |
@retoo - Do you know how to use LS with your own fork of a plugin? It may be easier to fork stdin and add Buffered Tokeniser like the line codec does today then use the fork. Both stdin and multiline codec are unlikely to change very much until we deliver Event Milling and you can always pull bugfixes from upstream. Event Milling is meant to provide a flexible mini-pipeline inside an input that allows a user to direct LS exactly how they want the data chunk to be processed. |
@guyboertje I changed the stdin plugin, it's much cleaner this way. I'll close this PR and suggest we adopt: logstash-plugins/logstash-input-stdin#11 |
Until we move forward with proper boundary detection across inputs/codecs, I am proposing this solution to the multiline codec #63 |
IMHO the stdin input plugin should be thrown out of logstash if not fixed. It is not just broken, its broken in away that it ruins your day when you finally realize that something is screwing with your log files. |
@retoo You are witnessing this problem using the multiline codec right? (per #37). The stdin input is not broken in the way you think it is. It correctly reads blocks of data and passes them to the underlying codec. It is the codec's job to correctly deal with data across blocks. I have recently submitted #63 to fix this problem in the multiline codec. If you are interested in helping some more with this you could try this fix locally or we could arrange to get a test build of the codec to see how it works for you? |
Thanks for the explanation. I'm no longer using logstash in that particular setup, I just got a bit frustrated about seeing it still open after having a working fix in 2016 ;) |
@retoo I totally understand, we'll try to do better. Thanks for your contribution. |
The input plugin might not align the reads by the newlines (i.e. stdin).
For example stdin reads 16384 bytes and passes them to the codec,
regardless if that read ends with a newline. This commit fixes
this by keeping the last 'partial' line and adds it with the next
decode call.
In our environment this change didn't have any noticable effect on the
performance.
Fixes Issue #37.