Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correctly handle reads by the input module that are not aligned to a newline #38

Closed

Conversation

retoo
Copy link

@retoo retoo commented Jun 14, 2016

The input plugin might not align the reads by the newlines (i.e. stdin).
For example stdin reads 16384 bytes and passes them to the codec,
regardless if that read ends with a newline. This commit fixes
this by keeping the last 'partial' line and adds it with the next
decode call.

In our environment this change didn't have any noticable effect on the
performance.

Fixes Issue #37.

…newline

The input plugin might not align the reads by the newlines (i.e. stdin).
For example stdin reads 16384 bytes and passes them to the codec,
regardless if that read ends with a newline. This commit fixes
this by keeping the last 'partial' line and adds it with the next
decode call.

In our environment this change didn't have any noticable effect on the
performance.

Fixes Issue logstash-plugins#37.
@suyograo
Copy link
Contributor

@jordansissel
Copy link
Contributor

I feel this should use BufferedTokenizer since it solve this problem and is
know to Logstash historically as a reusable solution for line splitting.

On Tuesday, June 14, 2016, Reto Schüttel notifications@github.com wrote:

The input plugin might not align the reads by the newlines (i.e. stdin).
For example stdin reads 16384 bytes and passes them to the codec,
regardless if that read ends with a newline. This commit fixes
this by keeping the last 'partial' line and adds it with the next
decode call.

In our environment this change didn't have any noticable effect on the
performance.

Fixes Issue #37

#37.

You can view, comment on, or merge this pull request online at:

#38
Commit Summary

  • Correctly handle reads by the input module that are not aligned to a
    newline

File Changes

Patch Links:

https://github.com/logstash-plugins/logstash-codec-multiline/pull/38.patch

https://github.com/logstash-plugins/logstash-codec-multiline/pull/38.diff


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#38,
or mute the thread
https://github.com/notifications/unsubscribe/AAIC6p73c-J9XR1zhcoLR90vrbv2XZwjks5qLtBOgaJpZM4I1fAs
.

@guyboertje
Copy link
Contributor

This has to take auto-flush into account.

I would prefer not to do this as I am sure that Event Milling will take this into account.

@retoo
Copy link
Author

retoo commented Jun 15, 2016

Yes, the autoflush part is completely broken today. On the other hand the codec is completely useless for us without the fix. Is stdin the only input plugin having this problem?

Yesterday I briefly tried to fix the tests with something simple, but failed. I'll give it another shot sometime today.

@guyboertje
Copy link
Contributor

guyboertje commented Jun 15, 2016

@retoo - Do you know how to use LS with your own fork of a plugin?

It may be easier to fork stdin and add Buffered Tokeniser like the line codec does today then use the fork.

Both stdin and multiline codec are unlikely to change very much until we deliver Event Milling and you can always pull bugfixes from upstream.

Event Milling is meant to provide a flexible mini-pipeline inside an input that allows a user to direct LS exactly how they want the data chunk to be processed.

@retoo
Copy link
Author

retoo commented Jul 27, 2016

@guyboertje I changed the stdin plugin, it's much cleaner this way.

I'll close this PR and suggest we adopt: logstash-plugins/logstash-input-stdin#11

@colinsurprenant
Copy link
Contributor

Until we move forward with proper boundary detection across inputs/codecs, I am proposing this solution to the multiline codec #63

@retoo
Copy link
Author

retoo commented Sep 27, 2018

IMHO the stdin input plugin should be thrown out of logstash if not fixed.

It is not just broken, its broken in away that it ruins your day when you finally realize that something is screwing with your log files.

@colinsurprenant
Copy link
Contributor

@retoo You are witnessing this problem using the multiline codec right? (per #37). The stdin input is not broken in the way you think it is. It correctly reads blocks of data and passes them to the underlying codec. It is the codec's job to correctly deal with data across blocks. I have recently submitted #63 to fix this problem in the multiline codec.

If you are interested in helping some more with this you could try this fix locally or we could arrange to get a test build of the codec to see how it works for you?

@retoo
Copy link
Author

retoo commented Sep 27, 2018

Thanks for the explanation. I'm no longer using logstash in that particular setup, I just got a bit frustrated about seeing it still open after having a working fix in 2016 ;)

@colinsurprenant
Copy link
Contributor

@retoo I totally understand, we'll try to do better. Thanks for your contribution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants