16k buffer limit causes \n in message #16
Comments
I'm confused. This is not a buffer. We do a 16kb read and pass it to the codec. This is how tcp input works also. Can we add a way to reproduce this? |
Yep, I will pass you a reproduction offline |
I tried to reproduce this, but I don't know enough about the described behavior to know how to reproduce what you are describing. Here's my attempt:
1 line, 16395 bytes.
Identical to the input. Longer than 16kb |
It is a buffer: |
@jakelandis let me rephrase. We don't buffer two reads together. We do one read, then ship that data into the codec immediately. The contents of the buffer are (should be?) lost after that because the codec has done its work. By "buffering", in this context, I mean this strategy:
^^ In stdin, we do this:
|
This is not a bug in stdin. It's not stated in this issue, but I believe the context is the use of the multiline codec with stdin. And yes, stdin+multiline is broken. It's a bug caused by the fact that multiline was primarily written for the file input, and the file input itself is already line-oriented. stdin is not. Examples of this working with stdin + line codec:
Now let's test with multiline codec:
^^ 19 lines? There are clearly only 10 lines provided on input, and none match the Strange! The
But notice, if I just use the
This is a bug in how the multiline codec was designed: it was built to work only with the file input, which itself has a bug in the way it provides data to Logstash (this is an ancient issue, but we can fix it now if desired). |
The bug is in the multiline codec, which assumes each decode() call receives a complete event (or a complete sequence of events) -- https://github.com/logstash-plugins/logstash-codec-multiline/blob/master/lib/logstash/codecs/multiline.rb#L198 |
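To make that assumption concrete, here is a minimal sketch of what happens when a codec treats every decode() call as complete input. `NaiveDecoder` and `naive_events` are hypothetical names invented for this illustration, not the real multiline codec or stdin input; the 16384-byte read size and the 16395-byte line are taken from the numbers mentioned in this thread.

```ruby
# Hypothetical stand-in for a codec that assumes each decode() call
# carries complete lines. This is NOT the real multiline codec.
class NaiveDecoder
  def decode(chunk)
    # Every piece between newlines is treated as a finished line,
    # even if the chunk ended in the middle of one.
    chunk.split("\n").each { |line| yield line }
  end
end

# Simulate an input that does fixed-size reads and hands each read
# straight to the codec, without joining reads together.
def naive_events(data, read_size)
  decoder = NaiveDecoder.new
  events = []
  (0...data.bytesize).step(read_size) do |offset|
    chunk = data.byteslice(offset, read_size)
    decoder.decode(chunk) { |line| events << line }
  end
  events
end

one_line = ("x" * 16_394) + "\n" # 1 line, 16395 bytes
pieces = naive_events(one_line, 16_384)
puts pieces.size                   # 2 pieces for a single input line
puts pieces.map(&:bytesize).inspect # [16384, 10]
```

A single input line comes out as two pieces, split exactly at the read boundary. If a codec then joins the pieces it has collected with a newline, that would presumably account for the stray `\n` in the message.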
The multiline codec causes this bug because it makes some false assumptions (this is a really old bug, from what I remember).
This causes a bug because a partial read (of a full event) will be sent like this: If we said "hello world" and only read "hello" so far:
which calls The multiline codec needs to buffer until it has at least a full line, but it does not do this. The line codec does this, for what it's worth. Historically, the multiline codec was primarily focused on the file input which is why I think it has this behavior; the file input only emits one whole line at a time, so it accidentally dodges this bug in the multiline codec. |
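For illustration, a rough sketch of the "buffer until at least a full line" idea, using the "hello world" example above. `LineBufferingDecoder` is a hypothetical class written for this comment thread; it is not the line codec's actual implementation, and the real plugins may use a different mechanism.

```ruby
# Hypothetical sketch: hold on to partial data until a newline arrives.
class LineBufferingDecoder
  def initialize
    @pending = +"" # unfrozen, mutable string holding the incomplete tail
  end

  # Append the new chunk, then emit only the lines that are complete.
  def decode(chunk)
    @pending << chunk
    while (idx = @pending.index("\n"))
      line = @pending.slice!(0..idx) # removes the line (with "\n") from the buffer
      yield line.chomp
    end
  end

  # Emit whatever is left, e.g. at end of input.
  def flush
    return if @pending.empty?
    yield @pending
    @pending = +""
  end
end

decoder = LineBufferingDecoder.new
events = []
decoder.decode("hello") { |e| events << e }         # nothing emitted yet
decoder.decode(" world\nnext") { |e| events << e }  # emits "hello world"
decoder.flush { |e| events << e }                    # emits the trailing "next"
puts events.inspect
```

With this approach the partial read "hello" never reaches downstream processing on its own; it is only emitted once the rest of the line has arrived. The explicit `flush` matters for the "read files and exit" case, where the last line may not end in a newline.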
Yes, to echo what @jordansissel said: this is not a stdin input problem but a multiline codec problem. This was the interpretation of the original problem which triggered the creation of this issue by @PhaedrusTheGreek:
Which explains the rogue Our long-standing milling concept would be the long-term solution to this, I believe. Otherwise, as @jordansissel mentioned, maybe just adding line buffering in the multiline codec using the |
Ran into the same issue today. I am making a project that uses the stdin and multiline plugin. A lot of lines are rejected by my grok filter because of the newlines. Any new info about this issue? |
FYI, I just ran into this issue too. I added detailed comments in #37 |
Until we move forward with proper boundary detection across inputs/codecs, I am proposing this solution to the multiline codec logstash-plugins/logstash-codec-multiline#63 |
After 16kb input, a line break is inserted into the messages due to a buffer limit:
per @jakelandis:
This problem makes it difficult to use STDIN for the "read files and exit Logstash" use case.