Correctly handle reads by the input module that are not aligned to a newline #38

retoo · 2016-06-14T16:01:18Z

An input plugin (e.g. stdin) might not align its reads to newlines.
For example, stdin reads 16384 bytes and passes them to the codec
regardless of whether that read ends with a newline. This commit fixes
this by keeping the last 'partial' line and prepending it to the data
passed to the next decode call.

In our environment this change had no noticeable effect on
performance.

Fixes Issue #37.
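The carry-over approach described above can be sketched in Ruby. This is a minimal illustration under stated assumptions, not the code from this PR: the class name `PartialLineDecoder` is hypothetical, it yields plain strings rather than Logstash events, and it ignores the multiline codec's pattern matching and auto-flush entirely.

```ruby
# Hypothetical sketch of the carry-over technique described in this PR.
# PartialLineDecoder is an illustrative name, not the plugin's class, and
# it yields plain strings instead of Logstash events.
class PartialLineDecoder
  def initialize(delimiter = "\n")
    @delimiter = delimiter
    @partial = "" # unterminated text left over from the previous chunk
  end

  # Accepts an arbitrary chunk (e.g. one 16384-byte stdin read) and
  # yields only complete lines, with the delimiter stripped.
  def decode(chunk)
    data = @partial + chunk
    # A limit of -1 keeps trailing empty fields, so "a\n" becomes ["a", ""].
    fields = data.split(@delimiter, -1)
    # The last field has no delimiter after it yet; keep it for next time.
    # (split returns [] for an empty string, hence the fallback.)
    @partial = fields.pop || ""
    fields.each { |line| yield line }
  end

  # Emits any remaining text once the input is exhausted.
  def flush
    yield @partial unless @partial.empty?
    @partial = ""
  end
end

# Usage: one line is split across the first two chunks.
decoder = PartialLineDecoder.new
out = []
["first li", "ne\nsecond line\nthi", "rd line\n"].each do |chunk|
  decoder.decode(chunk) { |line| out << line }
end
decoder.flush { |line| out << line }
p out # → ["first line", "second line", "third line"]
```

Without the `-1` limit, `split` would drop the trailing empty field, so a chunk ending exactly on a newline would have its last complete line held back and glued onto the start of the next chunk. As raised later in this thread, a real implementation would also need to account for the codec's auto-flush.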


suyograo · 2016-06-14T16:02:39Z

@retoo Can you please perform step 2 of https://github.com/elasticsearch/logstash/blob/master/CONTRIBUTING.md#contribution-steps

jordansissel · 2016-06-14T16:11:25Z

I feel this should use BufferedTokenizer, since it solves this problem and is
known in Logstash historically as a reusable solution for line splitting.

On Tuesday, June 14, 2016, Reto Schüttel notifications@github.com wrote:

You can view, comment on, or merge this pull request online at:

#38
Commit Summary

Correctly handle reads by the input module that are not aligned to a newline

File Changes

M lib/logstash/codecs/multiline.rb
https://github.com/logstash-plugins/logstash-codec-multiline/pull/38/files#diff-0
(21)

Patch Links:

https://github.com/logstash-plugins/logstash-codec-multiline/pull/38.patch

https://github.com/logstash-plugins/logstash-codec-multiline/pull/38.diff

guyboertje · 2016-06-15T07:39:00Z

This has to take auto-flush into account.

I would prefer not to do this as I am sure that Event Milling will take this into account.

retoo · 2016-06-15T08:00:59Z

Yes, the auto-flush part is completely broken today. On the other hand, the codec is completely useless for us without the fix. Is stdin the only input plugin with this problem?

Yesterday I briefly tried to fix the tests with something simple, but failed. I'll give it another shot sometime today.

guyboertje · 2016-06-15T08:58:45Z

@retoo - Do you know how to use LS with your own fork of a plugin?

It may be easier to fork stdin, add BufferedTokenizer like the line codec does today, and then use the fork.

Both stdin and multiline codec are unlikely to change very much until we deliver Event Milling and you can always pull bugfixes from upstream.

Event Milling is meant to provide a flexible mini-pipeline inside an input that allows a user to direct LS exactly how they want the data chunk to be processed.

retoo · 2016-07-27T15:38:45Z

@guyboertje I changed the stdin plugin, it's much cleaner this way.

I'll close this PR and suggest we adopt: logstash-plugins/logstash-input-stdin#11

colinsurprenant · 2018-09-25T19:35:55Z

Until we move forward with proper boundary detection across inputs/codecs, I am proposing this solution to the multiline codec #63

retoo · 2018-09-27T08:45:49Z

IMHO the stdin input plugin should be thrown out of Logstash if it isn't fixed.

It is not just broken; it's broken in a way that ruins your day when you finally realize that something is screwing with your log files.

colinsurprenant · 2018-09-27T13:45:24Z

@retoo You are seeing this problem with the multiline codec, right (per #37)? The stdin input is not broken in the way you think it is. It correctly reads blocks of data and passes them to the underlying codec. It is the codec's job to correctly handle data that spans blocks. I recently submitted #63 to fix this problem in the multiline codec.

If you are interested in helping further with this, you could try this fix locally, or we could arrange a test build of the codec so you can see how it works for you.
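The distinction drawn in this comment, that an input may legitimately cut data anywhere and the codec must handle lines spanning blocks, can be illustrated with a small simulation. This is a hypothetical sketch, not Logstash code: `BLOCK_SIZE` is shrunk to 8 bytes so the effect is visible (the PR description mentions 16384-byte stdin reads), and the "codec" is reduced to a plain `split`.

```ruby
# Illustrative simulation (not Logstash code): an input hands fixed-size
# blocks to a codec without regard to line boundaries. BLOCK_SIZE is
# tiny here so the effect is visible.
BLOCK_SIZE = 8

stream = "alpha line\nbeta line\n"
# The /m flag lets "." match "\n", so newlines land inside blocks.
blocks = stream.scan(/.{1,#{BLOCK_SIZE}}/m)
# blocks == ["alpha li", "ne\nbeta ", "line\n"]

# A codec that treats every block as if it ended on a line boundary
# turns each block edge into a spurious line break.
naive = blocks.flat_map { |block| block.split("\n") }

# The correct result is independent of how the stream was cut:
# it equals splitting the concatenated stream once.
expected = blocks.join.split("\n")

p naive    # → ["alpha li", "ne", "beta ", "line"]
p expected # → ["alpha line", "beta line"]
```

The naive output shows the kind of symptom reported in #37 ("adds random newlines"); a codec that buffers the trailing fragment of each block produces `expected` no matter where the blocks were cut.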

retoo · 2018-09-27T15:12:25Z

Thanks for the explanation. I'm no longer using logstash in that particular setup, I just got a bit frustrated about seeing it still open after having a working fix in 2016 ;)

colinsurprenant · 2018-09-27T15:26:01Z

@retoo I totally understand, we'll try to do better. Thanks for your contribution.

suyograo added the enhancement label Jun 14, 2016

suyograo assigned guyboertje Jun 14, 2016

suyograo added the missing cla label Jun 14, 2016

retoo closed this Jul 27, 2016

retoo deleted the bugfix/37-random-new-lines branch July 27, 2016 15:39

This was referenced Jun 29, 2017:

stdin/multiline-code adds random newlines #37 (Closed)

[WIP] proper delimiter support for line based input #26 (Open)

colinsurprenant mentioned this pull request Sep 24, 2018:

[WIP] use BufferedTokenizer and configurable line delimiter #63 (Closed)

mashhurs mentioned this pull request Feb 11, 2025:

[Placeholder] Codec adds new line when used with streaming input (TCP/UDP) plugins. #73 (Open)
