Should STDOUT always set flush_on_newline by default? #7431

Closed
michael-selig opened this issue Feb 14, 2019 · 31 comments · Fixed by #7470 or #8935

Comments

@michael-selig

I am just starting with Crystal, being a Ruby programmer for many years. I am impressed with its performance, but playing around, I found one thing that badly affects performance when writing to standard output: that flush_on_newline is always set by default on STDOUT. My trivial test:

s = "a" * 60

10000000.times do
  puts s
end

takes real: 10.4s user: 2.4s sys: 8.0s when its output is redirected to a file. (FYI the same thing run by the Ruby 2.5.1p57 interpreter takes real: 3.8s user: 3.5s sys: 0.3s)

Yet if I add the line:

STDOUT.flush_on_newline = false

at the top, the time drops dramatically to: real: 0.5s user: 0.3s sys: 0.2s, which is more what I was expecting.

I suggest that the definition of STDOUT in kernel.cr be changed to:

STDOUT = IO::FileDescriptor.new(1).tap { |f| f.flush_on_newline = true if f.tty? }

so that flush_on_newline is only set when STDOUT is a TTY. I am fairly sure that this is how Unix & Linux C programs normally behave. Typically I believe that standard error is completely unbuffered, but appears to be line buffered in Crystal. This, however, probably isn't a big deal.

@ysbaddaden
Contributor

STDOUT is more and more used to log things (systemd and Docker expect that). If we don't flush on newline, we can easily end up with out-of-sync logs when not logging furiously, or even truncated logs if the Crystal program crashes.

I believe the flush on newline default is good, even when it's not a tty. One can always fix this for some cases with STDOUT.flush_on_newline = false.

If we change the default to depend on whether it's a TTY, then we should also fix Logger to flush after each message.
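
For illustration, "flush after each message" could be as small as this (just a sketch, not the actual Logger code):

# Sketch only: write one log line and flush it immediately, so systemd/Docker
# see it right away even when STDOUT is otherwise buffered.
def log_line(io : IO, message : String)
  io.puts message
  io.flush
end

log_line(STDOUT, "service started")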

@michael-selig
Author

That is true, but I would have thought that developers of loggers would have to be conscious of the fact that each log message needs to be flushed. Should a regular person writing a program that writes to STDOUT (like I did) have to worry about whether STDOUT needs to be set to flush or not to get decent performance (one of Crystal's claims to fame)? Also consider code like:

log_file = File.new("/var/spool/myapp.log", "a")

which returns a file with flush_on_newline off, requiring a logger developer to either turn it on, or flush themselves after writing. Why should STDOUT be different?

So I would argue that the logger should be the one with the onus of "fixing" the issue by turning on flush_on_newline.

Also having STDOUT set to flush by default seems to run contrary to all other languages I know, including Ruby.

@asterite
Member

asterite commented Feb 14, 2019

I agree with @michael-selig

I originally added flush_on_newline = true to STDOUT and STDERR because I noticed that when you do STDOUT.puts in Ruby then you would immediately see the output in the console. But I didn't know that if the output is not a TTY then it wouldn't flush right away.

The interesting thing is that if you check the value of STDOUT.sync it's always false, regardless of whether it's a TTY or not. So I guess Ruby flushes whenever it sees that the device is a TTY (I see they store this info in the file pointer itself at the beginning of the program, at least from some C code I've seen).

We should probably do the same. And we could also remove flush_on_newline: we should always flush on TTY and only flush when a buffer page is full (similar to Ruby). This will also simplify our code a bit.

To do that, we can probably check for TTY at the beginning of the program (which we do now) and set sync to true/false (or use another flag, like Ruby, but that seems a bit magical to me; plus I'd like to see that if it's a TTY sync becomes true, and if not it becomes false).

Then loggers should flush after each line. For loggers it makes sense.
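
In code, the proposed default boils down to something like this at program start (a sketch of the idea, not the actual kernel.cr change):

# Sketch: decide buffering once at startup. On a terminal every write is
# flushed; on a pipe or file, output stays buffered.
STDOUT.sync = STDOUT.tty?
STDERR.sync = STDERR.tty?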

@asterite
Member

(side topic: should we rename topic:stdlib:files to topic:stdlib:io?)

@SlayerShadow

I was surprised when Kemal's logs suddenly stopped after an ordinary build. 😄
Interesting point: changing the default behaviour forces all third-party logging systems to be patched (to work properly in Docker, for example).

@asterite
Member

Logger does flush, so that might not be the issue.

@asterite asterite reopened this Aug 30, 2019
@asterite
Member

I think this might have been a mistake.

If you run a Crystal program through Process.run(...) and want to redirect its output to a file or something else and have a live stream, it won't work unless you change that Crystal program to do STDOUT.sync = true. It's very unintuitive.

I think I prefer a slower output that one can still make faster by doing STDOUT.sync = STDOUT.tty? but have the default work in an intuitive way.

@asterite
Member

Should a regular person writing a program that writes to STDOUT (like I did) have to worry about whether STDOUT needs to be set to flush or not to get decent performance (one of Crystal's claims to fame)?

Yes. Crystal's philosophy is that you can get really good performance if you want. But we prefer intuitive behavior over performance when possible, and here we broke that rule.

@michael-selig
Author

michael-selig commented Aug 30, 2019

How default STDIN/STDOUT/STDERR buffering should work has always been controversial - POSIX only says that it is "implementation dependent".

we prefer intuitive behavior over performance when possible

I guess this comes down to what you interpret "intuitive" to mean. A C programmer would expect the default buffering behaviour to work like the C library, which (on Linux anyway) is:

- stdin: line buffered
- stdout: buffered, unless it is a TTY (interactive device), in which case it is line buffered
- stderr: unbuffered

(https://eklitzke.org/stdout-buffering has a good explanation.)

I was coming to Crystal with this mindset, so "intuitive" to me means this behaviour. Of course to others it may mean the opposite. In my experience most programming languages buffer STDOUT when it is directed to a file.

@asterite
Member

Interesting article, thank you! I guess I don't know what's best to do.

@sam0x17
Contributor

sam0x17 commented Aug 31, 2019

I'm not super familiar with the default behaviors for this in different environments, but I've always wanted it to be time based, as in flush if data sits in the buffer unflushed for more than X milliseconds. This could be implemented by having a timestamp for the last flush and knowledge that there is data in the buffer.
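
A rough sketch of one way to approximate that (the background fiber and the 50ms interval are assumptions, not existing stdlib behavior):

# Background fiber: if more than FLUSH_AFTER has passed since the last flush,
# flush STDOUT (a no-op when the buffer is already empty).
FLUSH_AFTER = 50.milliseconds

spawn do
  last_flush = Time.monotonic
  loop do
    sleep 10.milliseconds
    if Time.monotonic - last_flush >= FLUSH_AFTER
      STDOUT.flush
      last_flush = Time.monotonic
    end
  end
end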

@michael-selig
Author

@asterite I noticed that unlike Crystal, Ruby appears to flush STDOUT & STDERR before forking a new process. E.g. try running

STDOUT << "1 "
system "echo -n 2"
STDOUT << " 3\n"

redirected to a file using both Ruby & Crystal (ensuring STDOUT is buffered). I think Ruby's behaviour addresses your concern & is more intuitive.

@asterite
Member

asterite commented Sep 2, 2019

@michael-selig No, it doesn't solve my problem.

The problem is this. You have one program that outputs to STDOUT. For example:

# subprogram.cr
100.times do |i|
  puts i
  sleep 1
end

Then you have another program that streams the output of that program to a file:

# program.cr
file = File.open("output.txt", "w")
Process.run("subprogram", output: file)

Now you execute program.cr and want to see if it works. You do tail -f output.txt and you don't get any output. You only get output when the buffer is flushed, which might take a while.

How do you fix it? There's no way to fix this without recompiling subprogram.cr to do STDOUT.sync = true or by making it configurable.
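
The "making it configurable" part could be something as small as this inside subprogram.cr (the environment variable name is made up, just a sketch):

# Hypothetical opt-in: let a caller request unbuffered output at run time.
STDOUT.sync = true if ENV["CRYSTAL_UNBUFFERED"]? == "1"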

I just tried this program in Go:

package main

import (
	"fmt"
)

func main() {
	for i := 0; i < 100; i++ {
		fmt.Println(i)
	}
}

When you run it with a TTY it takes 2.5 seconds. When you run it and redirect to a file it takes 4 seconds (I don't know why). And I can tail -f just fine the file that I'm piping to.

I also tried putting a sleep in between, using Println or Print, and I can always tail the program's output.

It seems Go cares more about being able to tail a file and get immediate output than to win a few extra seconds.

I think that if you do care about those extra seconds you can always enable C's behaviour by doing STDOUT.sync = STDOUT.tty?. It's just one line that you need to add to your program. For any other file you'll get expected (but possibly slower) behavior.

I think we should revert the PR that changed this.

@asterite
Member

asterite commented Sep 2, 2019

A bit more: now I understand that fmt.Println in Go just outputs to os.Stdout and that's unbuffered. You can also do fmt.Fprintln(os.Stdout, "...") with the same result. If you want it buffered then you must create a buffered wrapper around that and use that.

Well, in our case we can make STDOUT.sync default to true and let users change it if they want to.

@michael-selig
Author

@asterite Sorry, I did misunderstand your problem, which I thought was to show that the output of the subprogram could be written to the file before stuff written previously by the calling program. This is what my example tried to show.

I have never used Go, so I can't really comment on your example. However, I tested the code in your "subprogram.cr" in Ruby, and equivalent code in Python and C (using buffered STDIO), and all 3 behave like Crystal does using buffered output - nothing is written to the output file until the program finishes.

Well, in our case we can make STDOUT.sync default to true and let users change it if they want to.

Or we can do the opposite :-)

The only way I can think of to make it "intuitive" to everyone and also maintain good performance is to implement what @sam0x17 suggested: when data is written into an empty buffer, set a timer to flush the buffer. It would also need to force a flush of the buffer before passing the file descriptor to another process. However, I am not sure if this "automatic timed flushing" is really worth the effort of implementing, given that I am not aware of other languages that do it.

The usage cases where buffered behaviour is unintuitive seem to be things like logging or writing status to STDOUT in long-running programs, or using STDOUT for debug messages, where the order and timing of those messages is important. In short-running programs that write data (text, HTML, XML, JSON etc) to STDOUT, speed should win.

So assuming that "automatic timed flushing" isn't implemented, for which types of programs should coders have to worry about buffering? I would suggest the former.

@RX14
Contributor

RX14 commented Sep 9, 2019

At the very least flush_on_newline should be true for STDOUT and STDERR; sync would be fine too. Especially for STDERR.

I've been bitten by this, when writing a program which outputs newline-delimited JSON to the parent process via stdout. Behaviour shouldn't differ between TTY and pipe. It makes debugging a nightmare.

If your application depends on stdout performance (very rare) it can always change the mode.

@michael-selig
Author

Behaviour shouldn't differ between TTY and pipe. It makes debugging a nightmare.

Unix & Linux programmers have come to expect that STDOUT works like that in many if not most languages. Also when you want to write a large amount of data to STDOUT via a pipe, why should performance suffer? Sure, if you're just writing a small amount of JSON or whatever to STDOUT you won't notice the difference.

If your application depends on stdout performance (very rare) it can always change the mode.

Not sure if I agree that this is rare. I would argue that if you want to debug you can always change the mode. That way you get performance by default. If it's the other way around, programmers who aren't aware of the need to set buffering on STDOUT will get poorer performance and likely not even be aware of it. At least when you are debugging, you'll soon find out that it is being buffered!

@RX14
Contributor

RX14 commented Sep 17, 2019

Poor performance is far, far, far less frustrating to a user than their application changing semantics for unclear reasons. The fact that some people are used to it already is not a good argument for this.

@asterite
Member

The fact that some people are used to it already is not a good argument for this

Exactly my thoughts!

However, I tried the original program with STDOUT.sync = true and it takes like 1 minute vs half a second to finish. I think flush_on_newline can improve the situation a bit, making it take 10 seconds instead of 1 minute, but still a lot more than half a second... but maybe it's acceptable for outputting 10000000 times "a" * 60. And one can also make it faster by just doing STDOUT.flush_on_newline = false.

So I'm thinking maybe it's better to revert the whole thing to how it was before this entire discussion started. It was actually pretty good back then.

@asterite
Member

But I'll try to discuss this today with Juan and Brian if I have time.

@asterite
Member

We talked about this today.

On one hand we think not flushing on newline when writing to a file is good because you just want to write it as fast as possible. This matches what C, Ruby, Python and current Crystal do.

On the other hand we'd like there to be output from time to time so a user watching that file, for example with tail -f, or via another process, can see (not necessarily real-time) feedback.

So maybe the idea of having it fully buffered but flushing from time to time, say, once per second, could be ideal: prevents too many syscalls but you get feedback.

But so far we didn't agree on any approach, plus we don't really know how to do the "flush every X time unit".

Right now the behavior matches C, Python and Ruby so it's good enough. But let's leave this open and come back to this later.

@michael-selig
Author

Tough decision over 1 line of code!

One more suggestion if I may:
If we are sticking with buffering, flush STDOUT before forking another process (which inherits STDOUT). This at least ensures that the parent's output appears before that of the subprocess. I believe that Ruby does this.
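
For example (a sketch only; today the flush has to be done by hand, mirroring the earlier example):

STDOUT << "1 "
STDOUT.flush       # make sure "1 " reaches the file before the child runs
system "echo -n 2" # the child process inherits STDOUT
STDOUT << " 3\n"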

@sam0x17
Contributor

sam0x17 commented Sep 18, 2019

To reiterate my thoughts, I think flushing every X bytes or X milliseconds, whichever happens first, is the way to go eventually, as this will perform well and have the expected behavior in both use cases.

@RX14
Contributor

RX14 commented Sep 18, 2019

plus we don't really know how to do the "flush every X time unit".

spawn do
  loop do
    sleep 100.milliseconds
    STDOUT.flush
    STDERR.flush
  end
end

But, I still prefer flush_on_newline.

Printing "aaaaaa" 10 million times is a microbenchmark and the results are hardly representitive of real-world applications.

@michael-selig
Author

A possible compromise:
There is a method Kernel#print which writes to STDOUT & flushes. Perhaps the other Kernel print-to-STDOUT methods should flush too (puts, p, printf)? You will then get line buffered output if you use these simple methods (eg: debugging, logging) but fully buffered otherwise. Yes, making Kernel#puts flush would make the original example slower, but if it is clearly documented, you would know to use STDOUT.puts if you want buffered output.

I am actually uncertain if this is a great idea, but I think it does make behaviour more obvious than having to specifically turn buffering on or off. It won't help if you pipe buffered output to tail -f. But it does make the IO method buffering behaviour of STDOUT consistent with other files (except when it's a TTY).
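
Sketched out, the compromise might look like this (the helper name is made up; it is not an existing Kernel method):

# Illustrative only: a top-level print helper that always flushes, while
# STDOUT.puts stays fully buffered for bulk output.
def puts_flushed(*objects)
  STDOUT.puts(*objects)
  STDOUT.flush
end

puts_flushed "progress: 50%" # appears immediately, even when redirected
STDOUT.puts "bulk data line" # buffered as usual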

@asterite
Member

The behavior you describe is how it was done in the very beginning of Crystal. But it wasn't very nice that puts vs STDOUT.puts behaved differently.

@sam0x17
Contributor

sam0x17 commented Sep 20, 2019

I'll revise my suggestion -- we should flush on:

  1. newline
  2. buffer filled
  3. 50ms has passed

Whichever occurs first

@RX14
Contributor

RX14 commented Sep 20, 2019

If you flush on newline and every 50ms, then you might as well just flush on newline and lose the complexity.

@sam0x17
Contributor

sam0x17 commented Sep 21, 2019

@RX14 I disagree. Suppose you print a few characters and then sit for 100ms. It would be odd for that not to get flushed, but neither a newline detector nor a buffer-full flush is going to account for that.

@asterite
Member

I really think we should bring back the old functionality.

https://gitter.im/crystal-lang/crystal?at=5e7a32ff65bc2a6fa687efd6

just lost 2 hours because doing output in systemd services doesn't flush stdout anymore :P
is there some way to get IO#flush_on_newline back?

It's really unintuitive that flushing is disabled in one case or the other.

I very much prefer things to be a bit slower than unexpected.

Thoughts?

@oprypin
Member

oprypin commented Mar 25, 2020

Another earlier reference on gitter
https://gitter.im/crystal-lang/crystal?at=5e597ebb376d882250bfde5e
Yes, please. Thanks for the consideration.
