Should STDOUT always set flush_on_newline by default? #7431
Comments
STDOUT is more and more used to log things (systemd and docker expect that). If we don't flush on newline, we can easily end up with out-of-sync logs when not logging furiously, or even truncated logs if crystal crashes. I believe the flush-on-newline default is good, even when it's not a tty. One can always fix this for specific cases with `STDOUT.flush_on_newline = false`. If we change the default to depend on whether it's a tty or not, then we should also fix logger to flush after each message. |
That is true, but I would have thought that developers of loggers would have to be conscious of the fact that each log message needs to be flushed. Should a regular person writing a program that writes to STDOUT (like I did) have to worry about whether STDOUT needs to be set to flush or not to get decent performance (one of Crystal's claims to fame)? Also consider code like:
which returns a file with So I would argue that the logger should be the one with the onus of "fixing" the issue by turning on flush_on_newline. Also having STDOUT set to flush by default seems to run contrary to all other languages I know, including Ruby. |
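A rough sketch of what putting the onus on the logging side could look like, assuming a hypothetical `log` helper rather than Crystal's actual Logger (which, as noted further down, already flushes): bulk output stays fully buffered, and only log lines force a flush.

```crystal
# Illustrative only: bulk output stays fully buffered, the hypothetical
# `log` helper pays the cost of one flush per message.
STDOUT.flush_on_newline = false

def log(message : String)
  STDOUT.puts "LOG: #{message}"
  STDOUT.flush # each log line is pushed out immediately
end

log "starting work"
100_000.times { |i| puts i } # bulk output, written in large chunks
log "done"
STDOUT.flush # drain whatever is still buffered before exit
```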
I agree with @michael-selig I originally added The interesting thing is that if you check the value of We should probably do the same. And we could also remove To do that, we can probably check for TTY at the beginning of the program (which we do now) and set Then loggers should flush after each line. For loggers it makes sense. |
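A minimal sketch of the "decide at program start based on TTY" idea, using the existing `flush_on_newline` property (whether that property or a `sync` flag is the right knob is exactly what's being discussed here):

```crystal
# Hypothetical startup snippet: line-buffer standard output/error only when
# they are attached to a terminal; leave them fully buffered when redirected.
STDOUT.flush_on_newline = STDOUT.tty?
STDERR.flush_on_newline = STDERR.tty?

puts(STDOUT.tty? ? "interactive: flushing on every newline" : "redirected: fully buffered")
```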
(side topic: should we rename |
I was surprised when Kemal's logs suddenly stopped after an ordinary build. 😄 |
Logger does flush, so that might not be the issue. |
I think this might have been a mistake. If you run a crystal program through I think I prefer a slower output that one can still make faster by doing `STDOUT.flush_on_newline = false`. |
Yes. Crystal's philosophy is that you can get really good performance if you want. But we prefer intuitive behavior over performance when possible, and here we broke that rule. |
How default STDIN/OUT/ERR buffering should work has always been controversial - POSIX only says that it is "implementation dependent".
I guess this comes down to what you interpret "intuitive" to mean. A C programmer would expect the default buffering behaviour to work like the C library, which (on Linux anyway) is: stdout is line buffered when it refers to a terminal and fully buffered when redirected to a file or pipe, while stderr is unbuffered. |
Interesting article, thank you! I guess I don't know what's best to do. |
I'm not super familiar with the default behaviors for this in different environments, but I've always wanted it to be time based, as in flush if data sits in the buffer unflushed for more than X milliseconds. This could be implemented by having a timestamp for the last flush and knowledge that there is data in the buffer. |
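A sketch of how that could look, assuming a hand-rolled wrapper IO (the class and its names are invented for illustration, not an existing API): it remembers the last flush time and whether anything is pending, and flushes when a write finds data that has been sitting too long. Data that goes idle after the last write would still need a timer or background fiber, which is the harder part.

```crystal
# Illustrative only: a wrapper IO that tracks the last flush time and whether
# unflushed data exists, flushing when pending data is older than max_age.
class TimedFlushIO < IO
  def initialize(@io : IO, @max_age : Time::Span = 100.milliseconds)
    @last_flush = Time.monotonic
    @dirty = false
  end

  def write(slice : Bytes) : Nil
    @io.write(slice)
    @dirty = true
    # Only checked on the next write; an idle buffer still needs a timer.
    flush if (Time.monotonic - @last_flush) > @max_age
  end

  def read(slice : Bytes)
    @io.read(slice)
  end

  def flush
    @io.flush
    @dirty = false
    @last_flush = Time.monotonic
  end
end

io = TimedFlushIO.new(File.open("out.log", "w"))
io.print "some data" # the first write after the threshold triggers a flush
```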
@asterite I noticed that unlike crystal, ruby appears to flush STDOUT & STDERR before forking a new process. eg: Try running
redirected to a file using both ruby & crystal (ensuring STDOUT is buffered). I think Ruby's behaviour addresses your concern & is more intuitive. |
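In Crystal that ordering can at least be restored by hand today: flush the parent's buffered STDOUT before handing the stream to the child. A hedged sketch ("some_subprogram" is a placeholder command):

```crystal
# Sketch: make sure the parent's buffered output reaches the file descriptor
# before the child starts writing to it, so lines can't end up out of order.
STDOUT.flush_on_newline = false

puts "parent: before running the child"
STDOUT.flush # without this, the line above could appear after the child's output
Process.run("some_subprogram", output: STDOUT)

puts "parent: child finished"
STDOUT.flush
```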
@michael-selig No, it doesn't solve my problem. The problem is this. You have one program that outputs to STDOUT. For example:

```crystal
# subprogram.cr
100.times do |i|
  puts i
  sleep 1
end
```

Then you have another program that streams the output of that program to a file:

```crystal
# program.cr
file = File.open("output.txt", "w")
Process.run("subprogram", output: file)
```

Now you execute `program` and watch `output.txt`: nothing shows up until `subprogram` finishes, because its output is fully buffered when it isn't a TTY. How do you fix it? There's no way to fix this without recompiling `subprogram`.

I just tried this program in Go:

```go
package main

import (
	"fmt"
)

func main() {
	for i := 0; i < 100; i++ {
		fmt.Println(i)
	}
}
```

When you run it with a TTY it takes 2.5 seconds. When you run it and redirect to a file it takes 4 seconds (I don't know why). And I can tail the file and see the output as it's produced. I also tried putting a sleep inside the loop, and the lines still appear in the file right away.

It seems Go cares more about being able to tail a file and get immediate output than to win a few extra seconds. I think that if you do care about those extra seconds you can always enable C's behaviour by doing `STDOUT.flush_on_newline = false`.

I think we should revert the PR that changed this. |
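For anyone who wants to see the cost being argued about on their own machine, a small micro-benchmark sketch (numbers will vary wildly between a TTY, a file and a pipe; timings go to STDERR so they stay out of the redirected stream):

```crystal
# Time the same loop in both modes. Run with output redirected to a file,
# e.g. `crystal run bench.cr > out.txt`, to reproduce the gap discussed here.
def bench(label : String)
  elapsed = Time.measure do
    1_000_000.times { puts "aaaaaa" }
    STDOUT.flush
  end
  STDERR.puts "#{label}: #{elapsed.total_milliseconds.to_i} ms"
end

STDOUT.flush_on_newline = true
bench "flush on newline"

STDOUT.flush_on_newline = false
bench "fully buffered"
```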
A bit more: now I understand that Go doesn't buffer STDOUT at all, so every Println is a separate write. Well, in our case we can let `flush_on_newline` stay the default. |
@asterite Sorry, I did misunderstand your problem, which I thought was to show that the output of the subprogram could be written to the file before stuff written previously by the calling program. This is what my example tried to show. I have never used Go, so I can't really comment on your example. However, I tested the code in your "subprogram.cr" in Ruby, and equivalent code in Python and C (using buffered STDIO), and all 3 behave like Crystal does using buffered output - nothing is written to the output file until the program finishes.
Or we can do the opposite :-) The only way I can think of to make it "intuitive" to everyone and also maintain good performance is to implement what @sam0x17 suggested: when data is written into an empty buffer, set a timer to flush the buffer. It would also need to force a flush of the buffer before passing the file descriptor to another process. However, I am not sure if this "automatic timed flushing" is really worth the effort of implementing, given that I am not aware of other languages that do it. The usage cases where buffered behaviour is unintuitive seem to be things like logging or writing status to STDOUT in long-running programs, or using STDOUT for debug messages, where the order and timing of those messages is important. In short-running programs that write data (text, HTML, XML, JSON etc.) to STDOUT, speed should win. So assuming that "automatic timed flushing" isn't implemented, for which types of programs should coders have to worry about buffering? I would suggest the former (the long-running, logging-style ones). |
At the very least I've been bitten by this, when writing a program which outputs newline-delimited JSON to the parent process via stdout. Behaviour shouldn't differ between TTY and pipe. It makes debugging a nightmare. If your application depends on stdout performance (very rare) it can always change the mode. |
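For that newline-delimited JSON case, the workaround today (whatever the default ends up being) is an explicit flush per line; a small sketch with a made-up event loop:

```crystal
# Emit one JSON document per line and flush it immediately, so the parent
# process sees each event as it happens, TTY or pipe alike.
require "json"

10.times do |i|
  event = {"seq" => i, "message" => "event #{i}"}
  puts event.to_json
  STDOUT.flush
  sleep 0.5
end
```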
Unix & Linux programmers have come to expect that STDOUT works like that in many if not most languages. Also when you want to write a large amount of data to STDOUT via a pipe, why should performance suffer? Sure, if you're just writing a small amount of JSON or whatever to STDOUT you won't notice the difference.
Not sure if I agree that this is rare. I would argue that if you want to debug you can always change the mode. That way you get performance by default. If it's the other way around, programmers who aren't aware of the need to set buffering on STDOUT will get poorer performance and likely not even be aware of it. At least when you are debugging, you'll soon find out that it is being buffered! |
Poor performance is far far far less frustrating to a user than their application changing semantics for unclear reasons. The fact that some people are used to it already is not a good argument for this. |
Exactly my thoughts! However, I tried the original program with So I'm thinking maybe it's better to revert the whole thing to how it was before this entire discussion started. It was actually pretty good back then. |
But I'll try to discuss this today with Juan and Brian if I have time. |
We talked about this today. On one hand we think not flushing on every newline when writing to a file is good because you just want to write it as fast as possible. This matches what C, Ruby, Python and current Crystal do. On the other hand we'd like there to be output from time to time so a user watching that file, for example with `tail -f`, still gets some feedback. So maybe the idea of having it fully buffered but flushing from time to time, say, once per second, could be ideal: it prevents too many syscalls but you still get feedback. But so far we didn't agree on any approach, plus we don't really know how to do the "flush every X time unit". Right now the behavior matches C, Python and Ruby so it's good enough. But let's leave this open and come back to this later. |
Tough decision over 1 line of code! One more suggestion if I may: |
To reiterate my thoughts, I think flushing every X bytes or X milliseconds, whichever happens first, is the way to go eventually, as this will perform well and have the expected behavior in both use cases. |
```crystal
spawn do
  loop do
    sleep 100.milliseconds
    STDOUT.flush
    STDERR.flush
  end
end
```

But I still prefer flushing on newline. Printing "aaaaaa" 10 million times is a microbenchmark and the results are hardly representative of real-world applications. |
A possible compromise: have the top-level `puts` flush on newline while output written directly through `STDOUT` stays fully buffered. I am actually uncertain if this is a great idea, but I think it does make behaviour more obvious than having to specifically turn buffering on or off. It won't help if you pipe buffered output to |
The behavior you describe is how it was done in the very beginning of Crystal. But it wasn't very nice that `puts` and `STDOUT.puts` behaved differently. |
I'll revise my suggestion -- we should flush on:
- a newline
- ~50ms elapsed since the last flush
- the buffer filling up
Whichever occurs first |
If you flush on newline and 50ms, then you might as well just flush on newline and lose the complexity. |
@RX14 I disagree. Suppose you print a few characters and then sit for 100ms. It would be odd for that not to get flushed, but neither a newline detector nor a full-buffer flush is going to account for that. |
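A sketch of covering exactly that case, under the assumption that the buffered IO can expose "is there pending data and when was it last written" (the `IdleFlusher` module and its bookkeeping are invented for illustration): a background fiber flushes only when output has been sitting idle past the threshold.

```crystal
# Illustration only: flush when data has been written and then nothing has
# happened for `idle` time. In a real implementation the pending/last_write
# bookkeeping would live inside the buffered IO itself.
module IdleFlusher
  @@pending = false
  @@last_write = Time.monotonic

  def self.wrote
    @@pending = true
    @@last_write = Time.monotonic
  end

  def self.start(idle : Time::Span = 100.milliseconds)
    spawn do
      loop do
        sleep 10.milliseconds
        if @@pending && (Time.monotonic - @@last_write) > idle
          STDOUT.flush
          @@pending = false
        end
      end
    end
  end
end

IdleFlusher.start
print "a few characters" # no newline, buffer far from full
IdleFlusher.wrote        # a real IO#write would record this automatically
sleep 1.second           # after ~100ms of inactivity the fiber flushes STDOUT
```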
I really think we should bring back the old functionality. https://gitter.im/crystal-lang/crystal?at=5e7a32ff65bc2a6fa687efd6
It's really unintuitive that flushing is disabled in one case or the other. I very much prefer things to be a bit slower than unexpected. Thoughts? |
Another earlier reference on gitter |
I am just starting with Crystal, being a Ruby programmer for many years. I am impressed with its performance, but playing around, I found one thing that badly affects performance when writing to standard output: that `flush_on_newline` is always set by default on `STDOUT`. My trivial test takes `real: 10.4s user: 2.4s sys: 8.0s` when its output is redirected to a file. (FYI the same thing run by the Ruby 2.5.1p57 interpreter takes `real: 3.8s user: 3.5s sys: 0.3s`.)

Yet if I add the line `STDOUT.flush_on_newline = false` at the top, the time drops dramatically to `real: 0.5s user: 0.3s sys: 0.2s`, which is more what I was expecting.

I suggest that the definition of STDOUT in kernel.cr be changed to:

```crystal
STDOUT = IO::FileDescriptor.new(1).tap { |f| f.flush_on_newline = true if f.tty? }
```

so that `flush_on_newline` is only set when STDOUT is a TTY. I am fairly sure that this is how Unix & Linux C programs normally behave. Typically I believe that standard error is completely unbuffered, but it appears to be line buffered in Crystal. This, however, probably isn't a big deal.