-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Command improvements for ergonomics and error handling #3362
base: master
Are you sure you want to change the base?
Conversation
Co-authored-by: Lucius Hu <1222865+lebensterben@users.noreply.github.com>
I agree with the motivation that |
Um? That's not true. In my RFC, by default, error output is sent to the Rust program's stderr (inherit).
I think this depends what program you're running. I don't think it is wrong to have the option to treat stderr output as an error, as a non-default mode.
No, that's not true either. If you have a |
Ah, I seem to have overlooked that, sorry. |
text/0000-command-ergonomics.md
Outdated
|
||
Runs the command and collects its stdout. | ||
Decodes the stdout as UTF-8, and fails if that's not possible. | ||
Fails unless the output is a single line (with or without line ending). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please specify the behavior if the program has no output. Does this fail in that case, or is the requirement "at most one line"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should succeed and give an empty string. Will write that into the RFC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like this helper would commonly be used for programs for which you expect to parse the output, in which case having to separately check for empty string seems to defeat part of the ergonomic win of using the helper.
But in any case, this isn't a blocker as you've split this helper out into possible future work.
This API cannot be fixed; | ||
see [#73126](https://github.com/rust-lang/rust/issues/73126). | ||
|
||
* Apply `#[must_use]` to `Command` and `Child`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉 🎉 🎉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to make this change without an RFC? Seems somewhat unfortunate to tie this together with the API additions.
text/0000-command-ergonomics.md
Outdated
Starts the command, allowing the caller to | ||
read the stdout in a streaming way. | ||
Neither EOF nor an error will be reported by `ChildOutputStream` | ||
until the child has exited, *and* the stdout pipe reports EOF. | ||
(This includes errors due to nonempty stderr, | ||
if stderr was set to `piped`.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This description makes me wonder about the semantics of the other functions in this family. Do they wait for process exit, or for process exit and EOF? (I'm hoping the former, or possibly "process exit and then a non-blocking read saying it would block".)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is stated explicitly in the docs for what I am now calling read_stdout_bytes
. All of these functions wait for both EOF and process exit. I will add a short xref note to the other two functions. In the real docs the spec text should probably be cut and pasted instead.
In a broad sense, I think the approach of this RFC seems mostly reasonable, and I'd like to see something along these lines. There are two categories of issues I'd like to raise. I think both are addressable, and I'm hoping it'll be possible to address them and merge the resulting RFC. One family of issues are largely bikeshed-painting, where the functionality seems perfectly fine but I think the names are unintuitive. For those, I want to be clear that I'm not attached to any particular names, and primarily want to express some desired properties of a name that I think the proposed names don't have. I'd like to align on those properties and then I don't especially care what name we pick that meets those properties. The other family of issues are those of functionality, where I think the proposal makes a common use case more challenging. First, the bikesheds:
Next, the semantics:
|
Shouldn't it be
Sensitive things should not be passed as command line arguments, since those can leave traces in arbitrary places around the OS. For example, they may be included in |
No. On Windows that would be WTF-8, not raw bytes. |
But what if you also want to capture stderr? As @joshtriplett mentioned And should there be functions for getting stdout and stderr as separate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is well-motivated broadly, however needs some work:
- I think deprecating
Command::output
is probably too aggressive, or at least needs a lot more justification. - I agree with @joshtriplett in that there's a lot bikeshed-painting to be done about the naming here -- most of these names don't follow the conventions of the ecosystem/stdlib that well.
- I think the mutable/constructable
SubprocessError
portion of the API needs more work -- Ideally it could be a lot smaller and have fewer moving pieces. Perhaps it's trying to be too much at once?
text/0000-command-ergonomics.md
Outdated
|
||
## Deprecations | ||
|
||
* Deprecate `std::process::Command::output()`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not clear to me that Command::output
is that bad. It's slightly awkward and requires manually checking errors, but I'm not convinced this is worth deprecating a extremely widely-used function and causing a large amount of ecosystem churn (which many users will respond to with #[allow(deprecated)]
, due to MSRV+the existing method not being broken).
I think you need to make a much clearer case for why this needs deprecation, and document the downside of ecosystem and educational churn as folks move to the new API(s).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you and Josh are right about this. I have moved the deprecation to Future possibilities.
text/0000-command-ergonomics.md
Outdated
fn communication_error() -> Option<&io::Error>; | ||
|
||
/// If trouble included failed UTF-8 conversion. | ||
fn utf8_error(&self) -> Option<&std::str::FromUtf8Error>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which UTF-8 conversion does this refer to? I think this probably should be pushed down to whatever getter would return something UTF-8 validated rather than having this stored as a field on the error.
If this is stored (and almost certainly if it is not), we should use std::std::Utf8Error
rather than a &std::string::FromUtf8Error
in order to avoid needing to store the invalid UTF8 twice (and to avoid forcing us to convert eagerly).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm using std::str::FromUtf8Error
which doesn't contain or reference the string, but does contain the offset. I think this doesn't suffer from the problem you describe?
text/0000-command-ergonomics.md
Outdated
fn set_stderr_bytes(&mut self, stderr: impl Into<Box<u8>>); | ||
fn set_spawn_error(&mut self, error: Option<io::Error>); | ||
fn set_communication_error(&mut self, error: Option<io::Error>); | ||
fn set_utf8_error(&mut self, error: Option<std::str::FromUtf8Error>); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think too much of this is mutable -- either we should just make these be public fields (we have get_
and set_
methods for them...) or we should just allow constructing one of these from parts.
Additionally it's not clear to me how these interact. Up until now, I had imagined that there was an internal enumeration
enum SubprocessErrorCause {
Spawn(io::Error),
Communication(io::Error),
// Actually, I don't think we want this one, but anyway
Utf8(str::Utf8Error),
// ...
}
but this API has individual setters for each one, which implies it either is not that way, or they have perhaps-confusing interactions.
Does it make any sense for there to be more than one error here anway? Which would gets reported in std::error::Error::source
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have made this a transparent struct. Making it opaque was indeed confusing.
text/0000-command-ergonomics.md
Outdated
fn has_problem(&self) -> bool; | ||
} | ||
impl Default for SubprocessError { ... } | ||
impl Clone for SubprocessError { ... } // contained io:Errors are in Arcs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should instead ask users who need a Clone
able SubprocessError
to use Arc<SubprocessError>
, rather than imposing that requirement on the internals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough, I have changed this.
Co-authored-by: Josh Triplett <josh@joshtriplett.org>
text/0000-command-ergonomics.md
Outdated
//// | ||
/// Use this if you want to to tolerate some exit statuses, | ||
/// but still fail if there were other problems. | ||
pub fn just_status(self) -> Result<ExitStatus, SubprocessError>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels a little awkward to use: if you call this to filter out some exit statuses, you lose the rest of the error, which means you don't have stdout_bytes
anymore. What code pattern would you suggest for code that wants to run diff
, check the exit code for 0/1 to determine if there was a difference, bail on >1, and capture the patch output?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this (straw proposal) might be easier to use:
/// If the process is only considered to have errored due
/// to its exit code, calls the provided filter, returning
/// `Ok(Self)` if it returns `true`, or `Err(Self)` if it
/// returns `false`.
pub fn permit_statuses(self, f: impl FnOnce(ExitStatus) -> bool) -> Result<ProcessError, ProcessError>
This would make it relatively straightforward to say "allow 0 or 1, reject everything else". (Still mildly annoying due to code()
returning Option
, but doable.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I quite like this idea. I wasn't sure people would like the notion of returning Ok(ProcessError)
, but if you think that's OK it seems OK to me.
mildly annoying due to
code()
returningOption
I wondered whether the closure should receive ExitCode
(so would only be called if the process exited rather than was killed). But then you can't use this to ignore (say) SIGPIPE
.
I do kind of hate booleans. Would impl FnOnce(ExitStatus) -> Result<(), ()>
be too awful?
Nit (emphatically not a blocker): your commit message "Rename to ChildOutputStream" seems inaccurate (specifically the "to"), as it renames |
Because maybe several things went wrong, ever providing a `Some` `cause` | ||
would involve prioritising multiple possible problems. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you give an example where multiple different things could simultaneously go wrong? It seems like spawn_error
, communication_error
, and utf8_error
should be mutually exclusive.
The only things that seem like they could simultaneously go wrong would be "exit status was nonzero" and one of the errors processing output. And I think in that case, exit status should be printed by ProcessError, and the other error should be in cause
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the following can all occur in the same execution:
- The application calls
.stderr(piped)
and.read_stdout()
- Spawn succeeds
- The subprocess produces some invalid-UTF output
- Then, something in the kernel goes very badly awry (or, some other thing in our Rust program wrecks our fd) and
read()
on one of the pipes fails - The subprocess prints some stderr (which the application has said to treat as an error)
- The subprocess then crashes.
We must report at least all of: the error from read
; the stderr output (which might well be very helpful for diagnosis); and the process's exit status.
I think we should eagerly do a unicode validity check whenever we captured the stdout, and record the unicode conversion error in the ProcessError
. That way a caller who calls your .permit_status()
and gets Ok
can know that the stdout output is UTF-8; conversely, if the command unexpectedly printed non-UTF-8, the caller can conveniently report the command line etc.
Or to put it another way, we shouldn't "short circuit" some checks and report only a subset of the errors.
text/0000-command-ergonomics.md
Outdated
and we should print only the command name. | ||
|
||
Sometimes people pass sensitive information (such ass passwords) in command line arguments. | ||
This is not a good idea, because command line arguments are generally public on Unix. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a good idea, because command line arguments are generally public on Unix. | |
This is not a good idea in portable software, because command line arguments are often public on Unix targets. |
text/0000-command-ergonomics.md
Outdated
### Further APIs for `ProcessError` | ||
|
||
A `ProcessError` is a transparent `Default` struct so it can be | ||
constructed outside std, for example by async frameworks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
constructed outside std, for example by async frameworks. | |
constructed outside std, such as by async frameworks or other | |
code launching processes and handling errors from them. |
I've nominated this for the next libs-api meeting. I anticipate proposing FCP on it, unless folks raise any other issues before then. |
Co-authored-by: Josh Triplett <josh@joshtriplett.org>
Co-authored-by: Josh Triplett <josh@joshtriplett.org>
I had been thinking about this, and in particular the implementation, and felt that I'm not 100% sure whether this is better than my previous proposal. One option woudl be to leave this one way or the other for now, and let the implementor decide. |
However, | ||
avoiding deadlocks when reading subprocess output, | ||
and also doing error checks properly, | ||
is rather subtle. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you clarify how stdout_reader
makes avoiding this deadlock easier?
Is this just that you need to interleave .write
calls to the process stdin with .read
calls to the reader, rather than calling .write_all
and then reading all the output at once?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The deadlocks that stdout_reader
avoids completely are ones ike: waiting for the command's stdout EOF, and child termination, in the wrong order; waiting for stderr EOF (which can sometimes block forever for complicated reasons).
It helps only a little with deadlocks resulting from piping both into and out of the same Command
. (I think it may still help in those use cases, but it doesn't help with "you wrote too much to the command's input while it's trying to write output to you".)
…omcc Make ExitStatus implement Default And, necessarily, make it inhabited even on platforms without processes. I noticed while preparing rust-lang/rfcs#3362 that there was no way for anyone to construct an `ExitStatus`. This would be insta-stable so needs an FCP.
…olnay Make ExitStatus implement Default And, necessarily, make it inhabited even on platforms without processes. I noticed while preparing rust-lang/rfcs#3362 that there was no way for anyone to construct an `ExitStatus`. This would be insta-stable so needs an FCP.
…olnay Make ExitStatus implement Default And, necessarily, make it inhabited even on platforms without processes. I noticed while preparing rust-lang/rfcs#3362 that there was no way for anyone to construct an `ExitStatus`. This would be insta-stable so needs an FCP.
Make ExitStatus implement Default And, necessarily, make it inhabited even on platforms without processes. I noticed while preparing rust-lang/rfcs#3362 that there was no way for anyone to construct an `ExitStatus`. This would be insta-stable so needs an FCP.
Then, | ||
if the stderr output is nonempty, this is considered an error, | ||
and reported in the `ProcessError`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit uncertain about this semantics. It's popular in command-line applications to use stderr
to collect running commentary about what the program is doing (for example log::info
under env_logger
) in ways that definitely shouldn't be considered errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have an experience report that erroring out on non-empty stderr is wrong! One common case here is when you run something like rustc --version
, but that thing is rustup
managed, so, before you get to rustc --version
output, you have printout from rustup
on the stderr that it downloads the appropriate version.
…verbose. I was assuming [`exit_status_error`][1] would be stabilized quickly, but there's been no movement on it for at least a year and there are now other [(possibly) competing/conflicting proposals][2]. This should reduce [confusion when trying to build the launcher][3]. [1]: rust-lang/rust#84908 [2]: rust-lang/rfcs#3362 [3]: #137 (comment)
Rendered