env::args differs from MSVC CRT in some cases on Windows #44650

Closed
SpaceManiac opened this issue Sep 17, 2017 · 15 comments · Fixed by #87580
Labels
C-bug Category: This is a bug. O-windows Operating system: Windows T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@SpaceManiac

For a command-line of the form "a/"b.exe, where a/b.exe does indeed exist, std::env::args() produces different results than argc and argv in a C++ program. Specifically, CommandLineToArgvW is returning troublesome results. It looks like the CRT and CommandLineToArgvW disagree.

  • Invocation in Command Prompt: "a/"b.exe
  • Expected arguments: [a/b.exe]
  • C++ main() arguments: [a/b.exe]
  • Result of GetCommandLineW: "a/"b.exe
  • Result of CommandLineToArgvW: [a/, b.exe]
  • Result of std::env::args(): [a/, b.exe]

Obviously, this interpretation makes no sense. env::current_exe does not seem to be affected.
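
To make the disagreement concrete, here is a minimal sketch of two ways the program name might be split off the command line. It is not the real implementation of either API: `program_name_legacy` and `program_name_crt` are hypothetical names, and the legacy rule (a leading quote ends the name at the very next quote) is an assumption inferred from the CommandLineToArgvW result listed above. The CRT-style rule follows Microsoft's documented treatment of argv[0], where quotes are dropped and only control whether whitespace may end the name.

```rust
/// Assumed legacy rule: if the command line starts with a quote, the program
/// name ends at the next quote; otherwise it ends at the first space or tab.
/// Returns the program name and the unparsed remainder.
fn program_name_legacy(cmd: &str) -> (String, &str) {
    if let Some(body) = cmd.strip_prefix('"') {
        match body.find('"') {
            Some(end) => (body[..end].to_string(), &body[end + 1..]),
            None => (body.to_string(), ""),
        }
    } else {
        match cmd.find(|c: char| c == ' ' || c == '\t') {
            Some(end) => (cmd[..end].to_string(), &cmd[end..]),
            None => (cmd.to_string(), ""),
        }
    }
}

/// CRT-style rule: every quote toggles quoted mode and is dropped; the name
/// ends at the first space or tab that is outside quotes.
fn program_name_crt(cmd: &str) -> (String, &str) {
    let mut name = String::new();
    let mut in_quotes = false;
    for (i, c) in cmd.char_indices() {
        match c {
            '"' => in_quotes = !in_quotes,
            ' ' | '\t' if !in_quotes => return (name, &cmd[i..]),
            _ => name.push(c),
        }
    }
    (name, "")
}
```

Given the command line `"a/"b.exe`, the first function returns `a/` and leaves `b.exe` as the remainder (which then becomes a separate argument), while the second returns `a/b.exe` with nothing left over.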

A Rust program which demonstrates the mismatch:

fn main() {
    for (i, j) in std::env::args().enumerate() {
        println!("{}: {}", i, j);
    }
}

A C++ program which demonstrates the mismatch:

#include <stdio.h>
#include <windows.h>

int main(int argc, char* argv[]) {
	printf("---- argc and argv think:\n");
	for (int i = 0; i < argc; ++i) {
		printf("%i: %s\n", i, argv[i]);
	}
	
	printf("---- GetCommandLineW is:\n");
	printf("%ws\n", GetCommandLineW());
	
	printf("---- CommandLineToArgvW thinks:\n");
	LPWSTR *szArglist;
	int nArgs;
	szArglist = CommandLineToArgvW(GetCommandLineW(), &nArgs);
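	if (szArglist == NULL) {
		wprintf(L"CommandLineToArgvW failed\n");
		return 1; // report the failure to the caller
	}
	for (int i = 0; i < nArgs; i++) {
		printf("%d: %ws\n", i, szArglist[i]);
	}
	LocalFree(szArglist); // the array returned by CommandLineToArgvW must be freed with LocalFree
	return 0;
}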
@retep998 retep998 added the O-windows Operating system: Windows label Sep 17, 2017
@mattico
Contributor

mattico commented Sep 17, 2017

This probably won't surprise you, but stdlib uses CommandLineToArgvW:

let szArgList = c::CommandLineToArgvW(lpCmdLine, &mut nArgs);

@TimNN TimNN added C-bug Category: This is a bug. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Sep 17, 2017
@notriddle
Contributor

notriddle commented Jun 3, 2020

That's fine. It wouldn't be too terrible to write a test case to match with the newer CRT, as long as the libs team was fine with it. This sounds like it's going to end with "you broke backwards compat."

@retep998
Member

retep998 commented Jun 3, 2020

Fundamentally this is a big @rust-lang/libs decision of which standard to follow for argument parsing. I'm not particularly concerned with which standard we follow as long as it is Microsoft's standard and we follow it accurately.

Decide between:

  • Maintain status quo of CommandLineToArgvW behavior.
  • Change behavior to match what newer CRT versions do.

@Amanieu
Member

Amanieu commented Jun 3, 2020

I personally feel that we should follow the newer standard since that's what all Windows apps are using these days. I'm concerned about the possibility of this being a breaking change but this seems to only affect some rare edge cases in quote handling.

@notriddle
Contributor

notriddle commented Jun 4, 2020

I'm legitimately not sure what "all Windows apps are using these days."

PS C:\Users\micha\testexe> cat .\src\main.rs
use std::env;
use winapi::um::processenv::GetCommandLineW;
use std::os::windows::ffi::OsStringExt;
use std::ffi::OsString;
use std::slice;
fn main() {
    println!("env::args() = {:?}", env::args().collect::<Vec<_>>());
    let get_command_line_w = unsafe {
        let wide_ptr = GetCommandLineW();
        let mut wide_len = 0;
        while *wide_ptr.offset(wide_len) != 0 {
            wide_len += 1;
        }
        slice::from_raw_parts(wide_ptr, wide_len as usize)
    };
    println!("GetCommandLineW = {}", OsString::from_wide(get_command_line_w).to_str().unwrap());
}
PS C:\Users\micha\testexe>

Let's try cmd.exe:

C:\Users\micha\testexe>.\target\debug\testexe.exe "a ""b"" c"
env::args() = [".\\target\\debug\\testexe.exe", "a \"b", "c"]
GetCommandLineW = .\target\debug\testexe.exe  "a ""b"" c"

C:\Users\micha\testexe>.\target\debug\testexe.exe "a """b""" c"
env::args() = [".\\target\\debug\\testexe.exe", "a \"b\" c"]
GetCommandLineW = .\target\debug\testexe.exe  "a """b""" c"

C:\Users\micha\testexe>.\target\debug\testexe.exe "a \"b\" c"
env::args() = [".\\target\\debug\\testexe.exe", "a \"b\" c"]
GetCommandLineW = .\target\debug\testexe.exe  "a \"b\" c"

C:\Users\micha\testexe>.\target\debug\testexe.exe "a "\^"b\^"" c"
env::args() = [".\\target\\debug\\testexe.exe", "a \"b\" c"]
GetCommandLineW = .\target\debug\testexe.exe  "a "\"b\"" c"

C:\Users\micha\testexe>

Now let's try PowerShell:

PS C:\Users\micha\testexe> & .\target\debug\testexe.exe "a ""b"" c"
env::args() = ["C:\\Users\\micha\\testexe\\target\\debug\\testexe.exe", "a b c"]
GetCommandLineW = "C:\Users\micha\testexe\target\debug\testexe.exe" "a "b" c"
PS C:\Users\micha\testexe> & .\target\debug\testexe.exe "a """b""" c"
env::args() = ["C:\\Users\\micha\\testexe\\target\\debug\\testexe.exe", "a \"", "b c"]
GetCommandLineW = "C:\Users\micha\testexe\target\debug\testexe.exe" "a "" b" c
PS C:\Users\micha\testexe> & .\target\debug\testexe.exe "a \"b\" c"
env::args() = ["C:\\Users\\micha\\testexe\\target\\debug\\testexe.exe", "a \" b\\", "c"]
GetCommandLineW = "C:\Users\micha\testexe\target\debug\testexe.exe" "a \" "b\ c"
PS C:\Users\micha\testexe> & .\target\debug\testexe.exe "a "\^"b\^"" c"
env::args() = ["C:\\Users\\micha\\testexe\\target\\debug\\testexe.exe", "a ", "\\^b\\^ c"]
GetCommandLineW = "C:\Users\micha\testexe\target\debug\testexe.exe" "a " \^b\^" c
PS C:\Users\micha\testexe>

I'm not actually sure if there even exists a way to get decent escaping in PS. From the way variables behave, it looks like putting two quotes in a row creates an escaped quote, but they don't get properly re-escaped when you pass them to a command line app:

PS C:\Users\micha\testexe> $test="a ""b"" c"
PS C:\Users\micha\testexe> echo $test
a "b" c
PS C:\Users\micha\testexe> & .\target\debug\testexe.exe $test
env::args() = ["C:\\Users\\micha\\testexe\\target\\debug\\testexe.exe", "a b c"]
GetCommandLineW = "C:\Users\micha\testexe\target\debug\testexe.exe" "a "b" c"
PS C:\Users\micha\testexe>

Microsoft is making it really difficult to figure out what actually counts as "correct" command line parsing, when there doesn't even seem to be a "right" way that actually works with their stuff.

Let's see what WSL does:

notriddle@DESKTOP-20BNT14:/mnt/c/Users/micha/testexe$ ./target/debug/testexe.exe "a \"b\" c"
env::args() = ["C:\\Users\\micha\\testexe\\target\\debug\\testexe.exe", "a \"b\" c"]
GetCommandLineW = C:\Users\micha\testexe\target\debug\testexe.exe "a \"b\" c"
notriddle@DESKTOP-20BNT14:/mnt/c/Users/micha/testexe$ ./target/debug/testexe.exe "a ""b"" c"
env::args() = ["C:\\Users\\micha\\testexe\\target\\debug\\testexe.exe", "a b c"]
GetCommandLineW = C:\Users\micha\testexe\target\debug\testexe.exe "a b c"
notriddle@DESKTOP-20BNT14:/mnt/c/Users/micha/testexe$ ./target/debug/testexe.exe "a """b""" c"
env::args() = ["C:\\Users\\micha\\testexe\\target\\debug\\testexe.exe", "a b c"]
GetCommandLineW = C:\Users\micha\testexe\target\debug\testexe.exe "a b c"
notriddle@DESKTOP-20BNT14:/mnt/c/Users/micha/testexe$

Bash still acts like you expect it, and apparently the WSL-to-Windows bridge generates backslash escapes when it tries to convert WSL's arguments list into a Windows GetCommandLineW string. Nuts that Bash on Ubuntu on Windows does a better job than PowerShell does (at least, it generates command strings that are capable of being unambiguously parsed).

My next thought was to create a file name with quotes in it, and see what kind of parameters Explorer generates for me, but apparently Windows doesn't actually allow quotes in file names.

And, finally, let's see what happens if one Rust program calls another one:

PS C:\Users\micha\testexe2> cat .\src\main.rs
use std::process::Command;
fn main() {
    Command::new("../testexe/target/debug/testexe.exe")
      .arg("a \"b\" c")
      .spawn().unwrap();
}
PS C:\Users\micha\testexe2> .\target\debug\testexe.exe
env::args() = ["../testexe/target/debug/testexe.exe", "a \"b\" c"]
GetCommandLineW = "../testexe/target/debug/testexe.exe" "a \"b\" c"
PS C:\Users\micha\testexe2> 

Apparently, everybody seems to prefer backslashes when generating CLIs for Windows, except for PowerShell which passes quotes unescaped, and cmd.exe which just passes the raw CLI unchanged. So the current approach already seems to have interoperability as good as we would need it to be, but changing it to mimic the current CRT version probably won't break anything, either.

@ChrisDenton
Member

ChrisDenton commented Jun 4, 2020

I think it's safe to say that "all Windows apps" are using Microsoft's C runtime for parsing commandline arguments. If "all" means "the majority of software built since 2008".


In powershell you can use backticks to escape quotes:

.\target\debug\testexe.exe "a `"b`" c"

Or you can surround the argument with single quotes:

 .\target\debug\testexe.exe 'a "b" c'

Or you can pass the entire line verbatim, without interpretation by the shell:

.\target\debug\testexe.exe --% "a "b" c"

Or, if you're feeling particularly adventurous:

.\target\debug\testexe.exe @'
you can write
a multi-line
"argument"
'@

I'm not sure how you reached your last conclusion. It seems odd to use Rust's argument parsing to prove how most Windows programs handle arguments, no?

@notriddle
Contributor

> In powershell you can use backticks to escape quotes

No I can't. Here's what happens when I try:

PS C:\Users\micha\testexe> .\target\debug\testexe.exe "a `"b`" c"
env::args() = ["C:\\Users\\micha\\testexe\\target\\debug\\testexe.exe", "a b c"]
GetCommandLineW = "C:\Users\micha\testexe\target\debug\testexe.exe" "a "b" c"

It's exactly the same as when I did it with two quotes in a row. It creates a string object with quotes in it (like we want), but then it turns around and passes it verbatim without re-escaping it.

> Or you can surround the argument with single quotes

PS C:\Users\micha\testexe> .\target\debug\testexe.exe 'a "b" c'
env::args() = ["C:\\Users\\micha\\testexe\\target\\debug\\testexe.exe", "a b c"]
GetCommandLineW = "C:\Users\micha\testexe\target\debug\testexe.exe" "a "b" c"

It's not PowerShell's syntax that I'm questioning here. I can easily create a string object with quotes in it. What I want to know is what syntax PowerShell expects us to have. What syntax do they use when re-serializing the command line?

Apparently, they don't.

> I'm not sure how you reached your last conclusion. It seems odd to use Rust's argument parsing to prove how most Windows programs handle arguments, no?

Rust and Cargo communicate through command-line parameters, so we probably want to make sure that std::process::Command and std::env::args can accurately round-trip their data.

@notriddle
Contributor

notriddle commented Jun 4, 2020

The behaviour I was hoping for, by the way, is something like one of these:

PS C:\Users\micha\testexe> .\target\debug\testexe.exe --% "a \"b\" c"
env::args() = ["C:\\Users\\micha\\testexe\\target\\debug\\testexe.exe", "a \"b\" c"]
GetCommandLineW = "C:\Users\micha\testexe\target\debug\testexe.exe"  "a \"b\" c"

PS C:\Users\micha\testexe> .\target\debug\testexe.exe --% "a """b""" c"
env::args() = ["C:\\Users\\micha\\testexe\\target\\debug\\testexe.exe", "a \"b\" c"]
GetCommandLineW = "C:\Users\micha\testexe\target\debug\testexe.exe"  "a """b""" c"

It's the same behaviour that you were getting when you demoed the python interpreter's argument parsing: it's a single argument with spaces and quotes within it. It's "how do I escape a quote?"

@ChrisDenton
Member

Just to be absolutely clear about how Windows argv works, it might help to completely forget how *nix works.

The Windows kernel doesn't know at all about arguments. As far as it's concerned arguments are just a single string that it can pass on to a new process. No more, no less. It's not an array of strings. It's just one string. This string doesn't have to contain the .exe or anything else. The kernel does not care. This arguments string is what GetCommandLineW gets. It's up to individual applications to send the kernel a string when creating a process. It's up to each individual application to parse it however it likes (or not at all).

So what happens is relatively simple:

  1. The shell sets the CommandLineW string to the .exe being invoked and appends the arguments as the user input them (subject to how the shell itself interprets strings).
  2. The invoked application parses the CommandLineW string according to its own rules (although in reality it usually defers to the C runtime's interpretation).

Step 1 in the above could be replaced by an application. In which case it's entirely up to the application what it sets CommandLineW to. There are no rules.

Of course you usually want to make an arguments string that will be correctly interpreted by other applications. So for compatibility reasons double quotes should be escaped if they are actually wanted in the parsed arguments. Or not if they're unwanted.
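
A minimal sketch of that escaping, assuming the receiving application follows the backslash rules of the Microsoft C runtime. `quote_arg` is a hypothetical helper written for illustration, not the code that `std::process::Command` actually uses. It wraps an argument in double quotes when needed, escapes embedded quotes with a backslash, and doubles any backslashes that would otherwise end up directly in front of a quote.

```rust
/// Quote one argument so that a CRT-style parser recovers it unchanged.
/// Hypothetical helper; assumes the MSVC CRT backslash rules.
fn quote_arg(arg: &str) -> String {
    // Arguments without whitespace or quotes can be passed as they are.
    let needs_quotes =
        arg.is_empty() || arg.contains(|c: char| c == ' ' || c == '\t' || c == '"');
    if !needs_quotes {
        return arg.to_string();
    }

    let mut out = String::from("\"");
    let mut backslashes = 0usize; // length of the pending run of backslashes
    for c in arg.chars() {
        match c {
            '\\' => backslashes += 1,
            '"' => {
                // Double the pending backslashes, then add one more to escape the quote.
                out.extend(std::iter::repeat('\\').take(backslashes * 2 + 1));
                out.push('"');
                backslashes = 0;
            }
            _ => {
                // Backslashes not followed by a quote are literal.
                out.extend(std::iter::repeat('\\').take(backslashes));
                backslashes = 0;
                out.push(c);
            }
        }
    }
    // Double trailing backslashes so they do not escape the closing quote.
    out.extend(std::iter::repeat('\\').take(backslashes * 2));
    out.push('"');
    out
}
```

For the argument `a "b" c` this produces `"a \"b\" c"`, the same form that shows up in the GetCommandLineW output when one Rust program spawns another earlier in the thread.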

Some code for printing arguments

// C++
// Compile: `cl /EHsc /nologo argv.cpp`
#include <iostream>
int wmain(int argc, const wchar_t* wargv[])
{
    for (int i = 0; i < argc; i++) {
        std::wcout << '`' << wargv[i] << '`' << std::endl;
    }
}
// C#
// Compile: `csc /nologo argv.cs`
class MainClass
{
    static int Main(string[] args)
    {
        foreach (string arg in args) {
            System.Console.WriteLine("`" + arg + "`");
        }
        return 0;
    }
}
# Python
import sys
for arg in sys.argv:
    print(f"`{arg}`")

They all work the same, except for Rust, which is special. In the following table I've trimmed the enclosing " from the GetCommandLineW column onwards to make it clearer, but you can imagine they're still there.

| Powershell | GetCommandLineW | Everyone else | Rust |
|---|---|---|---|
| 'a "b" c' | a "b" c | a b c | a b c |
| 'a \"b\" c' | a \"b\" c | a "b" c | a "b" c |
| 'a ""b"" c' | a ""b"" c | a "b" c | a "b, c |
| 'a b c"' | a b c" | a b c" | a b c" |
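
The one row where the last two columns differ can be reproduced with a small parser sketch. This is not the standard library's code or the CRT's: `Rules` and `split_args` are hypothetical names, the parser only handles the arguments after the program name, and the `Legacy` behaviour (a doubled quote inside a quoted section emits one quote and leaves quoted mode) is an assumption chosen to match the Rust output shown in this thread. The `Crt2008` behaviour follows Microsoft's documented rules, where the doubled quote emits one quote and stays in quoted mode.

```rust
/// How `""` is treated inside a quoted section.
#[derive(Clone, Copy, PartialEq)]
enum Rules {
    /// Emit one quote and LEAVE quoted mode (assumed older behaviour).
    Legacy,
    /// Emit one quote and STAY in quoted mode (CRT since 2008).
    Crt2008,
}

/// Split the argument part of a command line (everything after argv[0]).
fn split_args(args: &str, rules: Rules) -> Vec<String> {
    let mut out = Vec::new();
    let mut cur = String::new();
    let mut in_quotes = false;
    let mut started = false; // true once the current argument has begun
    let mut chars = args.chars().peekable();

    while let Some(c) = chars.next() {
        match c {
            // Unquoted whitespace ends the current argument, if there is one.
            ' ' | '\t' if !in_quotes => {
                if started {
                    out.push(std::mem::take(&mut cur));
                    started = false;
                }
            }
            // Backslashes are only special when the run is followed by a quote.
            '\\' => {
                started = true;
                let mut n = 1;
                while chars.peek() == Some(&'\\') {
                    chars.next();
                    n += 1;
                }
                if chars.peek() == Some(&'"') {
                    // Each pair of backslashes yields one literal backslash.
                    cur.extend(std::iter::repeat('\\').take(n / 2));
                    if n % 2 == 1 {
                        // An odd count escapes the quote itself.
                        chars.next();
                        cur.push('"');
                    }
                } else {
                    cur.extend(std::iter::repeat('\\').take(n));
                }
            }
            // A quote inside a quoted section: doubled means literal, single closes.
            '"' if in_quotes => {
                if chars.peek() == Some(&'"') {
                    chars.next();
                    cur.push('"');
                    if rules == Rules::Legacy {
                        in_quotes = false;
                    }
                } else {
                    in_quotes = false;
                }
            }
            // A quote outside a quoted section opens one.
            '"' => {
                in_quotes = true;
                started = true;
            }
            _ => {
                started = true;
                cur.push(c);
            }
        }
    }
    if started {
        out.push(cur);
    }
    out
}
```

Feeding it the full string `"a ""b"" c"` yields the single argument `a "b" c` under `Rules::Crt2008` and the two arguments `a "b` and `c` under `Rules::Legacy`; the other three rows come out the same under both rule sets.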

@ChrisDenton
Member

ChrisDenton commented Jun 5, 2020

In short the powershell user is constructing the CommandLineW string more or less directly, the particulars of string types aside.

But I would add that the intricacies of powershell should have nothing to do with Rust std's handling of command line arguments. So long as the std behaves like other Windows applications it doesn't matter. If powershell makes a nicer way to construct a CommandLineW string then great! Rust will benefit too, assuming it acts the same as other Windows applications.


So to sum up, as you say, Rust has two jobs:

  • Parse the incoming command line string (the Microsoft CRT parsing rules were linked above)
  • Construct an outgoing command line string (for std::process::Command)

Of course it's easy for Rust to be consistent with itself no matter what rules it uses. It just has to test that construction and parsing match up as expected. However, it's obviously beneficial to be consistent with other applications on the platform.


All that said, there's nothing massively wrong with how Rust currently parses arguments. It's just that 12 years ago Microsoft tweaked their CRT's argument parsing rules slightly. Rust uses the old rules so, in some situations, Rust applications may exhibit surprising behaviours to users more accustomed to modern applications.

@notriddle
Contributor

Yeah, I agree with everything you said there.

I'm fine with changing to the new CRT behaviour, as long as we don't break anything. And it looks like we won't.

@stej

stej commented Jun 8, 2020

PowerShell is not a good example, because its rules are still screwed up: PowerShell/PowerShell#1995. It's very uncomfortable to call apps from PowerShell. I sometimes ended up generating a bat file and then running that from PowerShell.

@ChrisDenton
Member

ChrisDenton commented Jun 8, 2020

@stej I'd highly recommend installing Powershell 7 if you can. The current stable version is 7.0.1 which has slightly better command line handling, and the upcoming 7.1 release should improve it further. Otherwise, using --% (or manually escaping double-quotes using backticks) to set the command string directly produces the most consistent results.

GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this issue Sep 1, 2021
…-ou-se

Update Windows Argument Parsing

Fixes rust-lang#44650

The Windows command line is passed to applications [as a single string](https://docs.microsoft.com/en-us/archive/blogs/larryosterman/the-windows-command-line-is-just-a-string) which the application then parses to get a list of arguments. The standard rules (as used by C/C++) for parsing the command line have slightly changed over the years, most recently in 2008 which added new escaping rules.

This PR implements the new rules as [described on MSDN](https://docs.microsoft.com/en-us/cpp/cpp/main-function-command-line-args?view=msvc-160#parsing-c-command-line-arguments) and [further detailed here](https://daviddeley.com/autohotkey/parameters/parameters.htm#WIN). It has been tested against the behaviour of C++ by calling a C++ program that outputs its raw command line and the contents of `argv`. See [my repo](https://github.com/ChrisDenton/winarg/tree/std) if anyone wants to reproduce my work.

For an overview of how this PR changes argument parsing behavior and why we feel it is warranted see rust-lang#87580 (comment).

For some examples see: rust-lang#87580 (comment)
@bors bors closed this as completed in 1cf8fdd Sep 2, 2021