Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Windows: shell script cannot be used as an 'executable' for an action #3589

Closed
davido opened this issue Aug 19, 2017 · 31 comments
Closed

Windows: shell script cannot be used as an 'executable' for an action #3589

davido opened this issue Aug 19, 2017 · 31 comments

Comments

@davido
Copy link
Contributor

davido commented Aug 19, 2017

This is an attempt to fix command line limit problem reported in #3579. It's usual approach to work with a response file. So I adapted the code in question from passing the whole 25k command, that was truncated on 8191 character directly to ctx.action(), that was going through the shell:

ctx.action(
    inputs = inputs,
    outputs = [war],
    mnemonic = 'WAR',
    command = '\n'.join(cmd),
    use_default_shell_env = True,
)

with an intermediate step, generated shell script and pass it as executable to ctx.action() instead:

  # Executable script to generate war archive
  gen_war = ctx.new_file(ctx.label.name + "_gen_war.sh")
  ctx.file_action(gen_war, '\n'.join(cmd), True)

   ctx.action(
     inputs = inputs,
     outputs = [war],
     mnemonic = 'WAR',
     executable = gen_war,
     use_default_shell_env = True,
   )

This works as expected on Linux, but is failing on Windows with:

CreateProcess(): %1 is not a valid Win32 application.

I would expect that Bazel does some heuristic here and guess that the passed file is a shell script, for example, by trying to parse the first file line and see whether or not the first line is a shebang or something and shell out instead of trying to pass previously generated shell script directly to create process.

@damienmg
Copy link
Contributor

@laszlocsomor: can you take a look? Thanks!

@laszlocsomor
Copy link
Contributor

Ok, i'll take a look

@laszlocsomor
Copy link
Contributor

@davido : I dislike the idea to special-case shell scripts in Skylark actions on Windows. The rule had to "magically" depend on Bash if the executable looked like a shell script or war file.
Can you use ctx.run_shell instead? That should behave correctly on all platforms.

@laszlocsomor
Copy link
Contributor

@davido
Copy link
Contributor Author

davido commented Aug 23, 2017

@laszlocsomor Thanks for the pointer, I wasn't aware of ctx.run_shell method, and was only trying to rectify #3579. Interestingly, running shell script just works on Linux.

The rule had to "magically" depend on Bash if the executable looked like a shell script or war file.

But it does already, isn't? I can pass command line and it just works. Moreover I was wondering about inconsistency: command = '\n'.join(cmd) works (but truncated after 8k) , while executable = gen_war works on Linux but failing on Windows.

Another option would be to expose command_shell = gen_war in ctx.action()and use ctx.run_shell() internally?

I will replace the call and let you know if it fixed the problem. Thanks again.

@laszlocsomor
Copy link
Contributor

@davido : Thanks for trying it!

Interestingly, running shell script just works on Linux.

It works because Linux special-cases files that start with #!:

Under Unix-like operating systems, when a script with a shebang is run as a program, the program loader parses the rest of the script's initial line as an interpreter directive; the specified interpreter program is run instead, passing to it as an argument the path that was initially used when attempting to run the script.

Source: https://en.wikipedia.org/wiki/Shebang_(Unix)

But it does already, isn't? I can pass command line and it just works.

Good point. The shell is the default executable of actions unless you set another one. As you said, executing a shell script just works on Linux, but it doesn't work on Windows because the .sh extension is not executable.

I stand corrected though. We could set the actual executable to the shell if the action requested to use a shell script or war file. But then we must also rewrite the argument vector, and maybe quote the arguments as well. I'd rather not do that.

Moreover I was wondering about inconsistency: command = '\n'.join(cmd) works (but truncated after 8k) , while executable = gen_war works on Linux but failing on Windows.

My previous paragraph explains this.

Another option would be to expose command_shell = gen_war in ctx.action()and use ctx.run_shell() internally?

I think the cost of widening and maintaining the already large interface of ctx.action outweighs the small convenience benefit.

I will replace the call and let you know if it fixed the problem. Thanks again.

Thanks!

@davido
Copy link
Contributor Author

davido commented Aug 24, 2017

Unfortunately, it doesn't work: http://paste.openstack.org/show/619248/

If I change ctx.action() invocation with ctx.run_shell():

ctx.actions.run_shell(
    inputs = inputs,
    outputs = [war],
    mnemonic = 'WAR',
    command = '\n'.join(cmd),
    use_default_shell_env = True,
  )

I get:

[...]
Action failed to execute: java.io.IOException: CreateProcess(): Falscher Parameter.

And even if we sort out, why my current bazel version built from master is failing to correctly invoke ctx.actions.run_shell(), i cannot see how this could work (and how ctx.actions.run_shell() differ from ctx.action()), given that the generated script is 25k, see the whole story in #3579, so that the attepmt to pass the lines as parameter would be subject of command line limitation on 8191 char and fail? IOW, the run_shell is missing attribute: shell_script to pass a file and not a string?

ctx.action() already works on all platforms and i can pass a string as command, except that the lines are truncated on 8191 char on Windows.

What I'm looking for, is the possibility to switch to using generated shell script file (25k size) in Skylark action on Windows. It seems that it's currently not supported. In which case this is a blocker to build Gerrit on Windows, because the generated script to package WAR file is too big, and cannot be passed as parameter to the shell without truncation.

@laszlocsomor
Copy link
Contributor

I think run_shell should automatically create a shell script if the command line is too long for the platform. genrule does the same. If run_shell doesn't do that, as it seems to, that's a bug.

Let me look into that.

@davido
Copy link
Contributor Author

davido commented Aug 24, 2017

I think run_shell should automatically create a shell script if the command line is too long for the platform. genrule does the same. If run_shell doesn't do that, as it seems to, that's a bug.

But wait, if in some code paths shell script is automatically created if the command line is too long, then it raises the question why ctx.action() doesn't do it?

Is it a bug in ctx.action(), that could be easily fixed, see #3579? In ideal world, this would just work:

ctx.action(
    inputs = inputs,
    outputs = [war],
    mnemonic = 'WAR',
    command = '\n'.join(cmd),
    use_default_shell_env = True,
)

... on all platforms, and even on Windows, even if the script is longer then 8191 chars.

@laszlocsomor
Copy link
Contributor

But wait, if in some code paths shell script is automatically created if the command line is too long, then it raises the question why ctx.action() doesn't do it?

Because ctx.action() is buggy, it seems. I just tested the following scenario on Linux. I created a simple Skylark rule that executes a very long shell command, and a genrule that executes the same command. The genrule created a script from the command, but the Skylark action didn't.

Also earlier I confused two things:

  • Bazel creates a shell script from genrule.cmd if it is longer than some limit.

    I don't know this limit but it's not important anyway. genrule.cmd is a series of shell commands so Bazel can always safely write it to a file and execute that file as a shell script.

  • Bazel creates a parameter file for SpawnActions and passes that file instead of the args, as long as the SpawnAction:

    • enabled parameter file usage (because the executable must support it too), and
    • has a longer argv than the shell's limit.

Important: ~8K is cmd.exe's limit, but CreateProcess supports command lines up to 32K long. Meaning that a single program can have an argv of about 32K characters. No shell script trick can work around that, only parameter file usage can.

By the way ctx.action() is deprecated in favor of actions.run() / actions.run_shell().
https://docs.bazel.build/versions/master/skylark/lib/ctx.html#action

Is it a bug in ctx.action(), that could be easily fixed, see #3579? In ideal world, this would just work:

ctx.action(
    ...
    command = '\n'.join(cmd),
)

Yes, I think it's a bug.

@davido
Copy link
Contributor Author

davido commented Aug 25, 2017

Thanks for the detailed explanation. I also missed the fact that ctx.action() is deprecated now. I regularly read the release infos for every single Bazel release, but somehow I missed that.

So, am I getting it correctly, that the question boils down as for why ctx.actions.run_shell() doesn't work as expected? I wonder, if it is easily possible to create more verbose output without actually really debugging Bazel? I could easily patch and re-build and re-run this attempt with patched Bazel to dump more detailed information what CreateProcess() is actually doing in this code path.

@davido
Copy link
Contributor Author

davido commented Aug 25, 2017

I guess we could dump much more infos here, right?

if (!ok) {
result->error_ = bazel::windows::GetLastErrorString("CreateProcess()");
return PtrAsJlong(result);
}

@laszlocsomor
Copy link
Contributor

Yes, but why?

Btw I remembered one more thing:

  • actions.run should run a native command (i.e. call CreateProcess on Windows).
  • actions.run_shell should run a shell command (i.e. call bash/bash.exe)

@laszlocsomor
Copy link
Contributor

...and that's why run_shell could automatically create a helper shell script if the command is too large -- we know it's a shell command -- but run can't and shouldn't.

@laszlocsomor
Copy link
Contributor

Yes, but why?

Sorry, I missed the question part from your previous comment.

I wonder, if it is easily possible to create more verbose output without actually really debugging Bazel? I could easily patch and re-build and re-run this attempt with patched Bazel to dump more detailed information what CreateProcess() is actually doing in this code path.

Yes, you could patch it there, e.g. by dumping mutable_commandline to a file.

@davido
Copy link
Contributor Author

davido commented Aug 25, 2017

Yes, you could patch it there, e.g. by dumping mutable_commandline to a file.

Have you done something similar in the past or could provide some code pointers where something similar is done already?

@laszlocsomor
Copy link
Contributor

Yes, for one-off debugging purposes I have. I'd simply append it to a hardcoded file; nothing fancy. Something like this (not tested):

FILE* fh = fopen("c:/tempdir/bazel.log", "at");
fprintf(fh, "%s\n", mutable_command.c_str());
fclose(fh);

@davido
Copy link
Contributor Author

davido commented Aug 25, 2017

Thanks, @laszlocsomor , will test it and let you know.

Actually this JNI place or the code this JNI code is called from is always very interesting from debugging/dump perspective. So, I wonder why we can't easily instrument this native code and guard it with

// Uncomment when needed 
// #define DEBUG_ 42
[...]
#ifdef DEBUG_
    <dump the whole truth>
#endif

Looks like this is missing and such diagnostic code could be even committed there?

@laszlocsomor
Copy link
Contributor

You're welcome, and thanks!

Looks like this is missing and such diagnostic code could be even committed there?

I don't see the need for it. I believe adding debugging code wouldn't be any harder than discovering and understanding the existing ifdef'd code. However, I'm also not against it, so if you'd like to submit something, go ahead :)

@laszlocsomor
Copy link
Contributor

As for actions.run_shell writing a dispatcher script like genrule does: i have a WIP change that works on Linux but crashes on Windows. I'll investigate next week.
(https://bazel-review.googlesource.com/c/bazel/+/15850)

@davido
Copy link
Contributor Author

davido commented Aug 26, 2017

@laszlocsomor Wau, it just worked with your patch: https://bazel-review.googlesource.com/c/bazel/+/15850.

I adjusted the pkg_war.bzl:

  ctx.actions.run_shell(
    inputs = inputs,
    outputs = [war],
    mnemonic = 'WAR',
    command = '\n'.join(cmd),
    use_default_shell_env = True,
  )

Applied gerrit patch, built Bazel from scratch, and now gerrit WAR creation just worked:

bazel --nomaster_bazelrc build --workspace_status_command=/C/users/davido/projects/gerrit/tools/workspace-status.cmd --verbose_failures -s :headless
[...]
  SET PATH=C:\msys64\usr\bin;C:\msys64\bin;C:\Python36;C:\msys64\usr\local\bin;C:\msys64\usr\bin;C:\msys64\usr\bin;C:\msys64\opt\bin;C:\Windows\System32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\msys64\usr\bin\site_perl;C:\msys64\usr\bin\vendor_perl;C:\msys64\usr\bin\core_perl
    SET TEMP=C:\Users\davido\AppData\Local\Temp
    SET TMP=C:\Users\davido\AppData\Local\Temp
  C:/msys64/usr/bin/bash.exe bazel-out/msvc_x64-fastbuild/genfiles/headless.run_shell.sh
____[277 / 278] Still waiting for 1 job to complete:
      Running (local):
        WAR headless.war, 10 s
Target //:headless up-to-date:
  C:/msys64/tmp/_bazel_davido/_biltsqa/execroot/gerrit/bazel-out/msvc_x64-fastbuild/bin/headless.war
____Elapsed time: 73,719s, Critical Path: 70,37s

And testing it, reveals, that it even works ;-)

$ "$JAVA_HOME/bin/java" -jar C:/msys64/tmp/_bazel_davido/_biltsqa/execroot/gerrit/bazel-out/msvc_x64-fastbuild/bin/headless.war init -d ../test
[2017-08-26 10:07:18,004] [main] INFO  com.google.gerrit.server.config.GerritServerConfigProvider : No C:\Users\davido\projects\gerrit\..\test\etc\gerrit.config; assuming defaults
Generating SSH host key ... rsa... dsa... ed25519... ecdsa 256... ecdsa 384... ecdsa 521... done
Initialized C:\Users\davido\projects\test

@laszlocsomor
Copy link
Contributor

Whoa, nice! :) Thanks for trying it, I'm glad it works.

@laszlocsomor
Copy link
Contributor

I'm unlikely to get this change upstreamed this week because of other work I have to do.

@davido
Copy link
Contributor Author

davido commented Aug 28, 2017

No problem. Hope to see this CL merged ASAP. WAR gathering is pretty much the last missing link in the chain to produce headless.war for Gerrit project on Windows.

@laszlocsomor
Copy link
Contributor

Ah, there's also #3636 .

An MSYS bug is responsible for this issue and it prevents Bazel from running shell scripts with long chains of commands, i.e. write long extremely genrule commands to shell scripts and dispatch to Bash.

@laszlocsomor
Copy link
Contributor

David, snapshot 3 in https://bazel-review.googlesource.com/c/bazel/+/15850 contains the fix.

I tested with the following bzl file:

def _impl(ctx):
  out1 = ctx.outputs.out1
  out2 = ctx.outputs.out2

  ctx.actions.run_shell(
      outputs=[out1],
      progress_message="Running your command 1",
      command="( %s ; ) > %s" % (ctx.attr.cmd1, out1.path))

  ctx.actions.run_shell(
      outputs=[out2],
      progress_message="Running your command 2",
      command="( %s ; ) > %s" % (ctx.attr.cmd2, out2.path))

foo = rule(
    implementation=_impl,
    attrs={"cmd1": attr.string(), "cmd2": attr.string()},
    outputs={"out1": "%{name}1.txt", "out2": "%{name}2.txt"},
)

And BUILD file:

load("//foo:foo.bzl", "foo")

foo(
  name = "foo",
  cmd1 = " ; ".join(["echo xxx%d" % i for i in range(0, 1000)]),
  cmd2 = " ; ".join(["echo xxx%d" % i for i in range(0, 1000)]),
)

@davido
Copy link
Contributor Author

davido commented Aug 29, 2017

@laszlocsomor Patch set 4 of your gerrit change just worked (I noticed, it needs a rebase, as it wasn't cleanly applied on recent master):

$  bazel --nomaster_bazelrc build --workspace_status_command=/C/users/davido/projects/gerrit/tools/workspace-status.cmd --verbose_failures -s :headless
____Loading complete.  Analyzing...
____Found 1 target...
____Building...
____[53 / 159] Writing script headless.run_shell_0.sh
____[277 / 278] WAR headless.war
>>>>>>>>> # //:headless [action 'WAR headless.war']
cd C:/msys64/tmp/_bazel_davido/_biltsqa/execroot/gerrit
  SET PATH=C:\msys64\usr\bin;C:\msys64\bin;C:\Python36;C:\msys64\usr\local\bin;C:\msys64\usr\bin;C:\msys64\usr\bin;C:\msys64\opt\bin;C:\Windows\System32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\msys64\usr\bin\site_perl;C:\msys64\usr\bin\vendor_perl;C:\msys64\usr\bin\core_perl
    SET TEMP=C:\Users\davido\AppData\Local\Temp
    SET TMP=C:\Users\davido\AppData\Local\Temp
  C:/msys64/usr/bin/bash.exe bazel-out/msvc_x64-fastbuild/genfiles/headless.run_shell_0.sh
____[277 / 278] Still waiting for 1 job to complete:
      Running (local):
        WAR headless.war, 10 s
Target //:headless up-to-date:
  C:/msys64/tmp/_bazel_davido/_biltsqa/execroot/gerrit/bazel-out/msvc_x64-fastbuild/bin/headless.war
____Elapsed time: 34,942s, Critical Path: 21,59s

I checked the content of generated file: C:/msys64/usr/bin/bash.exe bazel-out/msvc_x64-fastbuild/genfiles/headless.run_shell_0.sh and it looks sane: [1].

Thank you again for the quick fix!

@davido
Copy link
Contributor Author

davido commented Aug 29, 2017

@damienmg @laszlocsomor Just in case you are curious, these are 4 diffs currently needed to patch gerrit master to build :headless rule on Windows: [1]. Just clone gerrit, apply the patch, create this custom workspace-status.cmd file: [2] and run:

  $ bazel --nomaster_bazelrc build --workspace_status_command=/C/users/davido/projects/gerrit/tools/workspace-status.cmd --verbose_failures -s :headless

Note, that I'm using Python 3 to make download_file.py home grown Maven tool chain work.

lucamilanesio pushed a commit to GerritCodeReview/gerrit that referenced this issue Aug 30, 2017
ctx.action() is deprecated in favor of ctx.actions.run() /
ctx.actions.run_shell(): [1]. See discussion in [2], why we need this.

[1] https://docs.bazel.build/versions/master/skylark/lib/ctx.html#action
[2] bazelbuild/bazel#3589

Change-Id: I11af3afebc0d3c76bb8fb59a790db2e3f3ce982b
@davido
Copy link
Contributor Author

davido commented Aug 31, 2017

@laszlocsomor @meteorcloudy pointed out in bazelbuild/continuous-integration#126, that we cannot add Windows CI job for Gerrit project until this fix was released. Can't we accelerate it somehow?

@laszlocsomor
Copy link
Contributor

laszlocsomor commented Aug 31, 2017

@davido : I don't know. I asked a related question on bazelbuild/continuous-integration#126.
FWIW https://bazel-review.googlesource.com/c/bazel/+/15850 is submitted internally, it will land on GitHub the next time we push code. This week's sheriff (and code pusher) is @vladmos.

@vladmos
Copy link
Member

vladmos commented Aug 31, 2017

It will be pushed in a couple of hours.

csilvestrini added a commit to csilvestrini/arcs that referenced this issue Oct 24, 2019
Replaced call to ctx.actions.run with ctx.actions.run_shell.

Windows was trying to execute the script as a win32 binary, which was
failing with the following error:

```
Action failed to execute: java.io.IOException: ERROR: src/main/native/windows/process.cc(199): CreateProcessW("C:\temp\7qgjmpto\execroot\__main__\bazel-out\x64_windows-fastbuild\bin\particles\Tutorial\Kotlin\3_JsonStore\JsonStore_genrule.sh"): %1 is not a valid Win32 application.
```

Advice from bazelbuild/bazel#3589 is to use
ctx.actions.run_shell instead.
sjmiles pushed a commit to PolymerLabs/arcs that referenced this issue Oct 24, 2019
Replaced call to ctx.actions.run with ctx.actions.run_shell.

Windows was trying to execute the script as a win32 binary, which was
failing with the following error:

```
Action failed to execute: java.io.IOException: ERROR: src/main/native/windows/process.cc(199): CreateProcessW("C:\temp\7qgjmpto\execroot\__main__\bazel-out\x64_windows-fastbuild\bin\particles\Tutorial\Kotlin\3_JsonStore\JsonStore_genrule.sh"): %1 is not a valid Win32 application.
```

Advice from bazelbuild/bazel#3589 is to use
ctx.actions.run_shell instead.
luca-digrazia pushed a commit to luca-digrazia/DatasetCommitsDiffSearch that referenced this issue Sep 4, 2022
    If the shell command in
    ctx.actions.run_shell.command is longer than the
    platform's shell's limit, Bazel will dump the
    command to a helper shell script and execute that
    script in the run_shell action.

    Genrules also write a helper script when
    genrule.cmd is longer than the shell's limit, and
    ctx.actions.run_shell now uses the same machinery.

    Fixes bazelbuild/bazel#3589

    Change-Id: Ib24dce90182ef69552deb2d400e00ae061537309
    PiperOrigin-RevId: 167126560
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants