Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Add support for WASM_BUILD_CLEAN_TARGET environment variable that cleans up unnecessary wasm build artifacts #12672

Conversation

nazar-pc
Copy link
Contributor

Resolves #12671

@nazar-pc nazar-pc requested a review from koute as a code owner November 10, 2022 04:38
@nazar-pc nazar-pc mentioned this pull request Nov 10, 2022
1 task
@nazar-pc
Copy link
Contributor Author

nazar-pc commented Nov 10, 2022

Hm, worked locally, but fails in CI with reason that doesn't make much sense (on Windows): https://github.com/subspace/subspace/actions/runs/3434067151/jobs/5725035860

Upstream bug report: brson/wasm-opt-rs#116

@koute
Copy link
Contributor

koute commented Nov 10, 2022

Maybe that's because Windows doesn't let you remove files/directories which are still used by something?

Anyway, hm, I'm not entirely convinced that this is the way we should fix the problem. (:

The main issue here is that every runtime we build gets its own workspace and its own target directory, which means that for every runtime the same crates get unnecessarily compiled and stored separately. This wasn't so bad when we had only one runtime or two, but now substrate apparently compiles 6 runtimes, and cumulus has 14!

On my 32-core Threadripper it takes something like ~6 minutes to run a cargo check in a fresh cumulus repository**, and the whole wbuild directory takes 11GB. This isn't sustainable.

I think a proper fix would be to just rework how we build the runtimes and have a shared namespace for building all of them. That would drastically cut down on the compile times and the storage space required.

cc @paritytech/sdk-node

** - even though it is a check it actually builds all of the runtimes; we should probably fix this, but that's another issue

@nazar-pc
Copy link
Contributor Author

Maybe that's because Windows doesn't let you remove files/directories which are still used by something?

As far as I see, the previous process (cargoc rustc) has finished successfully and nothing should hold onto that particular file unless build script of wasm-opt-cxx-sys spawns some background process.

I think a proper fix would be to just rework how we build the runtimes and have a shared namespace for building all of them. That would drastically cut down on the compile times and the storage space required.

I was considering to put all crates under wbuild under workspace to share most of the crates, but we can't build them concurrently in that case and can't build together because a) called from build.rs of different crates and b) will cause feature unification in Cargo breaking everything. It feels like building them sequentially would potentially negate the benefit of saving on disk space and compilation time.

@bkchr
Copy link
Member

bkchr commented Nov 10, 2022

I think a proper fix would be to just rework how we build the runtimes and have a shared namespace for building all of them. That would drastically cut down on the compile times and the storage space required.

The initial implementation was doing exactly this and I have then reworked this to using different target directories. The reasoning behind this was the parallelism. If all are sharing the same target directory, all runtimes would build sequentially and that is really slow ;)

On my 32-core Threadripper it takes something like ~6 minutes to run a cargo check in a fresh cumulus repository**, and the whole wbuild directory takes 11GB. This isn't sustainable.

cargo check should be run with SKIP_WASM_BUILD=1 and then we don't build the wasm runtimes. IMO the best solution here. The size of wbuild, storage is cheap 🤷

To the pr in general, I'm not convinced that we should need this. If your CI requires this, you can just write some lines like rm -rf path to do the same.

@nazar-pc
Copy link
Contributor Author

If your CI requires this, you can just write some lines like rm -rf path to do the same.

It is more involved, I just run cargo test and a whole bunch of stuff gets built. So much that it runs out of space, so I'd need to delete things in build.rs files, which is certainly possible, but I thought upstreaming it would make sense too.

@koute
Copy link
Contributor

koute commented Nov 10, 2022

If all are sharing the same target directory, all runtimes would build sequentially and that is really slow ;)

Well, not necessarily. (:

We could put them in the same workspace and the same target directory, and let them still compile in parallel, and since the dependencies would be shared (so they wouldn't have to be compiled up to 14 times as it is the case in cumulus) it should speed up the compilation. This would of course need a slight rework to how we do the compilation, but I think it should be doable.

The one problem that I can see here (and which @nazar-pc noted) is that cargo will unify the dependencies' features when compiling all of the crates in the workspace, so when building the runtimes it'd have to do it in two steps: first call cargo build --all, and then for each runtime call cargo build -p $crate. If the features are the same then the second cargo build would be a no-op and not take any time, and if they're not then it would only have to rebuild those.

So in the optimal case if the feature flags are the same across all of the runtimes we'd get a significant speedup in compile times. The more feature flags differ the slower it'd get, since then it would have to compile those sequentially.

Or another alternative: have a separate target directory per a set of feature flags. (We could use cargo metadata to see which features are enabled.) So in the optimal case they'd compile in parallel within the same workspace, and in the worst case it'd degenerate to essentially what we have now compiling in parallel in separate workspaces.

(And if you're wondering what we'd do when someone would like to compile only a single runtime, the answer is also feature flag abuse, but this time in a beneficial way to detect which runtime has to be compiled.)

Of course, while technically doable this does have some extra complexity, so even I'm not entirely convinced that we should do it. Probably depends on how many more runtimes we'll add to cumulus...? (:

@koute
Copy link
Contributor

koute commented Nov 10, 2022

If your CI requires this, you can just write some lines like rm -rf path to do the same.

It is more involved, I just run cargo test and a whole bunch of stuff gets built. So much that it runs out of space, so I'd need to delete things in build.rs files, which is certainly possible, but I thought upstreaming it would make sense too.

Couldn't you just run cargo build to compile only one runtime, delete the unnecessary artifacts while leaving the .wasm blob, and then do it again for another runtime, and once you go through all of them then trigger the tests? (Which should not trigger a rebuild of the runtimes since those are already built, although it is possible that this is broken and it'd still rebuild them. :P)

@bkchr
Copy link
Member

bkchr commented Nov 10, 2022

first call cargo build --all

This doesn't work. When you have a clean target directory, the wbuild directory doesn't exist and also not all the individual projects in there. So, you can not run cargo build --all as these individual projects are there generated over time. We could may scan the entire workspace for stuff that should be compiled to wasm and then generate all projects in the wbuild directory. However, this would let the complexity of the wasm-builder explode even more, which is already quite high :P

We could put them in the same workspace and the same target directory, and let them still compile in parallel, and since the dependencies would be shared (so they wouldn't have to be compiled up to 14 times as it is the case in cumulus) it should speed up the compilation. This would of course need a slight rework to how we do the compilation, but I think it should be doable.

For sure it is doable, as I said it was this way in earlier versions. #7532 here is the pr that removed the sequential runtime builds. A much simpler solution (if it finally works, which IDK), would be to use sccache: https://github.com/mozilla/sccache Then you should get shared deps for "free" without needing to do anything special.

@koute
Copy link
Contributor

koute commented Nov 11, 2022

We could may scan the entire workspace for stuff that should be compiled to wasm and then generate all projects in the wbuild directory.

Well, I mean, yeah, that's what I was assuming we'd have to do. :P But as you said, it would indeed make the thing even more complex.

If we'd want to reduce complexity yet another option would be to just get rid of the most of wasm-builder. Just have the Cargo.tomls for the runtimes be static (including the toplevel workspace Cargo.toml for the runtimes) instead of dynamically generating them, and just check that into the source tree. That of course also has downsides; while simpler and less magic it requires all of the runtimes' sources to be moved to live somewhere under a single directory (but I guess that also simplifies things? :P) and it does require people to migrate (and they're used to the current system; although maybe we could then provide an one-off migration tool that'd do this automatically?).

I don't really see a perfect solution here that doesn't have at least some downsides. 🤷

A much simpler solution (if it finally works, which IDK), would be to use sccache: https://github.com/mozilla/sccache Then you should get shared deps for "free" without needing to do anything special.

Wouldn't that essentially only work only on recompilations? And AFAIK it doesn't support incremental compilation, so it'd work only in --release mode (or we'd have to disable incremental compilation in debug mode).

@bkchr
Copy link
Member

bkchr commented Nov 11, 2022

Wouldn't that essentially only work only on recompilations? And AFAIK it doesn't support incremental compilation, so it'd work only in --release mode (or we'd have to disable incremental compilation in debug mode).

Crates should be directly being shared. At least I would assume so. Regarding incremental compilation, we compile wasm files always in release mode. You need to opt-in debug if you want it.

@stale
Copy link

stale bot commented Dec 11, 2022

Hey, is anyone still working on this? Due to the inactivity this issue has been automatically marked as stale. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the A5-stale Pull request did not receive any updates in a long time. No review needed at this stage. Close it. label Dec 11, 2022
@stale stale bot closed this Dec 25, 2022
@nazar-pc nazar-pc deleted the support-cleaning-wasm-build-artifacts branch December 26, 2022 10:34
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A5-stale Pull request did not receive any updates in a long time. No review needed at this stage. Close it.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add ability to optionally clean up wbuild/*/target in wasm-builder
3 participants