Miscellaneous fixes for Windows builds #1376

ashay · 2022-09-16T00:11:23Z

This patch contains multiple commits that make it possible to build and test Torch-MLIR on Windows. See commit messages for descriptions.

test: allow spaces in path to Python executable
python: remove header file that causes Windows build failures
python: drop TORCH_API from function defined in Torch-MLIR
python: make output of class anotations deterministic
test: use Python3_EXECUTABLE as interpreter path for consistency
test: fix RUN string
python: port parallel test framework to Windows

powderluv · 2022-09-16T06:13:19Z

woah. thank you. I had to reboot that Windows machine to test some linux driver stuff on that GPU so it will be a few days before I get back to Windows. Thank for sharing.

test/Conversion/TorchToMhlo/basic.mlir

test/python/importer/jit_ir/ivalue_import/submodules.py

powderluv

Thank you for this. Happy to test it out once it lands.

ashay · 2022-09-21T21:33:57Z

cc: @powderluv Although I marked this PR as ready for review, this is still not the full set of fixes required for Windows builds (I just didn't want to keep this PR pending for too long). There are a couple of upstream LLVM and StableHLO changes necessary, if you're trying to build at your end. But both those upstream changes should be merged into Torch-MLIR on Monday.

powderluv · 2022-09-21T21:39:41Z

cool Thanks for the heads up. Will wait for the monday roll.

ashay · 2022-09-22T14:32:43Z

Cancelled the CI since it's anyway going to fail until the LTC problem is resolved.

@silvasean Among all these commits, the last commit is perhaps the most controversial. Let me know if you think I should use a different approach (like, say, disabling multi-threaded tests on Windows, instead of adding a new module).

Once approved, I'll be sure to squash the commits into one before merging.

ashay · 2022-09-22T20:05:10Z

Out of the things that I learnt while working on this PR, there are two that are probably relevant to folks who will build on Windows.

There are (at least) three toolchains that can be used to build LLVM on Windows:
(a) MSVC cl as the compiler and MSVC link as the linker
(b) MSVC clang as the compiler and MSVC link as the linker
(c) MinGW g++/clang++ as the compiler and MinGW ld/lld as the linker

(a) and (c) are well supported, whereas (b) produces tons of warnings and a build failure, which was fixed in this upstream change: https://reviews.llvm.org/D134165. I ran into another build failure, this time because of what seems like a bug in the MSVC clang compiler, so I don't recommend using MSVC clang to build LLVM. I also haven't tried (c), although it appears (from the upstream patch reviews) that folks do use MinGW to build LLVM regularly, so presumably, approach (c) should be fine.

When using MSVC cl and MSVC link, avoid overriding LLVM_HOST_TRIPLE because doing so can create a mismatch between the target triple of the compiled programs (which is derived from LLVM_DEFAULT_TARGET_TRIPLE) and the target triple of the JIT (which is derived from LLVM_HOST_TRIPLE). More precisely, by setting LLVM_HOST_TRIPLE=x86_64, we don't specify the environment (i.e. "gnu", "msvc", etc.) which causes the mangling scheme to default to ELF for x86_64. On the other hand, the unmodified LLVM_DEFAULT_TARGET_TRIPLE with MSVC cl uses "msvc" as the environment, causing the mangling scheme to be WinCOFF. The result of this mismatch is that when LLVM JIT parses the compiled module, it causes an assertion failure with the message Added modules have incompatible data layouts. The solution is to let LLVM_HOST_TRIPLE match LLVM_DEFAULT_TARGET_TRIPLE.

Combined, this means that I could use the following CMake command to build and test Torch-MLIR on Windows, modulo the fixes in this PR and the upstream StableHLO changes that will be merged into Torch-MLIR early next week.

cmake -G Ninja -B msvc-build -DCMAKE_C_COMPILER=cl -DCMAKE_CXX_COMPILER=cl -DCMAKE_LINKER=link -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=ON -DLLVM_ENABLE_PROJECTS=mlir -DLLVM_TARGETS_TO_BUILD=host -DLLVM_ENABLE_ASSERTIONS=ON -DLLVM_ENABLE_RTTI=ON -DLLVM_INSTALL_UTILS=ON -DLLVM_USE_SPLIT_DWARF=ON -DMLIR_ENABLE_BINDINGS_PYTHON=ON -DPython3_FIND_VIRTUALENV=ONLY -DLLVM_EXTERNAL_PROJECTS="torch-mlir;torch-mlir-dialects" -DLLVM_EXTERNAL_TORCH_MLIR_SOURCE_DIR=%cd% -DLLVM_EXTERNAL_TORCH_MLIR_DIALECTS_SOURCE_DIR=%cd%\externals\llvm-external-projects\torch-mlir-dialects externals\llvm-project\llvm

Happy hacking!

@sstamenova

On Windows, the path to the Python binary may contain spaces, so this patch adds quotes around the path to the python executable. Thanks to @sstamenova for suggesting the fix!

Similar to https://reviews.llvm.org/D125284, we can safely remove this header file without affecting the build on either Linux. It is necessary to remove this header file on Windows builds since otherwise it causes build errors.

`TORCH_API` should apply to functions that are either exported by libtorch.so or ones that are imported from libtorch.so by its downstream consumers (like Torch-MLIR). Neither case applies to the `importJitFunctionAsFuncOp()` function, since it is defined in Torch-MLIR (and thus outside libtorch.so). This patch fixes the problem by dropping `TORCH_API` from that function's declaration.

The `class-annotator-repr.py` test checks for class annotations in a specific order, but prior to this patch, the order was non-deterministic, since the code iterated on an _unordered_ map. This patch makes the iteration order deterministic through two changes: 1. using a sorted map 2. using the class qualified name instead of the address of the class in memory

This ensures that tests use the Python3 version that was detected using CMake, instead of whichever python version that happens to be in the PATH variable when invoking the test.

The parenthesis syntax does not run on Windows (the shell interprets the `(` character as part of the path). Moreover, the ODR violation in the comment no longer seems to apply.

Since Windows does not support `fork` natively, Python's `multiprocessing` module needs to use `spawn` on Windows. However, to use `spawn`, the multiprocessing module serializes (or pickles) the worker function and its arguments. Sadly, the multiprocessing module (both the default one in Python and the one that is extended in PyTorch) is unable to serialize lambda functions (see https://stackoverflow.com/a/19985580) for detals. Unfortunately, given how our tests are structured, we require that the function under test is passed as an argument to another function, so we cannot sidestep our use of lambda functions. To resolve this problem, this patch makes use of the `multiprocess` and `dill` Python modules, which together offers a multiprocessing mechanism that can serialize lambda functions. The multiprocess module also offers a process pool, which simplifies the code for our parallel testing framework.

powderluv · 2022-09-29T17:35:16Z

should we add a Windows CI next ?

sstamenova · 2022-09-29T18:14:05Z

should we add a Windows CI next ?

I think that would be great to make sure we don't accidentally regress after all the work that was put into getting windows to work.

@sstamenova

* test: allow spaces in path to Python executable On Windows, the path to the Python binary may contain spaces, so this patch adds quotes around the path to the python executable. Thanks to @sstamenova for suggesting the fix! * python: remove header file that causes Windows build failures Similar to https://reviews.llvm.org/D125284, we can safely remove this header file without affecting the build on either Linux. It is necessary to remove this header file on Windows builds since otherwise it causes build errors. * python: drop `TORCH_API` from function defined in Torch-MLIR `TORCH_API` should apply to functions that are either exported by libtorch.so or ones that are imported from libtorch.so by its downstream consumers (like Torch-MLIR). Neither case applies to the `importJitFunctionAsFuncOp()` function, since it is defined in Torch-MLIR (and thus outside libtorch.so). This patch fixes the problem by dropping `TORCH_API` from that function's declaration. * python: make output of class anotations deterministic The `class-annotator-repr.py` test checks for class annotations in a specific order, but prior to this patch, the order was non-deterministic, since the code iterated on an _unordered_ map. This patch makes the iteration order deterministic through two changes: 1. using a sorted map 2. using the class qualified name instead of the address of the class in memory * test: use Python3_EXECUTABLE as interpreter path for consistency This ensures that tests use the Python3 version that was detected using CMake, instead of whichever python version that happens to be in the PATH variable when invoking the test. * test: fix RUN string The parenthesis syntax does not run on Windows (the shell interprets the `(` character as part of the path). Moreover, the ODR violation in the comment no longer seems to apply. * python: port parallel test framework to Windows Since Windows does not support `fork` natively, Python's `multiprocessing` module needs to use `spawn` on Windows. However, to use `spawn`, the multiprocessing module serializes (or pickles) the worker function and its arguments. Sadly, the multiprocessing module (both the default one in Python and the one that is extended in PyTorch) is unable to serialize lambda functions (see https://stackoverflow.com/a/19985580) for detals. Unfortunately, given how our tests are structured, we require that the function under test is passed as an argument to another function, so we cannot sidestep our use of lambda functions. To resolve this problem, this patch makes use of the `multiprocess` and `dill` Python modules, which together offers a multiprocessing mechanism that can serialize lambda functions. The multiprocess module also offers a process pool, which simplifies the code for our parallel testing framework.

Importing onnx graph fails if an output is also used by another node. This happens because the output ValueInfo will be registered, and then it will throw an error that it already exists when importing internal ValueInfos. Solution is to import the internal ValueInfos before importing the output ValueInfos. Resolves llvm#1376 Signed-off-by: Michael Holman <michhol@microsoft.com>

sstamenova reviewed Sep 16, 2022

View reviewed changes

test/Conversion/TorchToMhlo/basic.mlir Outdated Show resolved Hide resolved

sstamenova reviewed Sep 16, 2022

View reviewed changes

test/python/importer/jit_ir/ivalue_import/submodules.py Outdated Show resolved Hide resolved

ashay changed the title ~~[do not review] Miscellaneous fixes for Windows builds~~ Miscellaneous fixes for Windows builds Sep 21, 2022

ashay marked this pull request as ready for review September 21, 2022 21:26

powderluv approved these changes Sep 21, 2022

View reviewed changes

ashay requested a review from silvasean September 22, 2022 14:30

ashay mentioned this pull request Sep 22, 2022

Windows python interop.h #1397

Closed

ashay added 7 commits September 29, 2022 14:13

test: allow spaces in path to Python executable

7d81403

On Windows, the path to the Python binary may contain spaces, so this patch adds quotes around the path to the python executable. Thanks to @sstamenova for suggesting the fix!

test: use Python3_EXECUTABLE as interpreter path for consistency

5752a17

This ensures that tests use the Python3 version that was detected using CMake, instead of whichever python version that happens to be in the PATH variable when invoking the test.

test: fix RUN string

344d779

The parenthesis syntax does not run on Windows (the shell interprets the `(` character as part of the path). Moreover, the ODR violation in the comment no longer seems to apply.

ashay merged commit 0b46462 into llvm:main Sep 29, 2022

ashay deleted the ashay/lit-file-extension branch September 29, 2022 17:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Miscellaneous fixes for Windows builds #1376

Miscellaneous fixes for Windows builds #1376

ashay commented Sep 16, 2022 •

edited

Loading

powderluv commented Sep 16, 2022

powderluv left a comment

ashay commented Sep 21, 2022

powderluv commented Sep 21, 2022

ashay commented Sep 22, 2022

ashay commented Sep 22, 2022 •

edited

Loading

powderluv commented Sep 29, 2022

sstamenova commented Sep 29, 2022

Miscellaneous fixes for Windows builds #1376

Miscellaneous fixes for Windows builds #1376

Conversation

ashay commented Sep 16, 2022 • edited Loading

powderluv commented Sep 16, 2022

powderluv left a comment

Choose a reason for hiding this comment

ashay commented Sep 21, 2022

powderluv commented Sep 21, 2022

ashay commented Sep 22, 2022

ashay commented Sep 22, 2022 • edited Loading

powderluv commented Sep 29, 2022

sstamenova commented Sep 29, 2022

ashay commented Sep 16, 2022 •

edited

Loading

ashay commented Sep 22, 2022 •

edited

Loading