
Implement decomposition from onnx.Sum to sequence of onnx.Add #2964

Merged
merged 5 commits into from
Oct 3, 2024

Conversation

srcarroll
Contributor

No description provided.

@jenkins-droid
Collaborator

Can one of the admins verify this patch?

@srcarroll
Contributor Author

I can't find any current lowering path for onnx.Sum. I figured it would be best to add a decomposition pattern here rather than do a direct conversion to stablehlo. I'm open to suggestions if this is not a desirable change.
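For readers unfamiliar with the pattern being proposed, here is a minimal sketch of the decomposition idea in plain Python. It is illustrative only: the strings stand in for SSA values, and none of the names are actual onnx-mlir APIs. A variadic Sum over N operands folds left into N-1 binary Adds.

```python
def decompose_sum(inputs):
    """Fold a variadic Sum over N operands into a left-to-right chain
    of N-1 binary Adds. `inputs` stands in for the SSA operands of
    onnx.Sum; the returned strings mirror the sequence of onnx.Add ops
    the rewrite pattern would emit. Illustrative sketch, not onnx-mlir
    code."""
    if len(inputs) == 1:
        # A single-operand Sum is just the identity on its input.
        return inputs[0], []
    adds = []
    acc = inputs[0]
    for operand in inputs[1:]:
        # Each step emits one binary Add accumulating the running sum.
        acc = f"add({acc}, {operand})"
        adds.append(acc)
    return acc, adds
```

So Sum(a, b, c) becomes add(add(a, b), c), and the one-input case (discussed below) collapses to the input itself.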

Signed-off-by: Sam <srcarroll314@gmail.com>
@srcarroll srcarroll force-pushed the decompose-onnx-sum-op branch from b9e145c to 89dc581 Compare October 2, 2024 20:25
@jenkins-droid
Collaborator

Can one of the admins verify this patch?

@jenkins-droid
Collaborator

Can one of the admins verify this patch?

@srcarroll
Contributor Author

Just realized I messed up the one-input case. Fixing now.

Signed-off-by: Sam <srcarroll314@gmail.com>
@jenkins-droid
Collaborator

Can one of the admins verify this patch?

@srcarroll
Contributor Author

Not sure what's up with the Jenkins failures, though.

@AlexandreEichenberger
Collaborator

Starting a new batch of CI, let me look at the PR soon.

@srcarroll
Contributor Author

Starting a new batch of CI, let me look at the PR soon.

Thanks. I still need to figure out the test failure too. I thought fixing the one-input case would have resolved it, but it didn't. I'm unfamiliar with the test that's failing, so I'll have to dig.

@AlexandreEichenberger
Collaborator

@srcarroll here is the log of the error for Linux amd64

/usr/lib/python3.10/subprocess.py:526: CalledProcessError
----------------------------- Captured stdout call -----------------------------
[1/6] Thu Oct  3 09:08:26 2024 (0s) Importing ONNX Model to MLIR Module from "test_sum_one_input.onnx"
[2/6] Thu Oct  3 09:08:26 2024 (0s) Compiling and Optimizing MLIR Module
----------------------------- Captured stderr call -----------------------------
error: failed to legalize unresolved materialization from ('tensor<?xf32>') to 'tensor<*xf32>' that remained live after conversion
=========================== short test summary info ============================
 Debug/test.py::OnnxBackendNodeModelTest::test_sum_one_input_cpu - subprocess...
1 failed, 521 passed, 2344 skipped in 84.49s (0:01:24)
make[3]: *** [test/backend/CMakeFiles/check-onnx-backend-dynamic.dir/build.make:71: test/backend/CMakeFiles/check-onnx-backend-dynamic] Error 1
make[2]: *** [CMakeFiles/Makefile2:14510: test/backend/CMakeFiles/check-onnx-backend-dynamic.dir/all] Error 2
make[2]: *** Waiting for unfinished jobs....
bringing up nodes...
bringing up nodes...

It feels like something is missing for the shape inference. If you convert the Sum into a series of Adds before shape inference, then shape inference should be able to operate normally. Maybe you need to place the pattern earlier than it currently is.

You can run the test directly in your env

TEST_CASE_BY_USER=test_sum_one_input_cpu make -j 12 check-onnx-backend-dynamic

You can add compiler options (for debugging) by defining the ONNX_MLIR_FLAGS env var

Hope this helps.

@AlexandreEichenberger
Collaborator

AlexandreEichenberger commented Oct 3, 2024

Note on check-onnx-backend-dynamic: we force all the input sizes to be made dynamic (-1) at compilation time and then run the model with the original sizes set at run time. That ensures that all of the ONNX tests run both with static shapes (as they are in the ONNX repo) and dynamic shapes (as we force them to be here).
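A tiny illustration of that forcing step (a hypothetical helper, not the actual test-harness code): every static dimension in the compiled signature becomes -1, while the runtime inputs keep their original sizes.

```python
def force_dynamic(shape):
    """Replace every static dimension with -1 (dynamic), mirroring
    what check-onnx-backend-dynamic does to the model's input shapes
    at compile time. Illustrative sketch, not the real harness code."""
    return [-1] * len(shape)

# Compile-time signature is made fully dynamic...
compile_shape = force_dynamic([3, 4, 5])  # [-1, -1, -1]
# ...but the model is still executed with the original static sizes.
runtime_shape = [3, 4, 5]
```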

Signed-off-by: Sam <srcarroll314@gmail.com>
@srcarroll
Contributor Author

srcarroll commented Oct 3, 2024

@AlexandreEichenberger thanks for the pointers. I'm actually having a hard time setting up my environment to run the Python tests and haven't had time to look into fixing it, so I'm just guessing at the problem from the error. I think this error would happen any time the result type of onnx.Sum is an unranked tensor but the inputs are ranked, hence the complaint about an unrealized cast from tensor<?xf32> to tensor<*xf32>. One solution I can think of is to cast the result of the final onnx.Add to the original onnx.Sum result type when they don't match. I'll push soon.

I was able to reproduce the error on an example IR and have added that case to the lit test to cover the fix.
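The cast fix described in the comment above can be sketched like this. This is a hypothetical helper working on type strings; the real pattern would operate on MLIR types and emit an onnx.Cast op.

```python
def finalize_sum_result(chain_value, chain_type, sum_result_type):
    """If the type produced by the Add chain (e.g. a ranked
    tensor<?xf32>) differs from the original onnx.Sum result type
    (e.g. an unranked tensor<*xf32>), wrap the final value in a cast
    so downstream uses still see the type they expect. Hypothetical
    sketch, not the actual onnx-mlir code."""
    if chain_type != sum_result_type:
        return f"cast({chain_value} : {chain_type} to {sum_result_type})"
    return chain_value
```

When the types already match, no cast is inserted and the chain's final value is used directly.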

@jenkins-droid
Collaborator

Can one of the admins verify this patch?

@AlexandreEichenberger
Collaborator

@jenkins-droid test this please

@AlexandreEichenberger
Collaborator

Yes, setup is a bit tricky, especially getting ONNX installed (I do a pip install -e third_party/onnx; protobuf sometimes causes problems). That is why you can also develop on the Docker images we publish, where everything should work smoothly.

@AlexandreEichenberger
Collaborator

@jenkins-droid test this please

@AlexandreEichenberger
Collaborator

(I think our Jenkins is having problems, folks are working on this)

@AlexandreEichenberger
Collaborator

@Sunny-Anand mind doing a "test this please" on this PR once the Jenkins issue is fixed, thanks

@Sunny-Anand
Collaborator

All checks have passed.


@AlexandreEichenberger left a comment


LGTM, thanks for adding the pattern.

Let me know if you want me to merge it to main @srcarroll

@srcarroll
Contributor Author

@AlexandreEichenberger thanks! Yes, please do merge.

@AlexandreEichenberger AlexandreEichenberger merged commit 265ee60 into onnx:main Oct 3, 2024
7 checks passed
@AlexandreEichenberger
Collaborator

@srcarroll thanks for your contributions, much appreciated

@jenkins-droid
Collaborator

Jenkins Linux amd64 Build #15775 [push] Implement decomposition ... started at 18:50

@jenkins-droid
Collaborator

Jenkins Linux s390x Build #15778 [push] Implement decomposition ... started at 19:50

@jenkins-droid
Collaborator

Jenkins Linux ppc64le Build #14805 [push] Implement decomposition ... started at 20:02

@jenkins-droid
Collaborator

Jenkins Linux amd64 Build #15775 [push] Implement decomposition ... passed after 1 hr 18 min

@jenkins-droid
Collaborator

Jenkins Linux s390x Build #15778 [push] Implement decomposition ... passed after 1 hr 42 min

@jenkins-droid
Collaborator

Jenkins Linux ppc64le Build #14805 [push] Implement decomposition ... passed after 2 hr 17 min

4 participants