Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compile Error when CTest pytorch-operator #1065

Closed
wangshankun opened this issue May 20, 2019 · 9 comments
Closed

Compile Error when CTest pytorch-operator #1065

wangshankun opened this issue May 20, 2019 · 9 comments
Assignees

Comments

@wangshankun
Copy link

Test command: /home/shankun.shankunwan/onnxruntime/build/Linux/Debug/onnx_test_runner "/home/shankun.shankunwan/onnxruntime/cmake/external/onnx/onnx/backend/test/data/pytorch-operator"
3: Test timeout computed to be: 10000000
3: 2019-05-20 20:44:33.944246227 [E:onnxruntime:Default, runner.cc:132 ParallelRunTests] Running tests in parallel: at most 16 models at any time
3: 2019-05-20 20:44:33.956119134 [E:onnxruntime:Default, runner.cc:485 RunSingleTestCase] Test operator_mm failed:Could not find an implementation for the node Gemm(6)
3: 2019-05-20 20:44:33.956155422 [E:onnxruntime:Default, runner.cc:485 RunSingleTestCase] Test operator_params failed:Could not find an implementation for the node Add(6)
3: 2019-05-20 20:44:33.956631467 [E:onnxruntime:Default, runner.cc:485 RunSingleTestCase] Test operator_addconstant failed:Could not find an implementation for the node Add(6)
3: 2019-05-20 20:44:33.958266286 [E:onnxruntime:Default, runner.cc:485 RunSingleTestCase] Test operator_basic failed:Could not find an implementation for the node Add(6)
3: 2019-05-20 20:44:33.959317542 [E:onnxruntime:Default, runner.cc:485 RunSingleTestCase] Test operator_add_size1_right_broadcast failed:Could not find an implementation for the node Add(6)
3: 2019-05-20 20:44:33.959487318 [E:onnxruntime:Default, runner.cc:485 RunSingleTestCase] Test operator_add_size1_singleton_broadcast failed:Could not find an implementation for the node Add(6)
3: 2019-05-20 20:44:33.961650814 [E:onnxruntime:Default, runner.cc:485 RunSingleTestCase] Test operator_add_size1_broadcast failed:Could not find an implementation for the node Add(6)
3: 2019-05-20 20:44:33.963035157 [E:onnxruntime:Default, runner.cc:485 RunSingleTestCase] Test operator_pow failed:Could not find an implementation for the node Pow(1)
3: 2019-05-20 20:44:33.965077207 [E:onnxruntime:Default, runner.cc:485 RunSingleTestCase] Test operator_add_broadcast failed:Could not find an implementation for the node Add(6)
3: 2019-05-20 20:44:33.966434907 [E:onnxruntime:Default, runner.cc:485 RunSingleTestCase] Test operator_addmm failed:Could not find an implementation for the node Gemm(6)
3: 2019-05-20 20:44:33.968296956 [E:onnxruntime:Default, runner.cc:485 RunSingleTestCase] Test operator_non_float_params failed:Could not find an implementation for the node Add(6)
3: 2019-05-20 20:44:34.017473812 [E:onnxruntime:Default, runner.cc:151 ParallelRunTests] Running tests finished. Generating report
3: result:
3: Models: 35
3: Total test cases: 35
3: Succeeded: 24
3: Not implemented: 11
3: Failed: 0
3: Stats by Operator type:
3: Not implemented(0):
3: Failed:
3: Failed Test Cases:
3/3 Test #3: onnx_test_pytorch_operator ....... Passed 0.15 sec

67% tests passed, 1 tests failed out of 3

Total Test time (real) = 12.99 sec

The following tests FAILED:
1 - onnxruntime_test_all (ILLEGAL)
Errors while running CTest

@snnn
Copy link
Member

snnn commented May 21, 2019

The error is in onnxruntime_test_all

The following tests FAILED:
1 - onnxruntime_test_all (ILLEGAL)

Could you please scroll up a little ?

@wangshankun
Copy link
Author

The error is in onnxruntime_test_all

The following tests FAILED:
1 - onnxruntime_test_all (ILLEGAL)

Could you please scroll up a little ?

67% tests passed, 1 tests failed out of 3

Total Test time (real) = 11.45 sec

The following tests FAILED:
1 - onnxruntime_test_all (SEGFAULT)
Errors while running CTest
Traceback (most recent call last):
File "/home/shankun.shankunwan/onnxruntime/tools/ci_build/build.py", line 840, in
sys.exit(main())
File "/home/shankun.shankunwan/onnxruntime/tools/ci_build/build.py", line 798, in main
args.use_tvm, args.use_tensorrt, args.use_ngraph)
File "/home/shankun.shankunwan/onnxruntime/tools/ci_build/build.py", line 514, in run_onnxruntime_tests
cwd=cwd, dll_path=dll_path)
File "/home/shankun.shankunwan/onnxruntime/tools/ci_build/build.py", line 178, in run_subprocess
return subprocess.run(args, cwd=cwd, check=True, stdout=stdout, stderr=stderr, env=my_env, shell=shell)
File "/home/shankun.shankunwan/anaconda/lib/python3.6/subprocess.py", line 418, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['/usr/local/bin/ctest', '--build-config', 'RelWithDebInfo', '--verbose']' returned non-zero exit status 8.

@faxu faxu added the bug label May 24, 2019
@pranavsharma
Copy link
Contributor

pranavsharma commented May 31, 2019

Our CI builds that run these tests run just fine. You're probably running on a different OS/arch. Please see https://github.com/microsoft/onnxruntime/blob/master/BUILD.md.

@snnn
Copy link
Member

snnn commented May 31, 2019

Could you please run 'onnxruntime_test_all' with valgrind?

valgrind ./onnxruntime_test_all

@wangshankun
Copy link
Author

Could you please run 'onnxruntime_test_all' with valgrind?

valgrind ./onnxruntime_test_all

sorry,i remove onnxruntime , replace with onnx-tensorrt.

Test Env:
Xeon(R) Silver 4110 + Ubuntu16.04 + cuda10 + cudnn7.5 + tensorrt5.15

@snnn
Copy link
Member

snnn commented Jun 3, 2019

How did you build it? Could you uninstall tensorrt before running build.sh ?

@wangshankun
Copy link
Author

How did you build it? Could you uninstall tensorrt before running build.sh ?

Compile cmd:
./build.sh --cudnn_home /home/shankun.shankunwan/trt/cudnn-10-v7.5/lib64 --cuda_home /usr/local/cuda-10.0 --use_tensorrt --tensorrt_home /home/shankun.shankunwan/trt/TensorRT-5.1.5.0

OR:
./build.sh --config RelWithDebInfo --build_wheel --cuda_home /usr/local/cuda-10.0 --use_tensorrt --tensorrt_home /home/shankun.shankunwan/trt/TensorRT-5.1.5.0 --cudnn_home /usr/local/cuda-10.0/lib64

trt is necessary for me ······

@stevenlix
Copy link
Contributor

Could you add --use_full_protobuf in compile command when you build TRT? If some tests still failed, please attach the whole compile log so that I can see which test failed.

@stevenlix
Copy link
Contributor

Is the issue still there? I am going to close this. Please feel free to reopen it if necessary

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants