[BUG] RuntimeError: /onnxruntime_src/onnxruntime/core/platform/posix/env.cc:183 onnxruntime::{anonymous}::PosixThread::PosixThread(const char*, int, unsigned int (*)(int, Eigen::ThreadPoolInterface*), Eigen::ThreadPoolInterface*, const onnxruntime::ThreadOptions&) pthread_setaffinity_np failed, error code: 0 error msg: #171

Open
xmfbit opened this issue Jan 19, 2022 · 6 comments


xmfbit commented Jan 19, 2022

Describe the bug
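This does not reproduce reliably; it appears to depend on the specific environment of the current Docker instance.

As suggested in microsoft/onnxruntime#8313, it can be worked around by changing the session options at

https://github.com/daquexian/onnx-simplifier/blob/master/onnxsim/onnx_simplifier.py#L188

as follows:

sess_options.intra_op_num_threads = 1
sess_options.inter_op_num_threads = 1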
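For context, here is a minimal sketch of how those two settings are typically applied when creating an onnxruntime session. The helper names `limit_threads` and `make_single_threaded_session` are hypothetical and are not part of onnx-simplifier or onnxruntime. Per the linked discussion, limiting both pools to one thread is a workaround that avoids the failing affinity call; it is not a fix for the root cause.

```python
def limit_threads(sess_options):
    """Set both thread pools of a SessionOptions-like object to one thread."""
    sess_options.intra_op_num_threads = 1
    sess_options.inter_op_num_threads = 1
    return sess_options


def make_single_threaded_session(model_path):
    """Create a CPU inference session that uses the single-thread settings."""
    # Imported lazily so limit_threads stays usable without onnxruntime installed.
    import onnxruntime as ort

    sess_options = limit_threads(ort.SessionOptions())
    return ort.InferenceSession(
        model_path,
        sess_options=sess_options,
        providers=["CPUExecutionProvider"],
    )
```

Inference runs single-threaded with these settings, which is usually acceptable for a one-off forward pass but may be slow for large models.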

Model
This is not related to the model; it depends on the environment. My pip list output is as follows:

absl-py                 1.0.0
addict                  2.4.0
appdirs                 1.4.4
asn1crypto              1.2.0
astunparse              1.6.3
backcall                0.2.0
cachetools              4.2.4
certifi                 2019.9.11
cffi                    1.13.0
chardet                 3.0.4
charset-normalizer      2.0.10
click                   8.0.3
conda                   4.7.12
conda-package-handling  1.6.0
contextlib2             21.6.0
cryptography            2.8
cycler                  0.11.0
Cython                  0.29.26
decorator               5.1.1
dnspython               2.1.0
filelock                3.4.2
flatbuffers             1.12
fonttools               4.28.5
fvcore                  0.1.5.post20210924
gast                    0.3.3
gevent                  21.8.0
google-auth             2.3.3
google-auth-oauthlib    0.4.6
google-pasta            0.2.0
graphviz                0.8.4
greenlet                1.1.2
grpcio                  1.43.0
gunicorn                20.1.0
h5py                    2.10.0
hiddenlayer             0.3
idna                    2.8
imageio                 2.13.5
importlib-metadata      4.10.0
iopath                  0.1.9
ipaddress               1.0.23
ipdb                    0.13.9
ipython                 7.31.0
jedi                    0.18.1
joblib                  1.1.0
Keras-Preprocessing     1.1.2
kiwisolver              1.3.2
Mako                    1.1.6
Markdown                3.3.6
MarkupSafe              2.0.1
matplotlib              3.5.1
matplotlib-inline       0.1.3
ml-collections          0.1.0
mmcv                    1.4.2
mxnet                   1.8.0
mypy-protobuf           3.0.0
networkx                2.6.3
numpy                   1.21.4
oauthlib                3.1.1
onnx                    1.8.0
onnx-simplifier         0.3.6
onnxoptimizer           0.2.6
onnxruntime             1.10.0
opencv-python           4.5.5.62
opt-einsum              3.3.0
packaging               20.9
parso                   0.8.3
pexpect                 4.8.0
pickleshare             0.7.5
Pillow                  8.0.1
pip                     21.3.1
ply                     3.11
portalocker             2.3.2
prompt-toolkit          3.0.24
protobuf                3.19.3
psutil                  5.9.0
ptyprocess              0.7.0
pyasn1                  0.4.8
pyasn1-modules          0.2.8
pycosat                 0.6.3
pycparser               2.19
pycryptodome            3.9.8
pycuda                  2021.1
Pygments                2.11.2
PyJWT                   1.7.1
pyOpenSSL               19.0.0
pyparsing               3.0.6
PySocks                 1.7.1
pytest-runner           5.3.1
python-dateutil         2.8.2
python-etcd             0.4.5
pytools                 2021.2.9
PyWavelets              1.2.0
PyYAML                  5.4.1
redis                   3.5.3
regex                   2021.11.10
requests                2.27.1
requests-oauthlib       1.3.0
rsa                     4.8
ruamel_yaml             0.15.46
sacremoses              0.0.47
schedule                0.6.0
scikit-image            0.15.0
scipy                   1.7.3
sentencepiece           0.1.91
setuptools              60.5.0
simplejson              3.17.6
six                     1.16.0
tabulate                0.8.9
tensorboard             2.7.0
tensorboard-data-server 0.6.1
tensorboard-plugin-wit  1.8.1
tensorflow-estimator    2.3.0
tensorflow-gpu          2.3.1
tensorrt                7.2.1.6
termcolor               1.1.0
terminaltables          3.1.10
tf2onnx                 1.8.5
thriftpy2               0.4.14
timm                    0.4.12
tokenizers              0.9.3
toml                    0.10.2
torch                   1.7.1+cu110
torchvision             0.8.2
tqdm                    4.36.1
traitlets               5.1.1
transformers            3.5.1
types-futures           3.3.2
types-protobuf          3.19.0
typing_extensions       4.0.1
urllib3                 1.26.8
wcwidth                 0.2.5
Werkzeug                2.0.2
wheel                   0.37.1
wrapt                   1.13.3
yacs                    0.1.8
yapf                    0.29.0
zipp                    3.7.0
zope.event              4.5.0
zope.interface          5.4.0
Author

xmfbit commented Jan 19, 2022

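Since forward is only called once, could the somewhat hacky workaround above be added directly? That would keep other users from running into problems caused by their environment configuration, as I did.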

Author

xmfbit commented Feb 17, 2023

I got bitten by this again today... Could the author please ship an update? The discussion on the onnxruntime side:
microsoft/onnxruntime#8313

Owner

daquexian commented Feb 17, 2023

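OK, thanks! Sorry for missing this issue earlier.

QAQ Would you consider joining the ONNX QQ group or WeChat group? You can add me as a friend (my QQ and WeChat IDs are both daquexian).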

@Hellohyy

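> I got bitten by this again today... Could the author please ship an update? The discussion on the onnxruntime side: microsoft/onnxruntime#8313

Is your container environment running in cpuset mode? I looked at the source code: among the releases that support Python 3.6, onnxruntime 1.7 through 1.10 reproduce this reliably. The root cause appears to be that the CPU affinity code does not handle cpuset mode. The affinity logic reads the number of CPU cores and then binds threads starting from CPU 0, but in a container restricted by cpuset the CPUs that are actually online do not start at 0, so the code exits abnormally.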
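The mismatch described above can be illustrated with a small sketch. It does not reproduce onnxruntime's actual code; the helper names are hypothetical, and the model of "bind to CPUs 0..N-1" is an assumption taken from the analysis in this comment. `os.sched_getaffinity` is a Linux-only standard-library call that returns the CPUs the current process may run on.

```python
import os


def naive_affinity_targets(num_cpus):
    """CPU IDs chosen when IDs are assumed to run from 0 to num_cpus - 1."""
    return list(range(num_cpus))


def disallowed_targets(allowed_cpus, num_cpus):
    """Naive targets that fall outside the CPUs the process is allowed to use."""
    return [cpu for cpu in naive_affinity_targets(num_cpus) if cpu not in allowed_cpus]


# A container pinned by cpuset to CPUs 4-7: every naive target is disallowed.
print(disallowed_targets({4, 5, 6, 7}, 4))  # [0, 1, 2, 3]

if hasattr(os, "sched_getaffinity"):  # only available on Linux
    allowed = os.sched_getaffinity(0)
    print("allowed CPUs:", sorted(allowed))
    print("naive targets not allowed:", disallowed_targets(allowed, len(allowed)))
```

If the second list is non-empty inside the container, the environment matches the cpuset scenario described here.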

Author

xmfbit commented May 22, 2023

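> I got bitten by this again today... Could the author please ship an update? The discussion on the onnxruntime side: microsoft/onnxruntime#8313
>
> Is your container environment running in cpuset mode? I looked at the source code: among the releases that support Python 3.6, onnxruntime 1.7 through 1.10 reproduce this reliably. The root cause appears to be that the CPU affinity code does not handle cpuset mode. The affinity logic reads the number of CPU cores and then binds threads starting from CPU 0, but in a container restricted by cpuset the CPUs that are actually online do not start at 0, so the code exits abnormally.

Thanks for the analysis! I'll take a look and report back.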

@daquexian
Owner

I'll look into this and use the fallback described above when a runtime error occurs.
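One way such a fallback might look is sketched below. This is not the maintainer's implementation: `create_session_with_fallback` and its two callable parameters are hypothetical names, and the sketch assumes the failure surfaces as a Python `RuntimeError` during session creation, as the issue title suggests.

```python
def create_session_with_fallback(make_session, make_options):
    """Try default threading first; on RuntimeError retry with one thread per pool."""
    try:
        return make_session(make_options())
    except RuntimeError:
        # Build fresh options so no state from the failed attempt is reused.
        opts = make_options()
        opts.intra_op_num_threads = 1
        opts.inter_op_num_threads = 1
        return make_session(opts)


# Hypothetical usage with onnxruntime ("model.onnx" is a placeholder path):
#
# import onnxruntime as ort
# sess = create_session_with_fallback(
#     lambda opts: ort.InferenceSession(
#         "model.onnx", sess_options=opts, providers=["CPUExecutionProvider"]
#     ),
#     ort.SessionOptions,
# )
```

Passing the factories as callables keeps the default multi-threaded path for environments where it works and only degrades to single-threaded execution after a failure.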
