This repository has been archived by the owner on Sep 18, 2024. It is now read-only.
Describe the issue:
When running NNI from inside a container built with Singularity, NNI tries to write config files to /usr/local. Unlike Docker, processes inside these containers do not run as root, and the root filesystem is read-only, so /usr/local cannot be written to. If conda was used to install packages when the container was built, /usr/local/conda-meta exists but is not writable, regardless of how NNI itself is installed. Note that Singularity is a popular container tool in academic/HPC environments.
In my containers, sys.prefix = sys.base_prefix = /usr/local and /usr/local/conda-meta exists, so NNI treats /usr/local as a conda environment and picks /usr/local/nni as its config directory.
nni/runtime/config.py looks for a directory to write configuration files to, but never checks whether it can actually write there. The error originates at line 16 of nni/runtime/config.py (commit 3943239). From a user's perspective, there is also no way to override this behavior without editing the NNI source code.
A solution would be either to check whether the directory is actually writable, or to let the user set an environment variable such as NNI_CONFIG_DIRECTORY=/some/writable/directory to override this behavior.
Environment:
NNI version: 2.3
Training service (local|remote|pai|aml|etc): local
Client OS: Ubuntu 20.04
Server OS (for remote mode only):
Python version: 3.8
PyTorch/TensorFlow version: N/A
Is conda/virtualenv/venv used?: conda
Is running in Docker?: Singularity is used.
nnictl stdout and stderr:
Singularity> nnictl create --port 8081 --config config.yml
INFO: expand codeDir: . to [privatedir]/.
Traceback (most recent call last):
  File "/usr/local/bin/nnictl", line 8, in <module>
    sys.exit(parse_args())
  File "/usr/local/lib/python3.8/site-packages/nni/tools/nnictl/nnictl.py", line 278, in parse_args
    args.func(args)
  File "/usr/local/lib/python3.8/site-packages/nni/tools/nnictl/launcher.py", line 515, in create_experiment
    config_v2 = convert.to_v2(config_yml).json()
  File "/usr/local/lib/python3.8/site-packages/nni/experiment/config/convert.py", line 20, in to_v2
    v2 = ExperimentConfig(platform)
  File "/usr/local/lib/python3.8/site-packages/nni/experiment/config/common.py", line 85, in __init__
    kwargs['trainingservice'] = util.training_service_config_factory(
  File "/usr/local/lib/python3.8/site-packages/nni/experiment/config/util.py", line 43, in training_service_config_factory
    custom_ts_config_path = nni.runtime.config.get_config_file('training_services.json')
  File "/usr/local/lib/python3.8/site-packages/nni/runtime/config.py", line 33, in get_config_file
    shutil.copyfile(default, config_file)
  File "/usr/local/lib/python3.8/shutil.py", line 261, in copyfile
    with open(src, 'rb') as fsrc, open(dst, 'wb') as fdst:
OSError: [Errno 30] Read-only file system: '/usr/local/nni/training_services.json'
How to reproduce it?:
See above. Install NNI with a conda that was itself installed as root, then run it as an unprivileged user who cannot write to that filesystem.
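To check whether a given container hits this code path, a small standalone script (not part of NNI) can print the values that matter here. On the reporter's setup one would expect `sys.prefix` and `sys.base_prefix` to both be /usr/local, `has_conda_meta` to be True, and `prefix_writable` to be False. The function name `describe_environment` is hypothetical.

```python
import os
import sys
from pathlib import Path


def describe_environment() -> dict:
    """Collect the facts that decide where NNI puts its config directory."""
    prefix = Path(sys.prefix)
    return {
        'sys.prefix': sys.prefix,
        'sys.base_prefix': sys.base_prefix,
        # A venv/virtualenv changes sys.prefix but not sys.base_prefix.
        'is_venv': sys.prefix != sys.base_prefix,
        # The marker directory used to detect a conda environment.
        'has_conda_meta': (prefix / 'conda-meta').is_dir(),
        # access() reports False for write on a read-only mount.
        'prefix_writable': os.access(prefix, os.W_OK),
        'uid': os.getuid() if hasattr(os, 'getuid') else None,
    }


if __name__ == '__main__':
    for key, value in describe_environment().items():
        print(f'{key}: {value}')
```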
We use conda-meta to detect whether the directory is a conda environment. If it is, it should always be writable.
So the problem here is that your conda is "merged" into /usr/local instead of being installed somewhere like /usr/local/anaconda. The conda-meta files sit alongside the Python runtime, so to NNI the directory looks like a conda environment.
I have to say it's a really strange setup...
We will try other ways to detect a conda environment, but we cannot guarantee that a better one exists, because conda does not officially provide an API for that.
Overriding the config path with an environment variable is a nice suggestion. We will implement it sooner or later.
Conda gives no guarantee that the directory is writable! There are many scenarios in which it is not (this is just one of them). For example, if you install conda as root, install packages system-wide, and then drop to an unprivileged user to run them, you get this behavior. This is common in production environments where packages should be read-only; there, conda is used not to provide virtual environments but as an alternative to pip for package management.
What is the reason for detecting a conda environment in the first place? Conda does not provide APIs to do that, as package behavior should not change depending on it.
Because there may be multiple NNI instances in different environments, and they must not share a config file. The config file describes installed packages, and each environment installs its own packages separately.
The root problem is that Python provides no standard way to handle config files, and conda breaks the operating system's FHS (Filesystem Hierarchy Standard). Wheels do not support post-install hooks either. So there has to be a trade-off.
Since conda's tutorial covers how to manage multiple environments but never mentions sudo, we think the multi-environment use case has higher priority.
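One hypothetical way to honor both constraints discussed here (a separate config per environment, and a location that is actually writable) is to keep the per-environment key but move the files under a user-writable base directory. The function name `per_env_user_config_dir` and the ~/.config/nni/envs base are assumptions for illustration, not NNI behavior.

```python
import hashlib
import sys
from pathlib import Path
from typing import Optional


def per_env_user_config_dir(prefix: str = sys.prefix,
                            base: Optional[Path] = None) -> Path:
    """Map an environment prefix to its own directory under a user-writable base."""
    if base is None:
        # Assumed per-user location; any writable directory would do.
        base = Path.home() / '.config' / 'nni' / 'envs'
    # Hash the resolved prefix so each environment gets a stable, distinct key.
    resolved = str(Path(prefix).resolve())
    key = hashlib.sha256(resolved.encode('utf-8')).hexdigest()[:16]
    return base / key


if __name__ == '__main__':
    for env_prefix in ('/usr/local', '/opt/conda/envs/nni'):
        print(env_prefix, '->', per_env_user_config_dir(env_prefix))
```

The trade-off is that each user gets a separate copy of an environment's config, so a root-owned environment would no longer have a single config file shared by all of its users.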