get_config_directory returns read-only paths when using NNI from Singularity container #3924

Markus92 · 2021-07-09T10:57:48Z

Describe the issue:
When running NNI from inside a container built with Singularity, NNI tries to write config files to /usr/local. Unlike Docker, processes inside these containers do not run as root, and the root filesystem is read-only, so /usr/local is not writable. If conda was used to install packages when the container was built, /usr/local/conda-meta exists and is not writable, regardless of how NNI is installed. Note that Singularity is a popular container tool in academic/HPC environments.

The error occurs here:

nni/nni/runtime/config.py

Line 16 in 3943239

if sys.prefix != sys.base_prefix or Path(sys.prefix, 'conda-meta').is_dir():

In my containers, sys.prefix = sys.base_prefix = /usr/local, and /usr/local/conda-meta exists.
nni/runtime/config.py tries to find a directory to write configuration files to, but does not check whether it can actually write there. From a user's perspective, it is also impossible to override this behavior without editing the NNI source code.

A solution would be to check whether the directory is actually writable, or to allow the user to set an environment variable such as NNI_CONFIG_DIRECTORY=/some/writable/directory to override this behavior.
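To make the proposal concrete, here is a minimal sketch of both ideas combined. It is not NNI's actual code: the helper names `is_writable_dir` and `resolve_config_directory` are hypothetical, and the only name taken from this issue is the suggested NNI_CONFIG_DIRECTORY variable.

```python
import os
from pathlib import Path


def is_writable_dir(path: Path) -> bool:
    """Return True if `path`, or its nearest existing ancestor, accepts writes."""
    probe = path
    # Walk up until we reach something that exists on disk.
    while not probe.exists():
        if probe.parent == probe:
            return False
        probe = probe.parent
    # os.access reports False for W_OK on a read-only filesystem.
    return probe.is_dir() and os.access(probe, os.W_OK | os.X_OK)


def resolve_config_directory(candidates, env_var="NNI_CONFIG_DIRECTORY"):
    """Pick a config directory (hypothetical helper, not part of NNI).

    An explicit override in `env_var` always wins; otherwise the first
    candidate that is writable is returned.
    """
    override = os.environ.get(env_var)
    if override:
        return Path(override).expanduser()
    for candidate in candidates:
        candidate = Path(candidate)
        if is_writable_dir(candidate):
            return candidate
    raise OSError(f"no writable config directory found; set {env_var}")
```

With something like this, a Singularity user could run `NNI_CONFIG_DIRECTORY=$HOME/.config/nni nnictl create ...`, and users who set nothing would fall through to the first candidate that is writable instead of hitting the OSError below.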

Environment:

NNI version: 2.3
Training service (local|remote|pai|aml|etc): local
Client OS: Ubuntu 20.04
Server OS (for remote mode only):
Python version: 3.8
PyTorch/TensorFlow version: N/A
Is conda/virtualenv/venv used?: conda
Is running in Docker?: Singularity is used.
nnictl stdout and stderr:

Singularity> nnictl create --port 8081 --config config.yml
INFO: expand codeDir: . to [privatedir]/.
Traceback (most recent call last):
  File "/usr/local/bin/nnictl", line 8, in <module>
    sys.exit(parse_args())
  File "/usr/local/lib/python3.8/site-packages/nni/tools/nnictl/nnictl.py", line 278, in parse_args
    args.func(args)
  File "/usr/local/lib/python3.8/site-packages/nni/tools/nnictl/launcher.py", line 515, in create_experiment
    config_v2 = convert.to_v2(config_yml).json()
  File "/usr/local/lib/python3.8/site-packages/nni/experiment/config/convert.py", line 20, in to_v2
    v2 = ExperimentConfig(platform)
  File "/usr/local/lib/python3.8/site-packages/nni/experiment/config/common.py", line 85, in __init__
    kwargs['trainingservice'] = util.training_service_config_factory(
  File "/usr/local/lib/python3.8/site-packages/nni/experiment/config/util.py", line 43, in training_service_config_factory
    custom_ts_config_path = nni.runtime.config.get_config_file('training_services.json')
  File "/usr/local/lib/python3.8/site-packages/nni/runtime/config.py", line 33, in get_config_file
    shutil.copyfile(default, config_file)
  File "/usr/local/lib/python3.8/shutil.py", line 261, in copyfile
    with open(src, 'rb') as fsrc, open(dst, 'wb') as fdst:
OSError: [Errno 30] Read-only file system: '/usr/local/nni/training_services.json'

How to reproduce it?:
See above. Run NNI as an unprivileged user, from a conda installation that was installed as root, on a read-only filesystem.
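To check whether a given Python installation is affected, a small diagnostic along these lines may help. It is only a sketch: `describe_prefix` is a hypothetical helper, and the detection condition is copied from the config.py line quoted above rather than imported from NNI.

```python
import os
import sys
from pathlib import Path


def describe_prefix(prefix=None, base_prefix=None):
    """Report how the quoted check would classify an interpreter prefix."""
    prefix = Path(prefix or sys.prefix)
    base_prefix = Path(base_prefix or sys.base_prefix)
    in_virtualenv = prefix != base_prefix
    has_conda_meta = (prefix / "conda-meta").is_dir()
    return {
        "prefix": str(prefix),
        "in_virtualenv": in_virtualenv,
        "has_conda_meta": has_conda_meta,
        # Same condition as the line quoted from nni/runtime/config.py.
        "treated_as_environment": in_virtualenv or has_conda_meta,
        # The check that the quoted condition does not perform.
        "prefix_writable": os.access(prefix, os.W_OK),
    }


if __name__ == "__main__":
    for key, value in describe_prefix().items():
        print(f"{key}: {value}")
```

If `treated_as_environment` is true while `prefix_writable` is false, the installation matches the situation described in this issue.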


liuzhe-lz · 2021-07-13T03:56:55Z

We use conda-meta to detect whether the directory is a conda environment. If it is, it should always be writable.
So the problem here is that your conda is "merged" into /usr/local instead of being installed to somewhere like /usr/local/anaconda. The template files of conda-meta are placed alongside the Python runtime, so to NNI it looks like a conda environment directory.
I have to say it's a really strange setup...
We will try some other ways to detect a conda environment, but cannot guarantee that a better one exists, because conda does not officially provide APIs to do that.

Overriding config path with environment variable is a nice suggestion. We will implement it sooner or later.

Markus92 · 2021-07-13T14:18:03Z

There are no guarantees from conda that the directory is writable! There are many scenarios in which it is not (this is just one of them). For example, if you install conda as root and install packages system-wide, then drop down to an unprivileged user to run them, you get this behavior. This can happen a lot in production environments where packages should be read-only. Conda is then used not to provide virtual environments but as an alternative to pip as a package manager.

What is the reason for detecting a conda environment in the first place? Conda does not provide APIs to do that, as package behavior should not change depending on it.

I'll send a pull request with the desired behavior.

liuzhe-lz · 2021-07-14T03:07:18Z

Because there might be multiple NNI instances in different environments, and they must not share a config file. The config file is about installed packages, and each environment must install its own packages separately.
The root problem is that Python provides no standard way to use config files, and conda breaks the operating system's FHS. On the other hand, wheel does not support post-install hooks either, so there must be a trade-off.
Since conda's tutorial covers how to manage multiple environments but never mentions sudo, we think the former use case has higher priority.

QuanluZhang assigned SparkSnail and liuzhe-lz Jul 12, 2021

Markus92 mentioned this issue Jul 13, 2021

Allow for environment variable to set NNI configuration directory #3936

Merged

liuzhe-lz closed this as completed Jul 23, 2021
