Update documentation for setting Python environment #778

Merged 3 commits on Jan 17, 2020
34 changes: 21 additions & 13 deletions docs/create-pyenv.md
@@ -50,7 +50,7 @@ This page goes through the Python environment setup steps in more detail and wit
module load wcEcoli/sherlock2
```

- 2. Optional: Download and install other packages according to their instructions or take a wait-and-see approach with them.
+ 1. Optional: Download and install other packages according to their instructions or take a wait-and-see approach with them.

* CPLEX from IBM (free for students)

@@ -77,7 +77,7 @@ This page goes through the Python environment setup steps in more detail and wit
WARNING: The Python readline extension was not compiled. Missing the GNU readline lib?
WARNING: The Python sqlite3 extension was not compiled. Missing the SQLite3 lib?

- 2. Install the required version of Python via `pyenv`, and _remember to enable it as a shared library_ so Theano can call into it:
+ 1. Install the required version of Python via `pyenv`, and _remember to enable it as a shared library_ so Theano can call into it:
Contributor

Is the "shared library" part also needed on macOS? I forgot! I suspect that I installed 2.7.16 locally without it.
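
One way to settle this on a given machine is to ask the interpreter how it was configured. This is a minimal sketch, assuming the standard-library `sysconfig` module (present in Python 2.7 and 3); the helper name `built_as_shared_library` is hypothetical and not part of wcEcoli. It only reports the `Py_ENABLE_SHARED` build variable, so it does not by itself show whether Theano needs the shared library on macOS.

```python
import sysconfig


def built_as_shared_library():
    """Return True if this interpreter reports a --enable-shared build.

    Py_ENABLE_SHARED is 1 for shared builds, 0 for static builds, and may be
    None on platforms that do not define it.
    """
    return bool(sysconfig.get_config_var("Py_ENABLE_SHARED"))


print(built_as_shared_library())
```

Running this inside the `wcEcoli2` virtualenv would show how the local 2.7.16 install was built.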


```bash
PYTHON_CONFIGURE_OPTS="--enable-shared" pyenv install 2.7.16
@@ -102,13 +102,13 @@ This page goes through the Python environment setup steps in more detail and wit
pyenv local wcEcoli2
```

- 2. Upgrade this virtual environment's installers.
+ 1. Upgrade this virtual environment's installers.

```bash
pip install --upgrade pip setuptools virtualenv virtualenvwrapper virtualenv-clone wheel
```

- 3. Install OpenBLAS 0.3.5 or later.
+ 1. Install OpenBLAS 0.3.5 or later.
Contributor

Maybe add that OpenBLAS 0.3.7 (as installed by `brew install openblas`) works fine on macOS but not inside Docker on macOS unless you compile it with `NO_AVX2=1`. Maybe this is too complicated to even get into. See OpenMathLib/OpenBLAS#2244.


**Background:** Older versions of OpenBLAS have threading bugs that cause unreliable results and computations that might hang. The alternative implementations of BLAS (Basic Linear Algebra Subprograms) -- Apple's "Accelerate" framework and Intel's Math Kernel Library -- also have threading bugs as of December, 2018.

@@ -134,8 +134,11 @@ This page goes through the Python environment setup steps in more detail and wit
**Note:** If you get an `instruction not found` error while installing OpenBLAS, that probably means
your old assembler is incompatible with the Fortran compiler. Figure out how to update the assembler
or else install OpenBLAS 0.3.4 and suffer its threading bugs and inconsistent results.

+ **Note:** OpenBLAS 0.3.7 (as installed by brew install openblas) works fine on macOS, but not inside
+ Docker on macOS unless you compile it with option `NO_AVX2=1`.

- 4. Create `~/.numpy-site.cfg` pointing to _your OpenBLAS installation directory._
+ 1. Create `~/.numpy-site.cfg` pointing to _your OpenBLAS installation directory._

(If you want, you can download [site.cfg.example](https://github.com/numpy/numpy/blob/master/site.cfg.example) to your local file `~/.numpy-site.cfg` to start from their example configuration choices and documentation.)

@@ -148,22 +151,22 @@ This page goes through the Python environment setup steps in more detail and wit
include_dirs = /usr/local/opt/openblas/include
```
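
Because a wrong path in `~/.numpy-site.cfg` only shows up later as a failed or mislinked NumPy build, it may help to check the file before running `pip`. The following is a sketch only, written for Python 3's `configparser` (Python 2.7 names the module `ConfigParser`). The `[openblas]` section name follows NumPy's `site.cfg.example`; the `library_dirs` value in the sample text and the helper names are assumptions for illustration, and only the `include_dirs` line comes from the hunk above.

```python
import configparser
import os

# Sample text standing in for ~/.numpy-site.cfg. The library_dirs path is an
# assumed Homebrew-style location; substitute your own installation directory.
SAMPLE_CFG = """
[openblas]
libraries = openblas
library_dirs = /usr/local/opt/openblas/lib
include_dirs = /usr/local/opt/openblas/include
"""


def openblas_dirs(cfg_text):
    """Return (library_dirs, include_dirs) from the [openblas] section."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    section = parser["openblas"]
    return section["library_dirs"], section["include_dirs"]


def missing_dirs(cfg_text):
    """Return the configured directories that do not exist on this machine."""
    return [d for d in openblas_dirs(cfg_text) if not os.path.isdir(d)]


lib_dir, inc_dir = openblas_dirs(SAMPLE_CFG)
print(lib_dir)
print(inc_dir)
```

To check a real file, pass the contents of `~/.numpy-site.cfg` to `missing_dirs`; an empty list means both directories exist.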

- 5. Install NumPy linked to this OpenBLAS thanks to `~/.numpy-site.cfg`
+ 1. Install NumPy linked to this OpenBLAS thanks to `~/.numpy-site.cfg`
(It won't work to install numpy and scipy at the same time into Python 2.7.):

```bash
cd wcEcoli
pip install numpy==1.14.6 --no-binary numpy --force-reinstall
```

- 6. Install the packages listed in `requirements.txt` (SciPy will also use `~/.numpy-site.cfg`):
+ 1. Install the packages listed in `requirements.txt` (SciPy will also use `~/.numpy-site.cfg`):

```bash
pip install -r requirements.txt --no-binary numpy,scipy
pyenv rehash
```

- 6. Test the NumPy and SciPy installation
+ 1. Test the NumPy and SciPy installation

```bash
python runscripts/debug/summarize_environment.py
@@ -178,7 +181,12 @@ This page goes through the Python environment setup steps in more detail and wit
language = c
```

- 8. Test Theano:
+ 1. (Optional) Add the following line to your bash profile. This has been shown to improve performance significantly on linux machines.
+ ```
+ export OPENBLAS_NUM_THREADS=1
+ ```
Contributor

This is a good thing. (And switching to plain 1. everywhere is also good.) But do we always do this? I do not know. Or always configure it this way and sometimes override it with a different local value?

FYI, Intel's alternative implementation, Math Kernel Library (MKL), has a Threading Building Blocks (TBB) feature to avoid oversubscription of threads while not limiting itself to one thread per process. The last time I tried it, it was slower than OpenBLAS. It turns out there's a new release for 2020 and there were several releases in 2019, so maybe we should retest it. See #36

Member

I've been using the export for a while. Maybe it's Linux-specific, but you get much better performance. Maybe we could say it's optional or add some info about why you want to add it.

Testing MKL again could be worthwhile, but do you think it's optimized for Intel processor instruction sets and architecture, and not so much for AMD and others?

Contributor

Yuk. Very good point that we'd have to test on at least one Intel CPU and at least one AMD CPU, maybe more, and the Docker-on-Mac-on-Intel case. At least it only takes replacing some pips -- on each of those system installations. I'm not excited to test it again.
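
On the earlier question in this thread of always configuring one thread while still allowing a local override, one possible pattern is to set the variable only when it is unset. This is a sketch, not something this PR does: the helper name `apply_default_threads` is hypothetical, and it assumes the call runs before NumPy is first imported, since OpenBLAS is understood to read `OPENBLAS_NUM_THREADS` when the library loads.

```python
import os


def apply_default_threads(environ=os.environ, default="1"):
    """Set OPENBLAS_NUM_THREADS to `default` only if it is not already set.

    A value exported in the shell (for example in a bash profile) is left
    unchanged, so a local override still wins.
    """
    environ.setdefault("OPENBLAS_NUM_THREADS", default)
    return environ["OPENBLAS_NUM_THREADS"]


# Call this before any `import numpy` in the entry-point script.
print(apply_default_threads())
```

With this pattern the bash-profile export stays optional: unset machines get one thread, and anyone who exports a different value keeps it.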


+ 1. Test Theano:

```bash
python
@@ -194,7 +202,7 @@ This page goes through the Python environment setup steps in more detail and wit

naming the library_dirs that you set above.

- 9. Configure matplotlib.
+ 1. Configure matplotlib.

Note: The wcEcoli software expects to run with `wcEcoli/` as both the current working directory and on the `$PYTHONPATH` to
make the code and `matplotlibrc` available.
@@ -216,15 +224,15 @@ This page goes through the Python environment setup steps in more detail and wit
Comment out the line `backend : TkAgg` or `backend : macosx`, then retest.
* If it didn't raise an error, run `matplotlib.get_backend()` and check that it returns `'agg'` or similar.

- 10. Compile the project's native code.
+ 1. Compile the project's native code.

```bash
make clean compile
```

(Yes, that prints deprecation warnings.)

- 11. Run all the unit tests.
+ 1. Run all the unit tests.

```bash
pytest
@@ -243,7 +251,7 @@ This page goes through the Python environment setup steps in more detail and wit
ln -s $SCRATCH/wcEcoli_out out
```

- 2. Create a symbolic link to a shared sim data cache directory on `$PI_SCRATCH` that should contain a copy of the newest sim data object (it should be updated by the daily build):
+ 1. Create a symbolic link to a shared sim data cache directory on `$PI_SCRATCH` that should contain a copy of the newest sim data object (it should be updated by the daily build):

```bash
ln -s $PI_SCRATCH/wc_ecoli/cached cached