test: update pytest framework (modflowpy#1493)
* use pytest-benchmark's builtin profiling capability instead of manual implementation
* remove requires_exe(mf6) from test_mf6.py tests that don't run models/simulations
* add @requires_spatial_reference marker to conftest.py (for tests depending on spatialreference.org)
* try both importlib.import_module and pkg_resources.get_distribution in @requires_pkg marker (see the sketch after this list)
* mark test_lgr.py::test_simple_lgr_model_from_scratch as flaky (occasional forrtl error (65): floating invalid)
* split test_export.py::test_polygon_from_ij into network-bound and non-network-bound cases
* add comments to flaky tests with links to potentially similar issues
* add timeouts to CI jobs (10min for build, lint, & smoke, 45min for test, 90min for daily jobs)
* remove unneeded markers from pytest.ini
* match profiling/benchmarking test files in pytest.ini
* mark get-modflow tests as flaky (modflowpy#1489 (comment))
* cache benchmark results in daily CI and compare with prior runs
* various tidying/cleanup
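
A hedged sketch (not the actual `conftest.py` code from this commit) of how a `@requires_pkg`-style check might try `importlib.import_module` first and fall back to `pkg_resources.get_distribution`, as described in the bullet above; the helper names are hypothetical:

```python
from importlib import import_module

import pkg_resources  # provided by setuptools
import pytest


def has_pkg(pkg):
    # prefer a plain import; fall back to distribution metadata for
    # packages that are installed but not importable under that name
    try:
        import_module(pkg)
        return True
    except ImportError:
        try:
            pkg_resources.get_distribution(pkg)
            return True
        except pkg_resources.DistributionNotFound:
            return False


def requires_pkg(*pkgs):
    # skip the decorated test if any required package is missing
    missing = {pkg for pkg in pkgs if not has_pkg(pkg)}
    return pytest.mark.skipif(
        missing,
        reason=f"missing package(s): {', '.join(missing)}",
    )
```
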
wpbonelli committed Aug 10, 2022
1 parent 9c42c37 commit 7d33c40
Showing 17 changed files with 243 additions and 239 deletions.
11 changes: 6 additions & 5 deletions .github/workflows/commit.yml
@@ -18,6 +18,7 @@ jobs:
defaults:
run:
shell: bash
timeout-minutes: 10

steps:
- name: Checkout repo
@@ -50,13 +51,13 @@
run: |
twine check --strict dist/*
lint:
name: Lint
runs-on: ubuntu-latest
defaults:
run:
shell: bash
timeout-minutes: 10

steps:
- name: Checkout repo
@@ -106,14 +107,13 @@
run: |
pylint --jobs=2 --errors-only --exit-zero ./flopy
smoke:
name: Smoke
runs-on: ubuntu-latest
defaults:
run:
shell: bash
timeout-minutes: 10

steps:
- name: Checkout repo
@@ -185,7 +185,6 @@
directory: ./autotest
file: coverage.xml


test:
name: Test
needs: smoke
@@ -204,6 +203,7 @@
path: ~/.cache/pip
- os: macos-latest
path: ~/Library/Caches/pip
timeout-minutes: 45

steps:
- name: Checkout repo
@@ -290,6 +290,7 @@
defaults:
run:
shell: pwsh
timeout-minutes: 45

steps:
- name: Checkout repo
@@ -302,7 +303,7 @@
uses: actions/cache@v2.1.0
with:
path: ~/conda_pkgs_dir
key: ${{ runner.os }}-${{ matrix.python-version }}-${{ matrix.run-type }}-${{ hashFiles('etc/environment.yml', 'flopy') }}
key: ${{ runner.os }}-${{ matrix.python-version }}-${{ matrix.run-type }}-${{ hashFiles('etc/environment.yml') }}

# Standard python fails on windows without GDAL installation
# Using custom bash shell ("shell: bash -l {0}") with Miniconda
46 changes: 34 additions & 12 deletions .github/workflows/daily.yml
@@ -25,6 +25,7 @@ jobs:
defaults:
run:
shell: bash
timeout-minutes: 90

steps:
- name: Checkout repo
@@ -90,7 +91,6 @@
file: coverage.xml

examples:

name: Example scripts & notebooks
runs-on: ${{ matrix.os }}
strategy:
@@ -110,6 +110,7 @@
defaults:
run:
shell: bash
timeout-minutes: 90

steps:
- name: Checkout repo
@@ -194,6 +195,7 @@
defaults:
run:
shell: bash
timeout-minutes: 90

steps:
- name: Checkout repo
@@ -230,14 +232,23 @@
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

- name: Run tests
- name: Load cached benchmark results (for comparison)
uses: actions/cache@v2.1.0
with:
path: ./autotest/.benchmarks
key: benchmark-${{ matrix.os }}-${{ matrix.python-version }}

- name: Run benchmarks
working-directory: ./autotest
run: |
pytest -v --cov=flopy --cov-report=xml --durations=0 --benchmark-only --benchmark-autosave --keep-failed=.failed
pytest -v --durations=0 \
--cov=flopy --cov-report=xml \
--benchmark-only --benchmark-autosave --benchmark-compare --benchmark-compare-fail=mean:25% \
--keep-failed=.failed
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

- name: Upload failed test outputs
- name: Upload failed benchmark outputs
uses: actions/upload-artifact@v2
if: failure()
with:
@@ -279,6 +290,7 @@
defaults:
run:
shell: pwsh
timeout-minutes: 90

steps:
- name: Checkout repo
@@ -291,7 +303,7 @@
uses: actions/cache@v2.1.0
with:
path: ~/conda_pkgs_dir
key: ${{ runner.os }}-${{ matrix.python-version }}-${{ matrix.run-type }}-${{ hashFiles('etc/environment.yml', 'flopy') }}
key: ${{ runner.os }}-${{ matrix.python-version }}-${{ matrix.run-type }}-${{ hashFiles('etc/environment.yml') }}

# Standard python fails on windows without GDAL installation
# Using custom bash shell ("shell: bash -l {0}") with Miniconda
@@ -362,6 +374,7 @@
defaults:
run:
shell: pwsh
timeout-minutes: 90

steps:
- name: Checkout repo
@@ -374,7 +387,7 @@
uses: actions/cache@v2.1.0
with:
path: ~/conda_pkgs_dir
key: ${{ runner.os }}-${{ matrix.python-version }}-${{ matrix.run-type }}-${{ hashFiles('etc/environment.yml', 'flopy') }}
key: ${{ runner.os }}-${{ matrix.python-version }}-${{ matrix.run-type }}-${{ hashFiles('etc/environment.yml') }}

# Standard python fails on windows without GDAL installation
# Using custom bash shell ("shell: bash -l {0}") with Miniconda
@@ -410,7 +423,6 @@
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}


- name: Upload failed test outputs
uses: actions/upload-artifact@v2
if: failure()
@@ -446,6 +458,7 @@
defaults:
run:
shell: pwsh
timeout-minutes: 90

steps:
- name: Checkout repo
@@ -458,7 +471,7 @@
uses: actions/cache@v2.1.0
with:
path: ~/conda_pkgs_dir
key: ${{ runner.os }}-${{ matrix.python-version }}-${{ matrix.run-type }}-${{ hashFiles('etc/environment.yml', 'flopy') }}
key: ${{ runner.os }}-${{ matrix.python-version }}-${{ matrix.run-type }}-${{ hashFiles('etc/environment.yml') }}

# Standard python fails on windows without GDAL installation
# Using custom bash shell ("shell: bash -l {0}") with Miniconda
@@ -487,14 +500,23 @@
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

- name: Run tests
- name: Load cached benchmark results (for comparison)
uses: actions/cache@v2.1.0
with:
path: ./autotest/.benchmarks
key: benchmark-${{ runner.os }}-${{ matrix.python-version }}

- name: Run benchmarks
working-directory: ./autotest
run: |
pytest -v --cov=flopy --cov-report=xml --durations=0 --benchmark-only --benchmark-autosave --keep-failed=.failed
pytest -v --durations=0 \
--cov=flopy --cov-report=xml \
--benchmark-only --benchmark-autosave --benchmark-compare --benchmark-compare-fail=mean:25% \
--keep-failed=.failed
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

- name: Upload failed test outputs
- name: Upload failed benchmark outputs
uses: actions/upload-artifact@v2
if: failure()
with:
@@ -505,7 +527,7 @@
- name: Upload benchmark results
uses: actions/upload-artifact@v2
with:
name: benchmark-${{ matrix.os }}-${{ matrix.python-version }}
name: benchmark-${{ runner.os }}-${{ matrix.python-version }}
path: |
./autotest/.benchmarks/**/*.json
35 changes: 23 additions & 12 deletions DEVELOPER.md
@@ -192,8 +192,6 @@ Markers are a `pytest` feature that can be used to select subsets of tests. Mark
- `slow`: tests that don't complete in a few seconds
- `example`: exercise scripts, tutorials and notebooks
- `regression`: tests that compare multiple results
- `benchmark`: test that gather runtime statistics
- `profile`: tests measuring performance in detail

Markers can be used with the `-m <marker>` option. For example, to run only fast tests:

@@ -221,9 +219,20 @@ This will retain the test directories created by the test, which allows files to

There is also a `--keep-failed <dir>` option which preserves the outputs of failed tests in the given location, however this option is only compatible with function-scoped temporary directories (the `tmpdir` fixture defined in `conftest.py`).

### Benchmarking
### Performance testing

Benchmarking is accomplished with [`pytest-benchmark`](https://pytest-benchmark.readthedocs.io/en/latest/index.html). Any test function can be turned into a benchmark by requesting the `benchmark` fixture (i.e. declaring a `benchmark` argument), which can be used to wrap any function call. For instance:
Performance testing is accomplished with [`pytest-benchmark`](https://pytest-benchmark.readthedocs.io/en/latest/index.html).

To allow optional separation of performance from correctness concerns, performance test files may either be named as typical test files or match any of the following patterns:

- `benchmark_*.py`
- `profile_*.py`
- `*_profile*.py`
- `*_benchmark*.py`

#### Benchmarking

Any test function can be turned into a benchmark by requesting the `benchmark` fixture (i.e. declaring a `benchmark` argument), which can be used to wrap any function call. For instance:

```python
def test_benchmark(benchmark):
@@ -251,25 +260,27 @@ Rather than alter an existing function call to use this syntax, a lambda can be

```python
def test_benchmark(benchmark):
def sleep_1s():
def sleep_s(s):
import time
time.sleep(1)
time.sleep(s)
return True

assert benchmark(lambda: sleep_1s())
assert benchmark(lambda: sleep_s(1))
```

This can be convenient when the function call is complicated or passes many arguments.
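
As an alternative to the lambda (a minimal sketch based on pytest-benchmark's documented fixture signature, not code from this repository), positional and keyword arguments passed after the callable are forwarded to it:

```python
import time


def test_benchmark_with_args(benchmark):
    # arguments after the callable are forwarded to it on every round,
    # so no wrapping lambda is needed
    result = benchmark(time.sleep, 0.1)
    assert result is None  # time.sleep returns None
```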

To control the number of repetitions and rounds (repetitions of repetitions) use `benchmark.pedantic`, e.g. `benchmark.pedantic(some_function(), iterations=1, rounds=1)`.
Benchmarked functions are repeated several times (the number of iterations depending on the test's runtime, with faster tests generally getting more reps) to compute summary statistics. To control the number of repetitions and rounds (repetitions of repetitions) use `benchmark.pedantic`, e.g. `benchmark.pedantic(some_function, iterations=1, rounds=1)`.
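
For example, a minimal sketch (a hypothetical test reusing a sleep helper like the one above) of a pedantic benchmark pinned to a single round and iteration:

```python
import time


def test_benchmark_pedantic(benchmark):
    def sleep_s(s):
        time.sleep(s)
        return True

    # exactly one round with one iteration; arguments are passed via args=
    assert benchmark.pedantic(sleep_s, args=(1,), iterations=1, rounds=1)
```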

Benchmarking is incompatible with `pytest-xdist` and is disabled automatically when tests are run in parallel. When tests are not run in parallel, benchmarking is enabled by default. Benchmarks can be disabled with the `--benchmark-disable` flag.

Benchmarked functions are repeated several times (the number of iterations depending on the test's runtime, with faster tests generally getting more reps) to compute summary statistics. Benchmarking is incompatible with `pytest-xdist` and is disabled automatically when tests are run in parallel. When tests are not run in parallel, benchmarking is enabled by default. Benchmarks can be disabled with the `--benchmark-disable` flag.
Benchmark results are only printed to `stdout` by default. To save results to a JSON file, use `--benchmark-autosave`. This will create a `.benchmarks` folder in the current working location (if you're running tests, this should be `autotest/.benchmarks`).

Benchmark results are only printed to stdout by default. To save results to a JSON file, use `--benchmark-autosave`. This will create a `.benchmarks` folder in the current working location (if you're running tests, this should appear at `autotest/.benchmarks`).
#### Profiling

### Profiling
Profiling is [distinct](https://stackoverflow.com/a/39381805/6514033) from benchmarking: profiling evaluates a program's call stack in detail, while benchmarking just invokes a function repeatedly and computes summary statistics. Profiling is also accomplished with `pytest-benchmark`: use the `--benchmark-cprofile` option when running tests that use the `benchmark` fixture described above. The option's value is the column to sort results by. For instance, to sort by total time, use `--benchmark-cprofile="tottime"`. See the `pytest-benchmark` [docs](https://pytest-benchmark.readthedocs.io/en/stable/usage.html#commandline-options) for more information.

Profiling is [distinct](https://stackoverflow.com/a/39381805/6514033) from benchmarking in considering program behavior in detail, while benchmarking just invokes functions repeatedly and computes summary statistics. Profiling test files may be named either as typical test files or matching `profile_*.py` or `*_profile*.py`. Functions marked with the `profile` marker are considered profiling tests and will not run unless `pytest` is invoked with the `--profile` (short `-P`) flag.
By default, `pytest-benchmark` will only print profiling results to `stdout`. If the `--benchmark-autosave` flag is provided, performance profile data will be included in the JSON files written to the `.benchmarks` save directory as described in the benchmarking section above.

### Writing tests
