Timeout functional tests #449

Closed
wants to merge 4 commits into from

Conversation

kgaughan
Member

If the functional tests don't exit after five seconds of waiting on them, they should be killed. This also uses pytest-timeout to impose a maximum runtime of 30 seconds on any test in the test suite.

Hopefully this should prevent tests from hanging.
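
As a rough illustration of the "wait five seconds, then kill" idea described above, here is a minimal sketch using only the standard library. It is not the code from this PR: the helper name `stop_process` and the `grace` parameter are hypothetical, and it assumes the fixture is a `subprocess.Popen` object rather than whatever process type the test suite really uses.

```python
import subprocess


def stop_process(proc: subprocess.Popen, grace: float = 5.0) -> int:
    """Ask a child process to stop, and kill it if it does not exit in time.

    `grace` is the number of seconds to wait after the polite request
    (SIGTERM on POSIX) before falling back to a forced kill.
    """
    proc.terminate()
    try:
        # Returns the exit code as soon as the child exits.
        return proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        # The child ignored the request or is stuck: force it down.
        proc.kill()
        # After kill() the wait should return promptly.
        return proc.wait()
```

The trade-off is the one debated later in this thread: a forced kill keeps the run moving, but it can also hide the reason the child did not exit on its own.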

@kgaughan
Member Author

Closer. The functional tests look to be completing, but pytest (or tox) looks to get stuck, so the final step in the workflow doesn't exit. My best guess is that there's a wait or join being executed in the atexit handler that's preventing the process from exiting.

@kgaughan
Member Author

I just reran the failed workflows, and they succeeded. I guess it's at least better that the tests themselves aren't hanging, but I'm not at all sure why the process sometimes doesn't exit like it should.

@digitalresistor
Member

Not a fan of this change. I feel like it papers over waitress not exiting cleanly. I want tests to hang.

The GitHub Actions runners have been slow and having issues for the last couple of days; locally these tests always succeed.

@kgaughan
Member Author

In that case, I can drop the changes to tests/test_functional.py and keep the timeout on the test: it should fail on the offending test while it's waiting on the join() and wait() calls to return. Sound good?

@digitalresistor
Member

What do we gain? I am not sure I fully understand the point of it today, with the current test suite. The extra dependency is one more package to download and install across all CI runs. While it may be small, it adds up, and I try to be cognizant of the overhead of adding yet another thing that can cause issues because it is not maintained or updated as quickly as other parts of the ecosystem.

If the functional tests hanging were a bigger issue, and one that could be triggered regularly, I wouldn't be opposed to adding extra protection against it.

@kgaughan
Member Author

What I was originally trying to do was see whether it would reveal which of the functional tests was the most problematic when it came to hanging, in the hope of making the test suite more reliable. With the timeout in place, they most consistently failed on EchoTests and TooLargeTests.

If you don't see it as worthwhile, I can close the PR though.

@digitalresistor
Member

Where do you see them hanging? I am trying to understand this better because I too want the test suite not to fail, and I want to understand the issues and fix them.

While working on fixes for some of the security issues lately, I have rerun the test suite tens of times and haven't seen it hang at all. My test suite runs have taken about 14 seconds (with coverage enabled; a couple of seconds shorter without coverage).

@digitalresistor
Member

I will note that there seems to be an issue with GitHub Actions: the test suite there has historically run in about 60 seconds at most, and now I am seeing runs that take a couple of minutes apiece, or even longer.

I don't know how to account for those changes in runtime. It's the same code base and the same tests.

@kgaughan
Member Author

Yep, it confused me too. It hangs for me randomly from time to time in the middle of the functional tests. I couldn't detect a pattern to it, which is why I tried cleaning up the fixture subprocesses more aggressively.

@simonk52
Contributor

It hangs for me randomly from time to time in the middle of the functional tests

Just to add another data point, I see the same thing. I'm on Ubuntu 24.04, running tox with Python 3.12 and 3.13, with waitress at commit 23ac524.

I just ran tox in a loop and it completed successfully the first 2 times but hung on the third attempt, during test_functional.py. I hit Ctrl-C after a few minutes:

platform linux -- Python 3.12.3, pytest-8.3.3, pluggy-1.5.0
cachedir: .tox/py312/.pytest_cache
rootdir: /home/nfhd78/Downloads/waitress
configfile: setup.cfg
testpaths: tests
plugins: cov-6.0.0
collected 797 items                                                                                                                                                                                                                                                      

tests/test_adjustments.py .................................................                                                                                                                                                                                        [  6%]
tests/test_buffers.py ....................................................                                                                                                                                                                                         [ 12%]
tests/test_channel.py ..........................................................................................................................                                                                                                                   [ 27%]
tests/test_functional.py .....................................................................^CROOT: [1997110] KeyboardInterrupt - teardown started
ROOT: interrupt tox environment: py312
ROOT: requested interrupt of 1997666 from 1997110, activate in 0.00
ROOT: send signal SIGINT(2) to 1997666 from 1997110 with timeout 0.30
ROOT: send signal SIGTERM(15) to 1997666 from 1997110 with timeout 0.20
ROOT: interrupt finished with success
.pkg: interrupt tox environment: .pkg
py312: exit -15 (726.06 seconds) /home/nfhd78/Downloads/waitress> python -mpytest pid=1997666
  lint: OK (13.31=setup[0.04]+cmd[0.28,1.88,5.35,5.27,0.49] seconds)
  py39: SKIP (0.06 seconds)
  py310: SKIP (0.05 seconds)
  py311: SKIP (0.05 seconds)
  py312: FAIL code -15 (731.47=setup[5.41]+cmd[0.00,726.06] seconds)
  py313: FAIL code -3 (0.01 seconds)
  pypy39: FAIL code -3 (0.01 seconds)
  pypy310: FAIL code -3 (0.01 seconds)
  coverage: FAIL code -3 (0.01 seconds)
  docs: FAIL code -3 (0.01 seconds)
  evaluation failed :( (745.03 seconds)

The fourth run hung in a similar way, although after a different number of tests within test_functional.py (so I don't think it's the same test hanging each time):

platform linux -- Python 3.12.3, pytest-8.3.3, pluggy-1.5.0
cachedir: .tox/py312/.pytest_cache
rootdir: /home/nfhd78/Downloads/waitress
configfile: setup.cfg
testpaths: tests
plugins: cov-6.0.0
collected 797 items                                                                                                                                                                                                                                                      

tests/test_adjustments.py .................................................                                                                                                                                                                                        [  6%]
tests/test_buffers.py ....................................................                                                                                                                                                                                         [ 12%]
tests/test_channel.py ..........................................................................................................................                                                                                                                   [ 27%]
tests/test_functional.py ...................................................................................^CROOT: [1999983] KeyboardInterrupt - teardown started
ROOT: interrupt tox environment: py312
ROOT: requested interrupt of 2000537 from 1999983, activate in 0.00
ROOT: send signal SIGINT(2) to 2000537 from 1999983 with timeout 0.30
/home/nfhd78/Downloads/waitress/.tox/py312/lib/python3.12/site-packages/_pytest/main.py:337: PluggyTeardownRaisedWarning: A plugin raised an exception during an old-style hookwrapper teardown.
Plugin: _cov, Hook: pytest_runtestloop
KeyboardInterrupt: 
For more information see https://pluggy.readthedocs.io/en/stable/api_reference.html#pluggy.PluggyTeardownRaisedWarning
  config.hook.pytest_runtestloop(session=session)
Process ForkProcess-84:
Traceback (most recent call last):
ROOT: send signal SIGTERM(15) to 2000537 from 1999983 with timeout 0.20
ROOT: interrupt finished with success
.pkg: interrupt tox environment: .pkg
py312: exit -15 (338.49 seconds) /home/nfhd78/Downloads/waitress> python -mpytest pid=2000537
  lint: OK (13.41=setup[0.04]+cmd[0.25,1.82,5.31,5.49,0.49] seconds)
  py39: SKIP (0.05 seconds)
  py310: SKIP (0.05 seconds)
  py311: SKIP (0.07 seconds)
  py312: FAIL code -15 (343.95=setup[5.45]+cmd[0.00,338.49] seconds)
  py313: FAIL code -3 (0.01 seconds)
  pypy39: FAIL code -3 (0.01 seconds)
  pypy310: FAIL code -3 (0.01 seconds)
  coverage: FAIL code -3 (0.01 seconds)
  docs: FAIL code -3 (0.01 seconds)
  evaluation failed :( (357.65 seconds)

In total I ran tox 7 times, and it hung on 5 of those runs.

@kgaughan
Member Author

@simonk52 Delta's doing some testing elsewhere that 🤞 might render this PR mostly moot.

@digitalresistor
Member

It's due to a hack we have had in test_functional.py that adds a signal handler for SIGTERM, which breaks on newer versions of coverage. See #455.

#454 contains some test runs I did with various coverage versions. The change was introduced in coverage 7.5.4, which also enabled running on Py3.13 without the GIL.
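
For readers unfamiliar with this kind of hack, the sketch below shows in general terms what installing a SIGTERM handler in a child process does: it turns the signal into a normal interpreter exit, so `atexit` handlers get a chance to run. This is a generic illustration and not the code from test_functional.py. The names `CHILD` and `run_and_terminate` are hypothetical, and the sketch says nothing about why the approach misbehaves with newer coverage releases.

```python
import subprocess
import sys
import textwrap

# Source of a small child program (hypothetical, for illustration only).
# It converts SIGTERM into sys.exit(0) so that atexit handlers still run.
CHILD = textwrap.dedent(
    """
    import atexit, signal, sys, time
    atexit.register(lambda: print("atexit ran", flush=True))
    signal.signal(signal.SIGTERM, lambda signum, frame: sys.exit(0))
    print("ready", flush=True)
    time.sleep(60)
    """
)


def run_and_terminate() -> tuple[int, str]:
    """Start the child, send it SIGTERM, and return (exit code, remaining output)."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD], stdout=subprocess.PIPE, text=True
    )
    proc.stdout.readline()     # block until the child has installed its handler
    proc.terminate()           # sends SIGTERM on POSIX
    rest = proc.stdout.read()  # read until the child closes stdout by exiting
    proc.wait(timeout=10)
    proc.stdout.close()
    return proc.returncode, rest
```

Without the handler, the default action for SIGTERM ends the process immediately and the `atexit` callback never runs. The sketch assumes a POSIX system, where `terminate()` delivers SIGTERM.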

@digitalresistor
Member

I've merged the removal of the hack, so I am closing this PR.

@kgaughan
Member Author

Thanks! Nice that the mystery is solved!

@kgaughan kgaughan deleted the timeout-functional-tests branch November 16, 2024 19:44