gh-110722: Make `-m test -T -j` use sys.monitoring #111710

ambv · 2023-11-03T19:09:16Z

Now all results from worker processes are aggregated and displayed together as a summary at the end of a regrtest run. Below is a non-scientific benchmark of how much faster this is compared to the old way of doing things.

The traditional trace is left in place for use with sequential in-process test runs but now raises a warning that those numbers are not precise.

-T -j requires --with-pydebug as it relies on -Xpresite=.

Without coverage

== Tests result: SUCCESS ==

10 slowest tests:
- test_configparser: 1.4 sec

1 test OK.

With old trace.py-based coverage

== Tests result: SUCCESS ==

10 slowest tests:
- test_configparser: 16.0 sec

1 test OK.

With new `-T -j1` coverage based on `sys.monitoring`

== Tests result: SUCCESS ==

10 slowest tests:
- test_configparser: 2.3 sec

1 test OK.

Final caveat

Not all tests in the suite currently play well with coverage. When I'm running with -Uall,extralargefile on an M1 Mac, I get:

14 tests failed:
    test.test_asyncio.test_tasks test_bz2 test_capi test_curses
    test_descr test_dis test_frame test_gc test_lzma test_monitoring
    test_opcache test_tracemalloc test_types test_zlib

429 tests OK.

This was true also with the old trace-based coverage gathering but it was so slow that probably nobody bothered to run coverage over the entire suite.

I will look into fixing some of those incompatibilities because most of them are about reference counts, like:

test test_lzma failed -- Traceback (most recent call last):
  File "/Volumes/RAMDisk/cpython/Lib/test/support/__init__.py", line 1093, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Volumes/RAMDisk/cpython/Lib/test/test_lzma.py", line 381, in test_refleaks_in_decompressor___init__
    self.assertAlmostEqual(gettotalrefcount() - refs_before, 0, delta=10)
AssertionError: 12 != 0 within 10 delta (12 difference)

📚 Documentation preview 📚: https://cpython-previews--111710.org.readthedocs.build/

Issue: Make regrtest collect test line coverage using sys.monitoring #110722

Lib/test/libregrtest/results.py

ambv · 2023-11-06T21:25:16Z

My hunch that nobody used to run a full suite with coverage seems to be corroborated by it not working, apparently. I was unable to get a full run of python -m test -T to complete on any version of Python between 3.8 and 3.13 on either Linux aarch64 or macOS M1.

So instead, I focused on bringing down the number of tests failing under coverage. With the latest change, tests already marked with @no_tracing are now temporarily disabling sys.monitoring events as well. This leaves 10 failing tests, all of which also failed with coverage before this PR. I will be fixing those in separate PRs because that's not related to the change in this PR.

10 tests failed:
    test_capi test_curses test_descr test_dis test_frame test_gc
    test_monitoring test_opcache test_tracemalloc test_types

In any case, unless anybody's got any strong opinions on this, this is ready to land.

vstinner · 2023-11-10T11:05:17Z

I would prefer to also use test.cov when tests are run sequentially. Here is a patch to support it as well. It uses the new libregrtest code to "add options to Python":

diff --git a/Lib/test/libregrtest/cmdline.py b/Lib/test/libregrtest/cmdline.py
index a5f02d6335..cf3943dd5d 100644
--- a/Lib/test/libregrtest/cmdline.py
+++ b/Lib/test/libregrtest/cmdline.py
@@ -426,8 +426,6 @@ def _parse_args(args, **kwargs):
             ns.rerun = True
         ns.print_slow = True
         ns.verbose3 = True
-    else:
-        ns._add_python_opts = False
 
     # When both --slow-ci and --fast-ci options are present,
     # --slow-ci has the priority
@@ -448,16 +446,8 @@ def _parse_args(args, **kwargs):
 
     if ns.single and ns.fromfile:
         parser.error("-s and -f don't go together!")
-    if ns.trace:
-        if ns.use_mp is not None:
-            if not Py_DEBUG:
-                parser.error("need --with-pydebug to use -T and -j together")
-        else:
-            print(
-                "Warning: collecting coverage without -j is imprecise. Configure"
-                " --with-pydebug and run -m test -T -j for best results.",
-                file=sys.stderr
-            )
+    if ns.trace and not Py_DEBUG:
+        parser.error("need --with-pydebug to use -T")
     if ns.python is not None:
         if ns.use_mp is None:
             parser.error("-p requires -j!")
diff --git a/Lib/test/libregrtest/main.py b/Lib/test/libregrtest/main.py
index 296e1b4631..2a16969814 100644
--- a/Lib/test/libregrtest/main.py
+++ b/Lib/test/libregrtest/main.py
@@ -605,6 +605,8 @@ def _add_python_opts(self):
         environ, keep_environ = self._add_cross_compile_opts(regrtest_opts)
         if self.ci_mode:
             self._add_ci_python_opts(python_opts, keep_environ)
+        if self.coverage:
+            python_opts.extend(('-X', 'test.cov'))
 
         if (not python_opts) and (not regrtest_opts) and (environ is None):
             # Nothing changed: nothing to do

vstinner

Maybe test.coverage name would be more explicit than test.cov?

I would prefer to use the same coverage implementation in all cases.

Would it be possible to make presite feature available to release builds?

Doc/library/trace.rst

Lib/test/cov.py

Lib/test/libregrtest/main.py

Lib/test/libregrtest/result.py

Lib/test/cov.py

vstinner · 2023-11-10T11:28:49Z

This leaves 10 failing tests, all of which also failed with coverage before this PR. I will be fixing those in separate PRs because that's not related to the change in this PR.

I suppose that a function can be added to test.support to check if code coverage is enabled, so we can skip lines of these tests which are impacted by coverage. Extract of test_descr:

            new_objects = len(gc.get_objects())
            self.assertEqual(orig_objects, new_objects)

Well, if a tracing tool runs in the background, such test can fail. So yeah, we should skip these lines, or just the whole test method.

Doc/library/trace.rst

Lib/test/cov.py

ambv · 2023-11-10T11:45:07Z

Thanks for the review!

I'd prefer to change sequential -T runs to sys.monitoring in a future update because it's not as simple as your quick-and-dirty patch. By the time you add opts in _add_python_opts, libregrtest already imported a ton of libraries, the data is already ruined. Running -T sequentially with sys.monitoring will require -Xpresite= to be passed manually by the user, which I would prefer people don't have to do. So we would have to change sequential tests to execve or something with -Xpresite=.

Then there's the complication that -Xpresite is only available for Py_DEBUG builds because there were concerns this new hook could be misused by users. We can revisit adding it to production builds in the future but I don't want to block this PR on that discussion.

vstinner · 2023-11-10T13:04:57Z

By the time you add opts in _add_python_opts, libregrtest already imported a ton of libraries, the data is already ruined.

This function replaces the current process with a new process which has the expected Python options: see _execute_python() method. On Unix, it uses the magic os.execv() function.

vstinner · 2023-11-10T13:06:29Z

Then there's the complication that -Xpresite is only available for Py_DEBUG builds because there were concerns this new hook could be misused by users. We can revisit adding it to production builds in the future but I don't want to block this PR on that discussion.

It makes sense, ok. I don't want to block the PR, I just mean that "it would be nice" to use test.cov in all cases.

Doc/library/trace.rst

ambv · 2023-11-10T17:19:05Z

Thanks for reviews, Victor and Serhiy! ✨ 🍰 ✨

…1710) Now all results from worker processes are aggregated and displayed together as a summary at the end of a regrtest run. The traditional trace is left in place for use with sequential in-process test runs but now raises a warning that those numbers are not precise. `-T -j` requires `--with-pydebug` as it relies on `-Xpresite=`.

Make -m test -T -j use sys.monitoring

9611b56

bedevere-app bot added the awaiting core review label Nov 3, 2023

ambv changed the title ~~Make -m test -T -j use sys.monitoring~~ gh-110722: Make -m test -T -j use sys.monitoring Nov 3, 2023

bedevere-app bot mentioned this pull request Nov 3, 2023

Make regrtest collect test line coverage using sys.monitoring #110722

Open

Only run -T -j test when Python is built --with-pydebug

bf65fbe

gaogaotiantian reviewed Nov 3, 2023

View reviewed changes

Lib/test/libregrtest/results.py Outdated Show resolved Hide resolved

ambv added 3 commits November 6, 2023 14:35

Simplify get_coverage_results

1adbf57

Turn off sys.monitoring hooks for tests marked with @no_tracing

d4f5b6b

Merge branch 'main' into pythongh-110722-regrtest

84263a3

Make patchcheck happy

3a0688b

ambv requested review from vstinner and serhiy-storchaka November 10, 2023 10:43

vstinner reviewed Nov 10, 2023

View reviewed changes

serhiy-storchaka reviewed Nov 10, 2023

View reviewed changes

Doc/library/trace.rst Outdated Show resolved Hide resolved

Lib/test/cov.py Outdated Show resolved Hide resolved

Add improvements after review by Victor and Serhiy

aa31295

serhiy-storchaka reviewed Nov 10, 2023

View reviewed changes

Doc/library/trace.rst Show resolved Hide resolved

serhiy-storchaka approved these changes Nov 10, 2023

View reviewed changes

bedevere-app bot added awaiting merge and removed awaiting core review labels Nov 10, 2023

ambv merged commit 3932b0f into python:main Nov 10, 2023

bedevere-app bot removed the awaiting merge label Nov 10, 2023

ambv deleted the gh-110722-regrtest branch November 10, 2023 17:17

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

gh-110722: Make `-m test -T -j` use sys.monitoring #111710

gh-110722: Make `-m test -T -j` use sys.monitoring #111710

ambv commented Nov 3, 2023 •

edited by bedevere-app bot

Loading

ambv commented Nov 6, 2023

vstinner commented Nov 10, 2023

vstinner left a comment

vstinner commented Nov 10, 2023

ambv commented Nov 10, 2023

vstinner commented Nov 10, 2023

vstinner commented Nov 10, 2023

ambv commented Nov 10, 2023

gh-110722: Make -m test -T -j use sys.monitoring #111710

gh-110722: Make -m test -T -j use sys.monitoring #111710

Conversation

ambv commented Nov 3, 2023 • edited by bedevere-app bot Loading

Without coverage

With old trace.py-based coverage

With new -T -j1 coverage based on sys.monitoring

Final caveat

ambv commented Nov 6, 2023

vstinner commented Nov 10, 2023

vstinner left a comment

Choose a reason for hiding this comment

vstinner commented Nov 10, 2023

ambv commented Nov 10, 2023

vstinner commented Nov 10, 2023

vstinner commented Nov 10, 2023

ambv commented Nov 10, 2023

gh-110722: Make `-m test -T -j` use sys.monitoring #111710

gh-110722: Make `-m test -T -j` use sys.monitoring #111710

ambv commented Nov 3, 2023 •

edited by bedevere-app bot

Loading

With new `-T -j1` coverage based on `sys.monitoring`