0 The Plan

Eric Snow edited this page Aug 8, 2023 · 95 revisions

Pull in info from:

High-level Plan

We're aiming for 3.12 (feature freeze currently May 2023).

* per-interpreter GIL
   1. prerequisites (in no particular order)
      a. deal with global variables
         1. tooling (identify globals; CI check)
         2. consolidate globals (move (almost) all non-const globals to `_PyRuntimeState`)
         3. interpreter isolation (move most of those globals down to `PyInterpreterState`)
      b. solve main blockers (see "Main Blockers" below)
      c. add new C-API function/config for creating subinterpreters
   2. 💥move the GIL to PyInterpreterState💥
* expose multiple interpreters via stdlib
   1. minimal stdlib module (or via concurrent.futures)
   2. mechanism for basic communication (and maybe simple sharing)
* multi-interpreter support in extension modules

The above is the essential work. There is plenty of opportunity to optimize and expand once the foundation is complete.

Main Blockers

These are all discussed in detail in PEP 684.

  • objects exposed in public C-API - solved with immortal objects
  • per-interpreter vs. shared memory allocators - solved by ~using mimalloc (a thread-safe, performant allocator)~ tweaking the constraints and moving the object allocator to the interpreter
  • fixing gilstate API - ???
  • impact on extension maintainers - solved with docs and by working directly with numpy, cython, etc.

Current Status Report (2023-02-06)

...

Recently completed (and milestones reached):

  • finished consolidating most remaining globals to _PyRuntimeState (except "immortal" objects and globals in stdlib extensions)
  • PEP 683 (immortal objects) status update (for final Steering Council approval)
  • PEP 554 updates
  • finished analysis of runtime state for per-interpreter GIL (gh-100227)
  • finished analysis of GILState API
  • submitted PEP 684 to Steering Council

In process:

  • (Eddie) PEP 683 - final discussions with Steering Council (and then merge)
  • (Eric) tooling - add c-analyzer CI check (globals)
  • (Eric) PEP 684 - waiting on Steering Council
  • (Daan) mimalloc - hooks for valgrind (mostly done), clang ASAN, etc.
  • (Dong-hee, Kumar, others) isolating stdlib extension modules
  • (Eric) explore concurrent.futures alternative to PEP 554

Blocked:

  • (by PEP 684) move allocators from _PyRuntimeState to PyInterpreterState (has branch, mostly ready)
  • (by PEP 684) isolate other global state that's sensitive to a per-interpreter GIL
  • (by PEP 684) extension module compatibility check (has PR, mostly ready)
  • (by PEP 684) move the GIL to PyInterpreterState (has PR)

Upcoming:

  • PEP 683 - docs updates
  • PEP 683 - make more objects immortal

Other:

  • stdlib extension modules can be fixed now, per PEP 687
  • this includes multi-phase init, and heap types

-- remaining global variables --

These will be either moved to _PyRuntimeState, moved to PyInterpreterState, or ignored (marked as safe). Most that move to _PyRuntimeState will eventually be moved to PyInterpreterState.

(Note: between immortal objects and isolating stdlib extension modules, there aren't any global variables left to consolidate.)

$ python3 Tools/c-analyzer/table-file.py Tools/c-analyzer/cpython/globals-to-fix.tsv
 --------------------------------------------------
  (262)       global objects to fix in core code
          92      exported builtin types (C-API)
          16      other exported builtin types
           4      private static builtin types
           8      static builtin structseq
         134      builtin exception types
           8      singletons
 --------------------------------------------------
   (46)       global objects to fix in builtin modules
          46      static types
 --------------------------------------------------
  (138)       global objects to fix in extension modules
          72      static types
   (15)           non-static types - initialized once
           6          heap types
           9          exception types
   (16)           cached - initialized once
           3          manually cached PyUnicodeOjbect
          13          other - during module init
   (35)           other
          21          initialized once
          14          state
 --------------------------------------------------
   (73)       global non-objects to fix in extension modules
   (45)           initialized once
           2          pre-allocated buffer
          43          other
          28      state
 --------------------------------------------------
(total: 519)

Analysis:

  • the total, 519, includes variables in core code, builtin modules, and stdlib extension modules
  • there would be 308 globals left to resolve, once all the extension modules are multi-interpreter-compatible (PEP 687)
  • of those 308, 300 are (mostly static) types (including all the builtin exceptions) and 8 are singletons, all of which will become immortal
  • that leaves 0 globals to consolidate
  • (state in _PyRuntimeState still needs to be isolated, i.e. moved or guarded by locks)
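
The subtotals in the table can be cross-checked with a few lines of arithmetic. This is a minimal sketch: the counts are copied by hand from the output above, and the dictionary layout is only an illustration (it is not the format `table-file.py` uses internally).

```python
# Counts copied by hand from the table-file.py output above.
# "parts" are the first-level sub-rows of each group.
groups = {
    "global objects to fix in core code": {
        "total": 262,
        "parts": [92, 16, 4, 8, 134, 8],
    },
    "global objects to fix in builtin modules": {
        "total": 46,
        "parts": [46],
    },
    "global objects to fix in extension modules": {
        "total": 138,
        "parts": [72, 15, 16, 35],
    },
    "global non-objects to fix in extension modules": {
        "total": 73,
        "parts": [45, 28],
    },
}

# Each group's sub-rows should add up to the group's own count.
for name, group in groups.items():
    assert sum(group["parts"]) == group["total"], name

# The group counts should add up to the reported total.
grand_total = sum(group["total"] for group in groups.values())
print(grand_total)  # 519
```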

Major Projects

* per-interpreter GIL (Eric, et al)
* prerequisites for per-interpreter GIL
   * immortal objects (Eddie Elizondo) - solves "objects exposed by C-API"
   * ~incorporate mimalloc (Christian Heimes) - solves "per-interpreter vs. shared memory allocators"~
* expose multiple interpreters via stdlib (Eric, et al)
* multi-interpreter support in extension modules (Petr, et al)
* runtime state improvements, incl. init/fini (Eric, Victor, Nick, et al)

Per-interpreter GIL

(See PEP 684.)

high-level:

before PEP 684 is accepted:

  • port remaining stdlib extensions
  • isolate appropriate runtime state to PyInterpreterState (issue)
  • fix existing bugs (incl. gilstate API)
  • CI check
  • (finish immortal objects impl.)
  • PEP 684

after PEP 684 is accepted:

(possibly do all this in a demo branch earlier)

  • add locks around some runtime state (issue)
  • isolate some runtime state to PyInterpreterState (issue)
  • add new check to ExtensionFileLoader (including PyInterpreterConfig field, etc.)
  • add new public API
  • add PyInterpreterConfig.own_gil
  • resolve GIL-related constraints on allocators API
  • move GIL to PyInterpreterState
  • docs

detailed TODO list:

  • runtime improvements
    • clean up init (PEPs 432, 587)
    • add _PyRuntimeState
    • ... (lots of stuff already done)
    • impl - move the global runtime config to _PyRuntimeState gh-91120 PR
    • ...
  • tooling
    • ... (lots of stuff already done)
    • add Tools/c-analyzer
    • deal with the excluded files (in Tools/c-analyzer/cpython/_parser.py)
    • deal with globals that should have been found but weren't (in Tools/c-analyzer/cpython/globals-to-fix.tsv)
    • solve bugs with globals in Tools/c-analyzer/cpython/ignored.tsv that should have been auto-ignored
    • ensure all globals are getting found
    • improve error message for failed check
    • improve CLI help
    • enable CI check
    • bonus:
      • ? explicitly mark actually const globals as const that currently aren't marked (listed in ignored.tsv)
      • ? support non-GCC, non-linux
      • ? finish "c-analyzer.py data show" (from Tools/c-analyzer/table-file.py)
      • ? fix "c-analyzer.py data check" & add to CI
  • consolidate globals (see Tools/c-analyzer/cpython/globals-to-fix.tsv)
    • ... (lots of stuff already done)
    • finish consolidating globals to _PyRuntimeState (ignoring "immortal" objects, stdlib extensions)
  • interpreter isolation
    • ... (lots of stuff already done)
    • fix _io module isolation [PR]
    • port modules
      • (grep -l PyMODINIT_FUNC {Python,Modules}/**/*.c | xargs grep -L PyModuleDef_Init)
      • builtin modules (see Modules/config.c)
        • _abc, _ast, _codecs, _collections, _functools, _imp
        • _io [issue] [PR]
        • _locale, _operator, _signal, _sre, _stat, _string, _symtable, _thread, _tokenize
        • _tracemalloc
        • _warnings, _weakref, atexit
        • builtins
        • errno, faulthandler, gc, itertools, marshal, posix, pwd
        • sys
        • time
      • stdlib extension modules (see PEP 687 and gh-84258)
        • _asyncio
        • _bisect, _blake2, _bz2, _contextvars, _crypt, _csv
        • _ctypes
        • _curses [issue]
        • _curses_panel
        • _datetime
        • _dbm
        • _decimal
        • _gdbm, _hashlib, _heapq, _json, _lsprof, _lzma
        • _msi
        • _multibytecodec, _multiprocessing, _opcode
        • _pickle
        • _posixshmem, _posixsubprocess, _queue, _random, _scproxy, _sqlite
        • _socket
        • _ssl, _statistics, _struct
        • _tkinter
        • _typing, _uuid, _winapi, _zoneinfo
        • array, audioop, binascii, cmath, fcntl, grp, math, md5, mmap
        • msvcrt
        • nis
        • ossaudiodev
        • pyexpat
        • readline
        • resource, select, sha1, sha256, sha3, sha512
        • spwd, syslog, termios, unicodedata
        • winreg
        • winsound
        • zlib
      • test modules (maybe)
        • _ctypes_test
        • _testbuffer
        • _testcapi
        • _testimportmultiple
        • _testinternalcapi
        • _xxsubinterpreters
        • _xxtestfuzz
        • xxlimited, xxlimited_35, xxmodule, xxsubtype
    • isolate modules (ported but still have globals)
      • builtin modules
        • _collections
      • stdlib extension modules
        • _asyncio
        • _curses_panel
        • _elementtree
        • _lsprof (rotatingtree.c)
        • _multibytecodec
        • _ssl
        • _zoneinfo
        • array
        • faulthandler (issue)
        • nis
        • pyexpat
        • syslog
        • tracemalloc (issue)
      • test modules
        • _testmultiphase
        • xxlimited_35
        • xxmodule
        • xxsubtype
    • deal with _Py_IDENTIFIER() in stdlib extension modules
    • move the allocators (https://github.com/ericsnowcurrently/cpython/tree/per-interpreter-alloc)
    • add per-interpreter state for static builtin types
    • make tp_subclasses per-interpreter for static builtin types
    • make tp_weaklist per-interpreter for static builtin types
    • (not trivial) deal with _PyArgs_Parser
    • immortalize the singleton objects
    • immortalize the core/builtin static types
    • immortalize interned strings (and global objects in _PyRuntime.cached_objects) [PR]
    • isolate global state in _PyRuntimeState (some may make sense regardless of per-interpreter GIL)
      • analyze (gh-100227; fill in the following sub-checklists)
      • keep global (safely): _PyRuntimeState.cached_objects.interned [PR]
      • move state to PyInterpreterState
        • _PyRuntimeState.faulthandler issue
        • _PyRuntimeState.tracemalloc issue
        • _PyRuntimeState.allocators.obj_arena (effectively coupled to _PyRuntime.obmalloc)
        • _PyRuntimeState.obmalloc [issue] [PR]
        • _PyRuntimeState.signals.default_handler (move to module state)
        • _PyRuntimeState.signals.ignore_handler (move to module state)
        • _PyRuntimeState.imports.lock
        • _PyRuntimeState.imports.find_and_load
        • _PyRuntimeState.dtoa
        • _PyRuntimeState.dict_state.global_version
        • _PyRuntimeState.dict_state.next_keys_version
        • _PyRuntimeState.func_state.next_version
        • _PyRuntimeState.types.next_version_tag (special-case for global/immortal types) [issue] [PR]
        • _PyRuntimeState.cached_objects.str_replace_inf
        • (maybe) _PyRuntimeState.cached_objects.interned [PR]
        • Objects/object.c - _Py_RefTotal (sort of duplicated on runtime state)
      • (maybe) copy state to PyInterpreterState (where coexisting global/interpreter settings make sense)
        • ...
    • ...
    • bugs:
  • PEP 684
    • write and publish
    • initial python-dev discussion
    • intermediate edits
    • settle on a solution for the allocators
    • decide about distinct per-interpreter GIL compatibility in extensions
    • find a solution for the gilstate API (gh-59956)
    • address any other non-trivial global state concerns (see gh-100227)
    • final round of python-dev discussion
    • final edits
    • submit to steering council (issue)
    • pre-implementation
      • add _PyInterpreterConfig
      • add _Py_NewInterpreterFromConfig()
      • update optional restrictions
      • ...
    • implementation - gh-99113 (blocked until PEP 684 is accepted)
      • change private C-API to public
      • (not trivial?) address memory allocator API guarantees about the GIL
      • add granular locks to protect some of _PyRuntimeState (were protected by the GIL)
        • _PyRuntimeState.exitfuncs (race in Py_AtExit())
        • _PyRuntimeState.nexitfuncs (race in Py_AtExit())
        • _PyRuntimeState.allocators (PyMem_SetAllocator() with "wrappers"; also when using mem/obj allocators)
        • _PyRuntimeState.audit_hook_head (race in PySys_AddAuditHook())
        • _PyRuntimeState.ceval.perf
        • Objects/longobject.c - (long_from_non_binary_base) log_base_BASE (lazy)
        • Objects/longobject.c - (long_from_non_binary_base) convwidth_base (lazy)
        • Objects/longobject.c - (long_from_non_binary_base) convmultmax_base (lazy)
        • (maybe) Objects/object.c - _Py_RefTotal (also added to interpreter state)
        • global objects (in _PyRuntimeState.cached_objects: interned dict, extensions cache) [PR]
      • add a granular lock for importing extension modules (will protect _PyRuntimeState.imports.extensions and _PyRuntimeState.imports.last_module_index) [branch]
      • move state from _PyRuntimeState to PyInterpreterState
        • (maybe) allocators
        • _PyRuntimeState.tstate_current (thread-local variable)
        • ...
        • 💥the GIL💥 PR
      • extension module restrictions gh-98627 PR
      • make PyInterpreterConfig and Py_NewInterpreterFromConfig() public API
      • add PyInterpreterConfig.own_gil
      • docs additions
      • ...
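
The "port modules" item above finds unported modules with a `grep` pipeline: files that contain `PyMODINIT_FUNC` but never mention `PyModuleDef_Init`. The same check can be sketched in Python for platforms without `grep`/`xargs`. The function name `find_single_phase_modules` is hypothetical, and the sketch assumes the search is meant to be fully recursive under `Python/` and `Modules/`.

```python
from pathlib import Path


def find_single_phase_modules(root, subdirs=("Python", "Modules")):
    """Return .c files that define a module init function (PyMODINIT_FUNC)
    but never mention PyModuleDef_Init, i.e. candidates for porting
    to multi-phase init."""
    candidates = []
    for subdir in subdirs:
        base = Path(root) / subdir
        if not base.is_dir():
            continue  # tolerate running outside a CPython checkout
        for path in sorted(base.rglob("*.c")):
            text = path.read_text(encoding="utf-8", errors="replace")
            if "PyMODINIT_FUNC" in text and "PyModuleDef_Init" not in text:
                candidates.append(path)
    return candidates


if __name__ == "__main__":
    # Run from the root of a CPython checkout.
    for path in find_single_phase_modules("."):
        print(path)
```

Like the original pipeline, this is a plain substring match, so a file that only mentions `PyModuleDef_Init` in a comment would be skipped.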

Remaining Work for beta1 with Time Estimates

Estimates are in units of 1 day. Time waiting for reviews is not included. A trailing * indicates that additional work has already been done (e.g. there's a branch ready).

The current aggregate estimate is about ~64~ 62 days (~35~ 33 days if you leave out the extension modules).

(Estimates on porting are strictly guesses where there isn't a *.)

isolate extension modules (29):

(see [gh-103092](https://github.com/python/cpython/issues/103092))

* ✅1* port _io (builtin)
* 3* port _tracemalloc (builtin)
* 1 port _ctypes
* 1 port _curses
* 1* port _datetime
* 1 port _decimal
* 1 port _msi
* ✅1* port _pickle
* ✅1 port _socket
* 1 ~port _tkinter~
* ✅1 port msvcrt
* 1 ~port ossaudiodev~
* 1 ~port readline~
* ✅1 port winreg
* ✅1 port winsound
* ✅1 isolate _asyncio
* ✅1* isolate _collections (builtin)
* 1 isolate _curses_panel
* ✅1 isolate _elementtree
* 1 ~isolate _lsprof~
* ✅1 isolate _multibytecodec
* ✅1* isolate _ssl
* ✅1 isolate array
* 2* isolate faulthandler (builtin)
* 1 ~isolate nis~
* ✅1 isolate pyexpat

other interpreter isolation (9):

* ✅1 immortalize the singleton objects
* ✅1 immortalize the core/builtin static types
* ✅1* immortalize global objects in `_PyRuntime.cached_objects` and interned strings
* ✅2* move `_PyRuntimeState.obmalloc` to `PyInterpreterState`
* ✅2* move `_PyRuntimeState.types.next_version_tag` to `PyInterpreterState`
* ✅1* move `_PyRuntimeState.cached_objects.interned` to `PyInterpreterState` (or make atomic)
* ✅1* move `_Py_RefTotal` to `PyInterpreterState`

per-interpreter GIL (22-23):

* 3 address memory allocator API guarantees about the GIL (e.g. wrap custom with locks?)
* 2 add a lock for modifying `_PyRuntimeState.allocators`
* 1 add a lock for `_PyRuntimeState.exitfuncs` and `_PyRuntimeState.nexitfuncs`
* 1 add a lock for `_PyRuntimeState.audit_hook_head`
* 1 add a lock for `_PyRuntimeState.ceval.perf`
* 1 add a lock for lazy PyLongObject global state (`log_base_BASE`, `convwidth_base`, `convmultmax_base`)
* 1 (maybe) add a lock for the global part of `_Py_RefTotal`
* 1* ~add a lock for global objects in `_PyRuntime.cached_objects` (interned dict, extensions cache)~
* ✅3* add a lock for importing extension modules
* ✅5 move `_PyRuntimeState.tstate_current` to a thread-local variable
* 2 add `PyInterpreterConfig.own_gil`
* 2* move the GIL to `PyInterpreterState`
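
Several of the items above add a lock around state that the GIL used to protect implicitly. The pattern can be illustrated in Python as an analogy only: the real work is in C, and the class `ExitFuncRegistry` below is a hypothetical stand-in, not CPython code. It models a fixed-capacity table of exit callbacks in which the capacity check and the append must happen atomically.

```python
import threading


class ExitFuncRegistry:
    """Toy model of a fixed-size table of exit callbacks shared by
    every interpreter in the process (illustrative only)."""

    def __init__(self, capacity=32):
        self._lock = threading.Lock()
        self._funcs = []
        self._capacity = capacity

    def register(self, func):
        # Without the lock, two threads could both pass the capacity check
        # and then both append, overflowing the table. That is the kind of
        # check-then-act race a granular lock is meant to prevent.
        with self._lock:
            if len(self._funcs) >= self._capacity:
                return False
            self._funcs.append(func)
            return True

    def count(self):
        with self._lock:
            return len(self._funcs)


registry = ExitFuncRegistry(capacity=32)


def worker():
    for _ in range(10):
        registry.register(lambda: None)


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# 80 registrations were attempted, but the table never exceeds its capacity.
print(registry.count())
```

The lazily initialized values listed above (such as `log_base_BASE`) follow the same idea: take the lock, check whether the value is already computed, and compute it only if it is not.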

other PEP 684 (3):

* ✅1 change private C-API to public
* 2 (maybe) add extra extension module restrictions

Remaining Work for 3.12

Per PEP 693, the first release candidate is scheduled for July 31st. Pretty much all of the following (minus docs) should be done by then.

maybe finish isolating extension modules (see gh-103092):

  • add a test (issue)
  • port _tracemalloc (builtin) (issue)
  • isolate faulthandler (builtin) (issue)
  • port _ctypes (PR, PR)
  • port _datetime (PR)
  • port _decimal (PR)

related:

Immortal Objects

(This is mostly Eddie Elizondo's project. See gh-84436.)

  • PEP 683
  • initial implementation (Eddie)
  • initial python-dev discussion
  • saturated refcounts
  • final-ish edits (Eddie)
  • start final round of python-dev discussion
  • final impl adjustments for <2% (Eddie)
  • PEP: submit to steering council (update: accepted!)
  • implementation
    • finish & merge (PR)
    • perform and publish final performance impact analysis
    • update C-API docs with clarifications about ref counts (in 3.11 too?)
    • add immortal check/fix to tp_dealloc for relevant types (e.g. singletons, PyTypeObject, int, str)
  • make more objects immortal
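
The core idea behind immortal objects can be shown with a toy reference-counting model. This is a sketch under stated assumptions, not the PEP 683 implementation: `ToyObject`, `incref`, `decref`, and the sentinel value `IMMORTAL_REFCNT` are all hypothetical names chosen for illustration. An object whose refcount equals the sentinel is never modified and never deallocated, which is what makes it safe to share between interpreters that no longer share a GIL.

```python
# Illustrative sentinel only; not necessarily the value CPython uses.
IMMORTAL_REFCNT = 2**32 - 1


class ToyObject:
    def __init__(self, name, immortal=False):
        self.name = name
        self.refcnt = IMMORTAL_REFCNT if immortal else 1
        self.deallocated = False


def is_immortal(obj):
    return obj.refcnt == IMMORTAL_REFCNT


def incref(obj):
    if is_immortal(obj):
        return  # the refcount of an immortal object is never written
    obj.refcnt += 1


def decref(obj):
    if is_immortal(obj):
        return  # so it can never reach zero and be freed
    obj.refcnt -= 1
    if obj.refcnt == 0:
        obj.deallocated = True


none_like = ToyObject("None", immortal=True)
temp = ToyObject("temp")

# Unbalanced decrefs cannot free an immortal object.
for _ in range(1000):
    decref(none_like)

# An ordinary object is freed when its last reference goes away.
decref(temp)

print(none_like.deallocated, temp.deallocated)  # False True
```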

mimalloc

(This is mostly Christian Heimes' project.)

(Note: this is no longer needed for the multi-core Python project, but we'll keep it here for continuity.)

  • ? PEP
  • working implementation
  • connect with Daan (mimalloc creator)
  • fix: fails to build with PGO
  • fix: leaking large blocks (Daan)
  • fix: integration with memory profilers, especially valgrind, asan (Daan)

Expose Multiple Interpreters via Stdlib

(See PEP 554.)

  • PEP 554
    • write and publish
    • initial python-dev discussion
    • simplify (incl. moving "Deferred" section to some other doc)
    • further python-dev discussion
    • submit to steering council
  • alternative: concurrent.futures
    • explore
    • maybe write a PEP
  • extension module (low-level)
    • implement interpreter management
    • implement channels
    • merge as test.support._xxsubinterpreters
    • ...
    • rename to _interpreters
  • Python module
    • ...

Multi-interpreter Support in Extension Modules

  • add per-interpreter module state (PEPs 384 & 3121)
  • add per-interpreter init/fini (PEP 489)
  • fix module state deficiencies (PEPs ...)
  • document how to implement support (PEP 630)
  • add the PEP 630 info to the C-API docs
  • determine how to help large extensions (reach out to numpy, cython)
  • deal with _Py_IDENTIFIER() in external (community) extension modules
  • ...