0 The Plan
Pull in info from:
- https://github.com/ericsnowcurrently/multi-core-python/issues/78
- https://github.com/ericsnowcurrently/multi-core-python/projects/4
- https://github.com/ericsnowcurrently/multi-core-python/projects/2
- https://github.com/ericsnowcurrently/multi-core-python/projects/1
- https://github.com/ericsnowcurrently/multi-core-python/projects/6
- https://github.com/ericsnowcurrently/multi-core-python/projects/5
- https://github.com/orgs/python/projects/3/views/1
We're aiming for 3.12 (feature freeze currently May 2023).
* per-interpreter GIL
   1. prerequisites (in no particular order)
      a. deal with global variables
         1. tooling (identify globals; CI check)
         2. consolidate globals (move (almost) all non-const globals to `_PyRuntimeState`)
         3. interpreter isolation (move most of those globals down to `PyInterpreterState`)
      b. solve main blockers (see "Main Blockers" below)
      c. add new C-API function/config for creating subinterpreters
   2. 💥move the GIL to PyInterpreterState💥
* expose multiple interpreters via stdlib
   1. minimal stdlib module (or via concurrent.futures)
   2. mechanism for basic communication (and maybe simple sharing)
* multi-interpreter support in extension modules
The above is the essential work. There is plenty of opportunity to optimize and expand once the foundation is complete.
Main Blockers:
These are all discussed in detail in PEP 684.
- objects exposed in public C-API - solved with immortal objects
- per-interpreter vs. shared memory allocators - solved by ~using mimalloc (a thread-safe, performant allocator)~ tweaking the constraints and moving the object allocator to the interpreter
- fixing gilstate API - ???
- impact on extension maintainers - solved with docs and by working directly with numpy, cython, etc.
Recently completed (and milestones reached):
- finished consolidating most remaining globals to `_PyRuntimeState` (except "immortal" objects and globals in stdlib extensions)
- PEP 683 (immortal objects) status update (for final Steering Council approval)
- PEP 554 updates
- finished analysis of runtime state for per-interpreter GIL (gh-100227)
- finished analysis of GILState API
- submitted PEP 684 to Steering Council
In process:
- (Eddie) PEP 683 - final discussions with Steering Council (and then merge)
- (Eric) tooling - add c-analyzer CI check (globals)
- (Eric) PEP 684 - waiting on Steering Council
- (Daan) mimalloc - hooks for valgrind (mostly done), clang ASAN, etc.
- (Dong-hee, Kumar, others) isolating stdlib extension modules
- (Eric) explore concurrent.futures alternative to PEP 554
Blocked:
- (by PEP 684) move allocators from `_PyRuntimeState` to `PyInterpreterState` (has branch, mostly ready)
- (by PEP 684) isolate other global state that's sensitive to a per-interpreter GIL
- (by PEP 684) extension module compatibility check (has PR, mostly ready)
- (by PEP 684) move the GIL to `PyInterpreterState` (has PR)
Upcoming:
- PEP 683 - docs updates
- PEP 683 - make more objects immortal
Other:
- stdlib extension modules can be fixed now, per PEP 687
- this includes multi-phase init, and heap types
-- remaining global variables --
These will be either moved to `_PyRuntimeState`, moved to `PyInterpreterState`, or ignored (marked as safe). Most that move to `_PyRuntimeState` will eventually be moved to `PyInterpreterState`.
(Note: between immortal objects and isolating stdlib extension modules, there aren't any global variables left to consolidate.)
$ python3 Tools/c-analyzer/table-file.py Tools/c-analyzer/cpython/globals-to-fix.tsv
--------------------------------------------------
(262) global objects to fix in core code
    92 exported builtin types (C-API)
    16 other exported builtin types
    4 private static builtin types
    8 static builtin structseq
    134 builtin exception types
    8 singletons
--------------------------------------------------
(46) global objects to fix in builtin modules
    46 static types
--------------------------------------------------
(138) global objects to fix in extension modules
    72 static types
    (15) non-static types - initialized once
        6 heap types
        9 exception types
    (16) cached - initialized once
        3 manually cached PyUnicodeObject
        13 other - during module init
    (35) other
        21 initialized once
        14 state
--------------------------------------------------
(73) global non-objects to fix in extension modules
    (45) initialized once
        2 pre-allocated buffer
        43 other
    28 state
--------------------------------------------------
(total: 519)
Analysis:
- the total, 519, includes variables in core code, builtin modules, and stdlib extension modules
- there would be 310 globals left to resolve, once all the extension modules are multi-interpreter-compatible (PEP 687)
- of those 310, 302 are (mostly static) types (including all the builtin exceptions) and 8 are singletons, all of which will become immortal
- that leaves 0 globals to consolidate
- (state in `_PyRuntimeState` still needs to be isolated, i.e. moved or guarded by locks)
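The subtotals in the table above can be cross-checked with a few lines of Python. This is only a sketch: the numbers are copied by hand from the listing shown here rather than read from `globals-to-fix.tsv`, and the group names and keys are illustrative labels, not identifiers used by the c-analyzer tooling.

```python
# Counts copied by hand from the table-file.py listing above.
# The group names and keys are illustrative labels only.
groups = {
    "core objects": {
        "exported builtin types (C-API)": 92,
        "other exported builtin types": 16,
        "private static builtin types": 4,
        "static builtin structseq": 8,
        "builtin exception types": 134,
        "singletons": 8,
    },
    "builtin module objects": {"static types": 46},
    "extension module objects": {
        "static types": 72,
        "non-static types - initialized once": 6 + 9,
        "cached - initialized once": 3 + 13,
        "other": 21 + 14,
    },
    "extension module non-objects": {
        "initialized once": 2 + 43,
        "state": 28,
    },
}

subtotals = {name: sum(counts.values()) for name, counts in groups.items()}
total = sum(subtotals.values())

# What is left once every extension module is isolated (PEP 687):
# only the core and builtin-module groups.
remaining = subtotals["core objects"] + subtotals["builtin module objects"]
singletons = groups["core objects"]["singletons"]
remaining_types = remaining - singletons

for name, count in subtotals.items():
    print(f"{count:4d}  {name}")
print(f"{total:4d}  total")
print(f"{remaining:4d}  remaining ({remaining_types} types + {singletons} singletons)")
```

With the counts as listed, the four groups add up to the stated total of 519. The core and builtin-module groups add up to 308 (300 types plus 8 singletons), slightly below the 310 and 302 quoted in the analysis; presumably the prose and the table were captured at slightly different times.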
* per-interpreter GIL (Eric, et al)
* prerequisites for per-interpreter GIL
* immortal objects (Eddie Elizondo) - solves "objects exposed by C-API"
* ~incorporate mimalloc (Christian Heimes) - solves "per-interpreter vs. shared memory allocators"~
* expose multiple interpreters via stdlib (Eric, et al)
* multi-interpreter support in extension modules (Petr, et al)
* runtime state improvements, incl. init/fini (Eric, Victor, Nick, et al)
(See PEP 684.)
before PEP 684 is accepted:
- port remaining stdlib extensions
- isolate appropriate runtime state to `PyInterpreterState` (issue)
- fix existing bugs (incl. gilstate API)
- CI check
- (finish immortal objects impl.)
- PEP 684
after PEP 684 is accepted:
(possibly do all this in a demo branch earlier)
- add locks around some runtime state (issue)
- isolate some runtime state to `PyInterpreterState` (issue)
- add new check to `ExtensionFileLoader` (including `PyInterpreterConfig` field, etc.)
- add new public API
- add `PyInterpreterConfig.own_gil`
- resolve GIL-related constraints on allocators API
- move GIL to `PyInterpreterState`
- docs
- runtime improvements
- tooling
- ... (lots of stuff already done)
- add Tools/c-analyzer
- deal with the excluded files (in Tools/c-analyzer/cpython/_parser.py)
- deal with globals that should have been found but weren't (in Tools/c-analyzer/cpython/globals-to-fix.tsv)
- solve bugs with globals in Tools/c-analyzer/cpython/ignored.tsv that should have been auto-ignored
- ensure all globals are getting found
- improve error message for failed check
- improve CLI help
- enable CI check
- bonus:
- ? explicitly mark actually const globals as const that currently aren't marked (listed in ignored.tsv)
- ? support non-GCC, non-linux
- ? finish "c-analyzer.py data show" (from Tools/c-analyzer/table-file.py)
- ? fix "c-analyzer.py data check" & add to CI
- consolidate globals (see Tools/c-analyzer/cpython/globals-to-fix.tsv)
- ... (lots of stuff already done)
- finish consolidating globals to `_PyRuntimeState` (ignoring "immortal" objects, stdlib extensions)
- interpreter isolation
- ... (lots of stuff already done)
- fix `_io` module isolation [PR]
- port modules
- (`grep -l PyMODINIT_FUNC {Python,Modules}/**/*.c | xargs grep -L PyModuleDef_Init`)
- builtin modules (see Modules/config.c)
- stdlib extension modules (see PEP 687 and gh-84258)
- _asyncio
- _bisect, _blake2, _bz2, _contextvars, _crypt, _csv
- _ctypes
- _curses [issue]
- _curses_panel
- _datetime
- _dbm
- _decimal
- _gdbm, _hashlib, _heapq, _json, _lsprof, _lzma
- _msi
- _multibytecodec, _multiprocessing, _opcode
- _pickle
- _posixshmem, _posixsubprocess, _queue, _random, _scproxy, _sqlite
- _socket
- _ssl, _statistics, _struct
- _tkinter
- _typing, _uuid, _winapi, _zoneinfo
- array, audioop, binascii, cmath, fcntl, grp, math, md5, mmap
- msvcrt
- nis
- ossaudiodev
- pyexpat
- readline
- resource, select, sha1, sha256, sha3, sha512
- spwd, syslog, termios, unicodedata
- winreg
- winsound
- zlib
- test modules (maybe)
- _ctypes_test
- _testbuffer
- _testcapi
- _testimportmultiple
- _testinternalcapi
- _xxsubinterpreters
- _xxtestfuzz
- xxlimited, xxlimited_35, xxmodule, xxsubtype
- isolate modules (ported but still have globals)
- deal with `_Py_IDENTIFIER()` in stdlib extension modules
- move the allocators (https://github.com/ericsnowcurrently/cpython/tree/per-interpreter-alloc)
- add per-interpreter state for static builtin types
- make tp_subclasses per-interpreter for static builtin types
- make tp_weaklist per-interpreter for static builtin types
- (not trivial) deal with `_PyArgs_Parser`
- immortalize the singleton objects
- immortalize the core/builtin static types
- immortalize interned strings (and global objects in `_PyRuntime.cached_objects`) [PR]
- isolate global state in `_PyRuntimeState` (some may make sense regardless of per-interpreter GIL)
  - analyze (gh-100227; fill in the following sub-checklists)
  - keep global (safely): `_PyRuntimeState.cached_objects.interned` [PR]
  - move state to `PyInterpreterState`
    - `_PyRuntimeState.faulthandler` issue
    - `_PyRuntimeState.tracemalloc` issue
    - `_PyRuntimeState.allocators.obj_arena` (effectively coupled to `_PyRuntime.obmalloc`)
    - `_PyRuntimeState.obmalloc` [issue] [PR]
    - `_PyRuntimeState.signals.default_handler` (move to module state)
    - `_PyRuntimeState.signals.ignore_handler` (move to module state)
    - `_PyRuntimeState.imports.lock`
    - `_PyRuntimeState.imports.find_and_load`
    - `_PyRuntimeState.dtoa`
    - `_PyRuntimeState.dict_state.global_version`
    - `_PyRuntimeState.dict_state.next_keys_version`
    - `_PyRuntimeState.func_state.next_version`
    - `_PyRuntimeState.types.next_version_tag` (special-case for global/immortal types) [issue] [PR]
    - `_PyRuntimeState.cached_objects.str_replace_inf`
    - (maybe) `_PyRuntimeState.cached_objects.interned` [PR]
    - Objects/object.c - `_Py_RefTotal` (sort of duplicated on runtime state)
  - (maybe) copy state to `PyInterpreterState` (where coexisting global/interpreter settings make sense)
    - ...
  - ...
- bugs:
- gh-102251: fix refleak in test_imp
- PEP 684
- write and publish
- initial python-dev discussion
- intermediate edits
- settle on a solution for the allocators
- decide about distinct per-interpreter GIL compatibility in extensions
- find a solution for the gilstate API (gh-59956)
- address any other non-trivial global state concerns (see gh-100227)
- final round of python-dev discussion
- final edits
- submit to steering council (issue)
- pre-implementation
  - add `_PyInterpreterConfig`
  - add `_Py_NewInterpreterFromConfig()`
  - update optional restrictions
  - ...
- implementation - gh-99113 (blocked until PEP 684 is accepted)
- change private C-API to public
- (not trivial?) address memory allocator API guarantees about the GIL
- add granular locks to protect some of `_PyRuntimeState` (were protected by the GIL)
  - `_PyRuntimeState.exitfuncs` (race in `Py_AtExit()`)
  - `_PyRuntimeState.nexitfuncs` (race in `Py_AtExit()`)
  - `_PyRuntimeState.allocators` (`PyMem_SetAllocator()` with "wrappers"; also when using mem/obj allocators)
  - `_PyRuntimeState.audit_hook_head` (race in `PySys_AddAuditHook()`)
  - `_PyRuntimeState.ceval.perf`
  - Objects/longobject.c - (long_from_non_binary_base) `log_base_BASE` (lazy)
  - Objects/longobject.c - (long_from_non_binary_base) `convwidth_base` (lazy)
  - Objects/longobject.c - (long_from_non_binary_base) `convmultmax_base` (lazy)
  - (maybe) Objects/object.c - `_Py_RefTotal` (also added to interpreter state)
  - global objects (in `_PyRuntimeState.cached_objects`: interned dict, extensions cache) [PR]
- add a granular lock for importing extension modules (will protect `_PyRuntimeState.imports.extensions` and `_PyRuntimeState.imports.last_module_index`) [branch]
- move state from `_PyRuntimeState` to `PyInterpreterState`
  - (maybe) allocators
  - `_PyRuntimeState.tstate_current` (thread-local variable)
  - ...
- 💥the GIL💥 PR
- extension module restrictions gh-98627 PR
- make `PyInterpreterConfig` and `Py_NewInterpreterFromConfig()` public API
- add `PyInterpreterConfig.own_gil`
- docs additions
- ...
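The `grep` pipeline listed under "port modules" above finds C files that define a module init function (`PyMODINIT_FUNC`) but never call `PyModuleDef_Init()`, i.e. modules that still use single-phase init. Where the shell's `**` globbing or `xargs` is unavailable, roughly the same search can be sketched in Python. The function name `find_single_phase_modules` is hypothetical, and the plain substring match is as approximate as the original `grep`.

```python
from pathlib import Path


def find_single_phase_modules(root, subdirs=("Python", "Modules")):
    """Return .c files that mention PyMODINIT_FUNC but not PyModuleDef_Init."""
    candidates = []
    for subdir in subdirs:
        base = Path(root) / subdir
        if not base.is_dir():
            continue  # tolerate a partial checkout
        # rglob() recurses at every depth, like `**` with recursive globbing on.
        for path in sorted(base.rglob("*.c")):
            text = path.read_text(encoding="utf-8", errors="replace")
            if "PyMODINIT_FUNC" in text and "PyModuleDef_Init" not in text:
                candidates.append(path)
    return candidates


# Example: run from the top of a CPython checkout.
for path in find_single_phase_modules("."):
    print(path)
```

Each printed path is a candidate for porting to multi-phase init; as with the shell version, a hit still needs a manual look, since the match is purely textual.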
Estimates are in units of 1 day. Time waiting for reviews is not included.
A trailing `*` indicates that additional work has already been done (e.g. there's a branch ready).
The current aggregate estimate is about ~64~ 62 days (~35~ 33 days if you leave out the extension modules).
(Estimates on porting are strictly guesses where there isn't a *.)
isolate extension modules (29):
(see [gh-103092](https://github.com/python/cpython/issues/103092))
* ✅1* port _io (builtin)
* 3* port _tracemalloc (builtin)
* 1 port _ctypes
* 1 port _curses
* 1* port _datetime
* 1 port _decimal
* 1 port _msi
* ✅1* port _pickle
* ✅1 port _socket
* 1 ~port _tkinter~
* ✅1 port msvcrt
* 1 ~port ossaudiodev~
* 1 ~port readline~
* ✅1 port winreg
* ✅1 port winsound
* ✅1 isolate _asyncio
* ✅1* isolate _collections (builtin)
* 1 isolate _curses_panel
* ✅1 isolate _elementtree
* 1 ~isolate _lsprof~
* ✅1 isolate _multibytecodec
* ✅1* isolate _ssl
* ✅1 isolate array
* 2* isolate faulthandler (builtin)
* 1 ~isolate nis~
* ✅1 isolate pyexpat
other interpreter isolation (9):
* ✅1 immortalize the singleton objects
* ✅1 immortalize the core/builtin static types
* ✅1* immortalize global objects in `_PyRuntime.cached_objects` and interned strings
* ✅2* move `_PyRuntimeState.obmalloc` to `PyInterpreterState`
* ✅2* move `_PyRuntimeState.types.next_version_tag` to `PyInterpreterState`
* ✅1* move `_PyRuntimeState.cached_objects.interned` to `PyInterpreterState` (or make atomic)
* ✅1* move `_Py_RefTotal` to `PyInterpreterState`
per-interpreter GIL (22-23):
* 3 address memory allocator API guarantees about the GIL (e.g. wrap custom with locks?)
* 2 add a lock for modifying `_PyRuntimeState.allocators`
* 1 add a lock for `_PyRuntimeState.exitfuncs` and `_PyRuntimeState.nexitfuncs`
* 1 add a lock for `_PyRuntimeState.audit_hook_head`
* 1 add a lock for `_PyRuntimeState.ceval.perf`
* 1 add a lock for lazy PyLongObject global state (`log_base_BASE`, `convwidth_base`, `convmultmax_base`)
* 1 (maybe) add a lock for the global part of `_Py_RefTotal`
* 1* ~add a lock for global objects in `_PyRuntime.cached_objects` (interned dict, extensions cache)~
* ✅3* add a lock for importing extension modules
* ✅5 move `_PyRuntimeState.tstate_current` to a thread-local variable
* 2 add `PyInterpreterConfig.own_gil`
* 2* move the GIL to `PyInterpreterState`
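Several items above add a lock around state that the single GIL used to protect implicitly. As a rough illustration of the pattern, here is a Python analogue of the lazily filled per-base tables in `long_from_non_binary_base()`. It is a sketch, not CPython's C code: the 30-bit digit size, the `_tables_lock` name, and the `conversion_tables()` helper are assumptions made for the example, and `threading.Lock` stands in for whatever lock primitive the runtime would actually use.

```python
import math
import threading

# Assumption: 30-bit digits, as on typical 64-bit builds.
PYLONG_BASE = 1 << 30

# One lock guards all three lazily filled tables (indexed by base, 2..36).
_tables_lock = threading.Lock()
_log_base_BASE = [0.0] * 37
_convwidth_base = [0] * 37
_convmultmax_base = [0] * 37


def conversion_tables(base):
    """Return (log ratio, digits per chunk, chunk multiplier) for `base`."""
    with _tables_lock:
        # A zero entry means "not computed yet"; fill all three tables together
        # so no thread can observe a half-initialized row.
        if _log_base_BASE[base] == 0.0:
            convmax = base
            width = 1
            # Largest power of `base` that does not exceed PYLONG_BASE.
            while convmax * base <= PYLONG_BASE:
                convmax *= base
                width += 1
            _convmultmax_base[base] = convmax
            _convwidth_base[base] = width
            _log_base_BASE[base] = math.log(base) / math.log(PYLONG_BASE)
        return (_log_base_BASE[base],
                _convwidth_base[base],
                _convmultmax_base[base])


_, width, multmax = conversion_tables(10)
print(width, multmax)  # 9 1000000000
```

Taking the lock on every call keeps the sketch simple. Whether the real code would lock on every lookup, initialize eagerly, or use atomics instead is a design choice this example does not settle.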
other PEP 684 (3):
* ✅1 change private C-API to public
* 2 (maybe) add extra extension module restrictions
Per PEP 693, the first release candidate is scheduled for July 31st. Pretty much all of the following (minus docs) should be done by then.
- per-interpreter pending calls (PR)
- rename `PyInterpreterConfig.own_gil` (PR)
- address memory allocator API guarantees about the GIL (e.g. wrap custom with locks?) (PR)
- tp_dict slot of static builtin types is NULL in 3.12, without mention in the changelog or an alternative (PR)
- add granular locks (unlikely races)
  - add a lock for modifying `_PyRuntimeState.allocators`
  - add a lock for `_PyRuntimeState.exitfuncs` and `_PyRuntimeState.nexitfuncs`
  - add a lock for `_PyRuntimeState.audit_hook_head`
  - add a lock for `_PyRuntimeState.ceval.perf`
  - add a lock for lazy PyLongObject global state (`log_base_BASE`, `convwidth_base`, `convmultmax_base`)
  - (maybe) add a lock for the global part of `_Py_RefTotal`
- bugs
- crashes
- documentation
  - add entry for `Py_NewInterpreterFromConfig()` and `PyInterpreterConfig`
  - update C-API "Sub-interpreter support" section about the consequences of a per-interpreter GIL
  - add entry for `Py_mod_multiple_interpreters` module def slot
  - add entry for `importlib.util._incompatible_extension_module_restrictions()`
  - update refcount-related docs
  - update singletons' docs about becoming immortal
  - update `ExtensionFileLoader` entry about when imports may fail
  - fix `Py_EndInterpreter()` (about holding GIL after)
  - clarify constraints on arena allocator
  - add entries for PyInterpreterID API, etc.
maybe finish isolating extension modules (see gh-103092):
- add a test (issue)
- port _tracemalloc (builtin) (issue)
- isolate faulthandler (builtin) (issue)
- port _ctypes (PR, PR)
- port _datetime (PR)
- port _decimal (PR)
related:
(This is mostly Eddie Elizondo's project. See gh-84436.)
- PEP 683
- initial implementation (Eddie)
- initial python-dev discussion
- saturated refcounts
- final-ish edits (Eddie)
- start final round of python-dev discussion
- final impl adjustments for <2% (Eddie)
- PEP: submit to steering council (update: accepted!)
- implementation
- finish & merge (PR)
- perform and publish final performance impact analysis
- update C-API docs with clarifications about ref counts (in 3.11 too?)
- add immortal check/fix to `tp_dealloc` for relevant types (e.g. singletons, `PyTypeObject`, `int`, `str`)
- make more objects immortal
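To make the "saturated refcounts" idea concrete, here is a toy Python model of how an immortal object sidesteps reference counting, so that interpreters sharing it never write to its refcount. Everything here is hypothetical scaffolding for illustration (the `ToyObject` class, the `IMMORTAL_REFCNT` sentinel value, the `incref()`/`decref()` helpers); the real mechanism lives in C and is specified by PEP 683.

```python
# Illustrative sentinel; not necessarily the value CPython uses.
IMMORTAL_REFCNT = 2**32 - 1


class ToyObject:
    """Stand-in for a C-level object header with a reference count."""

    def __init__(self, name, immortal=False):
        self.name = name
        self.refcnt = IMMORTAL_REFCNT if immortal else 1
        self.deallocated = False

    @property
    def is_immortal(self):
        return self.refcnt == IMMORTAL_REFCNT


def incref(obj):
    # Immortal objects are never written to, so sharing them across
    # interpreters cannot race on the count.
    if not obj.is_immortal:
        obj.refcnt += 1


def decref(obj):
    if obj.is_immortal:
        return
    obj.refcnt -= 1
    if obj.refcnt == 0:
        obj.deallocated = True  # a real runtime would call tp_dealloc here


none_like = ToyObject("None", immortal=True)
temp = ToyObject("temp")

for _ in range(1000):
    decref(none_like)  # unbalanced decrefs cannot free an immortal object
decref(temp)           # drops the only reference

print(none_like.refcnt == IMMORTAL_REFCNT, none_like.deallocated)  # True False
print(temp.refcnt, temp.deallocated)  # 0 True
```

The model also hints at why the checklist includes an immortal check in `tp_dealloc`: an extension that manipulates the count directly could still drive it to zero, so the deallocator is the last place to catch that.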
(This is mostly Christian Heimes' project.)
(Note: this is no longer needed for the multi-core Python project, but we'll keep it here for continuity.)
- ? PEP
- working implementation
- connect with Daan (mimalloc creator)
- fix: fails to build with PGO
- fix: leaking large blocks (Daan)
- fix: integration with memory profilers, especially valgrind, asan (Daan)
(See PEP 554.)
- PEP 554
- write and publish
- initial python-dev discussion
- simplify (incl. moving "Deferred" section to some other doc)
- further python-dev discussion
- submit to steering council
- alternative: concurrent.futures
- explore
- maybe write a PEP
- extension module (low-level)
- implement interpreter management
- implement channels
- merge as `test.support._xxsubinterpreters`
- ...
- rename to `_interpreters`
- Python module
- ...
- add per-interpreter module state (PEPs 384 & 3121)
- add per-interpreter init/fini (PEP 489)
- fix module state deficiencies (PEPs ...)
- document how to implement support (PEP 630)
- add the PEP 630 info to the C-API docs
- determine how to help large extensions (reach out to numpy, cython)
- initial discussion with numpy
- solve extension-specific process-global resources (https://discuss.python.org/t/20668, https://discuss.python.org/t/20663)
- (probably) add some new C-API to solve pain points
- deal with `_Py_IDENTIFIER()` in external (community) extension modules
- ...