Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python 3.11 segmentation fault in grpc test; seg. fault in task.get_stack(limit=1) accessing frame.f_back #96

Closed
byllyfish opened this issue Oct 26, 2022 · 6 comments

Comments

@byllyfish
Copy link
Owner

byllyfish commented Oct 26, 2022

Using grpcio 1.50.0 and Python 3.11.0, the test test_switch1 will fail with a segmentation fault.

Current thread 0x000000011678b600 (most recent call first):
  File "/usr/local/Cellar/python@3.11/3.11.0/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/base_tasks.py", line 53 in _task_get_stack
  File "/Users/bfish/code/finsy/.venv_311/lib/python3.11/site-packages/grpc/aio/_channel.py", line 315 in _close
  File "/Users/bfish/code/finsy/.venv_311/lib/python3.11/site-packages/grpc/aio/_channel.py", line 374 in close
  File "/Users/bfish/code/finsy/finsy/p4client.py", line 308 in close
  File "/Users/bfish/code/finsy/finsy/switch.py", line 393 in _run

If I disable the test P4RuntimeServer with an environment variable, the test succeeds.

It seems like there is a memory misbehavior bug in the grpc async server implementation, but client code is tripping over it in _task_get_stack. If I run client code without running any of the server test code, there is no segmentation fault.

For now, the CI tests for Python 3.11 are running with FINSY_TEST_P4RUNTIME_SERVER=skip and FINSY_TEST_GNMI_SERVER=skip.

@byllyfish
Copy link
Owner Author

See grpc/grpc#31441

@byllyfish
Copy link
Owner Author

Here's a smaller reproducer. I think the problem is that GRPC has close code that calls task.get_stack(limit=1). This seg-faults with certain tasks, perhaps because they are created with cython?

import asyncio

import finsy as fy
from finsy.test.gnmi_server import GNMIServer


def get_stack():
    tasks = asyncio.all_tasks()
    for task in tasks:
        print(task)
        task.get_stack(limit=1)  # seg faults: _task_get_stack


async def main():
    async with GNMIServer("127.0.0.1:19559").run():
        async with fy.GNMIClient("127.0.0.1:19559") as client:
            sub = client.subscribe()
            sub.once(fy.GNMIPath("/"))
            await sub._subscribe()

            get_stack()


asyncio.run(main())

The output on my machine looks like this:

$ python -X dev demo2.py
<Task pending name='Task-1' coro=<main() running at /Users/bfish/code/finsy/demo2.py:21> cb=[_run_until_complete_cb() at /Users/bfish/.pyenv/versions/3.11-dev/lib/python3.11/asyncio/base_events.py:180] created at /Users/bfish/.pyenv/versions/3.11-dev/lib/python3.11/asyncio/runners.py:100>
<Task pending name='Task-2' coro=<AioServer._server_main_loop()> wait_for=<Future pending cb=[Task.task_wakeup()] created at /Users/bfish/.pyenv/versions/3.11-dev/lib/python3.11/asyncio/base_events.py:427> cb=[AioServer._serving_task_crash_handler()] created at /Users/bfish/code/finsy/.venv311/lib/python3.11/site-packages/grpc/aio/_server.py:112>
<Task pending name='Task-6' coro=<_handle_exceptions() running at src/python/grpcio/grpc/_cython/_cygrpc/aio/server.pyx.pxi:None> wait_for=<Future finished result=None created at /Users/bfish/.pyenv/versions/3.11-dev/lib/python3.11/asyncio/base_events.py:427> cb=[_add_callback_handler.<locals>.handle_callbacks()] created at /Users/bfish/.pyenv/versions/3.11-dev/lib/python3.11/asyncio/events.py:80>
Fatal Python error: Segmentation fault

Thread 0x0000700002207000 (most recent call first):
  File "/Users/bfish/.pyenv/versions/3.11-dev/lib/python3.11/threading.py", line 975 in run
  File "/Users/bfish/.pyenv/versions/3.11-dev/lib/python3.11/threading.py", line 1038 in _bootstrap_inner
  File "/Users/bfish/.pyenv/versions/3.11-dev/lib/python3.11/threading.py", line 995 in _bootstrap

Current thread 0x000000010f26c600 (most recent call first):
  File "/Users/bfish/.pyenv/versions/3.11-dev/lib/python3.11/asyncio/base_tasks.py", line 53 in _task_get_stack
  File "/Users/bfish/code/finsy/demo2.py", line 11 in get_stack
  File "/Users/bfish/code/finsy/demo2.py", line 21 in main
  File "/Users/bfish/.pyenv/versions/3.11-dev/lib/python3.11/asyncio/events.py", line 80 in _run
  File "/Users/bfish/.pyenv/versions/3.11-dev/lib/python3.11/asyncio/base_events.py", line 1904 in _run_once
  File "/Users/bfish/.pyenv/versions/3.11-dev/lib/python3.11/asyncio/base_events.py", line 607 in run_forever
  File "/Users/bfish/.pyenv/versions/3.11-dev/lib/python3.11/asyncio/base_events.py", line 640 in run_until_complete
  File "/Users/bfish/.pyenv/versions/3.11-dev/lib/python3.11/asyncio/runners.py", line 118 in run
  File "/Users/bfish/.pyenv/versions/3.11-dev/lib/python3.11/asyncio/runners.py", line 190 in run
  File "/Users/bfish/code/finsy/demo2.py", line 24 in <module>

Extension modules: grpc._cython.cygrpc, google._upb._message (total: 2)
[1]    20930 segmentation fault  python -X dev demo2.py

Note that the offending task's coroutine is cython...

<Task pending name='Task-6' coro=<_handle_exceptions() running at src/python/grpcio/grpc/_cython/_cygrpc/aio/server.pyx.pxi:None> wait_for=<Future finished result=None created at /Users/bfish/.pyenv/versions/3.11-dev/lib/python3.11/asyncio/base_events.py:427> cb=[_add_callback_handler.<locals>.handle_callbacks()] created at /Users/bfish/.pyenv/versions/3.11-dev/lib/python3.11/asyncio/events.py:80>
Fatal Python error: Segmentation fault

@byllyfish
Copy link
Owner Author

There is a Python 3.11 bug that may be related: python/cpython#99110.

The seg. fault happens on this line of code trying to access f_back.

https://github.com/python/cpython/blob/d75b2e2fc865af0a020a12b23383d8061e9c6e59/Lib/asyncio/base_tasks.py#L53

@byllyfish byllyfish changed the title Python 3.11 segmentation fault in grpc tests Python 3.11 segmentation fault in grpc test; seg. fault in task.get_stack(limit=1) accessing frame.f_back Nov 11, 2022
@byllyfish
Copy link
Owner Author

Still an invalid memory access on Python 3.11.1. Here's the same program with python 3.11.1 running under valgrind on ubuntu:

$ valgrind --track-origins=yes python -X dev demo.py 
==19645== Memcheck, a memory error detector
==19645== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==19645== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==19645== Command: python -X dev demo.py
==19645== 
Executing <Task pending name='Task-1' coro=<main() running at /home/bfish/code/finsy/demo.py:15> wait_for=<Future pending cb=[Task.task_wakeup()] created at /home/bfish/.pyenv/versions/3.11.1/lib/python3.11/asyncio/base_events.py:427> cb=[_run_until_complete_cb() at /home/bfish/.pyenv/versions/3.11.1/lib/python3.11/asyncio/base_events.py:180] created at /home/bfish/.pyenv/versions/3.11.1/lib/python3.11/asyncio/runners.py:100> took 0.923 seconds
Executing <Task pending name='Task-1' coro=<main() running at /home/bfish/code/finsy/demo.py:19> wait_for=<Future pending cb=[Task.task_wakeup()] created at /home/bfish/.pyenv/versions/3.11.1/lib/python3.11/asyncio/base_events.py:427> cb=[_run_until_complete_cb() at /home/bfish/.pyenv/versions/3.11.1/lib/python3.11/asyncio/base_events.py:180] created at /home/bfish/.pyenv/versions/3.11.1/lib/python3.11/asyncio/runners.py:100> took 0.252 seconds
Executing <Task pending name='Task-3' coro=<StreamStreamCall._prepare_rpc() running at /home/bfish/code/finsy/.venv/lib/python3.11/site-packages/grpc/aio/_call.py:643> wait_for=<Future pending cb=[Task.task_wakeup()] created at /home/bfish/.pyenv/versions/3.11.1/lib/python3.11/asyncio/base_events.py:427> created at /home/bfish/code/finsy/.venv/lib/python3.11/site-packages/grpc/aio/_call.py:632> took 0.306 seconds
<Task pending name='Task-2' coro=<AioServer._server_main_loop()> wait_for=<Future pending cb=[Task.task_wakeup()] created at /home/bfish/.pyenv/versions/3.11.1/lib/python3.11/asyncio/base_events.py:427> cb=[AioServer._serving_task_crash_handler()] created at /home/bfish/code/finsy/.venv/lib/python3.11/site-packages/grpc/aio/_server.py:112>
<Task pending name='Task-4' coro=<_AioCall._handle_status_once_received()> wait_for=<Future pending cb=[Task.task_wakeup()] created at /home/bfish/.pyenv/versions/3.11.1/lib/python3.11/asyncio/base_events.py:427> created at /home/bfish/code/finsy/.venv/lib/python3.11/site-packages/grpc/aio/_call.py:643>
<Task pending name='Task-6' coro=<_handle_exceptions() running at src/python/grpcio/grpc/_cython/_cygrpc/aio/server.pyx.pxi:None> wait_for=<Future finished result=None created at /home/bfish/.pyenv/versions/3.11.1/lib/python3.11/asyncio/base_events.py:427> cb=[_add_callback_handler.<locals>.handle_callbacks()] created at /home/bfish/.pyenv/versions/3.11.1/lib/python3.11/asyncio/events.py:80>
==19645== Invalid read of size 1
==19645==    at 0x2859F0: _PyFrame_IsIncomplete (pycore_frame.h:147)
==19645==    by 0x2859F0: PyFrame_GetBack (frameobject.c:1326)
==19645==    by 0x2859F0: frame_getback (frameobject.c:103)
==19645==    by 0x2BBFAE: _PyObject_GenericGetAttrWithDict (object.c:1278)
==19645==    by 0x2BB46A: PyObject_GetAttr (object.c:916)
==19645==    by 0x1F9B17: _PyEval_EvalFrameDefault (ceval.c:3464)
==19645==    by 0x3699E0: _PyEval_EvalFrame (pycore_ceval.h:73)
==19645==    by 0x3699E0: _PyEval_Vector (ceval.c:6435)
==19645==    by 0x25F573: _PyObject_VectorcallTstate (pycore_call.h:92)
==19645==    by 0x25F573: object_vacall (call.c:819)
==19645==    by 0x261F3B: PyObject_CallFunctionObjArgs (call.c:925)
==19645==    by 0x6624E95: _asyncio_Task_get_stack_impl (_asynciomodule.c:2304)
==19645==    by 0x6624E95: _asyncio_Task_get_stack (_asynciomodule.c.h:518)
==19645==    by 0x26D83F: method_vectorcall_FASTCALL_KEYWORDS (descrobject.c:426)
==19645==    by 0x25FD9E: _PyObject_VectorcallTstate (pycore_call.h:92)
==19645==    by 0x25FD9E: PyObject_Vectorcall (call.c:299)
==19645==    by 0x1FBA61: _PyEval_EvalFrameDefault (ceval.c:4772)
==19645==    by 0x279E04: _PyEval_EvalFrame (pycore_ceval.h:73)
==19645==    by 0x279E04: gen_send_ex2 (genobject.c:219)
==19645==    by 0x279E04: PyGen_am_send (genobject.c:280)
==19645==  Address 0xcdcdcdcdcdcdce12 is not stack'd, malloc'd or (recently) free'd
==19645== 
Fatal Python error: Segmentation fault

Thread 0x000000000aabd640 (most recent call first):
  File "/home/bfish/.pyenv/versions/3.11.1/lib/python3.11/threading.py", line 975 in run
  File "/home/bfish/.pyenv/versions/3.11.1/lib/python3.11/threading.py", line 1038 in _bootstrap_inner
  File "/home/bfish/.pyenv/versions/3.11.1/lib/python3.11/threading.py", line 995 in _bootstrap

Current thread 0x0000000004b72b80 (most recent call first):
  File "/home/bfish/.pyenv/versions/3.11.1/lib/python3.11/asyncio/base_tasks.py", line 53 in _task_get_stack
  File "/home/bfish/code/finsy/demo.py", line 11 in get_stack
  File "/home/bfish/code/finsy/demo.py", line 21 in main
  File "/home/bfish/.pyenv/versions/3.11.1/lib/python3.11/asyncio/events.py", line 80 in _run
  File "/home/bfish/.pyenv/versions/3.11.1/lib/python3.11/asyncio/base_events.py", line 1911 in _run_once
  File "/home/bfish/.pyenv/versions/3.11.1/lib/python3.11/asyncio/base_events.py", line 607 in run_forever
  File "/home/bfish/.pyenv/versions/3.11.1/lib/python3.11/asyncio/base_events.py", line 640 in run_until_complete
  File "/home/bfish/.pyenv/versions/3.11.1/lib/python3.11/asyncio/runners.py", line 118 in run
  File "/home/bfish/.pyenv/versions/3.11.1/lib/python3.11/asyncio/runners.py", line 190 in run
  File "/home/bfish/code/finsy/demo.py", line 24 in <module>

Extension modules: grpc._cython.cygrpc, google._upb._message (total: 2)
==19645== 
==19645== Process terminating with default action of signal 11 (SIGSEGV)
==19645==    at 0x49E0A7C: __pthread_kill_implementation (pthread_kill.c:44)
==19645==    by 0x49E0A7C: __pthread_kill_internal (pthread_kill.c:78)
==19645==    by 0x49E0A7C: pthread_kill@@GLIBC_2.34 (pthread_kill.c:89)
==19645==    by 0x498C475: raise (raise.c:26)
==19645==    by 0x498C51F: ??? (in /usr/lib/x86_64-linux-gnu/libc.so.6)
==19645==    by 0x2859EF: PyFrame_GetBack (frameobject.c:1326)
==19645==    by 0x2859EF: frame_getback (frameobject.c:103)
==19645== 
==19645== HEAP SUMMARY:
==19645==     in use at exit: 6,454,922 bytes in 7,966 blocks
==19645==   total heap usage: 19,529 allocs, 11,563 frees, 30,342,880 bytes allocated
==19645== 
==19645== LEAK SUMMARY:
==19645==    definitely lost: 0 bytes in 0 blocks
==19645==    indirectly lost: 0 bytes in 0 blocks
==19645==      possibly lost: 5,983,687 bytes in 4,258 blocks
==19645==    still reachable: 471,235 bytes in 3,708 blocks
==19645==                       of which reachable via heuristic:
==19645==                         stdstring          : 2,575 bytes in 50 blocks
==19645==                         newarray           : 360 bytes in 1 blocks
==19645==         suppressed: 0 bytes in 0 blocks
==19645== Rerun with --leak-check=full to see details of leaked memory
==19645== 
==19645== For lists of detected and suppressed errors, rerun with: -s
==19645== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Segmentation fault (core dumped)

@byllyfish
Copy link
Owner Author

byllyfish commented Dec 8, 2022

There's an assert fail in the debug build of python 3.11.1, when printing information about the cython task.

Edit: The ResourceWarning is due to upb violating the defensive assertion added in python/cpython#95324

$ valgrind python -X dev demo.py 
==34680== Memcheck, a memory error detector
==34680== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==34680== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==34680== Command: python -X dev demo.py
==34680== 
gc:0: ResourceWarning: Object of type google._upb._message.MessageMapContainer is not untracked before destruction
gc:0: ResourceWarning: Object of type google._upb._message.MessageMapContainer is not untracked before destruction
gc:0: ResourceWarning: Object of type google._upb._message.MessageMapContainer is not untracked before destruction
gc:0: ResourceWarning: Object of type google._upb._message.MessageMapContainer is not untracked before destruction
gc:0: ResourceWarning: Object of type google._upb._message.ScalarMapContainer is not untracked before destruction
gc:0: ResourceWarning: Object of type google._upb._message.ScalarMapContainer is not untracked before destruction
gc:0: ResourceWarning: Object of type google._upb._message.ScalarMapContainer is not untracked before destruction
gc:0: ResourceWarning: Object of type google._upb._message.ScalarMapContainer is not untracked before destruction
gc:0: ResourceWarning: Object of type google._upb._message.ScalarMapContainer is not untracked before destruction
gc:0: ResourceWarning: Object of type google._upb._message.ScalarMapContainer is not untracked before destruction
gc:0: ResourceWarning: Object of type google._upb._message.ScalarMapContainer is not untracked before destruction
gc:0: ResourceWarning: Object of type google._upb._message.ScalarMapContainer is not untracked before destruction
Executing <Task pending name='Task-1' coro=<main() running at /home/bfish/code/finsy/demo.py:15> wait_for=<Future pending cb=[Task.task_wakeup()] created at /home/bfish/.pyenv/versions/3.11.1-debug/lib/python3.11/asyncio/base_events.py:427> cb=[_run_until_complete_cb() at /home/bfish/.pyenv/versions/3.11.1-debug/lib/python3.11/asyncio/base_events.py:180] created at /home/bfish/.pyenv/versions/3.11.1-debug/lib/python3.11/asyncio/runners.py:100> took 0.798 seconds
Executing <Task pending name='Task-1' coro=<main() running at /home/bfish/code/finsy/demo.py:19> wait_for=<Future pending cb=[Task.task_wakeup()] created at /home/bfish/.pyenv/versions/3.11.1-debug/lib/python3.11/asyncio/base_events.py:427> cb=[_run_until_complete_cb() at /home/bfish/.pyenv/versions/3.11.1-debug/lib/python3.11/asyncio/base_events.py:180] created at /home/bfish/.pyenv/versions/3.11.1-debug/lib/python3.11/asyncio/runners.py:100> took 0.319 seconds
Executing <Task pending name='Task-3' coro=<StreamStreamCall._prepare_rpc() running at /home/bfish/code/finsy/.venv/lib/python3.11/site-packages/grpc/aio/_call.py:643> wait_for=<Future pending cb=[Task.task_wakeup()] created at /home/bfish/.pyenv/versions/3.11.1-debug/lib/python3.11/asyncio/base_events.py:427> created at /home/bfish/code/finsy/.venv/lib/python3.11/site-packages/grpc/aio/_call.py:632> took 0.289 seconds
<Task pending name='Task-3' coro=<StreamStreamCall._prepare_rpc() running at /home/bfish/code/finsy/.venv/lib/python3.11/site-packages/grpc/aio/_call.py:643> wait_for=<Future pending cb=[Task.task_wakeup()] created at /home/bfish/.pyenv/versions/3.11.1-debug/lib/python3.11/asyncio/base_events.py:427> created at /home/bfish/code/finsy/.venv/lib/python3.11/site-packages/grpc/aio/_call.py:632>
python: Objects/codeobject.c:771: PyCode_Addr2Line: Assertion `addrq >= 0 && addrq < _PyCode_NBYTES(co)' failed.
Fatal Python error: Aborted

Thread 0x000000000abf6640 (most recent call first):
  File "/home/bfish/.pyenv/versions/3.11.1-debug/lib/python3.11/threading.py", line 975 in run
  File "/home/bfish/.pyenv/versions/3.11.1-debug/lib/python3.11/threading.py", line 1038 in _bootstrap_inner
  File "/home/bfish/.pyenv/versions/3.11.1-debug/lib/python3.11/threading.py", line 995 in _bootstrap

Current thread 0x0000000004b72b80 (most recent call first):
  File "/home/bfish/.pyenv/versions/3.11.1-debug/lib/python3.11/asyncio/coroutines.py", line 104 in _format_coroutine
  File "/home/bfish/.pyenv/versions/3.11.1-debug/lib/python3.11/asyncio/base_tasks.py", line 18 in _task_repr_info
  File "/home/bfish/.pyenv/versions/3.11.1-debug/lib/python3.11/asyncio/base_tasks.py", line 28 in _task_repr
  File "/home/bfish/.pyenv/versions/3.11.1-debug/lib/python3.11/reprlib.py", line 21 in wrapper
  File "/home/bfish/code/finsy/demo.py", line 10 in get_stack
  File "/home/bfish/code/finsy/demo.py", line 21 in main
  File "/home/bfish/.pyenv/versions/3.11.1-debug/lib/python3.11/asyncio/events.py", line 80 in _run
  File "/home/bfish/.pyenv/versions/3.11.1-debug/lib/python3.11/asyncio/base_events.py", line 1911 in _run_once
  File "/home/bfish/.pyenv/versions/3.11.1-debug/lib/python3.11/asyncio/base_events.py", line 607 in run_forever
  File "/home/bfish/.pyenv/versions/3.11.1-debug/lib/python3.11/asyncio/base_events.py", line 640 in run_until_complete
  File "/home/bfish/.pyenv/versions/3.11.1-debug/lib/python3.11/asyncio/runners.py", line 118 in run
  File "/home/bfish/.pyenv/versions/3.11.1-debug/lib/python3.11/asyncio/runners.py", line 190 in run
  File "/home/bfish/code/finsy/demo.py", line 24 in <module>

Extension modules: grpc._cython.cygrpc, google._upb._message (total: 2)
==34680== 
==34680== Process terminating with default action of signal 6 (SIGABRT)
==34680==    at 0x49E0A7C: __pthread_kill_implementation (pthread_kill.c:44)
==34680==    by 0x49E0A7C: __pthread_kill_internal (pthread_kill.c:78)
==34680==    by 0x49E0A7C: pthread_kill@@GLIBC_2.34 (pthread_kill.c:89)
==34680==    by 0x498C475: raise (raise.c:26)
==34680==    by 0x4E0986: faulthandler_fatal_error (faulthandler.c:385)
==34680==    by 0x498C51F: ??? (in /usr/lib/x86_64-linux-gnu/libc.so.6)
==34680==    by 0x49E0A7B: __pthread_kill_implementation (pthread_kill.c:43)
==34680==    by 0x49E0A7B: __pthread_kill_internal (pthread_kill.c:78)
==34680==    by 0x49E0A7B: pthread_kill@@GLIBC_2.34 (pthread_kill.c:89)
==34680== 
==34680== HEAP SUMMARY:
==34680==     in use at exit: 6,558,722 bytes in 8,085 blocks
==34680==   total heap usage: 19,966 allocs, 11,881 frees, 31,248,068 bytes allocated
==34680== 
==34680== LEAK SUMMARY:
==34680==    definitely lost: 0 bytes in 0 blocks
==34680==    indirectly lost: 0 bytes in 0 blocks
==34680==      possibly lost: 6,088,930 bytes in 4,389 blocks
==34680==    still reachable: 469,792 bytes in 3,696 blocks
==34680==                       of which reachable via heuristic:
==34680==                         stdstring          : 2,575 bytes in 50 blocks
==34680==                         newarray           : 360 bytes in 1 blocks
==34680==         suppressed: 0 bytes in 0 blocks
==34680== Rerun with --leak-check=full to see details of leaked memory
==34680== 
==34680== For lists of detected and suppressed errors, rerun with: -s
==34680== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Aborted (core dumped)

@byllyfish
Copy link
Owner Author

Fixed in Python 3.11.2. (python/cpython#99110.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant