-
Notifications
You must be signed in to change notification settings - Fork 313
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve the busy/idle execution state tracking for kernels. #1429
Merged
Zsailer
merged 22 commits into
jupyter-server:main
from
ojarjur:ojarjur/fix-kernel-status
Jul 18, 2024
Merged
Changes from 20 commits
Commits
Show all changes
22 commits
Select commit
Hold shift + click to select a range
67255a9
Fix the activity and execution-state tracking for kernels.
ojarjur 30b45db
Improve kernel status tracking by matching idle status messages again…
ojarjur d2a952d
Merge branch 'main' of https://github.com/jupyter-server/jupyter_serv…
ojarjur 4ce5539
Test fixes
ojarjur fadf5e8
Reduce the diff against the current code
ojarjur bdb9814
Add a test for execute_state
ojarjur f37557c
Fix lint warnings
ojarjur f1eb5a6
Fix race conditions and deadlocks in the test_execution_state tests
ojarjur c870f7e
Remove sleep that caused test failures on some jobs
ojarjur 50c5f9c
Revert unexpected behavior change that affected tests on Windows
ojarjur 982cb1e
Restore accidentally deleted pydoc
ojarjur 4921682
Reduce the diff against the main branch and drop the unneeded, in-mem…
ojarjur 6257ca1
Respect status messages that explicitly report a status of "starting"
ojarjur 170cde0
Fix flakiness in thekernel test_execution_state test
ojarjur e583410
Make the kernel test_execution_state test more reliable by increasing…
ojarjur 3ecc26d
Make kernel execution state test more reliable
ojarjur bcc2f1e
Make the test/test_utils.py test pass on Windows
ojarjur 300008e
Fix a race condition in setting the initial kernel execution state
ojarjur 101ebed
Simplify the retry logic for the kernel execution state test... inste…
ojarjur 0e8bd97
Merge branch 'main' into ojarjur/fix-kernel-status
ojarjur 54ea903
Switch from having a list of tracked message types for user activity …
ojarjur 1b3ea06
Re-introduce retries in the execution status test to further reduce f…
ojarjur File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,122 @@ | ||
import asyncio | ||
import datetime | ||
import json | ||
import os | ||
import platform | ||
import time | ||
import uuid | ||
import warnings | ||
|
||
import jupyter_client | ||
import pytest | ||
from flaky import flaky | ||
from tornado.httpclient import HTTPClientError | ||
from traitlets.config import Config | ||
|
||
|
||
@flaky | ||
async def test_execution_state(jp_fetch, jp_ws_fetch): | ||
r = await jp_fetch("api", "kernels", method="POST", allow_nonstandard_methods=True) | ||
kernel = json.loads(r.body.decode()) | ||
kid = kernel["id"] | ||
|
||
# Open a websocket connection. | ||
ws = await jp_ws_fetch("api", "kernels", kid, "channels") | ||
session_id = uuid.uuid1().hex | ||
message_id = uuid.uuid1().hex | ||
await ws.write_message( | ||
json.dumps( | ||
{ | ||
"channel": "shell", | ||
"header": { | ||
"date": datetime.datetime.now(tz=datetime.timezone.utc).isoformat(), | ||
"session": session_id, | ||
"msg_id": message_id, | ||
"msg_type": "execute_request", | ||
"username": "", | ||
"version": "5.2", | ||
}, | ||
"parent_header": {}, | ||
"metadata": {}, | ||
"content": { | ||
"code": "while True:\n\tpass", | ||
"silent": False, | ||
"allow_stdin": False, | ||
"stop_on_error": True, | ||
}, | ||
"buffers": [], | ||
} | ||
) | ||
) | ||
await poll_for_parent_message_status(kid, message_id, "busy", ws) | ||
es = await get_execution_state(kid, jp_fetch) | ||
assert es == "busy" | ||
|
||
message_id_2 = uuid.uuid1().hex | ||
await ws.write_message( | ||
json.dumps( | ||
{ | ||
"channel": "control", | ||
"header": { | ||
"date": datetime.datetime.now(tz=datetime.timezone.utc).isoformat(), | ||
"session": session_id, | ||
"msg_id": message_id_2, | ||
"msg_type": "debug_request", | ||
"username": "", | ||
"version": "5.2", | ||
}, | ||
"parent_header": {}, | ||
"metadata": {}, | ||
"content": { | ||
"type": "request", | ||
"command": "debugInfo", | ||
}, | ||
"buffers": [], | ||
} | ||
) | ||
) | ||
await poll_for_parent_message_status(kid, message_id_2, "idle", ws) | ||
es = await get_execution_state(kid, jp_fetch) | ||
|
||
# Verify that the overall kernel status is still "busy" even though one | ||
# "idle" response was already seen for the second execute request. | ||
assert es == "busy" | ||
|
||
await jp_fetch( | ||
"api", | ||
"kernels", | ||
kid, | ||
"interrupt", | ||
method="POST", | ||
allow_nonstandard_methods=True, | ||
) | ||
|
||
await poll_for_parent_message_status(kid, message_id, "idle", ws) | ||
es = await get_execution_state(kid, jp_fetch) | ||
assert es == "idle" | ||
ws.close() | ||
|
||
|
||
async def get_execution_state(kid, jp_fetch): | ||
r = await jp_fetch("api", "kernels", kid, method="GET") | ||
model = json.loads(r.body.decode()) | ||
execution_state = model["execution_state"] | ||
return execution_state | ||
|
||
|
||
async def poll_for_parent_message_status(kid, parent_message_id, target_status, ws): | ||
while True: | ||
resp = await ws.read_message() | ||
resp_json = json.loads(resp) | ||
print(resp_json) | ||
parent_message = resp_json.get("parent_header", {}).get("msg_id", None) | ||
if parent_message != parent_message_id: | ||
continue | ||
|
||
response_type = resp_json.get("header", {}).get("msg_type", None) | ||
if response_type != "status": | ||
continue | ||
|
||
execution_state = resp_json.get("content", {}).get("execution_state", "") | ||
if execution_state == target_status: | ||
return |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This comment was marked as resolved.
Sorry, something went wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see it was also discussed in depth in #1360 (comment). I am not sure if there was a resolution though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We didn't come to a consensus on the set of message types to include, but I think we did get a pretty clear consensus that the set of message types should be configurable so that server admins can override whatever default we choose.
I don't know if we can get a consensus on the set of types to leave in the default value for the config, but if we do get a consensus that something should be added to (or removed from) this list, then please let me know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed on the call today, it might be more conservative to change this list to a
not_tracked_message_types
, as it would be less risk of accidentally forgetting that a certain message type should be counted as kernel activity. It would also mean that the risk changes from "culling a kernel that shouldn't be culled" (bad because of potential data loss) to "not culling a kernel that can be culled" (bad because of potential cost / resource use), where the former seems more disruptive.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've put together a tentative list of what I think would be the default set of not-tracked message types.
I got this by going through this page, and selecting all of the message types that were either:
*_info_(request|reply)
.This is almost disjoint from the proposed list for tracked message types. The difference is that the
execute_input
message that was going to be tracked is included here in the not-tracked list because it is sent on the IOPub channel instead of the shell channel.The full list is:
comm_info_request
,comm_info_reply
,kernel_info_request
,kernel_info_reply
,shutdown_request
,shutdown_reply
,interrupt_request
,interrupt_reply
,debug_request
,debug_reply
,stream
,display_data
,update_display_data
,execute_input
,execute_result
,error
,status
,clear_output
,debug_event
,input_request
,input_reply
@krassowski @vidartf @Zsailer Can you all please help me double check this?
In particular:
Thanks, in advance!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ojarjur. I'll give this a closer look later today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This list looks right to me.
Thanks, @ojarjur. I believe this is good to go.