Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Emit events from the Contents Service #954
Emit events from the Contents Service #954
Changes from all commits
88b4da1
607a052
f0ba8db
8890717
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a vague performance concern about emitting an event on every get and save. These are by far the most common ContentsManager actions called by clients. In my opinion, we should only emit events that have demonstrated a developer need. For my File ID manager, I do not need these events. If there are no known uses for these events, I suggest we remove these events entirely rather than attempt to anticipate future uses at the cost of performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is too stringent of a condition.
Events certainly have use-cases outside of just developers. Operators/admins look to use this event system to audit user activity—in fact, this was the original demand driving the jupyter events/telemetry efforts mentioned in this Jupyter Telemetry enhancement proposal (jupyter/enhancement-proposals#41).
If this work is causing a noticeable performance degradation, we should address it rather than limit the usage of the event system for operators.
Also, keep in mind, in scenarios where there are no handlers or listeners, the
.emit
method immediately returns: https://github.com/jupyter/jupyter_events/blob/50746633a2adc7e41e2e0e1b0631db10ffb165db/jupyter_events/logger.py#L337-L342There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I think we actually are in agreement; I forgot Jupyter maintains a distinction between developers and operators, and I was using the word "developer" to mean both. Yes, we should definitely be conscious of performance when building this, though I'm sure it's quite difficult to measure.
I took a look at that logic, and I'm not sure if that's sufficient. That checks if there is are any handlers/listeners attached to the current event logger, meaning if I add a listener to another schema or another event type, then a Contents Manager event still performs a deep copy, validation, and building an event capsule only to do nothing with it.
Of course, this is out of scope for the PR. I'll make an issue of this in Jupyter Events for further discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the main point is that Jupyter Server should reach a future state where it provides high volumes of fine-grained, structured event data from most areas across Jupyter Server, allowing operators to get detailed information about what's happening in a running server. In many deployment scenarios, it's required that operators keep a detailed log of everything happening in the application.
If performance is the concern, the answer isn't to avoid adding events; it's to improve performance. I think we can do it! 😎
Yeah, that's right. I think we should update jupyter_events to check if the specific event is being watched by any listeners.
This is a little trickier (not impossible) to do with handlers, since we don't keep a mapping of handlers to their specific events (consequence of using Python's logging libraries for the main EventLogger). All handlers listen to all events, unless someone adds a
logging.Filter
object to look for specific events. The challenge is that we can't inspect the filter objects easily. I have some ideas on how we can solve this issue in jupyter_events, but this shouldn't block the PR here.