Straw design document for EventLogging #2

Merged
merged 4 commits into from
Jun 4, 2019

Conversation

yuvipanda
Collaborator

This is a strawman design document for telemetry / eventlogging
across the Jupyter ecosystem. The focus is particularly on
the notebook server, JupyterLab, JupyterHub, and related projects.

This is a PR to make it easy for folks to discuss, amend and comment
on. Please see this as a super-early first-pass. I expect a lot of
amendments and discussion from various stakeholders before we go
anywhere. I think a document like this is very useful to focus
discussion, but am happy to use other ways to move things forward
too.

There's a companion PR with a working prototype demo of the system
described here. TL;DR: check out this binder link, do some things,
and look at the events.log file. More info is in the PR itself.

Lots of bits here are from conversations over time with @ellisonbg,
@jasongrout, @minrk, @Zsailer, @ian-r-rose, @betatim, @davclark
and many others I am surely forgetting. It's also unduly influenced
by my time at Wikimedia.

See also active discussion from many participants here,
here, here and here.
I'm sure I've missed many other places.

Please check out the demo, and comment here! <3

@meeseeksmachine

This pull request has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/potential-collaboration-on-user-research/866/15

@yuvipanda
Collaborator Author

The rendered version of this document is here. I'll try doing a readability editing pass tomorrow.

@jaipreet-s
Member

@yuvipanda: This is a great start and thanks for writing this up. My initial comments below:

> 1. JavaScript API
>
> This is a convenience JS library made available to all code to emit
> events for a specific schema. The library could then validate them
> client side (for easier debugging), and send them to the REST API.
> It can also do clientside batching and other performance improvements
> in one go.

I believe the JavaScript part needs to be more than just a convenience API. Quite a few frameworks send events directly from the browser (Google Analytics, Azure, AWS Amplify), so the round trip to the server can be avoided. Also, many JupyterLab extensions are frontend-only and can function without any extra server component.

I'd propose a top-level item for first-class JupyterLab support. This is analogous to the Python API in the server, but for the JupyterLab frontend.

The sub-items here would be:

  1. The core event routing component to send events to the configured "sink". The default could simply be the EventLogging REST API endpoint that you talk about.
  2. An interface to expose to JupyterLab extension developers to be able to publish custom events
  3. An interface to expose to operators/admins writing their own custom sink (as JupyterLab extensions themselves) and register themselves with the core routing components
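
To make the three sub-items concrete, here is a rough sketch of how they might fit together. Every name in it (`TelemetryEvent`, `EventSink`, `EventRouter`, `registerSink`, `publish`) is hypothetical and not an existing JupyterLab API. Sinks are synchronous here to keep the sketch small; a real REST sink would likely be asynchronous.

```typescript
// Hypothetical event shape; the field names are illustrative assumptions.
interface TelemetryEvent {
  schema: string; // identifier of the schema the event claims to follow
  version: number; // schema version
  payload: Record<string, unknown>;
}

// Sub-item 3: what an operator-written sink might implement.
interface EventSink {
  name: string;
  handle(events: TelemetryEvent[]): void;
}

// Sub-item 1: the core routing component.
class EventRouter {
  private sinks: EventSink[] = [];

  // Sinks (for example, custom JupyterLab extensions) register themselves here.
  registerSink(sink: EventSink): void {
    if (this.sinks.some((s) => s.name === sink.name)) {
      throw new Error(`Sink already registered: ${sink.name}`);
    }
    this.sinks.push(sink);
  }

  // Sub-item 2: extension developers publish custom events through this call.
  // Returns the names of sinks that threw, so one broken sink does not
  // prevent delivery to the others.
  publish(event: TelemetryEvent): string[] {
    const failed: string[] = [];
    for (const sink of this.sinks) {
      try {
        sink.handle([event]);
      } catch (_err) {
        failed.push(sink.name);
      }
    }
    return failed;
  }
}
```

The default sink could then be a thin wrapper that forwards events to the EventLogging REST API endpoint, with operators swapping in their own sink by registering a different one.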

> 1. User consent / information UI
>
> Every application collecting data should have a way to make it
> clear to the user what is being collected, and possibly ways
> to turn it off. We could possibly let admins configure opt-in /
> opt-out options.

This is covering 2 personas - the end user and the operator/admin. I'd suggest breaking this down into two.

  1. For the end user to be able to see what is being collected and where it is being sent, and to turn that off via the UI.
  2. For the admin/operator to be able to configure opt-in/opt-out as well as the events to be collected.
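
As a rough illustration of how the two personas could interact, the sketch below combines an admin-side policy with an end-user choice. The names (`AdminPolicy`, `shouldCollect`) and the precedence rule (an explicit user choice overrides the admin default, but never widens the admin's allowed list) are assumptions for discussion, not something the design doc specifies.

```typescript
type ConsentMode = "opt-in" | "opt-out";

// Hypothetical admin/operator configuration.
interface AdminPolicy {
  mode: ConsentMode; // default behaviour when the user has not chosen
  allowedSchemas: string[]; // events the operator wants collected at all
}

// userChoice: true = user enabled collection, false = user disabled it,
// undefined = user has not made a choice in the UI yet.
function shouldCollect(
  policy: AdminPolicy,
  userChoice: boolean | undefined,
  schema: string
): boolean {
  // The admin's list is an upper bound: unlisted events are never collected.
  if (policy.allowedSchemas.indexOf(schema) === -1) {
    return false;
  }
  // An explicit end-user choice wins over the admin default.
  if (userChoice !== undefined) {
    return userChoice;
  }
  // No choice yet: opt-out deployments collect, opt-in deployments do not.
  return policy.mode === "opt-out";
}
```

Under these assumptions, an opt-in deployment collects nothing until the user turns collection on, and a user who turns it off is respected in either mode.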

@yuvipanda
Collaborator Author

Thank you for the feedback and perspective, @jaipreet-s. I've:

  1. Expanded the JS API section to mark that it can operate independently of the Python API
  2. Added an 'implementations' document which has sections for various components of the Jupyter ecosystem, and seeded the JupyterLab section with your comments. PRs to this PR around that are most welcome!

> This is covering 2 personas - the end user and the operator/admin. I'd suggest breaking this down into two.
>
>   1. For the end user to be able to get visibility into what is being collected and where it is being sent to and the ability to turn that off via the UI.
>   2. For the admin/operator to be able to configure opt-in/opt-out as well as the events to be collected.

This makes sense. I'll think about this a little more and add this to the doc.

The doc could also use an editing pass. I'll try to do that next week before the beginning of the kickoff meeting.

@jaipreet-s
Member

@yuvipanda: Can we merge this in a draft state so that folks can start contributing to the design via PRs to this repo?

@yuvipanda
Collaborator Author

@jaipreet-s I like that!

Pinging @Zsailer or @ellisonbg who might be willing to merge it.

@Zsailer
Member

Zsailer commented Jun 4, 2019

I don't have merge rights to this repo, unfortunately, but I agree. Let's merge and iterate. I'll work on getting merge rights.

@Zsailer
Member

Zsailer commented Jun 4, 2019

Okay, I'm going to merge this so we can iterate. Thanks, Yuvi, for starting this discussion!

Zsailer merged commit 9b83f8b into jupyterlab:master on Jun 4, 2019

@Zsailer
Member

Zsailer commented Jun 4, 2019

Even though we already merged this, I'll leave comments here and update documents with a PR after discussion.


## JupyterHub

## Notebook Server
Member

IMO, we should move forward with Jupyter Server, since that's the future implementation of the notebook server. I think Jupyter Server will be ready before telemetry is ready, so it would be easier to focus our work there.

Collaborator Author

I'm guessing most of the events emitted are going to come from the notebook REST APIs, kernel spawning, etc. Will those also live in Jupyter Server?


## Notebook Server

## Classic Notebook?
Member

Since the notebook is on its way out, should we worry about adding telemetry to the notebook API and frontend? My vote is that we focus on Jupyter Server for backend and JupyterLab for client-side telemetry.

Member

> My vote is that we focus on Jupyter Server for backend and JupyterLab for client-side telemetry.

+1 - This is not saying we should completely drop support for Jupyter Classic, but we should be JupyterLab first

Collaborator Author

I agree. However, in the pedagogy world, the classic notebook isn't going away for many, many years. It makes sense for the Jupyter community / JupyterLab devs to focus on JupyterLab first, but we should make sure we don't make active choices that preclude classic notebook use.


## Classic Notebook?

## Kernels?
Member

👍 Collecting kernel events is something I've heard many people want. Obviously, the number of events can get very large, so we'll need to design something extensible to handle this.
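
One common way to cope with high event volume is batching, which the design doc already mentions for the client side. The sketch below is only an illustration of that idea, with a hypothetical `BatchBuffer` class; it flushes by size only and leaves out timers, sampling, and retry, all of which a real design would need to consider.

```typescript
// Collects items and hands them to flushFn in groups of at most maxBatchSize.
class BatchBuffer<T> {
  private pending: T[] = [];

  constructor(
    private maxBatchSize: number,
    private flushFn: (batch: T[]) => void
  ) {
    if (maxBatchSize < 1) {
      throw new Error("maxBatchSize must be at least 1");
    }
  }

  // Queue one item; flush automatically once the batch is full.
  add(item: T): void {
    this.pending.push(item);
    if (this.pending.length >= this.maxBatchSize) {
      this.flush();
    }
  }

  // Deliver whatever is queued (for example, on shutdown); no-op when empty.
  flush(): void {
    if (this.pending.length === 0) {
      return;
    }
    const batch = this.pending;
    this.pending = [];
    this.flushFn(batch);
  }
}
```

With a batch size of 3, adding seven events would produce two full batches immediately and a final batch of one on the next explicit `flush()`.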

4 participants