JupyterLab vision for the next few years #80

lresende opened this issue Jul 23, 2020 · 78 comments

Comments

@lresende
Member

We are seeing an aggressive strategy from IDEs to penetrate and/or disrupt the Jupyter Notebook UI arena, in particular JupyterLab. In parallel, we have seen a lot of the cloud vendors enabling Jupyter Notebooks as a service with enhanced UX capabilities that are kept proprietary and not contributed back to the community.

I believe the JupyterLab community should step back and create a 1-3 year vision/strategy plan. This plan would provide direction on areas to focus on, with the aim of maintaining its prominence as the de facto UI for data science interactive workloads. The vision should also consider the personas we serve, and the areas we should prioritize to be more competitive with IDEs that provide support for interactive applications integrated with Notebooks.

Thoughts?

@goanpeca
Member

goanpeca commented Jul 23, 2020

Thanks for opening this issue @lresende: some thoughts on things that might be important to broaden the scope and facilitate the use cases of many users.

  • Localization and internationalization. Work has been started on that front but there should be a plan with some milestones in terms of full localization, internationalization and globalization (fancy term for the merge of the previous 2)
  • A lot of users just want to download an exe, or a dmg and click install and have a JLab icon that launches a desktop app. Maybe some (enterprise) users do not really see a use case here, but a lot of smaller groups, companies, and the single user might benefit a lot from having a simple application download with the standard expectations and auto-update features
  • As @afshin pointed out in the meeting, JLab should provide enough IDE-like features so that staying in there when moving from a Notebook based XP to a more "standard" coding XP is comfortable and nice enough that users do not feel they cannot do what they want without providing all the bells and whistles a more dev oriented IDE would. Now how much is too much? and how many bells and whistles are acceptable for a given timeline?
  • Being built with TS, JLab might have distanced itself from its origins where the Data Science users could make some extensions in Python and hack away. It may be possible to provide a Python API to be able to perform "common use" or simple extensions, without having to do the whole TS lock-in 🤷 .

@echarles
Member

Thx @lresende for opening this. It is IMHO very important.

I don't think trying to imitate an IDE is the way to go. IDE fat-client installation applies to a fraction of the potential users, VSCode is anyway leading the pack and JupyterLab has more to offer and can serve a lot more of different use cases.

I am seeing JLab as a collection of components that can be integrated and used by:

So a collection of client and server components:

  • With customisation, theming, binding with frameworks like React.js (or any other trending ones).
  • With bindings to online kernels hosted on e.g. binders... or other providers.
  • With pluggable AAA (Authentication, Authorization, Auditing)
  • With Real Time Collaboration

On top of that toolbox, JLab would ship 2 reference implementations like:

  • The current JLab which is for power-user and prolly intimidating for the others.
  • A real clone of the classic notebook with 100% conformance on key-bindings (users I meet often come back with that issue). I favor here a rewrite from scratch than a strip-down of the current JLab.

And would ship easy tutorials to embed those in 3rd party applications.

... and yes, one-click installation would be super-useful (but less important to me than the above).

...and hosted VSCode in the browser looks terribly like powering GitHub CodeSpace (See https://github.com/cdr/code-server for open-source implementation of VSCode as WEB application).

@echarles
Member

echarles commented Jul 24, 2020

We often compare/refer to VSCode but RStudio is another tool worth looking at. They have fat and web flavors with the exact same shiny UI: an editor mixing code/text/latex/..., a console, a variable inspector and a bottom-right display for the rest (graph, filetree, help...).

Are we in a state to do that easily? We miss the editor mixing text/code (for now attaching a md or py file to a console is an incomplete workaround). We don't have a variable inspector component (3rd party ones exist).

Assuming we have those components, we could add a third reference implementation (JStudio :)) aside the 2 ones I have listed (current jlab and classic clone).

This reminds me of when Mozilla broke up its monolithic distribution (which even included an HTML editor) into Firefox, Thunderbird... and regained market share at that time.

Also, a hidden gem is located in the examples folder, which is far too invisible and not marketed (we link to them sometimes when users ask questions on kernels...). Those examples look to me like the premise of what I would love JupyterLab to be: a reusable set of components to build secure, scalable and collaborative data driven applications in the cloud.

@echarles
Member

Typing when thinking... I feel that the current way to integrate more and more external extensions into the default distribution has its limits.

@blink1073
Contributor

I feel that the current way to integrate more and more external extensions into the default distribution has its limits.

Yes, we will need a clear strategy about what is in core once it is easier to install extensions. We should be following the data from things like the 2015 UX Study.

@krassowski
Member

Should we have a new UX study? Also, please have a look at the study "What’s Wrong with Computational Notebooks? Pain Points, Needs, and Design Opportunities" from Oregon State University, Microsoft and University of Tennessee-Knoxville. There are points (in interpretation and methodology) I disagree with, but still a valuable contribution.

I think about IDE-like features a lot. I see that there are different paths ahead but I will be working to make the LSP integration as reliable and feature-full as possible (we could certainly use a hand!). Then, there is the ease of installation, which may actually be a challenge: currently, we require node, a Python extension, a lab extension, and the servers; all have to be properly installed (most need to be installed from the command line!) and novices trip on virtual environment issues (e.g. local/global installations, using conda without fully comprehending it, etc). Obviously module federation will be a step forward, but it will not be a perfect solution for the LSP extension because some servers still require node...

@meeseeksmachine

This issue has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/cripplingly-slow-ui-am-i-the-only-one/5351/7

@matthew-brett

In case it's useful, a summary of recent StackOverflow IDE rankings. It seems that VSCode is eating quite a few lunches, but particularly Sublime Text, Atom, Eclipse and Notepad++.

@blink1073
Contributor

The "Tools and features for Python development" section of the above survey has a lot of overlap with our 2015 UX Survey results.

@blink1073
Contributor

Also, thanks for putting that together @matthew-brett!

@choldgraf

a couple of thoughts (thanks @lresende for kicking off this discussion)

  • Differentiate from, don't compete with, VSCode. IMO JupyterLab is currently thrown around in the same category as VSCode. I see why people do this, but I don't think JupyterLab's aspiration should be to "beat" VSCode. Instead, I'd put a lot of thinking into why JupyterLab is a different kind of tool than VSCode / atom / sublime / etc, and why that kind of tool is better for data workflows than "a developer IDE that also does data stuff". If we cannot come up with an answer for why a data-focused interface is obviously better for data workflows, then JupyterLab will have a hard time competing. What are the unique aspects of data science that do not make sense in a traditional IDE? How can JupyterLab be the best at those things?

    • A related point to this - I don't think that VSCode should be shunned as an interface that nobody in the Jupyter community develops for. VSCode will continue growing its notebook support, and IMO it's in the interests of the Jupyter community to be in dialog with the people (whether they're MS employees or community volunteers) growing notebook experiences there. This will particularly be important in terms of standards - otherwise we'll end up in a future where there are VSCode-specific notebooks, JupyterLab-specific notebooks, Colab-specific notebooks, etc.
  • Find ways to make the UI flexibly opinionated. Right now JupyterLab gives you a very unopinionated UI but with a lot of flexibility. You get standard UI elements but not a lot of decisions are made for you - the user is expected to open a notebook / text editor / etc on their own. On the contrary, the notebook UI was extremely opinionated - you got what you got and that's it (JQuery extension hacks notwithstanding). I think people like opinionated UIs when they're focused for the right workflow. The trick is giving people control over which UI they see for a particular use-case. JupyterLab should make it easier to fire up more opinionated UIs (one of these needs to be a dedicated notebook UI) with the same ease as typing jupyter notebook. These more workflow-specific UIs should also work with JupyterLab extensions so that you can really tailor the experience for the use-case. Sure, it's possible for me to manually install the right extensions, arrange the windows in the right places, and save the "workspace" for use later. Most people will never do this. Instead people need to be able to do something like pip install jupyterlab-studiointerface and have this install a bunch of extensions and add a button to "activate" this interface within jupyterlab as well as a CLI entry point to activate it. Once activated, users should be able to switch back to other JupyterLab interfaces easily and extremely quickly.

    • As an aside: I also think of JupyterLab as a different kind of tool from the notebook UI. The Notebook UI feels like an end-product for a particular workflow, while JupyterLab feels more like a "UI sandbox" that is composable and configurable. I think the sandbox is great, but we need that "end product" option as well.
  • Improve the ease and fluidity of developing extensions. On that last point - developing extensions is how the community can participate in the JupyterLab ecosystem without being core developers. It allows the community to explore a much bigger design space and leverage the flexibility of the lab interface. However, in my experience developing extensions is cumbersome and confusing for the kinds of users that are in Jupyter. These folks are not Javascript developers. They could learn/hack together some JQuery on the old notebook interface, but developing an extension now requires an understanding of javascript frameworks and modern web workflows. There are definitely benefits to this, but it comes at the cost of making it much harder for the "casual enthusiast" user to become an extension developer. There is a lot of creativity and potential in that casual enthusiast community that I think we are missing out on! JupyterLab should prioritize making "developing on top of JupyterLab" to be something that is as easy, fun, agile, and creative as possible. In parallel, it should put effort into creating a lot of introductory-level material for how to develop in the JupyterLab ecosystem.

I think that if JupyterLab makes progress on both of the above pieces, it would make a huge difference. Incentivize (and make it easier) for people to build extensions on top of the composability of JupyterLab, then make it easier for users to fluidly use those extensions / interfaces. Develop a few "core" interface configurations for JupyterLab (e.g., "notebook UI", "RStudio-like UI", "Dashboard UI") and empower a developer community to figure out what other possibilities are out there.

I'll leave it there for now because I don't want to wall-of-text y'all ;-)

ps: I recommend folks check out some of the conversation in this discourse thread about jupyterlab / classic notebook UI.

In that thread someone brought up Blender as another tool to think of for inspiration. I found the Blender video they linked to give some interesting thoughts about UI design and how it can benefit complex workflows.

@saulshanabrook
Member

I really appreciate this conversation starting up!

Hot Take

We should figure out how to move on top of VS Code / Theia.

VS Code / Theia

Theia is an Eclipse project that focuses on being open source and in your browser, but building on VS Code.

I believe the TypeFox folks were originally basing things off JupyterLab/Phosphor and have now moved to VS Code/created Theia. See eclipse-theia/theia#6501.

Why?

Because then we could focus more of our energy on the things that makes Jupyter special, like data science, education, pushing the boundaries of interactive computing, and collaboration. And we could leave the parts that are about making an IDE experience (debugging, window layout, extension infrastructure, performance) to the teams with the history and drive and mission to do that, like Microsoft and Eclipse.

How?

In this imaginary future, "JupyterLab" would simply be a certain distribution of Theia with certain extensions and settings included by default.

Why not?

One concern that some people have raised about moving to VS Code is related to the single stakeholder nature in which it is developed and governed.

It would be interesting to hear from the TypeFox and Eclipse folks how they have felt engaging with this and how they see that space.

Also, there isn't notebook support yet in Theia (eclipse-theia/theia#3186 ). The live share feature of VS code is also not present, since it isn't open source (eclipse-theia/theia#2842).

I am curious from other people who use JupyterLab, if there are other major issues from moving to this approach and obviously from the other maintainers how they would feel about it. Or if this has come up in the big institutional adopters (and funders) of JupyterLab, like Bloomberg and JPM. Have you talked internally about switching to VS Code/Theia instead of JupyterLab?

I am also cognizant of the fact that JupyterLab, to me, is a means to an end of exploratory computation and a community of people that I care about. As someone who helps maintain this library, I would like to find a path forward that lets us leverage our effort and time most effectively, by collaborating with other groups.

@bollwyvl

The idea of "Lab" (vs Studio, or all the other things we tossed out when naming the project) was that we're trying to serve individual scientific computing users, teams, and the global scientific community. That means supporting the scientific method from education up to scientific publishing. The fact that a scientist must increasingly write code in between is relevant, and being able to support a scientist doing that to the extent needed is indeed among the challenges that will cause otherwise-enthusiastic users to abandon ship.

But, when I'm wearing my developer hat: my experience has been that Lab is a more malleable substrate than code. Teams can ship more customized things on it. I don't think doing static (a la jyve) would be possible with the Code baseline.

With my desktop support hat on: we've almost gotten users out of the nodejs woods on 3.x... even some of the language servers can be webpacked (during a build), despite the vscode upstream having wontfixed the dependency issues around it. The walled garden of extensions isn't monetized yet, but there's no telling, and downstreams like theia and vscodium can't actually support all of the extensions, even though they share an upstream. And who even knows if vscode works in Firefox?

Some areas I am less negative about:

  • computational publishing
    • we're slowly making progress on generating publication-grade PDF
    • publishing a workspace as a statically-hostable, yet still interactive site
      • jyve, stdjs, etc. with pyodide, etc. is going to be game-changing for a simplicity of deployment, especially in education
      • having a "dev mode" browser kernel for lab itself while it's running "normally" will make it far easier to understand how to do extension development
      • heck, wasm kernels on the backend are going to eventually make a lot of sense: no k8s
  • get better at bringing new interactive computing specifications, formally, into the Jupyter family
    • DAP was needfully heavy, but the kernel-based approach is probably going to work out great for LSP, and not require anything more than reserving a comm name
  • create more data-driven ways to build 80%-100% extensions, themes, syntax, commands, etc
    • doing this over JSON schema means basically all kernel languages can play
    • leave a few outs for handling that one bit of actual ts(x) you need, but leverage the insights we get back from the tooling
  • absolutely first-rate integration with hub
    • a major coup would be getting a Lumino frontend into Hub, Binder, etc. and starting to share more code, theming, etc.
    • we'd have to get the initial load time way down
  • outreach to open source projects that enable science, business, and education teams
    • jitsi, drawio, gitlab, zulip are all offerings that respect privacy

@Carreau

Carreau commented Jul 26, 2020

  • Find ways to make the UI flexibly opinionated.

A real clone of the classic notebook with 100% conformance on key-bindings (users I meet often come back with that issue).

Yes absolutely yes for both of those (except for the rewrite part). VS Code is itself a really good example: it is first and foremost a text editor, and it is way easier to recommend as it does just that already super well, and on top of that it can blossom into a full-featured IDE once you are ready. And @matthew-brett's comment shows it well:

VSCode is eating quite a few lunches, but particularly Sublime Text, Atom, Eclipse and Notepad++.

Eclipse is for sure an IDE, but Sublime, Notepad++ and (to a lesser extent) Atom are way closer to a good text editor.

I would try to get a version of JLab which is focused on one document per browser tab, with a view much closer to the classic notebook, which eventually can let users grow into full Lab, instead of a rewrite.

I think also that clear repackaging/branding of JupyterLab with a set of simple preconfigurations could go a long way to form smaller dedicated communities. VSCode is really attractive to developers, but we have a bunch of data scientists in many domains.

It would be good to pick a few extensions, default configurations and specific CSS themes that target a number of use cases, say for example

  • Astrophysics,
  • ML
  • Genomics,
  • Earth Science
  • Education
  • ...

With their googlable distinct names.

@bollwyvl

@Carreau Right, going beyond just an installer, here are examples of cross-platform, one-click, non-root installers centered around a Lab configured with different goals, with the full required compute stack underneath:

  • robotlab: a Lab focused on writing and executing acceptance tests
    • a lot of space taken up by OpenCV, Selenium, Firefox, and some super gnarly windows stuff
  • GTCOARLab (wip): a Lab focused on quantifying the underlying mathematics of reinforcement learning models
    • tensorflow and torch are heavy
    • also is going to have a full latex stack

As a first go-round, officially offering one of these kitted out for training lab developers would probably make a lot of sense: it typically takes at most 20 minutes to get up and running once you have the install media on the user's box.

@choldgraf

There is also flybrainlab which uses a customized JupyterLab w/ lots of extensions etc designed for computational / systems neuroscience.

@matthew-brett

When I'm teaching my students, I tell them they will soon need to follow the advice in "The Pragmatic Programmer"

The editor should be an extension of your hand; make sure your editor is configurable, extensible, and programmable.

https://bic-berkeley.github.io/psych-214-fall-2016/choosing_editor.html

I feel I will have served them poorly, if they take a long time to move out of the Jupyter Notebook interface.

The question I have is - is JupyterLab designed to be that editor, that is "an extension of your hand"? Is the intention that many people will use JupyterLab as their primary editor for text as well as notebooks? If so, how many people use it that way? If not, then what role should it play?

@goanpeca
Member

I think it is important to make a distinction between the vision for developers and for end-users. Probably the user-oriented vision should take the front seat in this discussion.

@matthew-brett

matthew-brett commented Jul 27, 2020

@goanpeca - my argument would be that it would be a mistake to concentrate on the "user" at the expense of the "developer" that the user should become, as they continue to learn. If only because, I believe it would be a dangerous place to position JupyterLab, in between the absolute beginner (who is relatively well served by the traditional notebook interface) and someone who wants to become proficient in using scientific code (who should and will start to learn VSCode, or PyCharm or Atom or Vim). The obvious risk is that students go straight from traditional notebook to VSCode or similar, and miss out JupyterLab in the middle.

@matthew-brett

matthew-brett commented Jul 27, 2020

As a dreadful warning as to what can happen to users who get stuck in the traditional notebook interface - see the "Study 3: interviews with data analysts" section in https://dl.acm.org/doi/abs/10.1145/3173574.3173606 .

@choldgraf

choldgraf commented Jul 27, 2020

I think it all comes back to the question: "Is it possible for a highly-flexible and extensible development environment to also be a first-class data science environment?" If the answer is fundamentally "yes" then it will be hard to beat out VSCode or whatever projects w/ equal amounts of resources replace it, and we should refocus efforts from building an entirely separate parallel web framework to instead trying to make sure the web IDE that wins is the one that has a good open governance and community structure.

If the answer is "no, data science is a different kind of thing from development and warrants its own interface" then I think that's where JupyterLab should position itself. If this is the answer, then I'd urge the JupyterLab community to think about how to encourage and facilitate a flexible transition back-and-forth between JupyterLab and <IDE of user's choice>, with the assumption that we want people to use the best tool for the right job, and JupyterLab will never be the best tool for development. Then, focus the JupyterLab experience in a way that really highlights the "data stuff" and makes it clear why it's the best choice for that workflow, while <IDE of your choice> is the best choice for doing development.

@matthew-brett

@choldgraf - nice summary. So, to help make that as specific as possible - what aspects of data science workflow can one not easily cover with:

  • Current VSCode?
  • Atom + Hydrogen + Jupytext?

And - once these are clear - how difficult would it be to extend these systems to cover the missing cases?

@smackesey

smackesey commented Jul 27, 2020

I see that this is a team repo so I'm not 100% sure I should be posting, but @saulshanabrook on the Discourse forum suggested that I contribute to this discussion so here's my two cents. The below take is certainly the product of my own biases and not based on a study of JupyterLab's existing userbase. But I haven't seen the general idea proposed, so I'd like to throw it into the arena. For background, I work with JupyterLab mainly in the context of computational neuroscience. I'd also like to preface this by thanking you guys for all your hard work on the platform to date.

I would like to see JupyterLab become the hacker's data science environment. I believe that this niche is (a) open; (b) unlikely to be contested by any of JLab's current competitors; (c) achievable. On top of that, it would be awesome and potentially revolutionize data science work.

In the world of software, there are two broad categories of product:

Hacker-focused. These products are typically extremely customizable, powerful, and performant. But the interface is often complex, lacking in bells and whistles, and somewhat intimidating. Examples include Vim/Emacs text editors, Linux, the mutt email client, Ranger file browser, and many other command line programs. They are almost always open-source.

Average-consumer/enterprise-focused. These products are "friendlier" (at first glance, anyway). They are often "What-You-See-Is-What-You-Get". They have shiny GUIs, animations, and lots of nested menus. But they are often irritatingly inflexible, suffer from limiting and inefficient interfaces, and have maddeningly poor performance due to unnecessary visual effects (compare text-based Ranger to MacOS Finder...). Examples include Microsoft Office Suite, Windows/MacOS, Gmail, etc. The vast majority of the good ones are developed by business-- open-source ones usually have all the flaws listed above but worse, and are uglier to boot.

In the rapidly developing world of general-purpose data science IDEs, there are not yet any clear winners or losers in either space. Indeed, the spaces have not yet been differentiated. Jupyter Notebook (and by proxy JupyterLab) gained significant market share as one of the first apps in this world, but now it's starting to face real competition. And there is every reason to expect that this particular software category will follow the same trend as others-- the "user-friendly", "for-the-masses" throne will be claimed by something corporate-backed, like VSCode. JupyterLab won't be able to compete in this space. But, we can also expect that Microsoft won't pursue the "hacker" niche, for the same reasons that corporations rarely pursue this niche in other domains.

So there is a throne waiting to be claimed for the Hacker-focused VIM/EMACS of data science environments. And JLab is well-positioned to pursue this throne. It already has a successful brand, large user base, working product, and contributor community. But, I think there would need to be some major changes for JLab to go this route.

The two most important things are to embrace customizability and focus on core performance. Right now JLab has issues in both these areas. The performance is often weak (as discussed in this Discourse thread), and the customizability and docs regarding extensions are pretty weak.

Finally, here are some specific ideas for steps in the Hacker-focused direction:

  • Support embedding of external editors. NeoVim is designed to be embedded in third-party GUIs. I'm a Vim addict/fanboy so I'm biased, but it would be a killer feature to use a real underlying NeoVim instance as an editor. CodeMirror is just not very good for heavy lifting. This would also neutralize one of VSCode's major advantages (people can use a power editor they already use for other things for notebooks).
  • Decouple notebooks from files. Notebooks should be like buffers in a text editor-- often paired with a file, but not necessarily. This allows quick temporary notebooks, either user or software-generated. Imagine having an interface that allowed you to browse a database and dynamically generate a temporary notebook, all within JupyterLab.
  • Offer at least the option of an initial experience closer to bare metal. When I open JupyterLab now, it looks like a typical GUI program-- tons of menus. It restarts my last session by default, which is annoying since it often means starting up a ton of kernels. I would prefer to launch JupyterLab to a single, empty buffer notebook and a more spartan intermediate interface (of course, you should be able to build on top of the initial "sandbox" to create rich customized environments, as @choldgraf and others suggested above).
  • Support alternative notebook renderers and further rich customization of how cells are displayed. The default display of a linear vertically-oriented series of input/output is OK for most tasks but not always ideal. Sometimes it would be nice to have input/output cells side-by-side.
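
To make the "decouple notebooks from files" idea above a bit more concrete, here is a minimal Python sketch of a notebook held as an in-memory buffer that is only optionally paired with a path. `NotebookBuffer` and its methods are hypothetical names for illustration, not an existing JupyterLab or nbformat API; the only thing borrowed from Jupyter is the general shape of the version 4 notebook JSON, and a real implementation would presumably live in the frontend rather than in Python.

```python
import json
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class NotebookBuffer:
    """Hypothetical in-memory notebook; `path` is None for a scratch buffer."""

    cells: list = field(default_factory=list)
    path: Optional[Path] = None

    def add_code(self, source: str) -> None:
        # Code cell laid out like a notebook-format v4 code cell.
        self.cells.append({
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": source,
        })

    def to_json(self) -> str:
        # Serialize the buffer as a v4-style notebook document.
        doc = {
            "cells": self.cells,
            "metadata": {},
            "nbformat": 4,
            "nbformat_minor": 4,
        }
        return json.dumps(doc, indent=1)

    def save(self, path: Optional[Path] = None) -> Path:
        # Pairing with a file happens only when the user asks for it.
        target = path or self.path
        if target is None:
            raise ValueError("scratch buffer has no path; pass one to save()")
        target = Path(target)
        target.write_text(self.to_json(), encoding="utf-8")
        self.path = target
        return target


if __name__ == "__main__":
    scratch = NotebookBuffer()          # e.g. generated by a database browser
    scratch.add_code("SELECT 1")
    print(scratch.path is None)         # still not backed by any file
```

The point of the design is that nothing touches disk until `save()` is called with a path, so temporary or software-generated notebooks cost nothing to create or discard.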

I could list more but I'll stop there. What all of the above have in common is that they increase the flexibility of JupyterLab. I believe that there are amazing exploratory data analysis use cases and interface motifs waiting to be discovered. They just have to be enabled by a suitable platform. JLab is both (a) relatively well-positioned to become that platform; (b) unlikely to face much competition in this niche from VSCode et al.

@ellisonbg
Contributor

Lots to process here. Will comment more on other aspects, but will start with this:

I believe that VSCode is incompatible with the open source vision of Jupyter, and not a suitable foundation for it. Jupyter has always been community driven and multistakeholder. VSCode is controlled by a single corporate entity. This is manifested in the following ways: 1) The VSCode Marketplace doesn't allow third party applications to use it, which is why Theia and code-server have built and maintained their own extension services, 2) key parts of the code base remain proprietary (real time collaboration, web based versions such as CodeSpaces).

More importantly, the roadmap for JupyterLab (and any other part of Jupyter) should be focused on building things for actual Jupyter users (lab, classic, hub, ipython, etc.). In this org, I believe we should take time to understand what JupyterLab users are doing, what their pain points are, what their needs are, etc. - and use that to drive the roadmap. That was how I read the initial post of @lresende and I believe that is what we should focus on in this thread. It is this focus on our users that led us to build JupyterLab in the first place, and is driving much of the work for 3.0, including the improved extension system, the debugger, and the classic notebook mode.

@ellisonbg
Contributor

To practice what I preach, here are the main user focused things that I view as being roadmap worthy:

  • A mode that provides the effective experience of the classic notebook (doesn't have to be an exact copy, but should be close enough).
  • More seamless extension installation, search, discovery, management.
  • Real-time collaboration across all document types, with built in commenting and annotation.
  • Improved auto-completion (likely through LSP integration of some sort).
  • Improved large notebook rendering performance.
  • Accessibility.
  • Internationalization.
  • Move the command palette to being a modal interface rather than the L panel.
  • Supercharge the data grid.
  • Enable in-application development of extensions.

@ellisonbg
Contributor

One way that I have started to prioritize issues in a user-centered manner is to sort them by reaction (comment or emoji):

https://github.com/jupyterlab/jupyterlab/issues?q=is%3Aissue+is%3Aopen+sort%3Areactions

Users would benefit massively if we started at the top and went down the list. The same sorting on the classic notebook also gives a similarly useful signal:

https://github.com/jupyter/notebook/issues?q=is%3Aissue+is%3Aopen+sort%3Areactions

This isn't perfect, but is a great start.
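As a rough sketch of how the same ordering could be pulled programmatically: GitHub's issue search endpoint accepts `reactions` as a sort key, so a request URL can be built as below. The helper name `reaction_sorted_search_url` is made up for illustration, and no request is actually sent.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://api.github.com/search/issues"


def reaction_sorted_search_url(repo: str, per_page: int = 20) -> str:
    """Build a search URL for open issues in `repo`, most-reacted first.

    `repo` is an "owner/name" string; the helper itself is illustrative.
    """
    params = {
        "q": f"repo:{repo} is:issue is:open",
        "sort": "reactions",
        "order": "desc",
        "per_page": per_page,
    }
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


# Print the URLs for the two repositories linked above.
for repo in ("jupyterlab/jupyterlab", "jupyter/notebook"):
    print(reaction_sorted_search_url(repo))
```

Fetching and paginating the results (and handling rate limits) is left out on purpose; the point is only that the sort order used in the links above is also available to scripts.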

@meeseeksmachine

This issue has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/building-uis-on-top-of-theia/5410/1

@williamstein

williamstein commented Aug 10, 2020

Here's a data point related to your questions from one "project-Jupyter consumer" (me):

is there a need to set up a rock-solid single point of reference for Jupyter protocols that are managed by Jupyter governance process and that are clearly the single point of reference for them?

Less than two years ago I did a re-implementation of the entire Jupyter stack (both the frontend and the backend server -- everything but the kernels). I spent a massive amount of time staring at https://jupyter-client.readthedocs.io/en/stable/messaging.html

That page, along with playing around with specific kernels, was mostly sufficient to answer all of my questions, and when it wasn't I asked @jasongrout, who I think then improved that page to answer my questions. The same thing happened with nbformat, where in at least one case things were ambiguous so we submitted PRs for multiple options, and one of them got chosen by an official Jupyter dev, and then we implemented that one.
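For readers who have not looked at the messaging page, here is a minimal sketch of the kind of message it specifies: the dict form of an `execute_request`, with the `header`, `parent_header`, `metadata` and `content` parts. The function name `make_execute_request` and the `"5.3"` version string are assumptions made for the example; signing and ZeroMQ framing are not shown.

```python
import uuid
from datetime import datetime, timezone


def make_execute_request(code: str, session: str, username: str = "client") -> dict:
    """Assemble an `execute_request` message as a plain dict (illustrative helper)."""
    header = {
        "msg_id": uuid.uuid4().hex,          # unique per message
        "session": session,                  # unique per client session
        "username": username,
        "date": datetime.now(timezone.utc).isoformat(),
        "msg_type": "execute_request",
        "version": "5.3",                    # protocol version, assumed here
    }
    content = {
        "code": code,
        "silent": False,
        "store_history": True,
        "user_expressions": {},
        "allow_stdin": False,
        "stop_on_error": True,
    }
    return {
        "header": header,
        "parent_header": {},   # empty: this message is not a reply
        "metadata": {},
        "content": content,
        "buffers": [],
    }


msg = make_execute_request("1 + 1", session=uuid.uuid4().hex)
print(msg["header"]["msg_type"], msg["content"]["code"])
```

A re-implementation has to get many more message types right than this one, which is why a single authoritative reference matters so much.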

There's also a list of commands in Jupyter classic that you get by hitting "h":

(etc.)

For that, I had to just guess/reverse engineer what they all actually did and implement them. Sometimes there are subtle semantics to these commands, which can't be conveyed in 3-4 words. There are also some Jupyter clients that maybe implement only a fairly small subset of these commands. It would be really useful if there was a document similar to the ones linked to above that listed more details about all the commands, the official keyboard shortcuts, and had more space to resolve any ambiguity in their definitions. Maybe someday somebody could even certify a Jupyter client as "implements X% of the commands with the same shortcuts" or something, and it would also be a useful checklist for developers like me. (I just realized that all commands have keyboard shortcuts now so made this issue.)
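To make the "implements X% of the commands with the same shortcuts" idea concrete, here is a small sketch of how such a score could be computed. The command identifiers are invented for the example (no official list is implied); the key bindings in `REFERENCE` are the usual classic-notebook command-mode defaults.

```python
# Illustrative reference table: command id -> command-mode shortcut.
# The ids are hypothetical; only the key bindings come from Jupyter classic.
REFERENCE = {
    "insert-cell-above": "a",
    "insert-cell-below": "b",
    "delete-cell": "d,d",
    "run-cell-and-select-next": "shift-enter",
    "change-cell-to-markdown": "m",
    "change-cell-to-code": "y",
}


def shortcut_coverage(reference: dict, client: dict) -> dict:
    """Compare a client's command table against a reference table."""
    implemented = [cmd for cmd in reference if cmd in client]
    same = [cmd for cmd in implemented if client[cmd] == reference[cmd]]
    total = len(reference)
    return {
        "implemented_pct": 100.0 * len(implemented) / total,
        "same_shortcut_pct": 100.0 * len(same) / total,
        "missing": sorted(set(reference) - set(client)),
    }


# A made-up client that implements four commands, one with a different key.
example_client = {
    "insert-cell-above": "a",
    "insert-cell-below": "b",
    "delete-cell": "x",
    "run-cell-and-select-next": "shift-enter",
}
print(shortcut_coverage(REFERENCE, example_client))
```

A real checklist would also need the written semantics of each command, which is exactly the part that cannot be captured in a table like this.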

There are other really subtle questions about semantics, e.g., what to do with certain ANSI escape codes in output -- when we hit these in CoCalc we look at what Jupyter classic and JupyterLab do, and sometimes find they make opposite choices. It would be nice to just have "this is the official choice". It often doesn't matter to me at all which choice is official, just that there is one. My goal is making life simple and predictable for users; I rarely care what color the bikeshed is.
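As one example of a choice a client has to make, the sketch below simply strips CSI escape sequences (colors, cursor movement, line erasure) from output text. This is only one possible policy, not what Jupyter classic or JupyterLab actually do; the name `strip_csi` is made up for the example.

```python
import re

# CSI sequence: ESC [ then parameter bytes, intermediate bytes, and a final byte.
# Covers e.g. "\x1b[31m" (set color) and "\x1b[2K" (erase line).
CSI_PATTERN = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def strip_csi(text: str) -> str:
    """Remove CSI escape sequences and keep the remaining text unchanged."""
    return CSI_PATTERN.sub("", text)


print(strip_csi("\x1b[31merror\x1b[0m: something failed"))
```

Another client might render colors but ignore cursor movement, and a third might emulate a terminal fully; without an official answer, all three are defensible.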

Edit: Above, I was actually thinking of the full command list (not just the ones with keyboard shortcuts):

image

@krassowski
Member

@psychemedia forgive my naive questions, but I would like to understand your point better. Could you please clarify this fragment:

things like linking and brushing to allow different tools to work in concert against some shared state. Whilst it may be satisfying to develop JupyterLab components in a JupyterLab IDE, is this strictly necessary?

  • what is linking and brushing?
  • what has it to do with JupyterLab components in a JupyterLab IDE?

Many commentators have complained that Jupyter notebook is not effective as an IDE eg for developing packages that run usefully in Jupyter notebooks (eg something like pandas), and they may be right. So why should JupyterLab have to work as an IDE for developers who are writing code packages to create extensions, as well as providing an environment for users who want to combine and use tool compositions made up from those extensions at the user level?

I do not understand this argument. I do not understand what it has to do with the UI being composable. I do not understand the assumption that the goal is to enable writing packages like pandas within JupyterLab. It might be that this paragraph is just grammatically complex and uses multiple terms which can be viewed as synonymous or not... but I will try to address some of your thoughts, assuming that you meant different things above:

Assuming that you suggest that JupyterLab should not aim for the use case of developing Python packages like pandas, or JupyterLab extensions, from within JupyterLab, and therefore does not need to be an IDE, I just wanted to point out the following:

IDE-like features (linting/diagnostics, debugging, refactoring) are as important for data analysis as for tool development

I think that JupyterLab does not need to go so far as to replace the best state-of-the-art IDE (say PyCharm for Python), and even if it does, it will not do that in order to cater to package (e.g. pandas) developers (as you seem to suggest). Instead, in my view, there is a sufficient overlap between the features desired from:

  • a state-of-the-art IDE which would cater to the tool developers
  • the target audience of JupyterLab, i.e. a subgroup of computational biologists, physicists and data scientists who, in order to maintain a high-quality codebase or make the transition from exploration (EDA) to production, request tools such as those provided by state-of-the-art IDEs

to make it worth it to have an ecosystem (JupyterLab) where such features are supported. If we manage to perfect them to the point that the package developers will be happy to migrate - that's great, but it's not the goal in itself. My goal is to make refactoring notebooks a pleasure so that I and others can move faster with analyses; my goal is also to have state-of-the-art diagnostics (like mypy) in the editor where I explore data, to guard me from making silly mistakes or wasting time correcting typos. There is evidence that users do want such features.

Further, assuming that by composable UI you referred to the panel-oriented, customizable drag-and-drop interface:

The modular or panel-based interface as provided by JupyterLab, Theia, RStudio and others is desired by the target group

In my understanding the user interface benefit of JupyterLab over classic notebooks comes from the ability to have easy-to-arrange panels with, for example: Table of Contents, Variable Explorer, Spreadsheet Viewer, Plot Editor, File Editor, and two notebooks side-by-side. This is a well-established use case, and a capability desired by many data analysts; you can find it in Spyder or RStudio, which are state-of-the-art tools for the user base I consider to be the target of this mode of JupyterLab. Even PyCharm offers it as a so-called "scientific mode" (although more akin to RStudio and not as convenient/flexible).

However, assuming that by composable UI you mean that different parties can compose their own "look and feel", with their pre-defined layout and use-case specific panels bundled (e.g. having company documentation and TOC on right sidebar, while Variable Explorer on the left sidebar - because why not):

Supporting the use case of third parties building their own bundle is a good thing

as this appears to be more of a reaction to different companies wanting to do so (and sometimes doing so in "not-so-great" ways), and an attempt to consolidate the community efforts by explicitly supporting such a use case. @psychemedia you probably did not mean that, because it is not specific to catering to a tool-developer audience...

@krassowski
Member

Onto my point on the public relations, the screenshot by @williamstein highlights that the Jupyter website is still at JupyterLab 1.0. Is this deliberate? Where does this website live (like does it have a repository? is there a way to send a suggestion/feedback on it)?

@choldgraf

choldgraf commented Aug 11, 2020

@krassowski re: the website, it exists here: https://github.com/jupyter/jupyter.github.io PRs are always welcome, but there is no person / team tasked with keeping it updated, and (to my knowledge) no official rules that are followed for how to get things merged into it.

@psychemedia

@krassowski My (personal, possibly very quirky) take on JupyterLab composability is:

  • the ability for the user to compose/configure their own layouts;
  • the ability for different components to compose a view from several panels over the same data model.

The "linking and brushing" riff comes from data visualisation, where you may have several distinct visualisations providing multiple views over a dataset at the same time. Linking refers to the way in which the same data element(s) is/are represented in each plot (eg you highlight a data point in one plot or view and it's also highlighted in the others). Brushing refers to the selection of a subset of points in one view and then also highlighting or limiting the other views to the same subset of data. (These can apply across tables as well as graphical visualisations. For example, highlight points in a scatter plot and a linked table view just shows those data rows.)
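A minimal sketch of the pattern, assuming nothing about how JupyterLab itself implements shared models: two views subscribe to one shared selection, and brushing in either place updates both. All class names here (`SharedSelection`, `ScatterView`, `TableView`) are hypothetical.

```python
from typing import Callable, Iterable, List, Set


class SharedSelection:
    """The shared state: indices of the currently brushed data points."""

    def __init__(self) -> None:
        self._selected: Set[int] = set()
        self._listeners: List[Callable[[Set[int]], None]] = []

    def subscribe(self, listener: Callable[[Set[int]], None]) -> None:
        self._listeners.append(listener)

    def brush(self, indices: Iterable[int]) -> None:
        """Replace the selection and notify every linked view."""
        self._selected = set(indices)
        for listener in self._listeners:
            listener(set(self._selected))


class ScatterView:
    """Highlights the brushed points (linking)."""

    def __init__(self, points, selection: SharedSelection) -> None:
        self.points = points
        self.highlighted: Set[int] = set()
        selection.subscribe(self.on_selection)

    def on_selection(self, selected: Set[int]) -> None:
        self.highlighted = selected


class TableView:
    """Shows only the brushed rows, or every row when nothing is brushed."""

    def __init__(self, rows, selection: SharedSelection) -> None:
        self.rows = rows
        self.visible = list(rows)
        selection.subscribe(self.on_selection)

    def on_selection(self, selected: Set[int]) -> None:
        if selected:
            self.visible = [r for i, r in enumerate(self.rows) if i in selected]
        else:
            self.visible = list(self.rows)


data = [(1.0, 2.0), (2.5, 0.5), (3.0, 4.0)]
selection = SharedSelection()
scatter = ScatterView(data, selection)
table = TableView(data, selection)
selection.brush([0, 2])
print(scatter.highlighted, table.visible)
```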

As I understand it, JupyterLab supports the ability to create composable representations where different panels link to the same data objects or files so you can:

  • have multiple views of the same data object;
  • edit or manipulate the object in one view and that state change is also rendered via the other views.

This just reminded me of linking and brushing style interactions.

(In passing, I feel slightly uncomfortable intruding on this thread because I do not class myself a JupyterLab user. But I do use a lot of Jupyter tools and services and do care about how it might evolve over the next few years. A lot of community discussion is focussed on JupyterLab so if it becomes a dominant driver of Jupyter thinking and becomes the default official UI of the Jupyter project I feel I need to have a sense of that. A lot of cloud services run their own customised Jupyter UIs, and may start to use this as a point of differentiation. Third party tooling, including IDEs, also hooks into Jupyter protocols on the back. The open protocols and the architecture and open governance model are what makes Jupyter interesting and valuable to me, along with the ability for folk to create their own UIs on top of them, not necessarily the "official" UIs. Where JupyterLab is useful for me is as a demonstrator of what is and might be possible elsewhere, and as a driver/testbench/reference model for the protocols, showing how they might be used and providing hints as to where various core developers see them going.)

@ghost

ghost commented Aug 12, 2020

Hey guys. I am going to take an inventory of the ideas in this thread and highlight the themes. From there we can align to market problems + personas + do a comparison across competitive solutions. A bit of pragmatic marketing exercise for roadmapping; my day job.

2015 Survey Data

In the meantime, I spent last night with the 2015 survey data. It was challenging to bin because it was 95% free form text data. It means I can't do things like sklearn feature importance of 'feature category' by 'job role'. For many reasons, I really think we should do another survey. Happy to write it. Here is a gist.

Segmenting Users

Insight:

  • Many of the self-identified "researchers" were not in scientific industries, but rather generic IT/tech.
  • Why so many IDE-related feature requests? Is it the presence of tech/IT firms and early adopting software developers? I expected much more feedback related to actual data science and machine learning. Was the survey sent to the wrong audience or has the tool been adopted by more software engineers than anticipated?
  • JupyterLab fixed many things over past few yrs: kernel stability, hotkey customization, matplotlib, extensibility so people could fix table of contents and code snippets themselves.

[four images]

What prevents Jupyter from your workflow?

=== Dominant themes ===

  • Versioning (git, diff, built-in versioning, code review)
  • IDE-like capabilities (autocomplete, snippets, refactor/ replace, better .txt editing, pycharm, rstudio, sublime, emacs)
  • Hotkeys (vi, cells, text lines)
  • Performance (unstable kernel, Windows)

=== Macro themes ===

  • Navigating long notebooks (toc, complex, slow)
  • Multi-user collaboration (realtime, sharing, groups, projects)
  • Multi-notebook workflows
  • Managing variables (persist, explore)
  • Managing cells (multi-cell selection, rearrange, cross-notebook, easier collapse)
  • Python scripting
  • GUI executable (installer, launcher, desktop app)
  • Prefer ipython <-- maybe an ipython mode in terminal
  • R

=== Micro themes ===

  • More kernels
  • Large datasets
  • Data sources
  • Search (all notebooks)
  • Undo
  • "Convert a notebook to module"
  • Debugging
  • Layout of cells
  • Batch execution
  • Markdown (prefer not json)
  • Export (report, slides)
  • Anti-conda
  • MATLAB
  • matplotlib buggy
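For anyone who wants to repeat this kind of binning, a crude keyword-matching sketch is below. It is not the method used to produce the lists above (that was done by reading the responses); the keyword lists and the sample responses are assumptions, and substring matching will misfire on some answers.

```python
from collections import Counter

# Hypothetical keyword lists, loosely based on the dominant themes above.
THEME_KEYWORDS = {
    "versioning": ("git", "diff", "version", "code review"),
    "ide": ("autocomplete", "snippet", "refactor", "pycharm", "rstudio"),
    "hotkeys": ("hotkey", "keybinding", "vim"),
    "performance": ("slow", "crash", "unstable"),
}


def tag_response(text: str) -> list:
    """Return the sorted list of themes whose keywords appear in `text`."""
    lowered = text.lower()
    return sorted(
        theme
        for theme, keywords in THEME_KEYWORDS.items()
        if any(keyword in lowered for keyword in keywords)
    )


def count_themes(responses) -> Counter:
    """Count how many responses mention each theme."""
    counts = Counter()
    for response in responses:
        counts.update(tag_response(response))
    return counts


# Made-up responses, for illustration only.
sample = [
    "Git diffs of notebooks are unreadable",
    "I miss PyCharm autocomplete and vim hotkeys",
    "Kernel is unstable on Windows",
]
print(count_themes(sample))
```

Multiple-choice questions in a new survey would make this step unnecessary, which is one more argument for running one.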

@ghost

ghost commented Aug 12, 2020

JupyterCon 2020 Survey 📊
During today's JupyterLab weekly meeting, we discussed drafting a new survey that we can align to personas and from which we can derive differentiation. Brian suggested making this survey tied to JupyterCon. We'll draft it this week, review it on next week's JupyterLab call, and then involve other projects.

@ghost

ghost commented Aug 13, 2020

The underlying problem, the reason why people are asking themselves if they are competing with an IDE, is that JupyterLab has nothing to do with the practice of data science itself. It is a fantastic UI for interactive scripting and visualization that happens to be used by data scientists. Let's change that.

[Disclaimer: These are hypotheses from my personal use that need to be validated with data.]

  • Don’t compete with VSCode

    • @choldgraf “positioned as data-science-focused, not software-development-focused”
    • Differentiate from generic software development IDEs by identifying and solving the problems of scientific computing personas.
  • Compete with MLflow (by Databricks aka Spark) and SageMaker Experiments = data science experiment tracking.

    • JupyterML
    • @bollwyvl “enable the scientific method”
    • @ellisonbg “understand what JupyterLab users are doing and what their pain points are”
    • I find myself screenshotting my hyperparameters, model topology, accuracy/ loss graphs, confusion matrices, and feature importance. Even before training takes place there is a lot to keep track of: feature selection, dataset splitting/ cross-validation, preprocessing.
    • However, this is about more than keeping track of keras/tf or fastai/pytorch. What is needed is a general-purpose experiment tracker. For example, pharma does not trust ML in genomics, but there are still PB of data being crunched with computationally heavy statistical algorithms every day.
    • UI pluggable data source connections and cluster connections for big data (S3, Postgres, Spark, Enterprise Gateway related): conn = jupyterlab.datasource(name="s3-parquet")
    • The next steps here are to rank order the problems data scientists face that Jupyter can solve.
  • Compete with RStudio-Shiny by embracing Plotly-Dash or doubling down on Voila = data-centric apps.

    • @choldgraf “opinionated UI”
    • @ellisonbg “enable in-app development of extensions”
    • UI modes force/ encourage the creation of content. Sure, it’s possible to create a Voila notebook or jupyter-plotly, but it doesn’t feel like that is officially supported or what I am supposed to do. So if we want people to build these things, make a dashboard mode and extension mode.
    • In my experience, notebooks are still too deep for 92% of people in an organization. They feel overwhelmed, they don’t read, and the value is lost on them. They need dashboards.
    • There is no publish to Gist/ Binder option in JupyterLab UI.
    • Plotly is positioned to disrupt SAP. They have already created a Python based competitor to Shiny and integrated it into your ecosystem.
  • Compete with arXiv, Nature, Medium, LaTeX = a scientific community.

    • @bollwyvl “publish computational papers… a standard of a global scientific community.”
    • The internet was invented to connect researchers.
    • After finishing the first draft of a paper I wrote for arXiv and spending years in the Jupyter UI, I recently started participating in the Jupyter community to validate an idea I call Galileo. It would be a webapp for sharing and discovering, but not running, notebooks (eventually expanding to other assets like models, extensions, etc.). I’ve learned a lot so far, especially about Binder.
    • Harness the content creation like Web 2.0 and accelerate science. Millions of notebooks are being created every day.
  • Determine minimum viable featureset to compete with Colab, Zepl, Observable.

    • Are data scientists really editing the same notebook at the same time? I don't believe it.
    • Would commenting satisfy the feedback interaction? What percent of editors only leave comments?
    • Would checking-out/ locking/ forking a temp version of a notebook for editing in shared JupyterHub storage satisfy this need equally as well as asynchronous editing? Gross and very Sharepoint-esque, but would probably work.
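To illustrate the `conn = jupyterlab.datasource(name="s3-parquet")` idea from the list above: no such function exists in JupyterLab today, so the sketch below is a stand-alone, hypothetical registry that a UI could populate and a notebook could query by name. The `DataSource` fields and the example URI are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DataSource:
    """A named connection description (hypothetical shape)."""
    name: str
    kind: str                      # e.g. "s3", "postgres", "spark"
    uri: str
    options: Dict[str, str] = field(default_factory=dict)


_REGISTRY: Dict[str, DataSource] = {}


def register(source: DataSource) -> None:
    """Add or replace a data source; a UI panel would call this."""
    _REGISTRY[source.name] = source


def datasource(name: str) -> DataSource:
    """Look up a registered data source by name; notebook code would call this."""
    try:
        return _REGISTRY[name]
    except KeyError:
        known = sorted(_REGISTRY)
        raise LookupError(f"no data source named {name!r}; known: {known}") from None


# Placeholder bucket, for illustration only.
register(DataSource(name="s3-parquet", kind="s3", uri="s3://example-bucket/data/"))
conn = datasource(name="s3-parquet")
print(conn.kind, conn.uri)
```

The sketch only returns a description; opening the actual connection and handling credentials are the hard parts and are not addressed here.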

@williamstein

@LayneSadler :

Determine minimum viable featureset to compete with Colab, Zepl, Observable.
Are data scientists really editing the same notebook at the same time? I don't believe it.

Two minor comments:

  • Colab does not have realtime collaboration. People think it does due to the name and that it once did.
  • There are definitely people collaboratively editing Jupyter notebooks on CoCalc, and when it doesn't work perfectly they get very annoyed...

@ghost

ghost commented Aug 15, 2020

Here are emerging survey goals, categories, and questions for review with Jupyter team. Open commenter access:
https://docs.google.com/document/d/1M-Qod4nByssdZJMlQ1HGr4LkjuZS0MjriD4C93kZg9w/edit?usp=sharing

Based on pain points identified during survey, map problems of personas onto IDE vs data science space:
image
https://docs.google.com/drawings/d/1iGhV4fI8ITfk4KjphUhg1yE0ncvVka5-_z6h9s0vn3I/edit?usp=sharing

After that, weighted ranking exercise to determine which problems to base the roadmap on:
image
https://docs.google.com/drawings/d/168RUCN_cZvDQLQB35z5KCup04QO5mZUb7CMe-KTthxM/edit?usp=sharing
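A sketch of what the weighted ranking step could look like in code. The criteria, weights, and ratings are all placeholders made up for the example; the real ones would come from the survey and the team.

```python
# Hypothetical criteria and weights (weights sum to 1).
WEIGHTS = {"reach": 0.5, "severity": 0.3, "fit": 0.2}


def weighted_score(ratings: dict, weights: dict = WEIGHTS) -> float:
    """Weighted sum of the per-criterion ratings for one problem."""
    return sum(weights[criterion] * ratings[criterion] for criterion in weights)


def rank_problems(problems: dict, weights: dict = WEIGHTS) -> list:
    """Return problem names ordered from highest to lowest weighted score."""
    return sorted(
        problems,
        key=lambda name: weighted_score(problems[name], weights),
        reverse=True,
    )


# Made-up ratings on a 1-5 scale.
problems = {
    "versioning": {"reach": 5, "severity": 4, "fit": 3},
    "collaboration": {"reach": 3, "severity": 4, "fit": 4},
    "large notebooks": {"reach": 4, "severity": 3, "fit": 5},
}
for name in rank_problems(problems):
    print(f"{name}: {weighted_score(problems[name]):.2f}")
```

Changing the weights changes the order, so the weights themselves are worth agreeing on before anyone looks at the scores.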

@jasongrout
Contributor

A few thoughts:

  1. I'm following this thread, and I love hearing different viewpoints. Right now, I'm heads-down in JupyterCon planning and pushing forward on the existing roadmap for JupyterLab 3.0 (in particular, the extension system refactor). Unfortunately, I probably don't have the time/energy to engage here in the thoughtful way the topic deserves until after JupyterLab 3.0 and after (or during?) JupyterCon in October.

  2. Some of the key features of JupyterLab 3.0 address some of the issues brought up here (single document mode, extension system refactor, etc.), and this thread and other conversations with community members make it clear to me that we need to work on a JupyterLab 3.0 blog post and/or other public messaging that helps people understand much more clearly what we already are shipping as of JLab 3.0.

@fcollonval
Member

We are seeing aggressive strategy from IDEs to penetrate and/or disrupt the Jupyter Notebook UI arena, in particular JupyterLab. In parallel, we have seen a lot of the cloud vendors enabling Jupyter Notebooks as a service with enhanced UX capabilities that are kept proprietary and not contributed back to the community.

The latest VSCode release confirms the initial statement:

In the second article, the conclusion confirms the all-in vscode trajectory:

What’s in store?
We will be working diligently to roll over the current fan favorite features such as data viewer and variable explorer as well as newly released functionality such as run-by-line and Gather to ensure feature parity with the existing Python notebooks experience!

@choldgraf

choldgraf commented Aug 18, 2020

I want to note something about the kinds of things we've discussed here, and suggest another direction for people to think about in this issue.

I believe that "vision" thinking is broader, more abstract, and more future-facing than much of the conversation we've had here. (This includes my own comments above.)

Many of the comments here feel more like "tactical prioritization of things to work on next" (e.g. anything that fits in the category of a "user-requested feature"), which is an important and useful thing to do, but is not "vision". I think we've touched on these "big picture" things in a few posts above, but I'd love to hear more from people answering questions like:

  • In 3 years, what is JupyterLab's place in the data science community?
  • How will the world be different because of JupyterLab's existence?
  • What communities does JupyterLab want to have the most impact on?
  • How will those communities be using JupyterLab?
  • How will it compare to / interact with / compete with other major players in this ecosystem?
  • What are its biggest risks? Who are its major competitors?
  • How will its community have grown and sustained itself?
  • What will its role be relative to the broader Jupyter community?
  • What are potential failure modes we want to avoid?

Or put another way, I want to know what the JupyterLab community's 3-year goals and strategy are. Conversations about specific features, tactics for prioritizing, etc should then be able to answer the question "how does this fit in with the strategy?".

As an example, IMO "improve debugging in JupyterLab" is a tactic, not a vision. The vision might be "in 3 years JupyterLab is a top-5 IDE for development" or "JupyterLab provides just enough developer tools to be useful to data scientists". We should make sure that there's clarity on the strategic goals so that time is best spent on the specific things we work toward.

I think there is value in hearing what folks think in this direction and I'd personally be really curious what people have in mind. (though as I mentioned before, I think the ideas in here so far have been really helpful for me to understand where people are coming from)

@isabela-pf
Contributor

isabela-pf commented Oct 26, 2020

Since this was the discussion that prompted work on a community survey, I wanted to ping everyone here about the work happening on #104. Please come review the survey and give your thoughts before the issue closes next week!

@isabela-pf
Contributor

In case you missed it, this discussion did lead to a survey. Participate in the Jupyter survey and please spread the word. Thanks in advance for your feedback!

@meeseeksmachine

This issue has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/jupyter-annual-survey-results/7388/4

@layne-sadler

Here is a notebook that summarizes and plots the aggregate survey results from 1,135 participants (takes a while to render):

https://layne-bucket.s3.amazonaws.com/all_responses.html

Planning on attending this week’s JupyterLab call and was hoping to:

  • Talk through the summary: insights, pain points, use cases, competition.
  • Who can provide git permissions so I can push csv files and notebooks to repo?
  • Talk about ways we can segment users to get even more insight; I can run queries for the team.

@amit1rrr

@layne-sadler @isabela-pf Thanks for sharing the survey results in an easily digestible form. Just wanted to report a couple of small issues in the report (not sure if this is the right place but anyway) -

  • The 7a1, 7b1 & 7c1 graphs are the same, maybe because of the typo below, which appears in a couple of places.

Screenshot 2021-02-17 at 12 21 25 PM

  • (not a biggie but) answers to 20f are ordered differently than all the other answers

Thank you to everyone involved in creating this survey. The quality, range & depth of the questions (and in some places even the answers) are fantastic. Lots of insights to take away for the Jupyter community. Kudos!

@layne-sadler

layne-sadler commented Feb 18, 2021

@amit1rrr thanks so much for catching that. i've addressed both issues, re-uploaded, and changed the summary bullets; this puts data wrangling in the number 2 spot behind viz where it makes more sense. the same link works.

@brianking

We should figure out how to move on top of VS Code / Theia.

@saulshanabrook I'm just checking in and curious if further consideration has been given to this?

@saulshanabrook
Member

@brianking Nope, nothing to report and am not actively working on it, but would be interesting!

@brianking

Ok if there is anything I can do to help with that conversation, let me know.

@goyalyashpal

we have seen a lot of the cloud vendors enabling Jupyter Notebooks as a service with enhanced UX capabilities that are kept proprietary and not contributed back to the community.

@fperez@fosstodon.org at twitter:
Would be wonderful to see @GoogleColab team contribute these improvements back to its open foundations, which are free for all to use within or outside of Google's infrastructure.

It's called @ProjectJupyter, you should give it a try! ;)

doesn't license protect against that?

@williamstein

doesn't license protect against that?

No. First, there is little relationship between the actual source code of Google Colab and the code produced by the Jupyter project. They are almost completely different codebases, with some exceptions (e.g. ipywidgets). Second, the code in the project Jupyter ecosystem is licensed under BSD 3-clause (see https://jupyter.org/governance/projectlicense.html), which allows people to take it or any subset of it, and include it in a closed source project like Colab, as long as things are properly attributed, and there is no requirement to give back.

For the most part, what Colab has done is build something closed source using the protocol and design of Jupyter. There are many other non-BSD licensed projects that have done something similar, including Deepnote (closed source), Noteable (closed source and gone?), CoCalc (AGPL + Commons Clause, my project). There are also projects that look and work like the Jupyter notebook but predate it by years, such as the Sage Notebook (my project).

Like Fernando, I wish there were much more "giving back" in various forms of extensions to Jupyter. I've done little things over the years to try to encourage this, including trying hard to get people from the above projects to go to JupyterLab meetings and at least contribute the high level ideas they've learned, and in fact authors of those projects frequently have done so.

Personally, overall, I think a rich ecosystem of very different implementations of the Jupyter protocol can be good for the overall community and project.
