ENH: Build vega_datasets into altair? #796
I like it!
I don't like it... Personally, I consider Altair a plotting tool (like matplotlib/ggplot2), so it's better to keep the tool separate from the data, especially since vega_datasets is mostly for tutorials and users should be able to get the hang of Altair in 2-3 days. After that, they should be good to go and never need vega_datasets again.
I am probably 50/50; I see both sides.
(In reply to pagpires's comment of Mon, Apr 30, 2018 at 11:28 AM.)
--
Brian E. Granger
Associate Professor of Physics and Data Science
Cal Poly State University, San Luis Obispo
@ellisonbg on Twitter and GitHub
bgranger@calpoly.edu and ellisonbg@gmail.com
IMHO, this step has a big benefit for newbies (examples that are easy to read, copy, and understand) and little cost to experts (a slightly larger package). My view is that any cheap way Altair can welcome new people is worth the price.
I'm +1, in
It's convenient for toy examples. I'm +0 if it increases the size of the package tenfold (as there are many datasets).
Not sure what happens under the hood at
sklearn downloads the data the first time it is accessed and caches it locally. vega_datasets, by contrast, includes a few smaller datasets within the installation, but downloads the rest on demand without any local caching. The current vega_datasets release, including these bundled datasets, is about 200 KB.
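To make the contrast above concrete, here is a minimal sketch of the sklearn-style "download on first access, then cache locally" pattern, using only the standard library. The names `cached_load`, `fetch`, and `cache_dir` are hypothetical (not sklearn or vega_datasets API), and `fetch` stands in for a real HTTP download.

```python
import os
import tempfile
from typing import Callable


def cached_load(name: str, fetch: Callable[[str], bytes], cache_dir: str) -> bytes:
    """Return the raw bytes for dataset `name`, calling `fetch` only on a cache miss.

    `fetch` is a hypothetical stand-in for a network download: the first call
    writes its result to `cache_dir`, and later calls read from disk instead.
    """
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, name)
    if os.path.exists(path):
        # Cache hit: no download needed.
        with open(path, "rb") as f:
            return f.read()
    # Cache miss: download once, then persist the result for next time.
    payload = fetch(name)
    with open(path, "wb") as f:
        f.write(payload)
    return payload


if __name__ == "__main__":
    calls = []

    def fake_fetch(name: str) -> bytes:
        # Records each "download" so we can see it happens only once.
        calls.append(name)
        return b'[{"Horsepower": 130}]'

    with tempfile.TemporaryDirectory() as d:
        first = cached_load("cars.json", fake_fetch, d)
        second = cached_load("cars.json", fake_fetch, d)
        print(first == second, len(calls))  # True 1
```

The on-demand behavior described for vega_datasets corresponds to calling `fetch` on every access for the non-bundled datasets and skipping the disk write.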
OK, then I'm +1, since
Just to be clear: I wouldn't suggest actually bundling vega_datasets into Altair; rather, I'm suggesting making it a hard dependency and importing it by default into Altair's namespace.
I just noticed that you created
One place where tighter integration would make sense, even beyond intro tutorials, is the geographic datasets used for map backgrounds.
If you've decided you want this, I'd be happy to help with the implementation.
I would object to this for the following reasons:
Since the main developers of vega/vega-lite/altair also teach in academia, maybe someone has a sense of how much time students need vega_datasets while learning Altair, and how often they have to come back to it (to get re-familiarized) after they have learned the package? I assume it takes students very little time to learn and very few will come back, but I could be wrong. Also, the case of Altair is a bit different from sklearn because of 3 factors:
This is of course debatable, without a clear answer. I am of the opinion that handlebars for newbies should be a top design priority, especially for a package like this that aims to be used by groups of programmers with less experience and expertise. The journalists I frequently train are often writing Python for the first time when they open a Jupyter Notebook in a class like First Python Notebook. In my opinion, this package has the potential to break through and draw thousands of people into Python. But to do that, it needs to do more than convert matplotlib experts and Python developers. It also needs to draw in people who are writing code for the very first time.

I see this idea as one of many steps that can reduce technical and conceptual hurdles. "What is Vega?" is a question a newbie will likely need to answer just to read even the most basic examples right now. So is "Why are there two different ways to import things?" I'm sure those concepts are obvious to everyone reading this thread, but they are not apparent to a beginner and can be enough to stop someone from adopting the tool. I've seen it time and time again. For that reason, I think the trade-off is worth the price.

There also might be some way to modify
For the time being, I would recommend at least mentioning this in the documentation on the project's front page. The example on that page does not currently work out of the box without an additional
I really appreciate everyone weighing in here. Overall, I think the key points on either side are:

Pros

Cons

With those in mind, how about a compromise: make

```python
# in altair/__init__.py
class VegaDatasetsUnavailable(object):
    def _fail(self):
        raise ImportError("Using datasets in Altair requires installing the vega_datasets package. "
                          "See https://github.com/altair-viz/vega_datasets")

    def __getattr__(self, attr):  # e.g. alt.datasets.cars
        self._fail()

    def __call__(self, *args, **kwargs):  # accept any arguments, unlike __getattr__
        self._fail()


try:
    from vega_datasets import data as datasets
except ImportError:
    datasets = VegaDatasetsUnavailable()
```

Then the hard dependencies of Altair would not change, but it would make the following available to users who do have vega_datasets installed:

```python
import altair as alt
cars = alt.datasets.cars()
alt.Chart(cars).mark_point()  # etc.
```

If

We could then adjust all our "getting started" installation instructions to include installation of vega_datasets (as they probably already should).
Yeah, I'll fix that. Up until recently,

Edit: I fixed this in 6846d79 and pushed a new doc build.
I want to mention this once again, because the documentation is now more complicated than it needs to be:
@sebastianneubauer – I want to avoid recommending that new users install all the dev dependencies (users don't need jinja, sphinx, m2r, docutils, flake8, etc.). Additionally, I think it's much simpler to understand what's going on with
I fully agree; in fact, I like the explicit approach more as well. Sometimes people complain about things being "not convenient enough", but then when things fail they complain about the complexity buried under the convenience layer, about the magic happening below the surface ;-)
The more I think about it, the more I like the compromise I mentioned above, particularly as I develop the Altair tutorial. After PyCon, I'm going to look at implementing this unless people have objections (@ellisonbg – I'd love to hear your thoughts).
I have an implementation of this at #872.
I believe this ticket can be closed, per the resolution in #872.
This is something that @palewire mentioned a while ago, and the idea has been growing on me.
Currently, many of our examples are like this:
What if we were to import vega_datasets into Altair's namespace by default, so instead it would be
The advantage is fewer imports and less boilerplate.
A minor disadvantage is that vega_datasets would become a hard requirement for Altair (unless we did some kind of lazy import mechanism, which would add complication).
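For reference, a lazy import mechanism of the kind mentioned above could look roughly like this. It is a minimal sketch with a hypothetical `LazyModule` proxy (not existing Altair code): the real import is deferred until the first attribute access, so importing the parent package neither fails nor pays the import cost when the optional dependency is absent. Python 3.7+ also allows a module-level `__getattr__` (PEP 562) to achieve a similar effect. Whether this extra machinery is worth it is exactly the trade-off raised here.

```python
import importlib
from types import ModuleType
from typing import Optional


class LazyModule:
    """Proxy that defers `import name` until one of its attributes is first used."""

    def __init__(self, name: str):
        self._name = name
        self._module: Optional[ModuleType] = None

    @property
    def loaded(self) -> bool:
        return self._module is not None

    def _load(self) -> ModuleType:
        if self._module is None:
            # The real import (and any ImportError) happens here, on first use.
            self._module = importlib.import_module(self._name)
        return self._module

    def __getattr__(self, attr):
        # Only reached for attributes not found on the proxy itself, e.g. `proxy.data`.
        if attr in ("_name", "_module"):
            raise AttributeError(attr)  # avoid recursion if __init__ never ran
        return getattr(self._load(), attr)


if __name__ == "__main__":
    lazy_json = LazyModule("json")
    print(lazy_json.loaded)         # False: the proxy has not imported anything yet
    print(lazy_json.dumps([1, 2]))  # [1, 2]
    print(lazy_json.loaded)         # True
```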
A more major disadvantage is that it would obscure the fact that vega_datasets is a separate package rather than part of Altair, which might confuse people.

Thoughts?