Support Vega-Lite's optional encoding types #2584

Closed
joelostblom opened this issue Mar 30, 2022 · 8 comments


@joelostblom
Contributor

Since 4.14, the encoding type is optional in Vega-Lite and is inferred according to some simple heuristics if not given explicitly. Altair raises an error if no type is provided, but maybe we can remove this check now and just let Vega-Lite handle missing types? This could also make errors such as a misspelled data frame column name clearer in Altair (these currently raise the "field specified without type" error).

Example:

import altair as alt

data = alt.Data(values=[{'x': 'A', 'y': 5},
                        {'x': 'B', 'y': 3},
                        {'x': 'C', 'y': 6},
                        {'x': 'D', 'y': 7},
                        {'x': 'E', 'y': 2}])
alt.Chart(data).mark_bar().encode(
    x='x',
    y='y:Q',
)

ValueError: x encoding field is specified without a type; the type cannot be automatically inferred because the data is not specified as a pandas.DataFrame.

However, the Vega-Lite spec is valid and produces a sensible figure in this case:

{
  "config": {"view": {"continuousWidth": 400, "continuousHeight": 300}},
  "data": {
    "values": [
      {"x": "A", "y": 5},
      {"x": "B", "y": 3},
      {"x": "C", "y": 6},
      {"x": "D", "y": 7},
      {"x": "E", "y": 2}
    ]
  },
  "mark": "bar",
  "encoding": {
    "x": {"field": "x"},
    "y": {"field": "y", "type": "quantitative"}
  },
  "$schema": "https://vega.github.io/schema/vega-lite/v5.2.0.json"
}
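For readers unfamiliar with the shorthand, here is a minimal sketch of how a string such as `'y:Q'` maps to the channel definitions in the spec above, and how a bare `'x'` leaves `"type"` out. The helper name `parse_shorthand_sketch` and the `TYPE_CODES` table are hypothetical and written for illustration only; this is not Altair's actual parsing code.

```python
# Hypothetical helper for illustration; not Altair's real implementation.
TYPE_CODES = {
    "Q": "quantitative",
    "N": "nominal",
    "O": "ordinal",
    "T": "temporal",
}


def parse_shorthand_sketch(shorthand):
    """Turn 'field:CODE' into a Vega-Lite-style channel definition."""
    field, sep, code = shorthand.rpartition(":")
    if sep and code in TYPE_CODES:
        return {"field": field, "type": TYPE_CODES[code]}
    # No recognised type code: omit "type" so Vega-Lite can infer it.
    return {"field": shorthand}


encoding = {
    "x": parse_shorthand_sketch("x"),
    "y": parse_shorthand_sketch("y:Q"),
}
print(encoding)
```

The resulting `encoding` dictionary has the same shape as the `"encoding"` block in the spec above: `x` carries only a field name, while `y` carries an explicit type.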


@jakevdp
Collaborator

jakevdp commented Mar 30, 2022

Thanks for raising this! That would be great.

One choice we have to make is whether to continue inferring the dtype from pandas dataframes, or just leave all type inference to Vega-Lite. I lean toward the latter, so that the behavior will be the same regardless of how the data is specified. What do you think?

@joelostblom
Contributor Author

I can see benefits of both approaches, but overall I am leaning towards keeping (and extending) the support for pandas data types. If Vega-Lite would be able to infer quantitative and temporal data, then I would be more in favor of relying on its type inference (vega/vega-lite#8081). Here are my thoughts in more detail:

  1. As you said, it would be nice to have a consistent syntax regardless of the data source. On the other hand, I think the Vega-Lite type inference is still not on par with what Altair does via pandas, particularly since it uses nominal as the default for all non-aggregated fields, which means that a lot of explicit :Q annotations would be needed.

  2. I think it is easier to explain that Altair "understands the data type used in pandas" instead of explaining the default rules in Vega-Lite; especially novices might be somewhat intimidated by this:

    image

  3. With the Vega-Lite type inference, it might be confusing when one needs to be explicit about the data type. Now it is easy: "never" if using pandas. Here I could see an argument for requiring "always regardless of data source" since being explicit about the data types might cause people to think more about what they are trying to visualize, but that would also be slightly less convenient to type out.

  4. I think it would be nice to extend support for Altair data types to also include categorical ordering (my attempt in Represent pandas ordered categoricals as ordinal data #2522), since this would make it even more seamless to use pandas with Altair.
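To make the first two points concrete, here is a deliberately simplified sketch of the kind of default rule being discussed: a field with no type falls back to nominal unless something in the channel definition hints otherwise. The function `guess_default_type` is hypothetical, the order of the checks is an assumption, and Vega-Lite's real rules cover more cases (channel, scale type, sort, and so on), so treat this as an approximation rather than the actual algorithm.

```python
def guess_default_type(channel_def):
    """Approximate the type chosen when "type" is omitted (simplified sketch)."""
    if "type" in channel_def:
        # An explicit type always wins.
        return channel_def["type"]
    if "timeUnit" in channel_def:
        return "temporal"
    if "aggregate" in channel_def or "bin" in channel_def:
        return "quantitative"
    # Assumed fallback, matching the "nominal by default" behaviour noted above.
    return "nominal"


# A numeric column read without any hints would be treated as nominal...
print(guess_default_type({"field": "Horsepower"}))
# ...while the same column with an aggregate would be treated as quantitative.
print(guess_default_type({"field": "Horsepower", "aggregate": "mean"}))
```

Under this approximation, a plain numeric column needs an explicit `:Q` to be drawn on a continuous scale, which is the extra typing mentioned in point 1.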

@joelostblom
Contributor Author

To be clear, I still think it would be a big benefit to support the default Vega-Lite type inference outside of pandas, and I think it would enable us to have a clearer error message for typos in column names when using Altair.
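As a rough illustration of what a clearer typo message could look like, the sketch below checks the field name against the known column names before any type handling and suggests a close match. The function `check_field_sketch` and its message wording are hypothetical; nothing here is taken from Altair's code base.

```python
import difflib


def check_field_sketch(field, columns):
    """Raise a descriptive error if `field` is not among `columns`."""
    if field in columns:
        return
    # Suggest the closest column name, if there is a plausible one.
    close = difflib.get_close_matches(field, columns, n=1)
    hint = f" Did you mean {close[0]!r}?" if close else ""
    raise ValueError(
        f"Field {field!r} was not found in the data columns {list(columns)!r}.{hint}"
    )


try:
    check_field_sketch("Horsepwer", ["Horsepower", "Miles_per_Gallon"])
except ValueError as err:
    print(err)
```

A check like this only works when the column names are known up front, for example with a dataframe; for data loaded from a URL the columns are not available on the Python side.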

@ChristopherDavisUCI
Contributor

I was thinking about this a little and unfortunately I don't see a great option. I tried deleting the part of the Altair code that raises an error if there's no type, and for example using data.cars.url vs data.cars() drastically changes the chart.

cars

@joelostblom
Contributor Author

That's a good point. In your example, it would be difficult to tell what went wrong in the first chart, and it would not be intuitive that a type change is needed when using the URL, since it is not needed when using the dataframe. If we go ahead with a change here, we might need to handle URLs and dataframes differently and still always require types for URLs. That could still be worthwhile if it clears up the error messages.
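One way to picture the "handle URLs and dataframes differently" idea is the sketch below. Everything in it is hypothetical: the function `resolve_missing_type`, the `source_kind` labels, and the choice to pass inline values through untyped are assumptions made for illustration, not a description of how Altair behaves.

```python
def resolve_missing_type(channel_def, source_kind, dataframe_types=None):
    """Decide what to do with a channel definition that may lack a type."""
    if "type" in channel_def:
        return channel_def  # explicit types are always kept as-is
    field = channel_def["field"]
    if source_kind == "dataframe":
        # Types come from the dataframe columns, so a missing column is a typo.
        dataframe_types = dataframe_types or {}
        if field not in dataframe_types:
            raise ValueError(f"Field {field!r} is not a column of the dataframe.")
        return {**channel_def, "type": dataframe_types[field]}
    if source_kind == "url":
        # The data is not visible from Python, so keep requiring a type.
        raise ValueError(
            f"Field {field!r} needs an explicit type because the data is loaded from a URL."
        )
    # Assumed behaviour for inline values: leave inference to Vega-Lite.
    return channel_def


print(resolve_missing_type(
    {"field": "Horsepower"}, "dataframe", {"Horsepower": "quantitative"}
))
```

With this split, a missing type on a dataframe field can be reported as a column-name problem, while URL data keeps the existing requirement for explicit types.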

@ChristopherDavisUCI
Contributor

I just want to make note of two comments by @mattijn that are possibly relevant to this discussion:
#2868 (comment)
and
#2868 (comment)

@dangotbanned
Member

dangotbanned commented Jan 1, 2025

#2584 (comment)

@joelostblom @ChristopherDavisUCI
Have there been any developments that would improve the Vega-Lite inference in this example?

Looking at this issue retrospectively, it seems you explored the topic and the outcome was a worse UX.
I'd lean towards closing the issue as not planned - but wanted to check if there was something I've missed?

@joelostblom
Contributor Author

Given the example @ChristopherDavisUCI showed, it seems that supporting the default types might be more confusing than keeping the error message the way it is, so let's close for now.

@joelostblom closed this as not planned on Jan 10, 2025