Percentage rather than counts #682

Closed
kuchenrolle opened this issue Mar 30, 2018 · 17 comments
Labels
question vega: vega-lite Requires upstream action in `vega-lite`

Comments

@kuchenrolle

I'm trying to create a bar plot aggregating the data points into percentages. So I would like to do something like

plot = alt.Chart(data).mark_bar().encode(
    x = "CategoricalVariable",
    y = "count(*)/sum(count(*))"
)

but for the past few hours I've been unable to find a way to do that.

@jakevdp
Collaborator

jakevdp commented Mar 30, 2018

I'm not aware of any way to do this in vega-lite, and therefore, by extension, in Altair. Perhaps @kanitw or @domoritz would know.

@jakevdp jakevdp added question vega: vega-lite Requires upstream action in `vega-lite` labels Mar 30, 2018
@domoritz
Member

We just added window aggregates in Vega-Lite for this: vega/vega-lite#2488. They are not yet released, though.

@kuchenrolle
Author

Is there a hacky way of doing it now? (:

@ellisonbg
Collaborator

ellisonbg commented Mar 31, 2018 via email

@domoritz
Member

Yes, you can totally precompute the normalized values in Python. In this case there is no need to do it in Vega-Lite itself.
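
A minimal sketch of that pre-computation, assuming one record per observation with a `CategoricalVariable` field (the field name is borrowed from the original question; the sample values and the `category_shares` helper are made up for illustration). It uses only the standard library:

```python
from collections import Counter

# Made-up sample data: one record per observation.
records = [
    {"CategoricalVariable": "a"},
    {"CategoricalVariable": "b"},
    {"CategoricalVariable": "a"},
    {"CategoricalVariable": "c"},
]


def category_shares(rows, field):
    """Return one record per category with its count and its share of the total."""
    counts = Counter(row[field] for row in rows)
    total = sum(counts.values())
    return [
        {field: category, "count": n, "share": n / total}
        for category, n in sorted(counts.items())
    ]


shares = category_shares(records, "CategoricalVariable")
```

The resulting records could then be passed to Altair (for example wrapped in a pandas DataFrame) and plotted with `x="CategoricalVariable:N"` and `y="share:Q"`.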

@jakevdp
Collaborator

jakevdp commented May 2, 2018

Window transforms are now in Altair; you can see an example here: https://github.com/altair-viz/altair/blob/4c344e79ebf619e762d5d220caab1b7a58996ac5/altair/vegalite/v2/examples/percentage_of_total.py

Unfortunately, the JupyterLab renderer does not yet support vega-lite 2.4, so the chart may not show up if you're using that frontend until a future release.

@jakevdp jakevdp closed this as completed May 2, 2018
@gh-owestesson

@jakevdp just came across transform_window - awesome! I am wondering if there's a way to tell the window aggregator to keep certain groups separate - analogous to the detail argument? In the example you linked above, I'd like to compute a different TotalTime for different groups of data points defined by some variable.

@gh-owestesson

I see there is a groupby argument to transform_window - is this where I can specify a grouping?

@jakevdp
Collaborator

jakevdp commented Apr 7, 2020

Yes, groupby is the relevant argument.

Note that since this example was created, Altair added transform_joinaggregate, which is a better way to do percentage of total at this point. We should update the linked example.

@jakevdp
Collaborator

jakevdp commented Apr 7, 2020

Oh, turns out it has already been updated: https://github.com/altair-viz/altair/blob/3541d9531974e6d9d0772e84bc662b96b78f5a2c/altair/examples/percentage_of_total.py#L14-L16

@gh-owestesson

Awesome, thanks @jakevdp

@gh-owestesson

I'm wondering if it's possible to pass Python functions as the aggregator for transform_joinaggregate. Maybe I'm stretching this too far, but I'd like to be able to dynamically compute a posterior on a certain parameter of a model, so that this posterior can be limited to just the points selected by the user.

If I have a dataframe storing a collection of data points replicated for each value of the parameter of interest phi, and a column logLikelihood storing the log-likelihood of that data point under that value of phi, I think it would look something like this:

alt.Chart( likelihoods
         ).transform_joinaggregate(
            groupby = ['phi'], phiLikelihood = 'sum(logLikelihood)'
         ).transform_joinaggregate(
            totalLikelihood = logsumexp(datum.phiLikelihood) 
         ).mark_line().encode(
            x = 'phi',
            y = 'posterior:Q'
         ).transform_calculate( 
            posterior = alt.expr.exp(datum.phiLikelihood - datum.totalLikelihood ))

But I assume passing logsumexp to the aggregator is not allowed and is causing the error ValueError: object arrays are not supported. Is there any way around this, am I just doing this wrong, or am I completely off base?

@jakevdp
Collaborator

jakevdp commented Apr 21, 2020

No, aggregates in the spec are evaluated in the frontend (i.e. JavaScript), so it's not possible to pass arbitrary Python functions as aggregates. You can either pre-compute your aggregations in Python (the backend), or express them via the specification so that they are computed in the frontend renderer.
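
As a sketch of the pre-computation route, the posterior could be computed in plain Python before the data reaches the chart. This assumes a flat prior over phi and rows shaped like the dataframe described above; the helper names `log_sum_exp` and `posterior_by_phi` and the sample values are made up for illustration.

```python
import math
from collections import defaultdict


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a non-empty list of floats."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def posterior_by_phi(rows):
    """Sum log-likelihoods per phi, then normalise across all phi values."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["phi"]] += row["logLikelihood"]
    norm = log_sum_exp(list(totals.values()))
    return [
        {"phi": phi, "posterior": math.exp(ll - norm)}
        for phi, ll in sorted(totals.items())
    ]


# Made-up rows: two data points evaluated at each of two phi values.
rows = [
    {"phi": 0.1, "logLikelihood": math.log(0.5)},
    {"phi": 0.1, "logLikelihood": math.log(0.5)},
    {"phi": 0.9, "logLikelihood": math.log(0.5)},
    {"phi": 0.9, "logLikelihood": math.log(1.5)},
]
posterior = posterior_by_phi(rows)
```

Values computed this way are fixed when the chart is built, so they would not update in response to a selection made in the rendered chart.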

@jakevdp
Collaborator

jakevdp commented Apr 21, 2020

You can see a list of available frontend aggregations here: https://altair-viz.github.io/user_guide/encoding.html#encoding-aggregates

@gh-owestesson

Here's an attempt at expressing logsumexp using the available functions. I'm getting a cryptic error, KeyError: 0, and wondering whether expressions and aggregators can't be chained together in the way that I'm trying to, or if something is just wrong with my syntax. Thanks for any help!

alt.Chart( allLikelihoods
         ).transform_joinaggregate(
            groupby = ['phi'], phiLikelihood = 'sum(logLikelihood)'
         ).transform_joinaggregate(
            totalLikelihood = alt.expr.log(sum(alt.expr.exp(datum.phiLikelihood)))
         ).mark_line().encode(
            x = 'phi',
            y = 'posterior:Q'
         ).transform_calculate( 
            posterior = alt.expr.exp(datum.phiLikelihood - datum.totalLikelihood ))

@jakevdp
Collaborator

jakevdp commented Apr 22, 2020

The issue is alt.expr.log(sum(alt.expr.exp(datum.phiLikelihood)))

sum here is the Python builtin function; it can't be used in a transform expression. Python is trying to evaluate sum(alt.expr.exp(datum.phiLikelihood)), which results in the error you see.

@jakevdp
Collaborator

jakevdp commented Apr 22, 2020

It looks like you want something like this (not tested):

alt.Chart(
    allLikelihoods
).transform_joinaggregate(
    groupby=['phi'], phiLikelihood='sum(logLikelihood)'
).transform_calculate(
    # exp of the summed log-likelihood, i.e. the likelihood for this phi
    logPhi='exp(datum.phiLikelihood)'
).transform_joinaggregate(
    totalLikelihood='sum(logPhi)'
).transform_calculate(
    # fields must be referenced through datum inside expressions
    logTotalLikelihood='log(datum.totalLikelihood)'
)
