Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integration with Plots.jl #256

Closed
bkamins opened this issue Mar 24, 2020 · 28 comments · Fixed by #340
Closed

Integration with Plots.jl #256

bkamins opened this issue Mar 24, 2020 · 28 comments · Fixed by #340

Comments

@bkamins
Copy link
Member

bkamins commented Mar 24, 2020

I am making an issue here, because what I discuss is not released yet.

When we deprecate CategoricalString we should coordinate with Plots.jl so that it is aware of CategoricalValue{String} and properly unwrap it (or maybe even properly unwrap any CategoricalValue), as otherwise a lot of plotting code that was working in the past will be broken.

@nalimilan
Copy link
Member

You mean that the fact that values no longer inherit from AbstractString is going to break Plots.jl? If so, then we should get in touch with them right now, as they can start fixing this using CategoricalValue (or only CategoricalValue{String}), which already exists. But they will probably be reluctant to take a dependency on CategoricalArrays.

@bkamins
Copy link
Member Author

bkamins commented Mar 26, 2020

Yes - it is going to break (actually I think not break but produce deprecation warnings) a lot of code that used to work when plotting data frames with categorical columns that are strings.

But I guess you are right that CategoricalArrays.jl as a dependency for Plots.jl is problematic. However, maybe I was not right and it is not a problem. CategoricalString is AbstractString so I guess then Plots.jl assumes it is a string and does not touch it. CategoricalValue is not AbstractString so maybe Plots.jl in such a case will convert it to string, which will automatically work. Then we have a problem only in "transition period" when CategoricalString is still used by CategoricalArrays.jl but is deprecated (then plotting will produce deprecation warnings as Plots.jl does index into strings). So maybe a "hard" switch to CategoricalValue and dropping CategoricalString immediately would be better.

@daschw - do you have any opinion on this?

@nalimilan
Copy link
Member

I don't think it should print deprecation warnings, as AFAICT Plots.jl doesn't use CategoricalString at all. But if it relies indirectly on the fact that CategoricalString <: AbstractString, then user code will break without a deprecation.

@bkamins
Copy link
Member Author

bkamins commented Mar 26, 2020

Yes it relies on CategoricalString <: AbstractString so for example it does things like:

julia> x = categorical(["abc"])
1-element CategoricalArray{String,1,UInt32}:
 "abc"

julia> first(x[1])
┌ Warning: `iterate(x::CategoricalString)` is deprecated, use `iterate(String(x))` instead.

and if x was passed to plot then such things happen. When you do:

julia> using Plots

julia> plot(1, x)

you get like 10 warnings now as:

julia> x[1]
CategoricalString{UInt32} "abc"

is this clear now where the problem is? (still maybe nothing needs to be done if
x = categorical(["abc"]) will produce CategoricalValue{String} elements)

@daschw
Copy link

daschw commented Mar 26, 2020

Thanks a lot for thinking of Plots and informing me! I don't have time right now, but I will check and report back (or ask some stupid questions) soon.

@daschw
Copy link

daschw commented Mar 26, 2020

So, from what I've tried now with CategoricalArrays master, it won't just work with CategoricalValue. Without CategoricalValue{<:AbstractString} <: AbstractString and CategoricalValue{<:Real} <: Real or something similar, Plots won't know to extract/convert the data.
A very simple solution is using a type recipe.

@recipe f(::Type{T}, ::T) where {T <: CategoricalValue} = (get, string)

It returns two functions. The first tells Plots how to convert CategoricalValues to something it can handle (String, Number) and the second defines how to format this converted values in the axes tick labels. I tested the recipe above and it works for both CategoricalValue{<:String} and CategoricalValue{<:Real}.
I think that would be the easiest way forward. Now it just has to be decided, whether

  • Plots takes a CategoricalArrays dependency and defines above recipe or
  • CategoricalArrays takes a RecipesBase dependency to add this.

Of course our preferred option would be the latter. We are currently working on supporting RecipesBase also with Makie. This will probably take a little longer, but when we have that you'd also have automatic Makie support for CategoricalArrays, if you include the recipe here.

@nalimilan
Copy link
Member

Ah that's too bad. I guess we'll have to depend on RecipesBase then (we already depend on JSON for similar reasons, which is absurd). We really need a mechanism for optional dependencies.

Something I don't really understand: why doesn't Plots simply call string on the values of unknown types to get a string representation? That would work with CategoricalValue.

@bkamins
Copy link
Member Author

bkamins commented Mar 26, 2020

why doesn't Plots simply call string on the values of unknown types to get a string representation?

This is what also assumed

@daschw
Copy link

daschw commented Mar 26, 2020

Something I don't really understand: why doesn't Plots simply call string on the values of unknown types to get a string representation? That would work with CategoricalValue.

Consider cvs = categorical([1, 2, 1000]). If we convert this to string (as a type unknown to Plots) it would have no numerical representation, so the values would be next to each other on the axis with equal distance, which is surely not what we want:

plot(string.(cvs), 1:3)

categorical

@bkamins
Copy link
Member Author

bkamins commented Mar 26, 2020

This I understand, but I guess recipes could allow to opt-out from a default behaviour which would be to convert to string (as currently I understand it errors).

@daschw
Copy link

daschw commented Mar 26, 2020

Yeah, maybe it would make sense to have string conversion as default. Honestly, I have never thought about this. Then it would error if no string method is defined for the type or lead to unexpected results like in my example above. Still, I guess you would not want that behavior for CategoricalValue{<:Real}. So the issue of depending on RecipesBase or Plots depending on CategoricalArrays remains. I'm not sure how bad a CategoricalArrays dependency would be for Plots. For sure we want to support plotting Dataframes and similar Tables.
@mkborregaard What's your opinion on this, or do you see a different solution without introducing dependencies in Plots and CategoricalArrays?
In my opinion this is a perfect showcase of what RecipesBase was created for.

@bkamins
Copy link
Member Author

bkamins commented Mar 26, 2020

I think CategoricalArrays.jl is not a good dependency for Plots.jl as of today.

The reason is that CategoricalArrays.jl invalidates a lot of compiled code from Base currently:

julia> @time using Plots
  6.199417 seconds (8.32 M allocations: 469.658 MiB, 3.66% gc time)

julia> @time plot(1:10)
  4.805318 seconds (9.98 M allocations: 497.441 MiB, 2.38% gc time)

vs

julia> @time using CategoricalArrays, Plots
 10.360881 seconds (23.50 M allocations: 1.183 GiB, 5.03% gc time)

julia> @time plot(1:10)
  5.236466 seconds (10.96 M allocations: 562.959 MiB, 3.25% gc time)

so reduction of time-to-first-plot would be more problematic again.

@daschw
Copy link

daschw commented Mar 26, 2020

OK, thanks, good to know!

@nalimilan
Copy link
Member

nalimilan commented Mar 27, 2020

Consider cvs = categorical([1, 2, 1000]). If we convert this to string (as a type unknown to Plots) it would have no numerical representation, so the values would be next to each other on the axis with equal distance, which is surely not what we want:

categorical([1, 2, 1000]) should definitely not be treated as numerical data. The point of CategoricalValue is to indicate that whatever value it wraps is categorical, i.e. such values should be considered as qualitative instead of quantitative. So it would be completely fine to call string on it.

Actually, the recipe would probably have to be:

@recipe f(::Type{T}, ::T) where {T <: CategoricalValue} = (string, string)

to ensure that numbers wrapped in categorical values are not treated as numeric.

(In practice, CategoricalValue{<:Real} is probably not that useful nor common. CategoricalValue{String} is by far the most useful case.)

@mkborregaard
Copy link

Doesn't that break ordinal? That's a fairly common use case, using categorical values to order categories (e.g. groups)

@bkamins
Copy link
Member Author

bkamins commented Mar 27, 2020

That's a fairly common use case, using categorical values to order categories (e.g. groups)

An excellent point - thank you for raising this. Is @recipe able to also provide a custom isless? (as otherwise we would have to make CategoricalArrays.jl a dependency of Plots.jl)

@mkborregaard
Copy link

it could probably eval a custom isless into being in the package but I'm not sure why that would be required - could you expand?

@daschw
Copy link

daschw commented Mar 27, 2020

How about a new axis attribute sortstrings::Bool defaulting to false?

@daschw
Copy link

daschw commented Mar 27, 2020

and

@recipe function f(::Type{T}, ::T) where {T <: CategoricalValue}
    sortstrings --> true
    string, string
end

@nalimilan
Copy link
Member

I'm not sure how sortstrings would help, as once a CategoricalValue has been converted to a string, the custom ordering is lost (for example ["Low", "Middle", "High"] would become ["High", "Low", "Middle"]). Or do you mean that you would call string.(levels(x)) on the input data, so that the order of levels(x) is preserved?

I have no idea how Plots works, but ideally it could just treat CategoricalValue as any type without even knowing its existence: isless and sort are correctly defined to follow the user-specified order. If Plots worked with CategoricalValue objects and simply called string at the end when it needs a text representation, everything would just work.

@bkamins
Copy link
Member Author

bkamins commented Mar 27, 2020

Now I have checked that even now we do not handle order in from CategoricalArrays.jl correctly. When you run the following code:

julia> using CategoricalArrays, Plots

julia> c = categorical(["a", "b", "c", "d"], ordered=true)
4-element CategoricalArray{String,1,UInt32}:
 "a"
 "b"
 "c"
 "d"

julia> levels!(c, ["b", "d", "a", "c"])
4-element CategoricalArray{String,1,UInt32}:
 "a"
 "b"
 "c"
 "d"

julia> scatter(c, 1:4)

you get:
Screenshot from 2020-03-27 14-13-30

while if you use the correct order by:

julia> scatter(levelcode.(c), 1:4)

you get correctly
Screenshot from 2020-03-27 14-16-30

(of course this time strings are lost and numbers on X axis are printed)

In short Plots.jl currently does not handle this correctly anyway, so:

  1. we can either leave things as is (and use a simple recipe)
  2. or we can make it correct (which will require some kind of integration between CategoricalArrays.jl and Plots.jl as Plots.jl must be in some way be able to discover the correct order of levels in categorical arrays)

@bkamins
Copy link
Member Author

bkamins commented Mar 27, 2020

If Plots.jl worked with CategoricalValue objects and simply called string at the end when it needs a text representation, everything would just work.

This is what I have assumed that Plots.jl works like initially, but it seems Plots.jl takes a different approach (I am not sure why as I do not know the details of Plots.jl either).

@mkborregaard
Copy link

That's also what I thought...

@daschw
Copy link

daschw commented Mar 27, 2020

I guess we can add a "try to convert to string" fallback for prepareSeriesData.

@lyon-fnal
Copy link

Hi - where things at with this issue? I'm trying to make a plot where the x-axis is a categorical array with custom sorting of the levels. If I just try to plot it outright, I get the error Cannot convert CategoricalValue{String,UInt32} to series data for plotting. Looking at this issue, I understand why. I could convert the array to a string - and of course that works - but I lose the custom sorting. I would like the x-axis values to have my custom sorting. Do you all have advice as for how to proceed? Thanks! --Adam

@lyon-fnal
Copy link

Apparently I asked a similar question about 2.5 years ago ... https://discourse.julialang.org/t/ordering-the-y-axis-values-in-a-dotplot/10642 . It has a work-around. But would be nice if Categorical arrays just worked.

@bicepjai
Copy link

bicepjai commented Apr 4, 2021

Do we have a solution yet on ordering the x axis ticks for violin like categorical plots ?

Julia Version 1.5.4
Commit 69fcb5745b (2021-03-11 19:13 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.7.0)
  CPU: Intel(R) Xeon(R) W-3275M CPU @ 2.50GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, skylake-avx512)

StatsPlots v0.14.19

I get

Cannot convert CategoricalValue{String,UInt32} to series data for plotting

@nalimilan
Copy link
Member

OK, we can't leave this without a solution. See #340.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants