-
-
Notifications
You must be signed in to change notification settings - Fork 419
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Development/plot dist comparison warning #1688
Development/plot dist comparison warning #1688
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The .DS_Store
files were added to the PR, they should be removed and probably have .gitignore
ignore them from now on.
arviz/plots/distcomparisonplot.py
Outdated
@@ -130,9 +131,17 @@ def plot_dist_comparison( | |||
len_plots = rcParams["plot.max_subplots"] // (len(groups) + 1) | |||
len_plots = len_plots if len_plots else 1 | |||
dc_plotters = [ | |||
list(xarray_var_iter(data, var_names=var, combined=True))[:len_plots] | |||
list(xarray_var_iter(data, var_names=var, combined=True)) [:len_plots] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like it is being truncated twice now, so I think the warning will always be triggered. Can you check locally in a few cases in addition to the test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I will look into it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I alter the .gitignore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, adding things to the gitignore is fine, removing them is not
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is still one ds store left
arviz/plots/distcomparisonplot.py
Outdated
list(xarray_var_iter(data, var_names=var, combined=True))[:len_plots] | ||
for data, var in zip(datasets, var_names) | ||
] | ||
if len(dc_plotters) + 1 > len_plots: | ||
warnings.warn( | ||
"rcParams['plot.max_subplots'] ({max_plots}) is smaller than the number " | ||
"of variables to plot ({len_plotters}), generating only {max_plots} " | ||
"plots".format(max_plots=len_plots, len_plotters=len(dc_plotters)), | ||
UserWarning, | ||
) | ||
dc_plotters = dc_plotters[:len_plots] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the xarray_var_iter
list is still getting truncated when defining dc_plotters
, and it should be truncated at that level but afterwards while printing the warning.
dc_plotters
will have length of 2 or 3 always, as it is a list of lists, it's those sublists the ones that need to be truncated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay so I change the code by adding a counter, summing up the length of each list and then compare it with len_plots
.
I also used a for loop to truncate the sublists. However, I am not sure about the tests and also do I have to truncate each sublist with the [:len_plots]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is confusing since len_plots
is different from rcParams["plot.max_subplots"]
, and it isn't obvious to me that
- you can plot exactly that number of subplots
- you won't plot more
I think
total_plots = sum(sum(1 for _ in xarray_var_iter(data, var_names=var, combined=True)) for data, var in zip(datasets, var_names))
gives you the number of "desired plots". I would check if that number is bigger than rcParams["plot.max_subplots"]
, and if so, issue a warning, and say you'll plot however many things are in dc_plotters
(which i guess is sum(len(j) for j in dc_plotters)
). The calculation on line 134 already verifies that you shouldn't have to truncate further (I think?), so you'd just have to issue a warning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specifically, I think it is an accident that the test passes, and line 137 isn't the condition we care about at all!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I also thought that at line 134 dc_plotters = [ list(xarray_var_iter(data, var_names=var, combined=True))[:len_plots]
dc_plotters is already truncated, so I think that it should stay as is.
However, do you think I should compare total_plots
with rcParams["plot.max_subplots"]
and not with len_plots
which divides rcParams["plot.max_subplots"]
with (len(groups) + 1)
, because it was mentioned that we want to cut dc_plotters
to have at most len_plots
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
alexisperakis you should disregard the comment in the issue (the one you linked) and focus on this comment (the beggining of this discussion). Sorry about the initial confusion, I did not look carefully enough at the code in the issue, but in general, if a newer comment contradicts an older one, pay atention to the new one only, at least in the parts where they contradict each other.
dc_plotters
has length 2 or 3 always and it does not have to be truncated in any case. The nested lists within dc_plotters
are the ones that need to be truncated to have at most len_plots
.
Codecov Report
@@ Coverage Diff @@
## main #1688 +/- ##
==========================================
+ Coverage 90.86% 90.90% +0.04%
==========================================
Files 108 108
Lines 11818 11767 -51
==========================================
- Hits 10738 10697 -41
+ Misses 1080 1070 -10
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some comments, I think it will also be helpful to explain a bit better what the plotting function does to avoid further confusion.
plot_dist_comparison
allows comparing distributions and can either compare 2 distributions between them or 3 of them at the same time. This is achieved by having a subplot per distribution and an extra plot where all 2 or 3 are overlayed together. Therefore, subplots are defined on a variable basis first, but each variable has 3-4 subplots, so subplots are added by blocks, not independently. In the case of comparing 2 dists, the look of the plot is the following:
====
(var1, dist1) | (var1, dist2)
----
(var1, dist1+dist2)
====
(var2, dist1) | (var2, dist2)
----
(var2, dist1+dist2)
====
...
The example in the docs has a single variable: https://arviz-devs.github.io/arviz/api/generated/arviz.plot_dist_comparison.html.
dc_plotters
is a nested list of length # of distributions to compare, each of the lists within has length # of variables to plot. As it makes no sense to compare distributions yet plot one of them and not the one it should be compared with, we can't cut the number of plots by cutting dc_plotters
we need to cut the number of variables (like we are currently doing).
len_plots
takes this block structure into account to translate the number of subplots to number of varibles so we can use that value to cut the inner list of variables. Most of the new changes do not consider the extra overlayed plot and therefore would return incorrect results.
arviz/plots/distcomparisonplot.py
Outdated
for i in range(0, len(dc_plotters)): | ||
dc_plotters[i] = dc_plotters[i][:len_plots] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not necessary because the inner lists are truncated at creation, this would have no effect.
arviz/plots/distcomparisonplot.py
Outdated
sum(1 for _ in xarray_var_iter(data, var_names=var, combined=True)) | ||
for data, var in zip(datasets, var_names) | ||
) | ||
maxplots = sum(len(splot) for splot in dc_plotters) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maxplots
as defined here is not necessarily equal to the rcParam, here plots are done in blocks of 3-4 subplots, so this will be equal or lower than the rcParam. To avoid confusion, I think the warning should use the rcParam first, then in the second occurrence it can be maxplots
. It is not necessary nor correct to loop over dc_plotters though, it should be:
maxplots = len(dc_plotters[0]) * (len(groups) + 1)
arviz/plots/distcomparisonplot.py
Outdated
total_plots = sum( | ||
sum(1 for _ in xarray_var_iter(data, var_names=var, combined=True)) | ||
for data, var in zip(datasets, var_names) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here similar comment as for next line, the nested loop is not needed and is actually incorrect. It should be something like:
total_plots = sum(1 for _ in xarray_sel_iter(datasets[0], var_names=var_names[0], combined=True))
Also as we only want to count, we can use sel
instead of using var
, we don't need to subset the data to get the values in each position to count how many positions there are.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay thank you for this explanation it was really useful and sorry for all the trouble caused, it's my first time contributing.
So what I have understood and already done (no commit yet) is that I have compared the total_plots
which are the desired plots by the user with the rcParams["plot.max_subplots"]
. If total_plots
are greater than this then the warning is popping up and it does not truncate the plots because they are already truncated at the creation. I also used the maxplots
only at the description of the warning. I've also included a test case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds great, and don't worry about the discussion and back-and-forth comments, I think they were useful to all of us.
arviz/plots/distcomparisonplot.py
Outdated
total_plots = sum( | ||
1 for _ in xarray_sel_iter(datasets[0], var_names=var_names[0], combined=True) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forgot to multiply in the sample code. This counts the number of variables which needs to be multiplied by (len(groups) + 1)
to be equal to the total number of subplots that would be needed if we weren't truncating the lists of plotters (also read the next comment about the warning message).
arviz/plots/distcomparisonplot.py
Outdated
"rcParams['plot.max_subplots'] ({max_plots}) is smaller than the number " | ||
"of variables to plot ({len_plotters}), generating only {max_plots} " | ||
"plots".format(max_plots=maxplots, len_plotters=total_plots), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be updated, not sure what is the best way to do so, but somehow. Here are the things that need to somehow be changed:
smaller than the number of variables to plot
does not work here. In most plots the number of subplots is the same as the number of variables, but here each variable has 3-4 subplots. Maybe using subplots instead of variable is enough?rcParams['plot.max_subplots'] ({max_plots})
precisely because of this 3-4 factor,max_plots
is not necessarily equal to the rcParam. I think it will be very confusing to users to set the rcParam to 7, then get a warning saying we'll only plot 6 subplots because you have set the rcParam to 6.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is good to merge, it's only missing being added to the changelog
…opment/plot_dist_comparison_warning Resolve conflicts
…st_comparison_warning
Description
Checklist
Issue: #1680
len(dc_plotters) + 1
max_sublots
set to 6.Let me know if there is any issue regarding my work. Thank you for the help.