Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Development/plot dist comparison warning #1688

Conversation

alexisperakis
Copy link
Contributor

@alexisperakis alexisperakis commented May 6, 2021

Description

Checklist

  • Follows official PR format
  • Includes new or updated tests to cover the new feature
  • Code style correct (follows pylint and black guidelines)
  • Changes are listed in changelog

Issue: #1680

  • if statement added code to warn users if the subplots are over the limit. Notice that in the comparison, the length of the list should be increased by one:len(dc_plotters) + 1
  • There is a test case at the corresponding test file, with max_sublots set to 6.

Let me know if there is any issue regarding my work. Thank you for the help.

Copy link
Member

@OriolAbril OriolAbril left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The .DS_Store files were added to the PR, they should be removed and probably have .gitignore ignore them from now on.

@@ -130,9 +131,17 @@ def plot_dist_comparison(
len_plots = rcParams["plot.max_subplots"] // (len(groups) + 1)
len_plots = len_plots if len_plots else 1
dc_plotters = [
list(xarray_var_iter(data, var_names=var, combined=True))[:len_plots]
list(xarray_var_iter(data, var_names=var, combined=True)) [:len_plots]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like it is being truncated twice now, so I think the warning will always be triggered. Can you check locally in a few cases in addition to the test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I will look into it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I alter the .gitignore?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, adding things to the gitignore is fine, removing them is not

@alexisperakis alexisperakis changed the title Development/plot dist comparison warning [WIP] Development/plot dist comparison warning May 6, 2021
Copy link
Member

@OriolAbril OriolAbril left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is still one ds store left

Comment on lines 134 to 144
list(xarray_var_iter(data, var_names=var, combined=True))[:len_plots]
for data, var in zip(datasets, var_names)
]
if len(dc_plotters) + 1 > len_plots:
warnings.warn(
"rcParams['plot.max_subplots'] ({max_plots}) is smaller than the number "
"of variables to plot ({len_plotters}), generating only {max_plots} "
"plots".format(max_plots=len_plots, len_plotters=len(dc_plotters)),
UserWarning,
)
dc_plotters = dc_plotters[:len_plots]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the xarray_var_iter list is still getting truncated when defining dc_plotters, and it should be truncated at that level but afterwards while printing the warning.

dc_plotters will have length of 2 or 3 always, as it is a list of lists, it's those sublists the ones that need to be truncated.

Copy link
Contributor Author

@alexisperakis alexisperakis May 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay so I change the code by adding a counter, summing up the length of each list and then compare it with len_plots.
I also used a for loop to truncate the sublists. However, I am not sure about the tests and also do I have to truncate each sublist with the [:len_plots]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is confusing since len_plots is different from rcParams["plot.max_subplots"], and it isn't obvious to me that

  1. you can plot exactly that number of subplots
  2. you won't plot more

I think

total_plots = sum(sum(1 for _ in xarray_var_iter(data, var_names=var, combined=True)) for data, var in zip(datasets, var_names))

gives you the number of "desired plots". I would check if that number is bigger than rcParams["plot.max_subplots"], and if so, issue a warning, and say you'll plot however many things are in dc_plotters (which i guess is sum(len(j) for j in dc_plotters)). The calculation on line 134 already verifies that you shouldn't have to truncate further (I think?), so you'd just have to issue a warning.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifically, I think it is an accident that the test passes, and line 137 isn't the condition we care about at all!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I also thought that at line 134 dc_plotters = [ list(xarray_var_iter(data, var_names=var, combined=True))[:len_plots]dc_plotters is already truncated, so I think that it should stay as is.
However, do you think I should compare total_plots with rcParams["plot.max_subplots"] and not with len_plots which divides rcParams["plot.max_subplots"] with (len(groups) + 1), because it was mentioned that we want to cut dc_plotters to have at most len_plots?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alexisperakis you should disregard the comment in the issue (the one you linked) and focus on this comment (the beggining of this discussion). Sorry about the initial confusion, I did not look carefully enough at the code in the issue, but in general, if a newer comment contradicts an older one, pay atention to the new one only, at least in the parts where they contradict each other.

dc_plotters has length 2 or 3 always and it does not have to be truncated in any case. The nested lists within dc_plotters are the ones that need to be truncated to have at most len_plots.

@codecov
Copy link

codecov bot commented May 9, 2021

Codecov Report

Merging #1688 (fcefcfd) into main (7ebedd2) will increase coverage by 0.04%.
The diff coverage is 100.00%.

❗ Current head fcefcfd differs from pull request most recent head 92f9947. Consider uploading reports for the commit 92f9947 to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1688      +/-   ##
==========================================
+ Coverage   90.86%   90.90%   +0.04%     
==========================================
  Files         108      108              
  Lines       11818    11767      -51     
==========================================
- Hits        10738    10697      -41     
+ Misses       1080     1070      -10     
Impacted Files Coverage Δ
arviz/plots/distcomparisonplot.py 93.75% <100.00%> (+0.72%) ⬆️
arviz/data/base.py 97.87% <0.00%> (-0.35%) ⬇️
arviz/data/io_pystan.py 96.08% <0.00%> (+<0.01%) ⬆️
arviz/data/io_cmdstanpy.py 96.50% <0.00%> (+1.06%) ⬆️
arviz/data/io_cmdstan.py 90.95% <0.00%> (+1.14%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7ebedd2...92f9947. Read the comment docs.

Copy link
Member

@OriolAbril OriolAbril left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some comments, I think it will also be helpful to explain a bit better what the plotting function does to avoid further confusion.

plot_dist_comparison allows comparing distributions and can either compare 2 distributions between them or 3 of them at the same time. This is achieved by having a subplot per distribution and an extra plot where all 2 or 3 are overlayed together. Therefore, subplots are defined on a variable basis first, but each variable has 3-4 subplots, so subplots are added by blocks, not independently. In the case of comparing 2 dists, the look of the plot is the following:

====
(var1, dist1)  |  (var1, dist2)
----
  (var1, dist1+dist2)
====
(var2, dist1)  |  (var2, dist2)
----
  (var2, dist1+dist2)
====
...

The example in the docs has a single variable: https://arviz-devs.github.io/arviz/api/generated/arviz.plot_dist_comparison.html.

dc_plotters is a nested list of length # of distributions to compare, each of the lists within has length # of variables to plot. As it makes no sense to compare distributions yet plot one of them and not the one it should be compared with, we can't cut the number of plots by cutting dc_plotters we need to cut the number of variables (like we are currently doing).

len_plots takes this block structure into account to translate the number of subplots to number of varibles so we can use that value to cut the inner list of variables. Most of the new changes do not consider the extra overlayed plot and therefore would return incorrect results.

Comment on lines 151 to 152
for i in range(0, len(dc_plotters)):
dc_plotters[i] = dc_plotters[i][:len_plots]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not necessary because the inner lists are truncated at creation, this would have no effect.

sum(1 for _ in xarray_var_iter(data, var_names=var, combined=True))
for data, var in zip(datasets, var_names)
)
maxplots = sum(len(splot) for splot in dc_plotters)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maxplots as defined here is not necessarily equal to the rcParam, here plots are done in blocks of 3-4 subplots, so this will be equal or lower than the rcParam. To avoid confusion, I think the warning should use the rcParam first, then in the second occurrence it can be maxplots. It is not necessary nor correct to loop over dc_plotters though, it should be:

maxplots = len(dc_plotters[0]) *  (len(groups) + 1)

Comment on lines 138 to 141
total_plots = sum(
sum(1 for _ in xarray_var_iter(data, var_names=var, combined=True))
for data, var in zip(datasets, var_names)
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here similar comment as for next line, the nested loop is not needed and is actually incorrect. It should be something like:

total_plots = sum(1 for _ in xarray_sel_iter(datasets[0], var_names=var_names[0], combined=True))

Also as we only want to count, we can use sel instead of using var, we don't need to subset the data to get the values in each position to count how many positions there are.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay thank you for this explanation it was really useful and sorry for all the trouble caused, it's my first time contributing.
So what I have understood and already done (no commit yet) is that I have compared the total_plots which are the desired plots by the user with the rcParams["plot.max_subplots"]. If total_plots are greater than this then the warning is popping up and it does not truncate the plots because they are already truncated at the creation. I also used the maxplots only at the description of the warning. I've also included a test case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds great, and don't worry about the discussion and back-and-forth comments, I think they were useful to all of us.

Comment on lines 138 to 140
total_plots = sum(
1 for _ in xarray_sel_iter(datasets[0], var_names=var_names[0], combined=True)
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot to multiply in the sample code. This counts the number of variables which needs to be multiplied by (len(groups) + 1) to be equal to the total number of subplots that would be needed if we weren't truncating the lists of plotters (also read the next comment about the warning message).

Comment on lines 145 to 147
"rcParams['plot.max_subplots'] ({max_plots}) is smaller than the number "
"of variables to plot ({len_plotters}), generating only {max_plots} "
"plots".format(max_plots=maxplots, len_plotters=total_plots),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be updated, not sure what is the best way to do so, but somehow. Here are the things that need to somehow be changed:

  • smaller than the number of variables to plot does not work here. In most plots the number of subplots is the same as the number of variables, but here each variable has 3-4 subplots. Maybe using subplots instead of variable is enough?
  • rcParams['plot.max_subplots'] ({max_plots}) precisely because of this 3-4 factor, max_plots is not necessarily equal to the rcParam. I think it will be very confusing to users to set the rcParam to 7, then get a warning saying we'll only plot 6 subplots because you have set the rcParam to 6.

Copy link
Member

@OriolAbril OriolAbril left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is good to merge, it's only missing being added to the changelog

@alexisperakis alexisperakis changed the title [WIP] Development/plot dist comparison warning Development/plot dist comparison warning Jun 27, 2021
@OriolAbril OriolAbril merged commit cd256c4 into arviz-devs:main Mar 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants