Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

position = "jitter" and position = position_jitter() have different behavior #2507

Closed
clauswilke opened this issue Apr 1, 2018 · 6 comments · Fixed by #4413
Closed

position = "jitter" and position = position_jitter() have different behavior #2507

clauswilke opened this issue Apr 1, 2018 · 6 comments · Fixed by #4413
Labels
bug an unexpected problem or unintended behavior positions 🥇
Milestone

Comments

@clauswilke
Copy link
Member

I noticed today that position = "jitter" is not the same as position = position_jitter(), and the behavior described in the documentation for position = position_jitter() cannot be obtained for position = "jitter". The underlying issue is a ggproto behavior that has caught me off guard many times (see below).

First the reprex. We start with a simple plot with jitter:

df <- data.frame(x = c(1, 2, 1, 2),
                 y = c(1, 1, 2, 2))

set.seed(1234)
ggplot(df, aes(x, y)) + geom_point(position = "jitter")

screen shot 2018-04-01 at 12 11 41 am

The following code produces different jitter:

set.seed(1234)
ggplot(df, aes(x, y)) + geom_point(position = position_jitter())

screen shot 2018-04-01 at 12 11 51 am

However, this reproduces the jitter from the first example:

set.seed(1234)
ggplot(df, aes(x, y)) + geom_point(position = position_jitter(seed = NULL))

screen shot 2018-04-01 at 12 11 58 am

The problem is that position = "jitter" does not use the default arguments defined in position_jitter(), and therefore seed is set to NULL, not to NA, when we use position = "jitter". This same problem happens generally with ggproto, e.g. when we call a geom with geom_*() vs. geom = "*", and it has bitten me many times in different scenarios.

There are two ways to fix this particular problem:

  1. The simple way: Don't set seed = NA as default in position_jitter().

  2. The complicated way: Fix the ggproto behavior so the default arguments apply when objects are specified by their name.

@hadley hadley added bug an unexpected problem or unintended behavior layers 📈 labels Apr 27, 2018
@hadley hadley added this to the v2.3.0 milestone Apr 27, 2018
@hadley
Copy link
Member

hadley commented May 1, 2018

I think maybe this is only a problem with position objects not geoms and stats, because geoms and stats have their state stored elsewhere. I think if you supply position = "jitter" we should call position_jitter() rather than ggproto(NULL, PositionJitter). Does that make sense?

@clauswilke
Copy link
Member Author

I agree that position = "jitter" should call position_jitter().

In any case, I'm pretty certain the same problem arises with geoms and stats, because I've run into it repeatedly. For example, all the if (is.null(...)) lines here in ggridges are needed to make stat_density_ridges() have reasonable defaults when it is called from geom_density_ridges(). I'll see if I can find a similar case in the current ggplot2 code.

@clauswilke
Copy link
Member Author

Never mind, things seems to be working fine for geoms and stats. Not sure why it doesn't (or didn't) work with my code, but I can't reproduce the problem with StatDensity, for example. It always behaves correctly as far as I can tell.

@hadley
Copy link
Member

hadley commented May 1, 2018

I think the difference is that the params for stats and geoms are stored in the layer; the params for the position adjustment are stored in the position object. It's a weird design.

I'm pretty sure this is not a regression, so I'm going to remove it from the 2.3.0 milestone.

@hadley hadley removed this from the v2.3.0 milestone May 1, 2018
@paleolimbot
Copy link
Member

If this is still worth fixing, it could be done by moving the seed-sanitizing lines:

position_jitter <- function(width = NULL, height = NULL, seed = NA) {
if (!is.null(seed) && is.na(seed)) {
seed <- sample.int(.Machine$integer.max, 1L)
}

to PositionJitter$setup_params()

setup_params = function(self, data) {
list(
width = self$width %||% (resolution(data$x, zero = FALSE) * 0.4),
height = self$height %||% (resolution(data$y, zero = FALSE) * 0.4),
seed = self$seed
)
},

and adding seed = NA to ggproto() class definition:

PositionJitter <- ggproto("PositionJitter", Position,
required_aes = c("x", "y"),
setup_params = function(self, data) {
list(

Many ggproto() class definitions include "field" definitions (usually NULL) that are added in the constructor, so I don't think it's too far off. The fact that the constructors of ggproto() objects do not reside within the object but are external methods causes a lot of copy-and-pasting of code when objects are subclassed (unless you subclass the instance, which gets used a lot within ggplot2 but is not really defined behaviour).

@alanmejiamaza
Copy link

Hi,

Is it possible to plot the dots organised using geom_jitter?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug an unexpected problem or unintended behavior positions 🥇
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants