More dplyr-like filter_samples -> samples(fds) %>% filter(...)? #1

Open
lianos opened this issue Feb 6, 2019 · 0 comments

lianos commented Feb 6, 2019

Currently the lazyeval package is used for NSE in filter_samples.FacileDataSet. It's a poor implementation, and we can do better.

I think we can simplify the API by:

  1. dropping the filter_samples() function and making a more facile-specific filter() function; and
  2. maybe even making a more facile select() which can complement (or even replace?) with_sample_covariates()

filter_samples() and filter()

Currently, filter_samples() takes a FacileDataStore or a facile_frame and lets any of the "sample_covariates" in the FacileDataStore that are defined over the currently active set of samples appear on the LHS of the filter criteria.

For instance, currently we can do:

efds <- exampleFacileDataStore()
crc.samples <- filter_samples(efds, indication == "CRC")

But I think forcing the user to be a bit more explicit simplifies things overall, so I am in favor of doing this instead:

crc.samples <- samples(efds) %>%
  filter(indication == "CRC")
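A minimal sketch of how such a filter() could resolve covariates that are not yet materialized, using rlang. Everything here is hypothetical and only for illustration: `covariate_store` stands in for the covariates held by a FacileDataStore, `filter_lazy()` stands in for a facile-aware filter() method, and the `sample_id` key is an assumption. It is not the package's implementation.

```r
library(rlang)

# Hypothetical stand-in for the sample covariates stored in a FacileDataStore.
covariate_store <- data.frame(
  sample_id  = c("s1", "s2", "s3"),
  indication = c("CRC", "BLCA", "CRC"),
  sex        = c("f", "m", "m"),
  stringsAsFactors = FALSE
)

# Hypothetical facile-aware filter: any variable used in `cond` that is not
# already a column of `x` is looked up in `store` before evaluation.
filter_lazy <- function(x, cond, store = covariate_store) {
  cond <- enquo(cond)
  orig_cols <- names(x)

  # Variables referenced in the condition that are missing from `x`
  # but available as covariates in the store.
  needed <- setdiff(all.vars(quo_get_expr(cond)), orig_cols)
  needed <- intersect(needed, names(store))

  if (length(needed) > 0) {
    x <- merge(x, store[, c("sample_id", needed), drop = FALSE],
               by = "sample_id")
  }

  # Evaluate the condition with the (temporarily widened) frame as data mask.
  keep <- eval_tidy(cond, data = x)

  # Return only the columns the caller started with.
  x[keep & !is.na(keep), orig_cols, drop = FALSE]
}

all_samples <- data.frame(sample_id = c("s1", "s2", "s3"),
                          stringsAsFactors = FALSE)
crc <- filter_lazy(all_samples, indication == "CRC")
```

Whether the covariates pulled in for filtering should stay in the result is an open design choice; this sketch drops them so that the shape of the input frame is preserved.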

select() and with_sample_covariates()

Given a facile_frame of samples, we can use with_sample_covariates() to widen it with covariates over these samples that haven't been extracted and materialized from the FacileDataStore.

In the same spirit as above, wouldn't it make more sense to let users refer to the same not-yet-materialized covariates of a facile_frame in a select() call?

Continuing from the example above, if you wanted to extract the CRC indication samples and then fatten this facile_frame with the "sex" and "stage" covariates, currently we would do:

crc.samples <- efds %>%
  filter_samples(indication == "CRC") %>%
  with_sample_covariates(c("sex", "stage"))

But just as filter_samples() can pull sample covariates implicitly out of the FacileDataStore, shouldn't select() be able to do the same? That would look like:

crc.samples <- efds %>%
  filter_samples(indication == "CRC") %>%
  select(..., sex, stage)

But what would the ... be to indicate we want all of the covariates (columns) that are currently in the facile_frame plus sex and stage which need to be pulled out of the FacileDataStore?

Maybe

crc.samples <- efds %>%
  filter_samples(indication == "CRC") %>%
  select(current(), sex, stage)
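One way a select() with a current() marker might behave, sketched with rlang. The names are hypothetical: `covariate_store` stands in for the FacileDataStore, `select_lazy()` for a facile-aware select(), and `current()` is never defined as a function; it is only recognized syntactically as a placeholder for "the columns already in the frame".

```r
library(rlang)

# Hypothetical stand-in for the sample covariates stored in a FacileDataStore.
covariate_store <- data.frame(
  sample_id  = c("s1", "s2", "s3"),
  indication = c("CRC", "BLCA", "CRC"),
  sex        = c("f", "m", "m"),
  stage      = c("II", "I", "III"),
  stringsAsFactors = FALSE
)

# Hypothetical facile-aware select: `current()` expands to the columns already
# in `x`; bare names not yet in `x` are fetched from `store`.
select_lazy <- function(x, ..., store = covariate_store) {
  wanted <- character()
  for (e in enexprs(...)) {
    if (is_call(e, "current")) {
      wanted <- c(wanted, names(x))
    } else if (is_symbol(e)) {
      wanted <- c(wanted, as_string(e))
    } else {
      abort("This sketch only supports bare column names and current().")
    }
  }
  wanted <- unique(wanted)

  # Columns that still have to be materialized from the store.
  to_fetch <- setdiff(wanted, names(x))
  unknown  <- setdiff(to_fetch, names(store))
  if (length(unknown) > 0) {
    abort(paste("Unknown covariates:", paste(unknown, collapse = ", ")))
  }
  if (length(to_fetch) > 0) {
    x <- merge(x, store[, c("sample_id", to_fetch), drop = FALSE],
               by = "sample_id")
  }
  x[, wanted, drop = FALSE]
}

crc_samples <- data.frame(sample_id  = c("s1", "s3"),
                          indication = c("CRC", "CRC"),
                          stringsAsFactors = FALSE)
decorated <- select_lazy(crc_samples, current(), sex, stage)
```

Matching current() on the unevaluated expression keeps the sketch short; a real implementation would more likely hook into tidyselect, which this example does not attempt.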

And retrieving all of the defined covariates currently looks like this:

crc.samples <- efds %>%
  filter_samples(indication == "CRC") %>%
  with_sample_covariates()

But should it look like:

crc.samples <- efds %>%
  filter_samples(indication == "CRC") %>%
  select(everything())

Final New API

So, taken together, getting the CRC samples out of the FacileDataStore and decorating them with the sex and stage covariates would look like:

crc.samples <- efds %>%
  samples() %>%
  filter(indication == "CRC") %>%
  select(current(), sex, stage)

... maybe ...

@lianos lianos changed the title Tackle NSE with rlang/tidyeval filter_samples should be samples(fds) %>% filter(...) -- needs super NSE w/ rlang Sep 9, 2019
@lianos lianos changed the title filter_samples should be samples(fds) %>% filter(...) -- needs super NSE w/ rlang More dplyr-like filter_samples -> samples(fds) %>% filter(...)? Sep 24, 2019