-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Splat pop #106
Splat pop #106
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@azodichr These are mostly minor things that I will fix so you can ignore them.
Only thing I would ask you to do before we merge is to add yourself as a contributor (role = c("ctb")
) in the DESCRIPTION
. You have definitely contributed enough to deserve some acknowledgement!
@@ -0,0 +1,418 @@ | |||
--- | |||
title: "splatPop: simulating single-cell data for populations" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Proofread vignette
Great. I’ve added myself to the description.
… On Oct 21, 2020, at 7:37 PM, Luke Zappia ***@***.***> wrote:
@lazappi commented on this pull request.
@azodichr <https://github.com/azodichr> These are mostly minor things that I will fix so you can ignore them.
Only thing I would ask you to do before we merge is to add yourself as a contributor (role = c("ctb")) in the DESCRIPTION. You have definitely contributed enough to deserve some acknowledgement!
In R/mock-data.R <#106 (comment)>:
> +#' @importFrom stats setNames
+#'
+#' @export
+#'
+mockVCF <- function(n.snps = 1e4, n.samples = 10, chromosome = 22){
+
+
+ if (!requireNamespace("VariantAnnotation", quietly = TRUE)) {
+ stop("Creating a mock VCF requires the 'VariantAnnotation' package.")
+ }
+
+ sample_names <- paste0("sample_", formatC(1:n.samples,
+ width = nchar(n.samples),
+ format = "d",
+ flag = "0"))
+ snp_names <- paste0("snp_", formatC(1:n.snps,
Change to seq_len()
In R/mock-data.R <#106 (comment)>:
> +#' @param chromosome Chromosome name
+#'
+#' @return data.frame containing mock gff data.
+#'
+#' @importFrom stats setNames
+#'
+#' @export
+#'
+mockVCF <- function(n.snps = 1e4, n.samples = 10, chromosome = 22){
+
+
+ if (!requireNamespace("VariantAnnotation", quietly = TRUE)) {
+ stop("Creating a mock VCF requires the 'VariantAnnotation' package.")
+ }
+
+ sample_names <- paste0("sample_", formatC(1:n.samples,
Change to seq_len()
In R/splat-estimate.R <#106 (comment)>:
> @@ -13,7 +13,8 @@
#' \code{\link{splatEstOutlier}}, \code{\link{splatEstBCV}},
#' \code{\link{splatEstDropout}}
#'
-#' @return SplatParams object containing the estimated parameters.
+#' @return SplatParams or splatPopParams object containing the estimated
Think it's fine to just say SplatParams here because SplatPopParams is a subclass.
In R/splat-estimate.R <#106 (comment)>:
> + if(class(params) == "splatPopParams"){
+ checkmate::assertClass(params, "splatPopParams")
+ }else{checkmate::assertClass(params, "SplatParams")}
Check how the class check works
In R/splat-simulate.R <#106 (comment)>:
> #' trajectories (eg. differentiation processes).
-#' @param verbose logical. Whether to print progress messages.
+#' @param verbose logical. Whether to print progress messages (Default=TRUE).
Remove default
In R/splatPop-estimate.R <#106 (comment)>:
> @@ -0,0 +1,178 @@
+#' Estimate population/eQTL simulation parameters
+#'
+#' Estimate simulation parameters for the eQTL population simulation from
+#' real data. See the individual estimation functions for more details on
+#' how this is done.
+#'
+#' @param params splatPopParams object containing parameters for the
Capitalise SplatPopParams
In R/splatPop-estimate.R <#106 (comment)>:
> +
+ # Get parameters for population wide gene mean and variance distributions
+ if (!is.null(means)) {
+ params <- splatPopEstimateMeanCV(params, means)
+ }
+
+ return(params)
+
+}
+
+#' Estimate eQTL Effect Size parameters
+#'
+#' Estimate rate and shape parameters for the gamma distribution used to
+#' simulate eQTL (eSNP-eGene) effect sizes.
+#'
+#' @param eqtl Data.frame with all or top eQTL pairs from a real eQTL analysis.
data.frame
In R/splatPop-estimate.R <#106 (comment)>:
> +#' The Shapiro-Wilks test is used to determine if the library sizes are
+#' normally distributed. If so a normal distribution is fitted to the library
+#' sizes, if not (most cases) a log-normal distribution is fitted and the
+#' estimated parameters are added to the params object. See
+#' \code{\link[fitdistrplus]{fitdist}} for details on the fitting.
Maybe left over from somewhere else?
In R/splatPop-simulate.R <#106 (comment)>:
> + glob.genes <- subset(key, key$eQTL.type == "global")$geneID
+ g.specific <- sample(glob.genes, size = n.specific.each)
+ key[["eQTL.type"]][key$geneID %in% g.specific] <- g
+ }
+
+ # Assign group-specific effects (differential expression, not eQTL)
+ nGenes <- nrow(key)
+ de.prob <- getParam(params, "de.prob")
+ de.downProb <- getParam(params, "de.downProb")
+ de.facLoc <- getParam(params, "de.facLoc")
+ de.facScale <- getParam(params, "de.facScale")
+
+ for (idx in seq_len(n.groups)) {
+ de.facs <- getLNormFactors(nGenes, de.prob, de.downProb,
+ de.facLoc, de.facScale)
+ key[, paste0(groups[idx], ".GroupEffect")] <- de.facs
Remove "." from column names
In R/splatPop-simulate.R <#106 (comment)>:
> +splatPopCleanSCE <- function(sim.all){
+
+ # Remove redundant sce info
+ keep.id <- gsub("_Gene", "", names(rowData(sim.all))[1])
+
+ shared.out.factor <- rowData(sim.all)[[paste0(keep.id, "_OutlierFactor")]]
+ rowData(sim.all)[grepl("_OutlierFactor", names(rowData(sim.all)))] <- NULL
+ rowData(sim.all)$Shared_OutlierFactor <- shared.out.factor
+
+ rowData(sim.all)[grepl("_Gene", names(rowData(sim.all)))] <- NULL
+ metadata(sim.all)[2:length(names(metadata(sim.all)))] <- NULL
+
+ return(sim.all)
+}
Can probably replace this with the new minmiseSCE() function (which I have written but is in another branch.
In R/splatPopParams-methods.R <#106 (comment)>:
> + for (pkg in c("VariantAnnotation", "preprocessCore")) {
+ if (!requireNamespace(pkg, quietly = TRUE)) {
+ stop("The splatPop simulation requires the ", pkg, " package.")
+ }
+ }
Check should probably go first
In R/splatPopParams-methods.R <#106 (comment)>:
> + valid <- TRUE
+ } else {
+ valid <- checks[checks != TRUE]
+ valid <- paste(names(valid), valid, sep = ": ")
+ }
+
+ return(valid)
+})
+
+
+#' @importFrom methods callNextMethod
+setMethod("show", "splatPopParams", function(object) {
+
+ pp <- list("Population params:" = c("(mean.shape)" = "pop.mean.shape",
+ "(mean.rate)" = "pop.mean.rate",
+ #"[group.prop]" = "group.prop",
Delete comment
In R/splatPopParams-methods.R <#106 (comment)>:
> + "(cv.params)" = "pop.cv.param"),
+ "eQTL params:" = c("[eqtl.n]" = "eqtl.n",
+ "[eqtl.dist]" = "eqtl.dist",
+ "[eqtl.maf.min]" = "eqtl.maf.min",
+ "[eqtl.maf.max]" = "eqtl.maf.max",
+ "[eqtl.group.specific]" = "eqtl.group.specific",
+ "(eqtl.ES.shape)" = "eqtl.ES.shape",
+ "(eqtl.ES.rate)" = "eqtl.ES.rate"))
+
+ callNextMethod()
+ showPP(object, pp)
+})
+
+
+#' @Rdname setParam
+setMethod("setParam", "splatPopParams", function(object, name, value) {
I think most of this should be covered by the SplatParams method, but need to check.
In R/utils.R <#106 (comment)>:
> +#' @importFrom utils globalVariables
+utils::globalVariables(c("pval_nominal", 'perc', 'gene_id', 'type', 'eQTL',
+ 'chr', 'gene_mid', "maf")) #, "data"))
+
These should be able to be deleted now
In tests/testthat/test-splatPopParams.R <#106 (comment)>:
> +### These tests are also run in test-splatParams.R, please update both
+test_that("path.from checks work", {
+ pp <- setParams(params, group.prob = c(0.5, 0.5))
+ pp <- setParamUnchecked(pp, "path.from", c(0, 1))
+ expect_silent(validObject(pp))
+ pp <- setParamUnchecked(pp, "path.from", c(0, 3))
+ expect_error(validObject(pp), "invalid class")
+ pp <- setParamUnchecked(pp, "path.from", c(1, 0))
+ expect_error(validObject(pp), "path cannot begin at itself")
+ pp <- newSplatParams()
+ pp <- setParams(pp, group.prob = c(0.3, 0.3, 0.4))
+ pp <- setParamUnchecked(pp, "path.from", c(2, 1, 1))
+ expect_error(validObject(pp), "origin must be specified in path.from")
+ pp <- setParams(params, group.prob = c(0.5, 0.5), path.from = c(0, 1))
+ expect_warning(setParam(pp, "group.prob", 1),
+ "nGroups has changed, resetting path.from")
+ pp <- newSplatParams()
+ pp <- setParams(pp, group.prob = c(0.25, 0.25, 0.25, 0.25))
+ pp <- setParamUnchecked(pp, "path.from", c(0, 4, 2, 3))
+ expect_error(validObject(pp), "path.from cannot contain cycles")
+})
+
+### These tests are also run in test-splatParams.R, please update both
+test_that("dropout.type checks work", {
+ expect_error(setParam(params, "dropout.type", "cell"),
+ "dropout.type cannot be set to 'cell'")
+ pp <- setParams(params, dropout.mid = rep(1, 100),
+ dropout.shape = rep(1, 100))
+ expect_silent(setParam(pp, "dropout.type", "cell"))
+ expect_error(setParam(params, "dropout.type", "a"),
+ "dropout.type must be one of: ")
+})
Not sure we need to recheck these
In vignettes/splatPop.Rmd <#106 (comment)>:
> @@ -0,0 +1,418 @@
+---
+title: "splatPop: simulating single-cell data for populations"
Proofread vignette
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub <#106 (review)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ACZEAFQYVS2HLLFEJJ6OR5DSL2MUZANCNFSM4RYESU5A>.
|
Updated pull request following Luke's instructions for submitting a PR from Luke's splatPop branch.