Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New dissimilarity functions #609

Merged
merged 66 commits into from
Aug 13, 2024
Merged
Show file tree
Hide file tree
Changes from 24 commits
Commits
Show all changes
66 commits
Select commit Hold shift + click to select a range
9ed9847
up
TuomasBorman Jul 9, 2024
eec738e
up
TuomasBorman Jul 9, 2024
b701bc1
small fix
thpral Jul 12, 2024
ea6cef7
addDissimilarity function
thpral Jul 12, 2024
4eadecb
fix addDissimilarity
thpral Jul 12, 2024
92fc388
rename JSD and overlap functions
thpral Jul 12, 2024
318501c
man page getDissimilarity
thpral Jul 12, 2024
023636a
rename unifrac functions
thpral Jul 16, 2024
b77acda
Merge branch 'devel' into getDissimilarity
thpral Jul 16, 2024
3d572fa
fix R check warnings
thpral Jul 16, 2024
3685b9b
Merge branch 'getDissimilarity' of https://github.com/microbiome/mia …
thpral Jul 16, 2024
94c13dd
fix getUnifrac SE method
thpral Jul 18, 2024
5af8d57
Merge branch 'devel' into getDissimilarity
antagomir Jul 22, 2024
e988335
fix naming .get_jsd
thpral Jul 22, 2024
ea73d22
remove generic methods for jsd, overlap and unifrac
thpral Jul 22, 2024
b36362c
Merge branch 'getDissimilarity' of https://github.com/microbiome/mia …
thpral Jul 23, 2024
0f84fbb
fix argument types in documentation
thpral Jul 23, 2024
3e1d774
merge devel
thpral Jul 24, 2024
0629c58
fix overlap tests with getDissimilarity
thpral Jul 24, 2024
faf0308
fix error in unifrac
thpral Jul 24, 2024
4dd452f
update pckgdown
thpral Jul 25, 2024
fef4dcd
modify inheritance in man pages
thpral Jul 29, 2024
272baa6
fix documentation
thpral Jul 29, 2024
888c3ee
try to fix unifrac node.label argument
thpral Jul 29, 2024
171354c
up
TuomasBorman Jul 30, 2024
a574a58
Merge branch 'devel' into getDissimilarity
thpral Aug 1, 2024
0db5240
add unifrac, overlap and jsd documentation in getDissimilarity manual…
thpral Aug 1, 2024
f3af07f
merge devel
thpral Aug 1, 2024
53f0d1b
update doc getDissimilarity
thpral Aug 1, 2024
ebfcaaf
deprecate dissimilarity functions
thpral Aug 1, 2024
7b1186e
add check to .add_values_to_reducedDims
thpral Aug 1, 2024
327fd4f
fix deprecated functions and .add_values_to_reducedDims
thpral Aug 2, 2024
01f7fd4
Merge branch 'devel' into getDissimilarity
antagomir Aug 5, 2024
fc05582
up
TuomasBorman Aug 6, 2024
043ffcc
up
TuomasBorman Aug 6, 2024
99fc8a4
Merge branch 'devel' into getDissimilarity
antagomir Aug 6, 2024
09f78cc
address review comments
thpral Aug 7, 2024
b1a67e8
up
thpral Aug 7, 2024
488ff09
add tests for getDissimililarity
thpral Aug 7, 2024
19b3d5f
fix braces in .add_values_to_reducedDims
thpral Aug 7, 2024
d13e17a
add missing argument to runDPCoA documentation
thpral Aug 7, 2024
3aa881d
up
TuomasBorman Aug 8, 2024
914ecae
Simplify. Add method for TreeSE.
TuomasBorman Aug 8, 2024
763cf8c
address review comments
thpral Aug 8, 2024
3038c6b
deprecate runUnifrac
thpral Aug 8, 2024
b04ebf7
add dis matrix to metadata instead of reducedDims
thpral Aug 8, 2024
d89377a
fix getDissimilarity examples and doc
thpral Aug 8, 2024
374c73a
Merge branch 'devel' into getDissimilarity
TuomasBorman Aug 9, 2024
fc44f98
up
TuomasBorman Aug 9, 2024
3d94de9
up
TuomasBorman Aug 9, 2024
c30bf17
up
TuomasBorman Aug 9, 2024
4f16e17
up
TuomasBorman Aug 9, 2024
a27259b
add rarefaction test
thpral Aug 9, 2024
fba150f
up
thpral Aug 9, 2024
cb373d8
Merge branch 'devel' into getDissimilarity
thpral Aug 10, 2024
90ef724
address review comment
thpral Aug 10, 2024
a72dbc2
Merge branch 'getDissimilarity' of https://github.com/microbiome/mia …
thpral Aug 10, 2024
5a2c0be
setting seed in rarefaction tests
thpral Aug 10, 2024
1937528
merge devel
thpral Aug 11, 2024
d6b348a
up
TuomasBorman Aug 13, 2024
376792d
up
TuomasBorman Aug 13, 2024
5c84f17
up
TuomasBorman Aug 13, 2024
dd6bd53
fix rarefaction tests
thpral Aug 13, 2024
ee1557e
up
thpral Aug 13, 2024
616bec2
up
thpral Aug 13, 2024
585cc9b
up
TuomasBorman Aug 13, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 4 additions & 10 deletions NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@ export(addAlpha)
export(addCluster)
export(addContaminantQC)
export(addDPCoA)
export(addDissimilarity)
export(addDivergence)
export(addDominant)
export(addLDA)
Expand All @@ -25,11 +26,8 @@ export(calculateCCA)
export(calculateDMN)
export(calculateDMNgroup)
export(calculateDPCoA)
export(calculateJSD)
export(calculateNMDS)
export(calculateOverlap)
export(calculateRDA)
export(calculateUnifrac)
export(cluster)
export(convertFromBIOM)
export(convertFromDADA2)
Expand All @@ -49,6 +47,7 @@ export(getBestDMNFit)
export(getCrossAssociation)
export(getDMN)
export(getDPCoA)
export(getDissimilarity)
export(getDominant)
export(getExperimentCrossAssociation)
export(getExperimentCrossCorrelation)
Expand Down Expand Up @@ -113,11 +112,8 @@ export(right_join)
export(runCCA)
export(runDMN)
export(runDPCoA)
export(runJSD)
export(runNMDS)
export(runOverlap)
export(runRDA)
export(runUnifrac)
thpral marked this conversation as resolved.
Show resolved Hide resolved
export(setTaxonomyRanks)
export(splitByRanks)
export(splitOn)
Expand Down Expand Up @@ -147,6 +143,7 @@ exportMethods(addAlpha)
exportMethods(addCCA)
exportMethods(addCluster)
exportMethods(addContaminantQC)
exportMethods(addDissimilarity)
exportMethods(addDivergence)
exportMethods(addDominant)
exportMethods(addHierarchyTree)
Expand All @@ -164,9 +161,6 @@ exportMethods(agglomerateByVariable)
exportMethods(bestDMNFit)
exportMethods(calculateDMN)
exportMethods(calculateDMNgroup)
exportMethods(calculateJSD)
exportMethods(calculateOverlap)
exportMethods(calculateUnifrac)
thpral marked this conversation as resolved.
Show resolved Hide resolved
exportMethods(checkTaxonomy)
exportMethods(cluster)
exportMethods(convertToBIOM)
Expand All @@ -185,6 +179,7 @@ exportMethods(getCCA)
exportMethods(getCrossAssociation)
exportMethods(getDMN)
exportMethods(getDPCoA)
exportMethods(getDissimilarity)
exportMethods(getDominant)
exportMethods(getExperimentCrossAssociation)
exportMethods(getExperimentCrossCorrelation)
Expand Down Expand Up @@ -232,7 +227,6 @@ exportMethods(rarefyAssay)
exportMethods(relAbundanceCounts)
exportMethods(relabundance)
exportMethods(right_join)
exportMethods(runOverlap)
exportMethods(splitOn)
exportMethods(subsampleCounts)
exportMethods(subsetByPrevalent)
Expand Down
205 changes: 200 additions & 5 deletions R/calculateDistance.R
Original file line number Diff line number Diff line change
@@ -1,6 +1,201 @@
# calculateDistance function is removed. This function is left for other functions
# to use
.calculate_distance <- function(mat, FUN = stats::dist, ...){
# Distance between all samples against all samples
do.call(FUN, c(list(mat),list(...)))
#' Calculate dissimilarities
#'
#' These functions calculate dissimilarities on data stored in a
#' \code{\link[TreeSummarizedExperiment:TreeSummarizedExperiment-class]{TreeSummarizedExperiment}}
#' object.
#'
#' @param x a \code{\link[TreeSummarizedExperiment:TreeSummarizedExperiment-class]{TreeSummarizedExperiment}}
#' object.
#'
#' @param method \code{Character scalar}. Specifies which distance to calculate.
#'
#' @param name \code{Character scalar}. The name to be used to store the result
#' in the reducedDims of the output. (Default: \code{method})
#'
#' @param assay.type \code{Character scalar}. Specifies which assay to use for
#' calculation. (Default: \code{"counts"})
#'
#' @param assay_name \code{Character scalar}. Specifies which assay to use for
#' calculation. (Please use \code{assay.type} instead. At
#' some point \code{assay_name} will be disabled.)
#'
#' @param tree.name \code{Character scalar}. Specifies which tree will be used in
#' calculation. (Default: \code{"phylo"})
#'
#' @param transposed \code{Logical scalar}. Specifies if x is transposed with cells in
#' rows. (Default: \code{FALSE})
#'
#' @param ... other arguments passed onto \code{\link[vegan:vegdist]{vegdist}}
#'
#' @return
#' \code{getDissimilarity} returns a distance matrix.
#'
#' \code{addDissimilarity} returns a
#' \code{\link[TreeSummarizedExperiment:TreeSummarizedExperiment-class]{TreeSummarizedExperiment}}
#' with distance matrix added to reducedDim slot.
#'
#' @name getDissimilarity
#'
#' @export
#'
#' @examples
#' data(GlobalPatterns)
#' tse <- GlobalPatterns
#' tse <- addDissimilarity(tse, method = "overlap", detection = 0.25)
#' reducedDim(tse, "overlap")
#'
NULL

#' @rdname getDissimilarity
#' @export
setGeneric(
"getDissimilarity", signature = c("x"), function(x, method, ...)
standardGeneric("getDissimilarity"))

#' @rdname getDissimilarity
#' @export
setMethod(
"getDissimilarity", signature = c(x = "TreeSummarizedExperiment"),
function(
x, method, assay_name = "counts", assay.type = assay_name,
tree.name = "phylo", transposed = FALSE, ...){
mat <- assay(x, assay.type)
if(!transposed){
mat <- t(mat)
}
tree <- rowTree(x, tree.name)
res <- getDissimilarity(mat, method = method, tree = tree, ...)
return(res)
}
)

#' @rdname getDissimilarity
#' @export
setMethod(
"getDissimilarity", signature = c(x = "SummarizedExperiment"),
function(
x, method, assay_name = "counts", assay.type = assay_name,
transposed = FALSE, ...){
browser()
# Input checks
if(!.is_non_empty_string(assay.type)){
stop("'assay.type' must be a non-empty single character value",
call. = FALSE)
}
if(!.is_non_empty_string(method)){
stop("'method' must be a non-empty single character value",
call. = FALSE)
}
if(!.is_a_bool(transposed)){
stop("'na.rm' must be TRUE or FALSE.", call. = FALSE)
}
mat <- assay(x, assay.type)
if(!transposed){
mat <- t(mat)
}
if( method == "unifrac"){
node.label <- .get_rowlinks(x, assay.type)
mat <- .calculate_dissimilarity(mat = x, method = method,
node.label = node.label, ...)
}
else{
mat <- .calculate_dissimilarity(mat = x, method = method, ...)
}
return(mat)
}
)

#' @rdname getDissimilarity
#' @export
setMethod(
"getDissimilarity", signature = c(x = "ANY"),
function(
x, method, assay_name = "counts", assay.type = assay_name, ...){
# Input check
if( !.is_a_string(method) ){
stop("'method' must be a single character value.", call. = FALSE)
}
#
# Calculate dissimilarity
mat <- .calculate_dissimilarity(mat = x, method = method, ...)
return(mat)
}
)

#' @rdname getDissimilarity
#' @export
setGeneric(
"addDissimilarity", signature = c("x"), function(x, method, ...)
standardGeneric("addDissimilarity"))

#' @rdname getDissimilarity
#' @export
setMethod(
"addDissimilarity", signature = c(x = "SummarizedExperiment"),
function(
x, method, assay_name = "counts", assay.type = assay_name, name = method,
transposed = FALSE, ...){
res <- getDissimilarity(x, method = method, assay.type = assay.type,
transposed = transposed, ...)
# Input checks
if(!.is_non_empty_string(assay.type)){
stop("'assay.type' must be a non-empty single character value",
call. = FALSE)
}
if(!.is_non_empty_string(method)){
stop("'method' must be a non-empty single character value",
call. = FALSE)
}
if(!.is_non_empty_string(name)){
stop("'name' must be a non-empty single character value",
call. = FALSE)
}
if(!.is_a_bool(transposed)){
stop("'na.rm' must be TRUE or FALSE.", call. = FALSE)
}
if( !identical(rownames(as.matrix(res)), colnames(assay(x, assay.type))) ){
warning("Samples of the dissimilarity matrix should be the same as the
samples in columns of the assay specified with 'assay.type'. The
result is not added to reducedDim.")
return(res)
}
else{
.add_values_to_reducedDims(x, as.matrix(res), name)
}

thpral marked this conversation as resolved.
Show resolved Hide resolved
}
)

.calculate_dissimilarity <- function(
mat, method, node.label = NULL, diss.fun = NULL, tree = NULL, ...){
browser()
# input check
if( !(is.null(diss.fun) || is.function(diss.fun)) ){
stop("'diss.fun' must be NULL or a function.", call. = FALSE)
}
#
args <- c(list(mat, method = method, node.label = node.label, list(...)))
# If the dissimilarity functon is not specified, get default choice
if( is.null(diss.fun) ){
if( method %in% c("overlap") ){
diss.fun <- .get_overlap
message("'diss.fun' defaults to .get_overlap.")
} else if( method %in% c("unifrac") ){
args[["tree"]] <- tree
diss.fun <- .get_unifrac
message("'diss.fun' defaults to .get_unifrac.")
} else if( method %in% c("jsd") ){
diss.fun <- .get_jsd
message("'diss.fun' defaults to .get_jsd.")
} else if( requireNamespace("vegan") ){
diss.fun <- vegan::vegdist
message("'diss.fun' defaults to vegan::vegdist.")
} else{
diss.fun <- stats::dist
message("'diss.fun' defaults to stats::dist.")
}
}
# Calculate dissimilarity with specified function
res <- do.call(diss.fun, args)
return(res)
}
101 changes: 2 additions & 99 deletions R/calculateJSD.R
Original file line number Diff line number Diff line change
@@ -1,97 +1,3 @@
#' Calculate the Jensen-Shannon Divergence
#'
#' This function calculates the Jensen-Shannon Divergence (JSD) in a
#' \code{\link[SummarizedExperiment:SummarizedExperiment-class]{SummarizedExperiment}}
#' object.
#'
#' @param x a numeric matrix with samples as rows or a
#' \code{\link[SummarizedExperiment:SummarizedExperiment-class]{SummarizedExperiment}}
#' object.
#'
#' @param assay.type \code{Character scalar}. Specifies the name of the
#' assay used in calculation. (Default: \code{"counts"})
#'
#' @param exprs_values Deprecated. Use \code{assay.type} instead.
#'
#' @param assay_name Deprecated. Use \code{assay.type} instead.
#'
#' @param chunkSize \code{Integer scalar}. Defines the size of data send
#' to the individual worker. Only has an effect, if \code{BPPARAM} defines
#' more than one worker. (Default: \code{nrow(x)})
#'
#' @param BPPARAM A
#' \code{\link[BiocParallel:BiocParallelParam-class]{BiocParallelParam}}
#' object specifying whether the calculation should be parallelized.
#'
#' @param transposed \code{Logical scalar}. Is x transposed with samples in rows?
#' (Default: \code{FALSE})
#'
#' @param ... optional arguments not used.
#'
#' @return a sample-by-sample distance matrix, suitable for NMDS, etc.
#'
#' @seealso
#' \url{http://en.wikipedia.org/wiki/Jensen-Shannon_divergence}
#'
#' @references
#' Jensen-Shannon Divergence and Hilbert space embedding.
#' Bent Fuglede and Flemming Topsoe University of Copenhagen,
#' Department of Mathematics
#' \url{http://www.math.ku.dk/~topsoe/ISIT2004JSD.pdf}
#'
#' @name calculateJSD
#'
#' @author
#' Susan Holmes \email{susan@@stat.stanford.edu}.
#' Adapted for phyloseq by Paul J. McMurdie.
#' Adapted for mia by Felix G.M. Ernst
#'
#' @export
#'
#' @examples
#' data(enterotype)
#' library(scater)
#'
#'
#' jsd <- calculateJSD(enterotype)
#' class(jsd)
#' head(jsd)
#'
#' enterotype <- runMDS(enterotype, FUN = calculateJSD, name = "JSD",
#' exprs_values = "counts")
#' head(reducedDim(enterotype))
#' head(attr(reducedDim(enterotype),"eig"))
#' attr(reducedDim(enterotype),"GOF")
NULL

setGeneric("calculateJSD", signature = c("x"),
function(x, ...)
standardGeneric("calculateJSD"))

#' @rdname calculateJSD
#' @export
setMethod("calculateJSD", signature = c(x = "ANY"),
function(x, ...){
.calculate_distance(x, FUN = runJSD, ...)
}
)

#' @rdname calculateJSD
#'
#' @importFrom SummarizedExperiment assay
#'
#' @export
setMethod("calculateJSD", signature = c(x = "SummarizedExperiment"),
function(x, assay.type = assay_name, assay_name = exprs_values,
exprs_values = "counts", transposed = FALSE, ...){
mat <- assay(x, assay.type)
if(!transposed){
mat <- t(mat)
}
calculateJSD(mat, ...)
}
)

# written by Susan Holmes \email{susan@@stat.stanford.edu}.
# Adapted for phyloseq by Paul J. McMurdie.
# Adapted for mia by Felix G.M. Ernst
Expand All @@ -113,15 +19,12 @@ setMethod("calculateJSD", signature = c(x = "SummarizedExperiment"),
return(rowSums(d, na.rm = TRUE))
}

#' @rdname calculateJSD
#'
#' @importFrom utils combn
#' @importFrom stats as.dist
#' @importFrom BiocParallel SerialParam register bplapply bpisup bpstart bpstop
#' @importFrom DelayedArray getAutoBPPARAM setAutoBPPARAM
#'
#' @export
runJSD <- function(x, BPPARAM = SerialParam(), chunkSize = nrow(x)){
#'
.get_jsd <- function(x, BPPARAM = SerialParam(), chunkSize = nrow(x), ...){
# input check
if(is.null(rownames(x))){
rownames(x) <- seq_len(nrow(x))
Expand Down
Loading
Loading