Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New dissimilarity functions #609

Merged
merged 66 commits into from
Aug 13, 2024
Merged
Show file tree
Hide file tree
Changes from 33 commits
Commits
Show all changes
66 commits
Select commit Hold shift + click to select a range
9ed9847
up
TuomasBorman Jul 9, 2024
eec738e
up
TuomasBorman Jul 9, 2024
b701bc1
small fix
thpral Jul 12, 2024
ea6cef7
addDissimilarity function
thpral Jul 12, 2024
4eadecb
fix addDissimilarity
thpral Jul 12, 2024
92fc388
rename JSD and overlap functions
thpral Jul 12, 2024
318501c
man page getDissimilarity
thpral Jul 12, 2024
023636a
rename unifrac functions
thpral Jul 16, 2024
b77acda
Merge branch 'devel' into getDissimilarity
thpral Jul 16, 2024
3d572fa
fix R check warnings
thpral Jul 16, 2024
3685b9b
Merge branch 'getDissimilarity' of https://github.com/microbiome/mia …
thpral Jul 16, 2024
94c13dd
fix getUnifrac SE method
thpral Jul 18, 2024
5af8d57
Merge branch 'devel' into getDissimilarity
antagomir Jul 22, 2024
e988335
fix naming .get_jsd
thpral Jul 22, 2024
ea73d22
remove generic methods for jsd, overlap and unifrac
thpral Jul 22, 2024
b36362c
Merge branch 'getDissimilarity' of https://github.com/microbiome/mia …
thpral Jul 23, 2024
0f84fbb
fix argument types in documentation
thpral Jul 23, 2024
3e1d774
merge devel
thpral Jul 24, 2024
0629c58
fix overlap tests with getDissimilarity
thpral Jul 24, 2024
faf0308
fix error in unifrac
thpral Jul 24, 2024
4dd452f
update pckgdown
thpral Jul 25, 2024
fef4dcd
modify inheritance in man pages
thpral Jul 29, 2024
272baa6
fix documentation
thpral Jul 29, 2024
888c3ee
try to fix unifrac node.label argument
thpral Jul 29, 2024
171354c
up
TuomasBorman Jul 30, 2024
a574a58
Merge branch 'devel' into getDissimilarity
thpral Aug 1, 2024
0db5240
add unifrac, overlap and jsd documentation in getDissimilarity manual…
thpral Aug 1, 2024
f3af07f
merge devel
thpral Aug 1, 2024
53f0d1b
update doc getDissimilarity
thpral Aug 1, 2024
ebfcaaf
deprecate dissimilarity functions
thpral Aug 1, 2024
7b1186e
add check to .add_values_to_reducedDims
thpral Aug 1, 2024
327fd4f
fix deprecated functions and .add_values_to_reducedDims
thpral Aug 2, 2024
01f7fd4
Merge branch 'devel' into getDissimilarity
antagomir Aug 5, 2024
fc05582
up
TuomasBorman Aug 6, 2024
043ffcc
up
TuomasBorman Aug 6, 2024
99fc8a4
Merge branch 'devel' into getDissimilarity
antagomir Aug 6, 2024
09f78cc
address review comments
thpral Aug 7, 2024
b1a67e8
up
thpral Aug 7, 2024
488ff09
add tests for getDissimililarity
thpral Aug 7, 2024
19b3d5f
fix braces in .add_values_to_reducedDims
thpral Aug 7, 2024
d13e17a
add missing argument to runDPCoA documentation
thpral Aug 7, 2024
3aa881d
up
TuomasBorman Aug 8, 2024
914ecae
Simplify. Add method for TreeSE.
TuomasBorman Aug 8, 2024
763cf8c
address review comments
thpral Aug 8, 2024
3038c6b
deprecate runUnifrac
thpral Aug 8, 2024
b04ebf7
add dis matrix to metadata instead of reducedDims
thpral Aug 8, 2024
d89377a
fix getDissimilarity examples and doc
thpral Aug 8, 2024
374c73a
Merge branch 'devel' into getDissimilarity
TuomasBorman Aug 9, 2024
fc44f98
up
TuomasBorman Aug 9, 2024
3d94de9
up
TuomasBorman Aug 9, 2024
c30bf17
up
TuomasBorman Aug 9, 2024
4f16e17
up
TuomasBorman Aug 9, 2024
a27259b
add rarefaction test
thpral Aug 9, 2024
fba150f
up
thpral Aug 9, 2024
cb373d8
Merge branch 'devel' into getDissimilarity
thpral Aug 10, 2024
90ef724
address review comment
thpral Aug 10, 2024
a72dbc2
Merge branch 'getDissimilarity' of https://github.com/microbiome/mia …
thpral Aug 10, 2024
5a2c0be
setting seed in rarefaction tests
thpral Aug 10, 2024
1937528
merge devel
thpral Aug 11, 2024
d6b348a
up
TuomasBorman Aug 13, 2024
376792d
up
TuomasBorman Aug 13, 2024
5c84f17
up
TuomasBorman Aug 13, 2024
dd6bd53
fix rarefaction tests
thpral Aug 13, 2024
ee1557e
up
thpral Aug 13, 2024
616bec2
up
thpral Aug 13, 2024
585cc9b
up
TuomasBorman Aug 13, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 5 additions & 3 deletions NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@ export(addAlpha)
export(addCluster)
export(addContaminantQC)
export(addDPCoA)
export(addDissimilarity)
export(addDivergence)
export(addDominant)
export(addLDA)
Expand Down Expand Up @@ -49,6 +50,7 @@ export(getBestDMNFit)
export(getCrossAssociation)
export(getDMN)
export(getDPCoA)
export(getDissimilarity)
export(getDominant)
export(getExperimentCrossAssociation)
export(getExperimentCrossCorrelation)
Expand Down Expand Up @@ -117,7 +119,6 @@ export(runJSD)
export(runNMDS)
export(runOverlap)
export(runRDA)
export(runUnifrac)
thpral marked this conversation as resolved.
Show resolved Hide resolved
export(setTaxonomyRanks)
export(splitByRanks)
export(splitOn)
Expand Down Expand Up @@ -147,6 +148,7 @@ exportMethods(addAlpha)
exportMethods(addCCA)
exportMethods(addCluster)
exportMethods(addContaminantQC)
exportMethods(addDissimilarity)
exportMethods(addDivergence)
exportMethods(addDominant)
exportMethods(addHierarchyTree)
Expand Down Expand Up @@ -185,6 +187,7 @@ exportMethods(getCCA)
exportMethods(getCrossAssociation)
exportMethods(getDMN)
exportMethods(getDPCoA)
exportMethods(getDissimilarity)
exportMethods(getDominant)
exportMethods(getExperimentCrossAssociation)
exportMethods(getExperimentCrossCorrelation)
Expand Down Expand Up @@ -232,6 +235,7 @@ exportMethods(rarefyAssay)
exportMethods(relAbundanceCounts)
exportMethods(relabundance)
exportMethods(right_join)
exportMethods(runJSD)
exportMethods(runOverlap)
exportMethods(splitOn)
exportMethods(subsampleCounts)
Expand Down Expand Up @@ -340,9 +344,7 @@ importFrom(ape,collapse.singles)
importFrom(ape,cophenetic.phylo)
importFrom(ape,drop.tip)
importFrom(ape,has.singles)
importFrom(ape,is.rooted)
importFrom(ape,read.tree)
importFrom(ape,root)
importFrom(bluster,clusterRows)
importFrom(decontam,isContaminant)
importFrom(decontam,isNotContaminant)
Expand Down
319 changes: 314 additions & 5 deletions R/calculateDistance.R
Original file line number Diff line number Diff line change
@@ -1,6 +1,315 @@
# calculateDistance function is removed. This function is left for other functions
# to use
.calculate_distance <- function(mat, FUN = stats::dist, ...){
# Distance between all samples against all samples
do.call(FUN, c(list(mat),list(...)))
#' Calculate dissimilarities
#'
#' These functions are designed to calculate dissimilarities on data stored
#' within a
#' \code{\link[TreeSummarizedExperiment:TreeSummarizedExperiment-class]{TreeSummarizedExperiment}}
#' object. For overlap, Unifrac, and Jensen-Shannon Divergence (JSD)
#' dissimilarities, the functions use mia internal functions, while for other
#' types of dissimilarities, they rely on \code{\link[vegan:vegdist]{vegdist}}.
#'
#' @param x a \code{\link[TreeSummarizedExperiment:TreeSummarizedExperiment-class]{TreeSummarizedExperiment}}
#' object.
#'
#' @param method \code{Character scalar}. Specifies which distance to calculate.
#'
#' @param name \code{Character scalar}. The name to be used to store the result
#' in the reducedDims of the output. (Default: \code{method})
#'
#' @param assay.type \code{Character scalar}. Specifies which assay to use for
#' calculation. (Default: \code{"counts"})
#'
#' @param assay_name Deprecated. Use \code{assay.type} instead.
#'
#' @param exprs_values Deprecated. Use \code{assay.type} instead.
#'
#' @param transposed \code{Logical scalar}. Specifies if x is transposed with cells in
#' rows. (Default: \code{FALSE})
#'
#' @param tree.name \code{Character scalar}. Specific to unifrac dissimilarity.
#' Specifies the name of the tree used in calculation. (Default: \code{"phylo"})
#'
#' @param tree_name Deprecated. Use \code{tree.name} instead.
#'
thpral marked this conversation as resolved.
Show resolved Hide resolved
#' @param ... other arguments passed onto \code{\link[vegan:vegdist]{vegdist}},
#' or the following arguments passed onto mia internal functions for overlap,
#' unifrac and JSD dissimilarities:
#' \itemize{
#' \item \code{weighted}: Specific to unifrac dissimilarity.
#' \code{Logical scalar}. Should use weighted-Unifrac calculation?
#' Weighted-Unifrac takes into account the relative abundance of
#' species/taxa shared between samples, whereas unweighted-Unifrac only
#' considers presence/absence. Default is \code{FALSE}, meaning the
#' unweighted-Unifrac distance is calculated for all pairs of samples.
#' (Default: \code{FALSE})
#'
#' \item \code{chunkSize}: \code{Integer scalar}. Defines the size of data
#' send to the individual worker. Only has an effect, if \code{BPPARAM}
#' defines more than one worker. (Default: \code{nrow(x)})
#'
#' \item \code{BPPARAM}:
#' \code{\link[BiocParallel:BiocParallelParam-class]{BiocParallelParam} object}.
#' Specifies whether the calculation should be parallelized.
#'
#' \item \code{detection}: \code{Numeric scalar}. Specific to overlap dissimilarity.
#' Defines detection threshold for absence/presence of features. Feature that
#' has abundance under threshold in either of samples, will be discarded when
#' evaluating overlap between samples. (Default: \code{0})
#' }
#'
#' @return
#' \code{getDissimilarity} returns a sample-by-sample distance matrix, suitable
#' for NMDS, etc.
#'
#' \code{addDissimilarity} returns \code{x} that includes distance matrix in its
#' reducedDim.
#'
#' @details
#' Overlap reflects similarity between sample-pairs. When overlap is
#' calculated using relative abundances, the higher the value the higher the
#' similarity is.
#' When using relative abundances, overlap value 1 means that
#' all the abundances of features are equal between two samples, and 0 means
#' that samples have completely different relative abundances.
#'
#' Mia unifrac internal function utilizes
#' \code{\link[rbiom:unifrac]{rbiom:unifrac()}}.
#'
#' @name getDissimilarity
#'
#' @author
#' For overlap implementation:
#' Leo Lahti and Tuomas Borman. Contact: \url{microbiome.github.io}
#'
#' For JSD implementation:
#' Susan Holmes \email{susan@@stat.stanford.edu}.
#' Adapted for phyloseq by Paul J. McMurdie.
#' Adapted for mia by Felix G.M. Ernst
#'
#' @seealso
#' \url{http://en.wikipedia.org/wiki/Jensen-Shannon_divergence}
#'
#' @references
#' For unifrac dissimilarity: \url{http://bmf.colorado.edu/unifrac/}
#'
#' See also additional descriptions of Unifrac in the following articles:
#'
#' Lozupone, Hamady and Knight, ``Unifrac - An Online Tool for Comparing
#' Microbial Community Diversity in a Phylogenetic Context.'', BMC
#' Bioinformatics 2006, 7:371
#'
#' Lozupone, Hamady, Kelley and Knight, ``Quantitative and qualitative (beta)
#' diversity measures lead to different insights into factors that structure
#' microbial communities.'' Appl Environ Microbiol. 2007
#'
#' Lozupone C, Knight R. ``Unifrac: a new phylogenetic method for comparing
#' microbial communities.'' Appl Environ Microbiol. 2005 71 (12):8228-35.
#'
#' For JSD dissimilarity:
#' Jensen-Shannon Divergence and Hilbert space embedding.
#' Bent Fuglede and Flemming Topsoe University of Copenhagen,
#' Department of Mathematics
#' \url{http://www.math.ku.dk/~topsoe/ISIT2004JSD.pdf}
#'
#' @export
#'
#' @examples
#' library(mia)
#' library(scater)
#'
#' # load dataset
#' data(GlobalPatterns)
#' tse <- GlobalPatterns
#'
#' ### Overlap dissimilarity
#'
#' tse <- addDissimilarity(tse, method = "overlap", detection = 0.25)
#' reducedDim(tse, "overlap")[1:6, 1:6]
#'
#' ### JSD dissimilarity
#'
#' tse <- addDissimilarity(tse, method = "jsd")
#' reducedDim(tse, "jsd")[1:6, 1:6]
#'
#' # Multi Dimensional Scaling applied to JSD distance matrix
#' tse <- runMDS(tse, FUN = getDissimilarity, method = "overlap",
#' assay.type = "counts")
#' reducedDim(tse, "MDS")[1:6, ]
#'
#' ### Unifrac dissimilarity
#'
#' res <- getDissimilarity(tse, method = "unifrac", weighted = FALSE)
#' dim(as.matrix((res)))
#'
#' tse <- addDissimilarity(tse, method = "unifrac", weighted = TRUE)
#' reducedDim(tse, "unifrac")[1:6, 1:6]
#'
NULL

#' @rdname getDissimilarity
#' @export
setGeneric(
"addDissimilarity", signature = c("x"), function(x, method, ...)
standardGeneric("addDissimilarity"))

#' @rdname getDissimilarity
#' @export
setMethod(
"addDissimilarity", signature = c(x = "SummarizedExperiment"),
function(
x, method, assay_name = "counts", assay.type = assay_name, name = method,
transposed = FALSE, tree_name = "phylo", tree.name = tree_name, ...){
#
thpral marked this conversation as resolved.
Show resolved Hide resolved
res <- getDissimilarity(
x, method = method, assay.type = assay.type, transposed = transposed,
tree.name = tree.name, ...)

.add_values_to_reducedDims(x, values = as.matrix(res), name = name,
assay.type = assay.type)
}
)

#' @rdname getDissimilarity
#' @export
setGeneric(
"getDissimilarity", signature = c("x"), function(x, method, ...)
standardGeneric("getDissimilarity"))

#' @rdname getDissimilarity
#' @export
setMethod(
"getDissimilarity", signature = c(x = "SummarizedExperiment"),
function(
x, method, exprs_values = "counts", assay_name = exprs_values,
assay.type = assay_name, transposed = FALSE, tree_name = "phylo",
tree.name = tree_name, ...){
# Input checks
.check_assay_present(assay.type, x)
if( !.is_non_empty_string(method) ){
stop("'method' must be a non-empty single character value",
call. = FALSE)
}
if( !.is_a_bool(transposed) ){
stop("'na.rm' must be TRUE or FALSE.", call. = FALSE)
}
#
# If mrthod is unifrac, the object is TreeSE and tree was not provided by
# user, get tree arguments from TreeSE in addition to matrix and method.
if( method %in% c("unifrac") && !"tree" %in% names(list(...)) &&
is(x, "TreeSummarizedExperiment") ){
args <- .get_tree_args(
x, method = method, assay.type = assay.type,
transposed = transposed, tree.name = tree.name, ...)
} else{
# For other methods, get only matrix and method for arguments.
mat <- assay(x, assay.type)
if( !transposed ){
mat <- t(mat)
}
args <- c(list(x = mat, method = method), list(...))
}
# Calculate dissimilarity based on matrix
mat <- do.call(getDissimilarity, args)
return(mat)
}
)

#' @rdname getDissimilarity
#' @export
setMethod(
"getDissimilarity", signature = c(x = "ANY"),
function(
x, method, exprs_values = "counts", assay_name = exprs_values,
assay.type = assay_name, ...){
# Input check
if( !.is_a_string(method) ){
stop("'method' must be a single character value.", call. = FALSE)
}
#
# Calculate dissimilarity
mat <- .calculate_dissimilarity(mat = x, method = method, ...)
return(mat)
}
)

.calculate_dissimilarity <- function(
mat, method, node.label = NULL, diss.fun = NULL, tree = NULL, ...){
# input check
if( !(is.null(diss.fun) || is.function(diss.fun)) ){
thpral marked this conversation as resolved.
Show resolved Hide resolved
stop("'diss.fun' must be NULL or a function.", call. = FALSE)
}
#
args <- c(list(mat, method = method), list(...))
# If the dissimilarity functon is not specified, get default choice
if( is.null(diss.fun) ){
if( method %in% c("overlap") ){
diss.fun <- .get_overlap
message("'diss.fun' defaults to .get_overlap.")
} else if( method %in% c("unifrac") ){
args <- c(args, list(tree = tree, node.label = node.label))
diss.fun <- .get_unifrac
message("'diss.fun' defaults to .get_unifrac.")
} else if( method %in% c("jsd") ){
diss.fun <- .get_jsd
message("'diss.fun' defaults to mia:::.get_jsd.")
} else if( requireNamespace("vegan") ){
diss.fun <- vegan::vegdist
message("'diss.fun' defaults to vegan::vegdist.")
} else{
diss.fun <- stats::dist
message("'diss.fun' defaults to stats::dist.")
}
}
# Calculate dissimilarity with specified function
res <- do.call(diss.fun, args)
return(res)
}

.get_tree_args <- function(
x, method, assay.type = assay_name, assay_name = exprs_values,
exprs_values = "counts", tree.name = tree_name,
tree_name = "phylo", transposed = FALSE, ...){
# Get functions and parameters based on direction
tree_present_FUN <- if (transposed) .check_colTree_present
else .check_rowTree_present
tree_FUN <- if (transposed) colTree else rowTree
links_FUN <- if (transposed) colLinks else rowLinks
margin_name <- if (transposed) "col" else "row"
# Check assay.type
.check_assay_present(assay.type, x)
# Check tree.name
tree_present_FUN(tree.name, x)
#
# Select only those features/samples that are in the tree
links <- links_FUN(x)
present_in_tree <- links[, "whichTree"] == tree.name
if( any(!present_in_tree) ){
warning(
"Not all ", margin_name, "s were present in the ", margin_name,
"Tree specified by 'tree.name'. 'x' is subsetted.",
call. = FALSE)
# Subset the data
if( transposed ){
x <- x[, present_in_tree]
} else{
x <- x[present_in_tree, ]
}
}
# Get assay. By default, dissimilarity between samples is calculated. In
# dissimilarity functions, features must be in columns and samples in rows
# in this case.
mat <- assay(x, assay.type)
if( !transposed ){
mat <- t(mat)
}
# Get tree
tree <- tree_FUN(x, tree.name)
# Get links and take only nodeLabs
links <- links_FUN(x)
links <- links[ , "nodeLab" ]
node.label <- links

# Create an arument list that includes matrix, and tree-related parameters.
args <- list(x = mat, method = method, tree = tree, node.label = node.label)
args <- c(args, list(...))
return(args)
}
Loading
Loading