scDblFinder error when using aggregateFeatures and knownDoublets #82

Open
dottercp opened this issue Jul 19, 2023 · 2 comments

Comments

dottercp commented Jul 19, 2023

Dear developers,

I'm working with multiplexed (CMO) scATAC-seq data (one 10x run contains 6 samples), which gives me known doublets from the overlap of hashtags. When running scDblFinder on this data, I wanted to provide these doublets as knownDoublets and aggregate features, as recommended in the vignette. However, this combination of parameters does not work and throws an error (see below). After some debugging, I suspect the cause is that the dataset is split into known doublets (sce.dbl) and the remaining cells (sce) before aggregation, which leads to a mismatch of row names between the two subsets.
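
The suspected mechanism can be sketched with plain base-R matrices. This is only a toy illustration under assumptions, not scDblFinder's internal code: the matrix `counts`, the grouping vector `groups`, and the use of `rowsum()` as a stand-in for feature aggregation are all hypothetical. It shows that subsetting an un-aggregated doublet subset by aggregated row names fails, while aggregating before the split keeps the row names consistent.

```r
# Toy illustration with base-R matrices (hypothetical data, not scDblFinder internals).
set.seed(1)
counts <- matrix(rpois(24, 5), nrow = 6,
                 dimnames = list(paste0("peak", 1:6), paste0("cell", 1:4)))
is_known_doublet <- c(FALSE, FALSE, FALSE, TRUE)

# Split first, as the report suspects happens internally.
main <- counts[, !is_known_doublet, drop = FALSE]
dbl  <- counts[,  is_known_doublet, drop = FALSE]

# Aggregate only the main subset: rows become meta-features with new names.
groups  <- rep(paste0("agg", 1:3), each = 2)  # hypothetical feature grouping
main_ag <- rowsum(main, groups)
sel_features <- rownames(main_ag)

# The doublet subset still has the original peak names, so this lookup fails.
mismatch <- tryCatch(dbl[sel_features, , drop = FALSE],
                     error = function(e) conditionMessage(e))
print(mismatch)

# Aggregating before the split keeps both subsets on the same row names.
counts_ag <- rowsum(counts, groups)
dbl_ag    <- counts_ag[, is_known_doublet, drop = FALSE]
ok        <- dbl_ag[sel_features, , drop = FALSE]
print(dim(ok))
```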

MRE -- Minimal example to reproduce the bug

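# sce: SingleCellExperiment holding the multiplexed scATAC-seq run
# doublet_sample: value of sce$ident that marks the hashtag-derived known doublets
scDblFinder(
  sce = sce,
  dims = 50,
  aggregateFeatures = TRUE,
  knownDoublets = (sce$ident == doublet_sample),
  knownUse = "discard"
)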

Traceback

6: stop(sprintf(fmt, msg))
5: SummarizedExperiment:::.SummarizedExperiment.charbound(subset, 
       names, fmt)
4: .convert_subset_index(i, rownames(x))
3: sce.dbl[sel_features, ]
2: sce.dbl[sel_features, ]
1: scDblFinder::scDblFinder(sce = sce, dims = 50, aggregateFeatures = TRUE,
       knownDoublets = sce$ident == doublet_sample, knownUse = knownUse)

Session info

R version 4.3.0 (2023-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 11 (bullseye)

Matrix products: default

attached base packages:
[1] grid      stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] qs_0.25.5                          furrr_0.3.1                        future_1.32.0                     
 [4] intrinsicDimension_1.2.0           yaImpute_1.0-33                    glmGamPoi_1.12.1                  
 [7] Palo_1.1                           here_1.0.1                         ComplexHeatmap_2.16.0             
[10] pheatmap_1.0.12                    ggpp_0.5.2                         BSgenome.Mmusculus.UCSC.mm10_1.4.3
[13] BSgenome_1.67.4                    rtracklayer_1.59.1                 Biostrings_2.67.2                 
[16] XVector_0.39.0                     tarchetypes_0.7.6                  scuttle_1.10.1                    
[19] Signac_1.10.0                      scDblFinder_1.14.0                 SingleCellExperiment_1.22.0       
[22] SummarizedExperiment_1.29.1        Biobase_2.59.0                     GenomicRanges_1.51.4              
[25] GenomeInfoDb_1.35.17               IRanges_2.33.1                     S4Vectors_0.38.1                  
[28] BiocGenerics_0.45.3                MatrixGenerics_1.12.2              matrixStats_1.0.0                 
[31] targets_1.1.3                      SeuratObject_4.1.3                 Seurat_4.3.0                      
[34] lubridate_1.9.2                    forcats_1.0.0                      stringr_1.5.0                     
[37] dplyr_1.1.2                        purrr_1.0.1                        readr_2.1.4                       
[40] tidyr_1.3.0                        tibble_3.2.1                       ggplot2_3.4.2                     
[43] tidyverse_2.0.0                   

plger (Owner) commented Jul 20, 2023

Hi,
thanks for reporting this.
Until I fix it, you can run the aggregation separately. For example, this should reproduce what you're trying to do:

sce.ag <- aggregateFeatures(sce, k=50)
sce.ag <- scDblFinder(sce.ag, processing="normFeatures",
                      knownDoublets = (sce.ag$ident == doublet_sample))
sce$scDblFinder.score <- sce.ag$scDblFinder.score
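
The last line above copies scores by position, which should be fine as long as sce and sce.ag contain the same cells in the same order (the aggregation acts on features, not cells). If cells were filtered or reordered in between, matching by cell name is a safer way to copy the scores back. A minimal sketch with toy data frames standing in for the per-cell metadata of the two objects (the names `cells` and `cells_ag` are hypothetical); with real objects, the same idea would be `match(colnames(sce), colnames(sce.ag))`:

```r
# Toy stand-ins for the per-cell metadata of sce and sce.ag (hypothetical names).
cells    <- data.frame(row.names = paste0("cell", 1:5))
cells_ag <- data.frame(scDblFinder.score = c(0.9, 0.1, 0.4),
                       row.names = c("cell4", "cell1", "cell3"))

# Look up each original cell in the aggregated object by name, not by position.
idx <- match(rownames(cells), rownames(cells_ag))

# Cells missing from the aggregated object get NA instead of a misaligned score.
cells$scDblFinder.score <- cells_ag$scDblFinder.score[idx]
print(cells)
```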

As a side note, I'm now also exploring doing it with a high k (e.g. 500) and the normal processing, e.g.:

sce.ag <- aggregateFeatures(sce, k=500)
sce.ag <- scDblFinder(sce.ag)

However, I haven't yet tested systematically whether it's better...

dottercp (Author) commented

Thanks a lot for the quick answer and the workaround!
