Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FINEMAP credible sets not included in the results #7

Open
jdblischak opened this issue Apr 8, 2021 · 13 comments
Open

FINEMAP credible sets not included in the results #7

jdblischak opened this issue Apr 8, 2021 · 13 comments
Assignees
Labels
bug Something isn't working

Comments

@jdblischak
Copy link

While exploring the results in the echolocatoR Shiny app, I noticed that FINEMAP never returns more than 1 credible set per locus. This was confusing since SuSiE often returns more than 1 credible set.

I started diving into the code, and I think the issue is that it only looks for data.cred, when FINEMAP returns data.cred# where # is the number of credible sets. PolyFun obtains the credible sets by searching backwards from the maximum number of causal SNPs until it finds a .cred file (source):

        #add causal set info
        df_finemap['CREDIBLE_SET'] = 0
        cred_file = None
        for m in range(num_causal_snps, 0, -1):
            if os.path.exists(cred_filename+str(m)):
                cred_file = cred_filename+str(m)
                break
        if cred_file is None:
            raise IOError('cred file not found')
        df_cred = pd.read_table(cred_file, sep=' ', usecols=(lambda c: c.startswith('cred')), comment='#')
        df_finemap.set_index('SNP', inplace=True, drop=False)
        for c_i, c in enumerate(df_cred.columns):
            df_finemap.loc[df_cred[c].dropna(), 'CREDIBLE_SET'] = c_i+1
        df_finemap.reset_index(inplace=True, drop=True)

echolocatoR only checks for data.cred:

https://github.com/RajLabMSSM/echolocatoR/blob/b055ac0fb74c914d7600e3afc650a2ffd7149396/R/FINEMAP.R#L336

And then when it is extracting the PIPs from the .snp file, it assigns any SNP that meets the PIP threshold to CS 1.

https://github.com/RajLabMSSM/echolocatoR/blob/b055ac0fb74c914d7600e3afc650a2ffd7149396/R/FINEMAP.R#L363-L364

I attempted to put together a minimal, reproducible example to demonstrate this behavior. However, using the latest version on master, FINEMAP is currently returning NA for both the .CS and the .PP columns.

See here for minimal, reproducible example
library(echolocatoR)
stopifnot(packageVersion("echolocatoR") == "0.1.2")

data("Nalls_top_SNPs")
top_SNPs <- import_topSNPs(
  topSS = Nalls_top_SNPs,
  position_col = "BP",
  pval_col = "P, all studies",
  effect_col = "Beta, all studies",
  gene_col = "Nearest Gene",
  locus_col = "Nearest Gene",
  remove_variants = "rs34637584"
)
fullSS_path <- example_fullSS()

Nalls23andMe_2019.results <- finemap_loci(
  top_SNPs = top_SNPs,
  results_dir = file.path(getwd(), "results"),
  loci = "BST1",
  dataset_name = "Nalls23andMe_2019",
  remove_tmps = FALSE,
  fullSS_path = fullSS_path,
  query_by = "tabix",
  snp_col = "RSID",
  pval_col = "p",
  effect_col = "beta",
  stderr_col = "se",
  freq_col = "freq",
  MAF_col = "calculate",
  bp_distance = 10000,
  min_MAF = 0.001,
  finemap_methods = c("FINEMAP", "SUSIE"),
  LD_reference = "UKB",
  download_method = "axel",
  plot.types = c()
)

Nalls23andMe_2019.results[SUSIE.CS > 0, list(SNP, FINEMAP.CS, FINEMAP.PP, SUSIE.CS, SUSIE.PP)]
readLines("results/GWAS/Nalls23andMe_2019/BST1/FINEMAP/data.snp", n = 3)
readLines("results/GWAS/Nalls23andMe_2019/BST1/FINEMAP/data.cred5")

sessionInfo()
See here for the results
> source("reprex.R", echo = TRUE)

> library(echolocatoR)
Registered S3 method overwritten by 'GGally':
  method from
  +.gg   ggplot2
Possible Ensembl SSL connectivity problems detected.
Please see the 'Connection Troubleshooting' section of the biomaRt vignette
vignette('accessing_ensembl', package = 'biomaRt')Error in curl::curl_fetch_memory(url, handle = handle) :
  SSL certificate problem: self signed certificate in certificate chain

Bioconductor version 3.12 (BiocManager 1.30.12), ?BiocManager::install for help

> stopifnot(packageVersion("echolocatoR") == "0.1.2")

> data("Nalls_top_SNPs")

> top_SNPs <- import_topSNPs(
+   topSS = Nalls_top_SNPs,
+   position_col = "BP",
+   pval_col = "P, all studies",
+   effect_col = "Beta, all studie ..." ... [TRUNCATED]
[1] "+ Assigning gene_col and locus_col independently"

> fullSS_path <- example_fullSS()

> Nalls23andMe_2019.results <- finemap_loci(
+   top_SNPs = top_SNPs,
+   results_dir = file.path(getwd(), "results"),
+   loci = "BST1",
+   dataset_ .... [TRUNCATED]
[1] "+ CONDA:: Activating conda env 'echoR'"
[1] "Checking for tabix installation..."
[1] "Checking for bcftools installation..."

)   )  ) ))))))}}}}}}}} {{{{{{{{{(((((( (  (   (
BST1 (1 / 1)
)   )  ) ))))))}}}}}}}} {{{{{{{{{(((((( (  (   (
[1] "+ Extracting relevant variants from fullSS..."
[1] "+ Query Method: tabix"
[1] "+ QUERY: Chromosome = 4 ; Min position = 15727348 ; Max position = 15747348"
[1] "TABIX:: Converting full summary stats file to tabix format for fast querying..."
[1] "+ CONDA:: Identified bgzip executable in echoR env."
[1] "( grep 'CHR' ./Nalls23andMe_2019.fullSS_subset.tsv; grep -v ^'CHR' ./Nalls23andMe_2019.fullSS_subset.tsv | sort -k1,1 -k2,2n ) | ~/mambaforge/envs/echoR/bin/bgzip -f > ~/echolocatoR/results/GWAS/Nalls23andMe_2019/Nalls23andMe_2019.fullSS_subset.tsv.gz"
[1] "+ CONDA:: Identified tabix executable in echoR env."
[1] "TABIX:: Indexing"
[1] "~/mambaforge/envs/echoR/bin/tabix -f -S 1 -s 1 -b 2 -e 2 ~/echolocatoR/results/GWAS/Nalls23andMe_2019/Nalls23andMe_2019.fullSS_subset.tsv.gz"
[1] "Determining chrom type from file header"
[1] "Chromosome format = 1"
[1] "+ CONDA:: Identified tabix executable in echoR env."
[1] "TABIX:: Extracting subset of sum stats"
[1] "+ TABIX:: ~/mambaforge/envs/echoR/bin/tabix -h ~/echolocatoR/results/GWAS/Nalls23andMe_2019/Nalls23andMe_2019.fullSS_subset.tsv.gz 4:15727348-15747348"
[1] "+ TABIX:: Returning 115 x 11 data.table"
[1] "++ Saving query ==> ~/echolocatoR/results/GWAS/Nalls23andMe_2019/BST1/BST1_Nalls23andMe_2019_subset.tsv.gz"
[1] "LD:: Standardizing summary statistics subset."
[1] "++ Preparing Gene col"
[1] "++ Preparing A1,A1 cols"
[1] "++ Preparing MAF,Freq cols"
[1] "++ Inferring MAF from frequency column..."
[1] "++ Preparing N_cases,N_controls cols"
[1] "++ Preparing `proportion_cases` col"
[1] "++ Calculating `proportion_cases`."
[1] "++ Preparing N col"
[1] "+ Preparing sample_size (N) column"
[1] "++ Computing effective sample size."
[1] "++ Preparing t-stat col"
[1] "+ Calculating t-statistic from Effect and StdErr..."
[1] "++ Assigning lead SNP"
[1] "++ Ensuring Effect, StdErr, P are numeric"
[1] "++ Ensuring 1 SNP per row"
[1] "++ Removing extra whitespace"
[1] "++ Saving subset ==> ~/echolocatoR/results/GWAS/Nalls23andMe_2019/BST1/BST1_Nalls23andMe_2019_subset.tsv.gz"
[1] "+ Extraction completed in 6.72 seconds"
[1] "+ 115 SNPs x  16 columns"
[1] "LD:: Using UK Biobank LD reference panel."
[1] "+ UKB LD file name: chr4_15000001_18000001"
[1] "+ LD:: Downloading full .gz/.npz UKB files and saving to disk."
[1] "+ CONDA:: Identified axel executable in echoR env."
[1] "+ CONDA:: Identified axel executable in echoR env."
[1] "+ LD:: load_ld() python function input: ~/echolocatoR/results/GWAS/Nalls23andMe_2019/BST1/LD/chr4_15000001_18000001"
[1] "+ LD:: Reading LD matrix into memory. This could take some time..."
~/echolocatoR/results/GWAS/Nalls23andMe_2019/BST1/LD/chr4_15000001_18000001.gz
~/echolocatoR/results/GWAS/Nalls23andMe_2019/BST1/LD/chr4_15000001_18000001.npz
Processed URL: ~/echolocatoR/results/GWAS/Nalls23andMe_2019/BST1/LD/chr4_15000001_18000001
Some other message at the end
[1] "+ Full UKB LD matrix: 20815 x 20815"
[1] "+ Full UKB LD SNP data.table: 20815 x 5"
[1] "+ LD:: Saving LD matrix ==> ~/echolocatoR/results/GWAS/Nalls23andMe_2019/BST1/LD/BST1.UKB_LD.RDS"
[1] "115 x 115 LD_matrix (sparse)"
[1] "+ FILTER:: Filtering by LD features."
[1] "FILTER:: Filtering by SNP features."
[1] "+ FILTER:: Removing SNPs with MAF < 0.001"
[1] "+ FILTER:: Post-filtered data: 115 x 16"
vvvvv-- FINEMAP --vvvvvAll required columns present.All suggested columns present.
vvvvv-- SUSIE --vvvvvAll required columns present.All suggested columns present.
[1] "++ Fine-mapping using multiple tools: FINEMAP, SUSIE"

+++ Multi-finemap:: FINEMAP +++
[1] "++ FINEMAP:: Constructing master file."
[1] "++ FINEMAP:: Constructing data.z file."
[1] "++ FINEMAP:: Constructing data.ld file."
[1] "+ Using FINEMAP v1.4"

|--------------------------------------|
| Welcome to FINEMAP v1.4              |
|                                      |
| (c) 2015-2020 University of Helsinki |
|                                      |
| Help :                               |
| - ./finemap --help                   |
| - www.finemap.me                     |
| - www.christianbenner.com            |
|                                      |
| Contact :                            |
| - christian.benner@helsinki.fi       |
| - matti.pirinen@helsinki.fi          |
|--------------------------------------|

--------
SETTINGS
--------
- dataset            : all
- corr-config        : 0.95
- n-causal-snps      : 5
- n-configs-top      : 50000
- n-conv-sss         : 100
- n-iter             : 100000
- n-threads          : 1
- prior-k0           : 0
- prior-std          : 0.05
- prob-conv-sss-tol  : 0.001
- prob-cred-set      : 0.95

------------
FINE-MAPPING (1/1)
------------
- GWAS summary stats               : FINEMAP/data.z
- SNP correlations                 : FINEMAP/data.ld
- Causal SNP stats                 : FINEMAP/data.snp
- Causal configurations            : FINEMAP/data.config
- Credible sets                    : FINEMAP/data.cred
- Log file                         : FINEMAP/data.log_sss
- Reading input                    : done!

- Number of GWAS samples           : 216621
- Number of SNPs                   : 115
- Prior-Pr(# of causal SNPs is k)  :
  (0 -> 0)
   1 -> 0.584
   2 -> 0.292
   3 -> 0.0964
   4 -> 0.0237
   5 -> 0.00461
- 1616 configurations evaluated (0.104/100%) : converged after 104 iterations
- Computing causal SNP statistics  : done!
- Regional SNP heritability        : 0.00763 (SD: 0.000422 ; 95% CI: [0.00679,0.0085])
- Log10-BF of >= one causal SNP    : 2.28e+04
- Post-expected # of causal SNPs   : 5
- Post-Pr(# of causal SNPs is k)   :
  (0 -> 0)
   1 -> 0
   2 -> 0
   3 -> 0
   4 -> 2.87e-262
   5 -> 1
- Computing credible sets          : done!
- Writing output                   : done!
- Run time                         : 0 hours, 0 minutes, 0 seconds
[1] ".cred not detected. Using .snp instead."
[1] "+ FINEMAP:: Importing prob (.snp)..."
Error in `[.data.table`(x, r, vars, with = FALSE) :
  column(s) not found: prob
[1] "++ Credible Set SNPs identified = 0"
[1] "++ Merging FINEMAP results with multi-finemap data."

+++ Multi-finemap:: SUSIE +++
[1] "+ Preparing sample_size (N) column"
[1] "+ `N` column already present in data."
[1] "+ SUSIE:: sample_size= 216621"
[1] "+ SUSIE:: max_causal = 5"
[1] "+ Subsetting LD matrix and finemap_dat to common SNPs..."
[1] "+ LD:: Removing unnamed rows/cols"
[1] "+ LD:: Replacing NAs with 0"
[1] "+ LD_matrix = 115 SNPs."
[1] "+ finemap_dat = 115 SNPs."
[1] "+ 115 SNPs in common."
[1] "+ SUSIE:: Using `susie_suff_stat()` from susieR v0.10.1"
[1] "+ SUSIE:: Extracting Credible Sets..."
[1] "++ Credible Set SNPs identified = 3"
[1] "++ Merging SUSIE results with multi-finemap data."
[1] "+ Identifying Consensus SNPs..."
[1] "++ support_thresh = 2"
[1] "++ top_CS_only=FALSE"
[1] "+ Calculating mean Posterior Probability (mean.PP)..."
[1] "+ Replacing PP==NA with 0"
[1] "++ 2 fine-mapping methods used."
[1] "++ 3 Credible Set SNPs identified."
[1] "++ 0 Consensus SNPs identified."
[1] "+ Fine-mapping with ' FINEMAP, SUSIE ' completed:"
Time difference of 3.1 secs

Fine-mapping complete in:
Time difference of 1.7 mins
[1] "+ Identifying Consensus SNPs..."
[1] "++ support_thresh = 2"
[1] "++ top_CS_only=FALSE"
[1] "+ Calculating mean Posterior Probability (mean.PP)..."
[1] "+ Replacing PP==NA with 0"
[1] "++ 2 fine-mapping methods used."
[1] "++ 3 Credible Set SNPs identified."
[1] "++ 0 Consensus SNPs identified."

> Nalls23andMe_2019.results[SUSIE.CS > 0, list(SNP, FINEMAP.CS, FINEMAP.PP, SUSIE.CS, SUSIE.PP)]
          SNP FINEMAP.CS FINEMAP.PP SUSIE.CS  SUSIE.PP
1: rs34559912         NA         NA        3 1.0000000
2:  rs4389574         NA         NA        1 1.0000000
3:  rs6852450         NA         NA        2 0.9999992

> readLines("results/GWAS/Nalls23andMe_2019/BST1/FINEMAP/data.snp", n = 3)
[1] "index rsid chromosome position allele1 allele2 maf beta se z prob log10bf mean sd mean_incl sd_incl"
[2] "1 rs6828144 4 15727389 T C 0.0263 0.1765 0.0309 5.71197 1 11.8522 1.40649 0.0355444 1.40649 0.0355444"
[3] "111 rs11947310 4 15744576 A C 0.1909 0.0823 0.012 6.85833 1 11.8522 1.40649 0.0355444 1.40649 0.0355444"

> readLines("results/GWAS/Nalls23andMe_2019/BST1/FINEMAP/data.cred5")
[1] "# Post-Pr(# of causal SNPs is 5) = 1"
[2] "#log10bf 22448.9 NA 22432.1 NA 20843.6 NA 556.379 NA 264.033 NA"
[3] "#min(|ld|) 1 NA 1 NA 1 NA 1 NA 1 NA"
[4] "#mean(|ld|) 1 NA 1 NA 1 NA 1 NA 1 NA"
[5] "#median(|ld|) 1 NA 1 NA 1 NA 1 NA 1 NA"
[6] "index cred1 prob1 cred2 prob2 cred3 prob3 cred4 prob4 cred5 prob5"
[7] "1 rs11947310 1 rs11933202 1 rs1807250 1 rs6828144 1 rs73123615 0.999999"

> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: ~/mambaforge/envs/echoR/lib/libopenblasp-r0.3.12.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] echolocatoR_0.1.2

loaded via a namespace (and not attached):
  [1] colorspace_2.0-0            ellipsis_0.3.1
  [3] class_7.3-18                biovizBase_1.38.0
  [5] htmlTable_2.1.0             XVector_0.30.0
  [7] GenomicRanges_1.42.0        base64enc_0.1-3
  [9] gld_2.6.2                   dichromat_2.0-0
 [11] rstudioapi_0.13             proxy_0.4-25
 [13] DT_0.17                     bit64_4.0.5
 [15] AnnotationDbi_1.52.0        fansi_0.4.2
 [17] mvtnorm_1.1-1               xml2_1.3.2
 [19] R.methodsS3_1.8.1           splines_4.0.3
 [21] ggbio_1.38.0                cachem_1.0.4
 [23] rootSolve_1.8.2.1           knitr_1.31
 [25] jsonlite_1.7.2              Formula_1.2-4
 [27] Rsamtools_2.6.0             dbplyr_2.1.1
 [29] cluster_2.1.1               R.oo_1.24.0
 [31] png_0.1-7                   graph_1.68.0
 [33] BiocManager_1.30.12         compiler_4.0.3
 [35] httr_1.4.2                  backports_1.2.1
 [37] assertthat_0.2.1            Matrix_1.3-2
 [39] fastmap_1.1.0               lazyeval_0.2.2
 [41] htmltools_0.5.1.1           prettyunits_1.1.1
 [43] tools_4.0.3                 gtable_0.3.0
 [45] glue_1.4.2                  lmom_2.8
 [47] GenomeInfoDbData_1.2.4      reshape2_1.4.4
 [49] dplyr_1.0.5                 rappdirs_0.3.3
 [51] Rcpp_1.0.6                  Biobase_2.50.0
 [53] vctrs_0.3.7                 Biostrings_2.58.0
 [55] rtracklayer_1.50.0          crosstalk_1.1.1
 [57] xfun_0.20                   stringr_1.4.0
 [59] lifecycle_1.0.0             ensembldb_2.14.0
 [61] XML_3.99-0.6                zlibbioc_1.36.0
 [63] MASS_7.3-53.1               scales_1.1.1
 [65] BSgenome_1.58.0             VariantAnnotation_1.36.0
 [67] ProtGenerics_1.22.0         hms_1.0.0
 [69] MatrixGenerics_1.2.1        RBGL_1.66.0
 [71] parallel_4.0.3              SummarizedExperiment_1.20.0
 [73] expm_0.999-6                susieR_0.10.1
 [75] AnnotationFilter_1.14.0     RColorBrewer_1.1-2
 [77] curl_4.3                    Exact_2.1
 [79] reticulate_1.18             memoise_2.0.0
 [81] gridExtra_2.3               ggplot2_3.3.3
 [83] biomaRt_2.46.3              rpart_4.1-15
 [85] reshape_0.8.8               latticeExtra_0.6-29
 [87] stringi_1.5.3               RSQLite_2.2.5
 [89] S4Vectors_0.28.1            e1071_1.7-6
 [91] checkmate_2.0.0             GenomicFeatures_1.42.2
 [93] BiocGenerics_0.36.0         boot_1.3-27
 [95] BiocParallel_1.24.1         GenomeInfoDb_1.26.4
 [97] rlang_0.4.10                pkgconfig_2.0.3
 [99] matrixStats_0.58.0          bitops_1.0-6
[101] lattice_0.20-41             purrr_0.3.4
[103] GenomicAlignments_1.26.0    htmlwidgets_1.5.3
[105] bit_4.0.4                   tidyselect_1.1.0
[107] GGally_2.1.1                plyr_1.8.6
[109] magrittr_2.0.1              R6_2.5.0
[111] IRanges_2.24.1              DescTools_0.99.40
[113] generics_0.1.0              Hmisc_4.5-0
[115] DelayedArray_0.16.3         DBI_1.1.1
[117] pillar_1.5.1                foreign_0.8-81
[119] survival_3.2-10             RCurl_1.98-1.3
[121] nnet_7.3-15                 tibble_3.1.0
[123] crayon_1.4.1                utf8_1.2.1
[125] OrganismDbi_1.32.0          BiocFileCache_1.14.0
[127] jpeg_0.1-8.1                progress_1.2.2
[129] grid_4.0.3                  data.table_1.14.0
[131] blob_1.2.1                  digest_0.6.27
[133] R.utils_2.10.1              openssl_1.4.3
[135] stats4_4.0.3                munsell_0.5.0
[137] askpass_1.1
Warning message:
In susie_func(bhat = subset_DT$Effect, shat = subset_DT$StdErr,  :
  IBSS algorithm did not converge in 100 iterations!
@bschilder
Copy link
Member

Hi @jdblischak, thanks for reporting this in such detail!

Here are my thoughts currently, any suggestions are welcome.

Credible Sets

Currently, I only record the top credible set in the multi-finemap file bc I was trying to avoid assigning multiple PPs to each SNP to keep FINEMAP.PP as a single column (all my other echolocatoR code is built around this structure). So basically, if SNPs are in the top CS they're assigned FINEMAP.CS=1 else 0. Perhaps there's a better way to do this that preserves this information without the user having to read in multiple .cred files themselves after running finemap_loci().

I can think of two strategies currently:

  1. Assign each FINEMAP CS to its own column (e.g. FINEMAP1.PP, FINEMAP1.CS, FINEMAP2.PP, FINEMAP2.CS, FINEMAP3.PP, FINEMAP3.CS, etc.) and revise the rest of echolocatoR's code to account for this change.
  2. Continue using only one col for each: FINEMAP.PP and FINEMAP.PP. When a SNP occurs in multiple CS, only record its FINEMAP.PP within the first CS it appears and assign its FINEMAP.CS to that first CS. This will work with the rest of the code but isn't exactly a faithful representation of the FINEMAP results.

FINEMAP version differences

  • There's some pretty substantial differences in arguments/file outputs/formats across the various versions of FINEMAP. I've tried to account for all of these but I may have missed some.
  • In FINEMAP v1.4, multiple .cred files are produced (e.g. .cred5), whereas in older versions, only a single .cred file is produced (with no number after it). For the reasons I mentioned above, I only took the top CS (corresponding to the first .cred# file in v1.4). But i think I accidentally changed the code at some point such that it didn't account for the # suffix in v1.4. I can at least make this more flexible with a grep() search rather than endsWith().

zstandard

  • This is something I've been having ongoing issues with, for some reason FINEMAP v1.4 throws dylib errors despite the fact that I have zstd installed and updated via brew (see Issue Finemap missing dependency in macOS #9 ). I also tried with a conda env version, but FINEMAP doesn't seem to recognize this.
  • It seems this only occurs on Mac since you've been able to run it fine on CentOS Linux (which has been my experience too). Still trying to figure this one out, but I may have to to email the original author since there doesn't seem to be any GitHub or issues forum for FINEMAP (at least that I've come across).

@jdblischak
Copy link
Author

@bschilder Thanks for the detailed follow-up. My thoughts below:

So basically, if SNPs are in the top CS they're assigned FINEMAP.CS=1 else 0.

I didn't realize that FINEMAP.CS was only supposed to include the top CS. This is confusing when comparing to, e.g. the results from SuSiE. When the echolocatoR results show multiple CS from SuSiE but a single one from FINEMAP, this indicates a difference in behavior. But I think that is just due to echolocatoR, not a difference in the methods.

Also, I don't think that is what the code does, at least not for FINEMAP 1.4. As far as I can tell, the CS info from FINEAMP 1.4 is not consulted at all. Instead the algorithm appears to be if PIP > threshold, assign FINEMAP.CS = 1 else NA. Thus every SNP with a large PIP gets assigned to CS=1, even if they were assigned to separate CS.

Assign each FINEMAP CS to its own column (e.g. FINEMAP1.PP, FINEMAP1.CS, FINEMAP2.PP, FINEMAP2.CS, FINEMAP3.PP, FINEMAP3.CS, etc.) and revise the rest of echolocatoR's code to account for this change.

I don't understand this. The PIP for a SNP is unaffected by the CS it is assigned. My recommendation would be to obtain the information the same as the PolyFun authors do, unless you have a compelling reason that is incorrect:

  1. Obtain FINEMAP.PP from the column "prob" in the.snp output file
  2. Obtain FINEMAP.CS from the .cred# output file, where # is the largest one available

I'm happy to help with this. And my apologies if I am missing something obvious.

I may have to to email the original author since there doesn't seem to be any GitHub or issues forum for FINEMAP

You should absolutely email the author for help troubleshooting the installation on macOS. There is no public forum for discussing these issues, and the code is closed source, so we can't build a conda package to ease installation.

@bschilder
Copy link
Member

Thanks for the feedback @jdblischak!

This does get rather confusing so I may have not have explained some things very well. I've tried a number of different ways about this so it's entirely possible I could be mixing up strategies from different version of echolocatoR and FINEMAP (apologies if this is the case).

Credible Sets and Posterior Probabilities

One of the main decision points I've gone back and forth on is whether to use .snp (marginal probabilities across all SNPs) vs .cred (conditional probabilities only for SNPs within a specific credible set) when constructing the final multi-finemap results file.

  • I think your suggestion makes quite a bit of sense to me. But I've also received some feedback from an anonymous reviewer that seems to have pretty intimate knowledge of FINEMAP who was adamant about using the conditional probabilities from .cred (I was previously using marg probs from .snp).
  • In the FINEMAP output, each configuration (each with a different # of causal SNPs modeled) is assigned a probability. So if I specified the max # of causal SNPs as 5, FINEMAP v1.4 would return 5 .cred# files (though I think only 1 .cred file is returned in v1.3, and no .cred file is returned in older versions). For FINEMAP v1.4, The reviewer suggested that I
    use the .cred# file that shows largest posterior probability for the number of causal variants (line 1 of the file) and extract credible sets and SNP-wise conditional probabilities from that file. In my experience, FINEMAP usually gives the .cred# file with the highest # of assumed causal SNPs the highest probability, but this isn't necessarily always the case.
  • More recent versions of FINEMAP also return a .config file, which I think could also be used to select the top configuration based on its PP.

This is all a bit tricky to explain in words so I'm going to push some of these proposed changes to the dev branch. If you don't mind, perhaps you could then look these over and see if they seem sensible to you?

FINEMAP support

I tried reaching out to the FINEMAP author a couple weeks ago via email but haven't heard back yet (christian.benner@helsinki.fi and cc'ed finemap@christianbenner.com). Not sure if he still uses these emails but I can't seem to find any others. Also tried reaching out via LinkedIn. I'll keep you posted if I have any luck.

Thanks again for your help!
-Brian

@jdblischak
Copy link
Author

I think your suggestion makes quite a bit of sense to me. But I've also received some feedback from an anonymous reviewer that seems to have pretty intimate knowledge of FINEMAP who was adamant about using the conditional probabilities from .cred (I was previously using marg probs from .snp).

This doesn't make sense to me. If the goal is to produce results that can be compared across the different fine-mapping methods, I don't see why the conditional probabilities should be reported. For susieR you are reporting the PIPs in SUSIE.PP.

https://github.com/RajLabMSSM/echolocatoR/blob/b055ac0fb74c914d7600e3afc650a2ffd7149396/R/SUSIE.R#L194-L195

From your documentation:

https://github.com/RajLabMSSM/echolocatoR/blob/6b9086d6755be0cc802027b76fb8314058fa0bf1/README.Rmd#L311

From the FINEMAP documentation for data.snp (emphasis mine):

prob column contains the marginal Posterior Inclusion Probabilities (PIP). The PIP for the l th SNP is the posterior probability that this SNP is causal.

From the FINEMAP documentation for data.cred:

The l th credible set shows the best candidate SNPs and their posterior probability of being in a k-SNP causal configuration that additionally contains k - 1 SNPs.

If you want to report the posterior probability that a given SNP is causal, I think you should report prob from data.snp.

In cases where there are only 1 SNP per CS, I don't think it matters whether you report the PIP from data.snp or the PP from data.cred#. The PP will be 1, and the PIP will be 1 or very close to it. However, when there are multiple SNPs in a CS, the PIP can be different from the PP.

As an example, here's code to run the example shipped with FINEMAP:

mkdir /tmp/fmexample
cd /tmp/fmexample
wget http://christianbenner.com/finemap_v1.4_x86_64.tgz
tar xf finemap_v1.4_x86_64.tgz
cd finemap_v1.4_x86_64/
./finemap_v1.4_x86_64 --sss --in-files example/master -dataset 1

Taking a quick look at the results in R, PIP and PP are not identical (though they are correlated). The most dramatic difference is rs6. It's PIP is only 0.04. However, it's PP for being included in cred3 is 0.17.

library(data.table)
snp <- fread("example/data.snp")

snp[prob > 0.009, .(rsid, prob)]
#    rsid      prob
# 1:  rs7 1.0000000
# 2: rs30 0.9999690
# 3:  rs6 0.0417854
# 4: rs31 0.0107216
# 5:  rs8 0.0105046
# 6: rs48 0.0100717

cred2 <- fread("example/data.cred2", skip = 5)
cred2
#    index cred1 prob1 cred2 prob2
# 1:     1   rs7     1  rs30     1

cred3 <- fread("example/data.cred3", skip = 5)
cred3[prob3 > 0.02, ]
#     index cred1 prob1 cred2 prob2 cred3     prob3
#  1:     1   rs7     1  rs30     1   rs6 0.1733260
#  2:     2  <NA>    NA  <NA>    NA  rs31 0.0433775
#  3:     3  <NA>    NA  <NA>    NA   rs8 0.0424081
#  4:     4  <NA>    NA  <NA>    NA  rs48 0.0405975
#  5:     5  <NA>    NA  <NA>    NA  rs46 0.0361650
#  6:     6  <NA>    NA  <NA>    NA  rs14 0.0281224
#  7:     7  <NA>    NA  <NA>    NA  rs10 0.0253583
#  8:     8  <NA>    NA  <NA>    NA  rs11 0.0248557
#  9:     9  <NA>    NA  <NA>    NA  rs19 0.0246048
# 10:    10  <NA>    NA  <NA>    NA  rs32 0.0242520
# 11:    11  <NA>    NA  <NA>    NA  rs20 0.0241354
# 12:    12  <NA>    NA  <NA>    NA  rs17 0.0237760
# 13:    13  <NA>    NA  <NA>    NA  rs13 0.0233587

For FINEMAP v1.4, The reviewer suggested that I
use the .cred# file that shows largest posterior probability for the number of causal variants (line 1 of the file) and extract credible sets and SNP-wise conditional probabilities from that file.

This is a reasonable suggestion, but may not be worth the effort. If you report the highest number of credible sets, the user can always apply a PIP threshold to remove the SNPs that are less convincing (and thus remove the less convincing credible sets as well). The earlier credible sets should be unchanged.

The FINEMAP example above illustrates this. The PP that there are only 2 causal SNPs is much higher than the PP for 3 causal SNPs. However, if you were to report the results from data.cred3, and the user applied even a minimal PIP cutoff per SNP (e.g. >= 0.05), the final results would be identical.

% head -n 1 example/data.cred2
# Post-Pr(# of causal SNPs is 2) = 0.773769

% head -n 1 example/data.cred3
# Post-Pr(# of causal SNPs is 3) = 0.202562

This is all a bit tricky to explain in words so I'm going to push some of these proposed changes to the dev branch. If you don't mind, perhaps you could then look these over and see if they seem sensible to you?

Looking at b351c10b4cb25c21446abb3791a8cb0bb4084eda, I think it's good you now detect cred files other than data.cred.

https://github.com/RajLabMSSM/echolocatoR/blob/6b9086d6755be0cc802027b76fb8314058fa0bf1/R/FINEMAP.R#L337-L340

Though for the reason I stated above (and for consistency with PolyFun), I think you should select the cred file with the largest number of configurations, not the fewest, i.e. cred_path <- cred_path[length(cred_path)]

https://github.com/RajLabMSSM/echolocatoR/blob/6b9086d6755be0cc802027b76fb8314058fa0bf1/R/FINEMAP.R#L380-L382

@bschilder bschilder transferred this issue from RajLabMSSM/echolocatoR Aug 27, 2022
@bschilder bschilder self-assigned this Aug 27, 2022
@bschilder bschilder added the bug Something isn't working label Aug 27, 2022
@bschilder
Copy link
Member

bschilder commented Aug 27, 2022

echolocatoR (via its subpackage echofinemap) now handles FINEMAP outputs differently, by including multiple columns which are now described in ?echofinemap::FINEMAP. Do let me know if you think any of these descriptions are lacking or not quite accurate:

Screenshot 2022-08-27 at 19 25 38

Note:

When performing fine-mapping via echofinemap::multifinemap or echlocatoR::finemap_loci, these columns will have a "FINEMAP." prefix to distinguish them from the results of other fine-mapping methods.

reprex

locus_dir <- file.path(tempdir(),echodata::locus_dir)
dat <- echofinemap::drop_finemap_cols(echodata::BST1)
LD_matrix <- echodata::BST1_LD_matrix
out <- echoLD::subset_common_snps(LD_matrix, dat)
LD_matrix <- out$LD
dat <- out$DT

dat2 <- echofinemap::FINEMAP(dat=dat,
                             locus_dir=locus_dir,
                             LD_matrix=LD_matrix)
Loading required namespace: MungeSumstats
Preparing sample size column (N).
Computing effective sample size using the LDSC method:
 Neff = (N_CAS+N_CON) * (N_CAS/(N_CAS+N_CON)) / mean((N_CAS/(N_CAS+N_CON))[(N_CAS+N_CON)==max(N_CAS+N_CON)]))
+ Mapping colnames from MungeSumstats ==> echolocatoR
+ Subsetting LD matrix and dat to common SNPs...
Removing unnamed rows/cols
Replacing NAs with 0
+ LD_matrix = 95 SNPs.
+ dat = 95 SNPs.
+ 95 SNPs in common.
Converting obj to sparseMatrix.
Constructing master file.
Constructing data.z file.
Constructing data.ld file.
FINEMAP path: /Users/schilder/Library/Caches/org.R-project.R/R/echofinemap/FINEMAP/finemap_v1.4.1_MacOSX/finemap_v1.4.1_MacOSX
dyld[20528]: Library not loaded: /usr/local/opt/gcc/lib/gcc/10/libgfortran.5.dylib
  Referenced from: /Users/schilder/Library/Caches/org.R-project.R/R/echofinemap/FINEMAP/finemap_v1.4.1_MacOSX/finemap_v1.4.1_MacOSX
  Reason: tried: '/usr/local/opt/gcc/lib/gcc/10/libgfortran.5.dylib' (no such file), '/usr/local/lib/libgfortran.5.dylib' (no such file), '/usr/lib/libgfortran.5.dylib' (no such file)
Inferred FINEMAP version: 
Running FINEMAP.
cd .../BST1 &&
    .../finemap_v1.3.1_MacOSX
   
    --sss
   
    --in-files .../master
   
    --log
   
    --n-causal-snps 5

|--------------------------------------|
| Welcome to FINEMAP v1.3.1            |
|                                      |
| (c) 2015-2018 University of Helsinki |
|                                      |
| Help :                               |
| - ./finemap --help                   |
| - www.finemap.me                     |
| - www.christianbenner.com            |
|                                      |
| Contact :                            |
| - christian.benner@helsinki.fi       |
| - matti.pirinen@helsinki.fi          |
|--------------------------------------|

--------
SETTINGS
--------
- dataset         : all
- corr-config     : 0.95
- n-causal-snps   : 5
- n-configs-top   : 50000
- n-convergence   : 1000
- n-iterations    : 100000
- prior-k0        : 0
- prior-std       : 0.05 
- prob-tol        : 0.001

------------
FINE-MAPPING (1/1)
------------
- GWAS summary stats               : FINEMAP/data.z
- SNP correlations                 : FINEMAP/data.ld
- Causal SNP stats                 : FINEMAP/data.snp
- Causal configurations            : FINEMAP/data.config
- Credible sets                    : FINEMAP/data.cred
- Log file                         : FINEMAP/data.log_sss
- Reading input                    : done!   

- Number of GWAS samples           : 1474097
- Number of SNPs                   : 95
- Prior-Pr(# of causal SNPs is k)  : 
  (0 -> 0)
   1 -> 0.584
   2 -> 0.292
   3 -> 0.0963
   4 -> 0.0236
   5 -> 0.00456
- 2994 configurations evaluated (1.008/100%) : converged after 1008 iterations
- Computing causal SNP statistics  : done!   
- Regional SNP heritability        : 0.0033 (SD: 9.5e-05 ; 95% CI: [0.00311,0.00349])
- Log10-BF of >= one causal SNP    : 1.04e+03
- Post-Pr(# of causal SNPs is k)   : 
  (0 -> 0)
   1 -> 0
   2 -> 0
   3 -> 0
   4 -> 9.67e-160
   5 -> 1
- Writing output                   : done    
- Run time                         : 0 hours, 0 minutes, 0 seconds
1 data.cred* file(s) found in the same subfolder.
Selected file based on postPr_k: data.cred
Importing conditional probabilities (.cred file).
Importing marginal probabilities (.snp file).
Importing configuration probabilities (.config file).
Warning message:
In echofinemap::FINEMAP(dat = dat, locus_dir = locus_dir, LD_matrix = LD_matrix) :
  
*********
 Error detected: 'dyld: Library not loaded: /usr/local/lib/libzstd.1.dylib'
 If you are using Mac OS, please install Zstandard
 (https://facebook.github.io/zstd/).
 e.g. `brew install zstd`

 Also, ensure that you have gcc v8 installed,
 as FINEMAP v1.4 is only compatible with this version.

 If Zstandard is already installed and this error persists,
 please see the main FINEMAP website for additional support
 (http://www.christianbenner.com). 
*********

results

head(dat2)

Image

FINEMAP version differences

I've spent quite a bit of time examining the outputs of each FINEMAP's versions outputs in different conditions, and I think I've accounted for these inechofinemap's internal processing functions. But do let me know if I've missed something or a new difference comes up in future versions of FINEMAP.

@bschilder bschilder moved this from Todo to In Progress in 🦇🦇 echoverse 🦇🦇 Aug 27, 2022
@jdblischak
Copy link
Author

@bschilder Thanks for following up and for putting all this work into new functionality and documentation.

Do let me know if you think any of these descriptions are lacking or not quite accurate

In general I think returning more results to the end user is the right thing to do.

With your new setup, how does the number of credible sets compare between SuSiE and FINEMAP? That was the main issue I had with the previous iteration (ie FINEMAP always showed 1 CS even if it actually identified more). I attempted to run your reprex, but I couldn't get echoconda installed:

> remotes::install_github("RajLabMSSM/echoconda", upgrade = FALSE)
─  building ‘echoconda_0.99.6.tar.gz’

* installing *source* package ‘echoconda’ ...
** using non-staged installation via StagedInstall field
Error in data.table::fread(system.file(package = "echoconda", "conda/echoR_versions.tsv.gz")) :
  Input is empty or only contains BOM or terminal control characters
Calls: <Anonymous> -> eval -> eval -> get_echoR -> <Anonymous>
Execution halted
ERROR: configuration failed for package ‘echoconda’

@bschilder
Copy link
Member

bschilder commented Sep 3, 2022

Regarding the conda installation issue, could you include your OS specs? Haven't encountered this issue on MacOS or Linux docker containers, and can't seem to replicate it now.

jdblischak added a commit to jdblischak/echoconda that referenced this issue Sep 16, 2022
@jdblischak
Copy link
Author

Regarding the conda installation issue, could you include your OS specs? Haven't encountered this issue on MacOS or Linux docker containers, and can't seem to replicate it now.

That error happened in a peculiar setup. R is running inside of a singularity container on an HPC environment. In case it's helpful, below I reproduced the installation error and appended the session information:

> remotes::install_github("RajLabMSSM/echoconda", upgrade = FALSE)
Using github PAT from envvar GITHUB_PAT
Downloading GitHub repo RajLabMSSM/echoconda@HEADchecking for file/local/tmp/RtmphmtsG7/remotes561138236260/RajLabMSSM-echoconda-49cac2a/DESCRIPTION’ (351ms)
─  preparingechoconda:checking DESCRIPTION meta-information ...checking for LF line-endings in source and make files and shell scriptschecking for empty or unneeded directories
   OmittedLazyDatafrom DESCRIPTIONbuildingechoconda_0.99.7.tar.gzInstalling package into/mnt/user-library/r420-bioc315-20220506’
(aslibis unspecified)
* installing *source* packageechoconda...
** using non-staged installation via StagedInstall field
Error in data.table::fread(system.file(package = "echoconda", "conda/echoR_versions.tsv.gz")) :
  Input is empty or only contains BOM or terminal control characters
Calls: <Anonymous> -> eval -> eval -> get_echoR -> <Anonymous>
Execution halted
ERROR: configuration failed for packageechoconda* removing/mnt/user-library/r420-bioc315-20220506/echocondaWarning message:
In i.p(...) :
  installation of package/local/tmp/RtmphmtsG7/file5611322a38a1/echoconda_0.99.7.tar.gzhad non-zero exit status

> sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.6 LTS

Matrix products: default
BLAS:   /usr/local/lib/R/lib/libRblas.so
LAPACK: /usr/local/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
 [1] processx_3.7.0    compiler_4.2.0    R6_2.5.1          rprojroot_2.0.3
 [5] cli_3.3.0         prettyunits_1.1.1 tools_4.2.0       withr_2.5.0
 [9] curl_4.3.2        crayon_1.5.1      remotes_2.4.2     callr_3.7.0
[13] pkgbuild_1.3.1    ps_1.7.1

@jdblischak
Copy link
Author

Also, I was able to install echoconda and echofinemap in a more traditional setup with R installed in a conda environment. I just had to install echodata prior to installing echoconda (see PR RajLabMSSM/echoconda#6)

> remotes::install_github("RajLabMSSM/echodata", upgrade = FALSE)
> remotes::install_github("RajLabMSSM/echoconda", upgrade = FALSE)
> library(echoconda)
> remotes::install_github("RajLabMSSM/echofinemap", upgrade = FALSE)
> library(echofinemap)
> sessionInfo()
R version 4.1.3 (2022-03-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: ~/.conda/envs/test-echo/lib/libopenblasp-r0.3.21.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] echofinemap_0.99.3 echoconda_0.99.7

loaded via a namespace (and not attached):
  [1] colorspace_2.0-3            rjson_0.2.21
  [3] ellipsis_0.3.2              downloadR_0.99.4
  [5] XVector_0.34.0              GenomicRanges_1.46.1
  [7] remotes_2.4.2               DT_0.25
  [9] bit64_4.0.5                 AnnotationDbi_1.56.1
 [11] fansi_1.0.3                 xml2_1.3.3
 [13] splines_4.1.3               snpStats_1.44.0
 [15] R.methodsS3_1.8.2           cachem_1.0.6
 [17] echoLD_0.99.7               jsonlite_1.8.0
 [19] Rsamtools_2.10.0            dbplyr_2.2.1
 [21] png_0.1-7                   R.oo_1.25.0
 [23] echodata_0.99.12            BiocManager_1.30.18
 [25] readr_2.1.2                 compiler_4.1.3
 [27] httr_1.4.4                  basilisk_1.6.0
 [29] assertthat_0.2.1            Matrix_1.4-1
 [31] fastmap_1.1.0               cli_3.4.0
 [33] htmltools_0.5.3             prettyunits_1.1.1
 [35] tools_4.1.3                 gtable_0.3.1
 [37] glue_1.6.2                  GenomeInfoDbData_1.2.7
 [39] dplyr_1.0.10                rappdirs_0.3.3
 [41] Rcpp_1.0.9                  Biobase_2.54.0
 [43] vctrs_0.4.1                 Biostrings_2.62.0
 [45] echotabix_0.99.8            rtracklayer_1.54.0
 [47] stringr_1.4.1               openxlsx_4.2.5
 [49] lifecycle_1.0.2             irlba_2.3.5
 [51] restfulr_0.0.15             XML_3.99-0.10
 [53] zlibbioc_1.40.0             scales_1.2.1
 [55] basilisk.utils_1.6.0        BSgenome_1.62.0
 [57] VariantAnnotation_1.40.0    coloc_5.1.0.1
 [59] hms_1.1.2                   MatrixGenerics_1.6.0
 [61] parallel_4.1.3              SummarizedExperiment_1.24.0
 [63] susieR_0.12.27              yaml_2.3.5
 [65] curl_4.3.2                  memoise_2.0.1
 [67] reticulate_1.26             gridExtra_2.3
 [69] ggplot2_3.3.6               biomaRt_2.50.0
 [71] reshape_0.8.9               stringi_1.7.8
 [73] RSQLite_2.2.8               S4Vectors_0.32.4
 [75] BiocIO_1.4.0                GenomicFeatures_1.46.1
 [77] BiocGenerics_0.40.0         filelock_1.0.2
 [79] zip_2.2.1                   BiocParallel_1.28.3
 [81] GenomeInfoDb_1.30.0         rlang_1.0.5
 [83] pkgconfig_2.0.3             matrixStats_0.62.0
 [85] bitops_1.0-7                lattice_0.20-45
 [87] purrr_0.3.4                 GenomicAlignments_1.30.0
 [89] htmlwidgets_1.5.4           bit_4.0.4
 [91] tidyselect_1.1.2            plyr_1.8.7
 [93] magrittr_2.0.3              R6_2.5.1
 [95] IRanges_2.28.0              generics_0.1.3
 [97] piggyback_0.1.4             DelayedArray_0.20.0
 [99] DBI_1.1.3                   pillar_1.8.1
[101] survival_3.4-0              KEGGREST_1.34.0
[103] RCurl_1.98-1.8              mixsqp_0.3-43
[105] tibble_3.1.8                dir.expiry_1.2.0
[107] crayon_1.5.1                utf8_1.2.2
[109] BiocFileCache_2.2.0         tzdb_0.3.0
[111] viridis_0.6.2               progress_1.2.2
[113] grid_4.1.3                  data.table_1.14.2
[115] blob_1.2.3                  digest_0.6.29
[117] tidyr_1.2.1                 R.utils_2.12.0
[119] stats4_4.1.3                munsell_0.5.0
[121] viridisLite_0.4.1

@jdblischak
Copy link
Author

Now that I have echofinemap installed, I tried to run the reprex. I got an error from MungeSumstats. I probably need to update all the packages to their latest versions, but I don't have the time to continue troubleshooting this at the moment

locus_dir <- file.path(tempdir(),echodata::locus_dir)
dat <- echofinemap::drop_finemap_cols(echodata::BST1)
LD_matrix <- echodata::BST1_LD_matrix
out <- echoLD::subset_common_snps(LD_matrix, dat)
LD_matrix <- out$LD
dat <- out$DT

dat2 <- echofinemap::FINEMAP(dat=dat,
                             locus_dir=locus_dir,
                             LD_matrix=LD_matrix)
Loading required namespace: MungeSumstats
Preparing sample size column (N).
Error: 'compute_nsize' is not an exported object from 'namespace:MungeSumstats'

> sessionInfo()
R version 4.1.3 (2022-03-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /gstore/home/blischaj/.conda/envs/test-echo/lib/libopenblasp-r0.3.21.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] echofinemap_0.99.3 echoconda_0.99.7

loaded via a namespace (and not attached):
  [1] colorspace_2.0-3            rjson_0.2.21
  [3] ellipsis_0.3.2              downloadR_0.99.4
  [5] XVector_0.34.0              fs_1.5.2
  [7] GenomicRanges_1.46.1        remotes_2.4.2
  [9] DT_0.25                     bit64_4.0.5
 [11] AnnotationDbi_1.56.1        fansi_1.0.3
 [13] xml2_1.3.3                  splines_4.1.3
 [15] snpStats_1.44.0             R.methodsS3_1.8.2
 [17] cachem_1.0.6                echoLD_0.99.7
 [19] jsonlite_1.8.0              Rsamtools_2.10.0
 [21] dbplyr_2.2.1                png_0.1-7
 [23] R.oo_1.25.0                 echodata_0.99.12
 [25] BiocManager_1.30.18         readr_2.1.2
 [27] compiler_4.1.3              httr_1.4.4
 [29] basilisk_1.6.0              assertthat_0.2.1
 [31] Matrix_1.4-1                fastmap_1.1.0
 [33] gargle_1.2.1                cli_3.4.0
 [35] htmltools_0.5.3             prettyunits_1.1.1
 [37] tools_4.1.3                 gtable_0.3.1
 [39] glue_1.6.2                  GenomeInfoDbData_1.2.7
 [41] dplyr_1.0.10                rappdirs_0.3.3
 [43] Rcpp_1.0.9                  Biobase_2.54.0
 [45] vctrs_0.4.1                 Biostrings_2.62.0
 [47] echotabix_0.99.8            rtracklayer_1.54.0
 [49] stringr_1.4.1               MungeSumstats_1.2.0
 [51] openxlsx_4.2.5              lifecycle_1.0.2
 [53] irlba_2.3.5                 restfulr_0.0.15
 [55] XML_3.99-0.10               googleAuthR_2.0.0
 [57] zlibbioc_1.40.0             scales_1.2.1
 [59] basilisk.utils_1.6.0        BSgenome_1.62.0
 [61] VariantAnnotation_1.40.0    coloc_5.1.0.1
 [63] hms_1.1.2                   MatrixGenerics_1.6.0
 [65] parallel_4.1.3              SummarizedExperiment_1.24.0
 [67] susieR_0.12.27              yaml_2.3.5
 [69] curl_4.3.2                  memoise_2.0.1
 [71] reticulate_1.26             gridExtra_2.3
 [73] ggplot2_3.3.6               biomaRt_2.50.0
 [75] reshape_0.8.9               stringi_1.7.8
 [77] RSQLite_2.2.8               S4Vectors_0.32.4
 [79] BiocIO_1.4.0                GenomicFeatures_1.46.1
 [81] BiocGenerics_0.40.0         filelock_1.0.2
 [83] zip_2.2.1                   BiocParallel_1.28.3
 [85] GenomeInfoDb_1.30.0         rlang_1.0.5
 [87] pkgconfig_2.0.3             matrixStats_0.62.0
 [89] bitops_1.0-7                lattice_0.20-45
 [91] purrr_0.3.4                 GenomicAlignments_1.30.0
 [93] htmlwidgets_1.5.4           bit_4.0.4
 [95] tidyselect_1.1.2            plyr_1.8.7
 [97] magrittr_2.0.3              R6_2.5.1
 [99] IRanges_2.28.0              generics_0.1.3
[101] piggyback_0.1.4             DelayedArray_0.20.0
[103] DBI_1.1.3                   pillar_1.8.1
[105] survival_3.4-0              KEGGREST_1.34.0
[107] RCurl_1.98-1.8              mixsqp_0.3-43
[109] tibble_3.1.8                dir.expiry_1.2.0
[111] crayon_1.5.1                utf8_1.2.2
[113] BiocFileCache_2.2.0         tzdb_0.3.0
[115] viridis_0.6.2               progress_1.2.2
[117] grid_4.1.3                  data.table_1.14.2
[119] blob_1.2.3                  digest_0.6.29
[121] tidyr_1.2.1                 R.utils_2.12.0
[123] stats4_4.1.3                munsell_0.5.0
[125] viridisLite_0.4.1

@bschilder
Copy link
Member

bschilder commented Sep 20, 2022

1.

That error happened in a peculiar setup. R is running inside of a singularity container on an HPC environment. In case it's helpful, below I reproduced the installation error and appended the session information:

Error in data.table::fread(system.file(package = "echoconda", "conda/echoR_versions.tsv.gz")) :
  Input is empty or only contains BOM or terminal control characters

This happens bc data.table requires R.utils to be installed to read in gz-compressed files on certain OS. I've now made R.utils an Import of echodata to avoid this in the future.

2.

Also, I was able to install echoconda and echofinemap in a more traditional setup with R installed in a conda environment. I just had to install echodata prior to installing echoconda (see PR RajLabMSSM/echoconda#6)

Great, thanks for the PR! I must have added echodata functions later and forgot to add it as a dep. Will merge this soon.

3.

Now that I have echofinemap installed, I tried to run the reprex. I got an error from MungeSumstats. I probably need to update all the packages to their latest versions, but I don't have the time to continue troubleshooting this at the moment

You're exactly right, MungeSumstats exported compute_nsize in v1.3.9.

Though it's odd this is coming up for you since I have a minimum version required for echodata in the DESCRIPTION file: MungeSumstats (>= 1.3.14). I've added an additional safeguard within echodata::get_sample_size to let users know about this version requirement.

 requireNamespace("MungeSumstats")
    if(packageVersion("MungeSumstats")<"1.3.14"){
        stp <- "echodata requires MungeSumstats >=1.3.14"
        stop(stp)
    }

@jdblischak
Copy link
Author

This happens bc data.table requires R.utils to be installed to read in gz-compressed files on certain OS. I've now made R.utils an Import of echodata to avoid this in the future.

This isn't blocking me, so feel free to ignore. But I already had R.utils installed in that environment, so I am skeptical that was the problem. In case it could be useful for future troubleshooting, here are the versions of data.table and R.utils I had:

library("data.table")
## data.table 1.14.2 using 1 threads (see ?getDTthreads).  Latest news: r-datatable.com
library("R.utils")
## Loading required package: R.oo
## Loading required package: R.methodsS3
## R.methodsS3 v1.8.2 (2022-06-13 22:00:14 UTC) successfully loaded. See ?R.methodsS3 for help.
## R.oo v1.25.0 (2022-06-12 02:20:02 UTC) successfully loaded. See ?R.oo for help.
## 
## Attaching package: ‘R.oo’
## 
## The following object is masked from ‘package:R.methodsS3’:
## 
##     throw
## 
## The following objects are masked from ‘package:methods’:
## 
##     getClasses, getMethods
## 
## The following objects are masked from ‘package:base’:
## 
##     attach, detach, load, save
## 
## R.utils v2.12.0 (2022-06-28 03:20:05 UTC) successfully loaded. See ?R.utils for help.
## 
## Attaching package: ‘R.utils’
## 
## The following object is masked from ‘package:utils’:
## 
##     timestamp
## 
## The following objects are masked from ‘package:base’:
## 
##     cat, commandArgs, getOption, isOpen, nullfile, parse, warnings
## 
sessionInfo()
## R version 4.2.0 (2022-04-22)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.6 LTS
## 
## Matrix products: default
## BLAS:   /usr/local/lib/R/lib/libRblas.so
## LAPACK: /usr/local/lib/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base
## 
## other attached packages:
## [1] R.utils_2.12.0    R.oo_1.25.0       R.methodsS3_1.8.2 data.table_1.14.2
## 
## loaded via a namespace (and not attached):
## [1] compiler_4.2.0 tools_4.2.0

@jdblischak
Copy link
Author

Also, would you be able to export the dat2 object from your reprex as a plain text file and attach it to this Issue? I'd like to be able to look at the new results without having to run the reprex myself locally.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Status: In Progress
Development

No branches or pull requests

2 participants