-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PureCN writes only a fraction of the variants reported in vcf #329
Comments
Hi, probably also #320 related? Try installing the latest GitHub version and re-run PureCN.R with --min-base-quality 20. |
Also, for test runs using --normal in PureCN.R is ok, but for ideal results, create a normal database with NormalDB.R and then NOT use --normal. This will reduce the noise by a factor of approx. 2. You can use our Docker image that can read GATK GenomicsDB files for the mapping bias database (that one provides the expected allelic fractions of all heterozygous SNPs in your normal samples and adjusts accordingly). |
Closing now, feel free to reopen if unrelated to #320 |
Dear Markus I tried to lower the --min-base-quality to 20 but this didn't improve the results |
In case this may be of help INFO [2023-10-24 08:55:04] Loading coverage files... |
Dear Team
I ran the following pipeline to estimate the CCF of the variants reported in a mutect 2.1 VCF.
The pipeline goes well but in the "sorted_variants.csv" output are printed out only 30000 variants (almost all are germline)
I expected that all the variants be printed in the final report.
Any tips?
thank you
This is my pipeline
I/home/elisa/anaconda3/envs/musica/bin/Rscript $PURECN/IntervalFile.R --infile exome.bed --fasta /data/PIPELINE/GENOMES/hg38/hg38.fa --outfile $OUT_REF/baits_hg38_intervals.txt --offtarget --genome hg38 --export $OUT_REF/baits_optimized_hg38.bed --force;
awk 'NR==1{print}NR>1{print $0}' REF/baits_hg38_intervals.txt >REF/baits_hg38_intervals2.txt
Calculate and GC-normalize coverage from a BAM file
/home/elisa/anaconda3/envs/musica/bin/Rscript $PURECN/Coverage.R --outdir $OUT_REF --bam $INPUTDIR/EXOME/ALIGNMENT/BAM/final_uniqueTUMOR_sorted.bam --intervals $OUT_REF/baits_hg38_intervals2.txt --cores 40;
/home/elisa/anaconda3/envs/musica/bin/Rscript $PURECN/Coverage.R --outdir $OUT_REF --bam $INPUTDIR/EXOME/ALIGNMENT/BAM/final_uniqueCONTROL_sorted.bam --intervals $OUT_REF/baits_hg38_intervals2.txt --cores 40;
wait;
java $temp -jar $gatk Mutect2 -R $GENOMEDIR/hg38.fa -I $INPUTDIR/EXOME/ALIGNMENT/BAM/final_uniqueTUMOR_sorted.bam -tumor TUMOR -I $INPUTDIR/EXOME/ALIGNMENT/BAM/final_uniqueCONTROL_sorted.bam -normal CONTROL -O mutect.vcf --genotype-germline-sites true --genotype-pon-sites true --native-pair-hmm-threads 50 --germline-resource /data/PIPELINE/GENOMES/hg38/somatic-hg38_af-only-gnomad.hg38.vcf.gz --minimum-allele-fraction 0.10
/home/elisa/anaconda3/envs/musica/bin/Rscript $PURECN/PureCN.R --out $OUT_REF --tumor $OUT_REF/final_uniqueTUMOR_sorted_coverage_loess.txt.gz --normal $OUT_REF/final_uniqueCONTROL_sorted_coverage_loess.txt.gz --sampleid final_uniqueTUMOR_sorted --vcf mutect.vcf --intervals $OUT_REF/baits_hg38_intervals2.txt --genome hg38
this is my sessionInfo
Matrix products: default
BLAS/LAPACK: /home/elisa/anaconda3/envs/musica/lib/libopenblasp-r0.3.21.so
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
[5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] PureCN_1.20.0 VariantAnnotation_1.36.0
[3] Rsamtools_2.6.0 Biostrings_2.58.0
[5] XVector_0.30.0 SummarizedExperiment_1.20.0
[7] Biobase_2.50.0 GenomicRanges_1.42.0
[9] GenomeInfoDb_1.26.7 IRanges_2.24.1
[11] S4Vectors_0.28.1 MatrixGenerics_1.2.1
[13] matrixStats_1.0.0 BiocGenerics_0.36.1
[15] DNAcopy_1.64.0
loaded via a namespace (and not attached):
[1] httr_1.4.7 VGAM_1.1-9 bit64_4.0.5
[4] splines_4.0.5 askpass_1.2.0 BiocFileCache_1.14.0
[7] blob_1.2.4 BSgenome_1.58.0 GenomeInfoDbData_1.2.4
[10] progress_1.2.2 pillar_1.9.0 RSQLite_2.3.1
[13] lattice_0.21-9 glue_1.6.2 RColorBrewer_1.1-3
[16] colorspace_2.1-0 Matrix_1.6-1.1 XML_3.99-0.14
[19] pkgconfig_2.0.3 biomaRt_2.46.3 zlibbioc_1.36.0
[22] scales_1.2.1 BiocParallel_1.24.1 tibble_3.2.1
[25] openssl_2.1.1 generics_0.1.3 ggplot2_3.4.3
[28] cachem_1.0.8 GenomicFeatures_1.42.3 cli_3.6.1
[31] magrittr_2.0.3 crayon_1.5.2 memoise_2.0.1
[34] fansi_1.0.4 xml2_1.3.5 tools_4.0.5
[37] data.table_1.14.8 prettyunits_1.2.0 hms_1.1.3
[40] formatR_1.14 lifecycle_1.0.3 stringr_1.5.0
[43] Rhdf5lib_1.12.1 munsell_0.5.0 DelayedArray_0.16.3
[46] AnnotationDbi_1.52.0 lambda.r_1.2.4 compiler_4.0.5
[49] rlang_1.1.1 rhdf5_2.34.0 futile.logger_1.4.3
[52] grid_4.0.5 RCurl_1.98-1.12 rhdf5filters_1.2.1
[55] rappdirs_0.3.3 bitops_1.0-7 gtable_0.3.4
[58] DBI_1.1.3 curl_4.3.2 R6_2.5.1
[61] gridExtra_2.3 GenomicAlignments_1.26.0 dplyr_1.1.3
[64] rtracklayer_1.50.0 fastmap_1.1.1 bit_4.0.5
[67] utf8_1.2.3 futile.options_1.0.1 stringi_1.7.12
[70] vctrs_0.6.3 dbplyr_2.3.4 tidyselect_1.2.0
The text was updated successfully, but these errors were encountered: