Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

'long flag "seg-file" is invalid' #322

Closed
h170607 opened this issue Sep 14, 2023 · 9 comments
Closed

'long flag "seg-file" is invalid' #322

h170607 opened this issue Sep 14, 2023 · 9 comments

Comments

@h170607
Copy link

h170607 commented Sep 14, 2023

Hi,
I Run PureCN with CNVkit segmentation:
cnvkit.py export seg ${SAMPLE}_cnvkit.cns --enumerate-chroms -o ${SAMPLE}_cnvkit.seg

`Rscript /data/envsoft/R-4.0.0/lib64/R/library/PureCN/extdata/PureCN.R --out ./${SAMPLE} \

--sampleid $SAMPLE \

--tumor ${SAMPLE}_cnvkit.cnr \

--seg-file ${SAMPLE}_cnvkit.seg \

--vcf ${SAMPLE}_mutect.vcf \

--snp-blacklist hg19_simpleRepeats.bed \

--genome hg19 \

--fun-segmentation Hclust \

--force --post-optimize --seed 123`

The VCF contains matched normal information. Then got the error, with no results:
/data/envsoft/R-4.0.0/lib64/R/library/PureCN/extdata/PureCN.R: error: Error in getopt(spec = spec, opt = args) : long flag "seg-file" is invalid

@lima1
Copy link
Owner

lima1 commented Sep 14, 2023

That's weird. What's the output of Rscript /data/envsoft/R-4.0.0/lib64/R/library/PureCN/extdata/PureCN.R --help ?

@h170607
Copy link
Author

h170607 commented Sep 15, 2023

There seems to be some problem:
/data/envsoft/R-4.0.0/bin/Rscript /data/envsoft/R-4.0.0/lib64/R/library/PureCN/extdata/PureCN.R --help

`Possible Ensembl SSL connectivity problems detected.
Please see the 'Connection Troubleshooting' section of the biomaRt vignette
vignette('accessing_ensembl', package = 'biomaRt')Error in curl::curl_fetch_memory(url, handle = handle) :
Peer certificate cannot be authenticated with given CA certificates: [uswest.ensembl.org] Peer's Certificate has expired.

Usage: /data/envsoft/R-4.0.0/lib64/R/library/PureCN/extdata/PureCN.R [options]
Options:
-i SAMPLEID, --sampleid=SAMPLEID
Sample id
--normal=NORMAL
Input: normal coverage, GC-normalized. Optional if normaldb or segfile is provided.
--tumor=TUMOR
Input: tumor coverage, GC-normalized
--vcf=VCF
Input: VCF file
--rds=RDS
Input: PureCN output RDS file, used to regenerate plots and files after manual curation
--mappingbiasfile=MAPPINGBIASFILE
Input: Mapping bias RDS file, generated by NormalDB.R (or calculateMappingBiasVcf)
--normal_panel=NORMAL_PANEL
Input: Deprecated, use --mappingbiasfile instead...`

need to reinstall this r package?

@lima1
Copy link
Owner

lima1 commented Sep 15, 2023

Yes, this looks like an ancient version. Try following the README here to get the most recent version.

@h170607
Copy link
Author

h170607 commented Sep 15, 2023

I upgraded the software to the latest version 2.7.11, using BiocManager::install("lima1/PureCN"),
and still have the following tips:

Possible Ensembl SSL connectivity problems detected.

Please see the 'Connection Troubleshooting' section of the biomaRt vignette

vignette('accessing_ensembl', package = 'biomaRt')Error in curl::curl_fetch_memory(url, handle = handle) :

Peer certificate cannot be authenticated with given CA certificates: [uswest.ensembl.org] Peer's Certificate has expired.`

I tried running it and still got errors:

WARN [2023-09-15 14:08:22] vcf.file has no DB info field for membership in germline databases. Found and used somatic status instead.

INFO [2023-09-15 14:08:22] 0 (0.0%) variants annotated as likely germline (DB INFO flag).

FATAL [2023-09-15 14:08:22] VCF either contains no germline variants or variants are not properly annotated.

My vcf file was generated by analyzing normal.bam and tumor.bam files with mutect2 and was filtered by the FilterMutectCalls.
I don't quite understand why germline mutations are needed. Does this mean using the result of the HaplotypeCaller pipeline?

@ddrichel
Copy link
Contributor

I am currently looking into this as well.
Maybe you ran Mutect2 without --germline-resource, e.g. gnomAD? Mutect2 converts AF from germline resource to POPAF=-log10(AF), which PureCN can use for better fitting of copy numbers and estimates of purity/ploidy.
PureCN can be run without --vcf if you want to do it without germline mutations, but I expect a loss of performance.

@lima1
Copy link
Owner

lima1 commented Sep 15, 2023

@h170607, have a look here, you will need to tell Mutect to get germline sites. https://bioconductor.org/packages/devel/bioc/vignettes/PureCN/inst/doc/Quick.html#3_Create_VCF_files

SNPs provide the allele-specific copy number signal, have a look at our or the ABSOLUTE paper.

@h170607
Copy link
Author

h170607 commented Sep 23, 2023

We reran Mutect2 with --genotype-germline-sites true, --genotype-pon-sites true and --interval-padding 75, and got the result.
It doesn't seem to need gnomAD.

@lima1
Copy link
Owner

lima1 commented Sep 23, 2023

Yes, if you have a matched normal, it uses that for the germline vs somatic priors.

@lima1 lima1 closed this as completed Sep 23, 2023
@lima1
Copy link
Owner

lima1 commented Sep 23, 2023

If you get lots of variants removed by the BQ < 25 filter, see #320.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants