KeyError in Post Process Germline Calls (Germline CNV Caller Workflow) #9072

Closed
hkirmak opened this issue Jan 9, 2025 · 2 comments

Comments

@hkirmak

hkirmak commented Jan 9, 2025

Hi,
I am using Germline CNV Caller in case mode (I encountered the same error in batch mode as well). The analysis returns a KeyError for the sample name. All the files are present and all the paths were given as absolute paths. I also added my problem to another issue from 2018, as this has happened before. However, the solutions offered in that issue did not work. That issue is #4724.


Bug Report

Affected tool(s) or class(es)

gatk PostprocessGermlineCNVCalls --calls-shard-path data/cnv_raw/germlinecnvcaller/germlinecnvcaller-calls --model-shard-path data/COHORT_germline_cnv_caller_cohort/COHORT_model-model --sample-index 0 --autosomal-ref-copy-number 2 --allosomal-contig chrX --allosomal-contig chrY --contig-ploidy-calls data/determine_ploidy/determine_ploidy-calls/SAMPLE_0 --output-genotyped-intervals data/cnv_call/genotyped_intervals.vcf --output-genotyped-segments data/cnv_call/genotyped_segments.vcf --output-denoised-copy-ratios data/cnv_call/genotyped_denoised_copy_ratios.vcf

Affected version(s)

  • snakemake-wrapper-utils==0.4
  • gatk4==4.6.1.0
  • gcnvkernel==0.9

I also tried versions 4.3.0.0 and 4.5.0.0, which gave the same error. I think the earlier versions of the tool use THEANO_FLAGS instead of PYTENSOR_FLAGS.

Description

I have checked the files. All of them are present and contain the sample name reported in the KeyError.

Steps to reproduce

After collecting the read counts for the germline CNV calling workflow, the read counts are processed with the DetermineGermlineContigPloidy and GermlineCNVCaller tools. The outputs of both tools are provided to the PostprocessGermlineCNVCalls tool as directories, with absolute paths, as suggested in the earlier issues and on the GATK forum. This still generates the error pasted below.

Expected behavior

All the files of the input directories are complete and all contain the sample name as is.

Actual behavior

The sample name cannot be found for some reason.


2025-01-08 13:56 INFO: CNV case call: data/cnv_raw/S29_germlinecnvcaller/S29_germlinecnvcaller-calls
2025-01-08 13:56 INFO: CNV model: hg38_acnv_models/roche_4100_ces/cnv_model/model
2025-01-08 13:56 INFO: Contig ploidy call: data/S29_determine_ploidy/S29_determine_ploidy-calls/SAMPLE_0
2025-01-08 13:56 INFO: gatk PostprocessGermlineCNVCalls --calls-shard-path data/cnv_raw/S29_germlinecnvcaller/S29_germlinecnvcaller-calls --model-shard-path hg38_acnv_models/roche_4100_ces/cnv_model/model --sample-index 0 --autosomal-ref-copy-number 2 --allosomal-contig chrX --allosomal-contig chrY --contig-ploidy-calls data/S29_determine_ploidy/S29_determine_ploidy-calls/SAMPLE_0 --output-genotyped-intervals data/cnv_call/S29_genotyped_intervals.vcf --output-genotyped-segments data/cnv_call/S29_genotyped_segments.vcf --output-denoised-copy-ratios data/cnv_call/S29_genotyped_denoised_copy_ratios.vcf
2025-01-08 13:57 INFO: Using GATK jar .snakemake/conda/febadccea00892907b6e487236c1170a_/share/gatk4-4.6.1.0-0/gatk-package-4.6.1.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar .snakemake/conda/febadccea00892907b6e487236c1170a_/share/gatk4-4.6.1.0-0/gatk-package-4.6.1.0-local.jar PostprocessGermlineCNVCalls --calls-shard-path data/cnv_raw/S29_germlinecnvcaller/S29_germlinecnvcaller-calls --model-shard-path hg38_acnv_models/roche_4100_ces/cnv_model/model --sample-index 0 --autosomal-ref-copy-number 2 --allosomal-contig chrX --allosomal-contig chrY --contig-ploidy-calls data/S29_determine_ploidy/S29_determine_ploidy-calls/SAMPLE_0 --output-genotyped-intervals data/cnv_call/S29_genotyped_intervals.vcf --output-genotyped-segments data/cnv_call/S29_genotyped_segments.vcf --output-denoised-copy-ratios data/cnv_call/S29_genotyped_denoised_copy_ratios.vcf
13:56:42.145 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:.snakemake/conda/febadccea00892907b6e487236c1170a_/share/gatk4-4.6.1.0-0/gatk-package-4.6.1.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
SLF4J(W): Class path contains multiple SLF4J providers.
SLF4J(W): Found provider [org.apache.logging.slf4j.SLF4JServiceProvider@682e422c]
SLF4J(W): Found provider [ch.qos.logback.classic.spi.LogbackServiceProvider@5bb8e6fc]
SLF4J(W): See https://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J(I): Actual provider is of type [org.apache.logging.slf4j.SLF4JServiceProvider@682e422c]
13:56:42.191 INFO  PostprocessGermlineCNVCalls - ------------------------------------------------------------
13:56:42.192 INFO  PostprocessGermlineCNVCalls - The Genome Analysis Toolkit (GATK) v4.6.1.0
13:56:42.192 INFO  PostprocessGermlineCNVCalls - For support and documentation go to https://software.broadinstitute.org/gatk/
13:56:42.192 INFO  PostprocessGermlineCNVCalls - Executing as hatice@hatice on Linux v6.8.0-51-generic amd64
13:56:42.192 INFO  PostprocessGermlineCNVCalls - Java runtime: OpenJDK 64-Bit Server VM v17.0.11-internal+0-adhoc..src
13:56:42.192 INFO  PostprocessGermlineCNVCalls - Start Date/Time: January 8, 2025 at 1:56:42 PM TRT
13:56:42.192 INFO  PostprocessGermlineCNVCalls - ------------------------------------------------------------
13:56:42.192 INFO  PostprocessGermlineCNVCalls - ------------------------------------------------------------
13:56:42.192 INFO  PostprocessGermlineCNVCalls - HTSJDK Version: 4.1.3
13:56:42.193 INFO  PostprocessGermlineCNVCalls - Picard Version: 3.3.0
13:56:42.193 INFO  PostprocessGermlineCNVCalls - Built for Spark Version: 3.5.0
13:56:42.193 INFO  PostprocessGermlineCNVCalls - HTSJDK Defaults.COMPRESSION_LEVEL : 2
13:56:42.193 INFO  PostprocessGermlineCNVCalls - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
13:56:42.194 INFO  PostprocessGermlineCNVCalls - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
13:56:42.194 INFO  PostprocessGermlineCNVCalls - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
13:56:42.194 INFO  PostprocessGermlineCNVCalls - Deflater: IntelDeflater
13:56:42.194 INFO  PostprocessGermlineCNVCalls - Inflater: IntelInflater
13:56:42.194 INFO  PostprocessGermlineCNVCalls - GCS max retries/reopens: 20
13:56:42.194 INFO  PostprocessGermlineCNVCalls - Requester pays: disabled
13:56:42.194 INFO  PostprocessGermlineCNVCalls - Initializing engine
13:56:55.662 INFO  PostprocessGermlineCNVCalls - Done initializing engine
13:56:55.848 INFO  ProgressMeter - Starting traversal
13:56:55.848 INFO  ProgressMeter -        Current Locus  Elapsed Minutes     Records Processed   Records/Minute
13:56:55.848 INFO  ProgressMeter -             unmapped              0.0                     0              NaN
13:56:55.848 INFO  ProgressMeter - Traversal complete. Processed 0 total records in 0.0 minutes.
13:56:55.848 INFO  PostprocessGermlineCNVCalls - Generating intervals VCF file...
13:56:55.967 INFO  PostprocessGermlineCNVCalls - Writing intervals VCF file to data/cnv_call/S29_genotyped_intervals.vcf...
13:56:55.967 INFO  PostprocessGermlineCNVCalls - Analyzing shard 1 / 1...
13:56:56.555 INFO  PostprocessGermlineCNVCalls - Generating segments...
13:57:38.724 INFO  PostprocessGermlineCNVCalls - Shutting down engine
[January 8, 2025 at 1:57:38 PM TRT] org.broadinstitute.hellbender.tools.copynumber.PostprocessGermlineCNVCalls done. Elapsed time: 0.94 minutes.
Runtime.totalMemory()=1207959552
org.broadinstitute.hellbender.utils.python.PythonScriptExecutorException: 
python exited with 1
Command Line: python /tmp/segment_gcnv_calls.1199447359357923802.py --ploidy_calls_path data/S29_determine_ploidy/S29_determine_ploidy-calls/SAMPLE_0 --model_shards hg38_acnv_models/roche_4100_ces/cnv_model/model --calls_shards data/cnv_raw/S29_germlinecnvcaller/S29_germlinecnvcaller-calls --output_path /tmp/gcnv-segmented-calls14400794845073966734 --sample_index 0
Stdout: 13:57:05.312 INFO segment_gcnv_calls - PYTENSOR_FLAGS environment variable has been set to: device=cpu,floatX=float64,optimizer=fast_run,compute_test_value=ignore,openmp=true,blas__ldflags=-lmkl_rt,openmp_elemwise_minsize=10,exception_verbosity=high
13:57:05.312 INFO segment_gcnv_calls - Loading ploidy calls...
13:57:05.312 INFO gcnvkernel.io.io_metadata - Loading germline contig ploidy and global read depth metadata...
13:57:05.312 INFO segment_gcnv_calls - Instantiating the Viterbi segmentation engine...
13:57:05.370 INFO gcnvkernel.postprocess.viterbi_segmentation - Assembling interval list and copy-number class posterior from model shards...
13:57:05.536 INFO gcnvkernel.io.io_intervals_and_counts - The given interval list provides the following interval annotations: {'GC_CONTENT'}
13:57:05.730 INFO gcnvkernel.structs.metadata - Generating intervals metadata...
13:57:05.799 INFO gcnvkernel.postprocess.viterbi_segmentation - Compiling pytensor forward-backward function...
13:57:22.603 INFO gcnvkernel.postprocess.viterbi_segmentation - Compiling pytensor Viterbi function...
13:57:28.831 INFO gcnvkernel.postprocess.viterbi_segmentation - Compiling pytensor variational HHMM...
13:57:38.394 INFO gcnvkernel.postprocess.viterbi_segmentation - Processing sample index: 0, sample name: S29...
13:57:38.422 INFO gcnvkernel.postprocess.viterbi_segmentation - Segmenting contig (1/24) (contig name: chr1)...

Stderr: Traceback (most recent call last):
  File "/tmp/segment_gcnv_calls.1199447359357923802.py", line 93, in <module>
    viterbi_engine.write_copy_number_segments()
  File ".snakemake/conda/febadccea00892907b6e487236c1170a_/lib/python3.10/site-packages/gcnvkernel-0.9-py3.10.egg/gcnvkernel/postprocess/viterbi_segmentation.py", line 256, in write_copy_number_segments
  File ".snakemake/conda/febadccea00892907b6e487236c1170a_/lib/python3.10/site-packages/gcnvkernel-0.9-py3.10.egg/gcnvkernel/postprocess/viterbi_segmentation.py", line 141, in _viterbi_segments_generator
  File ".snakemake/conda/febadccea00892907b6e487236c1170a_/lib/python3.10/site-packages/gcnvkernel-0.9-py3.10.egg/gcnvkernel/structs/metadata.py", line 263, in get_sample_ploidy_metadata
KeyError: 'S29'

	at org.broadinstitute.hellbender.utils.python.PythonExecutorBase.getScriptException(PythonExecutorBase.java:75)
	at org.broadinstitute.hellbender.utils.runtime.ScriptExecutor.executeCuratedArgs(ScriptExecutor.java:112)
	at org.broadinstitute.hellbender.utils.python.PythonScriptExecutor.executeArgs(PythonScriptExecutor.java:193)
	at org.broadinstitute.hellbender.utils.python.PythonScriptExecutor.executeScript(PythonScriptExecutor.java:168)
	at org.broadinstitute.hellbender.utils.python.PythonScriptExecutor.executeScript(PythonScriptExecutor.java:139)
	at org.broadinstitute.hellbender.tools.copynumber.PostprocessGermlineCNVCalls.executeSegmentGermlineCNVCallsPythonScript(PostprocessGermlineCNVCalls.java:739)
	at org.broadinstitute.hellbender.tools.copynumber.PostprocessGermlineCNVCalls.generateSegmentsVCFFileFromAllShards(PostprocessGermlineCNVCalls.java:485)
	at org.broadinstitute.hellbender.tools.copynumber.PostprocessGermlineCNVCalls.onTraversalSuccess(PostprocessGermlineCNVCalls.java:456)
	at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1123)
	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:150)
	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:203)
	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:222)
	at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:166)
	at org.broadinstitute.hellbender.Main.mainEntry(Main.java:209)
	at org.broadinstitute.hellbender.Main.main(Main.java:306)

@gokalpcelik
Contributor

Hi @hkirmak
When the --contig-ploidy-calls path is given, you need to give the path to the folder that contains all the SAMPLE_0, SAMPLE_1, etc. folders, not the path to an individual SAMPLE_x folder. Can you fix your parameter and see if it works?
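
For anyone wiring this into a workflow, the check can be sketched in a few lines of Python. This is only an illustration, not part of GATK: the helper name `resolve_ploidy_calls_dir` is hypothetical, and it assumes the per-sample folders follow the SAMPLE_<n> naming seen in the paths in this thread. If the given path ends in a SAMPLE_<n> folder, it returns the parent folder instead, and it raises an error when no SAMPLE_<n> folders are found there.

```python
import re
from pathlib import Path

# Assumption: per-sample folders are named SAMPLE_0, SAMPLE_1, ...
SAMPLE_DIR_PATTERN = re.compile(r"^SAMPLE_\d+$")


def resolve_ploidy_calls_dir(path):
    """Return the folder that holds the SAMPLE_<n> folders.

    Hypothetical helper: if `path` itself points at a SAMPLE_<n> folder,
    its parent is returned instead.
    """
    p = Path(path)
    if SAMPLE_DIR_PATTERN.match(p.name):
        p = p.parent
    # The resolved folder must contain at least one SAMPLE_<n> subfolder.
    has_samples = p.is_dir() and any(
        child.is_dir() and SAMPLE_DIR_PATTERN.match(child.name)
        for child in p.iterdir()
    )
    if not has_samples:
        raise ValueError(f"no SAMPLE_<n> folders found in {p}")
    return p
```

With the paths from the log above, passing `data/S29_determine_ploidy/S29_determine_ploidy-calls/SAMPLE_0` would return `data/S29_determine_ploidy/S29_determine_ploidy-calls`, which is the value to give to --contig-ploidy-calls.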

@hkirmak
Author

hkirmak commented Jan 9, 2025

Hi,

Yes, that solved the problem. Since the instructions were about giving the absolute path to the SAMPLE_x files, I misunderstood and included SAMPLE_0 in the parameter. Thank you for your help.
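
In case it helps others who hit the same KeyError, here is a rough Python sketch for listing which sample name belongs to which SAMPLE_<n> folder, so that the value passed to --sample-index can be checked against the expected sample. The function name `sample_names_by_index` is made up for this example, and it assumes that each SAMPLE_<n> folder contains a small text file called `sample_name.txt` holding the sample name. Please verify that file name against your own output folders.

```python
from pathlib import Path


def sample_names_by_index(calls_dir, name_file="sample_name.txt"):
    """Map each SAMPLE_<n> index to the sample name stored in its folder.

    Assumption: every SAMPLE_<n> folder holds a text file (`name_file`)
    whose content is the sample name.
    """
    names = {}
    for sample_dir in Path(calls_dir).glob("SAMPLE_*"):
        suffix = sample_dir.name.split("_", 1)[1]
        name_path = sample_dir / name_file
        # Skip anything that is not a SAMPLE_<number> folder with a name file.
        if sample_dir.is_dir() and suffix.isdigit() and name_path.is_file():
            names[int(suffix)] = name_path.read_text().strip()
    return names
```

Calling it on the folder that contains the SAMPLE_<n> folders returns a dictionary from index to sample name. An empty dictionary means no per-sample folders were found at that level, which is what happens when the path points inside SAMPLE_0.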

@hkirmak hkirmak closed this as completed Jan 9, 2025