feat: filter manta calls #1371

Merged
merged 36 commits into from
Feb 10, 2024

Conversation

mathiasbio
Collaborator

@mathiasbio mathiasbio commented Jan 26, 2024

Description

This PR resolves several related issues with a few small changes.

  1. The original issue, which was solved by keeping MaxDepth-filtered Manta variants, only involved TGA cases. Those variants were labeled MaxDepth because Manta was not run with the "--exome" option, as recommended by the Manta docs. With the --exome flag added, the extra bcftools filters that rescued MaxDepth variants should no longer be needed.
  2. The bcftools filter that aims to keep MaxDepth variants is incorrect: it also keeps variants that carry additional filters. With the MaxDepth rescue filter removed entirely, these variants should be correctly filtered out.
  3. Manta calls on TGA data include many variants flagged MaxDepth, with the additional side effect that PR:SR is reported as 0,0:0,0. Here I add the --exome flag to the Manta rules for TGA to get the correct stats.
  4. In some samples the majority of Manta calls have very low evidence, sometimes with more than 90% of variants below 1% VAF and with very little read support. To address this, a bcftools filter command was added to the Manta rules, requiring at least 4 reads to support the variant. This is now possible because MaxDepth is no longer set, which preserves the PR:SR information.
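As a rough illustration of the --exome change, the sketch below builds an argument list for Manta's configManta.py and adds the flag only for TGA cases. The function name, the `is_tga` parameter, and the file paths are hypothetical and are not taken from the Balsamic rules; only the configManta.py options themselves come from Manta.

```python
from typing import List, Optional


def build_manta_config_args(
    tumor_bam: str,
    reference_fasta: str,
    run_dir: str,
    is_tga: bool,
    normal_bam: Optional[str] = None,
) -> List[str]:
    """Return a configManta.py argument list (illustrative helper, not Balsamic code)."""
    args = ["configManta.py", "--tumorBam", tumor_bam]
    if normal_bam is not None:
        # Tumor + normal analysis: add the matched normal sample.
        args += ["--normalBam", normal_bam]
    args += ["--referenceFasta", reference_fasta, "--runDir", run_dir]
    if is_tga:
        # --exome switches off Manta's depth filters, so MaxDepth is not set
        # on targeted data.
        args.append("--exome")
    return args


# Hypothetical paths, used only to show the difference between the two modes.
tga_args = build_manta_config_args("tumor.bam", "ref.fa", "manta_tga", is_tga=True)
wgs_args = build_manta_config_args("tumor.bam", "ref.fa", "manta_wgs", is_tga=False)
print(" ".join(tga_args))
print(" ".join(wgs_args))
```

Keeping the flag behind a single boolean makes it easy to verify that WGS runs stay unchanged.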

Relevant issues:

Added

  • bcftools filter to all Manta rules to remove all variants with support from fewer than 4 reads.
  • "--exome" flag to Manta runs on TGA cases to remove MaxDepth filter

Changed

  • None

Fixed

  • None

Removed

  • MaxDepth from list of filters to keep in bcftools view commands
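Why a keep-list containing MaxDepth lets through variants with additional filters can be illustrated with a small sketch. It assumes the keep-list semantics of `bcftools view --apply-filters`, where a record is retained if its FILTER column contains at least one of the listed values. The function name and the example FILTER strings are hypothetical and not taken from the Balsamic code.

```python
from typing import Iterable


def kept_by_filter_list(filter_field: str, keep: Iterable[str]) -> bool:
    """Mimic 'retain the record if FILTER contains any listed value'."""
    record_filters = set(filter_field.split(";"))
    return bool(record_filters & set(keep))


# A record that carries MaxDepth plus another filter (illustrative value).
record_filter = "MaxDepth;MinSomaticScore"

# With MaxDepth in the keep-list the record slips through.
print(kept_by_filter_list(record_filter, ["PASS", "MaxDepth"]))
# With MaxDepth removed from the keep-list the record is dropped.
print(kept_by_filter_list(record_filter, ["PASS"]))
```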

Documentation

  • N/A
  • Updated Balsamic documentation to reflect the changes as needed for this PR.
    • docs/balsamic_sv_cnv.rst (added information about new filters, and removed info about keeping MaxDepth)

Tests

Feature Tests

Verify correct Manta settings and filters

Manta setting --exome should only be present in TGA cases

See TGA settings in screenshot below:

cleanfowl T only:

image

(exome argument set)

See WGS setting in screenshot below:

Screenshot 2024-02-09 at 08 58 45

(no exome argument set)

Bcftools filter low_pr_sr_count should be correctly applied:

See screenshot of example header from Manta T+N rule and Manta T-only rules below:

cleanfowl TGA T-only

image

case H WGS T-only

image

See examples of the applied filter in the VCF records below, taken from case sweetelf:

low_pr_sr_count
5 84761092 MantaINV:13082:1:2:0:1:0 C <INV> . low_pr_sr_count END=99239925;SVTYPE=INV;SVLEN=14478833;CIPOS=0,1;CIEND=-1,0;HOMLEN=1;HOMSEQ=A;INV3 PR:SR 774,1:2188,2

PR:SR = 774,1:2188,2
Alt PR + SR = 1 + 2 = 3, which is smaller than 4, so the filter should be set.

PASS
22 37332762 MantaBND:23188:1:2:0:0:0:0 T T]10:35058943] . PASS SVTYPE=BND;MATEID=MantaBND:23188:1:2:0:0:0:1;CIPOS=0,6;HOMLEN=6;HOMSEQ=CAGGGA;BND_DEPTH=3401;MATE_BND_DEPTH=6 PR:SR 475,1:1146,4

PR:SR = 475,1:1146,4
Alt PR + SR = 1 + 4 = 5, which is larger than 3, so the filter should not be set.
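The read-support rule behind these two examples can be sketched in a few lines of Python. This is only an illustration of the arithmetic, assuming the FORMAT and sample columns of a single sample; the function names and the `MIN_ALT_SUPPORT` constant are hypothetical, and the actual filtering in this PR is done with bcftools.

```python
MIN_ALT_SUPPORT = 4  # minimum number of supporting reads described in this PR


def alt_support(format_keys: str, sample_values: str) -> int:
    """Sum the alternate-allele counts of PR and SR for one sample column."""
    fields = dict(zip(format_keys.split(":"), sample_values.split(":")))
    total = 0
    for key in ("PR", "SR"):
        value = fields.get(key)
        if value is None or value == ".":
            # Some records carry only PR or only SR; count a missing field as 0.
            continue
        _ref_count, alt_count = value.split(",")
        total += int(alt_count)
    return total


def is_low_pr_sr_count(format_keys: str, sample_values: str) -> bool:
    """True when the record would be flagged for too little read support."""
    return alt_support(format_keys, sample_values) < MIN_ALT_SUPPORT


print(is_low_pr_sr_count("PR:SR", "774,1:2188,2"))  # the MantaINV record above
print(is_low_pr_sr_count("PR:SR", "475,1:1146,4"))  # the MantaBND record above
```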

Validation status:

  • Successful

Validation of FLT3 detection in TGA cases

Criteria

In all downsampled HD829 cases tested, the following criteria are met:

  • FLT3:p.ITD300 is detected with Manta in the same cases it was detected previously.
  • FLT3:p.ITD300 is detected in sweetelf
Results

First row is validation of V13.0.0
Bottom row is the results from this PR

Gene:p.aa | Chr Position | Mutation type | cleanfowl | proudsquid | modestjaguar | dearmarmot | holykid | pass/fail
FLT3:p.ITD300 | 13:28608030-28608060 (around 28608043) | DUP (~300bp) | VarDict 297 length (pos: 28608040), Manta 284 length (pos: 28608046) | Manta+DellySV: 284 length | VarDict: 294 length, Manta and DellySV 284 length (pos: 28608046) | VarDict: 292 length, Manta and DellySV length 284 (pos: 28608046) | Manta+DellySV: 284 length | pass
FLT3:p.ITD300 | 13:28608030-28608060 (around 28608043) | DUP (~300bp) | VarDict 297 length (pos: 28608040), Manta 284 length (pos: 28608046) | Manta and DellySV 284 length (pos: 28608046) | VarDict 294 length (pos: 28608042), Manta and DellySV 284 length (pos: 28608046) | VarDict 292 (pos: 28608043), Manta and DellySV length 284 (pos: 28608046) | Manta and DellySV 284 length (pos: 28608046) | pass

As a side note, some of the above variants were not detected by Manta or DellySV prior to version 13.0.0, and were most likely rescued by the inclusion of more informative reads for SV calling.

Sweetelf comparison of FLT3 detection in v13.0.0 and current PR.

version | caller | case | variant | chromosome | position | ref | alt | VAF
v13.0.0 | VarDict | sweetelf | p.ITD300 | 13 | 28608043 | - | 291bp-ins | 0.0266
v13.0.0 | Manta and DellySV | sweetelf | p.ITD300 | 13 | 28608046 | - | 284bp-ins | 0 due to MaxDepth
this PR | VarDict | sweetelf | p.ITD300 | 13 | 28608043 | - | 291bp-ins | 0.0266
this PR | Manta and DellySV | sweetelf | p.ITD300 | 13 | 28608046 | - | 284bp-ins | PR:SR 1605,32:5837,137
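To put the restored PR:SR values into perspective, a crude alt-read fraction can be derived from them. The sketch below is an assumption-laden approximation: it simply pools paired-read and split-read counts, which is not how Manta or Balsamic define VAF, and the function name is hypothetical.

```python
def alt_read_fraction(pr: str, sr: str) -> float:
    """Pooled alt / (ref + alt) over PR and SR; a rough proxy, not a real VAF."""
    pr_ref, pr_alt = (int(x) for x in pr.split(","))
    sr_ref, sr_alt = (int(x) for x in sr.split(","))
    alt = pr_alt + sr_alt
    total = pr_ref + pr_alt + sr_ref + sr_alt
    # Guard against records where no reads were counted (e.g. PR:SR 0,0:0,0).
    return alt / total if total else 0.0


# Manta PR:SR for FLT3 p.ITD300 in sweetelf, taken from the table above.
fraction = alt_read_fraction("1605,32", "5837,137")
print(f"{fraction:.4f}")
```

For the sweetelf record this is 169 / 7611, roughly 0.022. A record with PR:SR 0,0:0,0, as produced when MaxDepth was set, yields 0.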

Validation status:

  • Successful

Validation for detection of known SVs in WGS cases

Criteria

The following criteria need to be met:

  • Every known variant in the table below must have alt read support (PR + SR) above 3 in the tumor and must not have had the MaxDepth filter set in the previous validation. As nothing else has changed in this PR, verifying this is sufficient to conclude that these variants will not be filtered out by the update.
Results

original case-ids hidden, find real case names here: https://docs.google.com/document/d/12x3ozQE62bk_yWNTBW_u8vZMqhpKFRMW3yizzM_mWZs/edit

Mutation | Case | SVTYPE | Detected initially in Balsamic version | Detected previously by tool | v13.0.0 Chr;Pos 1 | v13.0.0 Chr;Pos 2 | v13.0.0 PR:SR values | v13.0.0 filter | Pass / Fail
LAMB1-AXIN1 fusion protein | A | BND | 10.0.2 | manta, dellysv, tiddit | 7;107609345 | 16;365170 | 145,31:208,29 | PASS | PASS
NAB2-STAT6 fusion | B | INV | 10.0.2 | manta, dellysv, tiddit | 12;57488292 | 12;57493354 | 89,28:118,33 | PASS | PASS
GAB1-ABL1 | C | BND | 11.0.2 | manta, dellysv, tiddit | 4;144373942 | 9;133657503 | 60,38:86,40 | PASS | PASS
inv(2);RANBP2::ALK | D | INV | 11.0.2 | manta, dellysv, tiddit | 2;102611407 | 2;109375698 | 86,37:116,43 | PASS | PASS
Del: CDKN2A+CDKN2B | E | DEL | 11.0.2 | manta, dellysv, tiddit, dellycnv | 9;21878995 | 9;22148438 | 30,33:35,25 | PASS | PASS
Del: IKZF1 (exon 2-7) | F | DEL | 10.0.2 | manta, dellysv, tiddit, dellycnv | 7;50347374 | 7;50463638 | 21,68:23,44 | PASS | PASS
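The criterion can be checked mechanically against the v13.0.0 values in the table. The following is a minimal sketch that hard-codes the PR:SR and filter columns from the table above; the dictionary and function names are hypothetical, and the threshold of 4 reads is the one described in this PR.

```python
# case id -> (v13.0.0 PR:SR values, v13.0.0 filter), copied from the table above
KNOWN_SVS = {
    "A": ("145,31:208,29", "PASS"),
    "B": ("89,28:118,33", "PASS"),
    "C": ("60,38:86,40", "PASS"),
    "D": ("86,37:116,43", "PASS"),
    "E": ("30,33:35,25", "PASS"),
    "F": ("21,68:23,44", "PASS"),
}


def survives_update(pr_sr: str, filter_value: str, minimum: int = 4) -> bool:
    """Enough alt support and no MaxDepth flag in the previous validation."""
    pr, sr = pr_sr.split(":")
    alt = int(pr.split(",")[1]) + int(sr.split(",")[1])
    return alt >= minimum and "MaxDepth" not in filter_value.split(";")


results = {case: survives_update(*values) for case, values in KNOWN_SVS.items()}
for case, ok in results.items():
    print(case, "pass" if ok else "fail")
```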

Validation status:

  • Successful

Validation of detection of FLT3 in WGS cases

original case-ids hidden, find real case names here: https://docs.google.com/document/d/12x3ozQE62bk_yWNTBW_u8vZMqhpKFRMW3yizzM_mWZs/edit

This has not previously been tested in Balsamic. WGS cases with known FLT3-ITD were requested from cust110, and 4 such cases have been shared:

  • G, tumor only - previously found in WGS and in myeloid panel
  • H, tumor only - previously found in WGS and in myeloid panel
  • I, tumor only - previously found at 8% VAF in panel, but missed in WGS previously
  • J, tumor + normal - previously found at 90% VAF in panel, but missed in WGS previously

Of these 4 shared cases, however, only one (H) had a FLT3-ITD variant called by Manta. That case will have to serve to investigate the effect of removing MaxDepth on WGS cases with FLT3-ITD.

Criteria

The following criteria need to be met:

  • FLT3-ITD variant in WGS case H is detected in this PR.
Results
observed in balsamic version | case | found by caller | genomic position | VAF | variant type
12.0.0 | H | Manta, TNscope | 13, 28608221 | Manta PR:SR 0,0:20,37 ; TNscope 0.405 | 69 base insertion
this PR | H | Manta, TNscope | 13, 28608221 | Manta PR:SR 0,0:20,37 ; TNscope 0.405 | 69 base insertion

Validation status:

  • Successful

Pipeline Integrity Tests

  • Report deliver (generation of the .hk file)
    • N/A
    • Verified
  • TGA T/O Workflow
    • N/A
    • Verified (e.g. in sweetelf)
  • TGA T/N Workflow
    • N/A
    • Verified (in K)
  • UMI T/O Workflow
    • N/A
    • Verified (in uphippo)
  • UMI T/N Workflow
    • N/A
    • Verified (in equalbug)
  • WGS T/O Workflow
    • N/A
    • Verified (in case H)
  • WGS T/N Workflow
    • N/A
    • Verified (e.g. in case B)
  • QC Workflow
    • N/A
    • Verified
  • PON Workflow
    • N/A
    • Verified

Clinical Genomics Stockholm

Documentation

  • Atlas documentation
    • N/A
    • Updated: [Link]
  • Web portal for Clinical Genomics
    • N/A
    • Updated: [Link]

User Changes

  • N/A
  • This PR affects the output files or results.
    • User feedback is considered unnecessary because [Justification: All clinically relevant variants for the affected workflow have been verified to be detected. A minimum read support for variant calling is mostly a question of quality, and a decision that ought to be left to the bioinformatician, unless geneticists have made clear that they want more calls for SVs of dubious quality; in this case they have asked for the opposite. Geneticists have told us that we are delivering too many low-quality SVs and that they want help with some pre-filtering, which this PR provides in a limited way.].
    • Affected users have been included in the development process and given a chance to provide feedback.

Infrastructure Changes

  • Stored files in Housekeeper
    • N/A
    • Updated: [Link]
  • CG (CLI and delivered/uploaded files)
    • N/A
    • Updated: [Link]
  • Servers (configuration files on Hasta)
    • N/A
    • Updated: [Link]
  • Scout interface
    • N/A
    • Updated: [Link]

Integration Tests

  • N/A
  • Test [Description]
    • [Screenshot]

Checklist

Important

Ensure that all checkboxes below are ticked before merging.

For Developers

  • PR Description
    • Provided a comprehensive description of the PR.
    • Linked relevant user stories or issues to the PR.
  • Documentation
    • Verified and updated documentation if necessary.
  • Tests
    • Described and tested the functionality addressed in the PR.
    • Ensured integration of the new code with existing workflows.
    • Confirmed that meaningful unit tests were added for the changes introduced.
    • Checked that the PR has successfully passed all relevant code smells and coverage checks.
  • Review
    • Addressed and resolved all the feedback provided during the code review process.
    • Obtained final approval from designated reviewers.

For Reviewers

  • Code
    • Code implements the intended features or fixes the reported issue.
    • Code follows the project's coding standards and style guide.
  • Documentation
    • Pipeline changes are well-documented in the CHANGELOG and relevant documentation.
  • Tests
    • The author provided a description of their manual testing, including consideration of edge cases and boundary
      conditions where applicable, with satisfactory results.
  • Review
    • Confirmed that the developer has addressed all the comments during the code review.


codecov bot commented Jan 26, 2024

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (790ad8a) 99.42% compared to head (2337ec0) 99.44%.
Report is 87 commits behind head on develop.

Files | Patch % | Lines
BALSAMIC/models/validators.py | 85.71% | 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1371      +/-   ##
===========================================
+ Coverage    99.42%   99.44%   +0.02%     
===========================================
  Files           41       39       -2     
  Lines         1916     1986      +70     
===========================================
+ Hits          1905     1975      +70     
  Misses          11       11              
Flag | Coverage Δ
unittests | 99.44% <99.73%> (+0.02%) ⬆️


@mathiasbio mathiasbio changed the title feat: fix max depth feat: fix manta calls Jan 26, 2024
@mathiasbio mathiasbio changed the title feat: fix manta calls feat: filter manta calls Jan 26, 2024
@mathiasbio mathiasbio linked an issue Feb 6, 2024 that may be closed by this pull request
@mathiasbio mathiasbio marked this pull request as ready for review February 6, 2024 09:30
@mathiasbio mathiasbio requested a review from a team as a code owner February 6, 2024 09:30
@mathiasbio mathiasbio changed the base branch from develop to master February 6, 2024 09:30
@mathiasbio mathiasbio added this to the Release 14 milestone Feb 6, 2024
@mathiasbio mathiasbio self-assigned this Feb 6, 2024
@ivadym
Contributor

ivadym commented Feb 6, 2024

Let's use develop as the base branch. Maybe we'll include another PR or two for v14 :)

@mathiasbio mathiasbio changed the base branch from master to develop February 6, 2024 09:36
Contributor

@fevac fevac left a comment

Looks good! The only thing is that I think it would be better if the manta filters were in a separate rule and not included in the manta rule. I understand this would mean more work though. Alternatively I wonder if the filtering step can be added to the svdb filtering, although that would potentially filter SVs from other tools too. But would be cleaner in my opinion

Contributor

@ivadym ivadym left a comment

👍 💯 🎸 👨‍🎤 🔥


sonarqubecloud bot commented Feb 8, 2024

Quality Gate passed

Issues
1 New issue

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

@mathiasbio mathiasbio merged commit 511d870 into develop Feb 10, 2024
8 checks passed
@mathiasbio mathiasbio deleted the fix_max_depth branch February 10, 2024 12:43