Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update Delly 0.9.1 to 1.0.3 #35

Merged
merged 1 commit into from
Jul 6, 2022

Conversation

Faizal-Eeman
Copy link
Contributor

@Faizal-Eeman Faizal-Eeman commented Jul 6, 2022

Description

This commit will upgrade Delly v0.9.1 to v1.0.3. No difference is observed in significant variant calls i.e. having good mapping quality, but runtime has increased with v1.0.3.

v0.9.1 vs v1.0.3

Variant Calls

1. SVs with Good Mapping Quality - No differences in the SV calls were found between v0.9.1 and v1.0.3 for those variants with Mapping Quality (MAPQ) > 0

  • /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/unreleased/mmoootor-upgrade-delly-0.9.1-to-1.0.3/delly-0.9.1-vs-1.0.3_validation/A-mini_S2.T-0/delly0.9.1-vs-1.0.3_MAPQ_gt_zero_filtered.sha512
  • /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/unreleased/mmoootor-upgrade-delly-0.9.1-to-1.0.3/delly-0.9.1-vs-1.0.3_validation/A-full_T5.T/delly0.9.1-vs-1.0.3_MAPQ_gt_zero_filtered.sha512

2. SVs with Bad Mapping Quality - Differences were seen in the number of SVs called by v0.9.1 and v1.0.3 for only those variants with Mapping Quality (MAPQ) = 0. The difference in the number of variants called, appears to be predominantly due to the difference in the Quality Scores of these variants between v0.9.1 and v1.0.3.

  • /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/unreleased/mmoootor-upgrade-delly-0.9.1-to-1.0.3/delly-0.9.1-vs-1.0.3_validation/A-mini_S2.T-0/delly0.9.1-vs-1.0.3_difference_only_in_MAPQ_zero.txt
  • /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/unreleased/mmoootor-upgrade-delly-0.9.1-to-1.0.3/delly-0.9.1-vs-1.0.3_validation/A-full_T5.T/delly0.9.1-vs-1.0.3_difference_only_in_MAPQ_zero.txt

Runtime with stringent filters applied

A-mini

  • Delly v0.9.1 (F2 node) - 13m 33.9s
  • Delly v1.0.3 (F2 node) - 20m 32s

A-full

  • Delly v1.0.3 (F16 node) - 17h 53m 49s
  • Delly v0.9.1 (F72 node) - 18h 16m 23.4s
  • Delly v1.0.3 (F72 node) - 18h 16m 15s
  • Delly v1.0.3 (F32 node) - 20h 14m 1s

Real Sample - ILHNLNEV000001-T001-P01-F

  • Delly v1.0.3 (F16 node) - 8h 46m 31s
  • Delly v0.8.7 (F32 node) - 8h 37m 34s
  • Delly v1.0.3 (F32 node) - 9h 38m 29s

Links

Delly v1.0.3 GitHub release - https://github.com/dellytools/delly/releases/tag/v1.0.3
BL CDS Docker Registry for Delly - https://hub.docker.com/repository/docker/blcdsdockerregistry/delly

Closes #26

Testing Results

  • DNA A-mini

    • sample: A-mini S2.T-0
    • node type: F2
    • input csv: /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/unreleased/mmoootor-upgrade-delly-0.9.1-to-1.0.3/input/tumor_control_pair_0.csv
    • config: /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/unreleased/mmoootor-upgrade-delly-0.9.1-to-1.0.3/config/A-mini-hg38.config
    • output: /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/unreleased/mmoootor-upgrade-delly-0.9.1-to-1.0.3/A-mini/call-sSV-2.0.0/S2.T-0/DELLY-1.0.3/output/
  • DNA A-full

    • sample: T5.T
    • node type: F16
    • input csv: /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/unreleased/mmootor-add-f1s-config/input/tumor_control_pair_afull.csv
    • config: /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/unreleased/mmootor-add-F16-config/config/A-full-hg19.config
    • output: /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/unreleased/mmootor-add-F16-config/A-full
  • DNA A-full

    • sample: A-full T5.T
    • node type: F32
    • input csv: /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/unreleased/mmoootor-upgrade-delly-0.9.1-to-1.0.3/input/tumor_control_pair_afull.csv
    • config: /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/unreleased/mmoootor-upgrade-delly-0.9.1-to-1.0.3/config/A-full-hg19.config
    • output: /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/unreleased/mmoootor-upgrade-delly-0.9.1-to-1.0.3/A-full/call-sSV-2.0.0/T5.T.sorted_py/DELLY-1.0.3/output/
  • DNA A-full

    • sample: A-full T5.T
    • node type: F72
    • input csv: /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/unreleased/mmoootor-upgrade-delly-0.9.1-to-1.0.3/input/tumor_control_pair_afull.csv
    • config: /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/unreleased/mmoootor-upgrade-delly-0.9.1-to-1.0.3/config/A-full-F72-hg19.config
    • output: /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/unreleased/mmoootor-upgrade-delly-0.9.1-to-1.0.3/A-full-F72/call-sSV-2.0.0/T5.T.sorted_py/DELLY-1.0.3/output/
  • DNA Real Sample

    • sample: ILHNLNEV000001-T001-P01-F
    • node type: F16
    • input csv: /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/unreleased/mmootor-add-F16-config/input/tumor_control_pair_ILHNLNEV000001-T001-P01-F.csv
    • config: /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/unreleased/mmootor-add-F16-config/config/ILHNLNEV000001-T001-P01-F-hg38.config
    • output: /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/unreleased/mmootor-add-F16-config/ILHNLNEV000001-T001-P01-F/
  • DNA Real Sample

    • sample: ILHNLNEV000001-T001-P01-F
    • node type: F32
    • input csv: /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/unreleased/mmoootor-upgrade-delly-0.9.1-to-1.0.3/input/tumor_control_pair_ILHNLNEV000001-T001-P01-F.csv
    • config: /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/unreleased/mmoootor-upgrade-delly-0.9.1-to-1.0.3/config/ILHNLNEV000001-T001-P01-F-F32-hg38.config
    • output: /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/unreleased/mmoootor-upgrade-delly-0.9.1-to-1.0.3/ILHNLNEV000001-T001-P01-F-F32/call-sSV-2.0.0/ILHNLNEV000001-T001-P01-F_realigned_recalibrated_reheadered/DELLY-1.0.3/output/

Checklist

  • I have read the code review guidelines and the code review best practice on GitHub check-list.

  • I have reviewed the Nextflow pipeline standards.

  • The name of the branch is meaningful and well formatted following the standards, using [AD_username (or 5 letters of AD if AD is too long)]-[brief_description_of_branch].

  • I have set up or verified the branch protection rule following the [github standards](https://confluence.mednet.ucla.edu/pages/viewpage.action?spaceKey=BOUTROSLAB&title=GitHub+Standards#GitHubStanda
    rds-Branchprotectionrule) before opening this pull request.

  • I have added my name to the contributors listings in the manifest block in the nextflow.config as part of this pull request, am listed
    already, or do not wish to be listed. (This acknowledgement is optional.)

  • I have added the changes included in this pull request to the CHANGELOG.md under the next release version or unreleased, and updated the date.

  • I have updated the version number in the metadata.yaml and manifest block of the nextflow.config file following semver, or the version number has already been updated. (Leave it unchecked if you are unsure about new version number and discuss it with the infrastructure team in this PR.)

  • I have tested the pipeline on at least one DNA A-mini sample and one A-full sample. The paths to the test config files and output directories were attached above.

Copy link
Contributor

@tyamaguchi-ucla tyamaguchi-ucla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Nice work!!

@tyamaguchi-ucla
Copy link
Contributor

For benchmarking v0.9.1 vs v1.0.3, we might want to use one of the real samples as well since the A-mini/full tumors are simulated tumor samples. https://confluence.mednet.ucla.edu/pages/viewpage.action?pageId=82391152

@tyamaguchi-ucla tyamaguchi-ucla merged commit 2b9458e into main Jul 6, 2022
@Faizal-Eeman
Copy link
Contributor Author

For benchmarking v0.9.1 vs v1.0.3, we might want to use one of the real samples as well since the A-mini/full tumors are simulated tumor samples. https://confluence.mednet.ucla.edu/pages/viewpage.action?pageId=82391152

Sure, I'll run the pipeline on a real sample and include the result paths here. Should we rather have a confluence page to document the benchmarking?

@Faizal-Eeman Faizal-Eeman self-assigned this Jul 6, 2022
@tyamaguchi-ucla
Copy link
Contributor

tyamaguchi-ucla commented Jul 6, 2022

For benchmarking v0.9.1 vs v1.0.3, we might want to use one of the real samples as well since the A-mini/full tumors are simulated tumor samples. https://confluence.mednet.ucla.edu/pages/viewpage.action?pageId=82391152

Sure, I'll run the pipeline on a real sample and include the result paths here. Should we rather have a confluence page to document the benchmarking?

For now, we can update this section in README.md here - https://github.com/uclahs-cds/pipeline-call-sSV/blob/main/README.md#testing-and-validation #22

@pboutros
Copy link

pboutros commented Jul 7, 2022 via email

@tyamaguchi-ucla
Copy link
Contributor

tyamaguchi-ucla commented Jul 7, 2022

Consider submitting the increased run-time as an issue on delly to see if it's expected/unexpected? Perhaps the developer could optimize away the increased time if they're unaware of it.

@pboutros Sure, we'll benchmark a few more samples using the same node type and see if we consistently observe the increased run-time because each node type has a different expected network bandwidth although I don't think the difference in the network bandwidth can explain the 2h difference.

(e.g. F32 16000Mbps vs F72 30000Mbps) https://docs.microsoft.com/en-us/azure/virtual-machines/fsv2-series

@Faizal-Eeman Faizal-Eeman deleted the mmoootor-upgrade-delly-0.9.1-to-1.0.3 branch July 11, 2022 23:09
@Faizal-Eeman
Copy link
Contributor Author

F16 configuration recommended! The larger, A-full datasets appear to be running with a relatively shorter runtime with F16 node configurations as compared to running on F32/72 node configurations.

Related PR - #36

@tyamaguchi-ucla
Copy link
Contributor

F16 configuration recommended! The larger, A-full datasets appear to be running with a relatively shorter runtime with F16 node configurations as compared to running on F32/72 node configurations.

Related PR - #36

We would want to be careful here because we know that the cluster environment (cluster network burden, etc) can affect the run time in general. We still want to recommend using an F16 node to run this pipeline because Delly can only use a single CPU (for now) and the memory usage is less than 20GB for a large sample like A-full. The recommendation would change once we add other tools like Manta though.

@Faizal-Eeman
Copy link
Contributor Author

Right, noted on this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update Delly version 0.9.1 -> 1.0.3
3 participants