Update Delly 0.9.1 to 1.0.3 #35
Looks good to me! Nice work!!
For benchmarking v0.9.1 vs v1.0.3, we might want to use one of the real samples as well, since the A-mini/A-full tumors are simulated tumor samples. https://confluence.mednet.ucla.edu/pages/viewpage.action?pageId=82391152
Sure, I'll run the pipeline on a real sample and include the result paths here. Should we instead have a Confluence page to document the benchmarking?
For now, we can update this section in |
Consider submitting the increased run-time as an issue on delly to see if it's expected/unexpected? Perhaps the developer could optimize away the increased time if they're unaware of it.
@pboutros Sure, we'll benchmark a few more samples on the same node type and see whether we consistently observe the increased run-time. Each node type has a different expected network bandwidth (e.g. F32 16000 Mbps vs F72 30000 Mbps), although I don't think that difference can explain the 2h difference. https://docs.microsoft.com/en-us/azure/virtual-machines/fsv2-series
F16 configuration recommended! The larger A-full datasets appear to run faster on F16 nodes than on F32/F72 nodes. Related PR - #36
We would want to be careful here because we know that the cluster environment (cluster network burden, etc.) can affect the run time in general. We still want to recommend an F16 node for this pipeline because Delly can only use a single CPU (for now) and the memory usage is less than 20GB even for a large sample like A-full. The recommendation would change once we add other tools like Manta, though.
Right, noted on this!
Description
This commit upgrades Delly from v0.9.1 to v1.0.3. No difference is observed in the significant variant calls, i.e. those with good mapping quality (MAPQ > 0), but runtime has increased with v1.0.3.
v0.9.1 vs v1.0.3
Variant Calls
1. SVs with Good Mapping Quality - No differences in the SV calls were found between v0.9.1 and v1.0.3 for variants with Mapping Quality (MAPQ) > 0.
/hot/software/pipeline/pipeline-call-sSV/Nextflow/development/unreleased/mmoootor-upgrade-delly-0.9.1-to-1.0.3/delly-0.9.1-vs-1.0.3_validation/A-mini_S2.T-0/delly0.9.1-vs-1.0.3_MAPQ_gt_zero_filtered.sha512
/hot/software/pipeline/pipeline-call-sSV/Nextflow/development/unreleased/mmoootor-upgrade-delly-0.9.1-to-1.0.3/delly-0.9.1-vs-1.0.3_validation/A-full_T5.T/delly0.9.1-vs-1.0.3_MAPQ_gt_zero_filtered.sha512
2. SVs with Bad Mapping Quality - The number of SVs called by v0.9.1 and v1.0.3 differs only for variants with Mapping Quality (MAPQ) = 0. The difference in the number of variants called appears to be predominantly due to differences in the Quality Scores of these variants between v0.9.1 and v1.0.3.
/hot/software/pipeline/pipeline-call-sSV/Nextflow/development/unreleased/mmoootor-upgrade-delly-0.9.1-to-1.0.3/delly-0.9.1-vs-1.0.3_validation/A-mini_S2.T-0/delly0.9.1-vs-1.0.3_difference_only_in_MAPQ_zero.txt
/hot/software/pipeline/pipeline-call-sSV/Nextflow/development/unreleased/mmoootor-upgrade-delly-0.9.1-to-1.0.3/delly-0.9.1-vs-1.0.3_validation/A-full_T5.T/delly0.9.1-vs-1.0.3_difference_only_in_MAPQ_zero.txt
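For reviewers who want to repeat this kind of comparison, the two checks above can be sketched roughly as follows. This is a minimal illustration, not the script that produced the files listed above. It assumes uncompressed VCF text, assumes MAPQ is read from the INFO column, and treats a record with no MAPQ key as MAPQ = 0. The function names (`split_by_mapq`, `sha512_of`, `compare_calls`) are hypothetical.

```python
import hashlib


def parse_info(info_column):
    """Turn an INFO string such as 'PRECISE;SVTYPE=DEL;MAPQ=60' into a dict."""
    fields = {}
    for item in info_column.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value  # flag-only entries (e.g. PRECISE) get an empty value
    return fields


def split_by_mapq(vcf_lines):
    """Return (records with MAPQ > 0, records with MAPQ == 0), skipping headers."""
    good, zero = [], []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        info = parse_info(line.split("\t")[7])
        # Assumption: a missing MAPQ key is treated as MAPQ = 0.
        mapq = int(info.get("MAPQ", "0"))
        (good if mapq > 0 else zero).append(line)
    return good, zero


def sha512_of(records):
    """Hash the records in file order so two call sets can be compared by checksum."""
    digest = hashlib.sha512()
    for record in records:
        digest.update(record.encode("utf-8") + b"\n")
    return digest.hexdigest()


def compare_calls(old_lines, new_lines):
    """Compare two call sets: checksum for MAPQ > 0, set difference for MAPQ = 0."""
    old_good, old_zero = split_by_mapq(old_lines)
    new_good, new_zero = split_by_mapq(new_lines)
    return {
        "mapq_gt_zero_identical": sha512_of(old_good) == sha512_of(new_good),
        "only_in_old_mapq_zero": sorted(set(old_zero) - set(new_zero)),
        "only_in_new_mapq_zero": sorted(set(new_zero) - set(old_zero)),
    }
```

Passing the lines of the two VCFs (for example via `open(path).readlines()`) to `compare_calls` returns whether the MAPQ > 0 records are byte-identical, plus the MAPQ = 0 records unique to each version. Because whole records are compared, a change in the QUAL column alone is enough for a record to show up as different.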
Runtime with stringent filters applied
A-mini
A-full
Real Sample - ILHNLNEV000001-T001-P01-F
Links
Delly v1.0.3 GitHub release - https://github.com/dellytools/delly/releases/tag/v1.0.3
BL CDS Docker Registry for Delly - https://hub.docker.com/repository/docker/blcdsdockerregistry/delly
Closes #26
Testing Results
DNA A-mini
DNA A-full
DNA A-full
DNA A-full
DNA Real Sample
DNA Real Sample
Checklist
I have read the code review guidelines and the code review best practice on GitHub check-list.
I have reviewed the Nextflow pipeline standards.
The name of the branch is meaningful and well formatted following the standards, using [AD_username (or 5 letters of AD if AD is too long)]-[brief_description_of_branch].
I have set up or verified the branch protection rule following the [github standards](https://confluence.mednet.ucla.edu/pages/viewpage.action?spaceKey=BOUTROSLAB&title=GitHub+Standards#GitHubStandards-Branchprotectionrule) before opening this pull request.
I have added my name to the contributors listings in the manifest block in the nextflow.config as part of this pull request, am listed already, or do not wish to be listed. (This acknowledgement is optional.)
I have added the changes included in this pull request to the CHANGELOG.md under the next release version or unreleased, and updated the date.
I have updated the version number in the metadata.yaml and manifest block of the nextflow.config file following semver, or the version number has already been updated. (Leave it unchecked if you are unsure about new version number and discuss it with the infrastructure team in this PR.)
I have tested the pipeline on at least one DNA A-mini sample and one A-full sample. The paths to the test config files and output directories were attached above.