Added resource usage section to the GermlineCNVCaller java doc. #8064

Merged
merged 1 commit into from Mar 28, 2023
@@ -97,6 +97,32 @@
* https://theano-pymc.readthedocs.io/en/latest/library/config.html</a>.
* </p>
*
* <h3>Resource usage</h3>
*
 * <p>Runtime and memory usage for {@link GermlineCNVCaller} can be impacted by (1) the number of input samples, (2) the
 * number of intervals, (3) the highest allowed copy-number state (set using the {@code max-copy-number} argument),
 * (4) the number of bias factors (set using the {@code max-bias-factors} argument), and (5) the convergence criteria.</p>
*
 * <p>We recommend running {@link GermlineCNVCaller} in COHORT mode on approximately 200 samples at a time, processing
 * between 5k and 12.5k intervals, with {@code max-copy-number} set to 5 across all analyses. For 200 samples and
 * 5k intervals, approximately 16GB of memory should be sufficient; for the same analysis at 12.5k intervals, we
 * recommend 32GB of memory. Runtimes are on the order of a few hours.</p>
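 *
 * <p>For example, a COHORT-mode invocation at the larger end of this range might look as follows (a hypothetical
 * sketch; the file names, interval list, and ploidy-calls directory are placeholders):</p>
 *
 * <pre>
 * gatk --java-options "-Xmx32g" GermlineCNVCaller \
 *     --run-mode COHORT \
 *     -L intervals.interval_list \
 *     --interval-merging-rule OVERLAPPING_ONLY \
 *     --contig-ploidy-calls ploidy-calls-dir \
 *     --input sample_1.counts.hdf5 \
 *     --input sample_2.counts.hdf5 \
 *     --output output-dir \
 *     --output-prefix cohort_run
 * </pre>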
*
 * <p>Note that {@link GermlineCNVCaller} can be run on larger interval sets by scattering them into smaller "shards."
 * The shards can subsequently be merged together by the {@link PostprocessGermlineCNVCalls} tool. In cloud and HPC
 * environments, the shards can then be processed in parallel as separate jobs.</p>
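 *
 * <p>As a sketch of this scatter-gather pattern (assuming the Picard {@code IntervalListTools} scatter mode; all
 * paths are placeholders):</p>
 *
 * <pre>
 * # scatter the full interval list into shards
 * gatk IntervalListTools \
 *     --INPUT all.interval_list \
 *     --SCATTER_COUNT 10 \
 *     --OUTPUT scattered-dir
 *
 * # run GermlineCNVCaller on each shard as its own job, then merge the
 * # per-shard outputs with PostprocessGermlineCNVCalls
 * </pre>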
*
 * <p>By default, {@link GermlineCNVCaller} will attempt to use all CPU cores accessible to it within the runtime
 * environment. Two environment variables, <code>MKL_NUM_THREADS</code> and <code>OMP_NUM_THREADS</code>, control the
 * parallelism of the underlying linear algebra libraries.</p>
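 *
 * <p>For example, to limit the underlying linear algebra libraries to four threads (shown for a POSIX shell; the
 * value 4 is illustrative):</p>
 *
 * <pre>
 * export MKL_NUM_THREADS=4
 * export OMP_NUM_THREADS=4
 * gatk GermlineCNVCaller [arguments]
 * </pre>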
*
 * <p>Runtime is also affected by how quickly the inference procedure converges. Several tool arguments set the
 * convergence criteria and can be used to speed up convergence, including but not limited to
 * {@code caller-update-convergence-threshold}, {@code convergence-snr-averaging-window},
 * {@code convergence-snr-countdown-window}, and {@code convergence-snr-trigger-threshold}. However, changing these
 * arguments from their default settings may affect the final results, so please exercise caution when
 * doing so.</p>
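 *
 * <p>For instance, these arguments can be passed directly on the command line (the values shown are illustrative
 * only and are not recommendations):</p>
 *
 * <pre>
 * gatk GermlineCNVCaller \
 *     --run-mode COHORT \
 *     --caller-update-convergence-threshold 0.01 \
 *     --convergence-snr-averaging-window 100 \
 *     --convergence-snr-countdown-window 10 \
 *     --convergence-snr-trigger-threshold 0.1 \
 *     [other arguments]
 * </pre>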
*
* <h3>Tool run modes</h3>
* <dl>
* <dt>COHORT mode:</dt>