Use larger blocksize with bwa index on >100MB files #13

Merged (6 commits), Jan 31, 2017

@unode (Member) commented Jan 30, 2017

This drastically reduces the CPU time required for indexing, at the expense of higher memory usage.

The same machine that will run bwa mem should be able to accommodate this without problems.

customBlockSize :: FilePath -> IO [String]
customBlockSize path = sizeAsParam <$> getFileSize path

sizeAsParam :: FileOffset -> [String]

Member commented on this change:

Please add a comment here with the same information as you have in the PR (from my POV, feel free to copy & paste):

BWA's default indexing parameters are quite conservative. This leads to
a small memory footprint at the cost of more CPU hours.
With large databases (~100GB), the default settings require over 2 weeks of
CPU time. Increasing the default blocksize will increase the memory
footprint but will reduce indexing time 3- to 6-fold.

This patch increases the blocksize to roughly 1/10th of the file size.
The memory footprint should be about the size of the database.

As per lh3/bwa#104, this patch may become
obsolete once this functionality is built into bwa.
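For illustration only, here is one hypothetical body for `sizeAsParam` that follows the heuristic described above. Files above a ~100MB cutoff (taken from the PR title) get bwa index's `-b` option (listed in bwa's usage message as the block size for the bwtsw algorithm) set to about one tenth of the file size. Smaller files get no extra arguments and keep bwa's default. The name `largeFileThreshold`, the exact cutoff value (100 * 1000 * 1000 bytes), and the use of `System.Directory.getFileSize` in the IO wrapper are assumptions, not the merged ngless code.

```haskell
import System.Directory (getFileSize)
import System.Posix.Types (FileOffset)

-- Hypothetical cutoff: the PR title says ">100MB files"; whether the real
-- code uses 10^6- or 2^20-based megabytes is an assumption here.
largeFileThreshold :: FileOffset
largeFileThreshold = 100 * 1000 * 1000

-- Sketch: map a database file size in bytes to extra `bwa index` arguments.
-- Files strictly above the cutoff get `-b <size / 10>` (integer division);
-- everything else returns [] so bwa keeps its default block size.
sizeAsParam :: FileOffset -> [String]
sizeAsParam size
  | size > largeFileThreshold = ["-b", show (toInteger size `div` 10)]
  | otherwise                 = []

-- IO wrapper with the same shape as customBlockSize in the diff.
-- Assumption: the size comes from System.Directory.getFileSize (which returns
-- an Integer), not necessarily the helper ngless uses, hence fromIntegral.
customBlockSize :: FilePath -> IO [String]
customBlockSize path = sizeAsParam . fromIntegral <$> getFileSize path
```

With this sketch, `sizeAsParam (200 * 1000 * 1000)` gives `["-b","20000000"]`, a file at or below the cutoff gives `[]`, and a ~100GB database would get a `-b` value on the order of 10^10. Keeping the size-to-arguments mapping pure means it can be checked without touching the filesystem. The IO wrapper only composes it with a size lookup.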

@unode (Member Author) replied:

Done, but tests are still failing. Will check what's wrong with them later.

unode added 6 commits January 31, 2017 17:59

@unode (Member Author) commented Jan 31, 2017

Tests fixed.

I also included a few other commits that made it easier to debug what was going on with Travis CI.
For whatever reason, I wasn't able to reproduce the problem the first few times I tried.

@luispedro (Member) commented:

Thanks. Merging!

@luispedro merged commit 09e17ce into ngless-toolkit:master on Jan 31, 2017