Mango using Partitioned parquet ADAM #361

Closed
wants to merge 17 commits into from

jpdna
Member

@jpdna jpdna commented Feb 13, 2018

Replaces #358
works with ADAM: #1911

Example Mango run:

mango-submit --master yarn --num-executors 10 --executor-cores 8 --executor-memory 10g --driver-memory 20g -- /home/eecs/akmorrow/builds/hg19.2bit \
-genes http://www.biodalliance.org/datasets/ensGene.bb \
-reads hdfs://{headnodepath}/user/jpaschall/mango1/HG00096_Jan30_v2.adam \
-show_genotypes

The "chr" prefix problem was resolved by querying both with and without the "chr" prefix.
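
As a rough illustration of querying under both spellings, here is a minimal Scala sketch. It is not taken from this PR: `ContigNames`, `variants`, and `resolve` are hypothetical names, and the real change may apply the same idea at a different layer (for example, in the region filter passed to ADAM).

```scala
// Hypothetical helper, not part of the Mango or ADAM code base.
object ContigNames {
  private val Prefix = "chr"

  /** Returns the contig name without and with the "chr" prefix, in that order. */
  def variants(name: String): Seq[String] = {
    val bare = name.stripPrefix(Prefix) // no-op when the prefix is absent
    Seq(bare, Prefix + bare)
  }

  /** Finds whichever spelling of `name` is present in `known`, if any. */
  def resolve(name: String, known: Set[String]): Option[String] =
    variants(name).find(known.contains)
}
```

With this sketch, a request for "1" against a dataset whose contigs are named "chr1", "chr2", and so on resolves to "chr1", and a request for "chrX" against a dataset that uses "X" resolves to "X".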

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/mango-prb/575/

@jpdna
Member Author

jpdna commented Feb 16, 2018

A more recent test of this Mango code against partitioned datasets of the large Platinum Genomes files can be run with this command:

mango-submit --master yarn --num-executors 10 --executor-cores 4 --executor-memory 20g --driver-memory 20g  -- /home/eecs/akmorrow/builds/hg19.2bit \
-genes http://www.biodalliance.org/datasets/ensGene.bb \
-reads hdfs://amp-bdg-master.amplab.net:8020/user/jpaschall/feb16_work/NA12877_S1.bam.partitioned.v3.adam,\
hdfs://amp-bdg-master.amplab.net:8020/user/jpaschall/feb16_work/NA12890_S1.bam.partitioned.v3.adam,\
hdfs://amp-bdg-master.amplab.net:8020/user/jpaschall/feb16_work/NA12889_S1.bam.partitioned.v3.adam \
-show_genotypes -parquetIsBinned

The one-time "warm-up period", performed up front when the -parquetIsBinned flag is used, still takes 30-45 seconds. After that, response time is 1-3 seconds when jumping between locations in the genome browser.

@@ -297,6 +297,9 @@ class VizReadsArgs extends Args4jBase with ParquetArgs {
@Args4jOption(required = false, name = "-preload", usage = "Chromosomes to prefetch, separated by commas (,).")
var preload: String = null

@Args4jOption(required = false, name = "-parquetIsBinned", usage = "This turns on binned parquet pre-fetch warmup step")
Contributor

-parquetIsBinned -> isPartitioned

@jpdna
Member Author

jpdna commented Feb 26, 2018

This is replaced by #370

@jpdna jpdna closed this Feb 26, 2018