Fix record oriented shuffle #599

fnothaft · 2015-03-01T18:50:45Z

Because our shuffle is record oriented, data volume increases by approximately 8-10x when we shuffle. Our data is stored on disk in a columnar representation, but it is shuffled in a row-oriented format, so the space savings of the columnar layout are lost during the shuffle.
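
For illustration only, here is a toy sketch of why a row-oriented encoding can be much larger than a columnar one. It is not ADAM, Spark, or Parquet code: the field names, the JSON serialization, and the simple run-length encoding are all assumptions chosen to keep the example self-contained, and the size ratio it produces is not meant to reproduce the 8-10x figure above.

```python
import json


def make_records(n):
    # Hypothetical alignment-like records; field names are illustrative only.
    return [
        {
            "contig": "chr1",
            "start": 1000 + i,
            "mapq": 60,
            "record_group": "sample_A_lane_1",
        }
        for i in range(n)
    ]


def row_oriented_size(records):
    # Serialize every record on its own, the way a record-at-a-time
    # shuffle would: field names and repeated values are written each time.
    return sum(len(json.dumps(r).encode("utf-8")) for r in records)


def run_length_encode(values):
    # Collapse consecutive equal values into [value, count] pairs.
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs


def columnar_size(records):
    # Group values by field, run-length encode each column,
    # then serialize the whole batch once.
    columns = {
        key: run_length_encode([r[key] for r in records])
        for key in records[0]
    }
    return len(json.dumps(columns).encode("utf-8"))


if __name__ == "__main__":
    records = make_records(1000)
    print("row-oriented bytes:", row_oriented_size(records))
    print("columnar bytes:    ", columnar_size(records))
```

In this sketch the columns with repeated values (`contig`, `mapq`, `record_group`) collapse to a single run each, while the row-oriented form pays for them, and for the field names, once per record.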

tdanford · 2015-03-01T20:13:58Z

So what's the proposed fix?

fnothaft · 2015-03-01T20:43:33Z

TBD?

tdanford · 2015-03-01T20:45:03Z

Gotcha.

ryan-williams · 2015-05-31T04:42:40Z

FTR: presumably @massie's SPARK-7263 is our best hope here?

Resolves bigdatagenomics#599. Since we have added the RecordGroupMetadata fields in bdg-formats:0.7.0, we can read/write our metadata as separate Avro files. We process these files when loading/writing the Parquet files where the alignment data is stored. This allows us to eliminate the bulky metadata that we are currently storing in the AlignmentRecord, while still maintaining the Sequence and RecordGroup dictionaries that we need to keep around.

fnothaft added this to the 0.17.0 milestone Mar 1, 2015

tomwhite mentioned this issue Apr 27, 2015

Support Hive-style partitioning #651

Closed

ryan-williams mentioned this issue May 31, 2015

Publish/socialize a roadmap #591

Closed

fnothaft modified the milestones: 1.0.0, 0.17.0 May 31, 2015

fnothaft mentioned this issue Dec 27, 2015

ADAM-599, 905: Move to bdg-formats:0.7.0 and migrate metadata #906

Merged

heuermh closed this as completed in #906 Jan 12, 2016

heuermh modified the milestones: 1.0.0, 0.20.0 Oct 13, 2016
