This repository has been archived by the owner on Dec 10, 2018. It is now read-only.
A. Jesse Jiryu Davis edited this page Jun 4, 2013 · 3 revisions

Project Statement

Big Data is a key driver in today's database industry and for the future of 10gen. Building a big data repository of our own from data dumps available on the internet would be a huge boon to both our customers and our internal testing and development. We would provide publicly downloadable BSON dumps of the datasets, along with the tools we used to create them, and encourage the public to contribute datasets of their own. Our experience will generate recommendations for schema design, and analysis of the datasets will provide valuable metrics such as key sizes, document sizes, and value types and their distributions. We may also find bugs in MongoDB or discover features that would make it better suited to projects like this.
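
As a rough illustration of the kind of dataset metrics mentioned above, here is a minimal sketch that tallies key lengths, value types, and an approximate document size. It is not the project's actual tooling: the function name `dataset_metrics` is hypothetical, the documents are plain Python dicts, and the JSON-encoded byte length is used only as a stand-in for the real BSON size.

```python
import json
from collections import Counter


def dataset_metrics(documents):
    """Collect simple per-dataset metrics (hypothetical helper, not project code)."""
    key_lengths = Counter()   # key length in characters -> number of occurrences
    value_types = Counter()   # Python type name -> number of occurrences
    doc_sizes = []            # approximate size of each document in bytes

    for doc in documents:
        # Assumption: JSON byte length approximates document size; real BSON
        # sizes will differ because BSON stores type tags and length prefixes.
        doc_sizes.append(len(json.dumps(doc).encode("utf-8")))
        for key, value in doc.items():  # top-level fields only
            key_lengths[len(key)] += 1
            value_types[type(value).__name__] += 1

    return {
        "documents": len(doc_sizes),
        "avg_doc_size": sum(doc_sizes) / len(doc_sizes) if doc_sizes else 0,
        "key_lengths": dict(key_lengths),
        "value_types": dict(value_types),
    }


sample = [{"name": "a", "n": 1}, {"name": "b", "tags": ["x"]}]
metrics = dataset_metrics(sample)
```

The sketch inspects only top-level fields; nested documents and arrays would need a recursive walk to be counted the same way.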

Jira: https://jira.mongodb.org/browse/XGENTOOLS-265

Interns:

  • Suihui (Sweet) Song
  • Daniel Alabi

Skills to be learned

  1. Program optimization and efficiency for processing massive volumes of data
  2. Schema design for MongoDB and how to organize Big Data for MongoDB
  3. Experience with Big Data analysis and useful metrics for MongoDB planning