Where is the data hosted? #3

Open
hbitteur opened this issue Apr 27, 2017 · 3 comments

Comments

@hbitteur
Contributor

This GitHub repository only contains some tools to build and use the OMR Dataset. But where should we store the OMR data itself?

The expected size will start small but could reach several GB. There is no real need for version control, but the location must be public.

The obvious ultimate target host is IMSLP, but right now we need a temporary storage location that we can play with during the project's development phase.

Any hints?

@lasconic
Collaborator

I'm not at all sure that IMSLP is the obvious ultimate host.
For the resulting training data, we could use https://www.kaggle.com/ as primary or secondary storage to raise awareness.

As primary storage, we could use an independent host such as OSUOSL. MuseScore has an FTP share, and I'm sure they would be happy to create one for an open dataset if needed.

For collaboration and sending files around during development, S3 should do.
If we need versioning, Git LFS would work, but I don't know of any free host. GitHub only offers 1 GB.

@hbitteur
Contributor Author

We don't really need versioning. Compatibility based on a file naming scheme would be fine.
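
As a rough illustration of what name-based compatibility could look like, here is a minimal sketch. The naming pattern `<name>-v<major>.<minor>.zip` and the helper names are purely hypothetical. Nothing in this thread fixes an actual scheme. The idea is only that a tool could decide from the file name alone whether it can read an archive.

```python
import re
from typing import Optional, Tuple

# Hypothetical naming scheme (an assumption, not agreed in this thread):
#   <name>-v<major>.<minor>.zip, e.g. "omr-dataset-v1.2.zip"
ARCHIVE_PATTERN = re.compile(
    r"^(?P<name>[a-z0-9-]+)-v(?P<major>\d+)\.(?P<minor>\d+)\.zip$"
)


def parse_archive_name(filename: str) -> Optional[Tuple[str, int, int]]:
    """Return (name, major, minor), or None if the name does not follow the scheme."""
    match = ARCHIVE_PATTERN.match(filename)
    if match is None:
        return None
    return match.group("name"), int(match.group("major")), int(match.group("minor"))


def is_compatible(filename: str, supported_major: int) -> bool:
    """Treat an archive as readable when its major version matches the tool's."""
    parsed = parse_archive_name(filename)
    return parsed is not None and parsed[1] == supported_major
```

With such a convention, a breaking change to the data layout would bump the major number, and older tools could refuse the file without any version control on the storage side.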

@lasconic
Collaborator

Hangout April 28
What is data here?

  1. Training data for the full-context deep learning classifier. Several GB.
  • FTP --> Nicolas will ask OSUOSL to give us a share and credentials for the 5 of us.
  • S3 --> We can use it for exchanging files in the meantime. MuseScore can provide a link to a zip file.
  2. The model for the classifier, ~20 MB --> Git LFS seems to be a workable solution for this.
    Price might be an issue:
    https://help.github.com/articles/about-storage-and-bandwidth-usage/


2 participants