Add docs around getting started. #13
#### Initializing the data

1. Run `bundle install && rake -l ./tasks/elasticsearch.rake elasticsearch:init`
2. Run `??????????????`, which places the report data in `data/`
@konklone What do I have to do here?
Sorry for the delay in responding! I was at DEFCON without a computer for several days. I should really write some more dedicated docs in the README, but the short of it is that the data comes from this @unitedstates project. The
Let's leave this PR open, and I'll add to it with more docs later (unless you beat me to it) and merge it in.
That's easy enough. Curious, however, that there is no Then installation of this dir is as simple as
It's currently 32GB of data, which is too much to keep in git. So anyone who wants a complete import needs to scrape it all themselves. However, you can scrape just a subset to have something to load in. It might be nice to provide a helpful bulk data sample, though keeping it up to date through schema changes would be annoying. In the long run, I want to get the entire dataset regularly uploaded to the Internet Archive, but that's not done yet. In the meantime, the setup instructions require downloading a separate project and running its scrapers.
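One possible way to build the small bulk-data sample mentioned above is to keep only a few reports per inspector from a full or partial scrape. This sketch assumes POSIX-style relative paths of the form `<inspector>/<year>/<report_id>/report.json`; the function name `sampleReports` and the per-inspector cap are hypothetical.

```javascript
// Hypothetical sampler for a small, reproducible bulk-data sample.
// ASSUMES relative paths like "<inspector>/<year>/<report_id>/report.json".
function sampleReports(relativePaths, perInspector) {
  const counts = new Map(); // inspector -> number of reports kept so far
  const sample = [];

  // Sort, then reverse, so later years come first within each inspector
  // and the same input always produces the same sample.
  const ordered = [...relativePaths].sort().reverse();

  for (const p of ordered) {
    const inspector = p.split("/")[0];
    const kept = counts.get(inspector) || 0;
    if (kept < perInspector) {
      sample.push(p);
      counts.set(inspector, kept + 1);
    }
  }
  return sample;
}
```

Because the selection is deterministic, regenerating the sample after a schema change would touch the same reports, which keeps diffs of the sample small.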
The Internet Archive upload needn't be a long-run thing, actually; I plan to do it in the next few weeks. I'm tracking that issue at unitedstates/inspectors-general#63. IA has a very nice S3-compatible interface for bulk uploading. IA integration is an issue I'm not looking for help on; I want to be the point person on that. I don't think the size of my collection will require any negotiation on my or the project's part, and I don't know yet whether they'd be interested in making a more dedicated collection view for it. I'm not certain on either point; I just haven't dug in yet.
@parkr, I just spent some time documenting the project more fully in the README. Let me know if anything is still missing.
I had no clue how to get started, so I monkeyed around until I figured most of it out. I'm still unclear about how to download the report data so that `tasks/inspectors.js` can process it and index it in Elasticsearch (ES).
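Once reports are on disk, indexing them in Elasticsearch typically goes through the bulk API, which takes newline-delimited JSON: an action line, then the document, with a trailing newline at the end of the body. This is a sketch of building that body, not a description of what `tasks/inspectors.js` actually does; the index name and the use of `inspector` and `report_id` fields to form document IDs are assumptions.

```javascript
// Hypothetical: build an Elasticsearch _bulk request body (NDJSON).
// ASSUMES each report object has `inspector` and `report_id` fields.
function toBulkBody(reports, indexName) {
  const lines = [];
  for (const report of reports) {
    // Combining inspector and report_id gives a stable ID, so re-running
    // the import overwrites documents instead of duplicating them.
    const id = `${report.inspector}-${report.report_id}`;
    lines.push(JSON.stringify({ index: { _index: indexName, _id: id } }));
    lines.push(JSON.stringify(report));
  }
  // The bulk API requires the body to end with a newline.
  return lines.length ? lines.join("\n") + "\n" : "";
}
```

The resulting string would be POSTed to the cluster's `/_bulk` endpoint (with `Content-Type: application/x-ndjson` on recent versions); older Elasticsearch releases also expect a `_type` in each action line.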