
Add docs around getting started. #13

Closed
wants to merge 2 commits into from
Conversation

@parkr parkr commented Aug 6, 2014

I had no clue how to get started, so I monkeyed around until I figured most of it out. I'm still unclear on how to download the report data so `tasks/inspectors.js` can process it and index it in Elasticsearch.

#### Initializing the data

1. Run `bundle install && rake -f ./tasks/elasticsearch.rake elasticsearch:init`
2. Run `??????????????`, which places the report data in `data/`
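Step 1 above can be wrapped in a small guard so it fails loudly when run outside the repo root. This is a sketch only: the `-f` rakefile flag and the assumption that Ruby, Bundler, and a local Elasticsearch are already installed are mine, and step 2 remains the open question above.

```shell
# Hypothetical wrapper around step 1 (sketch, not the project's own script).
# Assumes Ruby, Bundler, and a running local Elasticsearch.
run_init() {
  if [ -f ./tasks/elasticsearch.rake ]; then
    bundle install && rake -f ./tasks/elasticsearch.rake elasticsearch:init
  else
    echo "tasks/elasticsearch.rake not found; run this from the repo root" >&2
    return 1
  fi
}
```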
@konklone What do I have to do here?

@konklone

Sorry for the delay in responding! I was at DEFCON without a computer for several days.

I should really do some more dedicated docs in the README, but the short of it is that the data comes from this @unitedstates project. The `data/` directory for that project should be, or be symlinked to be, the `data/` directory for this project.
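Concretely, that symlink might look like this; the sibling checkout path `../inspectors-general` is an assumption on my part, not something stated in the thread:

```shell
# Point this project's data/ at the scraper project's data/ directory.
# ../inspectors-general is an assumed location for the scraper checkout.
ln -sfn ../inspectors-general/data data
```

`ln -sfn` replaces any existing link, so it is safe to re-run if the scraper checkout moves.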

@konklone

Let's leave this PR open, and I'll add to it with more docs later (unless you beat me to it) and merge it in.


parkr commented Aug 14, 2014

> the short of it is that the data comes from this @unitedstates project. The `data/` directory for that project should be, or be symlinked to be, the `data/` directory for this project.

That's easy enough. Curious, however, that there is no data directory in the project you linked to. Is that intentional? Why aren't you tracking the data? Does everyone have to scrape it themselves? Why not use a submodule in both projects for this directory, and populate and update it as new reports come in?

Then installing this directory is as simple as `git submodule update --init`.
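For illustration, the proposed flow might look like the following scratch-repo sketch; the submodule URL is a guess at the @unitedstates scraper repo, not confirmed in the thread:

```shell
# Scratch-repo demonstration of the proposed submodule flow.
# The repo URL is an assumption; the actual `git submodule add` step
# needs network access, so it is left as a comment.
set -e
scratch=$(mktemp -d)
cd "$scratch"
git init -q .
# git submodule add https://github.com/unitedstates/inspectors-general data
git submodule update --init  # no-op here until a submodule is registered
```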

@konklone

It's currently 32GB of data, which is not a good fit for git. So someone needs to scrape it all themselves if they want a complete import. However, you can scrape any subset you want in order to have something to load in.

It might be nice to provide a helpful bulk data sample, though keeping it up to date if there are schema changes would be annoying. In the long run, I want to get the entire dataset regularly into the Internet Archive's archives, but that's not done yet. In the meantime, the setup instructions require a separate project to be downloaded and its scrapers run.

@konklone

The Internet Archive thing needn't be a long-run thing, actually; it's something I plan to do in the next few weeks. I'm tracking that at unitedstates/inspectors-general#63. IA has a very nice S3-compatible interface for bulk uploading.

IA integration is an issue I'm not looking for help on; I want to be the point person on that. I don't know whether the size of the collection will require any negotiation on my part or the project's, or whether IA would be interested in making a more dedicated collection view for it; I'm not certain because I just haven't dug in yet.

@konklone konklone closed this in 59d7508 Aug 16, 2014
@konklone

@parkr, I just spent some time and more fully documented the project in the README. Let me know if there's still stuff missing.

@parkr parkr deleted the better-docs branch August 17, 2014 00:32