Merge pull request nasa-jpl-memex#760 from memex-explorer/bhard/docs_…
…migrations

Docs about Migrations
brittainhard committed Nov 5, 2015
2 parents 594f0a4 + 9e9e4b5 commit a980b3d
Showing 4 changed files with 9 additions and 4 deletions.
2 changes: 1 addition & 1 deletion docs/source/crawler_guide.rst
@@ -28,7 +28,7 @@ Creating a Seeds List

Simply put, the seeds list should contain pages that are relevant to the topics you are searching. Both Nutch and Ache provide insight into the relevance of your seeds list, but in different ways.

For the purposes of memex-explorer, the extenstion and name of your seeds list does not matter. It will be automatically renamed and stored according to the specifications of the crawler.
For the purposes of memex-explorer, the extension and name of your seeds list does not matter. It will be automatically renamed and stored according to the specifications of the crawler.

Seeds lists can be created on the seeds page or from the add crawl page.

6 changes: 6 additions & 0 deletions docs/source/dev_guide.rst
@@ -64,3 +64,9 @@ The Database Model
The current entity relation diagram:

.. image:: _static/img/DbVisualizer.png

Updating the Database
---------------------
As of version 0.4.0, Memex Explorer tracks all database migrations. This means that you can upgrade your database while preserving its data.

If you are using version 0.3.0 or earlier and cannot update your database without server errors, the best course of action is to delete the existing file at ``source/db.sqlite3`` and start over with a fresh database.
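
The reset described above could be scripted along the following lines. This is only a minimal sketch, not part of Memex Explorer: the helper name ``reset_database`` and the ``.bak`` backup suffix are hypothetical, and the only detail taken from the guide is the ``source/db.sqlite3`` path. It moves the old file aside rather than deleting it outright, so the data can still be inspected later.

```python
from pathlib import Path
import shutil


def reset_database(db_path, keep_backup=True):
    """Remove an existing SQLite file so a fresh database can be created.

    Hypothetical helper, not part of Memex Explorer. Returns the backup
    path when the old file was moved aside, otherwise None.
    """
    db_path = Path(db_path)
    if not db_path.exists():
        # Nothing to reset.
        return None
    if keep_backup:
        # Assumed naming scheme: db.sqlite3 -> db.sqlite3.bak
        backup = db_path.with_name(db_path.name + ".bak")
        shutil.move(str(db_path), str(backup))
        return backup
    # Delete the old database outright.
    db_path.unlink()
    return None


if __name__ == "__main__":
    # Path taken from the guide; run from the repository root.
    print(reset_database("source/db.sqlite3"))
```

Recreating the fresh database afterwards depends on the project's own setup steps, which this sketch does not cover.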
2 changes: 1 addition & 1 deletion docs/source/manual_testing_guide.rst
@@ -13,7 +13,7 @@ Project Creation

a. Click the new project button.
b. Provide a name and a description for the project on the next page, and press submit.
c. Veryify that your new project shows up on the project page list.
c. Verify that your new project shows up on the project page list.
d. Click on the new project and go to the project page. Verify that there are no crawls, models, or datasets yet.

Project Settings
3 changes: 1 addition & 2 deletions docs/source/user_guide.rst
@@ -12,7 +12,7 @@ Web Crawling
With Memex Explorer you can create, run, and analyze `Nutch <http://nutch.apache.org/>`_ and `ACHE <https://github.com/ViDA-NYU/ache>`_ crawls. The crawl operation is heavily abstracted and simplified. Users provide a list of seed URLs to start the crawl, and in the case of ACHE's targeted crawling, a `machine learning model <https://github.com/ViDA-NYU/ache#build-a-model-for-aches-page-classifier>`_ to determine the relevancy of crawled pages.

Dataset Analysis
Memex Explorer allows you to upload a large number of files, which will be analyized by Tika and placed into our Elasticsearch instance. Tika will exctact metadata from these documents, giving you a better overiew of them.
Memex Explorer allows you to upload a large number of files, which will be analyzed by Tika and placed into our Elasticsearch instance. Tika will extract metadata from these documents, giving you a better overview of them.

Domain Discovery Tool
Through the use of `Domain Discovery Tool <https://github.com/ViDA-NYU/domain_discovery_tool>`_, the user can search for content in the web and build data models based on clustering algorithms. The user can search the web and highlight relevant and irrelevant pages, and DDT will produce data model files, which you can use with Ache crawls in Memex Explorer.
@@ -98,4 +98,3 @@ Editing a Seeds List
Once you have created your seeds list, you can edit it through our built-in editor. This editor allows you to change the content of your seeds list by adding or removing seeds. It also validates all of the URLs and displays the ones that contain errors.

.. image:: _static/img/edit-seeds.png
