Merge pull request nasa-jpl-memex#750 from memex-explorer/bhard/update_docs

Updating Docs
geoffwalmsley · Nov 4, 2015 · 83cc10a
2 parents ade613a + 5069d33
Showing 27 changed files with 157 additions and 200 deletions.
diff --git a/README.md b/README.md
@@ -26,10 +26,7 @@ This script will set up a conda environment named memex, prepare the application
 
 If you have already run the install script, simply run `supervisord` from the `memex-explorer/source` directory to restart all of the services.
 
-The supervisord will start supervisord in the foreground, which will
-in turn ensure that all services associated with the core Memex
-Explorer environment are running.  To stop supervisord and the
-associated services, send an interrupt to the process with `Ctrl-c`.
+Running `supervisord` starts supervisord in the foreground, which in turn ensures that all services associated with the core Memex Explorer environment are running. To stop supervisord and the associated services, send an interrupt to the process with `Ctrl-c`.
 
 **Memex Explorer will now be running locally at http://localhost:8000**
 
@@ -55,45 +52,3 @@ The documentation is then available within `build/html/index.html`
 
 To access the administration panel, navigate to http://localhost:8000/admin (or the equivalent deployed URL) after starting Memex Explorer. Here you will be able to view and make manual changes to the database.
 
-# Deploying
-
-The current method for deploying to the web is to deploy to ec2 by running a
-fabric script with a few environment variables set.
-
-```
-$ git clone https://github.com/memex-explorer/memex-explorer
-$ cd memex-explorer/deploy
-$ conda env create --file deploy_environment.yml
-$ source activate memex_deploy
-$ cp deploy_ec2.sh nocommit.sh
-```
-
-Now edit the file nocommit.sh. It will contain three environment variables
-which you must set and which you must not commit to the public repository.
-
-    AWS_KEY_ID: The key id for your aws account
-
-    AWS SECRET: The key secret for your aws account
-
-    HTPASSWD_PATH: The HTTP login password path. This file should have been
-    given to you.  Place it at a location not tracked by git and enter the absolute
-    path to this location in the value of this variable.
-
-Additionally, you can choose to deploy a different git branch than the production branch.
-
-Once you have set these variables, you can start a new instance with `source nocommit.sh`, which
-will create an ec2 instance, place a login key for it in memex-explorer/deploy/keys and run the deploy script on the new instance.
-
-The login key for the new instance will be given three names:
-
-    One based on the IP address of the new server.
-
-    One based on the creation time of the new server.
-
-    latest.pem, a convenience to logging in to the most-recently-created server.
-
-To connect to a instance given an IP address of 54.167.11.71, log in with the command
-
-    ssh -i keys/ec2-54.167.11.71.pem vagrant@54.167.11.71
-
-After the setup script is done running, you will be able to access the application by entering the IP address into your browser.
diff --git a/docs/source/_static/img/DbVisualizer.png b/docs/source/_static/img/DbVisualizer.png
diff --git a/docs/source/_static/img/ache-buttons.png b/docs/source/_static/img/ache-buttons.png
diff --git a/docs/source/_static/img/ache-dashboard.png b/docs/source/_static/img/ache-dashboard.png
diff --git a/docs/source/_static/img/ache_dashboard1.png b/docs/source/_static/img/ache_dashboard1.png
diff --git a/docs/source/_static/img/ache_dashboard2.png b/docs/source/_static/img/ache_dashboard2.png
diff --git a/docs/source/_static/img/add-crawl-model.png b/docs/source/_static/img/add-crawl-model.png
diff --git a/docs/source/_static/img/add-crawl.png b/docs/source/_static/img/add-crawl.png
diff --git a/docs/source/_static/img/create-seeds.png b/docs/source/_static/img/create-seeds.png
diff --git a/docs/source/_static/img/edit-seeds.png b/docs/source/_static/img/edit-seeds.png
diff --git a/docs/source/_static/img/homepage-view.png b/docs/source/_static/img/homepage-view.png
diff --git a/docs/source/_static/img/nutch-buttons.png b/docs/source/_static/img/nutch-buttons.png
diff --git a/docs/source/_static/img/nutch-dashboard.png b/docs/source/_static/img/nutch-dashboard.png
diff --git a/docs/source/_static/img/nutch_dashboard.png b/docs/source/_static/img/nutch_dashboard.png
diff --git a/docs/source/_static/img/project-form.png b/docs/source/_static/img/project-form.png
diff --git a/docs/source/_static/img/project-page.png b/docs/source/_static/img/project-page.png
diff --git a/docs/source/_static/img/seeds-from-trail.png b/docs/source/_static/img/seeds-from-trail.png
diff --git a/docs/source/_static/img/seeds-page.png b/docs/source/_static/img/seeds-page.png
diff --git a/docs/source/_static/img/testing_guide/edit_index_link.png b/docs/source/_static/img/testing_guide/edit_index_link.png
diff --git a/docs/source/_static/img/testing_guide/index_creation_success.png b/docs/source/_static/img/testing_guide/index_creation_success.png
diff --git a/docs/source/_static/img/testing_guide/nutch_dashboard_initial.png b/docs/source/_static/img/testing_guide/nutch_dashboard_initial.png
diff --git a/docs/source/_static/img/upload-files.png b/docs/source/_static/img/upload-files.png
diff --git a/docs/source/_static/img/upload-success.png b/docs/source/_static/img/upload-success.png
diff --git a/docs/source/crawler_guide.rst b/docs/source/crawler_guide.rst
@@ -28,17 +28,25 @@ Creating a Seeds List
 
     Simply put, the seeds list should contain pages that are relevant to the topics you are searching. Both Nutch and Ache provide insight into the relevance of your seeds list, but in different ways.
 
-    For the purposes of memex-explorer, the extenstion and name of your seeds list does not matter. It will be automatically renamed and stored according to the specifications of the crawler.
+    For the purposes of memex-explorer, the extension and name of your seeds list do not matter. It will be automatically renamed and stored according to the specifications of the crawler.
+
+    Seeds lists can be created on the seeds page or from the add crawl page.
 
 Crawler Control Buttons
 =======================
 Here's an overview of the buttons available for controlling each crawler. The buttons behave differently depending on which crawler you are using.
 
-.. image:: _static/img/crawler_control.png
+These are the buttons available for Ache:
+
+.. image:: _static/img/ache-buttons.png
+
+These are the buttons available for Nutch:
+
+.. image:: _static/img/nutch-buttons.png
 
 Options Button
 --------------
-    Symbolized by the "gears" icon. This allows you to change various settings on the crawl. See `Crawl Settings`_.
+    Symbolized by the "pencil" icon. This allows you to change various settings on the crawl. See `Crawl Settings`_.
 
 Start Button
 ------------
@@ -54,19 +62,20 @@ Restart Button
 --------------
     Symbolized by the "refresh" icon. Restarts the current crawl. This button is only available after the crawl has stopped.
 
-    With Ache, it will immediately start a brand new Ache crawl, deleting all of the previous crawl information. With Nutch, it will start a new crawler round, using the  information gathered by the crawl in the previous round.
-
-Get Seeds List
---------------
-    This button will let you download the list of seeds that the crawler is currently using.
+    With Ache, it will immediately start a brand new Ache crawl, deleting all of the previous crawl information. With Nutch, it will start a new crawler round, using the information gathered by the crawl in the previous round.
 
 Get Crawl Log
 -------------
-    This button will let you download the log of the current running crawl. This allows you to see the progress of the crawl and any errors that may be occurring during the crawl.
+    This button will let you download the log of the currently running crawl. This allows you to see the progress of the crawl and any errors that may be occurring during the crawl. This button is only available for Ache crawls.
+
+CCA Export
+----------
+
+    This button is only available for Nutch crawls. It allows you to export your crawl data into the CCA format.
 
 Crawl Settings
 ==============
-    The crawl settings page allows you to delete the crawl, as well as change the name or description of the crawl. It is accessed by clicking the "gears" icon next to the name of the crawl.
+    The crawl settings page allows you to delete the crawl, as well as change the name or description of the crawl. It is accessed by clicking the "pencil" icon next to the name of the crawl.
 
     .. image:: _static/img/crawl_settings.png
 
@@ -83,23 +92,19 @@ Nutch
 
     The number of pages left to crawl in a Nutch round increases significantly after each round. With Nutch, you can pass it a seeds list of 100 pages to crawl, and it can find over 1000 pages to crawl for the next round. Because of this, Nutch is a much easier crawler to get running.
 
+    Memex Explorer currently uses the Nutch REST API for running all crawls.
+
 Nutch Dashboard
 =======================
-.. image:: _static/img/nutch_dashboard.png
+    Memex Explorer provides features for monitoring the status of Nutch crawls. You can get real-time information about which pages Nutch is currently crawling, as well as information about the duration of the crawl.
+
+.. image:: _static/img/nutch-dashboard.png
 
 Statistics
 ----------
-    memex-explorer will tell you how many pages have been crawled after the current round has finished.
-
-Nutch Specific Buttons
-----------------------
-    Nutch has two buttons which are unique to its implementation.
-
-    View results in Solr
-        The first button is a link to a Solr instance, which you can use to search the results of the crawls using the standard Solr interface.
+    Nutch will tell you how many pages have been crawled after the current round has finished.
 
-    Dump Images
-        This button will download all of the images discovered during the crawl. The images are dumped to a folder on the filesystem. Image Space will use these images as part of its application.
+.. image:: _static/img/nutch_stats.png
 
 .. _ache-section:
 
@@ -110,13 +115,13 @@ Ache
 
 Ache Dashboard
 ======================
-.. image:: _static/img/ache_dashboard1.png
+.. image:: _static/img/ache-dashboard.png
 
 .. image:: _static/img/ache_stats.png
 
 Plots
 -----
-    memex-explorer uses `Bokeh <http://bokeh.pydata.org/en/latest/>`_ for its plots. There are two plots available for analyzing Ache crawls, Domain Relevance and Harvest Rate.
+    Memex Explorer uses `Bokeh <http://bokeh.pydata.org/en/latest/>`_ for its plots. There are two plots available for analyzing Ache crawls, Domain Relevance and Harvest Rate.
 
     The Domain Relevance plot sorts domains by the number of pages crawled, and adds information for relevancy of that domain to your crawl model. This plot helps you understand how well your model fits.
 
@@ -133,7 +138,7 @@ Ache Specific Buttons
     Ache has a "Download Relevant Pages" button, which allows you to download the pages Ache has found to be relevant to your seeds list and your crawl model.
 
 Building a Crawl Model
-======================
+----------------------
     Ache requires a crawl model to run. For information on how to build crawl models, see the `Ache readme <https://github.com/ViDA-NYU/ache/blob/master/README.md>`_.
 
     For more detailed information on Ache, head to the `Ache Wiki <https://github.com/ViDA-NYU/ache/wiki>`_.
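The crawler guide above says only that a seeds list should contain pages relevant to your topic and that its file name and extension do not matter; it does not spell out the file layout. As a minimal sketch, assuming a plain-text file with one URL per line (the layout Nutch seed files use; treat it as an assumption for Ache), the hypothetical `clean_seeds` shell function below trims whitespace, drops blank lines, `#` comments, and non-HTTP(S) entries, and removes duplicate URLs while keeping their first-seen order.

```shell
# clean_seeds (hypothetical helper): normalize a seeds list before using it.
# Assumes one URL per line; reads the file named in $1 and writes to stdout.
clean_seeds() {
  awk '
    { sub(/\r$/, "") }                  # tolerate Windows line endings
    { gsub(/^[ \t]+|[ \t]+$/, "") }     # trim surrounding whitespace
    $0 == "" || /^#/ { next }           # skip blank lines and comments
    !/^https?:\/\// { next }            # keep only http and https URLs
    !seen[$0]++                         # print each URL the first time it appears
  ' "$1"
}
```

For example, `clean_seeds raw_seeds.txt > seeds.txt` writes a cleaned copy that you could then use when creating a seeds list. Keeping first-seen order rather than sorting preserves whatever ordering you gave the original file.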
diff --git a/docs/source/dev_guide.rst b/docs/source/dev_guide.rst
@@ -6,13 +6,9 @@ Developer's Guide to Memex Explorer
 Setting up Memex Explorer
 *************************
 
-To setup your machine, you will need Anaconda or Miniconda
-installed. Miniconda is a minimal Anaconda installation that
-bootstraps conda and Python on any operating system. Install `Anaconda
-<http://continuum.io/downloads>`_ or `Miniconda
-<http://conda.pydata.org/miniconda.html>`_ from their respective sites.
+To set up your machine, you will need Anaconda or Miniconda installed. Miniconda is a minimal Anaconda installation that bootstraps conda and Python on any operating system. Install `Anaconda <http://continuum.io/downloads>`_ or `Miniconda <http://conda.pydata.org/miniconda.html>`_ from their respective sites.
 
-Memex Explorer requires conda, either from Miniconda or Anaconda.  
+Memex Explorer requires conda, either from Miniconda or Anaconda.
 
 Application Setup
 =================
@@ -21,50 +17,50 @@ Application Setup
 
     .. code-block:: console
 
-	$ git clone https://github.com/memex-explorer/memex-explorer.git
-	$ cd memex-explorer/source
-	$ ./app_setup.sh
+       $ git clone https://github.com/memex-explorer/memex-explorer.git
+       $ cd memex-explorer/source
+       $ ./app_setup.sh
 
-   You can then start the application from this directory:
-
-    .. code-block:: html
-
-	$ source activate memex
-	$ supervisord
+    You can then start the application from this directory:
 
-   Memex Explorer will now be running locally at `http://localhost:8000 <http://localhost:8000/>`_.
+    .. code-block:: console
 
-Tests
-=====
-    To run the tests, return to the root directory and run:
+       $ source activate memex
+       $ supervisord
 
-    .. code-block:: html
+    Memex Explorer will now be running locally at `http://localhost:8000 <http://localhost:8000/>`_.
 
-        $ py.test
+Enabling Nutch Visualizations
+=============================
 
-******************
-Installing Compass
-******************
-    If you need to make changes to the .scss stylesheets, `Compass <http://compass-style.org/>`_ is a useful tool. The following are instructions on how to install compass without using sudo.
+   Nutch visualizations are not enabled by default. They require RabbitMQ, and the method for installing RabbitMQ varies by operating system: it can be installed via Homebrew on Mac and via apt-get on Debian systems. For more information on how to install RabbitMQ, read `this page <https://www.rabbitmq.com/download.html>`_.
 
-    For mac users, add this line to your ~/.bash_profile:
+   To enable Bokeh visualizations for Nutch, change ``autostart=false`` to ``autostart=true`` in both of the following program sections of ``source/supervisord.conf``, then stop and restart supervisord.
 
-    .. code-block:: html
+   .. code-block:: diff
 
-        export PATH=/Users/<username>/.gem/ruby/<ruby version>/bin:$PATH
+      [program:rabbitmq]
+      command=rabbitmq-server
+      priority=1
+      -autostart=false
+      +autostart=true
 
-    Then run $ gem install compass --user-install. This will install Compass on your system.
+      [program:bokeh-server]
+      command=bokeh-server --backend memory --port 5006
+      priority=1
+      -autostart=false
+      +autostart=true
 
-    To make changes to the stylesheets, do:
+Tests
+=====
+    To run the tests, return to the root directory and run:
 
     .. code-block:: console
 
-        $ cd ../
-        $ compass watch
+       $ py.test
 
-******************
 The Database Model
-******************
+==================
 The current entity-relationship diagram:
 
 .. image:: _static/img/DbVisualizer.png
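The `autostart` change described in the Enabling Nutch Visualizations section of dev_guide.rst above can also be scripted. This is a minimal sketch, assuming the two section headers appear in `source/supervisord.conf` exactly as `[program:rabbitmq]` and `[program:bokeh-server]`, each with a literal `autostart=false` line and no trailing whitespace; the function name `enable_nutch_viz` is hypothetical. It rewrites `autostart` only inside those two sections, so any other programs in the file keep their settings.

```shell
# enable_nutch_viz (hypothetical helper): flip autostart=false to autostart=true
# in the [program:rabbitmq] and [program:bokeh-server] sections only.
# Usage: enable_nutch_viz path/to/supervisord.conf
enable_nutch_viz() {
  conf="$1"
  tmp=$(mktemp) || return 1
  awk '
    # Each section header decides whether the following lines are targeted.
    /^\[/ { target = ($0 == "[program:rabbitmq]" || $0 == "[program:bokeh-server]") }
    target && $0 == "autostart=false" { print "autostart=true"; next }
    { print }
  ' "$conf" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Copy back with cat so the original file keeps its permissions.
  cat "$tmp" > "$conf"
  rm -f "$tmp"
}
```

After running `enable_nutch_viz source/supervisord.conf`, stop supervisord with `Ctrl-c` and start it again from `memex-explorer/source` so the new settings take effect. RabbitMQ itself still has to be installed first, as the section above notes.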