Merge pull request #765 from memex-explorer/bhard/services_docs

Documentation on Optional Services
nasa-jpl-memex · Nov 10, 2015 · 1c10627
2 parents 6d7baea + 0a30adc
Showing 3 changed files with 72 additions and 39 deletions.
diff --git a/docs/source/crawler_guide.rst b/docs/source/crawler_guide.rst
@@ -28,21 +28,21 @@ Creating a Seeds List
 
     Simply put, the seeds list should contain pages that are relevant to the topics you are searching. Both Nutch and Ache provide insight into the relevance of your seeds list, but in different ways.
 
-    For the purposes of memex-explorer, the extension and name of your seeds list does not matter. It will be automatically renamed and stored according to the specifications of the crawler. 
+    For the purposes of memex-explorer, the extension and name of your seeds list does not matter. It will be automatically renamed and stored according to the specifications of the crawler.
 
     Seeds lists are created on the seeds page, and seeds lists can be created from the add crawl page.
 
 Crawler Control Buttons
 =======================
-Here's an overview of the buttons available to each crawler for controlling the crawlers. The buttons behave differently depending on which crawler you are using.
+   This section gives an overview of the control buttons available for each crawler. The buttons behave differently depending on which crawler you are using.
 
-These are the buttons available for Ache:
+   These are the buttons available for Ache:
 
-.. image:: _static/img/ache-buttons.png
+   .. image:: _static/img/ache-buttons.png
 
-These are the buttons available for Nutch:
+   These are the buttons available for Nutch:
 
-.. image:: _static/img/nutch-buttons.png
+   .. image:: _static/img/nutch-buttons.png
 
 Options Button
 --------------
@@ -54,9 +54,9 @@ Start Button
 
 Stop Button
 -----------
-    Symbolized by the "stop" button. Stops the crawl.
+   Symbolized by the "stop" button. Stops the crawl.
 
-    In the case of Ache, the crawler stops immediately. In the case of Nutch, the crawler stops after it has finished the current round. This is in order to prevent data corruption that can occur when killing the Nutch process.
+   In the case of Ache, the crawler stops immediately. In the case of Nutch, the crawler stops after it has finished the current process. However, the data on the current round of the crawl will be lost.
 
 Restart Button
 --------------
@@ -70,9 +70,12 @@ Get Crawl Log
 
 CCA Export
 ----------
-
     This button is Nutch only. It allows you to export your crawl data into the CCA format.
 
+Rounds Input
+------------
+   Nutch only. This allows you to specify how many rounds the crawl should run. You can press the stop button at any time, and the crawl will stop once it has finished the current round.
+
 Crawl Settings
 ==============
     The crawl settings page allows you to delete the crawl, as well as change the name or description of the crawl. It is accessed by clicking the "pencil" icon next to the name of the crawl.
@@ -86,11 +89,11 @@ Crawl Settings
 *****
 Nutch
 *****
-    `Nutch <http://nutch.apache.org/>`_ is developed by Apache, and has interfaces with both Solr and Elasticsearch, and it allows memex-explorer to offer different crawling functionality from Ache.
+    `Nutch <http://nutch.apache.org/>`_ is developed by Apache, and has an interface with Elasticsearch. All Nutch crawls create Elasticsearch indices by default.
 
-    Nutch runs in uninterruptible rounds of crawling. Nutch will run indefinitely until asked to stop. By viewing the crawl log, it is possible to see how many pages are left to crawl in the current round.
+    With Nutch, you can define how long you want to crawl by setting the number of rounds to crawl. You can keep track of the overall crawl time and the sites currently being crawled by looking at the Nutch crawl visualizations.
 
-    The number of pages left to crawl in a Nutch round increases significantly after each round. With Nutch, you can pass it a seeds list of 100 pages to crawl, and it can find over 1000 pages to crawl for the next round. Because of this, Nutch is a much easier crawler to get running.
+    The number of pages left to crawl in a Nutch round increases significantly after each round. You might pass it a seeds list of 100 pages to crawl, and it can find over 1000 pages to crawl for the next round. Because of this, Nutch is a much easier crawler to get running.
 
     Memex Explorer currently uses the Nutch REST API for running all crawls.
 

diff --git a/docs/source/dev_guide.rst b/docs/source/dev_guide.rst
@@ -6,9 +6,9 @@ Developer's Guide to Memex Explorer
 Setting up Memex Explorer
 *************************
 
-To setup your machine, you will need Anaconda or Miniconda installed. Miniconda is a minimal Anaconda installation that bootstraps conda and Python on any operating system. Install `Anaconda <http://continuum.io/downloads>`_ or `Miniconda <http://conda.pydata.org/miniconda.html>`_ from their respective sites.
+   To set up your machine, you will need Anaconda or Miniconda installed. Miniconda is a minimal Anaconda installation that bootstraps conda and Python on any operating system. Install `Anaconda <http://continuum.io/downloads>`_ or `Miniconda <http://conda.pydata.org/miniconda.html>`_ from their respective sites.
 
-Memex Explorer requires conda, either from Miniconda or Anaconda.
+   Memex Explorer requires conda, either from Miniconda or Anaconda.
 
 Application Setup
 =================
@@ -30,12 +30,37 @@ Application Setup
 
    Memex Explorer will now be running locally at `http://localhost:8000 <http://localhost:8000/>`_.
 
-Enabling Nutch Visualizations
+Tests
+=====
+    To run the tests, return to the root directory and run:
+
+    .. code-block:: html
+
+       $ py.test
+
+The Database Model
+==================
+   The current entity relation diagram:
+
+.. image:: _static/img/DbVisualizer.png
+
+Updating the Database
+---------------------
+   As of version 0.4.0, Memex Explorer will start tracking all database migrations. This means that you will be able to upgrade your database and preserve the data without any issues.
+
+   If you are using version 0.3.0 or earlier and you are unable to update your database without server errors, the best course of action is to delete the existing file at ``source/db.sqlite3`` and start over with a fresh database.
+
+Enabling Non-Default Services
 =============================
 
-   Nutch visualizations are not enabled by default. Nutch visualizations require RabbitMQ, and the method for installing RabbitMQ varies depending on the operating system. RabbitMQ can be installed via Homebrew on Mac, and apt-get on Debian systems. More information on how to install RabbitMQ, read `this page <https://www.rabbitmq.com/download.html>`_.  Note: You may also need to change the below command to `sudo rabbitmq-server`, depending on how RabbitMQ is installed on your system and the permissions of the current user.
+Nutch Visualizations
+--------------------
+
+   Nutch visualizations are not enabled by default. They require RabbitMQ, and the installation method varies by operating system: RabbitMQ can be installed via Homebrew on Mac and via apt-get on Debian systems. For more information on installing RabbitMQ, read `this page <https://www.rabbitmq.com/download.html>`_. Note: You may also need to change the command below to ``sudo rabbitmq-server``, depending on how RabbitMQ is installed on your system and the permissions of the current user.
 
-   To enable Bokeh visualizations for Nutch, change ``autostart=false`` to ``autostart=true`` for both of these directives in source/supervisord.conf, and then kill and restart supervisor.
+   RabbitMQ and Bokeh-Server are necessary for creating the Nutch visualizations. The Nutch streaming visualization works by creating and subscribing to a queue of AMQP messages (hosted by RabbitMQ) being dispatched from Nutch as it runs the crawl. A background task reads the messages and updates the plot (hosted by Bokeh server).
+
+   To enable Bokeh visualizations for Nutch, change ``autostart=false`` to ``autostart=true`` for both of these directives in ``source/supervisord.conf``, and then kill and restart supervisor.
 
    .. code-block:: html
 
@@ -51,22 +76,32 @@ Enabling Nutch Visualizations
       -autostart=false
       +autostart=true
 
-Tests
-=====
-    To run the tests, return to the root directory and run:
+Domain Discovery Tool (DDT)
+---------------------------
 
-    .. code-block:: html
+   Domain Discovery Tool can be installed as a conda package. Simply run ``conda install ddt`` to download the package for DDT.
 
-       $ py.test
+   As with Nutch visualizations, to enable DDT, change its ``autostart`` directive in ``source/supervisord.conf``.
 
-The Database Model
-==================
-The current entity relation diagram:
+   .. code-block:: html
 
-.. image:: _static/img/DbVisualizer.png
+      [program:ddt]
+      command=ddt
+      priority=5
+      -autostart=false
+      +autostart=true
 
-Updating the Database
----------------------
-As of version 0.4.0, Memex Explorer will start tracking all database migrations. This means that you will be able to upgrade your database and preserve the data without any issues.
+Temporal Anomaly Detection (TAD)
+--------------------------------
+
+   TAD does not currently have a conda package. Like the Nutch visualizations, it also has a RabbitMQ dependency. For instructions on installing TAD, visit the `GitHub repository <https://github.com/autonlab/tad>`_.
 
-If you are using a version that is 0.3.0 or earlier, and you are unable to update your database without server errors, the best course if action is to delete the existing `file at source/db.sqlite3` and start over with a fresh database.
+   As with DDT and the Nutch visualizations, you also have to change the supervisor directive.
+
+   .. code-block:: html
+
+      [program:tad]
+      command=tad
+      priority=5
+      -autostart=false
+      +autostart=true
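
Reviewer note: the optional services documented in this file (Nutch visualizations, DDT, TAD) are all enabled the same way, by setting `autostart` to `true` in the matching `[program:...]` section of `source/supervisord.conf`. A minimal Python sketch of that edit follows, assuming the sections look like the snippets shown in this diff. The helper name `enable_autostart` and the sample text are hypothetical and not part of Memex Explorer; supervisor still has to be killed and restarted afterwards, as the guide describes.

```python
import re


def enable_autostart(conf_text, programs):
    """Return conf_text with autostart=true set for the named programs.

    Works line by line so that comments and unrelated sections are kept
    exactly as they were. `programs` holds bare names such as "ddt".
    """
    wanted = {"program:" + name for name in programs}
    current = None  # name of the section the current line belongs to
    out = []
    for line in conf_text.splitlines():
        header = re.match(r"\s*\[([^\]]+)\]\s*$", line)
        if header:
            current = header.group(1)
        elif current in wanted and re.match(r"\s*autostart\s*=", line):
            line = "autostart=true"
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical sample mirroring the directives shown in the guide.
SAMPLE = (
    "[program:ddt]\n"
    "command=ddt\n"
    "priority=5\n"
    "autostart=false\n"
    "\n"
    "[program:tad]\n"
    "command=tad\n"
    "priority=5\n"
    "autostart=false\n"
)

result = enable_autostart(SAMPLE, ["ddt"])
print(result)
```

Only the `ddt` section changes in this example; `tad` keeps `autostart=false` because it was not requested. A line-based edit was chosen over a config parser so that comments in the real file are not lost when it is written back.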
diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -1,18 +1,13 @@
 Memex Explorer
 ==============
 
-Memex Explorer is a web application that provides easy-to-use interfaces for
-gathering, analyzing, and graphing web crawl data.
+   Memex Explorer is a web application that provides easy-to-use interfaces for gathering, analyzing, and graphing web crawl data.
 
-For usage instructions, please refer to the `User's Guide <user_guide.html>`_.
+   For usage instructions, please refer to the `User's Guide <user_guide.html>`_.
 
-.. For more information about the project architecture, please refer to our `Developer's Guide <dev_guide.html>`_ and `API Guide <api.html>`_.
+   For more information about the project architecture, please refer to our `Developer's Guide <dev_guide.html>`_ and `API Guide <api.html>`_.
 
-Memex Explorer is built by `Continuum Analytics <http://continuum.io/>`_,
-with grants and support from the
-`NASA Jet Propulsion Laboratory <http://www.jpl.nasa.gov/>`_,
-`Kitware <http://www.kitware.com/>`_,
-and the `NYU Polytechnic School of Engineering <http://engineering.nyu.edu/>`_.
+   Memex Explorer is built by `Continuum Analytics <http://continuum.io/>`_, with grants and support from the `NASA Jet Propulsion Laboratory <http://www.jpl.nasa.gov/>`_, `Kitware <http://www.kitware.com/>`_, and the `NYU Polytechnic School of Engineering <http://engineering.nyu.edu/>`_.
 
 Contents: