Merge pull request nasa-jpl-memex#750 from memex-explorer/bhard/update_docs

Updating Docs
brittainhard committed Nov 4, 2015
2 parents ade613a + 5069d33 commit 83cc10a
Showing 27 changed files with 157 additions and 200 deletions.
47 changes: 1 addition & 46 deletions README.md
@@ -26,10 +26,7 @@ This script will set up a conda environment named memex, prepare the application

If you have already run the install script, simply run `supervisord` from the `memex-explorer/source` directory to restart all of the services.

Running `supervisord` starts it in the foreground, which will
in turn ensure that all services associated with the core Memex
Explorer environment are running. To stop supervisord and the
associated services, send an interrupt to the process with `Ctrl-c`.
Running `supervisord` starts it in the foreground, which will in turn ensure that all services associated with the core Memex Explorer environment are running. To stop supervisord and the associated services, send an interrupt to the process with `Ctrl-c`.

**Memex Explorer will now be running locally at http://localhost:8000**

@@ -55,45 +52,3 @@ The documentation is then available within `build/html/index.html`

To access the administration panel, navigate to http://localhost:8000/admin (or the equivalent deployed URL) after starting Memex Explorer. Here you will be able to view and make manual changes to the database.

# Deploying

The current method for deploying to the web is to deploy to ec2 by running a
fabric script with a few environment variables set.

```
$ git clone https://github.com/memex-explorer/memex-explorer
$ cd memex-explorer/deploy
$ conda env create --file deploy_environment.yml
$ source activate memex_deploy
$ cp deploy_ec2.sh nocommit.sh
```

Now edit the file nocommit.sh. It will contain three environment variables
which you must set and which you must not commit to the public repository.

AWS_KEY_ID: The key id for your aws account

AWS SECRET: The key secret for your aws account

HTPASSWD_PATH: The HTTP login password path. This file should have been
given to you. Place it at a location not tracked by git and enter the absolute
path to this location in the value of this variable.

Additionally, you can choose to deploy a different git branch than the production branch.

Once you have set these variables, you can start a new instance with `source nocommit.sh`, which
will create an ec2 instance, place a login key for it in memex-explorer/deploy/keys and run the deploy script on the new instance.
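Before sourcing `nocommit.sh`, it can help to confirm that the variables described above are actually set. The following is a minimal sketch, not part of the deploy scripts: the function name `check_deploy_env` is hypothetical, and only the two variable names spelled unambiguously above are checked by default. Pass the name of the secret variable exactly as it appears in `deploy_ec2.sh` if you want it checked too.

```python
import os


def check_deploy_env(env, required=("AWS_KEY_ID", "HTPASSWD_PATH")):
    """Return a list of problems found in the deployment environment.

    `env` is any mapping of variable names to values, e.g. os.environ.
    `required` is an assumption: extend it with the secret variable's name
    as it appears in your copy of deploy_ec2.sh.
    """
    # A variable counts as missing if it is absent or empty.
    problems = [f"{name} is not set" for name in required if not env.get(name)]

    # The text above asks for an absolute path to the htpasswd file.
    htpasswd = env.get("HTPASSWD_PATH", "")
    if htpasswd and not os.path.isabs(htpasswd):
        problems.append("HTPASSWD_PATH must be an absolute path")
    return problems


if __name__ == "__main__":
    for problem in check_deploy_env(os.environ):
        print(problem)
```

An empty list means the checked variables look usable. The check does not verify that the credentials themselves are valid.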

The login key for the new instance will be given three names:

One based on the IP address of the new server.

One based on the creation time of the new server.

latest.pem, a convenience for logging in to the most-recently-created server.

To connect to an instance given an IP address of 54.167.11.71, log in with the command

ssh -i keys/ec2-54.167.11.71.pem vagrant@54.167.11.71
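The same command can be assembled for any instance IP. This is a small sketch under the assumption that the IP-based key always follows the `ec2-<ip>.pem` pattern seen in the example above; the helper name `ssh_command` is hypothetical and not part of the deploy scripts.

```python
import shlex


def ssh_command(ip, key_dir="keys", user="vagrant"):
    """Build the ssh argument list for an instance.

    Assumes the IP-based key file is named ec2-<ip>.pem inside `key_dir`,
    matching the example command in this section.
    """
    key_path = f"{key_dir}/ec2-{ip}.pem"
    return ["ssh", "-i", key_path, f"{user}@{ip}"]


if __name__ == "__main__":
    # Print a copy-pasteable command line for the example address.
    print(shlex.join(ssh_command("54.167.11.71")))
```

Returning a list rather than a string keeps the arguments safe to pass to `subprocess.run` without shell quoting.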

After the setup script is done running, you will be able to access the application by entering the IP address into your browser.
Binary file modified docs/source/_static/img/DbVisualizer.png
Binary file added docs/source/_static/img/ache-buttons.png
Binary file added docs/source/_static/img/ache-dashboard.png
Binary file removed docs/source/_static/img/ache_dashboard1.png
Binary file removed docs/source/_static/img/ache_dashboard2.png
Binary file modified docs/source/_static/img/add-crawl-model.png
Binary file modified docs/source/_static/img/add-crawl.png
Binary file added docs/source/_static/img/create-seeds.png
Binary file added docs/source/_static/img/edit-seeds.png
Binary file modified docs/source/_static/img/homepage-view.png
Binary file added docs/source/_static/img/nutch-buttons.png
Binary file added docs/source/_static/img/nutch-dashboard.png
Binary file removed docs/source/_static/img/nutch_dashboard.png
Binary file added docs/source/_static/img/project-form.png
Binary file modified docs/source/_static/img/project-page.png
Binary file added docs/source/_static/img/seeds-from-trail.png
Binary file added docs/source/_static/img/seeds-page.png
Binary file modified docs/source/_static/img/testing_guide/edit_index_link.png
Binary file modified docs/source/_static/img/testing_guide/index_creation_success.png
Binary file added docs/source/_static/img/upload-files.png
Binary file added docs/source/_static/img/upload-success.png
53 changes: 29 additions & 24 deletions docs/source/crawler_guide.rst
@@ -28,17 +28,25 @@ Creating a Seeds List

Simply put, the seeds list should contain pages that are relevant to the topics you are searching. Both Nutch and Ache provide insight into the relevance of your seeds list, but in different ways.

For the purposes of memex-explorer, the extension and name of your seeds list do not matter. It will be automatically renamed and stored according to the specifications of the crawler.
For the purposes of memex-explorer, the extension and name of your seeds list do not matter. It will be automatically renamed and stored according to the specifications of the crawler.

Seeds lists are created on the seeds page; they can also be created from the add crawl page.

Crawler Control Buttons
=======================
Here's an overview of the buttons available for controlling each crawler. The buttons behave differently depending on which crawler you are using.

.. image:: _static/img/crawler_control.png
These are the buttons available for Ache:

.. image:: _static/img/ache-buttons.png

These are the buttons available for Nutch:

.. image:: _static/img/nutch-buttons.png

Options Button
--------------
Symbolized by the "gears" icon. This allows you to change various settings on the crawl. See `Crawl Settings`_.
Symbolized by the "pencil" icon. This allows you to change various settings on the crawl. See `Crawl Settings`_.

Start Button
------------
@@ -54,19 +62,20 @@ Restart Button
--------------
Symbolized by the "refresh" icon. Restarts the current crawl. This button is only available after the crawl has stopped.

With Ache, it will immediately start a brand new Ache crawl, deleting all of the previous crawl information. With Nutch, it will start a new crawler round, using the information gathered by the crawl in the previous round.

Get Seeds List
--------------
This button will let you download the list of seeds that the crawler is currently using.
With Ache, it will immediately start a brand new Ache crawl, deleting all of the previous crawl information. With Nutch, it will start a new crawler round, using the information gathered by the crawl in the previous round.

Get Crawl Log
-------------
This button will let you download the log of the current running crawl. This allows you to see the progress of the crawl and any errors that may be occurring during the crawl.
This button will let you download the log of the current running crawl. This allows you to see the progress of the crawl and any errors that may be occurring during the crawl. This is only available for Ache crawls.

CCA Export
----------

This button is Nutch only. It allows you to export your crawl data into the CCA format.

Crawl Settings
==============
The crawl settings page allows you to delete the crawl, as well as change the name or description of the crawl. It is accessed by clicking the "gears" icon next to the name of the crawl.
The crawl settings page allows you to delete the crawl, as well as change the name or description of the crawl. It is accessed by clicking the "pencil" icon next to the name of the crawl.

.. image:: _static/img/crawl_settings.png

@@ -83,23 +92,19 @@ Nutch

The number of pages left to crawl in a Nutch round increases significantly after each round. With Nutch, you can pass it a seeds list of 100 pages to crawl, and it can find over 1000 pages to crawl for the next round. Because of this, Nutch is a much easier crawler to get running.

Memex Explorer currently uses the Nutch REST API for running all crawls.

Nutch Dashboard
=======================
.. image:: _static/img/nutch_dashboard.png
Memex Explorer recently added features for monitoring the status of Nutch crawls. You can now get real-time information about which pages Nutch is currently crawling, and information about the duration of the crawl.

.. image:: _static/img/nutch-dashboard.png

Statistics
----------
memex-explorer will tell you how many pages have been crawled after the current round has finished.

Nutch Specific Buttons
----------------------
Nutch has two buttons which are unique to its implementation.

View results in Solr
The first button is a link to a Solr instance, which you can use to search the results of the crawls using the standard Solr interface.
Nutch will tell you how many pages have been crawled after the current round has finished.

Dump Images
This button will download all of the images discovered during the crawl. The images are dumped to a folder on the filesystem. Image Space will use these images as part of its application.
.. image:: _static/img/nutch_stats.png

.. _ache-section:

@@ -110,13 +115,13 @@ Ache

Ache Dashboard
======================
.. image:: _static/img/ache_dashboard1.png
.. image:: _static/img/ache-dashboard.png

.. image:: _static/img/ache_stats.png

Plots
-----
memex-explorer uses `Bokeh <http://bokeh.pydata.org/en/latest/>`_ for its plots. There are two plots available for analyzing Ache crawls, Domain Relevance and Harvest Rate.
Memex Explorer uses `Bokeh <http://bokeh.pydata.org/en/latest/>`_ for its plots. There are two plots available for analyzing Ache crawls, Domain Relevance and Harvest Rate.

The Domain Relevance plot sorts domains by the number of pages crawled, and adds information for relevancy of that domain to your crawl model. This plot helps you understand how well your model fits.

@@ -133,7 +138,7 @@ Ache Specific Buttons
Ache has a "Download Relevant Pages" button, which allows you to download the pages that Ache has found to be relevant to your seeds list and your crawl model.

Building a Crawl Model
======================
----------------------
Ache requires a crawl model to run. For information on how to build crawl models, see the `Ache readme <https://github.com/ViDA-NYU/ache/blob/master/README.md>`_.

For more detailed information on Ache, head to the `Ache Wiki <https://github.com/ViDA-NYU/ache/wiki>`_.
64 changes: 30 additions & 34 deletions docs/source/dev_guide.rst
@@ -6,13 +6,9 @@ Developer's Guide to Memex Explorer
Setting up Memex Explorer
*************************

To set up your machine, you will need Anaconda or Miniconda
installed. Miniconda is a minimal Anaconda installation that
bootstraps conda and Python on any operating system. Install `Anaconda
<http://continuum.io/downloads>`_ or `Miniconda
<http://conda.pydata.org/miniconda.html>`_ from their respective sites.
To set up your machine, you will need Anaconda or Miniconda installed. Miniconda is a minimal Anaconda installation that bootstraps conda and Python on any operating system. Install `Anaconda <http://continuum.io/downloads>`_ or `Miniconda <http://conda.pydata.org/miniconda.html>`_ from their respective sites.

Memex Explorer requires conda, either from Miniconda or Anaconda.
Memex Explorer requires conda, either from Miniconda or Anaconda.

Application Setup
=================
@@ -21,50 +17,50 @@ Application Setup

.. code-block:: html

$ git clone https://github.com/memex-explorer/memex-explorer.git
$ cd memex-explorer/source
$ ./app_setup.sh
$ git clone https://github.com/memex-explorer/memex-explorer.git
$ cd memex-explorer/source
$ ./app_setup.sh

You can then start the application from this directory:

.. code-block:: html

$ source activate memex
$ supervisord
You can then start the application from this directory:

Memex Explorer will now be running locally at `http://localhost:8000 <http://localhost:8000/>`_.
.. code-block:: html

Tests
=====
To run the tests, return to the root directory and run:
$ source activate memex
$ supervisord

.. code-block:: html
Memex Explorer will now be running locally at `http://localhost:8000 <http://localhost:8000/>`_.

$ py.test
Enabling Nutch Visualizations
=============================

******************
Installing Compass
******************
If you need to make changes to the .scss stylesheets, `Compass <http://compass-style.org/>`_ is a useful tool. The following are instructions on how to install compass without using sudo.
Nutch visualizations are not enabled by default. They require RabbitMQ, and the method for installing RabbitMQ varies by operating system: it can be installed via Homebrew on Mac and via apt-get on Debian systems. For more information on how to install RabbitMQ, read `this page <https://www.rabbitmq.com/download.html>`_.

For mac users, add this line to your ~/.bash_profile:
To enable Bokeh visualizations for Nutch, change ``autostart=false`` to ``autostart=true`` for both of these directives in source/supervisord.conf, and then kill and restart supervisor.

.. code-block:: html
.. code-block:: html

export PATH=/Users/<username>/.gem/ruby/<ruby version>/bin:$PATH
[program:rabbitmq]
command=rabbitmq-server
priority=1
-autostart=false
+autostart=true

Then run $ gem install compass --user-install. This will install Compass on your system.
[program:bokeh-server]
command=bokeh-server --backend memory --port 5006
priority=1
-autostart=false
+autostart=true
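The edit above can also be scripted. The following is a minimal sketch using Python's standard `configparser`, not a tool shipped with Memex Explorer: the function name `enable_autostart` and the sample text are illustrative, and the section names are taken from the two directives shown above. Note that rewriting a file this way drops comments and normalizes `key=value` to `key = value`, so editing by hand may be preferable for a heavily commented config.

```python
import configparser
import io

# Illustrative sample mirroring the two directives shown above.
SAMPLE = """\
[program:rabbitmq]
command=rabbitmq-server
priority=1
autostart=false

[program:bokeh-server]
command=bokeh-server --backend memory --port 5006
priority=1
autostart=false
"""


def enable_autostart(conf_text, programs=("rabbitmq", "bokeh-server")):
    """Return conf_text with autostart=true for the given program sections."""
    # RawConfigParser leaves %(...)s expansions in supervisord configs untouched.
    parser = configparser.RawConfigParser()
    parser.read_string(conf_text)
    for name in programs:
        section = f"program:{name}"
        if parser.has_section(section):
            parser.set(section, "autostart", "true")
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    print(enable_autostart(SAMPLE))
```

After writing the result back to source/supervisord.conf, supervisor still needs to be stopped and restarted for the change to take effect.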

To make changes to the stylesheets, do:
Tests
=====
To run the tests, return to the root directory and run:

.. code-block:: html

$ cd ../
$ compass watch
$ py.test

******************
The Database Model
******************
==================
The current entity relation diagram:

.. image:: _static/img/DbVisualizer.png