**Work in progress.**

Crawl the web and use machine learning algorithms to provide a personalized list of links.
Tested on Ubuntu 14.04 and 16.04. For details on how the setup was done on these versions, see `admin/howto.txt`.
## Setup

- Make sure you have installed the dependencies:
  - Git LFS
  - Python 2.7, pip and virtualenv
  - Docker
  - PhantomJS
  - Java 7+ JRE
  - Node.js 7, npm
- Clone this repository:

  ```shell
  $ git clone https://github.com/gator-life/gator.life.git
  ```
- From the repo root directory, run the setup script:

  ```shell
  $ scripts/setup_local.sh
  ```

  This script will:
  * Clean the previous install if needed
  * Create a virtual environment `global_env`
  * Install all dependencies
  * Run unit tests, functional tests and the linter to ensure the setup is ok
## Contributing

Thank you for your interest in the open source project Gator Life. For now we welcome contributions in two ways:
- File an issue on GitHub
- Make a pull request and assign it to one of the main contributors. It should be green on Travis CI before it can be merged into master, which means:
  - It passes all unit tests:

    ```shell
    $ scripts/start_tests.sh
    ```

  - It passes the linter:

    ```shell
    $ scripts/run_pylint.sh
    ```
- Test coverage should be close to 100%:
  - Unit tests for a package are in the directory `tests` beside the package.
  - Each module `{MODULE}.py` is tested by a script named `test_{MODULE}.py`.
  - Integration tests are in the directory `src/functests`.
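As a minimal sketch of the naming convention above (the module, function, and package names here are hypothetical examples, not actual project code), a module `mypackage/tokenizer.py` would be covered by `mypackage/tests/test_tokenizer.py`:

```python
# Hypothetical layout following the convention above:
#   mypackage/tokenizer.py             <- module under test
#   mypackage/tests/test_tokenizer.py  <- its unit tests


# --- stand-in for mypackage/tokenizer.py ---
def split_words(text):
    """Split a document into lowercase words."""
    return text.lower().split()


# --- stand-in for mypackage/tests/test_tokenizer.py ---
def test_split_words():
    assert split_words("Gator Life") == ["gator", "life"]


test_split_words()
```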
## Architecture overview
The project contains the following Python packages:
`common`
: shared tools (serialization, testing helpers, logging...) used across packages. Should be kept minimal. No internal dependency.

`server`
: gator.life website backend. Uses the Flask framework and is deployed on Google Cloud Platform. Depends on `common` and `learner`.

`scraper`
: scraping of the web to extract documents. Depends on the `common` package only.

`topicmodeller`
: classification of the extracted documents. Depends on the `common` package only.

`learner`
: machine learning algorithms to learn user preferences and match classified documents to users. Depends on the `common` package only.

`orchestrator`
: coordination of the pipeline from scraping to user/document matching. Depends on `common`, `scraper`, `topicmodeller` and `learner`. It currently also depends on `server` for database access, but this should be extracted into its own package.
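The dependency rules above can be sketched as a small self-contained check (this is an illustration only, not actual project tooling; the `ALLOWED_DEPS` table simply transcribes the descriptions above):

```python
# Allowed internal dependencies per package, as described above.
ALLOWED_DEPS = {
    "common": set(),
    "server": {"common", "learner"},
    "scraper": {"common"},
    "topicmodeller": {"common"},
    "learner": {"common"},
    # orchestrator coordinates the whole pipeline; the dependency
    # on "server" is the one flagged above for future extraction.
    "orchestrator": {"common", "scraper", "topicmodeller", "learner", "server"},
}


def import_allowed(importer, imported):
    """Return True if package `importer` may import from `imported`."""
    return imported in ALLOWED_DEPS.get(importer, set())


# orchestrator may use scraper, but scraper must not use learner.
print(import_allowed("orchestrator", "scraper"))
print(import_allowed("scraper", "learner"))
```

A check like this could be run in CI to catch accidental dependency cycles, since `common` must stay at the bottom of the graph.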