**Work in progress.**

Crawl the web and use machine learning algorithms to provide a personalized list of links.
Tested on Ubuntu 14.04 and 16.04. For details on how the setup was done on these versions, see `admin/howto.txt`.
## Setup

- Make sure you have installed the dependencies:
  - Git LFS
  - Python 2.7, pip and virtualenv
  - Docker
  - PhantomJS
  - Java 7+ JRE
  - Node.js 7, npm
- Clone this repository:

  ```shell
  $ git clone https://github.com/gator-life/gator.life.git
  ```
- From the repo root directory, run the setup script:

  ```shell
  $ scripts/setup_local.sh
  ```

  This script will:
  * Clean the previous install if needed
  * Create a virtual environment `global_env`
  * Install all dependencies
  * Run unit tests, functional tests and the linter to ensure the setup is ok
## Contributing

Thank you for your interest in the open source project Gator Life. For now we welcome contributions in two ways:
- File an issue on GitHub
- Make a pull request and assign it to one of the main contributors. It should be green on Travis CI before it can be merged into master, which means:
  - It passes all unit tests:

    ```shell
    $ scripts/start_tests.sh
    ```

  - It passes the linter:

    ```shell
    $ scripts/run_pylint.sh
    ```
- Test coverage should be close to 100%:
  - Unit tests for a package are in the directory `tests` beside the package.
  - Each module `{MODULE}.py` is tested by a script named `test_{MODULE}.py`.
  - Integration tests are in the directory `src/functests`.
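As a minimal sketch of the naming convention above (the module, function, and package names here are hypothetical examples, not actual project code), a module `mypackage/tokenizer.py` would be covered by `mypackage/tests/test_tokenizer.py`:

```python
# Hypothetical layout following the convention above:
#   mypackage/tokenizer.py             <- module under test
#   mypackage/tests/test_tokenizer.py  <- its unit tests


# --- stand-in for mypackage/tokenizer.py ---
def split_words(text):
    """Split a document into lowercase words."""
    return text.lower().split()


# --- stand-in for mypackage/tests/test_tokenizer.py ---
def test_split_words():
    assert split_words("Gator Life") == ["gator", "life"]


test_split_words()
```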
## Architecture overview
The project contains the following Python packages:
`common`
: shared tools (serialization, testing helpers, logging...) used across packages. Should be kept minimal. No internal dependency.

`server`
: gator.life website backend. Uses the Flask framework and is deployed on Google Cloud Platform. Depends on `common` and `learner`.

`scraper`
: scraping of the web to extract documents. Depends on the `common` package only.

`topicmodeller`
: classification of the extracted documents. Depends on the `common` package only.

`learner`
: machine learning algorithms to learn user preferences and match classified documents to users. Depends on the `common` package only.

`orchestrator`
: coordination of the pipeline from scraping to user/document matching. Depends on `common`, `scraper`, `topicmodeller` and `learner`. It currently also depends on `server` for database access, but this should be extracted into its own package.
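The dependency rules above can be sketched as a small self-contained check (this is an illustration only, not actual project tooling; the `ALLOWED_DEPS` table simply transcribes the descriptions above):

```python
# Allowed internal dependencies per package, as described above.
ALLOWED_DEPS = {
    "common": set(),
    "server": {"common", "learner"},
    "scraper": {"common"},
    "topicmodeller": {"common"},
    "learner": {"common"},
    # orchestrator coordinates the whole pipeline; the dependency
    # on "server" is the one flagged above for future extraction.
    "orchestrator": {"common", "scraper", "topicmodeller", "learner", "server"},
}


def import_allowed(importer, imported):
    """Return True if package `importer` may import from `imported`."""
    return imported in ALLOWED_DEPS.get(importer, set())


# orchestrator may use scraper, but scraper must not use learner.
print(import_allowed("orchestrator", "scraper"))
print(import_allowed("scraper", "learner"))
```

A check like this could be run in CI to catch accidental dependency cycles, since `common` must stay at the bottom of the graph.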