Aguaite

A Python 3 tool to crawl tweets using the Twitter Streaming API. The name comes from the Chilean slang "al aguaite", which means "to await".

Commands

The tool currently provides two scripts:

sapea.py: crawls tweets using project-specific query parameters.
cuela.py: cleans the crawled tweets using project-specific settings.

Settings

In addition to project data (see the example folder projects/cl), the scripts need at least two configuration files: keys.json for authentication and config.json for settings.

keys.json

{
    "consumer_key": "APP_CONSUMER_KEY",
    "consumer_secret": "APP_CONSUMER_SECRET",
    "access_token_key": "YOUR_ACCESS_TOKEN",
    "access_token_secret": "YOUR_ACCESS_SECRET"
}
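
As a rough illustration of how a script might read this file, the sketch below loads keys.json and fails early when one of the four fields shown above is missing or empty. The function name load_keys and the REQUIRED_KEYS tuple are hypothetical and are not taken from sapea.py.

```python
import json
from pathlib import Path

# The four fields shown in the keys.json example above.
REQUIRED_KEYS = (
    "consumer_key",
    "consumer_secret",
    "access_token_key",
    "access_token_secret",
)


def load_keys(path="keys.json"):
    """Read the credentials file and fail early if a field is missing or empty.

    Hypothetical helper for illustration; the real scripts may load the
    file differently.
    """
    keys = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [name for name in REQUIRED_KEYS if not keys.get(name)]
    if missing:
        raise ValueError(f"{path} is missing: {', '.join(missing)}")
    return keys
```

Checking the file before opening a stream means a bad credentials file is reported as a local error and not as an authentication failure from the API.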

config.json

{
    "source_account": "carnby",
    "project_name": "poketest",
    "project_data_path": "./projects/cl",
    "minutes": 5,
    "search_location_box": [-73.655740,-37.944243,-72.090433,-34.879482],
    "log_level": 10,
    "storage_path": "./test_crawl",
    "filtered_path": "./test_cleaned"
}

Google BigQuery

The file schema.json contains a schema definition for loading the output of the cuela.py script into Google BigQuery.
