SCRAPER TODO

City Scraper deployment to AWS Lambda is failing because of the lxml dependency; need to figure out how to package it correctly. Possible solutions: compile the dependencies in a virtual environment on EC2, then transfer them over FTP for deployment; or use a Docker image of the AWS EC2 AMI (Amazon Linux 2018.03) to yum install the dependencies.
Standardize the data format when producing messages to Kafka.
Create an algorithm to extract the year, model, and trim.
Refactor vehicleScraper code (remove the extra unused variables at the bottom).
TODO:
Automate craigslistFilter to run daily, using Kafka for data streaming, S3 for storage, and Airflow for scheduling (satvik)
Stream website traffic and logs using Kafka and store them in S3 (omkar)
Create a machine learning model (satvik & omkar)
(TBD: something involving account creation processing, using Spark to move data from the DB to Redshift)
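Sketch for the lxml packaging item. Whichever build route is chosen (EC2 virtualenv or Amazon Linux Docker image), Lambda expects Python dependencies at the root of the deployment zip, next to the handler module. lxml ships compiled extension modules, so they must be built on an OS compatible with the Lambda runtime. This is a minimal sketch of the final zipping step. It assumes the handler and its dependencies have already been installed into a build directory, for example with `pip install --target` inside the Amazon Linux environment. `build_lambda_zip` and the directory names are hypothetical.

```python
import tempfile
import zipfile
from pathlib import Path
from typing import List


def build_lambda_zip(build_dir: str, zip_path: str) -> List[str]:
    """Zip everything under build_dir so packages sit at the archive root.

    Assumes build_dir already holds the handler plus dependencies (e.g. lxml)
    built on Amazon Linux. Returns the archive member names, sorted.
    """
    root = Path(build_dir)
    zip_file = Path(zip_path).resolve()
    # Writing the zip inside build_dir would make it try to include itself.
    if root.resolve() in zip_file.parents:
        raise ValueError("zip_path must be outside build_dir")

    names: List[str] = []
    with zipfile.ZipFile(zip_file, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*")):
            # Skip directories and bytecode caches; keep everything else.
            if not path.is_file() or "__pycache__" in path.parts:
                continue
            # Archive name relative to build_dir, so "lxml/..." lands at the root.
            arcname = path.relative_to(root).as_posix()
            zf.write(path, arcname)
            names.append(arcname)
    return sorted(names)


if __name__ == "__main__":
    # Demo with a throwaway build directory standing in for the container output.
    with tempfile.TemporaryDirectory() as tmp:
        build = Path(tmp) / "build"
        (build / "lxml").mkdir(parents=True)
        (build / "handler.py").write_text("def handler(event, context):\n    return 'ok'\n")
        (build / "lxml" / "__init__.py").write_text("")
        print(build_lambda_zip(str(build), str(Path(tmp) / "function.zip")))
```

If the zip still fails at import time on Lambda, the most likely cause is that the lxml binaries were built on a different OS or architecture than the runtime.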
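Sketch for the Kafka message-format item. One way to standardize is to have every producer build the same typed record and serialize it the same way before sending. The field names, `ListingMessage`, `make_message`, and `to_kafka_value` are all hypothetical placeholders, not an existing schema in this project.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional, Union


@dataclass(frozen=True)
class ListingMessage:
    """One scraped listing in the single shape every producer should emit.

    Field names are hypothetical; align them with what the scrapers collect.
    """
    source: str                 # e.g. "craigslist", lower-cased
    listing_id: str
    url: str
    title: str
    price_usd: Optional[int]    # whole dollars, None if missing or unparseable
    scraped_at: str             # ISO-8601 timestamp in UTC
    schema_version: int = 1     # bump when fields change


def normalize_price(raw: Union[str, int, float, None]) -> Optional[int]:
    """Turn '$12,500.00', '12500', or 12500 into 12500; None stays None."""
    if raw is None:
        return None
    whole = str(raw).split(".")[0]  # drop cents
    digits = "".join(ch for ch in whole if ch in "0123456789")
    return int(digits) if digits else None


def make_message(source: str, listing_id, url: str, title: str, raw_price,
                 now: Optional[datetime] = None) -> ListingMessage:
    """Normalize raw scraper fields into a ListingMessage."""
    now = now or datetime.now(timezone.utc)
    return ListingMessage(
        source=source.strip().lower(),
        listing_id=str(listing_id),
        url=url,
        title=title.strip(),
        price_usd=normalize_price(raw_price),
        scraped_at=now.astimezone(timezone.utc).isoformat(),
    )


def to_kafka_value(msg: ListingMessage) -> bytes:
    """UTF-8 JSON with sorted keys, so identical listings serialize identically."""
    return json.dumps(asdict(msg), sort_keys=True).encode("utf-8")
```

With kafka-python, the bytes could then be sent with something like `producer.send("listings", value=to_kafka_value(msg))`. The topic name is an assumption. A `schema_version` field lets consumers handle older messages after the format changes.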
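Sketch for the year, model, and trim item. One possible starting point is a rule-based extractor. It uses a regex for a four-digit model year, then looks up known models and the trims that follow them in the listing title. This is a swapped-in technique, not a decided design. `KNOWN_MODELS` is a hypothetical placeholder for a real vehicle catalog, and `extract_vehicle` is an illustrative name.

```python
import re
from typing import Dict, List, Optional

# Hypothetical lookup table; a real version would load models and trims
# from a vehicle catalog instead of hard-coding them.
KNOWN_MODELS: Dict[str, List[str]] = {
    "civic": ["lx", "ex", "ex-l", "si", "type r"],
    "camry": ["le", "se", "xle", "xse"],
    "f-150": ["xl", "xlt", "lariat", "raptor"],
}

# Four-digit model years from 1950 through 2049.
YEAR_RE = re.compile(r"\b(19[5-9]\d|20[0-4]\d)\b")


def extract_vehicle(title: str) -> Dict[str, Optional[str]]:
    """Return {"year", "model", "trim"} parsed from a listing title (None if absent)."""
    text = title.lower()
    year_match = YEAR_RE.search(text)
    result: Dict[str, Optional[str]] = {
        "year": year_match.group(1) if year_match else None,
        "model": None,
        "trim": None,
    }
    for model, trims in KNOWN_MODELS.items():
        m = re.search(r"\b" + re.escape(model) + r"\b", text)
        if not m:
            continue
        result["model"] = model
        rest = text[m.end():]  # only look for a trim after the model name
        # Longest trims first, so "ex-l" wins over "ex".
        for trim in sorted(trims, key=len, reverse=True):
            # Trim must stand alone: not part of a longer word or hyphenated token.
            if re.search(r"(?<![\w-])" + re.escape(trim) + r"(?![\w-])", rest):
                result["trim"] = trim
                break
        break
    return result
```

For example, `extract_vehicle("2012 Honda Civic EX-L Sedan")` gives year `"2012"`, model `"civic"`, and trim `"ex-l"`. Titles that do not match any known model fall through with `None` fields, and those listings could be collected to grow the lookup table or to train a learned extractor later.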
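Sketch for the craigslistFilter automation item. This is a minimal sketch of the body of the daily job, written so it can be tested without Kafka, S3, or Airflow. Everything here is hypothetical: `run_daily_job`, the injected callables, and the `craigslist/dt=YYYY-MM-DD/` key layout. In a real deployment, `publish` might wrap kafka-python's `KafkaProducer.send`, and `upload` might wrap boto3's `put_object`. Airflow would call the function once per day, for example from a `PythonOperator` in a DAG scheduled `@daily`.

```python
import json
from datetime import date
from typing import Callable, Dict, Iterable, List

Listing = Dict[str, object]


def s3_key_for(run_date: date, prefix: str = "craigslist") -> str:
    """Date-partitioned key so each daily run writes its own object (hypothetical layout)."""
    return f"{prefix}/dt={run_date.isoformat()}/listings.json"


def run_daily_job(
    run_date: date,
    fetch_listings: Callable[[date], Iterable[Listing]],
    keep: Callable[[Listing], bool],
    publish: Callable[[Listing], None],
    upload: Callable[[str, bytes], None],
) -> int:
    """Filter one day's listings, stream each one, then archive the batch.

    Returns the number of listings that passed the filter.
    """
    kept: List[Listing] = [item for item in fetch_listings(run_date) if keep(item)]
    for item in kept:
        publish(item)  # e.g. a Kafka producer send in a real deployment
    body = json.dumps(kept, sort_keys=True).encode("utf-8")
    upload(s3_key_for(run_date), body)  # e.g. an S3 put_object call
    return len(kept)
```

Keying the S3 object by run date means an Airflow retry or backfill for the same day overwrites that day's object instead of adding a duplicate.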
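Sketch for the website traffic and logging item. Writing one S3 object per event would produce a large number of tiny objects. A common approach is to buffer events on the consumer side and flush them in batches by count or age. In this minimal sketch, `S3Batcher`, its thresholds, and the `traffic-logs/batch-NNNNNN.jsonl` key naming are hypothetical. `upload` stands in for something like boto3's `put_object`, and `add` would be called for each record read from a Kafka consumer.

```python
import json
import time
from typing import Callable, Dict, List, Optional


class S3Batcher:
    """Buffer log events and flush them as one newline-delimited JSON object."""

    def __init__(self, upload: Callable[[str, bytes], None], max_events: int = 500,
                 max_age_s: float = 60.0, clock: Callable[[], float] = time.monotonic):
        self.upload = upload
        self.max_events = max_events
        self.max_age_s = max_age_s
        self.clock = clock
        self.buffer: List[Dict] = []
        self.opened_at: Optional[float] = None  # time the current batch started
        self.flushes = 0

    def add(self, event: Dict) -> None:
        if not self.buffer:
            self.opened_at = self.clock()
        self.buffer.append(event)
        # Flush when the batch is full or has been open too long.
        # Age is only checked here, on add.
        too_old = self.clock() - self.opened_at >= self.max_age_s
        if len(self.buffer) >= self.max_events or too_old:
            self.flush()

    def flush(self) -> None:
        if not self.buffer:
            return
        body = "\n".join(json.dumps(e, sort_keys=True) for e in self.buffer)
        key = f"traffic-logs/batch-{self.flushes:06d}.jsonl"  # hypothetical naming
        self.upload(key, body.encode("utf-8"))
        self.flushes += 1
        self.buffer = []
        self.opened_at = None
```

Because the age check only runs when a new event arrives, a real consumer would also need to call `flush()` on a timer and at shutdown. It should commit Kafka offsets only after a batch has been uploaded.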