SCRAPER TODO
- City scraper deployment to AWS Lambda is failing because of the lxml dependency; need a way to package it so it imports correctly. Possible solutions: compile the dependencies in a virtual environment on EC2, then transfer them via FTP for deployment; or use a Docker image of the AWS EC2 AMI (2018.03) to yum install the dependencies
- Standardize the data format when producing messages to Kafka
- Create an algorithm to extract year + model + trim from vehicle listings
- Refactor vehicleScraper code (remove the extra unused variables at the bottom)
- Automate craigslistFilter to run daily: schedule with Airflow, stream data through Kafka, and store results in S3 (satvik)
- Stream website traffic and logs through Kafka and store them in S3 (omkar)
- Create a machine learning model (satvik & omkar)
- (TBD: something involving account-creation processing with Spark, from the DB to Redshift)
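A minimal sketch for the message-format item, assuming the scraper is written in Python and every producer sends JSON. It shows one shared record type serialized with sorted keys, so every topic sees the same field names and types. The field names (source, city, url, title, price, scraped_at) and all helpers below are hypothetical, not taken from the existing scraper code.

```python
import json
import re
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional

PRICE_RE = re.compile(r"\d[\d,]*")  # first run of digits/commas, e.g. "12,500"


@dataclass(frozen=True)
class ListingMessage:
    """Hypothetical standard record that every scraper would produce to Kafka."""
    source: str           # e.g. "craigslist"
    city: str             # e.g. "sfbay"
    url: str
    title: str
    price: Optional[int]  # whole dollars; None when no price is listed
    scraped_at: str       # ISO 8601 timestamp in UTC


def parse_price(raw: str) -> Optional[int]:
    """Turn '$12,500' or '$12,500.99' into 12500; return None if no digits are present."""
    match = PRICE_RE.search(raw)
    return int(match.group().replace(",", "")) if match else None


def utc_now_iso() -> str:
    """Current time as an ISO 8601 string with an explicit +00:00 offset."""
    return datetime.now(timezone.utc).isoformat()


def to_kafka_value(msg: ListingMessage) -> bytes:
    """Serialize to compact UTF-8 JSON with sorted keys so every producer emits the same shape."""
    return json.dumps(asdict(msg), sort_keys=True, separators=(",", ":")).encode("utf-8")


def from_kafka_value(value: bytes) -> ListingMessage:
    """Inverse of to_kafka_value; raises TypeError if fields are missing or unexpected."""
    return ListingMessage(**json.loads(value.decode("utf-8")))
```

The bytes from `to_kafka_value` would be passed as the message value to whichever Kafka producer client is used, and consumers would decode them with `from_kafka_value`. Storing price as an integer (or None) avoids mixing strings like "$12,500" with numbers downstream.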
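One possible starting point for the year + model + trim algorithm, assuming listing titles roughly follow "YEAR [MAKE] MODEL TRIM ..." (for example "2015 Honda Civic LX - 80k miles"). This is a regex heuristic, not a tested extractor: `KNOWN_MAKES`, `extract_vehicle`, and the separator and stop rules are all hypothetical, and a real version would need a full make/model table. It takes the first plausible year, optionally a known make, then one model token and up to two trim tokens.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical, deliberately tiny make list; a real extractor needs a full make/model table.
KNOWN_MAKES = {"toyota", "honda", "ford", "chevrolet", "nissan", "subaru", "bmw"}

YEAR_RE = re.compile(r"\b(19[5-9]\d|20[0-4]\d)\b")  # model years 1950-2049 (a price like "$2015" can also match)
SEPARATOR_RE = re.compile(r"\s[-|]\s|[,(]")          # text after " - ", " | ", "," or "(" is ignored
TOKEN_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9-]*")    # words such as "civic", "f-150", "328i"
STOP_RE = re.compile(r"^\d+k?$", re.IGNORECASE)       # bare numbers or mileage like "80k" end the trim


@dataclass
class VehicleInfo:
    year: Optional[int]
    make: Optional[str]
    model: Optional[str]
    trim: Optional[str]


def extract_vehicle(title: str, max_trim_tokens: int = 2) -> VehicleInfo:
    """Heuristically pull year, make, model, and trim out of a listing title."""
    match = YEAR_RE.search(title)
    if match is None:
        return VehicleInfo(None, None, None, None)
    year = int(match.group(1))

    # Only look at the text after the year, up to the first separator.
    rest = SEPARATOR_RE.split(title[match.end():], maxsplit=1)[0]
    tokens = [tok.lower() for tok in TOKEN_RE.findall(rest)]

    # Make is optional: many titles skip it ("2008 f-150 xlt").
    make = tokens.pop(0) if tokens and tokens[0] in KNOWN_MAKES else None
    model = tokens.pop(0) if tokens else None

    trim_tokens = []
    for tok in tokens[:max_trim_tokens]:
        if STOP_RE.match(tok):
            break
        trim_tokens.append(tok)
    trim = " ".join(trim_tokens) or None
    return VehicleInfo(year, make, model, trim)
```

Capping the trim at two tokens keeps noise such as "clean title" from being swallowed whole, at the cost of truncating longer trims; the cap is a tunable guess.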
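For the daily craigslistFilter job, one option (an assumption, not a settled design) is to write each run to a date-partitioned S3 key, so rerunning the same Airflow date overwrites its own output instead of duplicating it. The prefix, the `dt=`/`city=` layout, and `daily_s3_key` are hypothetical.

```python
from datetime import date


def daily_s3_key(run_date: date, city: str, prefix: str = "craigslist-filter") -> str:
    """Return a key like craigslist-filter/dt=2020-01-15/city=sfbay/listings.json.

    Hypothetical layout: one object per (run date, city), so a rerun rewrites the same key.
    """
    city_slug = city.strip().lower().replace(" ", "-")  # normalize names such as "San Diego"
    return f"{prefix}/dt={run_date.isoformat()}/city={city_slug}/listings.json"
```

Inside an Airflow task, the run's logical date is available as the `ds` template variable (a "YYYY-MM-DD" string), which `date.fromisoformat` can convert before building the key.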