
BACKEND & DATA

SCRAPER TODO

  1. Deploying the city scraper to AWS Lambda fails because of the lxml dependency; we need a way to package lxml so that Lambda can import it. Possible solutions: compile the dependencies in a virtual environment on EC2 and transfer them via FTP for deployment, or use a Docker image of the Amazon Linux AMI (2018.03) to install the dependencies with yum.
  2. Standardize the data format of messages produced to Kafka.
  3. Create an algorithm to extract the vehicle year, model, and trim.
  4. Refactor the vehicleScraper code (remove the extra unused variables at the bottom).
  5. Automate craigslistFilter to run daily: use Kafka for data streaming, store the data in S3, and schedule the job with Airflow (satvik).
  6. Stream website traffic and logs through Kafka and store them in S3 (omkar).
  7. Create a machine learning model (satvik & omkar).
  8. TBD: some form of account-creation processing with Spark, moving data from the DB to Redshift.
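For item 2, one possible standardized format is a single message envelope that every scraper serializes the same way. The sketch below is a minimal illustration, assuming JSON messages on Kafka; the `ListingMessage` class, its field names, and `SCHEMA_VERSION` are hypothetical, not an existing schema in this project.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional

SCHEMA_VERSION = 1  # hypothetical version tag, bumped whenever the envelope changes


@dataclass
class ListingMessage:
    """Hypothetical common envelope for every scraped listing sent to Kafka."""

    source: str               # e.g. "craigslist"
    city: str
    listing_id: str
    title: str
    price_usd: Optional[int]  # None when the listing has no parsable price
    url: str
    # ISO-8601 UTC timestamp filled in when the message object is created
    scraped_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
    schema_version: int = SCHEMA_VERSION

    def to_bytes(self) -> bytes:
        """Serialize to UTF-8 JSON with sorted keys so the output is deterministic."""
        return json.dumps(asdict(self), sort_keys=True).encode("utf-8")

    @classmethod
    def from_bytes(cls, raw: bytes) -> "ListingMessage":
        """Rebuild a message from bytes produced by to_bytes()."""
        return cls(**json.loads(raw.decode("utf-8")))
```

The bytes returned by `to_bytes()` could be passed as the `value` argument of kafka-python's `KafkaProducer.send(topic, value=...)`. Carrying a `schema_version` field lets consumers detect when the envelope changes instead of failing on unexpected keys.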
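For item 3, a simple starting point is a heuristic extractor: a regex for the model year plus a lookup table of known model names, with the trim taken as the next one or two tokens after the model. This is a minimal sketch under those assumptions; `KNOWN_MODELS`, `STOP_WORDS`, and `extract_vehicle_info` are hypothetical, and a real lookup table would need to cover far more models.

```python
import re
from typing import List, NamedTuple, Optional

# Hypothetical lookup table; a real system would load models from a vehicle database.
KNOWN_MODELS = {"civic", "accord", "camry", "corolla", "f-150", "mustang"}
# Hypothetical words that mark the end of the trim (condition and sale terms).
STOP_WORDS = {"for", "sale", "obo", "clean", "title", "miles", "low", "runs"}

YEAR_RE = re.compile(r"\b(19[5-9]\d|20[0-4]\d)\b")  # model years 1950-2049
# Price ("$9,500") or mileage ("120k", "120,000") tokens also end the trim.
NOISE_RE = re.compile(r"^(\$[\d,]+|\d+k|\d{1,3}(,\d{3})+)$")


class VehicleInfo(NamedTuple):
    year: Optional[int]
    model: Optional[str]
    trim: Optional[str]


def extract_vehicle_info(title: str, max_trim_tokens: int = 2) -> VehicleInfo:
    """Heuristically pull year, model, and trim (lowercased) out of a listing title."""
    match = YEAR_RE.search(title)
    year = int(match.group(1)) if match else None

    # Lowercase, split on whitespace, and drop punctuation at token edges.
    tokens = [t.strip(",.!()") for t in title.lower().split()]
    for i, tok in enumerate(tokens):
        if tok not in KNOWN_MODELS:
            continue
        trim_tokens: List[str] = []
        for nxt in tokens[i + 1:]:
            if (len(trim_tokens) == max_trim_tokens
                    or not any(c.isalnum() for c in nxt)  # "-", "|", etc.
                    or nxt in STOP_WORDS
                    or NOISE_RE.match(nxt)
                    or YEAR_RE.fullmatch(nxt)):
                break
            trim_tokens.append(nxt)
        return VehicleInfo(year, tok, " ".join(trim_tokens) or None)
    # No known model found: only the year (if any) is recoverable.
    return VehicleInfo(year, None, None)
```

A lookup table is used instead of a pure regex because model names are an open vocabulary that a pattern cannot recognize on its own; titles whose model is missing from the table return `None` for both model and trim.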
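Items 5 and 6 both end with storing streamed data in S3. One common layout, assumed here rather than taken from the project, is to write each day's records as newline-delimited JSON under a date-partitioned key, so a daily scheduled run writes one object per stream. `daily_batch` and the key format are hypothetical.

```python
import json
from datetime import date
from typing import Iterable, Mapping, Tuple


def daily_batch(stream: str, day: date, records: Iterable[Mapping]) -> Tuple[str, bytes]:
    """Build an S3 object key and a newline-delimited JSON body for one day's records."""
    # Hive-style "dt=YYYY-MM-DD" prefix keeps each day's data in its own partition.
    key = f"{stream}/dt={day.isoformat()}/{stream}-{day:%Y%m%d}.jsonl"
    # One JSON document per line; sorted keys make the output deterministic.
    body = "".join(json.dumps(r, sort_keys=True) + "\n" for r in records)
    return key, body.encode("utf-8")
```

The returned pair maps directly onto boto3's `put_object(Bucket=..., Key=key, Body=body)`. Because each day has its own prefix, rerunning a single failed day simply overwrites that day's object.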