Contributors: O. Gencer Ates, Matthew Dolce, Hannah Lee, Felix Quintana
Your final project is a group project, but the task is something you can define. As a group, you need to:
decide on a realistic task (utilizing multiple data science steps we have discussed, such as data collection, data cleansing, classification, clustering, and visualization).
You may not use data from the UCI Machine Learning repository, and the data you use must have at least 1,000 data points.
decide on a target customer (could be a person, company, or population, but must be someone who would value what you are doing).
post a 2-3 paragraph description of your proposed task, the data to be used (how large?), and target customer(s), along with the names of your team members, to the appropriate place on Piazza by Wednesday, November 13 (earlier is highly suggested).
confirm (after possible adjustments based on the feedback you receive) your proposed task and customer by Monday, November 18.
perform the proposed task (starting early is highly recommended).
produce a one-page executive summary, plus a 5-7 page report of technical supporting material, plus a one-page appendix describing the contributions of each team member, due Sunday, December 15.
make a ten-to-eleven minute presentation of your project and your results to the class on Monday, December 16 at 7pm.
Topic: Mental Health in the Tech Industry
Group 2: O. Gencer Ates, Matthew Dolce, Hannah Lee, Felix Quintana
Dataset: https://www.kaggle.com/osmi/mental-health-in-tech-survey
Goal: Determine how difficult it is to deal with mental illness at companies, mainly tech companies; estimate how likely employees in these industries are to get treatment for mental illness; and suggest possible options for companies and employees to pursue.
Data: We will use a Kaggle dataset on mental health in tech with 1,259 respondents (about 1,241 of which appear to be usable) and 27 attributes, including whether the respondent pursued treatment, company benefits, and some medical history.
Target Audience: Potential employees dealing with mental health problems who are deciding which companies to work for, and companies trying to improve the productivity of employees with these problems.
Process: We will begin with a thorough data cleanup to make the data workable for the project. Next, we will use data visualization methods to answer questions that do not require predictions. Finally, since our prediction problems appear to be classification problems, we will use a combination of logistic regression, k-nearest neighbors (KNN), decision tree, and random forest models.
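One simple way to start on the "likelihood of getting treated" question is to compare the share of respondents who sought treatment across groups, for example respondents whose employer does or does not offer mental health benefits. The sketch below assumes the survey has columns named "benefits" and "treatment" with "Yes"/"No" style answers; those names are an assumption, and the sample rows are made up for illustration, not taken from the survey.

```python
from collections import defaultdict

def treatment_rate_by_group(rows, group_key, target_key="treatment"):
    """Return {group value: fraction of rows in that group whose target is "Yes"}."""
    totals = defaultdict(int)
    treated = defaultdict(int)
    for row in rows:
        group = row[group_key]
        totals[group] += 1
        if row[target_key] == "Yes":
            treated[group] += 1
    return {group: treated[group] / totals[group] for group in totals}

# Made-up rows for illustration only; column names "benefits" and
# "treatment" are assumed, not confirmed from the dataset.
sample = [
    {"benefits": "Yes", "treatment": "Yes"},
    {"benefits": "Yes", "treatment": "No"},
    {"benefits": "No", "treatment": "No"},
    {"benefits": "No", "treatment": "Yes"},
    {"benefits": "No", "treatment": "No"},
]
print(treatment_rate_by_group(sample, "benefits"))
```

The resulting per-group rates are the kind of summary that could then be plotted as a bar chart.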
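Narrowing 1,259 respondents down to a usable subset will likely involve dropping rows with implausible values and normalizing free-text answers. The sketch below shows one hypothetical version of that step: the column names "Age" and "Gender", the file name survey.csv, the 18-75 age bounds, and the gender mapping are all assumptions for illustration, and the rules are not claimed to reproduce the group's count of about 1,241 usable rows.

```python
import csv
import io

# Hypothetical cleaning rules; the bounds and mapping are assumptions.
MIN_AGE, MAX_AGE = 18, 75
GENDER_MAP = {
    "male": "Male", "m": "Male", "man": "Male", "cis male": "Male",
    "female": "Female", "f": "Female", "woman": "Female",
}

def normalize_gender(raw):
    """Map a free-text gender answer to a small set of labels."""
    return GENDER_MAP.get(raw.strip().lower(), "Other")

def clean_rows(reader):
    """Yield rows that have a plausible age, with the gender field normalized."""
    for row in reader:
        try:
            age = int(row["Age"])
        except ValueError:
            continue  # skip non-numeric ages
        if not MIN_AGE <= age <= MAX_AGE:
            continue  # skip implausible ages
        row["Age"] = age
        row["Gender"] = normalize_gender(row["Gender"])
        yield row

# Tiny made-up inline CSV standing in for the (assumed) survey.csv file.
raw_csv = """Age,Gender,treatment
37,Female,Yes
-29,M,No
31,male,No
99999999999,f,Yes
25,Cis Male,No
"""
cleaned = list(clean_rows(csv.DictReader(io.StringIO(raw_csv))))
print(len(cleaned), [row["Gender"] for row in cleaned])
```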
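Of the planned models, KNN is the simplest to show from scratch: a respondent is classified by majority vote among the k most similar respondents in the training data. The sketch below is a standard-library illustration only; in practice a library such as scikit-learn (which provides LogisticRegression, KNeighborsClassifier, DecisionTreeClassifier, and RandomForestClassifier) would more likely be used. The 0/1 encoding of the features family_history, benefits, and care_options is a hypothetical choice, and the training rows are made up.

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest training points."""
    # Sort training points by Euclidean distance to the query and keep the k closest.
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Made-up, hypothetically encoded respondents (1 = "Yes", 0 = "No") for the
# features (family_history, benefits, care_options); the label is "treatment".
train_X = [(1, 1, 1), (1, 1, 0), (1, 0, 1), (0, 0, 0), (0, 1, 0), (0, 0, 1)]
train_y = ["Yes", "Yes", "Yes", "No", "No", "No"]

print(knn_predict(train_X, train_y, (1, 1, 1)))
print(knn_predict(train_X, train_y, (0, 0, 0)))
```

Choosing an odd k for a two-class problem such as "treatment" avoids tied votes; in a real evaluation, k would be tuned on held-out data rather than fixed at 3.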