Breast cancer dataset analysis (UPM's Master in Data Science project for Data Process subject)

gabyarte/breast-cancer-data-analysis

UPM Master's in Data Science: Data Process project (I)

Collaborators

Assignment details

The data/raw/ folder contains 4 datasets with artificially generated features for breast cancer patients:

  • The first contains information about the patients and the characteristics of their tumors (see the annex for further information).
  • The second contains the TNM classification of the tumor and, where applicable, the TNM classification after neoadjuvant therapy (see the annex for further information).
  • The last two contain a new batch of data that was generated later.

The goal of the assignment is to clean and pre-process the data in the way you consider best, explaining every decision. You then need to produce a descriptive analysis of the data, covering both univariate and bivariate analysis, using whichever tools you consider appropriate (including graphics).
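As a rough illustration of what the univariate and bivariate steps could look like, here is a minimal sketch using only the Python standard library. The column names (`age`, `grade`, `side`) and the toy records are invented for the example; they are not taken from the datasets in data/raw/, and the helper names are hypothetical.

```python
from collections import Counter
from statistics import mean, median

# Toy records: column names and values are made up for illustration only.
records = [
    {"age": 45, "grade": "2", "side": "left"},
    {"age": 61, "grade": "3", "side": "right"},
    {"age": 52, "grade": "2", "side": "left"},
    {"age": 70, "grade": "1", "side": "right"},
]


def univariate_numeric(rows, column):
    """Summarise one numeric column, skipping missing (None) values."""
    values = [r[column] for r in rows if r[column] is not None]
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
    }


def univariate_categorical(rows, column):
    """Count how often each category appears in one column."""
    return Counter(r[column] for r in rows)


def crosstab(rows, col_a, col_b):
    """Bivariate view: count each (col_a, col_b) combination."""
    return Counter((r[col_a], r[col_b]) for r in rows)


age_summary = univariate_numeric(records, "age")
grade_counts = univariate_categorical(records, "grade")
grade_by_side = crosstab(records, "grade", "side")

print(age_summary)
print(grade_counts)
print(grade_by_side)
```

The same three views (numeric summary, category frequencies, cross-tabulation) are the usual starting point before plotting histograms, bar charts, or grouped comparisons.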

During data preparation you may run into challenges that were not evident at the start; you need to justify how you solved each of them. Take a close look at all the data and validate that it is consistent.
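One possible consistency check is to confirm that the patient table and the TNM table refer to the same patients, with no duplicated identifiers. The sketch below assumes a shared key called `patient_id`, which is a hypothetical name (the real key is described in the annex), and uses toy rows rather than the actual data.

```python
from collections import Counter

# Toy rows: "patient_id" is an assumed key name, and the values are invented.
patients = [
    {"patient_id": 1},
    {"patient_id": 2},
    {"patient_id": 2},
    {"patient_id": 3},
]
tnm = [
    {"patient_id": 1, "t": "T1"},
    {"patient_id": 4, "t": "T2"},
]


def duplicated_keys(rows, key):
    """Return the key values that appear in more than one row."""
    counts = Counter(r[key] for r in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def unmatched_keys(left, right, key):
    """Return key values present in `left` but absent from `right`."""
    right_keys = {r[key] for r in right}
    return sorted({r[key] for r in left} - right_keys)


dup_patients = duplicated_keys(patients, "patient_id")
tnm_without_patient = unmatched_keys(tnm, patients, "patient_id")
patients_without_tnm = unmatched_keys(patients, tnm, "patient_id")

print("Duplicated patient ids:", dup_patients)
print("TNM rows with no patient:", tnm_without_patient)
print("Patients with no TNM row:", patients_without_tnm)
```

Each non-empty result is a finding to document: whether the rows were dropped, merged, or kept, and why.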
