- Elisa Marson (@elisa98mars)
- Pablo Bande Girón-Sánchez (@pbande)
- Gabriela Argüelles Terrón (@gabyarte)
The data/raw/ folder contains four datasets with artificially generated features for breast cancer patients:
- The first contains information about the patients and the characteristics of their tumors (see the annex for further information).
- The second contains the TNM classification of the tumor and, where applicable, the TNM classification after neoadjuvant therapy (see the annex for further information).
- The last two contain a new batch of data that was generated later.
The goal of the assignment is to clean and pre-process the data in the way you think is best, explaining all your decisions. Then, you need to produce a descriptive analysis of the data, covering both univariate and bivariate analysis, using whatever tools you consider appropriate (including graphics).
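As a rough illustration of what univariate and bivariate summaries can look like, here is a minimal sketch using only the Python standard library. The records and the column names (`age`, `grade`, `neoadjuvant`) are hypothetical placeholders, not the actual schema of the files in data/raw/, and the helper functions are invented for this example.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical records: the real column names and values may differ.
patients = [
    {"age": 52, "grade": "II", "neoadjuvant": "yes"},
    {"age": 61, "grade": "III", "neoadjuvant": "yes"},
    {"age": 47, "grade": "II", "neoadjuvant": "no"},
    {"age": 70, "grade": "I", "neoadjuvant": "no"},
]


def univariate_numeric(rows, column):
    """Summarize one numeric column, skipping missing (None) values."""
    values = [row[column] for row in rows if row[column] is not None]
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
    }


def crosstab(rows, col_a, col_b):
    """Count how often each pair of categories occurs (bivariate view)."""
    return Counter((row[col_a], row[col_b]) for row in rows)


age_summary = univariate_numeric(patients, "age")
grade_by_treatment = crosstab(patients, "grade", "neoadjuvant")

print(age_summary)
for (grade, treated), count in sorted(grade_by_treatment.items()):
    print(f"grade={grade} neoadjuvant={treated}: {count}")
```

The same summaries can be produced with a dataframe library or plotted as histograms and grouped bar charts; the point here is only the distinction between describing one variable at a time and describing how two variables vary together.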
During data preparation you may run into challenges that were not evident at the outset; you need to justify how you solved each of them. Take a close look at all the data and validate that they are consistent.
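One kind of consistency check worth considering is whether the patient table and the TNM table refer to the same set of patients. The sketch below assumes both tables share an identifier column, called `patient_id` here; that name, the sample rows, and the `check_ids` helper are all assumptions made for illustration.

```python
from collections import Counter


def check_ids(patient_rows, tnm_rows, key="patient_id"):
    """Report duplicated and unmatched identifiers between two tables.

    `key` is a hypothetical column name; adapt it to the real data.
    """
    patient_ids = [row[key] for row in patient_rows]
    tnm_ids = [row[key] for row in tnm_rows]

    counts = Counter(patient_ids)
    return {
        # Identifiers that appear more than once in the patient table.
        "duplicated_patients": sorted(i for i, c in counts.items() if c > 1),
        # TNM rows that point at a patient who does not exist.
        "tnm_without_patient": sorted(set(tnm_ids) - set(patient_ids)),
        # Patients that have no TNM record at all.
        "patient_without_tnm": sorted(set(patient_ids) - set(tnm_ids)),
    }


# Hypothetical sample rows standing in for the loaded datasets.
patient_rows = [
    {"patient_id": 1},
    {"patient_id": 2},
    {"patient_id": 2},
    {"patient_id": 3},
]
tnm_rows = [{"patient_id": 1}, {"patient_id": 2}, {"patient_id": 4}]

report = check_ids(patient_rows, tnm_rows)
print(report)
```

Whatever such a report reveals (duplicates, orphaned records, patients with no staging), the decision on how to handle each case should be written down and justified, as the assignment requires.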