Master-Thesis-DifferentialPrivacy

Master's thesis on differential privacy for the Global COVID-19 Trends and Impact Survey microdata and open data, with a focus on evaluating different synthetic datasets.

Overview

This project generates synthetic datasets from the COVID-19 Trends and Impact Survey data using several synthesis algorithms, such as linear regression, multinomial logistic regression, and random forest. The goal is to evaluate the utility and practicability of the synthetic data for machine learning with tree-based methods.

Data

The COVID-19 Trends and Impact Survey data used in this project were collected through an online survey designed to capture the trends and impact of the COVID-19 pandemic on individuals and society. The survey covers demographics, mental health, work and financial impact, and COVID-19 knowledge and behavior.

Synthetic Data Generation

The synthetic datasets are generated using the following algorithms:

Linear Regression (method="norm")
Linear Regression that preserves the marginal distribution (method="normrank")
Decision Tree (method="cart")
Multinomial Logistic Regression (method="polyreg")
Random Forest (method="rf")
Random Forest based Bagging algorithm (method="bag")

Each algorithm synthesizes the survey data into a new synthetic dataset that can be used for machine learning and analysis.
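
To make the difference between method="norm" and method="normrank" concrete, here is a minimal Python sketch of the two ideas for a single variable with one predictor. It is an illustration under stated assumptions, not the code used in this project: the function names (fit_line, synth_norm, synth_normrank) and the toy data are hypothetical, and the sketch skips details a real synthesizer would handle, such as drawing the regression parameters from their posterior and synthesizing many variables in sequence.

```python
import math
import random
from statistics import NormalDist, mean


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x, plus residual sd."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sigma = math.sqrt(rss / (len(x) - 2))
    return intercept, slope, sigma


def synth_norm(x_real, y_real, x_synth, rng):
    """'norm'-style idea: draw synthetic y from a normal around the fitted line."""
    intercept, slope, sigma = fit_line(x_real, y_real)
    return [rng.gauss(intercept + slope * xi, sigma) for xi in x_synth]


def synth_normrank(x_real, y_real, x_synth, rng):
    """'normrank'-style idea: regress normal scores of the ranks of y on x,
    then map synthetic scores back to observed y values."""
    n = len(y_real)
    std_normal = NormalDist()
    # Normal score of each observation's rank (ties broken by position).
    order = sorted(range(n), key=lambda i: y_real[i])
    z_real = [0.0] * n
    for rank, i in enumerate(order, start=1):
        z_real[i] = std_normal.inv_cdf(rank / (n + 1))
    z_synth = synth_norm(x_real, z_real, x_synth, rng)
    # Convert each synthetic score to a rank and look up the observed value.
    y_sorted = sorted(y_real)
    result = []
    for z in z_synth:
        pos = round(std_normal.cdf(z) * (n + 1)) - 1
        result.append(y_sorted[min(max(pos, 0), n - 1)])
    return result


# Toy data (hypothetical, not survey data): a skewed outcome that depends on x.
rng = random.Random(0)
x_real = [rng.uniform(18, 80) for _ in range(200)]
y_real = [round(0.05 * xi + rng.expovariate(1.0), 2) for xi in x_real]
x_synth = [rng.uniform(18, 80) for _ in range(200)]

y_norm = synth_norm(x_real, y_real, x_synth, rng)
y_rank = synth_normrank(x_real, y_real, x_synth, rng)
```

In this sketch, y_norm can contain values that never occur in the original data (including implausible ones for a skewed variable), while y_rank only ever returns values observed in y_real, which is how the rank-based variant keeps the marginal distribution close to the original.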

Experiment Design

Evaluation

The utility and practicability of the synthetic datasets are evaluated using tree-based methods: decision trees, random forests, and gradient boosting. The evaluation assesses the quality of the synthetic datasets and their potential usefulness in machine learning and analysis.
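
One way such a utility check could be set up is sketched below in plain Python. The protocol is an assumption, since this README does not spell it out: train the same tree-based model once on real data and once on synthetic data, then compare both on held-out real data. To keep the example self-contained, a depth-1 decision tree (a stump) stands in for the decision trees, random forests, and gradient boosting named above, and the "synthetic" set is simply a noisier toy sample, not output from any of the synthesizers. All function names are hypothetical.

```python
import random


def fit_stump(X, y):
    """Fit a depth-1 decision tree on numeric features and 0/1 labels.
    Returns (feature_index, threshold, left_label, right_label)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            # Majority label on each side of the split.
            left_label = 1 if 2 * sum(left) > len(left) else 0
            right_label = 1 if 2 * sum(right) > len(right) else 0
            errors = (sum(lab != left_label for lab in left)
                      + sum(lab != right_label for lab in right))
            if best is None or errors < best[0]:
                best = (errors, j, t, left_label, right_label)
    if best is None:
        raise ValueError("no valid split: all features are constant")
    return best[1:]


def predict(stump, X):
    j, t, left_label, right_label = stump
    return [left_label if row[j] <= t else right_label for row in X]


def accuracy(stump, X, y):
    preds = predict(stump, X)
    return sum(p == lab for p, lab in zip(preds, y)) / len(y)


def make_toy_data(rng, n, label_noise):
    """Toy binary task: label is 1 when the first feature exceeds 0.5,
    flipped with probability label_noise."""
    X = [[rng.random(), rng.random()] for _ in range(n)]
    y = []
    for row in X:
        lab = 1 if row[0] > 0.5 else 0
        if rng.random() < label_noise:
            lab = 1 - lab
        y.append(lab)
    return X, y


rng = random.Random(42)
X_train, y_train = make_toy_data(rng, 200, label_noise=0.1)   # "real" training data
X_test, y_test = make_toy_data(rng, 200, label_noise=0.1)     # held-out "real" data
X_synth, y_synth = make_toy_data(rng, 200, label_noise=0.2)   # stand-in for synthetic data

acc_real = accuracy(fit_stump(X_train, y_train), X_test, y_test)
acc_synth = accuracy(fit_stump(X_synth, y_synth), X_test, y_test)
print(f"trained on real:      {acc_real:.3f}")
print(f"trained on synthetic: {acc_synth:.3f}")
```

Under this assumed protocol, a small gap between the two accuracies suggests the synthetic data preserves the signal the model needs, while a large gap points to lost utility.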

Conclusion

To be continued...


CodeYueXiong/Master-Thesis-SyntheticDataGeneration
