Master Thesis on Generating Synthetic Data and Disclosure Control of Global COVID-19 Trends and Impact Survey Microdata and Opendata, especially focusing on the evaluation of different synthetic datasets.

CodeYueXiong/Master-Thesis-SyntheticDataGeneration

Master-Thesis-DifferentialPrivacy

Master's thesis on differential privacy for the Global COVID-19 Trends and Impact Survey microdata and open data, with a focus on evaluating different synthetic datasets.

Overview

This project generates synthetic datasets from the COVID-19 Trends and Impact Survey data using several synthesizing algorithms, such as linear regression, multinomial logistic regression and random forest. The goal is to evaluate the utility and practicability of the synthetic data for machine learning with tree-based methods.

Data

The COVID-19 Trends and Impact Survey Data used in this project was collected through an online survey that aimed to understand the trends and impact of the COVID-19 pandemic on individuals and society. The survey data includes information on demographics, mental health, work and financial impact, and COVID-19 knowledge and behavior.

Synthetic Data Generation

The synthetic datasets are generated using the following algorithms:

  • Linear Regression (method="norm")
  • Linear Regression which maintains the marginal distribution (method="normrank")
  • Decision Tree (method="cart")
  • Multinomial Logistic Regression (method="polyreg")
  • Random Forest (method="rf")
  • Random Forest based Bagging algorithm (method="bag")

These algorithms are used to synthesize the survey data and create new, synthetic datasets that can be used for machine learning and analysis.
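As a rough illustration of the idea behind the linear regression synthesizer (method="norm"), the sketch below synthesizes two numeric variables sequentially: the first is resampled from its observed values, and the second is drawn from a normal distribution centered on a regression prediction. This is a minimal sketch under stated assumptions, not the thesis code: the toy data, the function names `fit_simple_linear` and `synthesize`, and the use of plain Python are all hypothetical. It also uses the fitted coefficients directly and does not account for parameter uncertainty.

```python
import random
import statistics


def fit_simple_linear(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b, residual_sd)."""
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Residual standard deviation with n - 2 degrees of freedom.
    resid_sd = (sum(r * r for r in residuals) / (len(x) - 2)) ** 0.5
    return intercept, slope, resid_sd


def synthesize(x_obs, y_obs, n_rows, rng):
    """Sequential synthesis: resample x, then draw y given the synthetic x."""
    intercept, slope, resid_sd = fit_simple_linear(x_obs, y_obs)
    # First variable: sample with replacement from the observed values.
    x_syn = [rng.choice(x_obs) for _ in range(n_rows)]
    # Second variable: regression prediction plus normal noise.
    y_syn = [rng.gauss(intercept + slope * xi, resid_sd) for xi in x_syn]
    return x_syn, y_syn


rng = random.Random(0)
# Toy "observed" data, made up for illustration (not survey data).
x_obs = [rng.uniform(18, 80) for _ in range(500)]
y_obs = [2.0 + 0.5 * xi + rng.gauss(0, 1.0) for xi in x_obs]
x_syn, y_syn = synthesize(x_obs, y_obs, n_rows=500, rng=rng)
```

Refitting the regression on `x_syn` and `y_syn` should give a slope close to the one estimated from the observed data, which is the kind of relationship a synthesizer is meant to preserve.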

Experiment Design

[Figure: experiment design diagram]

Evaluation

The utility and practicability of the synthetic datasets are evaluated using tree-based methods: decision trees, random forests, and gradient boosting. The evaluation assesses the quality of the synthetic datasets and their potential usefulness for machine learning and analysis.
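One common way to measure machine learning utility is to train a model on synthetic data, test it on held-out real data, and compare the result with a model trained on real data. Whether the thesis follows exactly this setup is an assumption. The sketch below illustrates the comparison with the simplest tree-based model, a depth-1 decision tree (a stump). The toy data and the helpers `fit_stump`, `accuracy` and `make_data` are hypothetical, and the "synthetic" set is only a noisier stand-in, not the output of a real synthesizer.

```python
import random


def fit_stump(xs, ys):
    """Fit a depth-1 decision tree on one numeric feature with 0/1 labels.

    Returns (threshold, label_left, label_right) minimizing training errors.
    """
    best = None
    for threshold in sorted(set(xs)):
        for left, right in ((0, 1), (1, 0)):
            errors = sum(
                (left if x <= threshold else right) != y for x, y in zip(xs, ys)
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, left, right)
    return best[1:]


def accuracy(stump, xs, ys):
    """Share of rows for which the stump predicts the correct label."""
    threshold, left, right = stump
    hits = sum((left if x <= threshold else right) == y for x, y in zip(xs, ys))
    return hits / len(xs)


def make_data(n, rng, flip=0.1):
    """Toy data: label is 1 when x > 50, with a fraction of labels flipped."""
    xs = [rng.uniform(0, 100) for _ in range(n)]
    ys = [int(x > 50) ^ int(rng.random() < flip) for x in xs]
    return xs, ys


rng = random.Random(1)
real_train = make_data(300, rng)
real_test = make_data(300, rng)
# Stand-in for a synthetic dataset: same structure, slightly more label noise.
synthetic = make_data(300, rng, flip=0.15)

acc_real = accuracy(fit_stump(*real_train), *real_test)  # train real, test real
acc_syn = accuracy(fit_stump(*synthetic), *real_test)    # train synthetic, test real
utility_gap = acc_real - acc_syn
```

A small `utility_gap` suggests that a model trained on the synthetic data generalizes to real data about as well as one trained on the real data itself.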

Conclusion

To be continued...
