Impacts of data synthesis : a metric for quantifiable data standards and performances
Chandra, Gunjan (2020-06-23)
© 2020 Gunjan Chandra. This Item is protected by copyright and/or related rights. You may use the Item in any way permitted by the copyright and related rights legislation that applies to your use. For other uses, you need permission from the rights holder(s).
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:oulu-202006242643
Abstract
Publicly shared data could enable a wide range of innovative pedagogical and learning techniques. In the case of healthcare, open data could save lives. Combining medical data with lifestyle information could support further development of current approaches to medical diagnosis and treatment. However, healthcare data contains sensitive information about patients, and could lead to harmful consequences if such details reach the wrong hands. Data anonymisation is the standard practice for reducing the risk of disclosure when sharing data publicly, yet current anonymisation techniques have failed multiple times in the past. The goal of this study is to evaluate the performance of an emerging practice for data sharing, data synthesis, using a tool called Synthpop. The synthetic data is generated with multiple imputation methods, although applied differently. This study describes and analyses Synthpop by establishing data standards and measuring the impact of the data synthesis process on the utility and quality of the information contained in the data. The analyses reveal that the synthetic data simulates the original data while adequately preserving the utility and quality of its information content.
Collections
- Open access [32049]