A Simple-to-Use R Package for Mimicking Study Data by Simulations

Standard

A Simple-to-Use R Package for Mimicking Study Data by Simulations. / Koliopanos, Giorgos; Ojeda, Francisco; Ziegler, Andreas.

In: METHOD INFORM MED, Vol. 62, No. 03-04, 09.2023, p. 119-129.

Research output: SCORING: Contribution to journalSCORING: Journal articleResearchpeer-review

Harvard

APA

Vancouver

Bibtex

@article{940f9f5714c4422d979edaf388fc04b1,
title = "A Simple-to-Use R Package for Mimicking Study Data by Simulations",
abstract = "BACKGROUND: Data protection policies might prohibit the transfer of existing study data to interested research groups. To overcome legal restrictions, simulated data can be transferred that mimic the structure but are different from the existing study data.OBJECTIVES: The aim of this work is to introduce the simple-to-use R package Mock Data Generation (modgo) that may be used for simulating data from existing study data for continuous, ordinal categorical, and dichotomous variables.METHODS: The core is to combine rank inverse normal transformation with the calculation of a correlation matrix for all variables. Data can then be simulated from a multivariate normal and transferred back to the original scale of the variables. Unique features of modgo are that it allows to change the correlation between variables, to perform perturbation analysis, to handle multicenter data, and to change inclusion/exclusion criteria by selecting specific values of one or a set of variables. Simulation studies on real data demonstrate the validity and flexibility of modgo.RESULTS: modgo mimicked the structure of the original study data. Results of modgo were similar with those from two other existing packages in standard simulation scenarios. modgo's flexibility was demonstrated on several expansions.CONCLUSION: The R package modgo is useful when existing study data may not be shared. Its perturbation expansion permits to simulate truly anonymized subjects. The expansion to multicenter studies can be used for validating prediction models. Additional expansions can support the unraveling of associations even in large study data and can be useful in power calculations.",
keywords = "Computer Simulation, Computer Security",
author = "Giorgos Koliopanos and Francisco Ojeda and Andreas Ziegler",
note = "The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution-NonDerivative-NonCommercial License, permitting copying and reproduction so long as the original work is given appropriate credit. Contents may not be used for commercial purposes, or adapted, remixed, transformed or built upon. (https://creativecommons.org/licenses/by-nc-nd/4.0/).",
year = "2023",
month = sep,
doi = "10.1055/a-2048-7692",
language = "English",
volume = "62",
pages = "119--129",
journal = "METHOD INFORM MED",
issn = "0026-1270",
publisher = "Schattauer",
number = "03-04",

}

RIS

TY - JOUR

T1 - A Simple-to-Use R Package for Mimicking Study Data by Simulations

AU - Koliopanos, Giorgos

AU - Ojeda, Francisco

AU - Ziegler, Andreas

N1 - The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution-NonDerivative-NonCommercial License, permitting copying and reproduction so long as the original work is given appropriate credit. Contents may not be used for commercial purposes, or adapted, remixed, transformed or built upon. (https://creativecommons.org/licenses/by-nc-nd/4.0/).

PY - 2023/9

Y1 - 2023/9

N2 - BACKGROUND: Data protection policies might prohibit the transfer of existing study data to interested research groups. To overcome legal restrictions, simulated data can be transferred that mimic the structure but are different from the existing study data.OBJECTIVES: The aim of this work is to introduce the simple-to-use R package Mock Data Generation (modgo) that may be used for simulating data from existing study data for continuous, ordinal categorical, and dichotomous variables.METHODS: The core is to combine rank inverse normal transformation with the calculation of a correlation matrix for all variables. Data can then be simulated from a multivariate normal and transferred back to the original scale of the variables. Unique features of modgo are that it allows to change the correlation between variables, to perform perturbation analysis, to handle multicenter data, and to change inclusion/exclusion criteria by selecting specific values of one or a set of variables. Simulation studies on real data demonstrate the validity and flexibility of modgo.RESULTS: modgo mimicked the structure of the original study data. Results of modgo were similar with those from two other existing packages in standard simulation scenarios. modgo's flexibility was demonstrated on several expansions.CONCLUSION: The R package modgo is useful when existing study data may not be shared. Its perturbation expansion permits to simulate truly anonymized subjects. The expansion to multicenter studies can be used for validating prediction models. Additional expansions can support the unraveling of associations even in large study data and can be useful in power calculations.

AB - BACKGROUND: Data protection policies might prohibit the transfer of existing study data to interested research groups. To overcome legal restrictions, simulated data can be transferred that mimic the structure but are different from the existing study data.OBJECTIVES: The aim of this work is to introduce the simple-to-use R package Mock Data Generation (modgo) that may be used for simulating data from existing study data for continuous, ordinal categorical, and dichotomous variables.METHODS: The core is to combine rank inverse normal transformation with the calculation of a correlation matrix for all variables. Data can then be simulated from a multivariate normal and transferred back to the original scale of the variables. Unique features of modgo are that it allows to change the correlation between variables, to perform perturbation analysis, to handle multicenter data, and to change inclusion/exclusion criteria by selecting specific values of one or a set of variables. Simulation studies on real data demonstrate the validity and flexibility of modgo.RESULTS: modgo mimicked the structure of the original study data. Results of modgo were similar with those from two other existing packages in standard simulation scenarios. modgo's flexibility was demonstrated on several expansions.CONCLUSION: The R package modgo is useful when existing study data may not be shared. Its perturbation expansion permits to simulate truly anonymized subjects. The expansion to multicenter studies can be used for validating prediction models. Additional expansions can support the unraveling of associations even in large study data and can be useful in power calculations.

KW - Computer Simulation

KW - Computer Security

U2 - 10.1055/a-2048-7692

DO - 10.1055/a-2048-7692

M3 - SCORING: Journal article

C2 - 36882158

VL - 62

SP - 119

EP - 129

JO - METHOD INFORM MED

JF - METHOD INFORM MED

SN - 0026-1270

IS - 03-04

ER -