Introduction to Statistical Data Simulation

R
simulation
mathematical statistical
Using Statistical Distributions to Generate Data That Mimics a Real-World Scenario
Published: September 12, 2024

Data Simulation
Code
# load packages
library(here)
library(tidyverse)

This section simulates the kind of data used by the Government in the MTI to place university students into five bands for the award of Government scholarships.

Disclaimer!

The data is simulated and therefore differs substantially from the actual scenario! It is for learning purposes only, and the statistical estimates reported MUST NOT be taken as a true reflection of the real-world situation.

Code
#| label: tbl-simulated_data .striped .hover .primary .bordered
#| tbl-cap: "Simulated data"
#| tbl-cap-location: bottom
# table display setup: the `#|` options above only take effect as the first lines of the chunk

# for reproducibility
set.seed(1) 

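# generate data: 1,000 simulated students, one row each
# - Bands: Binomial(size = 5, prob = 0.5), so values fall in 0-5 (six levels, not five)
# - Geographical_Location: ceiling of Uniform(1, 47), so codes fall in 2-47
# - Poverty_Probability_Index: |Normal(0.3, 0.2)|, which can occasionally exceed 1
# - Gender: Binomial(size = 3, prob = 0.017), so values fall in 0-3 and about 95% are 0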
simulated_data <- data.frame(Bands = rbinom(n = 1000, 
                                        size = 5, 
                                        prob = 0.5),
                         Gross_Family_Income = rnbinom(n = 1000, 
                                                         size = 5, 
                                                         mu = 20000),
                         Geographical_Location = as.factor(ceiling(runif(n = 1000, 
                                                                           min = 1, 
                                                                           max = 47))),
                         Poverty_Probability_Index = abs(rnorm(n = 1000, 
                                                                 mean = 0.3, 
                                                                 sd = 0.2)),
                         Orphans = as.factor(rbinom(n = 1000,
                                                    size = 1,
                                                    prob = 0.4)),
                         Disability = as.factor(rbinom(n = 1000,
                                                       size = 1,
                                                       prob = 0.022)),
                         Number_of_Dependents = rpois(n = 1000, 
                                                        lambda = 4),
                         Program_Costs_KES = abs(rnorm(n = 1000,
                                                     mean = 500000,
                                                     sd = 50000)),
                         Gender = as.factor(rbinom(n = 1000,
                                                   size = 3,
                                                   prob = 0.017)))

# view the first five rows (rendered as a table in the browser)
knitr::kable(head(x = simulated_data, n = 5))
| Bands | Gross_Family_Income | Geographical_Location | Poverty_Probability_Index | Orphans | Disability | Number_of_Dependents | Program_Costs_KES | Gender |
|---|---|---|---|---|---|---|---|---|
| 2 | 18621 | 36 | 0.1751991 | 1 | 0 | 4 | 510735.4 | 0 |
| 2 | 6235 | 39 | 0.2081443 | 1 | 0 | 5 | 466547.2 | 0 |
| 3 | 33812 | 33 | 0.5994036 | 0 | 0 | 3 | 458893.0 | 0 |
| 4 | 30154 | 22 | 0.3511708 | 0 | 0 | 3 | 554462.2 | 0 |
| 2 | 27145 | 14 | 0.1675043 | 0 | 0 | 5 | 464203.2 | 1 |
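
The support of each distribution decides which values can appear in a column, so it can help to check the ranges directly. The sketch below redraws three of the columns on their own, using base R and the same distribution calls as the simulation; it is not meant to reproduce the rows shown in the table. The names `bands_1_to_5` and `location_1_to_47` are hypothetical alternatives, assuming the intent is bands coded 1 to 5 and location codes 1 to 47; they are not part of the original simulation.

```r
# Base R only: redraw three columns in isolation and inspect their values
set.seed(1)
n <- 1000

bands    <- rbinom(n = n, size = 5, prob = 0.5)        # support: 0, 1, ..., 5
location <- ceiling(runif(n = n, min = 1, max = 47))   # support: 2, 3, ..., 47
gender   <- rbinom(n = n, size = 3, prob = 0.017)      # support: 0, 1, 2, 3

# Frequency tables and ranges show which values actually occur
print(table(bands))
print(range(location))
print(table(gender))

# Hypothetical alternatives (assumptions, not the author's code):
# shift a Binomial(4, 0.5) draw by one so the bands run from 1 to 5
bands_1_to_5 <- rbinom(n = n, size = 4, prob = 0.5) + 1
# sample location codes uniformly from 1 to 47, so that code 1 can occur
location_1_to_47 <- sample(1:47, size = n, replace = TRUE)

print(table(bands_1_to_5))
print(range(location_1_to_47))
```

Shifting a Binomial(4, 0.5) draw keeps the symmetric, bell-shaped spread across bands while restricting the values to exactly five levels. Whether that shape suits the band allocation is a modelling choice the original does not specify.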
Code
# write data to disk
write.csv(x = simulated_data, row.names = FALSE, file = here::here("Data", "simulated_data.csv"))
saveRDS(object = simulated_data, file = here::here("Data", "simulated_data.rds"))
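
The two output formats behave differently when read back. A CSV stores plain text, so factor columns such as Orphans do not come back as factors unless they are converted again, whereas an RDS file stores the R object itself. A minimal sketch, assuming a small hypothetical data frame `toy` and files written to a temporary directory rather than the project's Data folder:

```r
# Hypothetical two-row data frame with one integer and one factor column
toy <- data.frame(Bands = c(2L, 3L),
                  Orphans = as.factor(c(1, 0)))

# Write to a temporary directory so the sketch does not touch project files
csv_path <- file.path(tempdir(), "toy.csv")
rds_path <- file.path(tempdir(), "toy.rds")

write.csv(x = toy, file = csv_path, row.names = FALSE)
saveRDS(object = toy, file = rds_path)

# Read both files back
from_csv <- read.csv(file = csv_path, stringsAsFactors = FALSE)
from_rds <- readRDS(file = rds_path)

# The CSV round trip drops the factor type; the RDS round trip keeps it
print(is.factor(from_csv$Orphans))
print(is.factor(from_rds$Orphans))
print(identical(toy, from_rds))

# Restore the factor after reading the CSV if it is needed downstream
from_csv$Orphans <- as.factor(from_csv$Orphans)
```

Keeping both files is a reasonable choice: the CSV is portable to other tools, while the RDS preserves column types for later R sessions.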