Title: | Survey Planning Tools |
---|---|
Description: | Tools for sample survey planning, including sample size calculation, estimation of expected precision for the estimates of totals, and calculation of optimal sample size allocation. |
Authors: | Juris Breidaks [aut, cre], Martins Liberts [aut], Janis Jukams [aut] |
Maintainer: | Juris Breidaks <[email protected]> |
License: | GPL (>= 2) |
Version: | 4.0 |
Built: | 2025-02-10 04:13:25 UTC |
Source: | https://github.com/csblatvia/surveyplanning |
Package: | surveyplanning |
Version: | 2.9 |
Date: | 2017-10-26 |
Depends: | R (>= 3.0.0), data.table (>= 1.10.4), stats, laeken |
License: | GPL (>= 2) |
URL: | https://github.com/CSBLatvia/surveyplanning/ |
BugReports: | https://github.com/CSBLatvia/surveyplanning/issues/ |
Index:
dom_optimal_allocation | Optimal sample size allocation |
expsize | Sample size calculation |
expvar | Expected precision for the estimates of totals |
min_count | Minimal count of respondents for the given relative margin of error |
min_prop | Minimal proportion for the given relative margin of error |
MoE_Y | Margin of error for count |
MoE_P | Margin of error for proportion |
optsize | Optimal sample size allocation |
s2 | Population variance estimation |
surveyplanning-package | Survey Planning Tools |
The function computes optimal sample size allocation over strata and domain for population.
dom_optimal_allocation( id, Dom, H, Y, Rh = NULL, deffh = NULL, indicator, sup_w, sup_cv, min_size = 3, correction_before = FALSE, dataset = NULL )
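id |
Variable for unit ID codes. One-dimensional object convertible to one-column data.table, variable name as character, or column number. |
Dom |
Optional variables used to define population domains. If supplied, values are calculated for each domain. An object convertible to data.table, variable names as character vector, or column numbers. |
H |
The unit stratum variable. One-dimensional object convertible to one-column data.table, variable name as character, or column number. |
Y |
Variable of interest. Object convertible to data.table, variable names as character vector, or column numbers. |
Rh |
The expected response rate in each stratum (optional). If not defined, it is assumed to be 1 in each stratum (full response). Object convertible to one-column data.table, variable name as character, or column number. |
deffh |
The expected design effect for the estimate of the variable (optional). If not defined, it is assumed to be 1 for each variable in each stratum. If defined, the variables must be given in the same arrangement as the variables of interest. Object convertible to data.table, variable names as character vector, or column numbers. |
indicator |
Variable identifying fully surveyed units. Object convertible to data.table, variable name as character, or column number. |
sup_w |
Variable for the weight limit in the domain of the stratum. Object convertible to data.table, variable name as character, or column number. |
sup_cv |
Variable for the maximum coefficient of variation (CV), in percent, for the domain. Object convertible to data.table, variable name as character, or column number. |
min_size |
A numeric value for the minimal sample size. |
correction_before |
Logical value, FALSE by default. It controls whether the correction of the sample size is applied before the final step of the calculation or only at the end. |
dataset |
Optional survey data object convertible to data.table. |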
A list with eight data objects:
data |
An object as data.table. |
nh_larger_then_Nh |
An object as data.table. |
dom_strata_size |
An object as data.table. |
dom_size |
An object as data.table. |
size |
An object as data.table. |
dom_strata_expected_precision |
An object as data.table. |
dom_expected_precision |
An object as data.table. |
total_expected_precision |
An object as data.table. |
expsize, optsize, prop_dom_optimal_allocation
library("laeken")
library("data.table")

data("ses")
data <- data.table(ses)

# Build the stratum identifier and the helper columns used by the allocation
data[, H := paste(location, NACE1, size, sep = "_")]
data[, id := .I]
data[, full := 0]      # no unit is flagged as fully surveyed
data[, sup_cv := 10]   # maximum CV, in percent
data[, sup_w := 20]    # weight limit

# vars <- dom_optimal_allocation(id = "id", Dom = "sex",
#                                H = "H", Y = "earnings",
#                                indicator = "full",
#                                sup_w = "sup_w",
#                                sup_cv = "sup_cv",
#                                min_size = 3,
#                                correction_before = FALSE,
#                                dataset = data)
# vars
The function computes minimum sample size for each stratum to achieve defined precision (CV) for the estimates of totals in each stratum. The calculation takes into account expected totals, population variance, expected response rate and design effect in each stratum.
expsize(Yh, H, s2h, poph, Rh = NULL, deffh = NULL, CVh, dataset = NULL)
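Yh |
The expected totals for variables of interest in each stratum. Object convertible to data.table, variable names as character vector, or column numbers. |
H |
The stratum variable. One-dimensional object convertible to one-column data.table, variable name as character, or column number. |
s2h |
The expected population variance for variables of interest in each stratum. Object convertible to data.table, variable names as character vector, or column numbers. |
poph |
Population size in each stratum. One-dimensional object convertible to one-column data.table, variable name as character, or column number. |
Rh |
The expected response rate in each stratum (optional). If not defined, it is assumed to be 1 in each stratum (full response). Object convertible to one-column data.table, variable name as character, or column number. |
deffh |
The expected design effect for the estimates of totals (optional). If not defined, it is assumed to be 1 for each variable in each stratum. Object convertible to data.table, variable names as character vector, or column numbers. |
CVh |
Coefficient of variation (in percent) to be achieved for each stratum. One-dimensional object convertible to one-column data.table, variable name as character, or column number. |
dataset |
Optional survey data object convertible to data.table. |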
A data.table is returned by the function, with variables:
H - stratum,
variable - the name of variable of interest,
estim - total value,
deffh - the expected design effect,
s2h - population variance,
CVh - the expected coefficient of variation,
Rh - the expected response rate,
poph - population size,
nh - minimal sample size to achieve defined precision (CV).
library("data.table")

# One row per stratum, two variables of interest (Yh, Yh1)
data <- data.table(H = 1:3,
                   Yh = 10 * 1:3,
                   Yh1 = 10 * 4:6,
                   s2h = 10 * runif(3),
                   s2h2 = 10 * runif(3),
                   CVh = rep(4.9, 3),
                   poph = 8 * 1:3,
                   Rh = rep(1, 3),
                   deffh = rep(2, 3),
                   deffh2 = rep(3, 3))

size <- expsize(Yh = c("Yh", "Yh1"),
                H = "H",
                s2h = c("s2h", "s2h2"),
                poph = "poph",
                Rh = "Rh",
                deffh = c("deffh", "deffh2"),
                CVh = "CVh",
                dataset = data)
size
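The calculation that expsize describes can be illustrated with the textbook result for simple random sampling without replacement within a stratum. The sketch below is only an assumption about the form of the calculation: it takes the variance of an estimated total to be poph^2 * (1/m - 1/poph) * s2h * deffh for m respondents, and inflates m by 1/Rh. The helper min_size_sketch is hypothetical and not part of the package, and expsize itself may round or treat Rh and deffh differently.

```r
# Hypothetical helper, not part of surveyplanning: textbook minimum sample size
# for a target CV (in percent) under simple random sampling within a stratum.
min_size_sketch <- function(Yh, s2h, poph, CVh, Rh = 1, deffh = 1) {
  # Variance of the total that corresponds to the target CV
  target_var <- (CVh / 100 * Yh)^2
  # Solve poph^2 * (1/m - 1/poph) * s2h * deffh = target_var for m respondents
  m <- poph^2 * s2h * deffh / (target_var + poph * s2h * deffh)
  # Inflate for nonresponse, round up, and never exceed the stratum population
  pmin(ceiling(m / Rh), poph)
}

n_min <- min_size_sketch(Yh = 5000, s2h = 25, poph = 1000, CVh = 5)
n_min
```

A tighter CVh or a larger s2h pushes the result towards poph, at which point the stratum is effectively a census.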
The function computes expected precision as variance, standard error, and coefficient of variation for the estimates.
expvar( Yh, Zh = NULL, H, s2h, nh, poph, Rh = NULL, deffh = NULL, Dom = NULL, dataset = NULL )
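Yh |
The expected totals for variables of interest in each stratum. Object convertible to data.table, variable names as character vector, or column numbers. |
Zh |
Optional variables of denominator for the expected ratio estimation in each stratum. Object convertible to data.table, variable names as character vector, or column numbers. |
H |
The stratum variable. One-dimensional object convertible to one-column data.table, variable name as character, or column number. |
s2h |
The expected population variance for variables of interest in each stratum. Object convertible to data.table, variable names as character vector, or column numbers. |
nh |
Sample size in each stratum. One-dimensional object convertible to one-column data.table, variable name as character, or column number. |
poph |
Population size in each stratum. One-dimensional object convertible to one-column data.table, variable name as character, or column number. |
Rh |
The expected response rate in each stratum (optional). If not defined, it is assumed to be 1 in each stratum (full response). Object convertible to one-column data.table, variable name as character, or column number. |
deffh |
The expected design effect for the estimates of totals (optional). If not defined, it is assumed to be 1 for each variable in each stratum. If defined, the variables must be given in the same arrangement as Yh. Object convertible to data.table, variable names as character vector, or column numbers. |
Dom |
Optional variables used to define population domains. Only domains that are unions of strata can be defined. If supplied, estimated precision is calculated for each domain. An object convertible to data.table, variable names as character vector, or column numbers. |
dataset |
Optional survey data object convertible to data.table. |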
A list with three data objects:
resultH |
An object as data.table. |
resultDom |
An object as data.table. |
result |
An object as data.table. |
library("data.table")

# One row per stratum, two variables of interest (Yh, Yh1)
data <- data.table(H = 1:3,
                   Yh = 10 * 1:3,
                   Yh1 = 10 * 4:6,
                   s2h = 10 * runif(3),
                   s2h2 = 10 * runif(3),
                   nh = 4 * 1:3,
                   poph = 8 * 1:3,
                   Rh = rep(1, 3),
                   deffh = rep(2, 3),
                   deffh2 = rep(3, 3))

vars <- expvar(Yh = c("Yh", "Yh1"),
               H = "H",
               s2h = c("s2h", "s2h2"),
               nh = "nh",
               poph = "poph",
               Rh = "Rh",
               deffh = c("deffh", "deffh2"),
               dataset = data)
vars
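To make the three precision measures reported by expvar concrete, the following sketch computes them for stratum totals under the textbook variance for simple random sampling without replacement. This is an assumption about the form of the calculation rather than the package source; the helper exp_precision_sketch is hypothetical, and expvar may handle Rh, deffh and domains differently.

```r
# Hypothetical helper, not part of surveyplanning: expected variance, standard
# error and CV (in percent) of a stratum total under simple random sampling.
exp_precision_sketch <- function(Yh, s2h, nh, poph, Rh = 1, deffh = 1) {
  m <- nh * Rh                                    # expected number of respondents
  v <- poph^2 * (1 / m - 1 / poph) * s2h * deffh  # variance of the estimated total
  se <- sqrt(v)
  data.frame(var = v, se = se, cv = 100 * se / Yh)
}

res <- exp_precision_sketch(Yh = c(10, 20, 30), s2h = c(2, 4, 6),
                            nh = c(4, 8, 12), poph = c(8, 16, 24))
res

# Strata are sampled independently, so variances add up across strata
total_se <- sqrt(sum(res$var))
total_se
```

Summing variances across strata is also how precision for a domain formed as a union of strata would be obtained under these assumptions.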
The function computes the minimal count of respondents for the given relative margin of error. The calculation takes into account sample size, population size, margin of error, expected response rate and design effect.
min_count(n, pop, RMoE, confidence = 0.95, R = 1, deff_sam = 1, deff_est = 1)
n |
The expected sample size. |
pop |
Population size. |
RMoE |
The expected relative margin of error. |
confidence |
Optional positive value for the confidence level. The default is 0.95. |
R |
The expected response rate (optional). If not defined, it is assumed to be 1 (full-response). |
deff_sam |
The expected design effect of sample design for the estimates (optional). If not defined, it is assumed to be 1. |
deff_est |
The estimated design effect of estimator for the estimates (optional). If not defined, it is assumed to be 1. |
The estimate of minimal count of respondents for the given relative margin of error.
min_count(n = 15e3, pop = 2e6, RMoE = 0.1)

## Not run:
library("data.table")
min_count(n = c(10e3, 15e3, 20e3), pop = 2e6, RMoE = 0.1)

n <- seq(10e3, 30e3, length.out = 11)
# n <- sort(c(n, 22691))
n

RMoE <- seq(.02, .2, length.out = 10)
RMoE

# Evaluate every combination of sample size and relative margin of error
dt <- data.table(n = rep(n, each = length(RMoE)), RMoE = RMoE)
dt[, Y := min_count(n = n, pop = 2.1e6, RMoE = RMoE, R = 1) / 1e3]
dt
## End(Not run)
The function computes the minimal proportion for the given relative margin of error. The calculation takes into account sample size, population size, margin of error, expected response rate and design effect.
min_prop(n, pop, RMoE, confidence = 0.95, R = 1, deff_sam = 1, deff_est = 1)
n |
The expected sample size. |
pop |
Population size. |
RMoE |
The expected relative margin of error. |
confidence |
Optional positive value for the confidence level. The default is 0.95. |
R |
The expected response rate (optional). If not defined, it is assumed to be 1 (full-response). |
deff_sam |
The expected design effect of sample design for the estimates (optional). If not defined, it is assumed to be 1. |
deff_est |
The estimated design effect of estimator for the estimates (optional). If not defined, it is assumed to be 1. |
The estimate of minimal proportion for the given relative margin of error.
min_prop(n = 100, pop = 1000, RMoE = 0.1)
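As an illustration of what a minimal proportion means for min_prop, the sketch below solves for the proportion at which the relative margin of error equals RMoE. It assumes a normal approximation in which the squared margin of error is k * P * (1 - P), with k collecting the squared quantile, the design effects and the finite population correction. The helper min_prop_sketch is hypothetical, and the formula actually used by min_prop may differ.

```r
# Hypothetical helper, not part of surveyplanning. Assumes the normal-approximation
# variance P * (1 - P) / m with a finite population correction.
min_prop_sketch <- function(n, pop, RMoE, confidence = 0.95,
                            R = 1, deff_sam = 1, deff_est = 1) {
  z <- qnorm((1 + confidence) / 2)   # two-sided normal quantile
  m <- n * R                         # expected number of respondents
  k <- z^2 * deff_sam * deff_est * (1 - m / pop) / m
  # Solve sqrt(k * P * (1 - P)) / P = RMoE for P
  k / (RMoE^2 + k)
}

p_min <- min_prop_sketch(n = 100, pop = 1000, RMoE = 0.1)
p_min
```

Under this assumption any proportion smaller than p_min has a relative margin of error above RMoE, because the relative error grows as P shrinks.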
The function computes the margin of error for a proportion. The calculation takes into account the proportion, expected response rate and design effect.
MoE_P(P = 0.5, n, pop, confidence = 0.95, R = 1, deff_sam = 1, deff_est = 1)
P |
The expected proportion for variable of interest. |
n |
The expected sample size. |
pop |
Population size. |
R |
The expected response rate (optional). If not defined, it is assumed to be 1 (full-response). |
deff_sam |
The expected design effect of sample design for the estimates (optional). If not defined, it is assumed to be 1. |
deff_est |
The estimated design effect of estimator for the estimates (optional). If not defined, it is assumed to be 1. |
confidence |
Optional positive value for the confidence level. The default is 0.95. |
The estimate of margin of error for proportion.
library("data.table")

n <- 100
pop <- 1000

MoE_P(P = 0.5, n = n, pop = pop)

# Margin of error over a grid of proportions
DT <- data.table(P = seq(0, 1, 0.01))
DT[, Y := round(pop * P)]
DT[, AMoE := MoE_P(P, n = 100, pop = 1000)]
DT[Y > 0, RMoE := AMoE / Y]
DT
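How the margin of error of a proportion depends on P, the response rate and the design effects can be sketched with the usual normal approximation. The code below is an assumption about the form of the calculation, not the source of MoE_P; the helper moe_p_sketch is hypothetical and combines deff_sam and deff_est by multiplication, which the package may not do.

```r
# Hypothetical helper, not part of surveyplanning. Assumes the normal
# approximation with variance P * (1 - P) / m and a finite population correction.
moe_p_sketch <- function(P = 0.5, n, pop, confidence = 0.95,
                         R = 1, deff_sam = 1, deff_est = 1) {
  z <- qnorm((1 + confidence) / 2)   # two-sided normal quantile
  m <- n * R                         # expected number of respondents
  z * sqrt(deff_sam * deff_est * (1 - m / pop) * P * (1 - P) / m)
}

P_grid <- seq(0, 1, by = 0.25)
moe <- moe_p_sketch(P = P_grid, n = 100, pop = 1000)
data.frame(P = P_grid, moe = round(moe, 4))
```

Under this assumption the margin of error peaks at P = 0.5, which makes 0.5 a conservative choice when the true proportion is unknown.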
The function computes the margin of error for a count. The calculation takes into account the proportion, expected response rate and design effect.
MoE_Y(P = 0.5, n, pop, confidence = 0.95, R = 1, deff_sam = 1, deff_est = 1)
P |
The expected proportion for variable of interest. |
n |
The expected sample size. |
pop |
Population size. |
confidence |
Optional positive value for the confidence level. The default is 0.95. |
R |
The expected response rate (optional). If not defined, it is assumed to be 1 (full-response). |
deff_sam |
The expected design effect of sample design for the estimates (optional). If not defined, it is assumed to be 1. |
deff_est |
The estimated design effect of estimator for the estimates (optional). If not defined, it is assumed to be 1. |
The estimate of margin of error for count.
library("data.table")

n <- 100
pop <- 1000

MoE_Y(P = 0.5, n = n, pop = pop)

# Margin of error over a grid of proportions
DT <- data.table(P = seq(0, 1, 0.01))
DT[, Y := round(pop * P)]
DT[, AMoE := MoE_Y(P, n = 100, pop = 1000)]
DT[Y > 0, RMoE := AMoE / Y]
DT
The function computes optimal sample size allocation over strata.
optsize( H, n, poph, s2h = NULL, Rh = NULL, deffh = NULL, fullsampleh = NULL, dataset = NULL )
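H |
The stratum variable. One-dimensional object convertible to one-column data.table, variable name as character, or column number. |
n |
Total sample size. One-dimensional object with length one. |
poph |
Population size in each stratum. One-dimensional object convertible to one-column data.table, variable name as character, or column number. |
s2h |
The expected population variance for variables of interest in each stratum. Object convertible to data.table, variable names as character vector, or column numbers. |
Rh |
The expected response rate in each stratum (optional). If not defined, it is assumed to be 1 in each stratum (full response). Object convertible to one-column data.table, variable name as character, or column number. |
deffh |
The expected design effect for the estimate of the variable (optional). If not defined, it is assumed to be 1 for each variable in each stratum. If defined, the variables must be given in the same arrangement as the variables of interest. Object convertible to data.table, variable names as character vector, or column numbers. |
fullsampleh |
Variable identifying fully surveyed strata (optional). If not defined, it is assumed to be 1 in each stratum (full-response). Object convertible to one-column data.table, variable name as character, or column number. |
dataset |
Optional survey data object convertible to data.table. |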
An object as data.table, with variables:
H - stratum,
variable - the name of variable for population variance,
s2h - population variance,
Rh - the expected response rate,
deffh - the expected design effect,
poph - population size,
fullsampleh - full sample indicator,
nh - sample size.
If s2h and Rh are not defined, the sample allocation is calculated as proportional allocation (proportional to the population size).
If only Rh is not defined, the sample allocation is calculated as Neyman allocation.
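The two allocation rules named above have standard textbook forms, sketched here for orientation. Proportional allocation splits n in proportion to poph, while Neyman allocation splits it in proportion to poph * sqrt(s2h). The helper alloc_sketch is hypothetical and not part of the package; it ignores Rh, deffh, fullsampleh and rounding, all of which optsize may take into account.

```r
# Hypothetical helper, not part of surveyplanning: textbook allocation of a
# total sample size n across strata.
alloc_sketch <- function(n, poph, s2h = NULL) {
  # Proportional allocation weights strata by size; Neyman allocation weights
  # them by size times the stratum standard deviation.
  size <- if (is.null(s2h)) poph else poph * sqrt(s2h)
  n * size / sum(size)   # unrounded stratum sample sizes
}

poph <- c(8, 16, 24)
prop_alloc <- alloc_sketch(n = 12, poph = poph)
neyman_alloc <- alloc_sketch(n = 12, poph = poph, s2h = c(9, 4, 1))
prop_alloc
neyman_alloc
```

Compared with the proportional split, the Neyman split moves sample towards the small but highly variable first stratum and away from the large homogeneous third one.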
expsize, dom_optimal_allocation
library("data.table")

# One row per stratum, two variance columns (s2h, s2h2)
data <- data.table(H = 1:3,
                   s2h = 10 * runif(3),
                   s2h2 = 10 * runif(3),
                   poph = 8 * 1:3,
                   Rh = rep(1, 3),
                   dd = c(1, 1, 1))

vars <- optsize(H = "H",
                s2h = c("s2h", "s2h2"),
                n = 10, poph = "poph",
                Rh = "Rh",
                fullsampleh = NULL,
                dataset = data)
vars
The function computes optimal sample size allocation over strata and domain for proportion.
prop_dom_optimal_allocation( H, Dom, pop = NULL, R = NULL, deff = NULL, se_max = 0.5, prop = 0.5, min_size = 3, step = 1, unit_level = TRUE, dataset = NULL )
H |
The stratum variable. One-dimensional object convertible to one-column data.table, variable name as character, or column number. |
Dom |
Variables used to define population domains. An object convertible to data.table, variable names as character vector, or column numbers. |
pop |
The population size in each stratum. |
R |
The expected response rate in each stratum (optional). If not defined, it is assumed to be 1 in each stratum (full response). Object convertible to one-column data.table, variable name as character, or column number. |
deff |
The expected design effect for the estimate of the variable (optional). If not defined, it is assumed to be 1 for each variable in each stratum. If defined, the variables must be given in the same arrangement as Yh. Object convertible to data.table, variable names as character vector, or column numbers. |
se_max |
Variable for the maximum standard error (SE) in the domain. |
prop |
The expected proportion. |
min_size |
A numeric value for the minimal sample size. |
step |
A numeric value for the step size. |
unit_level |
A logical value: TRUE if dataset is prepared at unit level, FALSE otherwise. |
dataset |
Optional aggregated survey data object convertible to data.table, with one row for each stratum. |
A list with two data objects:
datah |
An object as data.table. |
aggr_Dom |
An object as data.table. |
expsize, optsize, dom_optimal_allocation
library("data.table")
library("laeken")

data("eusilc")
eusilc <- data.table(eusilc)

# Aggregate to one row per stratum (db040), summing db090 as the population size
dataset <- eusilc[, .(poph = sum(db090)), by = c("db040")]
dataset[, dom := "1"]

res <- prop_dom_optimal_allocation(H = "db040", Dom = "dom",
                                   pop = "poph", R = NULL,
                                   deff = NULL, se_max = 0.5,
                                   prop = 0.5, min_size = 3,
                                   step = 1, unit_level = FALSE,
                                   dataset = dataset)
The function rounds the values in its first argument to the specified number of decimal places (default 0).
round2(x, n)
x |
a numeric vector. |
n |
integer indicating the number of decimal places. |
Rounded value
expsize, dom_optimal_allocation
dar <- 100 * runif(3)
dar
round2(dar, 1)
The function estimates the population variance.
s2(y, w = NULL)
y |
Study variable. |
w |
Survey weight (optional). If not defined, it is assumed to be 1 for each element. |
Population variance or the estimate of population variance.
If w is not defined, the result is equal to the result of the function var.
s2(1:10)
s2(1:10, rep(1:2, each = 5))
all.equal(s2(1:10), var(1:10))
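The statement that s2 reduces to var when w is not defined can be checked against one common weighted variance estimator. The sketch below assumes that estimator, with the sum of weights minus one as the denominator; it is not confirmed to be the formula s2 uses, and the helper s2_sketch is hypothetical.

```r
# Hypothetical helper, not the package's source code: a common weighted
# estimator of the population variance.
s2_sketch <- function(y, w = NULL) {
  if (is.null(w)) w <- rep(1, length(y))   # unit weights by default
  N_hat <- sum(w)                          # estimated population size
  y_bar <- sum(w * y) / N_hat              # weighted mean
  sum(w * (y - y_bar)^2) / (N_hat - 1)
}

# With unit weights the estimator coincides with var()
all.equal(s2_sketch(1:10), var(1:10))

# With integer weights it matches var() on the replicated data
all.equal(s2_sketch(c(1, 3), w = c(2, 2)), var(rep(c(1, 3), times = c(2, 2))))
```

Reading integer weights as replication counts is a convenient sanity check for any weighted variance of this form.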