Title: | Variance Estimation for Sample Surveys by the Ultimate Cluster Method |
---|---|
Description: | Generation of domain variables; linearization of several non-linear population statistics (the ratio of two totals, weighted income percentile, relative median income ratio, at-risk-of-poverty rate, at-risk-of-poverty threshold, Gini coefficient, gender pay gap, aggregate replacement ratio, median income below the at-risk-of-poverty threshold, income quintile share ratio, relative median at-risk-of-poverty gap); computation of regression residuals in case of weight calibration; variance estimation of sample surveys by the ultimate cluster method (Hansen, Hurwitz and Madow, Sample Survey Methods And Theory, vol. I: Methods and Applications; vol. II: Theory. 1953, New York: John Wiley and Sons); variance estimation for longitudinal and cross-sectional measures and measures of change for single-stage and multistage cluster sampling designs (Berger, Y. G., 2015, <doi:10.1111/rssa.12116>). Several other precision measures are derived: standard error, coefficient of variation, margin of error, confidence interval and design effect. |
Authors: | Juris Breidaks [aut], Martins Liberts [aut, cre], Santa Ivanova [aut], Aleksis Jursevskis [ctb], Anthony Damico [ctb], Liliana Roze [ctb], Central Statistical Bureau of Latvia [cph, fnd] |
Maintainer: | Martins Liberts |
License: | EUPL |
Version: | 0.20.3 |
Built: | 2024-11-19 04:40:19 UTC |
Source: | https://github.com/csblatvia/vardpoor |
The function computes extra variables for domain estimation. Each unique row of D defines a domain, and extra variables are computed for each Y variable.
domain(Y, D, dataset = NULL, checking = TRUE)
Argument | Description |
---|---|
Y | Matrix of study variables. Any object convertible to data.table. |
D | Matrix of domain variables. Any object convertible to data.table. |
dataset | Optional survey data object convertible to data.table. |
checking | Optional logical value. If TRUE (the default), the function checks for data preparation errors; if FALSE, no checks are done. |
A numeric data.table containing extra variables for domain estimation.
Carl-Erik Sarndal, Bengt Swensson, Jan Wretman. Model Assisted Survey Sampling. Springer-Verlag, 1992, p.70.
```r
library("vardpoor")

### Example 0
domain(Y = 1, D = "A")

### Example 1
Y1 <- as.matrix(1 : 10)
colnames(Y1) <- "Y1"
D1 <- as.matrix(rep(1, 10))
colnames(D1) <- "D1"
domain(Y = Y1, D = D1)

### Example 2
Y <- matrix(1 : 20, 10, 2)
colnames(Y) <- paste0("Y", 1 : 2)
D <- matrix(rep(1 : 2, each = 5), 10, 1)
colnames(D) <- "D"
domain(Y, D)

### Example 3
Y <- matrix(1 : 20, 10, 2)
colnames(Y) <- paste0("Y", 1 : 2)
D <- matrix(rep(1 : 4, each = 5), 10, 2)
colnames(D) <- paste0("D", 1 : 2)
domain(Y, D)

### Example 4
Y <- matrix(1 : 20, 10, 2)
colnames(Y) <- paste0("Y", 1 : 2)
D <- matrix(c(rep(1 : 2, each = 5), rep(3, 10)), 10, 2)
colnames(D) <- paste0("D", 1 : 2)
domain(Y, D)
```
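The idea behind domain variables can be sketched in base R. This is an assumption-based illustration, not the package's code: we take each extra variable to be the study variable multiplied by the indicator of one domain, where a domain is a unique row of D. The function name domain_vars and its "<variable>__<domain>" column names are hypothetical and will not match the names produced by domain().

```r
# Hypothetical base-R sketch of domain variables (not vardpoor's implementation):
# for every study variable v and every domain d (a unique row of D),
# the extra variable is y_v multiplied by the indicator 1(unit belongs to d).
domain_vars <- function(Y, D) {
  Y <- as.matrix(Y)
  D <- as.matrix(D)
  # Collapse each row of D into one key, so each unique row is one domain
  key <- apply(D, 1, paste, collapse = ".")
  doms <- unique(key)
  cols <- list()
  for (v in colnames(Y)) {
    for (d in doms) {
      # Hypothetical naming scheme "<variable>__<domain key>"
      cols[[paste(v, d, sep = "__")]] <- Y[, v] * (key == d)
    }
  }
  as.data.frame(cols)
}

# Same data as Example 2: two study variables, two domains
Y <- matrix(1 : 20, 10, 2, dimnames = list(NULL, c("Y1", "Y2")))
D <- matrix(rep(1 : 2, each = 5), 10, 1, dimnames = list(NULL, "D"))
domain_vars(Y, D)
```

Under this construction the domain columns of each study variable add up to the original variable, so a domain total can be estimated as an ordinary weighted total of the corresponding extra variable.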
The function computes the estimates of weighted percentiles.
incPercentile( Y, weights = NULL, sort = NULL, Dom = NULL, period = NULL, k = c(20, 80), dataset = NULL, checking = TRUE )
Argument | Description |
---|---|
Y | Study variable (for example equalized disposable income). One-dimensional object convertible to a one-column data.table. |
weights | Optional weight variable. One-dimensional object convertible to a one-column data.table. |
sort | Optional variable to be used as a tie-breaker for sorting. One-dimensional object convertible to a one-column data.table. |
Dom | Optional variables used to define population domains. If supplied, the percentile estimates are computed for each domain. An object convertible to data.table. |
period | Optional variable for survey period. If supplied, the percentile estimates are computed for each survey period. Object convertible to data.table. |
k | A vector of values between 0 and 100 specifying the percentiles to be computed (0 gives the minimum, 100 gives the maximum). |
dataset | Optional survey data object convertible to data.table. |
checking | Optional logical value. If TRUE (the default), the function checks for data preparation errors; if FALSE, no checks are done. |
A data.table containing the estimates of the weighted income percentiles specified by k.
Working group on Statistics on Income and Living Conditions (2004) Common cross-sectional EU indicators based on EU-SILC; the gender pay gap. EU-SILC 131-rev/04, Eurostat.
```r
library("laeken")
library("vardpoor")
data("eusilc")
incPercentile(Y = "eqIncome", weights = "rb050",
              Dom = "db040", dataset = eusilc)
```
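A weighted percentile can be sketched as follows. We assume the common Eurostat-style rule (as far as we know, also the one used by laeken): sort the values, accumulate the weights, and take the first value whose cumulative weight reaches k% of the total weight, averaging it with the next value when the cumulative weight hits k% exactly. The helper weighted_percentile is hypothetical and may differ from incPercentile in edge cases (ties, domains, the sort tie-breaker).

```r
# Hypothetical sketch of a weighted percentile (not vardpoor's source code)
weighted_percentile <- function(y, w = rep(1, length(y)), k = c(20, 80)) {
  o <- order(y)
  y <- y[o]
  w <- w[o]
  cw <- cumsum(w)              # cumulative weights in sorted order
  total <- cw[length(cw)]
  tol <- 1e-10 * total         # tolerance for "exactly equal" comparisons
  vapply(k, function(p) {
    target <- total * p / 100
    j <- which(cw >= target - tol)[1]
    if (abs(cw[j] - target) <= tol && j < length(y)) {
      (y[j] + y[j + 1]) / 2    # exact hit: average with the next value
    } else {
      y[j]
    }
  }, numeric(1))
}

weighted_percentile(1 : 10, k = c(0, 20, 50, 80, 100))  # returns c(1, 2.5, 5.5, 8.5, 10)
```

As in the k argument above, k = 0 returns the minimum and k = 100 the maximum.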
Computes linearized variable for the ratio estimator.
lin.ratio( Y, Z, weight, Dom = NULL, dataset = NULL, percentratio = 1, checking = TRUE )
Argument | Description |
---|---|
Y | Matrix of numerator variables. Any object convertible to data.table. |
Z | Matrix of denominator variables. Any object convertible to data.table. |
weight | Weight variable. One-dimensional object convertible to a one-column data.table. |
Dom | Optional variables used to define population domains. If supplied, the linearized variables are computed for each domain. An object convertible to data.table. |
dataset | Optional survey data object convertible to data.table. |
percentratio | Positive integer value. All linearized variables are multiplied by percentratio. |
checking | Optional logical value. If TRUE (the default), the function checks for data preparation errors; if FALSE, no checks are done. |
The function returns a data.table of the linearized variables for the ratio estimator.
Carl-Erik Sarndal, Bengt Swensson, Jan Wretman. Model Assisted Survey Sampling. Springer-Verlag, 1992, p.178.
See also: domain, vardom, vardomh, vardcros, vardchanges, vardannual
```r
library("data.table")
library("vardpoor")
Y <- data.table(Y = rchisq(10, 3))
Z <- data.table(Z = rchisq(10, 3))
weights <- rep(2, 10)
data.table(Y, Z, weights,
           V1 = lin.ratio(Y, Z, weights, percentratio = 1),
           V10 = lin.ratio(Y, Z, weights, percentratio = 10),
           V100 = lin.ratio(Y, Z, weights, percentratio = 100))
```
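For a ratio of two totals R = Y/Z, the textbook Taylor linearization (Sarndal, Swensson and Wretman 1992) replaces each unit's value by u_k = (y_k - R z_k) / Z, with Y, Z and R estimated from the weighted sample. The sketch below assumes lin.ratio follows this formula and applies percentratio as a plain multiplier; lin_ratio_sketch is a hypothetical name and handles a single variable pair without domains.

```r
# Hypothetical sketch: linearized variable of the ratio of two weighted totals
lin_ratio_sketch <- function(y, z, w, percentratio = 1) {
  Y_hat <- sum(w * y)          # estimated numerator total
  Z_hat <- sum(w * z)          # estimated denominator total
  R_hat <- Y_hat / Z_hat       # estimated ratio
  percentratio * (y - R_hat * z) / Z_hat
}

set.seed(1)
y <- rchisq(10, 3)
z <- rchisq(10, 3)
w <- rep(2, 10)
u <- lin_ratio_sketch(y, z, w)
sum(w * u)   # the weighted total of this linearized variable is zero up to rounding
```

In this approach the variance of the estimated ratio is approximated by the variance of the estimated total of u_k, which is the quantity the ultimate cluster variance estimator is then applied to.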
Estimates the at-risk-of-poverty rate (defined as the proportion of persons with equalized disposable income below at-risk-of-poverty threshold) and computes linearized variable for variance estimation.
linarpr( Y, id = NULL, weight = NULL, Y_thres = NULL, wght_thres = NULL, sort = NULL, Dom = NULL, period = NULL, dataset = NULL, percentage = 60, order_quant = 50, var_name = "lin_arpr", checking = TRUE )
Argument | Description |
---|---|
Y | Study variable (for example equalized disposable income). One-dimensional object convertible to a one-column data.table. |
id | Optional variable for unit ID codes. One-dimensional object convertible to a one-column data.table. |
weight | Optional weight variable. One-dimensional object convertible to a one-column data.table. |
Y_thres | Variable (for example equalized disposable income) used for computation and linearization of the poverty threshold. One-dimensional object convertible to a one-column data.table. |
wght_thres | Weight variable used for computation and linearization of the poverty threshold. One-dimensional object convertible to a one-column data.table. |
sort | Optional variable to be used as a tie-breaker for sorting. One-dimensional object convertible to a one-column data.table. |
Dom | Optional variables used to define population domains. If supplied, estimation and linearization of the at-risk-of-poverty rate is done for each domain. An object convertible to data.table. |
period | Optional variable for survey period. If supplied, estimation and linearization of the at-risk-of-poverty rate is done for each survey period. Object convertible to data.table. |
dataset | Optional survey data object convertible to data.table. |
percentage | A numeric value in the range [0, 100] giving the percentage of the income quantile used as the at-risk-of-poverty threshold. For example, to compute an at-risk-of-poverty threshold equal to 60% of some income quantile, set percentage = 60. |
order_quant | A numeric value in the range [0, 100] giving the order of the income quantile used for the threshold. For example, to compute an at-risk-of-poverty threshold equal to some percentage of median income, set order_quant = 50. |
var_name | A character string specifying the name of the linearized variable. |
checking | Optional logical value. If TRUE (the default), the function checks for data preparation errors; if FALSE, no checks are done. |
The implementation strictly follows the Eurostat definition.
A list with four objects is returned:

- quantile: a data.table containing the estimated value of the quantile used for at-risk-of-poverty threshold estimation.
- threshold: a data.table containing the estimated at-risk-of-poverty threshold.
- value: a data.table containing the estimated at-risk-of-poverty rate (in percentage).
- lin: a data.table containing the linearized variables of the at-risk-of-poverty rate (in percentage).
Working group on Statistics on Income and Living Conditions (2004) Common cross-sectional EU indicators based on EU-SILC; the gender pay gap. EU-SILC 131-rev/04, Eurostat.
Guillaume Osier (2009). Variance estimation for complex indicators of poverty and inequality. Journal of the European Survey Research Association, Vol.3, No.3, pp. 167-195, ISSN 1864-3361, URL https://ojs.ub.uni-konstanz.de/srm/article/view/369.
Jean-Claude Deville (1999). Variance estimation for complex statistics and estimators: linearization and residual techniques. Survey Methodology, 25, 193-203, URL https://www150.statcan.gc.ca/n1/pub/12-001-x/1999002/article/4882-eng.pdf.
See also: linarpt, varpoord, vardcrospoor, vardchangespoor
```r
library("data.table")
library("laeken")
library("vardpoor")
data("eusilc")
dataset1 <- data.table(IDd = paste0("V", 1 : nrow(eusilc)), eusilc)

# Full population
d <- linarpr(Y = "eqIncome", id = "IDd", weight = "rb050",
             Dom = NULL, dataset = dataset1,
             percentage = 60, order_quant = 50L)
d$value

## Not run:
# By domains
dd <- linarpr(Y = "eqIncome", id = "IDd", weight = "rb050",
              Dom = "db040", dataset = dataset1,
              percentage = 60, order_quant = 50L)
dd
## End(Not run)
```
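The point estimate of the at-risk-of-poverty rate can be sketched in a few lines: compute the threshold as percentage/100 times the weighted quantile of order order_quant, then take the weighted share of persons below it. This is an illustration under our own assumptions, not linarpr's code: "below" is taken as strictly below, the quantile rule is the Eurostat-style one sketched for incPercentile, and weighted_quantile and arpr_sketch are hypothetical helpers that ignore domains, periods and linearization.

```r
# Hypothetical helper: weighted quantile of order alpha in [0, 1]
weighted_quantile <- function(y, w, alpha) {
  o <- order(y)
  y <- y[o]
  w <- w[o]
  cw <- cumsum(w)
  target <- alpha * cw[length(cw)]
  tol <- 1e-10 * cw[length(cw)]
  j <- which(cw >= target - tol)[1]
  if (abs(cw[j] - target) <= tol && j < length(y)) (y[j] + y[j + 1]) / 2 else y[j]
}

# Hypothetical sketch of the at-risk-of-poverty rate (point estimate only)
arpr_sketch <- function(y, w = rep(1, length(y)), percentage = 60, order_quant = 50) {
  threshold <- percentage / 100 * weighted_quantile(y, w, order_quant / 100)
  # weighted share (in %) of units with income strictly below the threshold
  100 * sum(w[y < threshold]) / sum(w)
}

arpr_sketch(c(10, 20, 30, 40, 50))                    # median 30, threshold 18: 20
arpr_sketch(c(10, 20, 30, 40, 50), c(3, 1, 1, 1, 1))  # median 20, threshold 12: 300 / 7
```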
Estimates the at-risk-of-poverty threshold (defined as a percentage, usually 60%, of a quantile, usually the median, of equalised disposable income after social transfers) and computes the linearized variable for variance estimation.
linarpt( Y, id = NULL, weight = NULL, sort = NULL, Dom = NULL, period = NULL, dataset = NULL, percentage = 60, order_quant = 50, var_name = "lin_arpt", checking = TRUE )
Argument | Description |
---|---|
Y | Study variable (for example equalised disposable income after social transfers). One-dimensional object convertible to a one-column data.table. |
id | Optional variable for unit ID codes. One-dimensional object convertible to a one-column data.table. |
weight | Optional weight variable. One-dimensional object convertible to a one-column data.table. |
sort | Optional variable to be used as a tie-breaker for sorting. One-dimensional object convertible to a one-column data.table. |
Dom | Optional variables used to define population domains. If supplied, linearization of the at-risk-of-poverty threshold is done for each domain. An object convertible to data.table. |
period | Optional variable for survey period. If supplied, linearization of the at-risk-of-poverty threshold is done for each survey period. Object convertible to data.table. |
dataset | Optional survey data object convertible to data.table. |
percentage | A numeric value in the range [0, 100] giving the percentage of the income quantile used as the poverty threshold. For example, to compute a poverty threshold equal to 60% of some income quantile, set percentage = 60. |
order_quant | A numeric value in the range [0, 100] giving the order of the income quantile used for the threshold. For example, to compute a poverty threshold equal to some percentage of median income, set order_quant = 50. |
var_name | A character string specifying the name of the linearized variable. |
checking | Optional logical value. If TRUE (the default), the function checks for data preparation errors; if FALSE, no checks are done. |
The implementation strictly follows the Eurostat definition.
A list with three objects is returned:

- quantile: a data.table containing the estimated value of the quantile used for at-risk-of-poverty threshold estimation.
- value: a data.table containing the estimated at-risk-of-poverty threshold.
- lin: a data.table containing the linearized variables of the at-risk-of-poverty threshold.
Working group on Statistics on Income and Living Conditions (2004) Common cross-sectional EU indicators based on EU-SILC; the gender pay gap. EU-SILC 131-rev/04, Eurostat.
Guillaume Osier (2009). Variance estimation for complex indicators of poverty and inequality. Journal of the European Survey Research Association, Vol.3, No.3, pp. 167-195, ISSN 1864-3361, URL https://ojs.ub.uni-konstanz.de/srm/article/view/369.
Jean-Claude Deville (1999). Variance estimation for complex statistics and estimators: linearization and residual techniques. Survey Methodology, 25, 193-203, URL https://www150.statcan.gc.ca/n1/pub/12-001-x/1999002/article/4882-eng.pdf.
See also: linarpr, incPercentile, varpoord, vardcrospoor, vardchangespoor
```r
library("data.table")
library("laeken")
library("vardpoor")
data("eusilc")
dataset1 <- data.table(IDd = paste0("V", 1 : nrow(eusilc)), eusilc)

# Full population
d1 <- linarpt(Y = "eqIncome", id = "IDd", weight = "rb050",
              Dom = NULL, dataset = dataset1,
              percentage = 60, order_quant = 50L)
d1$value

## Not run:
# By domains
d2 <- linarpt(Y = "eqIncome", id = "IDd", weight = "rb050",
              Dom = "db040", dataset = dataset1,
              percentage = 60, order_quant = 50L)
d2$value
## End(Not run)
```
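The linearization of a quantile-based threshold can be sketched with the standard result for quantiles used in Deville (1999) and Osier (2009): for q = Q(alpha), the linearized variable is u_k = -(1 / f(q)) (1(y_k <= q) - alpha) / N, so the threshold percentage/100 * q has linearized variable percentage/100 times that. Here the density f(q) is estimated with a Gaussian kernel and bandwidth h = sd(y) * N^(-1/5); that bandwidth rule, the helper names weighted_quantile and arpt_sketch, and the absence of domains are our assumptions, so the numbers need not match linarpt.

```r
# Hypothetical helper: weighted quantile of order alpha in [0, 1]
weighted_quantile <- function(y, w, alpha) {
  o <- order(y)
  y <- y[o]
  w <- w[o]
  cw <- cumsum(w)
  target <- alpha * cw[length(cw)]
  tol <- 1e-10 * cw[length(cw)]
  j <- which(cw >= target - tol)[1]
  if (abs(cw[j] - target) <= tol && j < length(y)) (y[j] + y[j + 1]) / 2 else y[j]
}

# Hypothetical sketch: at-risk-of-poverty threshold and its linearized variable
arpt_sketch <- function(y, w = rep(1, length(y)), percentage = 60, order_quant = 50) {
  alpha <- order_quant / 100
  N <- sum(w)
  q <- weighted_quantile(y, w, alpha)
  # Gaussian kernel density estimate of y at q (bandwidth rule is an assumption)
  mu <- sum(w * y) / N
  h <- sqrt(sum(w * (y - mu)^2) / N) * N^(-1 / 5)
  f_q <- sum(w * dnorm((q - y) / h)) / (N * h)
  lin <- -(percentage / 100) * ((y <= q) - alpha) / (N * f_q)
  list(value = percentage / 100 * q, lin = lin)
}

set.seed(2)
y <- rlnorm(200)
res <- arpt_sketch(y)
res$value     # 60% of the (equal-weight) median of y
```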
Estimates the aggregate replacement ratio (defined as the gross median individual pension income of the population aged 65-74 relative to the gross median individual earnings from work of the population aged 50-59, excluding other social benefits) and computes linearized variable for variance estimation.
linarr( Y, Y_den, id = NULL, age, pl085, month_at_work, weight = NULL, sort = NULL, Dom = NULL, period = NULL, dataset = NULL, order_quant = 50, var_name = "lin_arr", checking = TRUE )
Argument | Description |
---|---|
Y | Numerator variable (for example gross pension income). One-dimensional object convertible to a one-column data.table. |
Y_den | Denominator variable (for example gross individual earnings). One-dimensional object convertible to a one-column data.table. |
id | Optional variable for unit ID codes. One-dimensional object convertible to a one-column data.table. |
age | Age variable. One-dimensional object convertible to a one-column data.table. |
pl085 | Retirement variable (number of months spent in retirement or early retirement). One-dimensional object convertible to a one-column data.table. |
month_at_work | Variable for the total number of months at work (the sum of the number of months spent in full-time work as an employee, in part-time work as an employee, in full-time work as self-employed (including family worker) and in part-time work as self-employed (including family worker)). One-dimensional object convertible to a one-column data.table. |
weight | Optional weight variable. One-dimensional object convertible to a one-column data.table. |
sort | Optional variable to be used as a tie-breaker for sorting. One-dimensional object convertible to a one-column data.table. |
Dom | Optional variables used to define population domains. If supplied, estimation and linearization of the aggregate replacement ratio is done for each domain. An object convertible to data.table. |
period | Optional variable for survey period. If supplied, estimation and linearization of the aggregate replacement ratio is done for each survey period. Object convertible to data.table. |
dataset | Optional survey data object convertible to data.table. |
order_quant | A numeric value in the range [0, 100] giving the order of the quantile computed for the numerator and denominator variables. For example, to use the median, set order_quant = 50. |
var_name | A character string specifying the name of the linearized variable. |
checking | Optional logical value. If TRUE (the default), the function checks for data preparation errors; if FALSE, no checks are done. |
The implementation strictly follows the Eurostat definition.
A list with two objects is returned:

- value: a data.table containing the estimated aggregate replacement ratio.
- lin: a data.table containing the linearized variables of the aggregate replacement ratio.
Working group on Statistics on Income and Living Conditions (2015) Task 5 - Improvement and optimization of calculation of net change. LC-139/15/EN, Eurostat.
Jean-Claude Deville (1999). Variance estimation for complex statistics and estimators: linearization and residual techniques. Survey Methodology, 25, 193-203, URL https://www150.statcan.gc.ca/n1/pub/12-001-x/1999002/article/4882-eng.pdf.
See also: varpoord, vardcrospoor, vardchangespoor
```r
library("data.table")
library("laeken")
library("vardpoor")
data("eusilc")
dataset1 <- data.table(IDd = paste0("V", 1 : nrow(eusilc)), eusilc)
# Simulated retirement and work-month variables (each either 0 or 12 months)
dataset1$pl085 <- 12 * trunc(runif(nrow(dataset1), 0, 2))
dataset1$month_at_work <- 12 * trunc(runif(nrow(dataset1), 0, 2))

# Full population
d <- linarr(Y = "eqIncome", Y_den = "eqIncome", id = "IDd",
            age = "age", pl085 = "pl085",
            month_at_work = "month_at_work", weight = "rb050",
            Dom = NULL, dataset = dataset1, order_quant = 50L)
d$value

## Not run:
# By domains
dd <- linarr(Y = "eqIncome", Y_den = "eqIncome", id = "IDd",
             age = "age", pl085 = "pl085",
             month_at_work = "month_at_work", weight = "rb050",
             Dom = "db040", dataset = dataset1, order_quant = 50L)
dd
## End(Not run)
```
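Following the definition above, the point estimate of the aggregate replacement ratio is the median of the numerator variable among persons aged 65-74 divided by the median of the denominator variable among persons aged 50-59. The base-R sketch below computes exactly that and nothing more: the additional eligibility rules that linarr applies through pl085 and month_at_work are not reproduced, the result is a plain ratio (the package may scale it), and weighted_quantile and arr_sketch are hypothetical helpers.

```r
# Hypothetical helper: weighted quantile of order alpha in [0, 1]
weighted_quantile <- function(y, w, alpha) {
  o <- order(y)
  y <- y[o]
  w <- w[o]
  cw <- cumsum(w)
  target <- alpha * cw[length(cw)]
  tol <- 1e-10 * cw[length(cw)]
  j <- which(cw >= target - tol)[1]
  if (abs(cw[j] - target) <= tol && j < length(y)) (y[j] + y[j + 1]) / 2 else y[j]
}

# Hypothetical sketch of the aggregate replacement ratio (point estimate only);
# the pl085 / month_at_work eligibility rules used by linarr are NOT applied here
arr_sketch <- function(pension, earnings, age, w = rep(1, length(age)), order_quant = 50) {
  alpha <- order_quant / 100
  num <- age >= 65 & age <= 74   # pension recipients aged 65-74
  den <- age >= 50 & age <= 59   # earners aged 50-59
  weighted_quantile(pension[num], w[num], alpha) /
    weighted_quantile(earnings[den], w[den], alpha)
}

age      <- c(52, 55, 58, 66, 70, 73, 30)
pension  <- c(0, 0, 0, 900, 1000, 1100, 0)
earnings <- c(1800, 2000, 2200, 0, 0, 0, 5000)
arr_sketch(pension, earnings, age)   # 1000 / 2000 = 0.5
```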
Estimates the Gini coefficient, which is a measure of inequality, and its linearization.
lingini( Y, id = NULL, weight = NULL, sort = NULL, Dom = NULL, period = NULL, dataset = NULL, var_name = "lin_gini", checking = TRUE )
Argument | Description |
---|---|
Y | Study variable (for example equalized disposable income). One-dimensional object convertible to a one-column data.table. |
id | Optional variable for unit ID codes. One-dimensional object convertible to a one-column data.table. |
weight | Optional weight variable. One-dimensional object convertible to a one-column data.table. |
sort | Optional variable to be used as a tie-breaker for sorting. One-dimensional object convertible to a one-column data.table. |
Dom | Optional variables used to define population domains. If supplied, linearization of the Gini coefficient is done for each domain. An object convertible to data.table. |
period | Optional variable for survey period. If supplied, linearization of the Gini coefficient is done for each time period. Object convertible to data.table. |
dataset | Optional survey data object convertible to data.table. |
var_name | A character string specifying the name of the linearized variable. |
checking | Optional logical value. If TRUE (the default), the function checks for data preparation errors; if FALSE, no checks are done. |

A list with two objects is returned by the function:

- value: a data.table containing the estimated Gini coefficients (in percentage).
- lin: a data.table containing the linearized variables of the Gini coefficients (in percentage).
Working group on Statistics on Income and Living Conditions (2004) Common cross-sectional EU indicators based on EU-SILC; the gender pay gap. EU-SILC 131-rev/04, Eurostat.
Guillaume Osier (2009). Variance estimation for complex indicators of poverty and inequality. Journal of the European Survey Research Association, Vol.3, No.3, pp. 167-195, ISSN 1864-3361, URL https://ojs.ub.uni-konstanz.de/srm/article/view/369.
Jean-Claude Deville (1999). Variance estimation for complex statistics and estimators: linearization and residual techniques. Survey Methodology, 25, 193-203, URL https://www150.statcan.gc.ca/n1/pub/12-001-x/1999002/article/4882-eng.pdf.
See also: lingini2, linqsr, varpoord, vardcrospoor, vardchangespoor
```r
library("laeken")
library("data.table")
library("vardpoor")
data("eusilc")
dataset1 <- data.table(IDd = paste0("V", 1 : nrow(eusilc)), eusilc)[1 : 3, ]

# Full population
dat1 <- lingini(Y = "eqIncome", id = "IDd", weight = "rb050",
                dataset = dataset1)
dat1$value

## Not run:
# By domains
dat2 <- lingini(Y = "eqIncome", id = "IDd", weight = "rb050",
                Dom = c("db040"), dataset = dataset1)
dat2$value
## End(Not run)
```
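The weighted Gini coefficient used in EU-SILC can be written, with incomes sorted in ascending order and cw_i the cumulative weight up to unit i, as G = 100 * ((2 * sum(w_i * y_i * cw_i) - sum(w_i^2 * y_i)) / (sum(w_i) * sum(w_i * y_i)) - 1). The sketch below computes only this point estimate; it is our reading of the Eurostat definition rather than lingini's code, gini_sketch is a hypothetical name, and domains, periods and the sort tie-breaker are ignored.

```r
# Hypothetical sketch of the weighted (Eurostat-style) Gini coefficient, in percentage
gini_sketch <- function(y, w = rep(1, length(y))) {
  o <- order(y)
  y <- y[o]
  w <- w[o]
  cw <- cumsum(w)   # cumulative weight of units with income up to y_i
  100 * ((2 * sum(w * y * cw) - sum(w^2 * y)) / (sum(w) * sum(w * y)) - 1)
}

gini_sketch(c(1, 2, 3))   # 200 / 9, about 22.22
gini_sketch(rep(5, 4))    # 0: perfect equality
```

For unit weights the result coincides with the classical mean-absolute-difference Gini; for 1, 2, 3 that is (8/9) / (2 * 2) = 2/9.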
Estimates the Gini coefficient, which is a measure of inequality, and its linearization following Langel and Tille (2012).
lingini2( Y, id = NULL, weight = NULL, sort = NULL, Dom = NULL, period = NULL, dataset = NULL, var_name = "lin_gini2", checking = TRUE )
Argument | Description |
---|---|
Y | Study variable (for example equalized disposable income). One-dimensional object convertible to a one-column data.table. |
id | Optional variable for unit ID codes. One-dimensional object convertible to a one-column data.table. |
weight | Optional weight variable. One-dimensional object convertible to a one-column data.table. |
sort | Optional variable to be used as a tie-breaker for sorting. One-dimensional object convertible to a one-column data.table. |
Dom | Optional variables used to define population domains. If supplied, linearization of the Gini coefficient is done for each domain. An object convertible to data.table. |
period | Optional variable for survey period. If supplied, linearization of the Gini coefficient is done for each time period. Object convertible to data.table. |
dataset | Optional survey data object convertible to data.table. |
var_name | A character string specifying the name of the linearized variable. |
checking | Optional logical value. If TRUE (the default), the function checks for data preparation errors; if FALSE, no checks are done. |
A list with two objects is returned by the function:

- value: a data.table containing the estimated Gini coefficients (in percentage) by Langel and Tille (2012) and by Eurostat.
- lin: a data.table containing the linearized variables of the Gini coefficients (in percentage) by Langel and Tille (2012).
Eric Graf, Yves Tille, Variance Estimation Using Linearization for Poverty and Social Exclusion Indicators, Survey Methodology, June 2014, Vol. 40, No. 1, pp. 61-79, Statistics Canada, Catalogue no. 12-001-X, URL https://www150.statcan.gc.ca/n1/pub/12-001-x/12-001-x2014001-eng.pdf
Jean-Claude Deville (1999). Variance estimation for complex statistics and estimators: linearization and residual techniques. Survey Methodology, 25, 193-203, URL https://www150.statcan.gc.ca/n1/pub/12-001-x/1999002/article/4882-eng.pdf.
Matti Langel, Yves Tille, Corrado Gini, a pioneer in balanced sampling and inequality theory. Metron - International Journal of Statistics, 2011, vol. LXIX, n. 1, pp. 45-65, URL doi:10.1007/BF03263549.
Working group on Statistics on Income and Living Conditions (2004) Common cross-sectional EU indicators based on EU-SILC; the gender pay gap. EU-SILC 131-rev/04, Eurostat.
See also: lingini, linqsr, varpoord, vardcrospoor, vardchangespoor
```r
library("data.table")
library("laeken")
library("vardpoor")
data("eusilc")
dataset1 <- data.table(IDd = paste0("V", 1 : nrow(eusilc)), eusilc)

# Full population
dat1 <- lingini2(Y = "eqIncome", id = "IDd", weight = "rb050",
                 dataset = dataset1)
dat1$value

## Not run:
# By domains
dat2 <- lingini2(Y = "eqIncome", id = "IDd", weight = "rb050",
                 Dom = c("db040"), dataset = dataset1)
dat2$value
## End(Not run)
```
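A linearized variable for the Gini coefficient in the spirit of Langel and Tille can be sketched as follows. With units sorted by income, Nk and Yk the cumulative weight and weighted income up to unit k, and N and Y the totals, we use our transcription z_k = (2 * Nk * (y_k - Yk / Nk) + Y - N * y_k - G * (Y + N * y_k)) / (N * Y), which should be checked against the paper before relying on it. The helper name gini_lin_sketch, the scaling by 100 to match the percentage output, and the neglect of ties are assumptions; lingini2 may differ in these details.

```r
# Hypothetical sketch: Gini coefficient and its linearized variable
gini_lin_sketch <- function(y, w = rep(1, length(y))) {
  o <- order(y)
  ys <- y[o]
  ws <- w[o]
  N <- sum(ws)
  Y <- sum(ws * ys)
  Nk <- cumsum(ws)          # cumulative weight up to unit k
  Yk <- cumsum(ws * ys)     # cumulative weighted income up to unit k
  G <- (2 * sum(ws * Nk * ys) - sum(ws^2 * ys)) / (N * Y) - 1
  z <- (2 * Nk * (ys - Yk / Nk) + Y - N * ys - G * (Y + N * ys)) / (N * Y)
  lin <- numeric(length(y))
  lin[o] <- 100 * z         # back to the original unit order, in percentage
  list(value = 100 * G, lin = lin)
}

set.seed(3)
y <- rexp(50)
w <- runif(50, 1, 3)
res <- gini_lin_sketch(y, w)
sum(w * res$lin)   # the weighted total of this linearized variable is zero up to rounding
```

The zero weighted total is an algebraic property of this particular formula, which makes it a convenient sanity check when experimenting with weights.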
Estimation of gender pay (wage) gap and computation of linearized variables for variance estimation.
lingpg( Y, gender = NULL, id = NULL, weight = NULL, sort = NULL, Dom = NULL, period = NULL, dataset = NULL, var_name = "lin_gpg", checking = TRUE )
Argument | Description |
---|---|
Y | Study variable (for example gross hourly earnings). One-dimensional object convertible to a one-column data.table. |
gender | Numerical variable for gender, where 1 denotes males and 2 denotes females. One-dimensional object convertible to a one-column data.table. |
id | Optional variable for unit ID codes. One-dimensional object convertible to a one-column data.table. |
weight | Optional weight variable. One-dimensional object convertible to a one-column data.table. |
sort | Optional variable to be used as a tie-breaker for sorting. One-dimensional object convertible to a one-column data.table. |
Dom | Optional variables used to define population domains. If supplied, estimation and linearization of the gender pay (wage) gap is done for each domain. An object convertible to data.table. |
period | Optional variable for survey period. If supplied, estimation and linearization of the gender pay (wage) gap is done for each time period. Object convertible to data.table. |
dataset | Optional survey data object convertible to data.table. |
var_name | A character string specifying the name of the linearized variable. |
checking | Optional logical value. If TRUE (the default), the function checks for data preparation errors; if FALSE, no checks are done. |
A list with two objects is returned:

- value: a data.table containing the estimated gender pay (wage) gap (in percentage).
- lin: a data.table containing the linearized variables of the gender pay (wage) gap (in percentage) for variance estimation.
Working group on Statistics on Income and Living Conditions (2004) Common cross-sectional EU indicators based on EU-SILC; the gender pay gap. EU-SILC 131-rev/04, Eurostat.
Guillaume Osier (2009). Variance estimation for complex indicators of poverty and inequality. Journal of the European Survey Research Association, Vol.3, No.3, pp. 167-195, ISSN 1864-3361, URL https://ojs.ub.uni-konstanz.de/srm/article/view/369.
Jean-Claude Deville (1999). Variance estimation for complex statistics and estimators: linearization and residual techniques. Survey Methodology, 25, 193-203, URL https://www150.statcan.gc.ca/n1/pub/12-001-x/1999002/article/4882-eng.pdf.
See also: linqsr, lingini, varpoord, vardcrospoor, vardchangespoor
```r
library("data.table")
library("laeken")
library("vardpoor")
data("ses")
dataset1 <- data.table(ID = paste0("V", 1 : nrow(ses)), ses)
dataset1[, IDnum := .I]
# Recode gender to the numeric coding expected by lingpg (1 = male, 2 = female)
setnames(dataset1, "sex", "sexf")
dataset1[sexf == "male", sex := 1]
dataset1[sexf == "female", sex := 2]

# Full population
gpgs1 <- lingpg(Y = "earningsHour", gender = "sex", id = "IDnum",
                weight = "weights", dataset = dataset1)
gpgs1$value

## Not run:
# Domains by education
gpgs2 <- lingpg(Y = "earningsHour", gender = "sex", id = "IDnum",
                weight = "weights", Dom = "education", dataset = dataset1)
gpgs2$value

# Sort variable
gpgs3 <- lingpg(Y = "earningsHour", gender = "sex", id = "IDnum",
                weight = "weights", sort = "IDnum", Dom = "education",
                dataset = dataset1)
gpgs3$value

# Two survey periods
dataset1[, year := 2010]
dataset2 <- copy(dataset1)
dataset2[, year := 2011]
dataset1 <- rbind(dataset1, dataset2)
gpgs4 <- lingpg(Y = "earningsHour", gender = "sex", id = "IDnum",
                weight = "weights", sort = "IDnum", Dom = "education",
                period = "year", dataset = dataset1)
gpgs4$value
names(gpgs4$lin)
## End(Not run)
```
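Assuming the unadjusted gender pay gap is defined, as in the Eurostat definition, as the difference between the weighted mean earnings of men and women relative to the men's mean, its point estimate can be sketched as below. The 1 = male, 2 = female coding follows the gender argument above; gpg_sketch is a hypothetical name, and no linearization, domains or periods are handled.

```r
# Hypothetical sketch of the unadjusted gender pay gap (in percentage)
gpg_sketch <- function(y, gender, w = rep(1, length(y))) {
  male <- gender == 1
  female <- gender == 2
  mean_male <- sum(w[male] * y[male]) / sum(w[male])
  mean_female <- sum(w[female] * y[female]) / sum(w[female])
  100 * (mean_male - mean_female) / mean_male
}

gpg_sketch(y = c(18, 22, 15, 17), gender = c(1, 1, 2, 2))  # (20 - 16) / 20: 20
```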
Estimation of the median income of individuals below At Risk of Poverty Threshold and computation of linearized variable for variance estimation. The At Risk of Poverty Threshold is estimated for the whole population always. The median income is estimated for the whole population or for each domain.
linpoormed( Y, id = NULL, weight = NULL, sort = NULL, Dom = NULL, period = NULL, dataset = NULL, percentage = 60, order_quant = 50, var_name = "lin_poormed", checking = TRUE )
Argument | Description |
---|---|
Y | Study variable (for example equalized disposable income). One-dimensional object convertible to a one-column data.table. |
id | Optional variable for unit ID codes. One-dimensional object convertible to a one-column data.table. |
weight | Optional weight variable. One-dimensional object convertible to a one-column data.table. |
sort | Optional variable to be used as a tie-breaker for sorting. One-dimensional object convertible to a one-column data.table. |
Dom | Optional variables used to define population domains. If supplied, linearization of the median income of persons below the poverty threshold is done for each domain. An object convertible to data.table. |
period | Optional variable for survey period. If supplied, linearization of the median income of persons below the poverty threshold is done for each time period. Object convertible to data.table. |
dataset | Optional survey data object convertible to data.table. |
percentage | A numeric value in the range [0, 100] giving the percentage of the income quantile used as the poverty threshold. For example, to compute a poverty threshold equal to 60% of some income quantile, set percentage = 60. |
order_quant | A numeric value in the range [0, 100] giving the order of the income quantile used for the threshold. For example, to compute a poverty threshold equal to some percentage of median income, set order_quant = 50. |
var_name | A character string specifying the name of the linearized variable. |
checking | Optional logical value. If TRUE (the default), the function checks for data preparation errors; if FALSE, no checks are done. |
A list with two objects is returned by the function:
value
- a data.table containing the estimated median income of individuals below the At Risk of Poverty Threshold.
lin
- a data.table containing the linearized variables of the median income below the At Risk of Poverty Threshold.
Working group on Statistics on Income and Living Conditions (2004) Common cross-sectional EU indicators based on EU-SILC; the gender pay gap. EU-SILC 131-rev/04, Eurostat.
Guillaume Osier (2009). Variance estimation for complex indicators of poverty and inequality. Journal of the European Survey Research Association, Vol.3, No.3, pp. 167-195, ISSN 1864-3361, URL https://ojs.ub.uni-konstanz.de/srm/article/view/369.
Jean-Claude Deville (1999). Variance estimation for complex statistics and estimators: linearization and residual techniques. Survey Methodology, 25, 193-203, URL https://www150.statcan.gc.ca/n1/pub/12-001-x/1999002/article/4882-eng.pdf.
linarpt, linrmpg, varpoord, vardcrospoor, vardchangespoor
library("laeken")
library("data.table")
data("eusilc")
dataset1 <- data.table(IDd = paste0("V", 1 : nrow(eusilc)), eusilc)
# Full population
d <- linpoormed(Y = "eqIncome", id = "IDd", weight = "rb050",
                Dom = NULL, dataset = dataset1,
                percentage = 60, order_quant = 50L)
## Not run:
# Domains by location of household
dd <- linpoormed(Y = "eqIncome", id = "IDd", weight = "rb050",
                 Dom = "db040", dataset = dataset1,
                 percentage = 60, order_quant = 50L)
dd
## End(Not run)
Estimates the Quintile Share Ratio, defined as the ratio of the sum of equalized disposable income received by the top 20% of the population to the sum received by the bottom 20%, and computes its linearization.
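A minimal base R sketch of the point estimate follows. It assumes the common Eurostat/laeken convention that the top group consists of incomes above the (100 - alpha)th percentile and the bottom group of incomes at or below the alpha-th percentile. The helper names are hypothetical, the weighted quantile rule is simplified (incPercentile may differ at ties), and no linearization is computed.

```r
# Hypothetical helper: simplified weighted quantile (assumption)
wquantile <- function(x, w, p) {
  o <- order(x)
  cw <- cumsum(w[o]) / sum(w)
  x[o][which(cw >= p)[1]]
}

# Hypothetical sketch of the weighted quintile share ratio (alpha = 20)
qsr <- function(y, w, alpha = 20) {
  q_low  <- wquantile(y, w, alpha / 100)        # e.g. 20th percentile
  q_high <- wquantile(y, w, 1 - alpha / 100)    # e.g. 80th percentile
  top    <- sum(w[y > q_high] * y[y > q_high])  # income of the top group
  bottom <- sum(w[y <= q_low] * y[y <= q_low])  # income of the bottom group
  top / bottom
}

qsr(1:10, rep(1, 10))
```

For incomes 1 to 10 with equal weights, the top group is {9, 10} and the bottom group is {1, 2}, so the ratio is 19 / 3.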
linqsr( Y, id = NULL, weight = NULL, sort = NULL, Dom = NULL, period = NULL, dataset = NULL, alpha = 20, var_name = "lin_qsr", checking = TRUE )
Y |
Study variable (for example equalized disposable income). One dimensional object convertible to one-column data.table or variable name as character, column number. |
id |
Optional variable for unit ID codes. One dimensional object convertible to one-column data.table or variable name as character, column number. |
weight |
Optional weight variable. One dimensional object convertible to one-column data.table or variable name as character, column number. |
sort |
Optional variable to be used as tie-breaker for sorting. One dimensional object convertible to one-column data.table or variable name as character, column number. |
Dom |
Optional variables used to define population domains. If supplied, linearization of the income quantile share ratio is done for each domain. An object convertible to data.table or variable names as character, column numbers. |
period |
Optional variable for survey period. If supplied, linearization of the income quantile share ratio is done for each time period. Object convertible to data.table or variable names as character, column numbers. |
dataset |
Optional survey data object convertible to data.table. |
alpha |
A numeric value (in percent) giving the share of the population at each end of the income distribution; the default alpha = 20 gives the quintile share ratio (top 20% versus bottom 20%). |
var_name |
A character specifying the name of the linearized variable. |
checking |
Optional logical value. If TRUE (the default), the function checks for data preparation errors; otherwise no checks are performed. |
A list with two objects is returned by the function:
value
- a data.table containing the estimated Quintile Share Ratio, following the papers by G. Osier and Eurostat.
lin
- a data.table containing the linearized variables of the Quintile Share Ratio, following the paper by G. Osier.
Working group on Statistics on Income and Living Conditions (2004) Common cross-sectional EU indicators based on EU-SILC; the gender pay gap. EU-SILC 131-rev/04, Eurostat.
Guillaume Osier (2009). Variance estimation for complex indicators of poverty and inequality. Journal of the European Survey Research Association, Vol.3, No.3, pp. 167-195, ISSN 1864-3361, URL https://ojs.ub.uni-konstanz.de/srm/article/view/369.
Jean-Claude Deville (1999). Variance estimation for complex statistics and estimators: linearization and residual techniques. Survey Methodology, 25, 193-203, URL https://www150.statcan.gc.ca/n1/pub/12-001-x/1999002/article/4882-eng.pdf.
incPercentile, varpoord, vardcrospoor, vardchangespoor
library("data.table")
library("laeken")
data("eusilc")
dataset1 <- data.table(IDd = paste0("V", 1 : nrow(eusilc)), eusilc)
# Full population
dd <- linqsr(Y = "eqIncome", id = "IDd", weight = "rb050",
             Dom = NULL, dataset = dataset1, alpha = 20)
dd$value
## Not run:
# By domains
dd <- linqsr(Y = "eqIncome", id = "IDd", weight = "rb050",
             Dom = "db040", dataset = dataset1, alpha = 20)
dd$value
## End(Not run)
Estimates the relative median income ratio (defined as the ratio of the median equivalised disposable income of people aged 65 and over to the median equivalised disposable income of people aged below 65) and computes the linearized variable for variance estimation.
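As an illustration of the definition above, the following base R sketch computes the point estimate as a ratio of two weighted medians. It is a hypothetical helper, not vardpoor's code: it assumes age 65 is the cut-off (65 and over versus below 65), uses a simplified weighted quantile rule, and computes no linearization.

```r
# Hypothetical helper: simplified weighted quantile (assumption)
wquantile <- function(x, w, p) {
  o <- order(x)
  cw <- cumsum(w[o]) / sum(w)
  x[o][which(cw >= p)[1]]
}

# Hypothetical sketch: weighted median income of persons aged 65 and over
# divided by that of persons aged below 65 (point estimate only)
rmir <- function(y, age, w, order_quant = 50) {
  old <- age >= 65
  p <- order_quant / 100
  wquantile(y[old], w[old], p) / wquantile(y[!old], w[!old], p)
}

rmir(y = c(10, 20, 30, 40), age = c(70, 80, 30, 40), w = rep(1, 4))
```

Here the older group's weighted median is 10 and the younger group's is 30, so the ratio is 1/3.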
linrmir( Y, id = NULL, age, weight = NULL, sort = NULL, Dom = NULL, period = NULL, dataset = NULL, order_quant = 50, var_name = "lin_rmir", checking = TRUE )
Y |
Study variable (for example equalized disposable income). One dimensional object convertible to one-column data.table or variable name as character, column number. |
id |
Optional variable for unit ID codes. One dimensional object convertible to one-column data.table or variable name as character, column number. |
age |
Age variable. One dimensional object convertible to one-column data.table or variable name as character, column number. |
weight |
Optional weight variable. One dimensional object convertible to one-column data.table or variable name as character, column number. |
sort |
Optional variable to be used as tie-breaker for sorting. One dimensional object convertible to one-column data.table or variable name as character, column number. |
Dom |
Optional variables used to define population domains. If supplied, linearization of the relative median income ratio is done for each domain. An object convertible to data.table or variable names as character, column numbers. |
period |
Optional variable for survey period. If supplied, linearization of the relative median income ratio is done for each survey period. Object convertible to data.table or variable names as character, column numbers. |
dataset |
Optional survey data object convertible to data.table. |
order_quant |
A numeric value in the range [0, 100] giving the order of the income quantile compared between the two age groups. For example, to compute the relative median income ratio (a ratio of medians), set order_quant = 50. |
var_name |
A character specifying the name of the linearized variable. |
checking |
Optional logical value. If TRUE (the default), the function checks for data preparation errors; otherwise no checks are performed. |
The implementation strictly follows the Eurostat definition.
A list with the following objects is returned:
value
- a data.table containing the estimated relative median income ratio.
lin
- a data.table containing the linearized variables of the relative median income ratio.
Working group on Statistics on Income and Living Conditions (2015) Task 5 - Improvement and optimization of calculation of net change. LC- 139/15/EN, Eurostat.
Jean-Claude Deville (1999). Variance estimation for complex statistics and estimators: linearization and residual techniques. Survey Methodology, 25, 193-203, URL https://www150.statcan.gc.ca/n1/pub/12-001-x/1999002/article/4882-eng.pdf.
varpoord, vardcrospoor, vardchangespoor
library("laeken")
library("data.table")
data("eusilc")
dataset1 <- data.table(IDd = paste0("V", 1 : nrow(eusilc)), eusilc)
# Full population
d <- linrmir(Y = "eqIncome", id = "IDd", age = "age",
             weight = "rb050", Dom = NULL,
             dataset = dataset1, order_quant = 50L)
## Not run:
# By domains
dd <- linrmir(Y = "eqIncome", id = "IDd", age = "age",
              weight = "rb050", Dom = "db040",
              dataset = dataset1, order_quant = 50L)
dd
## End(Not run)
Estimates the relative median at-risk-of-poverty gap and computes its linearization. The gap is defined as the difference between the At Risk of Poverty Threshold and the median equalized disposable income of persons below that threshold, expressed as a percentage of the threshold.
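The definition above can be sketched in base R as follows. The helper names are hypothetical and this is not vardpoor's implementation: it assumes a simplified weighted quantile rule, treats "below the threshold" as strictly below, and returns only the point estimate in percent.

```r
# Hypothetical helper: simplified weighted quantile (assumption)
wquantile <- function(x, w, p) {
  o <- order(x)
  cw <- cumsum(w[o]) / sum(w)
  x[o][which(cw >= p)[1]]
}

# Hypothetical sketch: relative median at-risk-of-poverty gap in percent
rmpg <- function(y, w, percentage = 60, order_quant = 50) {
  arpt <- percentage / 100 * wquantile(y, w, order_quant / 100)  # threshold
  poor <- y < arpt                              # strictly below the threshold
  poor_med <- wquantile(y[poor], w[poor], 0.5)  # median income of the poor
  100 * (arpt - poor_med) / arpt
}

rmpg(c(10, 20, 30, 40, 100), rep(1, 5))
```

With a threshold of 18 and a median income of 10 among persons below it, the gap is 100 * 8 / 18, about 44.4%.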
linrmpg( Y, id = NULL, weight = NULL, sort = NULL, Dom = NULL, period = NULL, dataset = NULL, percentage = 60, order_quant = 50, var_name = "lin_rmpg", checking = TRUE )
Y |
Study variable (for example equalized disposable income). One dimensional object convertible to one-column data.table or variable name as character, column number. |
id |
Optional variable for unit ID codes. One dimensional object convertible to one-column data.table or variable name as character, column number. |
weight |
Optional weight variable. One dimensional object convertible to one-column data.table or variable name as character, column number. |
sort |
Optional variable to be used as tie-breaker for sorting. One dimensional object convertible to one-column data.table or variable name as character, column number. |
Dom |
Optional variables used to define population domains. If supplied, linearization of the relative median at-risk-of-poverty gap is done for each domain. An object convertible to data.table or variable names as character, column numbers. |
period |
Optional variable for survey period. If supplied, linearization of the relative median at-risk-of-poverty gap is done for each time period. Object convertible to data.table or variable names as character, column numbers. |
dataset |
Optional survey data object convertible to data.table. |
percentage |
A numeric value in the range [0, 100] giving the percentage of the income quantile used as the poverty threshold. For example, to compute a poverty threshold equal to 60% of some income quantile, set percentage = 60. |
order_quant |
A numeric value in the range [0, 100] giving the order of the income quantile used for the poverty threshold. For example, to compute a poverty threshold equal to some percentage of median income, set order_quant = 50. |
var_name |
A character specifying the name of the linearized variable. |
checking |
Optional logical value. If TRUE (the default), the function checks for data preparation errors; otherwise no checks are performed. |
A list with two objects is returned by the function.
Working group on Statistics on Income and Living Conditions (2004) Common cross-sectional EU indicators based on EU-SILC; the gender pay gap. EU-SILC 131-rev/04, Eurostat.
Guillaume Osier (2009). Variance estimation for complex indicators of poverty and inequality. Journal of the European Survey Research Association, Vol.3, No.3, pp. 167-195, ISSN 1864-3361, URL https://ojs.ub.uni-konstanz.de/srm/article/view/369.
Jean-Claude Deville (1999). Variance estimation for complex statistics and estimators: linearization and residual techniques. Survey Methodology, 25, 193-203, URL https://www150.statcan.gc.ca/n1/pub/12-001-x/1999002/article/4882-eng.pdf.
linarpt, linarpr, linpoormed, varpoord, vardcrospoor, vardchangespoor
library("data.table")
library("laeken")
data("eusilc")
dataset1 <- data.table(IDd = paste0("V", 1 : nrow(eusilc)), eusilc)
# Full population
d <- linrmpg(Y = "eqIncome", id = "IDd", weight = "rb050",
             Dom = NULL, dataset = dataset1,
             percentage = 60, order_quant = 50L)
d$value
d$threshold
## Not run:
# By domains
dd <- linrmpg(Y = "eqIncome", id = "IDd", weight = "rb050",
              Dom = "db040", dataset = dataset1,
              percentage = 60, order_quant = 50L)
dd$value
## End(Not run)
Computes the regression residuals used for variance estimation when the weights are calibrated.
residual_est(Y, X, weight, q, dataset = NULL, checking = TRUE)
Y |
Matrix of the variables of interest. |
X |
Matrix of the auxiliary variables for the calibration estimator. This is the matrix of the sample calibration variables. |
weight |
Weight variable. One dimensional object convertible to one-column data.table or variable name as character, column number. |
q |
Variable of positive values accounting for heteroscedasticity. One dimensional object convertible to one-column data.table or variable name as character, column number. |
dataset |
Optional survey data object convertible to data.table. |
checking |
Optional logical value. If TRUE (the default), the function checks for data preparation errors; otherwise no checks are performed. |
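The function implements the following estimator:
$$e_k = y_k - x_k^{\top}\hat{B},$$
where $y_k$, $x_k$, $w_k$ and $q_k$ are the values of Y, X, weight and q for unit $k$ of the sample $s$, and
$$\hat{B} = \left(\sum_{k \in s} w_k q_k x_k x_k^{\top}\right)^{-1} \sum_{k \in s} w_k q_k x_k y_k.$$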
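The estimator can be reproduced with a few lines of base R by solving the weighted normal equations directly. This is a sketch, not vardpoor's implementation, and `calib_residuals()` is a hypothetical name. As in the package example, its residuals should agree with those of `lm(Y ~ X - 1, weights = w * q)`.

```r
# Hypothetical sketch of the calibration residuals e = Y - X B
calib_residuals <- function(Y, X, w, q) {
  Y <- as.matrix(Y)
  X <- as.matrix(X)
  wq <- w * q                                   # per-unit factors w_k * q_k
  # B = (sum w_k q_k x_k x_k')^{-1} sum w_k q_k x_k y_k
  B <- solve(crossprod(X, wq * X), crossprod(X, wq * Y))
  list(residuals = Y - X %*% B, betas = B)
}

set.seed(1)
Y <- matrix(rchisq(10, 3), 10, 1)
X <- matrix(c(rchisq(10, 2), rchisq(10, 2) + 10), 10, 2)
w <- rep(2, 10)
q <- rep(1, 10)
res <- calib_residuals(Y, X, w, q)
# Largest difference from weighted least squares without intercept
max(abs(res$residuals - lm(Y ~ X - 1, weights = w * q)$residuals))
```

The residuals are orthogonal to the columns of X in the weighted inner product, which is why they are suitable inputs for variance estimation of calibrated totals.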
A list with two objects is returned by the function:
residuals
- a numeric data.table containing the estimated residuals of calibration.
betas
- a numeric data.table containing the estimated coefficients of calibration.
Sixten Lundstrom and Carl-Erik Sarndal. Estimation in the presence of Nonresponse and Frame Imperfections. Statistics Sweden, 2001, p. 43-44.
domain, lin.ratio, linarpr, linarpt, lingini, lingini2, lingpg, linpoormed, linqsr, linrmpg, vardom, vardomh, varpoord, variance_est, variance_othstr
Y <- matrix(rchisq(10, 3), 10, 1)
X <- matrix(rchisq(20, 3), 10, 2)
w <- rep(2, 10)
q <- rep(1, 10)
residual_est(Y, X, w, q)
### Test2
Y <- matrix(rchisq(10, 3), 10, 1)
X <- matrix(c(rchisq(10, 2), rchisq(10, 2) + 10), 10, 2)
w <- rep(2, 10)
q <- rep(1, 10)
residual_est(Y, X, w, q)
as.matrix(lm(Y ~ X - 1, weights = w * q)$residuals)
Computes variance estimates under simple random sampling.
var_srs(Y, w = rep(1, length(Y)))
Y |
The variables of interest. |
w |
Weight variable. One dimensional object convertible to one-column data.table. |
A list with two objects is returned by the function:
S2p
- a data.table containing the estimated population variances.
varsrs
- a data.table containing the variance estimates under simple random sampling.
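As a rough illustration only, the textbook without-replacement formula for the variance of an estimated total under simple random sampling is sketched below. The exact formula used by var_srs is not stated here, so the weighted estimate of the population variance S2p, the finite population correction and the name `srs_variance()` are all assumptions.

```r
# Hypothetical sketch: variance of an estimated total under simple random
# sampling without replacement, with weights used to estimate N and S^2
srs_variance <- function(y, w = rep(1, length(y))) {
  n <- length(y)
  N <- sum(w)                                 # estimated population size
  ybar <- sum(w * y) / N                      # weighted mean
  S2p <- sum(w * (y - ybar)^2) / (N - 1)      # estimated population variance
  varsrs <- N^2 * (1 - n / N) * S2p / n       # SRSWOR variance of the total
  list(S2p = S2p, varsrs = varsrs)
}

srs_variance(c(1, 2, 3, 4), rep(2, 4))
```

With all weights equal to 1 the sample is treated as the whole population, the finite population correction is zero, and varsrs is 0.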
Yves G. Berger, Tim Goedeme, Guillaume Osier (2013). Handbook on standard error estimation and other related sampling issues in EU-SILC, URL https://ec.europa.eu/eurostat/cros/content/handbook-standard-error-estimation-and-other-related-sampling-issues-ver-29072013_en
Ys <- matrix(rchisq(10, 3), 10, 1)
ws <- c(rep(2, 5), rep(3, 5))
var_srs(Ys, ws)
Computes variance estimates for annual measures or measures of annual net change for single-stage and multistage cluster sampling designs.
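Variances of net changes are commonly obtained by Taylor linearization: the change is a function of estimated totals for two periods, and its variance is approximated by t(grad) %*% Sigma %*% grad, where grad is the gradient and Sigma the covariance matrix of the totals (compare the grad, rho and var_tau outputs listed below). The base R sketch applies this to the change of a ratio between two periods. `net_change_var()` is hypothetical, and Sigma is supplied directly here rather than estimated from the sampling design as vardannual does.

```r
# Hypothetical sketch: delta-method variance of the change of a ratio
# R_t = Y_t / Z_t between period 1 and period 2, Delta = R_2 - R_1
net_change_var <- function(tot, Sigma) {
  # tot:   named vector c(Y1, Z1, Y2, Z2) of estimated totals
  # Sigma: 4 x 4 covariance matrix of the totals, in the same order
  grad <- c(-1 / tot[["Z1"]],                # d Delta / d Y1
            tot[["Y1"]] / tot[["Z1"]]^2,     # d Delta / d Z1
            1 / tot[["Z2"]],                 # d Delta / d Y2
            -tot[["Y2"]] / tot[["Z2"]]^2)    # d Delta / d Z2
  v <- drop(t(grad) %*% Sigma %*% grad)
  list(estim = tot[["Y2"]] / tot[["Z2"]] - tot[["Y1"]] / tot[["Z1"]],
       var = v, se = sqrt(v))
}

# Covariance built from standard errors and an assumed correlation matrix
sds <- c(2, 5, 2, 5)
rho <- diag(4)
rho[1, 3] <- rho[3, 1] <- 0.5               # correlation of Y across periods
Sigma <- diag(sds) %*% rho %*% diag(sds)
net_change_var(c(Y1 = 20, Z1 = 100, Y2 = 30, Z2 = 100), Sigma)
```

A positive correlation between the two periods lowers the variance of the change compared with treating the periods as independent, which is why overlapping samples matter for measures of change.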
vardannual( Y, H, PSU, w_final, ID_level1, ID_level2, Dom = NULL, Z = NULL, gender = NULL, country = NULL, years, subperiods, dataset = NULL, year1 = NULL, year2 = NULL, X = NULL, countryX = NULL, yearsX = NULL, subperiodsX = NULL, X_ID_level1 = NULL, ind_gr = NULL, g = NULL, q = NULL, datasetX = NULL, frate = 0, percentratio = 1, use.estVar = FALSE, use.gender = FALSE, confidence = 0.95, method = "cros" )
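Y |
Variables of interest. Object convertible to data.table or variable names as character, column numbers. |
H |
The unit stratum variable. One dimensional object convertible to one-column data.table or variable name as character, column number. |
PSU |
Primary sampling unit variable. One dimensional object convertible to one-column data.table or variable name as character, column number. |
w_final |
Weight variable. One dimensional object convertible to one-column data.table or variable name as character, column number. |
ID_level1 |
Variable for level1 ID codes. One dimensional object convertible to one-column data.table or variable name as character, column number. |
Dom |
Optional variables used to define population domains. If supplied, variables are calculated for each domain. An object convertible to data.table or variable names as character, column numbers. |
Z |
Optional variables of denominator for ratio estimation. If supplied, the ratio estimation is computed. Object convertible to data.table or variable names as character, column numbers. |
gender |
Numerical variable for gender, where 1 denotes males and 2 denotes females. One dimensional object convertible to one-column data.table or variable name as character, column number. |
country |
Variable for the survey countries. The values for each country are computed independently. Object convertible to data.table or variable names as character, column numbers. |
years |
Variable for all survey years. The values for each year are computed independently. Object convertible to data.table or variable names as character, column numbers. |
subperiods |
Variable for all survey sub-periods. The values for each sub-period are computed independently. Object convertible to data.table or variable names as character, column numbers. |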
year1 |
The vector of years from variable |
year2 |
The vector of years from variable |
X |
Optional matrix of the auxiliary variables for the calibration estimator. Object convertible to data.table or variable names as character, column numbers. |
countryX |
Optional variable for the survey countries. The values for each country are computed independently. Object convertible to data.table or variable names as character, column numbers. |
yearsX |
Variable for all survey years. If supplied, residual estimation of calibration is done independently for each time period. Object convertible to data.table or variable names as character, column numbers. |
subperiodsX |
Variable for all survey sub-periods. If supplied, residual estimation of calibration is done independently for each time period. Object convertible to data.table or variable names as character, column numbers. |
X_ID_level1 |
Variable for level1 ID codes. One dimensional object convertible to one-column data.table or variable name as character, column number. |
ind_gr |
Optional variable by which the X matrix of auxiliary variables for the calibration is divided into groups that are processed independently. One dimensional object convertible to one-column data.table or variable name as character, column number. |
g |
Optional variable of the g weights. One dimensional object convertible to one-column data.table or variable name as character, column number. |
q |
Variable of positive values accounting for heteroscedasticity. One dimensional object convertible to one-column data.table or variable name as character, column number. |
datasetX |
Optional survey data object at household level convertible to data.table. |
frate |
Non-negative numeric value. Sampling rate in percent; 0 by default. |
percentratio |
Positive numeric value. All linearized variables are multiplied by percentratio; 1 by default. |
use.estVar |
Logical value. If value is |
use.gender |
Logical value. If value is |
confidence |
Optional; a positive value giving the confidence level of the confidence interval, 0.95 by default. |
method |
Character value; 'cros' is for annual measures and 'netchanges' is for measures of annual net change. The default is 'cros'. |
ID_level2 |
Optional variable for unit ID codes. One dimensional object convertible to one-column data.table or variable name as character, column number. |
dataset |
Optional survey data object convertible to data.table. |
A list with the following objects is returned by the function:
crossectional_results
- a data.table
containing:
year
- survey years,
subperiods
- survey sub-periods,
country
- survey countries,
Dom
- optional variable of the population domains,
namesY
- variable with names of variables of interest,
namesZ
- optional variable with names of denominator for ratio estimation,
sample_size
- the sample size (in numbers of individuals),
pop_size
- the population size (in numbers of individuals),
total
- the estimated totals,
variance
- the estimated variance of cross-sectional or longitudinal measures,
sd_w
- the estimated weighted variance of simple random sample,
sd_nw
- the estimated variance of simple random sample,
pop
- the population size (in numbers of households),
sampl_siz
- the sample size (in numbers of households),
stderr_w
- the estimated weighted standard error of simple random sample,
stderr_nw
- the estimated standard error of simple random sample,
se
- the estimated standard error of cross-sectional or longitudinal,
rse
- the estimated relative standard error (coefficient of variation),
cv
- the estimated relative standard error (coefficient of variation) in percentage,
absolute_margin_of_error
- the estimated absolute margin of error,
relative_margin_of_error
- the estimated relative margin of error,
CI_lower
- the estimated confidence interval lower bound,
CI_upper
- the estimated confidence interval upper bound,
confidence_level
- the positive value for confidence interval.
crossectional_var_grad
- a data.table
containing:
year
- survey years,
subperiods
- survey sub-periods,
country
- survey countries,
Dom
- optional variable of the population domains,
namesY
- variable with names of variables of interest,
namesZ
- optional variable with names of denominator for ratio estimation,
grad
- the estimated gradient,
var
- the estimated design-based variance.
vardchanges_grad_var
- a data.table
containing:
year_1
- survey years of years1,
subperiods_1
- survey sub-periods of years1,
year_2
- survey years of years2,
subperiods_2
- survey sub-periods of years2,
country
- survey countries,
Dom
- optional variable of the population domains,
namesY
- variable with names of variables of interest,
namesZ
- optional variable with names of denominator for ratio estimation,
nams
- gradient names, numerator (num) and denominator (den), for each year,
grad
- the estimated gradient,
cros_var
- the estimated design-based variance.
vardchanges_rho
- a data.table
containing:
year
- survey years of years for cross-sectional estimates,
subperiods
- survey sub-periods of years for cross-sectional estimates,
year_1
- survey years of years1,
subperiods_1
- survey sub-periods of years1,
year_2
- survey years of years2,
subperiods_2
- survey sub-periods of years2,
country
- survey countries,
Dom
- optional variable of the population domains,
namesY
- variable with names of variables of interest,
namesZ
- optional variable with names of denominator for ratio estimation,
nams
- gradient names, numerator (num) and denominator (den), for each year,
rho
- the estimated correlation matrix.
vardchanges_var_tau
- a data.table
containing:
year_1
- survey years of years1,
subperiods_1
- survey sub-periods of years1,
year_2
- survey years of years2,
subperiods_2
- survey sub-periods of years2,
country
- survey countries,
Dom
- optional variable of the population domains,
namesY
- variable with names of variables of interest,
namesZ
- optional variable with names of denominator for ratio estimation,
nams
- gradient names, numerator (num) and denominator (den), for each year,
var_tau
- the estimated covariance matrix.
vardchanges_results
- a data.table
containing:
year
- survey years of years for measures of annual,
subperiods
- survey sub-periods of years for measures of annual,
year_1
- survey years of years1 for measures of annual net change,
subperiods_1
- survey sub-periods of years1 for measures of annual net change,
year_2
- survey years of years2 for measures of annual net change,
subperiods_2
- survey sub-periods of years2 for measures of annual net change,
country
- survey countries,
Dom
- optional variable of the population domains,
namesY
- variable with names of variables of interest,
namesZ
- optional variable with names of denominator for ratio estimation,
estim_1
- the estimated value for period1,
estim_2
- the estimated value for period2,
estim
- the estimated value,
var
- the estimated variance,
se
- the estimated standard error,
CI_lower
- the estimated confidence interval lower bound,
CI_upper
- the estimated confidence interval upper bound,
confidence_level
- the positive value for confidence interval,
significant
- whether the difference is significant.
X_annual
- a data.table
containing:
year
- survey years of years for measures of annual,
year_1
- survey years of years1 for measures of annual net change,
year_2
- survey years of years2 for measures of annual net change,
period
- period1 and period2 together,
country
- survey countries,
Dom
- optional variable of the population domains,
namesY
- variable with names of variables of interest,
namesZ
- optional variable with names of denominator for ratio estimation,
cros_se
- the estimated cross-sectional standard error.
A_matrix
- a data.table
containing:
year
- survey years of years1 for measures of annual,
year_1
- survey years of years1 for measures of annual net change,
year_2
- survey years of years2 for measures of annual net change,
country
- survey countries,
Dom
- optional variable of the population domains,
namesY
- variable with names of variables of interest,
namesZ
- optional variable with names of denominator for ratio estimation,
cols
- the estimated matrix_A columns,
matrix_A
- the estimated matrix A.
annual_sum
- a data.table
containing:
year
- survey years,
country
- survey countries,
Dom
- optional variable of the population domains,
namesY
- variable with names of variables of interest,
namesZ
- optional variable with names of denominator for ratio estimation,
totalY
- the estimated value of variables of interest for period1,
totalZ
- optional, the estimated value of the denominator for period2,
estim
- the estimated value for year.
annual_results
- a data.table
containing:
year
- survey years of years for measures of annual,
year_1
- survey years of years1 for measures of annual net change,
year_2
- survey years of years2 for measures of annual net change,
country
- survey countries,
Dom
- optional variable of the population domains,
namesY
- variable with names of variables of interest,
namesZ
- optional variable with names of denominator for ratio estimation,
estim_1
- the estimated value for period1 for measures of annual net change,
estim_2
- the estimated value for period2 for measures of annual net change,
estim
- the estimated value,
var
- the estimated variance,
se
- the estimated standard error,
rse
- the estimated relative standard error (coefficient of variation),
cv
- the estimated relative standard error (coefficient of variation) in percentage,
absolute_margin_of_error
- the estimated absolute margin of error for period1 for measures of annual,
relative_margin_of_error
- the estimated relative margin of error in percentage for measures of annual,
CI_lower
- the estimated confidence interval lower bound,
CI_upper
- the estimated confidence interval upper bound,
confidence_level
- the positive value for confidence interval,
significant
- whether the difference is significant.
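The precision measures listed above all follow from the estimated variance. The sketch below assumes the usual normal-approximation definitions: se is the square root of the variance, cv = 100 * se / estim, the absolute margin of error is z * se with z the standard normal quantile for the chosen confidence level, the relative margin of error is expressed in percent, and a change is flagged as significant when its confidence interval excludes zero. The helper name and these definitions are assumptions; vardannual may differ in details.

```r
# Hypothetical sketch of derived precision measures (normal approximation)
precision_measures <- function(estim, var, confidence = 0.95) {
  se <- sqrt(var)
  z <- qnorm((1 + confidence) / 2)     # two-sided standard normal quantile
  moe <- z * se                        # absolute margin of error
  data.frame(estim = estim, var = var, se = se,
             rse = se / estim,
             cv = 100 * se / estim,
             absolute_margin_of_error = moe,
             relative_margin_of_error = 100 * moe / estim,
             CI_lower = estim - moe,
             CI_upper = estim + moe,
             confidence_level = confidence,
             significant = estim - moe > 0 | estim + moe < 0)
}

precision_measures(estim = 10, var = 4)
```

For an estimate of 10 with variance 4, the standard error is 2, the coefficient of variation is 20%, and the 95% confidence interval (about 6.08 to 13.92) excludes zero.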
Guillaume Osier, Virginie Raymond, (2015), Development of methodology for the estimate of variance of annual net changes for LFS-based indicators. Deliverable 1 - Short document with derivation of the methodology.
Guillaume Osier, Yves Berger, Tim Goedeme, (2013), Standard error estimation for the EU-SILC indicators of poverty and social exclusion, Eurostat Methodologies and Working papers, URL https://ec.europa.eu/eurostat/documents/3888793/5855973/KS-RA-13-024-EN.PDF.
Eurostat Methodologies and Working papers, Handbook on precision requirements and variance estimation for ESS household surveys, 2013, URL https://ec.europa.eu/eurostat/documents/3859598/5927001/KS-RA-13-029-EN.PDF.
Yves G. Berger, Tim Goedeme, Guillaume Osier (2013). Handbook on standard error estimation and other related sampling issues in EU-SILC, URL https://ec.europa.eu/eurostat/cros/content/handbook-standard-error-estimation-and-other-related-sampling-issues-ver-29072013_en.
### Example
library("data.table")
data("eusilc", package = "laeken")
set.seed(1)
eusilc1 <- eusilc[1:20, ]
dataset1 <- data.table(rbind(eusilc1, eusilc1),
                       year = c(rep(2010, nrow(eusilc1)),
                                rep(2011, nrow(eusilc1))))
dataset1[, country := "AT"]
dataset1[, half := .I - 2 * trunc((.I - 1) / 2)]
dataset1[, quarter := .I - 4 * trunc((.I - 1) / 4)]
dataset1[age < 0, age := 0]
PSU <- dataset1[, .N, keyby = "db030"][, N := NULL]
PSU[, PSU := trunc(runif(.N, 0, 5))]
dataset1 <- merge(dataset1, PSU, all = TRUE, by = "db030")
dataset1[, strata := "XXXX"]
dataset1[, employed := trunc(runif(.N, 0, 2))]
dataset1[, unemployed := trunc(runif(.N, 0, 2))]
dataset1[, labour_force := employed + unemployed]
dataset1[, id_lv2 := paste0("V", .I)]
vardannual(Y = "employed", H = "strata", PSU = "PSU", w_final = "rb050",
           ID_level1 = "db030", ID_level2 = "id_lv2",
           Dom = NULL, Z = NULL, years = "year", subperiods = "half",
           dataset = dataset1, percentratio = 100,
           confidence = 0.95, method = "cros")
vardannual(Y = "employed", H = "strata", PSU = "PSU", w_final = "rb050",
           ID_level1 = "db030", ID_level2 = "id_lv2",
           Dom = NULL, Z = NULL, country = "country",
           years = "year", subperiods = "quarter",
           dataset = dataset1, year1 = 2010, year2 = 2011,
           percentratio = 100, confidence = 0.95,
           method = "netchanges")
vardannual(Y = "unemployed", H = "strata", PSU = "PSU", w_final = "rb050",
           ID_level1 = "db030", ID_level2 = "id_lv2",
           Dom = NULL, Z = "labour_force", country = "country",
           years = "year", subperiods = "quarter",
           dataset = dataset1, year1 = 2010, year2 = 2011,
           percentratio = 100, confidence = 0.95,
           method = "netchanges")
Computes variance estimates for measures of change for single-stage and multistage cluster sampling designs.
vardchanges( Y, H, PSU, w_final, ID_level1, ID_level2, Dom = NULL, Z = NULL, gender = NULL, country = NULL, period, dataset = NULL, period1, period2, X = NULL, countryX = NULL, periodX = NULL, X_ID_level1 = NULL, ind_gr = NULL, g = NULL, q = NULL, datasetX = NULL, linratio = FALSE, percentratio = 1, use.estVar = FALSE, outp_res = FALSE, confidence = 0.95, change_type = "absolute", checking = TRUE )
Y |
Variables of interest. Object convertible to |
H |
The unit stratum variable. One dimensional object convertible to one-column |
PSU |
Primary sampling unit variable. One dimensional object convertible to one-column |
w_final |
Weight variable. One dimensional object convertible to one-column |
ID_level1 |
Variable for level1 ID codes. One dimensional object convertible to one-column |
ID_level2 |
Optional variable for unit ID codes. One dimensional object convertible to one-column |
Dom |
Optional variables used to define population domains. If supplied, variables are calculated for each domain. An object convertible to |
Z |
Optional variables of denominator for ratio estimation. If supplied, the ratio estimation is computed. Object convertible to |
gender |
Numerical variable for gender, where 1 denotes males and 2 denotes females. One dimensional object convertible to one-column |
country |
Variable for the survey countries. The values for each country are computed independently. Object convertible to |
period |
Variable for all survey periods. The values for each period are computed independently. Object convertible to |
dataset |
Optional survey data object convertible to |
period1 |
The vector of periods from variable |
period2 |
The vector of periods from variable |
X |
Optional matrix of the auxiliary variables for the calibration estimator. Object convertible to |
countryX |
Optional variable for the survey countries. The values for each country are computed independently. Object convertible to |
periodX |
Optional variable for all survey periods. If supplied, residual estimation of calibration is done independently for each time period. Object convertible to |
X_ID_level1 |
Variable for level1 ID codes. One dimensional object convertible to one-column |
ind_gr |
Optional variable by which the X matrix of auxiliary variables for the calibration is split into independent groups. One dimensional object convertible to one-column |
g |
Optional variable of the g weights. One dimensional object convertible to one-column |
q |
Variable of the positive values accounting for heteroscedasticity. One dimensional object convertible to one-column |
datasetX |
Optional survey data object at household level convertible to |
linratio |
Logical value. If value is |
percentratio |
Positive numeric value. All linearized variables are multiplied with |
use.estVar |
Logical value. If value is |
outp_res |
Logical value. If |
confidence |
Optional positive value giving the confidence level of the confidence interval. The default is 0.95. |
change_type |
Character value giving the type of net change: "absolute" or "relative". |
checking |
Optional logical value. If TRUE, the function checks for data preparation errors; otherwise no checks are made. The default is TRUE. |
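When Z is supplied, the estimated quantity is a ratio of two totals. A common way to estimate its variance is to replace each unit's values by a linearized variable and then apply the ordinary variance estimator to that variable. The base-R sketch below shows the standard Taylor linearization of a ratio of totals; `lin_ratio()` is a hypothetical helper, and scaling both outputs by `percentratio` is an assumption drawn from the description of that argument, not taken from the vardpoor source.

```r
# Hypothetical helper (not part of vardpoor): Taylor linearization of the
# ratio of two weighted totals, R = sum(w * y) / sum(w * z).
lin_ratio <- function(y, z, w, percentratio = 1) {
  Y_hat <- sum(w * y)            # estimated numerator total
  Z_hat <- sum(w * z)            # estimated denominator total
  R_hat <- Y_hat / Z_hat         # estimated ratio
  u <- (y - R_hat * z) / Z_hat   # linearized variable, one value per unit
  # Scaling both outputs by percentratio is assumed from the argument text
  list(ratio = percentratio * R_hat, lin = percentratio * u)
}

# Toy data: unemployed (y) over labour force (z)
y <- c(1, 0, 1, 0, 0)
z <- c(1, 1, 1, 1, 0)
w <- c(10, 20, 10, 20, 30)
res <- lin_ratio(y, z, w, percentratio = 100)
res$ratio   # 20 / 60 expressed in percent
```

The weighted sum of the linearized values is zero by construction; their design-based variance approximates the variance of the ratio.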
The function returns a list with the following objects:
res_out
- a data.table
containing the estimated residuals of calibration with ID_level1 and PSU by periods and countries (if available).
crossectional_results
- a data.table
containing:
period - survey periods,
country - survey countries,
Dom - optional variable of the population domains,
namesY - variable with names of variables of interest,
namesZ - optional variable with names of denominator for ratio estimation,
sample_size - the sample size (in numbers of individuals),
pop_size - the population size (in numbers of individuals),
total - the estimated totals,
variance - the estimated variance of cross-sectional or longitudinal measures,
sd_w - the estimated weighted variance of simple random sample,
sd_nw - the estimated variance of simple random sample,
pop - the population size (in numbers of households),
sampl_siz - the sample size (in numbers of households),
stderr_w - the estimated weighted standard error of simple random sample,
stderr_nw - the estimated standard error of simple random sample,
se - the estimated standard error of cross-sectional or longitudinal measures,
rse - the estimated relative standard error (coefficient of variation),
cv - the estimated relative standard error (coefficient of variation) in percent,
absolute_margin_of_error - the estimated absolute margin of error,
relative_margin_of_error - the estimated relative margin of error,
CI_lower - the estimated confidence interval lower bound,
CI_upper - the estimated confidence interval upper bound.
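The precision measures in this list follow from the estimate and its variance. The sketch below shows the usual normal-approximation relationships for a two-sided interval at the requested `confidence` level; `precision_measures()` is a hypothetical helper, and the package may compute some columns (for example the relative margin of error, omitted here) differently.

```r
# Hypothetical helper (not part of vardpoor): precision measures derived from
# an estimate and its estimated variance, using a normal approximation.
precision_measures <- function(estim, var, confidence = 0.95) {
  se  <- sqrt(var)                       # standard error
  rse <- se / estim                      # relative standard error
  z   <- qnorm(1 - (1 - confidence) / 2) # two-sided normal quantile
  moe <- z * se                          # absolute margin of error
  data.frame(estim = estim, se = se, rse = rse,
             cv = 100 * rse,             # relative standard error in percent
             absolute_margin_of_error = moe,
             CI_lower = estim - moe,
             CI_upper = estim + moe)
}

pm <- precision_measures(estim = 2000, var = 10000)
pm$cv
```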
crossectional_var_grad
- a data.table
containing:
periods - survey periods,
country - survey countries,
Dom - optional variable of the population domains,
namesY - variable with names of variables of interest,
namesZ - optional variable with names of denominator for ratio estimation,
grad - the estimated gradient,
var - the estimated design-based variance.
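crossectional_var_grad reports a gradient alongside the variance. One standard way the two fit together, sketched here under the assumption that the estimate is a ratio of two totals, is the delta method: the variance of R = Y / Z is approximated by g' V g, where g holds the partial derivatives of R with respect to the two totals and V is their covariance matrix. The function and argument names are illustrative only, not vardpoor's internals.

```r
# Hypothetical sketch: delta-method variance of R = Y / Z from the estimated
# totals and their 2 x 2 design-based covariance matrix V.
ratio_var_delta <- function(Y_hat, Z_hat, V) {
  grad <- c(1 / Z_hat, -Y_hat / Z_hat^2)   # partial derivatives dR/dY, dR/dZ
  as.numeric(t(grad) %*% V %*% grad)       # grad' V grad
}

V <- matrix(c(400, 100,
              100, 900), nrow = 2, byrow = TRUE)
v_R <- ratio_var_delta(Y_hat = 200, Z_hat = 1000, V = V)
v_R
```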
rho
- a data.table
containing:
periods_1 - survey periods of periods1,
periods_2 - survey periods of periods2,
country - survey countries,
Dom - optional variable of the population domains,
namesY - variable with names of variables of interest,
namesZ - optional variable with names of denominator for ratio estimation,
nams - the variable names in the correlation matrix,
rho - the estimated correlation matrix.
var_tau
- a data.table
containing:
periods_1 - survey periods of periods1,
periods_2 - survey periods of periods2,
country - survey countries,
Dom - optional variable of the population domains,
namesY - variable with names of variables of interest,
namesZ - optional variable with names of denominator for ratio estimation,
nams - the variable names in the covariance matrix,
var_tau - the estimated covariance matrix.
changes_results
- a data.table
containing:
periods_1 - survey periods of periods1,
periods_2 - survey periods of periods2,
country - survey countries,
Dom - optional variable of the population domains,
namesY - variable with names of variables of interest,
namesZ - optional variable with names of denominator for ratio estimation,
estim_1 - the estimated value for period1,
estim_2 - the estimated value for period2,
estim - the estimated value,
var - the estimated variance,
se - the estimated standard error,
CI_lower - the estimated confidence interval lower bound,
CI_upper - the estimated confidence interval upper bound,
significant - whether the difference is significant.
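For an absolute change, the variance combines both period variances and their covariance (the kind of quantity var_tau holds). The sketch below assumes a normal-approximation interval and treats the change as significant when that interval excludes zero; this is a common convention, not a statement of how vardchanges decides `significant`. `abs_change()` and `cov_mat` are hypothetical names with made-up values.

```r
# Hypothetical sketch (not vardpoor's code): absolute change between two
# periods, its variance from the 2 x 2 covariance matrix of the period
# estimates, a normal-approximation CI, and a significance flag.
abs_change <- function(estim_1, estim_2, cov_mat, confidence = 0.95) {
  a  <- c(-1, 1)                                # gradient of estim_2 - estim_1
  v  <- as.numeric(t(a) %*% cov_mat %*% a)      # var_1 + var_2 - 2 * cov_12
  se <- sqrt(v)
  z  <- qnorm(1 - (1 - confidence) / 2)
  estim <- estim_2 - estim_1
  ci <- estim + c(-1, 1) * z * se
  list(estim = estim, var = v, se = se,
       CI_lower = ci[1], CI_upper = ci[2],
       significant = ci[1] > 0 || ci[2] < 0)    # TRUE if the CI excludes 0
}

cov_mat <- matrix(c(4, 3,
                    3, 5), nrow = 2)
chg <- abs_change(estim_1 = 16, estim_2 = 20, cov_mat = cov_mat)
chg$var          # 3
chg$significant  # TRUE
```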
Guillaume Osier, Yves Berger, Tim Goedeme (2013). Standard error estimation for the EU-SILC indicators of poverty and social exclusion. Eurostat Methodologies and Working papers. URL https://ec.europa.eu/eurostat/documents/3888793/5855973/KS-RA-13-024-EN.PDF.
Eurostat (2013). Handbook on precision requirements and variance estimation for ESS household surveys. Eurostat Methodologies and Working papers. URL https://ec.europa.eu/eurostat/documents/3859598/5927001/KS-RA-13-029-EN.PDF.
Yves G. Berger, Tim Goedeme, Guillaume Osier (2013). Handbook on standard error estimation and other related sampling issues in EU-SILC. URL https://ec.europa.eu/eurostat/cros/content/handbook-standard-error-estimation-and-other-related-sampling-issues-ver-29072013_en
domain, vardcros, vardchangespoor
### Example
    library("data.table")
    library("laeken")
    data("eusilc")
    set.seed(1)
    eusilc1 <- eusilc[1:40, ]
    set.seed(1)
    dataset1 <- data.table(rbind(eusilc1, eusilc1),
                           year = c(rep(2010, nrow(eusilc1)),
                                    rep(2011, nrow(eusilc1))))
    dataset1[age < 0, age := 0]
    PSU <- dataset1[, .N, keyby = "db030"][, N := NULL]
    PSU[, PSU := trunc(runif(nrow(PSU), 0, 5))]
    dataset1 <- merge(dataset1, PSU, all = TRUE, by = "db030")
    PSU <- eusilc <- NULL
    dataset1[, strata := c("XXXX")]
    dataset1[, t_pov := trunc(runif(nrow(dataset1), 0, 2))]
    dataset1[, exp := 1]
    # At-risk-of-poverty (AROP)
    dataset1[, pov := ifelse(t_pov == 1, 1, 0)]
    dataset1[, id_lev2 := paste0("V", .I)]
    result <- vardchanges(Y = "pov", H = "strata", PSU = "PSU", w_final = "rb050",
                          ID_level1 = "db030", ID_level2 = "id_lev2",
                          Dom = NULL, Z = NULL, period = "year",
                          dataset = dataset1, period1 = 2010, period2 = 2011,
                          change_type = "absolute")
    result
    ## Not run:
    data("eusilc")
    dataset1 <- data.table(rbind(eusilc, eusilc),
                           year = c(rep(2010, nrow(eusilc)),
                                    rep(2011, nrow(eusilc))))
    dataset1[age < 0, age := 0]
    PSU <- dataset1[, .N, keyby = "db030"][, N := NULL]
    PSU[, PSU := trunc(runif(nrow(PSU), 0, 100))]
    dataset1 <- merge(dataset1, PSU, all = TRUE, by = "db030")
    PSU <- eusilc <- NULL
    dataset1[, strata := "XXXX"]
    dataset1[, t_pov := trunc(runif(nrow(dataset1), 0, 2))]
    dataset1[, t_dep := trunc(runif(nrow(dataset1), 0, 2))]
    dataset1[, t_lwi := trunc(runif(nrow(dataset1), 0, 2))]
    dataset1[, exp := 1]
    dataset1[, exp2 := 1 * (age < 60)]
    # At-risk-of-poverty (AROP)
    dataset1[, pov := ifelse(t_pov == 1, 1, 0)]
    # Severe material deprivation (DEP)
    dataset1[, dep := ifelse(t_dep == 1, 1, 0)]
    # Low work intensity (LWI)
    dataset1[, lwi := ifelse(t_lwi == 1 & exp2 == 1, 1, 0)]
    # At-risk-of-poverty or social exclusion (AROPE)
    dataset1[, arope := ifelse(pov == 1 | dep == 1 | lwi == 1, 1, 0)]
    dataset1[, dom := 1]
    dataset1[, id_lev2 := .I]
    result <- vardchanges(Y = c("pov", "dep", "lwi", "arope"),
                          H = "strata", PSU = "PSU", w_final = "rb050",
                          ID_level1 = "db030", ID_level2 = "id_lev2",
                          Dom = "rb090", Z = NULL, period = "year",
                          dataset = dataset1, period1 = 2010, period2 = 2011,
                          change_type = "absolute")
    result
    ## End(Not run)
Computes variance estimates for measures of change in indicators of social exclusion and poverty.
vardchangespoor( Y, age = NULL, pl085 = NULL, month_at_work = NULL, Y_den = NULL, Y_thres = NULL, wght_thres = NULL, H, PSU, w_final, ID_level1, ID_level2, Dom = NULL, country = NULL, period, sort = NULL, period1, period2, gender = NULL, dataset = NULL, X = NULL, countryX = NULL, periodX = NULL, X_ID_level1 = NULL, ind_gr = NULL, g = NULL, q = NULL, datasetX = NULL, percentage = 60, order_quant = 50, alpha = 20, use.estVar = FALSE, confidence = 0.95, outp_lin = FALSE, outp_res = FALSE, type = "linrmpg", change_type = "absolute" )
Y |
Study variable (for example equivalized disposable income or gross pension income). One dimensional object convertible to one-column |
age |
Age variable. One dimensional object convertible to one-column |
pl085 |
Retirement variable (Number of months spent in retirement or early retirement). One dimensional object convertible to one-column |
month_at_work |
Variable for the total number of months at work (the sum of the numbers of months spent in full-time work as an employee, in part-time work as an employee, in full-time work as self-employed (including family worker) and in part-time work as self-employed (including family worker)). One dimensional object convertible to one-column |
Y_den |
Denominator variable (for example gross individual earnings). One dimensional object convertible to one-column |
Y_thres |
Variable (for example equivalized disposable income) used for the computation and linearization of the poverty threshold. One dimensional object convertible to one-column |
wght_thres |
Weight variable used for computation and linearization of poverty threshold. One dimensional object convertible to one-column |
H |
The unit stratum variable. One dimensional object convertible to one-column |
PSU |
Primary sampling unit variable. One dimensional object convertible to one-column |
w_final |
Weight variable. One dimensional object convertible to one-column |
ID_level1 |
Variable for level1 ID codes. One dimensional object convertible to one-column |
ID_level2 |
Optional variable for unit ID codes. One dimensional object convertible to one-column |
Dom |
Optional variables used to define population domains. If supplied, variables are calculated for each domain. An object convertible to |
country |
Variable for the survey countries. The values for each country are computed independently. Object convertible to |
period |
Variable for all survey periods. The values for each period are computed independently. Object convertible to |
sort |
Optional variable to be used as tie-breaker for sorting. One dimensional object convertible to one-column |
period1 |
The vector from variable |
period2 |
The vector from variable |
gender |
Numerical variable for gender, where 1 denotes males and 2 denotes females. One dimensional object convertible to one-column |
dataset |
Optional survey data object convertible to |
X |
Optional matrix of the auxiliary variables for the calibration estimator. Object convertible to |
countryX |
Optional variable for the survey countries. The values for each country are computed independently. Object convertible to |
periodX |
Optional variable of the survey periods and countries. If supplied, residual estimation of calibration is done independently for each time period. Object convertible to |
X_ID_level1 |
Variable for level1 ID codes. One dimensional object convertible to one-column |
ind_gr |
Optional variable by which the X matrix of auxiliary variables for the calibration is split into independent groups. One dimensional object convertible to one-column |
g |
Optional variable of the g weights. One dimensional object convertible to one-column |
q |
Variable of the positive values accounting for heteroscedasticity. One dimensional object convertible to one-column |
datasetX |
Optional survey data object at household level convertible to |
percentage |
A numeric value in range
For example, to compute poverty threshold equal to 60% of some income quantile, |
order_quant |
A numeric value in range
For example, to compute poverty threshold equal to some percentage of median income, |
alpha |
a numeric value in range |
use.estVar |
Logical value. If value is |
confidence |
Optional positive value giving the confidence level of the confidence interval. The default is 0.95. |
outp_lin |
Logical value. If |
outp_res |
Logical value. If |
type |
A character vector (of length one unless several.ok is TRUE), for example "linarpr", "linarpt", "lingpg", "linpoormed", "linrmpg", "lingini", "lingini2", "linqsr", "linarr", "linrmir", "all_choices". |
change_type |
Character value giving the type of net change: "absolute" or "relative". |
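The `percentage` and `order_quant` arguments combine into the poverty threshold: with the defaults, 60% of the weighted median (the 50% quantile) of the income variable. The sketch below assumes a simple weighted quantile, the smallest value whose cumulative weight share reaches p; laeken and vardpoor may use a different definition (for instance at ties), so results can differ slightly. The at-risk-of-poverty rate is taken here as the weighted share of units strictly below the threshold. Both function names are hypothetical.

```r
# Hypothetical sketch, not vardpoor's code: at-risk-of-poverty threshold
# (percentage% of the weighted order_quant% quantile of income) and the
# weighted at-risk-of-poverty rate.
weighted_quantile <- function(x, w, p) {
  o <- order(x)
  x <- x[o]
  w <- w[o]
  cw <- cumsum(w) / sum(w)        # cumulative weight share
  x[which(cw >= p)[1]]            # first value reaching share p
}

arop <- function(inc, w, percentage = 60, order_quant = 50) {
  thres <- percentage / 100 * weighted_quantile(inc, w, order_quant / 100)
  rate <- 100 * sum(w[inc < thres]) / sum(w)   # rate in percent
  c(threshold = thres, rate = rate)
}

inc <- c(5000, 9000, 12000, 15000, 20000, 30000)
w   <- c(2, 1, 1, 1, 1, 2)
pov <- arop(inc, w)
pov   # threshold 7200 (60% of 12000), rate 25
```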
The function returns a list with the following objects:
cros_lin_out
- a data.table
containing the linearized values of the ratio estimator with ID_level2 and PSU by periods and countries (if available).
cros_res_out
- a data.table
containing the estimated residuals of calibration with ID_level1 and PSU by periods and countries (if available).
crossectional_results
- a data.table
containing:
period - survey periods,
country - survey countries,
Dom - optional variable of the population domains,
type - type variable,
count_respondents - the count of respondents,
pop_size - the population size (in numbers of individuals),
estim - the estimated value,
se - the estimated standard error,
var - the estimated variance,
rse - the estimated relative standard error (coefficient of variation),
cv - the estimated relative standard error (coefficient of variation) in percent.
changes_results
- a data.table
containing:
period - survey periods,
country - survey countries,
Dom - optional variable of the population domains,
type - type variable,
estim_1 - the estimated value for period1,
estim_2 - the estimated value for period2,
estim - the estimated value,
se - the estimated standard error,
var - the estimated variance,
rse - the estimated relative standard error (coefficient of variation),
cv - the estimated relative standard error (coefficient of variation) in percent.
Guillaume Osier, Yves Berger, Tim Goedeme (2013). Standard error estimation for the EU-SILC indicators of poverty and social exclusion. Eurostat Methodologies and Working papers. URL https://ec.europa.eu/eurostat/documents/3888793/5855973/KS-RA-13-024-EN.PDF.
Eurostat (2013). Handbook on precision requirements and variance estimation for ESS household surveys. Eurostat Methodologies and Working papers. URL https://ec.europa.eu/eurostat/documents/3859598/5927001/KS-RA-13-029-EN.PDF.
Yves G. Berger, Tim Goedeme, Guillaume Osier (2013). Handbook on standard error estimation and other related sampling issues in EU-SILC. URL https://ec.europa.eu/eurostat/cros/content/handbook-standard-error-estimation-and-other-related-sampling-issues-ver-29072013_en
domain, vardchanges, vardcros, vardcrospoor
### Example
    library("laeken")
    library("data.table")
    data(eusilc)
    set.seed(1)
    dataset1 <- data.table(rbind(eusilc, eusilc),
                           year = c(rep(2010, nrow(eusilc)),
                                    rep(2011, nrow(eusilc))),
                           country = c(rep("AT", nrow(eusilc)),
                                       rep("AT", nrow(eusilc))))
    dataset1[age < 0, age := 0]
    PSU <- dataset1[, .N, keyby = "db030"][, N := NULL]
    PSU[, PSU := trunc(runif(nrow(PSU), 0, 100))]
    PSU$inc <- runif(nrow(PSU), 20, 100000)
    dataset1 <- merge(dataset1, PSU, all = TRUE, by = "db030")
    PSU <- eusilc <- NULL
    dataset1[, strata := c("XXXX")]
    dataset1$pl085 <- 12 * trunc(runif(nrow(dataset1), 0, 2))
    dataset1$month_at_work <- 12 * trunc(runif(nrow(dataset1), 0, 2))
    dataset1[, id_l2 := paste0("V", .I)]
    result <- vardchangespoor(Y = "inc", age = "age", pl085 = "pl085",
                              month_at_work = "month_at_work", Y_den = "inc",
                              Y_thres = "inc", wght_thres = "rb050",
                              H = "strata", PSU = "PSU", w_final = "rb050",
                              ID_level1 = "db030", ID_level2 = "id_l2",
                              Dom = c("rb090"), country = "country",
                              period = "year", sort = NULL,
                              period1 = c(2010, 2011), period2 = c(2011, 2010),
                              gender = NULL, dataset = dataset1,
                              percentage = 60, order_quant = 50L, alpha = 20,
                              confidence = 0.95, type = "linrmpg")
    result
Computes the variance estimation for measures of annual net change or annual for single stratified sampling designs.
vardchangstrs( Y, H, PSU, w_final, Dom = NULL, periods = NULL, dataset, periods1, periods2, in_sample, in_frame, confidence = 0.95, percentratio = 1, correction = FALSE )
Y |
Variables of interest. Object convertible to |
H |
The unit stratum variable. One dimensional object convertible to one-column |
PSU |
Primary sampling unit variable. One dimensional object convertible to one-column |
w_final |
Weight variable. One dimensional object convertible to one-column |
Dom |
Optional variables used to define population domains. If supplied, variables are calculated for each domain. An object convertible to |
periods |
Variable for all survey periods. The values for each period are computed independently. Object convertible to |
dataset |
Optional survey data object convertible to |
periods1 |
The vector of periods from variable |
periods2 |
The vector of periods from variable |
in_sample |
Sample variable. One dimensional object convertible to one-column |
in_frame |
Frame variable. One dimensional object convertible to one-column |
confidence |
Optional positive value giving the confidence level of the confidence interval. The default is 0.95. |
percentratio |
Positive numeric value. All linearized variables are multiplied with |
correction |
Logical value. If TRUE, the variance is calculated without the covariance term (negative variance correction). |
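The `correction` argument is described as dropping the covariance from the variance of the change. The toy sketch below, with made-up numbers and a hypothetical function name, illustrates why that can matter: when the estimated covariance is large relative to the variances, var_1 + var_2 - 2 cov can come out negative, whereas dropping the covariance always gives a non-negative value (conservative when the true covariance is positive).

```r
# Hypothetical sketch of what `correction` toggles: the variance of a net
# change with the covariance term (default) or without it.
net_change_var <- function(var_1, var_2, cov_12, correction = FALSE) {
  if (correction) {
    var_1 + var_2                 # covariance term dropped
  } else {
    var_1 + var_2 - 2 * cov_12    # full variance of estim_2 - estim_1
  }
}

net_change_var(4, 5, 6)                      # -3, an unusable negative value
net_change_var(4, 5, 6, correction = TRUE)   # 9
```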
The function returns a list with the following objects:
crossectional_results
- a data.table
containing:
year - survey years,
subperiods - survey sub-periods,
variable - names of variables of interest,
Dom - optional variable of the population domains,
estim - the estimated value,
var - the estimated variance of cross-sectional and longitudinal measures,
sd_w - the estimated weighted variance of simple random sample,
se - the estimated standard error of cross-sectional or longitudinal measures,
rse - the estimated relative standard error (coefficient of variation),
cv - the estimated relative standard error (coefficient of variation) in percent,
absolute_margin_of_error - the estimated absolute margin of error,
relative_margin_of_error - the estimated relative margin of error,
CI_lower - the estimated confidence interval lower bound,
CI_upper - the estimated confidence interval upper bound,
confidence_level - the confidence level used for the confidence interval.
annual_results
- a data.table
containing:
year_1 - survey years of years1 for measures of annual net change,
year_2 - survey years of years2 for measures of annual net change,
Dom - optional variable of the population domains,
variable - names of variables of interest,
estim_2 - the estimated value for period2 for measures of annual net change,
estim_1 - the estimated value for period1 for measures of annual net change,
estim - the estimated value,
var - the estimated variance,
se - the estimated standard error,
rse - the estimated relative standard error (coefficient of variation),
cv - the estimated relative standard error (coefficient of variation) in percent,
absolute_margin_of_error - the estimated absolute margin of error for period1 for measures of annual net change,
relative_margin_of_error - the estimated relative margin of error in percent for measures of annual net change,
CI_lower - the estimated confidence interval lower bound,
CI_upper - the estimated confidence interval upper bound,
confidence_level - the confidence level used for the confidence interval,
significant - whether the difference is significant.
annual_results_correction
- a data.table
of corrected variables (if correction is TRUE), containing:
year_1 - survey years of years1 for measures of annual net change,
year_2 - survey years of years2 for measures of annual net change,
Dom - optional variable of the population domains,
variable - names of variables of interest,
estim_2 - the estimated value for period2 for measures of annual net change,
estim_1 - the estimated value for period1 for measures of annual net change,
estim - the estimated value,
var - the estimated variance,
se - the estimated standard error,
rse - the estimated relative standard error (coefficient of variation),
cv - the estimated relative standard error (coefficient of variation) in percent,
absolute_margin_of_error - the estimated absolute margin of error for period1 for measures of annual net change,
relative_margin_of_error - the estimated relative margin of error in percent for measures of annual net change,
CI_lower - the estimated confidence interval lower bound,
CI_upper - the estimated confidence interval upper bound,
confidence_level - the confidence level used for the confidence interval,
significant - whether the difference is significant.
Guillaume Osier, Virginie Raymond (2015). Development of methodology for the estimate of variance of annual net changes for LFS-based indicators. Deliverable 1 - Short document with derivation of the methodology.
Computes variance estimates for cross-sectional and longitudinal measures for cluster sampling designs with any number of stages.
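The ultimate cluster method (Hansen, Hurwitz and Madow) approximates the variance of a multistage design using only the weighted totals of the primary sampling units within each stratum, as if PSUs were drawn with replacement. A minimal base-R sketch of that estimator for a weighted total follows. The function name is hypothetical, and skipping single-PSU strata is an assumption of the sketch, not necessarily how vardcros treats them.

```r
# Hypothetical sketch (not vardpoor's implementation): ultimate cluster
# variance estimator of a weighted total in a stratified multistage design.
# Only the weighted PSU totals z_hi enter the formula:
#   v = sum_h n_h / (n_h - 1) * sum_i (z_hi - mean_h(z))^2
ultimate_cluster_var <- function(y, w, H, PSU) {
  v <- 0
  for (h in unique(H)) {
    in_h <- H == h
    z <- tapply(w[in_h] * y[in_h], as.character(PSU[in_h]), sum)  # PSU totals
    n_h <- length(z)                              # sampled PSUs in stratum h
    # Strata with a single PSU are skipped in this sketch (an assumption)
    if (n_h > 1) v <- v + n_h / (n_h - 1) * sum((z - mean(z))^2)
  }
  v
}

H   <- c("A", "A", "A", "A", "B", "B", "B", "B")
PSU <- c(1, 1, 2, 2, 3, 3, 4, 4)
w   <- rep(10, 8)
y   <- c(1, 0, 1, 1, 0, 0, 1, 1)
ultimate_cluster_var(y, w, H, PSU)   # 500
```

Because only PSU totals are used, the within-PSU sampling stages need not be modelled explicitly, which is what makes the method applicable to designs with any number of stages.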
vardcros( Y, H, PSU, w_final, ID_level1, ID_level2, Dom = NULL, Z = NULL, gender = NULL, country = NULL, period, dataset = NULL, X = NULL, countryX = NULL, periodX = NULL, X_ID_level1 = NULL, ind_gr = NULL, g = NULL, q = NULL, datasetX = NULL, linratio = FALSE, percentratio = 1, use.estVar = FALSE, ID_level1_max = TRUE, outp_res = FALSE, withperiod = TRUE, netchanges = TRUE, confidence = 0.95, checking = TRUE )
Y |
Variables of interest. Object convertible to |
H |
The unit stratum variable. One dimensional object convertible to one-column |
PSU |
Primary sampling unit variable. One dimensional object convertible to one-column |
w_final |
Weight variable. One dimensional object convertible to one-column |
ID_level1 |
Variable for level1 ID codes. One dimensional object convertible to one-column |
ID_level2 |
Optional variable for unit ID codes. One dimensional object convertible to one-column |
Dom |
Optional variables used to define population domains. If supplied, variables are calculated for each domain. An object convertible to |
Z |
Optional variables of denominator for ratio estimation. If supplied, the ratio estimation is computed. Object convertible to |
gender |
Numerical variable for gender, where 1 denotes males and 2 denotes females. One dimensional object convertible to one-column |
country |
Variable for the survey countries. The values for each country are computed independently. Object convertible to |
period |
Variable for the survey periods. The values for each period are computed independently. Object convertible to |
dataset |
Optional survey data object convertible to |
X |
Optional matrix of the auxiliary variables for the calibration estimator. Object convertible to |
countryX |
Optional variable for the survey countries. The values for each country are computed independently. Object convertible to |
periodX |
Optional variable of the survey periods and countries. If supplied, residual estimation of calibration is done independently for each time period. Object convertible to |
X_ID_level1 |
Variable for level1 ID codes. One dimensional object convertible to one-column |
ind_gr |
Optional variable by which the X matrix of auxiliary variables for the calibration is split into independent groups. One dimensional object convertible to one-column |
g |
Optional variable of the g weights. One dimensional object convertible to one-column |
q |
Variable of the positive values accounting for heteroscedasticity. One dimensional object convertible to one-column |
datasetX |
Optional survey data object at household level convertible to |
linratio |
Logical value. If value is |
percentratio |
Positive numeric value. All linearized variables are multiplied with |
use.estVar |
Logical value. If value is |
ID_level1_max |
Logical value. If value is |
outp_res |
Logical value. If |
withperiod |
Logical value. If |
netchanges |
Logical value. If TRUE, two objects are produced: the first is an aggregation of weighted data by period (if available), country, strata and PSU; the second contains the estimates for Y, the variance, and the gradient for numerator and denominator by country and period (if available). If value is FALSE, then both objects containing |
confidence |
Optional positive value giving the confidence level of the confidence interval. The default is 0.95. |
checking |
Optional logical value. If TRUE, the function checks for data preparation errors; otherwise no checks are made. The default is TRUE. |
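When Dom is supplied, results are computed separately for each population domain. A common device, and what the package's domain() function is described as producing, is one extra study variable per domain that equals the original value inside the domain and zero outside, so the usual estimators apply unchanged. The base-R sketch below uses a hypothetical function name and column-naming scheme; it is an illustration, not the package's code.

```r
# Hypothetical sketch of domain variables: for each domain d, a copy of y
# that equals y inside d and 0 elsewhere, so a weighted total of the new
# column is the domain total.
make_domain_vars <- function(y, D) {
  doms <- sort(unique(D))
  out <- matrix(sapply(doms, function(d) y * (D == d)), nrow = length(y))
  colnames(out) <- paste0("y__", doms)   # illustrative naming only
  out
}

y <- c(1, 0, 1, 1)
D <- c("m", "f", "m", "f")
dv <- make_domain_vars(y, D)
colSums(dv * c(10, 20, 30, 40))   # weighted domain totals: y__f 40, y__m 40
```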
The function returns a list with four objects:
res_out
- a data.table
containing the estimated residuals of calibration with ID_level1 and PSU.
data_net_changes
- a data.table
containing aggregation of weighted data by period (if available) and countries (if available), country, strata, PSU.
var_grad
- a data.table
containing estimation for Y, the variance, gradient for numerator and denominator by period, country (if available) and population domains (if available).
results
- a data.table
containing:
period - survey periods,
country - survey countries (if available),
Dom - optional variable of the population domains,
namesY - names of variables of interest,
namesZ - optional variable for names of denominator for ratio estimation,
sample_size - the sample size (in numbers of individuals),
pop_size - the population size (in numbers of individuals),
total - the estimated totals,
variance - the estimated variance of cross-sectional or longitudinal measures,
sd_w - the estimated weighted variance of simple random sample,
sd_nw - the estimated variance of simple random sample,
pop - the population size (in numbers of households),
sampl_siz - the sample size (in numbers of households),
stderr_w - the estimated weighted standard error of simple random sample,
stderr_nw - the estimated standard error of simple random sample,
se - the estimated standard error of cross-sectional or longitudinal measures,
rse - the estimated relative standard error (coefficient of variation),
cv - the estimated relative standard error (coefficient of variation) in percentage,
absolute_margin_of_error - the estimated absolute margin of error,
relative_margin_of_error - the estimated relative margin of error,
CI_lower - the estimated confidence interval lower bound,
CI_upper - the estimated confidence interval upper bound,
confidence_level - the confidence level of the confidence interval.
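The precision measures in results are all functions of an estimate and its estimated variance. The sketch below shows one way to derive them under a normal approximation; the helper name precision_measures is hypothetical, and expressing the relative margin of error in percent is an assumption (the list above does not state its unit).

```r
# Hypothetical helper (not part of vardpoor): derive precision measures
# from a point estimate and its estimated variance, assuming a
# normal approximation for the confidence interval.
precision_measures <- function(estim, var, confidence = 0.95) {
  se <- sqrt(var)                           # standard error
  rse <- se / estim                         # relative standard error
  z <- qnorm(1 - (1 - confidence) / 2)      # two-sided normal quantile
  moe <- z * se                             # absolute margin of error
  data.frame(estim = estim, var = var, se = se, rse = rse,
             cv = 100 * rse,                # coefficient of variation in %
             absolute_margin_of_error = moe,
             relative_margin_of_error = 100 * moe / estim,  # assumed in %
             CI_lower = estim - moe,
             CI_upper = estim + moe,
             confidence_level = confidence)
}

res <- precision_measures(estim = 200, var = 100)
res
```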
Guillaume Osier, Yves Berger, Tim Goedeme (2013). Standard error estimation for the EU-SILC indicators of poverty and social exclusion. Eurostat Methodologies and Working papers, URL https://ec.europa.eu/eurostat/documents/3888793/5855973/KS-RA-13-024-EN.PDF.
Yves G. Berger, Tim Goedeme, Guillaume Osier (2013). Handbook on standard error estimation and other related sampling issues in EU-SILC, URL https://ec.europa.eu/eurostat/cros/content/handbook-standard-error-estimation-and-other-related-sampling-issues-ver-29072013_en
Eurostat Methodologies and Working papers (2013). Handbook on precision requirements and variance estimation for ESS household surveys, URL https://ec.europa.eu/eurostat/documents/3859598/5927001/KS-RA-13-029-EN.PDF.
library("data.table")
library("laeken")
library("foreach")
# Example 1
data(eusilc)
set.seed(1)
dataset1 <- data.table(eusilc)
dataset1[, year := 2010]
dataset1[, country := "AT"]
dataset1[age < 0, age := 0]
PSU <- dataset1[, .N, keyby = "db030"][, N := NULL]
PSU[, PSU := trunc(runif(nrow(PSU), 0, 100))]
dataset1 <- merge(dataset1, PSU, by = "db030", all = TRUE)
PSU <- eusilc <- NULL
dataset1[, strata := "XXXX"]
dataset1[, t_pov := trunc(runif(nrow(dataset1), 0, 2))]
dataset1[, t_dep := trunc(runif(nrow(dataset1), 0, 2))]
dataset1[, t_lwi := trunc(runif(nrow(dataset1), 0, 2))]
dataset1[, exp := 1]
dataset1[, exp2 := 1 * (age < 60)]
# At-risk-of-poverty (AROP)
dataset1[, pov := ifelse(t_pov == 1, 1, 0)]
# Severe material deprivation (DEP)
dataset1[, dep := ifelse(t_dep == 1, 1, 0)]
# Low work intensity (LWI)
dataset1[, lwi := ifelse(t_lwi == 1 & exp2 == 1, 1, 0)]
# At-risk-of-poverty or social exclusion (AROPE)
dataset1[, arope := ifelse(pov == 1 | dep == 1 | lwi == 1, 1, 0)]
result11 <- vardcros(Y = "arope", H = "strata", PSU = "PSU",
                     w_final = "rb050", ID_level1 = "db030",
                     ID_level2 = "rb030", Dom = "rb090", Z = NULL,
                     country = "country", period = "year",
                     dataset = dataset1, linratio = FALSE,
                     withperiod = TRUE, netchanges = TRUE,
                     confidence = .95)
# Example 2
data(eusilc)
set.seed(1)
dataset1 <- data.table(rbind(eusilc, eusilc),
                       year = c(rep(2010, nrow(eusilc)),
                                rep(2011, nrow(eusilc))))
dataset1[, country := "AT"]
dataset1[age < 0, age := 0]
PSU <- dataset1[, .N, keyby = "db030"][, N := NULL]
PSU[, PSU := trunc(runif(nrow(PSU), 0, 100))]
dataset1 <- merge(dataset1, PSU, by = "db030", all = TRUE)
PSU <- eusilc <- NULL
dataset1[, strata := "XXXX"]
dataset1[, strata := as.character(strata)]
dataset1[, t_pov := trunc(runif(nrow(dataset1), 0, 2))]
dataset1[, t_dep := trunc(runif(nrow(dataset1), 0, 2))]
dataset1[, t_lwi := trunc(runif(nrow(dataset1), 0, 2))]
dataset1[, exp := 1]
dataset1[, exp2 := 1 * (age < 60)]
# At-risk-of-poverty (AROP)
dataset1[, pov := ifelse(t_pov == 1, 1, 0)]
# Severe material deprivation (DEP)
dataset1[, dep := ifelse(t_dep == 1, 1, 0)]
# Low work intensity (LWI)
dataset1[, lwi := ifelse(t_lwi == 1 & exp2 == 1, 1, 0)]
# At-risk-of-poverty or social exclusion (AROPE)
dataset1[, arope := ifelse(pov == 1 | dep == 1 | lwi == 1, 1, 0)]
result11 <- vardcros(Y = c("pov", "dep", "arope"), H = "strata",
                     PSU = "PSU", w_final = "rb050", ID_level1 = "db030",
                     ID_level2 = "rb030", Dom = "rb090", Z = NULL,
                     country = "country", period = "year",
                     dataset = dataset1, linratio = FALSE,
                     withperiod = TRUE, netchanges = TRUE,
                     confidence = .95)
dataset2 <- dataset1[exp2 == 1]
result12 <- vardcros(Y = c("lwi"), H = "strata", PSU = "PSU",
                     w_final = "rb050", ID_level1 = "db030",
                     ID_level2 = "rb030", Dom = "rb090", Z = NULL,
                     country = "country", period = "year",
                     dataset = dataset2, linratio = FALSE,
                     withperiod = TRUE, netchanges = TRUE,
                     confidence = .95)
### Example 3
data(eusilc)
set.seed(1)
year <- 2011
dataset1 <- data.table(rbind(eusilc, eusilc, eusilc, eusilc),
                       rb010 = c(rep(2008, nrow(eusilc)),
                                 rep(2009, nrow(eusilc)),
                                 rep(2010, nrow(eusilc)),
                                 rep(2011, nrow(eusilc))))
dataset1[, rb020 := "AT"]
dataset1[, u := 1]
dataset1[age < 0, age := 0]
dataset1[, strata := "XXXX"]
PSU <- dataset1[, .N, keyby = "db030"][, N := NULL]
PSU[, PSU := trunc(runif(nrow(PSU), 0, 100))]
dataset1 <- merge(dataset1, PSU, by = "db030", all = TRUE)
thres <- data.table(rb020 = as.character(rep("AT", 4)),
                    thres = c(11406, 11931, 12371, 12791),
                    rb010 = 2008:2011)
dataset1 <- merge(dataset1, thres, all.x = TRUE, by = c("rb010", "rb020"))
dataset1[is.na(u), u := 0]
dataset1 <- dataset1[u == 1]
#############
#    T3     #
#############
T3 <- dataset1[rb010 == year - 3]
T3[, strata1 := strata]
T3[, PSU1 := PSU]
T3[, w1 := rb050]
T3[, inc1 := eqIncome]
T3[, rb110_1 := db030]
T3[, pov1 := inc1 <= thres]
T3 <- T3[, c("rb020", "rb030", "strata", "PSU", "inc1", "pov1"), with = FALSE]
#############
#    T2     #
#############
T2 <- dataset1[rb010 == year - 2]
T2[, strata2 := strata]
T2[, PSU2 := PSU]
T2[, w2 := rb050]
T2[, inc2 := eqIncome]
T2[, rb110_2 := db030]
setnames(T2, "thres", "thres2")
T2[, pov2 := inc2 <= thres2]
T2 <- T2[, c("rb020", "rb030", "strata2", "PSU2", "inc2", "pov2"), with = FALSE]
#############
#    T1     #
#############
T1 <- dataset1[rb010 == year - 1]
T1[, strata3 := strata]
T1[, PSU3 := PSU]
T1[, w3 := rb050]
T1[, inc3 := eqIncome]
T1[, rb110_3 := db030]
setnames(T1, "thres", "thres3")
T1[, pov3 := inc3 <= thres3]
T1 <- T1[, c("rb020", "rb030", "strata3", "PSU3", "inc3", "pov3"), with = FALSE]
#############
#    T0     #
#############
T0 <- dataset1[rb010 == year]
T0[, PSU4 := PSU]
T0[, strata4 := strata]
T0[, w4 := rb050]
T0[, inc4 := eqIncome]
T0[, rb110_4 := db030]
setnames(T0, "thres", "thres4")
T0[, pov4 := inc4 <= thres4]
T0 <- T0[, c("rb010", "rb020", "rb030", "strata4", "PSU4", "w4",
             "inc4", "pov4"), with = FALSE]
apv <- merge(T3, T2, all = TRUE, by = c("rb020", "rb030"))
apv <- merge(apv, T1, all = TRUE, by = c("rb020", "rb030"))
apv <- merge(apv, T0, all = TRUE, by = c("rb020", "rb030"))
apv <- apv[(!is.na(inc1)) & (!is.na(inc2)) & (!is.na(inc3)) & (!is.na(inc4))]
apv[, ppr := as.integer(((pov4 == 1) &
                         ((pov1 == 1 & pov2 == 1 & pov3 == 1) |
                          (pov1 == 1 & pov2 == 1 & pov3 == 0) |
                          (pov1 == 1 & pov2 == 0 & pov3 == 1) |
                          (pov1 == 0 & pov2 == 1 & pov3 == 1))))]
result20 <- vardcros(Y = "ppr", H = "strata", PSU = "PSU", w_final = "w4",
                     ID_level1 = "rb030", ID_level2 = "rb030", Dom = NULL,
                     Z = NULL, country = "rb020", period = "rb010",
                     dataset = apv, linratio = FALSE, withperiod = TRUE,
                     netchanges = FALSE, confidence = .95)
result20
Computes variance estimates for cross-sectional and longitudinal measures of indicators on social exclusion and poverty.
vardcrospoor( Y, age = NULL, pl085 = NULL, month_at_work = NULL, Y_den = NULL, Y_thres = NULL, wght_thres = NULL, H, PSU, w_final, ID_level1, ID_level2, Dom = NULL, country = NULL, period, sort = NULL, gender = NULL, dataset = NULL, X = NULL, countryX = NULL, periodX = NULL, X_ID_level1 = NULL, ind_gr = NULL, g = NULL, q = NULL, datasetX = NULL, percentage = 60, order_quant = 50, alpha = 20, use.estVar = FALSE, withperiod = TRUE, netchanges = TRUE, confidence = 0.95, outp_lin = FALSE, outp_res = FALSE, type = "linrmpg", checking = TRUE )
Y |
Variables of interest. Object convertible to |
age |
Age variable. One dimensional object convertible to one-column |
pl085 |
Retirement variable (Number of months spent in retirement or early retirement). One dimensional object convertible to one-column |
month_at_work |
Variable for total number of months at work (sum of the number of months spent at full-time work as employee, number of months spent at part-time work as employee, number of months spent at full-time work as self-employed (including family worker), number of months spent at part-time work as self-employed (including family worker)). One dimensional object convertible to one-column |
Y_den |
Denominator variable (for example gross individual earnings). One dimensional object convertible to one-column |
Y_thres |
Variable (for example equivalised disposable income) used for computation and linearization of poverty threshold. One dimensional object convertible to one-column |
wght_thres |
Weight variable used for computation and linearization of poverty threshold. One dimensional object convertible to one-column |
H |
The unit stratum variable. One dimensional object convertible to one-column |
PSU |
Primary sampling unit variable. One dimensional object convertible to one-column |
w_final |
Weight variable. One dimensional object convertible to one-column |
ID_level1 |
Variable for level1 ID codes. One dimensional object convertible to one-column |
ID_level2 |
Optional variable for unit ID codes. One dimensional object convertible to one-column |
Dom |
Optional variables used to define population domains. If supplied, variables are calculated for each domain. An object convertible to |
country |
Variable for the survey countries. The values for each country are computed independently. Object convertible to |
period |
Variable for the survey periods. The values for each period are computed independently. Object convertible to |
sort |
Optional variable to be used as tie-breaker for sorting. One dimensional object convertible to one-column |
gender |
Numerical variable for gender, where 1 is for males and 2 is for females. One dimensional object convertible to one-column |
dataset |
Optional survey data object convertible to |
X |
Optional matrix of the auxiliary variables for the calibration estimator. Object convertible to |
countryX |
Optional variable for the survey countries. The values for each country are computed independently. Object convertible to |
periodX |
Optional variable of the survey periods and countries. If supplied, residual estimation of calibration is done independently for each time period. Object convertible to |
X_ID_level1 |
Variable for level1 ID codes. One dimensional object convertible to one-column |
g |
Optional variable of the g weights. One dimensional object convertible to one-column |
q |
Variable of the positive values accounting for heteroscedasticity. One dimensional object convertible to one-column |
datasetX |
Optional survey data object at household level convertible to |
percentage |
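A numeric value in range [0, 100] for the percentage of the income quantile used as the poverty threshold: the threshold equals percentage/100 times the income quantile of order order_quant/100.
For example, to compute a poverty threshold equal to 60% of some income quantile, percentage should be set equal to 60. |
order_quant |
A numeric value in range [0, 100] for the order (in percentage) of the income quantile used in the poverty threshold.
For example, to compute a poverty threshold equal to some percentage of median income, order_quant should be set equal to 50. |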
alpha |
a numeric value in range |
withperiod |
Logical value. If |
netchanges |
Logical value. If value is TRUE, then produce two objects: the first object is aggregation of weighted data by period (if available), country, strata and PSU, the second object is an estimation for Y, the variance, gradient for numerator and denominator by country and period (if available). If value is FALSE, then both objects containing |
confidence |
Optional confidence level (a value between 0 and 1) for the confidence interval. By default 0.95. |
outp_lin |
Logical value. If |
outp_res |
Logical value. If |
type |
A character vector (of length one unless several.ok is TRUE) naming the indicators to estimate, for example "linarpr", "linarpt", "lingpg", "linpoormed", "linrmpg", "lingini", "lingini2", "linqsr", "linarr", "linrmir". |
checking |
Optional logical value. If TRUE, the function checks the data for preparation errors; otherwise no checks are made. By default TRUE. |
ind_gr |
Optional variable by which the X matrix of auxiliary variables for the calibration is divided into independent groups. One dimensional object convertible to one-column data.table or variable name as character, column number. |
use.estVar |
Logical value. If value is TRUE, then R function estVar is used for the estimation of covariance matrix of the residuals. If value is FALSE, then R function estVar is not used for the estimation of covariance matrix of the residuals. |
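The percentage and order_quant arguments define the at-risk-of-poverty threshold as a percentage of a weighted income quantile (60% of the median with the default values percentage = 60 and order_quant = 50). A minimal base R sketch, not vardpoor's implementation: it uses a simple inverse-CDF weighted quantile (the first sorted value whose cumulative weight share reaches the requested order), which may differ from the package's quantile definition in tie handling, and it counts units strictly below the threshold as poor (also an assumption). All function names are hypothetical.

```r
# Hypothetical helpers (not part of vardpoor).
# Weighted quantile: first sorted value whose cumulative weight share
# reaches prob (a simple inverse-CDF definition).
weighted_quantile <- function(x, w, prob) {
  ord <- order(x)
  x <- x[ord]
  w <- w[ord]
  x[which(cumsum(w) / sum(w) >= prob)[1]]
}

# At-risk-of-poverty threshold: percentage % of the order_quant-th percentile
arpt_sketch <- function(y, w, percentage = 60, order_quant = 50) {
  percentage / 100 * weighted_quantile(y, w, order_quant / 100)
}

# At-risk-of-poverty rate (in %): weighted share strictly below the threshold
arpr_sketch <- function(y, w, percentage = 60, order_quant = 50) {
  thr <- arpt_sketch(y, w, percentage, order_quant)
  100 * sum(w[y < thr]) / sum(w)
}

inc <- c(5, 8, 10, 12, 15, 20, 25, 30, 40, 60)
wt <- rep(1, 10)
arpt_sketch(inc, wt)   # 60% of the weighted median (15)
arpr_sketch(inc, wt)   # share of units with income below that threshold
```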
A list with the following objects is returned by the function:
lin_out
- a data.table
containing the linearized values of the ratio estimator with ID_level2 and PSU.
res_out
- a data.table
containing the estimated residuals of calibration with ID_level1 and PSU.
data_net_changes
- a data.table
containing aggregation of weighted data by period (if available), country, strata, PSU.
results
- a data.table
containing:
period - survey periods,
country - survey countries,
Dom - optional variable of the population domains,
type - the indicator type (see the type argument),
count_respondents - the count of respondents,
pop_size - the population size (in numbers of individuals),
estim - the estimated value,
se - the estimated standard error,
var - the estimated variance,
rse - the estimated relative standard error (coefficient of variation),
cv - the estimated relative standard error (coefficient of variation) in percentage.
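For type = "linqsr", the alpha argument (20 by default) sets the order of the income quantile share ratio; with alpha = 20 this is the usual ratio of the income held by the top 20% to that held by the bottom 20% (S80/S20). The base R sketch below computes only the point estimate, not its linearization or variance; the function names are hypothetical and the simple inverse-CDF weighted quantile is an assumption.

```r
# Hypothetical helpers (not part of vardpoor).
wq <- function(x, w, prob) {
  ord <- order(x)
  x <- x[ord]
  w <- w[ord]
  x[which(cumsum(w) / sum(w) >= prob)[1]]
}

# Quantile share ratio: weighted income total above the (100 - alpha)-th
# percentile divided by the weighted income total at or below the alpha-th
qsr_sketch <- function(y, w, alpha = 20) {
  lower <- wq(y, w, alpha / 100)
  upper <- wq(y, w, (100 - alpha) / 100)
  top <- y > upper
  bottom <- y <= lower
  sum(w[top] * y[top]) / sum(w[bottom] * y[bottom])
}

qsr_sketch(1:10, rep(1, 10))   # (9 + 10) / (1 + 2)
```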
Guillaume Osier, Yves Berger, Tim Goedeme (2013). Standard error estimation for the EU-SILC indicators of poverty and social exclusion. Eurostat Methodologies and Working papers, URL https://ec.europa.eu/eurostat/documents/3888793/5855973/KS-RA-13-024-EN.PDF.
Yves G. Berger, Tim Goedeme, Guillaume Osier (2013). Handbook on standard error estimation and other related sampling issues in EU-SILC, URL https://ec.europa.eu/eurostat/cros/content/handbook-standard-error-estimation-and-other-related-sampling-issues-ver-29072013_en
Eurostat Methodologies and Working papers (2013). Handbook on precision requirements and variance estimation for ESS household surveys, URL https://ec.europa.eu/eurostat/documents/3859598/5927001/KS-RA-13-029-EN.PDF.
library("data.table")
data("eusilc", package = "laeken")
setDT(eusilc)
set.seed(1)
eusilc <- eusilc[sample(x = .N, size = 3000)]
dataset1 <- data.table(rbindlist(list(eusilc, eusilc)),
                       year = c(rep(2010, nrow(eusilc)),
                                rep(2011, nrow(eusilc))))
dataset1[age < 0, age := 0]
PSU <- dataset1[, .N, keyby = "db030"][, N := NULL]
PSU[, PSU := trunc(runif(nrow(PSU), 0, 100))]
PSU[, inc := runif(.N, 20, 100000)]
dataset1 <- merge(dataset1, PSU, all = TRUE, by = "db030")
dataset1[, strata := "XXXX"]
dataset1[, pl085 := 12 * trunc(runif(.N, 0, 2))]
dataset1[, month_at_work := 12 * trunc(runif(.N, 0, 2))]
dataset1[, id_l2 := paste0("V", .I)]
vardcrospoor(Y = "inc", age = "age", pl085 = "pl085",
             month_at_work = "month_at_work", Y_den = "inc",
             Y_thres = "inc", wght_thres = "rb050", H = "strata",
             PSU = "PSU", w_final = "rb050", ID_level1 = "db030",
             ID_level2 = "id_l2", Dom = c("rb090", "db040"),
             country = NULL, period = "year", sort = NULL,
             gender = NULL, dataset = dataset1, percentage = 60,
             order_quant = 50L, alpha = 20, confidence = 0.95,
             type = "linrmpg")
Computes variance estimates for sample surveys in domains by the ultimate cluster method.
vardom( Y, H, PSU, w_final, id = NULL, Dom = NULL, period = NULL, PSU_sort = NULL, N_h = NULL, fh_zero = FALSE, PSU_level = TRUE, Z = NULL, X = NULL, ind_gr = NULL, g = NULL, q = NULL, dataset = NULL, confidence = 0.95, percentratio = 1, outp_lin = FALSE, outp_res = FALSE )
Y |
Variables of interest. Object convertible to |
H |
The unit stratum variable. One dimensional object convertible to one-column |
PSU |
Primary sampling unit variable. One dimensional object convertible to one-column |
w_final |
Weight variable. One dimensional object convertible to one-column |
id |
Optional variable for unit ID codes. One dimensional object convertible to one-column |
Dom |
Optional variables used to define population domains. If supplied, variables of interest are calculated for each domain. An object convertible to |
period |
Optional variable for survey period. If supplied, residual estimation of calibration is done independently for each time period. One dimensional object convertible to one-column |
PSU_sort |
Optional; if PSU_sort is defined, the variance is calculated for a systematic sample. |
N_h |
Number of primary sampling units in population for each stratum (and period if |
fh_zero |
by default FALSE; |
PSU_level |
by default TRUE; if PSU_level is TRUE, in each strata |
Z |
Optional variables of denominator for ratio estimation. Object convertible to |
X |
Optional matrix of the auxiliary variables for the calibration estimator. Object convertible to |
ind_gr |
Optional variable by which the X matrix of auxiliary variables for the calibration is divided into independent groups. One dimensional object convertible to one-column |
g |
Optional variable of the g weights. One dimensional object convertible to one-column |
q |
Variable of the positive values accounting for heteroscedasticity. One dimensional object convertible to one-column |
dataset |
Optional survey data object convertible to |
confidence |
Optional confidence level (a value between 0 and 1) for the confidence interval. By default 0.95. |
percentratio |
Positive numeric value. All linearized variables are multiplied by the percentratio value, by default 1. |
outp_lin |
Logical value. If |
outp_res |
Logical value. If |
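When X, g and q are supplied, the variance is based on calibration residuals rather than on the raw values (the residual technique of Deville, 1999, listed in the references). A minimal sketch of the idea, assuming GREG-type calibration: regression coefficients are fitted by weighted least squares with weights d_k q_k, where the design weight d_k = w_final_k / g_k is an assumption about how the inputs relate. The function name is hypothetical and the exact weighting used inside vardom may differ.

```r
# Hypothetical helper (not part of vardpoor): GREG-type calibration residuals.
calib_residuals <- function(y, X, w_final, g, q) {
  d <- w_final / g                    # assumed design weights (w_final = d * g)
  Xw <- X * (d * q)                   # row k of X scaled by d_k * q_k
  # Weighted least squares: B = (X' diag(d q) X)^{-1} X' diag(d q) y
  B <- solve(crossprod(Xw, X), crossprod(Xw, y))
  drop(y - X %*% B)                   # residuals e_k = y_k - x_k' B
}

X <- cbind(1, c(1, 2, 3, 4, 5))
w_final <- c(10, 12, 9, 11, 10)
g <- c(1.1, 0.9, 1.0, 1.2, 1.0)
q <- rep(1, 5)
y <- c(3, 1, 4, 1, 5)
e <- calib_residuals(y, X, w_final, g, q)
e
```

Under the residual technique, the variance of the calibrated total is then approximated by applying the variance estimator to w_final * e (that is, d * g * e) instead of w_final * y, so any part of y explained by the auxiliary variables no longer contributes to the variance.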
Calculates variance estimates in domains based on the book by Hansen, Hurwitz and Madow (1953).
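The ultimate cluster approach of Hansen, Hurwitz and Madow treats the weighted total of each PSU as a single observation within its stratum, so only PSU-level totals are needed: v = sum over h of (1 - f_h) n_h / (n_h - 1) sum over i of (z_hi - mean z_h)^2. A minimal base R sketch of this textbook estimator for the variance of a weighted total, with f_h = n_h / N_h when the population PSU counts N_h are supplied and f_h = 0 otherwise. The function name, and skipping strata with a single PSU, are assumptions of the sketch rather than vardom's documented behaviour.

```r
# Hypothetical helper (not vardom itself): ultimate cluster variance of the
# weighted total sum(w * y) under a stratified multistage design.
ultimate_cluster_var <- function(y, w, H, PSU, N_h = NULL) {
  # PSU totals z_hi = sum of w_k * y_k over units k in PSU i of stratum h
  psu <- aggregate(list(z = w * y), by = list(H = H, PSU = PSU), FUN = sum)
  v <- 0
  for (h in unique(psu$H)) {
    z_h <- psu$z[psu$H == h]
    n_h <- length(z_h)
    if (n_h < 2) next   # single-PSU strata are skipped in this sketch
    # Finite population correction; f_h = 0 when N_h is not supplied
    f_h <- if (is.null(N_h)) 0 else n_h / N_h[[as.character(h)]]
    # n_h / (n_h - 1) * sum((z_h - mean(z_h))^2) equals n_h * var(z_h)
    v <- v + (1 - f_h) * n_h * var(z_h)
  }
  v
}

H <- c("a", "a", "a", "a", "b", "b", "b", "b")
PSU <- c(1, 1, 2, 2, 3, 3, 4, 4)
y <- c(1, 2, 3, 4, 5, 6, 7, 8)
w <- rep(10, 8)
ultimate_cluster_var(y, w, H, PSU)
ultimate_cluster_var(y, w, H, PSU, N_h = c(a = 4, b = 8))
```

Leaving N_h out corresponds to treating PSUs as drawn with replacement, the usual conservative approximation for multistage designs.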
A list with the following objects is returned by the function:
lin_out
- a data.table
containing the linearized values of the ratio estimator with id and PSU.
res_out
- a data.table
containing the estimated residuals of calibration with id and PSU.
betas
- a numeric data.table
containing the estimated coefficients of calibration.
all_result
- a data.table
containing the variables:
variable - names of variables of interest,
Dom - optional variable of the population domains,
period - optional variable of the survey periods,
respondent_count - the count of respondents,
pop_size - the estimated size of population,
n_nonzero - the count of respondents whose answers are larger than zero,
estim - the estimated value,
var - the estimated variance,
se - the estimated standard error,
rse - the estimated relative standard error (coefficient of variation),
cv - the estimated relative standard error (coefficient of variation) in percentage,
absolute_margin_of_error - the estimated absolute margin of error,
relative_margin_of_error - the estimated relative margin of error in percentage,
CI_lower - the estimated confidence interval lower bound,
CI_upper - the estimated confidence interval upper bound,
confidence_level - the confidence level of the confidence interval,
S2_y_HT - the estimated variance of the y variable in case of total or the estimated variance of the linearised variable in case of the ratio of two totals using non-calibrated weights,
S2_y_ca - the estimated variance of the y variable in case of total or the estimated variance of the linearised variable in case of the ratio of two totals using calibrated weights,
S2_res - the estimated variance of the regression residuals,
var_srs_HT - the estimated variance of the HT estimator under SRS,
var_cur_HT - the estimated variance of the HT estimator under the current design,
var_srs_ca - the estimated variance of the calibrated estimator under SRS,
deff_sam - the estimated design effect of the sample design,
deff_est - the estimated design effect of the estimator,
deff - the overall estimated design effect of the sample design and estimator,
n_eff - the effective sample size.
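The design-effect columns can be read as ratios of the variance estimates listed above. The sketch below assumes the conventional decomposition deff_sam = var_cur_HT / var_srs_HT, deff_est = var / var_cur_HT, deff = deff_sam * deff_est and n_eff = respondent_count / deff, together with the usual SRS variance of an estimated total, N^2 (1 - n/N) S^2 / n. These definitions are assumptions, not confirmed by this page, and all function names are hypothetical.

```r
# Hypothetical helpers (not part of vardpoor).
# Variance of an estimated total under simple random sampling without
# replacement: N^2 * (1 - n / N) * S2 / n
var_srs_total <- function(S2, N, n) {
  N^2 * (1 - n / N) * S2 / n
}

# Conventional decomposition of the overall design effect
deff_decomposition <- function(var, var_cur_HT, var_srs_HT, n) {
  deff_sam <- var_cur_HT / var_srs_HT   # effect of the sample design
  deff_est <- var / var_cur_HT          # effect of the estimator (calibration)
  deff <- deff_sam * deff_est           # overall design effect
  c(deff_sam = deff_sam, deff_est = deff_est, deff = deff,
    n_eff = n / deff)                   # effective sample size
}

v_srs <- var_srs_total(S2 = 4, N = 1000, n = 100)
d <- deff_decomposition(var = 2 * v_srs, var_cur_HT = 2.5 * v_srs,
                        var_srs_HT = v_srs, n = 100)
d
```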
Morris H. Hansen, William N. Hurwitz, William G. Madow (1953). Sample Survey Methods and Theory, Volume I: Methods and Applications, 257-258. Wiley.
Guillaume Osier, Emilio Di Meglio (2012). The linearisation approach implemented by Eurostat for the first wave of EU-SILC: what could be done from the second wave onwards?
Guillaume Osier, Yves Berger, Tim Goedeme (2013). Standard error estimation for the EU-SILC indicators of poverty and social exclusion. Eurostat Methodologies and Working papers, URL https://ec.europa.eu/eurostat/documents/3888793/5855973/KS-RA-13-024-EN.PDF.
Eurostat Methodologies and Working papers (2013). Handbook on precision requirements and variance estimation for ESS household surveys, URL https://ec.europa.eu/eurostat/documents/3859598/5927001/KS-RA-13-029-EN.PDF.
Yves G. Berger, Tim Goedeme, Guillaume Osier (2013). Handbook on standard error estimation and other related sampling issues in EU-SILC, URL https://ec.europa.eu/eurostat/cros/content/handbook-standard-error-estimation-and-other-related-sampling-issues-ver-29072013_en
Jean-Claude Deville (1999). Variance estimation for complex statistics and estimators: linearization and residual techniques. Survey Methodology, 25, 193-203, URL https://www150.statcan.gc.ca/n1/pub/12-001-x/1999002/article/4882-eng.pdf.
domain, lin.ratio, residual_est, vardomh, var_srs, variance_est, variance_othstr
library("data.table")
library("laeken")
data(eusilc)
dataset1 <- data.table(IDd = paste0("V", 1 : nrow(eusilc)), eusilc)
aa <- vardom(Y = "eqIncome", H = "db040", PSU = "db030",
             w_final = "rb050", id = "rb030", Dom = "db040",
             period = NULL, N_h = NULL, Z = NULL, X = NULL,
             g = NULL, q = NULL, dataset = dataset1,
             confidence = .95, percentratio = 100,
             outp_lin = TRUE, outp_res = TRUE)
Computes variance estimates for sample surveys in domains using two stratifications.
vardom_othstr( Y, H, H2, PSU, w_final, id = NULL, Dom = NULL, period = NULL, N_h = NULL, N_h2 = NULL, Z = NULL, X = NULL, ind_gr = NULL, g = NULL, q = NULL, dataset = NULL, confidence = 0.95, percentratio = 1, outp_lin = FALSE, outp_res = FALSE )
Y |
Variables of interest. Object convertible to |
H |
The unit stratum variable. One dimensional object convertible to one-column |
H2 |
The new stratum variable of the units (second stratification). One dimensional object convertible to one-column |
PSU |
Primary sampling unit variable. One dimensional object convertible to one-column |
w_final |
Weight variable. One dimensional object convertible to one-column |
id |
Optional variable for unit ID codes. One dimensional object convertible to one-column |
Dom |
Optional variables used to define population domains. If supplied, variables of interest are calculated for each domain. An object convertible to |
period |
Optional variable for survey period. If supplied, residual estimation of calibration is done independently for each time period. One dimensional object convertible to one-column |
N_h |
Optional data object convertible to |
N_h2 |
Optional data object convertible to |
Z |
Optional variables of denominator for ratio estimation. Object convertible to |
X |
Optional matrix of the auxiliary variables for the calibration estimator. Object convertible to |
ind_gr |
Optional variable by which the X matrix of auxiliary variables for the calibration is divided into independent groups. One dimensional object convertible to one-column |
g |
Optional variable of the g weights. One dimensional object convertible to one-column |
q |
Variable of the positive values accounting for heteroscedasticity. One dimensional object convertible to one-column |
dataset |
Optional survey data object convertible to |
confidence |
Optional confidence level (a value between 0 and 1) for the confidence interval. By default 0.95. |
outp_lin |
Logical value. If |
outp_res |
Logical value. If |
percentratio |
Positive numeric value. All linearized variables are multiplied by the percentratio value, by default 1. |
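When Dom is supplied, each variable of interest is estimated within each domain. A common way to do this, consistent with the package's domain() function (one extra variable per variable of interest and domain, with each unique row of the domain variables defining a domain), is to multiply y by a 0/1 domain indicator so that domain totals become ordinary totals. A base R sketch; the function name domain_vars and the "__" column naming scheme are hypothetical.

```r
# Hypothetical helper (not vardpoor's domain()): one extra variable per
# combination of variable of interest and domain, equal to y inside the
# domain and 0 outside it.
domain_vars <- function(Y, D) {
  Y <- as.data.frame(Y)
  # Each unique row of D defines a domain
  dom <- interaction(as.data.frame(D), sep = "__", drop = TRUE)
  out <- list()
  for (v in names(Y)) {
    for (lev in levels(dom)) {
      out[[paste(v, lev, sep = "__")]] <- Y[[v]] * (dom == lev)
    }
  }
  as.data.frame(out)
}

Y <- data.frame(inc = c(100, 200, 300, 400))
D <- data.frame(sex = c("m", "f", "m", "f"))
domain_vars(Y, D)
```

Because the domain variables of each y sum back to y, the same variance formula used for full-population totals applies unchanged to each domain total.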
A list with the following objects is returned by the function:
lin_out
- a data.table
containing the linearized values of the ratio estimator with id and PSU.
res_out
- a data.table
containing the estimated residuals of calibration with id and PSU.
betas
- a numeric data.table
containing the estimated coefficients of calibration.
s2g
- a data.table
containing the s^2g value.
all_result
- a data.table
, which containing variables: respondent_count
- the count of respondents, pop_size
- the estimated size of population, n_nonzero
- the count of respondents, who answers are larger than zero, estim
- the estimated value, var
- the estimated variance, se
- the estimated standard error, rse
- the estimated relative standard error (coefficient of variation), cv
- the estimated relative standard error (coefficient of variation) in percentage, absolute_margin_of_error
- the estimated absolute margin of error, relative_margin_of_error
- the estimated relative margin of error in percentage, CI_lower
- the estimated confidence interval lower bound, CI_upper
- the estimated confidence interval upper bound, confidence_level
- the positive value for confidence interval, var_srs_HT
- the estimated variance of the HT estimator under SRS, var_cur_HT
- the estimated variance of the HT estimator under current design, var_srs_ca
- the estimated variance of the calibrated estimator under SRS, deff_sam
- the estimated design effect of sample design, deff_est
- the estimated design effect of estimator, deff
- the overall estimated design effect of sample design and estimator.
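The precision measures in all_result follow from estim and var. A minimal R sketch, assuming a two-sided standard normal quantile for the margin of error; the helper name `precision_measures` is hypothetical and the package's exact conventions may differ:

```r
# Hypothetical helper, not part of vardpoor: precision measures derived
# from an estimate and its estimated variance.
precision_measures <- function(estim, var, confidence = 0.95) {
  se <- sqrt(var)
  rse <- se / estim
  z <- qnorm((1 + confidence) / 2)   # two-sided standard normal quantile
  abs_moe <- z * se
  data.frame(
    estim = estim, var = var, se = se,
    rse = rse,                        # relative standard error
    cv = 100 * rse,                   # the same, in percent
    absolute_margin_of_error = abs_moe,
    relative_margin_of_error = 100 * abs_moe / estim,
    CI_lower = estim - abs_moe,
    CI_upper = estim + abs_moe,
    confidence_level = confidence
  )
}

precision_measures(estim = 200, var = 100)
```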
Jean-Claude Deville (1999). Variance estimation for complex statistics and estimators: linearization and residual techniques. Survey Methodology, 25, 193-203, URL https://www150.statcan.gc.ca/n1/pub/12-001-x/1999002/article/4882-eng.pdf.
M. Liberts (2004). Non-response Analysis and Bias Estimation in a Survey on Transportation of Goods by Road.
domain, lin.ratio, residual_est, vardomh, var_srs, variance_est, variance_othstr
library("laeken")
library("data.table")
data("eusilc")

# Example 1
eusilc1 <- eusilc[1:1000, ]
dataset1 <- data.table(IDd = paste0("V", 1:nrow(eusilc1)), eusilc1)
dataset1[, db040_2 := get("db040")]
N_h2 <- dataset1[, sum(rb050, na.rm = FALSE), keyby = "db040_2"]
aa <- vardom_othstr(Y = "eqIncome", H = "db040", H2 = "db040_2",
                    PSU = "db030", w_final = "rb050",
                    id = "rb030", Dom = "db040",
                    period = NULL, N_h = NULL, N_h2 = N_h2,
                    Z = NULL, X = NULL, g = NULL, q = NULL,
                    dataset = dataset1, confidence = .95,
                    outp_lin = TRUE, outp_res = TRUE)

## Not run:
# Example 2
dataset1 <- data.table(IDd = 1:nrow(eusilc), eusilc)
dataset1[, db040_2 := get("db040")]
N_h2 <- dataset1[, sum(rb050, na.rm = FALSE), keyby = "db040_2"]
aa <- vardom_othstr(Y = "eqIncome", H = "db040", H2 = "db040_2",
                    PSU = "db030", w_final = "rb050",
                    id = "rb030", Dom = "db040",
                    period = NULL, N_h2 = N_h2,
                    Z = NULL, X = NULL, g = NULL,
                    dataset = dataset1, q = NULL,
                    confidence = .95, outp_lin = TRUE,
                    outp_res = TRUE)
aa
## End(Not run)
Computes variance estimates in domains at the ID_level1 level.
vardomh( Y, H, PSU, w_final, ID_level1, ID_level2, Dom = NULL, period = NULL, N_h = NULL, PSU_sort = NULL, fh_zero = FALSE, PSU_level = TRUE, Z = NULL, dataset = NULL, X = NULL, periodX = NULL, X_ID_level1 = NULL, ind_gr = NULL, g = NULL, q = NULL, datasetX = NULL, confidence = 0.95, percentratio = 1, outp_lin = FALSE, outp_res = FALSE )
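Argument | Description |
---|---|
Y | Variables of interest. Object convertible to data.table or variable names as character, column numbers. |
H | The unit stratum variable. One dimensional object convertible to one-column data.table or variable name as character, column number. |
PSU | Primary sampling unit variable. One dimensional object convertible to one-column data.table or variable name as character, column number. |
w_final | Weight variable. One dimensional object convertible to one-column data.table or variable name as character, column number. |
ID_level1 | Variable for level1 ID codes. One dimensional object convertible to one-column data.table or variable name as character, column number. |
ID_level2 | Variable for unit ID codes. One dimensional object convertible to one-column data.table or variable name as character, column number. |
Dom | Optional variables used to define population domains. If supplied, values are calculated for each domain. An object convertible to data.table or variable names as character, column numbers. |
period | Optional variable for the survey periods. If supplied, the values for each period are computed independently. Object convertible to data.table or variable names as character, column numbers. |
N_h | Number of primary sampling units in the population for each stratum (and period, if period is not NULL). |
PSU_sort | Optional; if PSU_sort is defined, the variance is calculated for a systematic sample. |
fh_zero | By default FALSE; if TRUE, the sampling fraction f_h is set to zero in each stratum, so no finite population correction is applied. |
PSU_level | By default TRUE; if TRUE, f_h in each stratum is computed at PSU level, as the number of sampled PSUs divided by the number of PSUs in the population. |
Z | Optional variables of the denominator for ratio estimation. Object convertible to data.table or variable names as character, column numbers. |
dataset | Optional survey data object convertible to data.table. |
X | Optional matrix of the auxiliary variables for the calibration estimator. Object convertible to data.table or variable names as character, column numbers. |
periodX | Optional variable of the survey periods. If supplied, residual estimation of calibration is done independently for each time period. Object convertible to data.table or variable names as character, column numbers. |
X_ID_level1 | Variable for level1 ID codes. One dimensional object convertible to one-column data.table or variable name as character, column number. |
ind_gr | Optional grouping variable; if supplied, the X matrix of auxiliary variables for the calibration is split by this variable and processed independently in each group. One dimensional object convertible to one-column data.table or variable name as character, column number. |
g | Optional variable of the g weights. One dimensional object convertible to one-column data.table or variable name as character, column number. |
q | Variable of positive values accounting for heteroscedasticity. One dimensional object convertible to one-column data.table or variable name as character, column number. |
datasetX | Optional survey data object at level1 convertible to data.table. |
confidence | Optional positive value for the confidence level of the confidence interval; 0.95 by default. |
percentratio | Positive numeric value. All linearized variables are multiplied by percentratio; 1 by default. |
outp_lin | Logical value. If TRUE, the linearized values (lin_out) are included in the output. |
outp_res | Logical value. If TRUE, the estimated residuals of calibration (res_out) are included in the output. |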
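When Z is supplied, the estimate is a ratio of two weighted totals, R = Y / Z, and its variance is computed from a linearized variable u_k = (y_k - R z_k) / Z (Deville, 1999). The R sketch below computes that linearized variable on plain vectors and scales it by percentratio. The helper `lin_ratio_sketch` is hypothetical, and scaling the ratio itself by percentratio is an assumption; the package's own function is lin.ratio.

```r
# Hypothetical helper, not part of vardpoor: linearised variable of the
# ratio of two weighted totals, R = Y / Z.
lin_ratio_sketch <- function(y, z, w, percentratio = 1) {
  Y_hat <- sum(w * y)
  Z_hat <- sum(w * z)
  R_hat <- Y_hat / Z_hat
  u <- (y - R_hat * z) / Z_hat        # first-order Taylor linearisation
  list(ratio = percentratio * R_hat, lin = percentratio * u)
}

r <- lin_ratio_sketch(y = c(10, 20, 30, 40), z = c(1, 2, 2, 5), w = c(2, 2, 3, 3))
r$ratio  # 10
r$lin    # equals c(0, 0, 10, -10) / 27
```

The weighted sum of the linearized values is zero, so the variance of the ratio can be estimated by plugging u_k into the ultimate cluster formula as if it were an ordinary study variable.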
Calculates variance estimates in domains for household surveys, based on the ultimate cluster method described in the book by Hansen, Hurwitz and Madow (1953).
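Domain estimation reduces to ordinary estimation on extra variables that equal y_k inside a domain and 0 outside, which is what the domain function generates (each unique row of the domain variables defines a domain). A rough base R sketch of that idea; the helper name `make_domain_vars` and its "variable__domain" column naming are hypothetical:

```r
# Hypothetical helper illustrating the idea behind domain(); the column
# naming "<variable>__<domain>" is an assumption, not the package's.
make_domain_vars <- function(Y, D) {
  Y <- as.data.frame(Y)
  dom <- interaction(as.data.frame(D), drop = TRUE, sep = "__")  # one level per unique row of D
  out <- list()
  for (yn in names(Y)) {
    for (lev in levels(dom)) {
      out[[paste(yn, lev, sep = "__")]] <- Y[[yn]] * (dom == lev)  # y inside the domain, 0 outside
    }
  }
  data.frame(out, check.names = FALSE)
}

Y <- data.frame(y = c(1, 2, 3, 4))
D <- data.frame(region = c("a", "b", "a", "b"))
w <- rep(10, 4)
dv <- make_domain_vars(Y, D)
colSums(w * dv)   # domain totals: y__a = 40, y__b = 60
```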
The function returns a list with the following objects:
lin_out - a data.table containing the linearized values of the ratio estimator with ID_level2 and PSU.
res_out - a data.table containing the estimated residuals of calibration with ID_level1 and PSU.
betas - a numeric data.table containing the estimated coefficients of calibration.
all_result - a data.table containing the following variables:
variable - names of the variables of interest,
Dom - optional variable of the population domains,
period - optional variable of the survey periods,
respondent_count - the count of respondents,
pop_size - the estimated size of the population,
n_nonzero - the count of respondents whose answers are larger than zero,
estim - the estimated value,
var - the estimated variance,
se - the estimated standard error,
rse - the estimated relative standard error (coefficient of variation),
cv - the estimated relative standard error (coefficient of variation) in percentage,
absolute_margin_of_error - the estimated absolute margin of error,
relative_margin_of_error - the estimated relative margin of error in percentage,
CI_lower - the estimated lower bound of the confidence interval,
CI_upper - the estimated upper bound of the confidence interval,
confidence_level - the confidence level used for the confidence interval,
S2_y_HT - the estimated variance of the y variable in case of a total, or of the linearised variable in case of the ratio of two totals, using non-calibrated weights,
S2_y_ca - the estimated variance of the y variable in case of a total, or of the linearised variable in case of the ratio of two totals, using calibrated weights,
S2_res - the estimated variance of the regression residuals,
var_srs_HT - the estimated variance of the Horvitz-Thompson (HT) estimator under SRS at household level,
var_cur_HT - the estimated variance of the HT estimator under the current design at household level,
var_srs_ca - the estimated variance of the calibrated estimator under SRS at household level,
deff_sam - the estimated design effect of the sample design at household level,
deff_est - the estimated design effect of the estimator at household level,
deff - the overall estimated design effect of the sample design and estimator at household level.
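The design-effect columns compare variances computed under different assumptions. The R sketch below assumes deff_sam = var_cur_HT / var_srs_HT, deff_est = var / var_cur_HT and deff = deff_sam * deff_est, and uses a simple unweighted sample variance for the SRS variance of a total. These definitions and the helper names `var_srs_sketch` and `design_effects` are assumptions, not taken from the package source (its own SRS variance function is var_srs).

```r
# Hypothetical helpers, not part of vardpoor.
# SRS variance of an estimated total: N^2 (1 - n/N) s^2 / n, with N = sum of
# weights and s^2 the unweighted sample variance (a simplification).
var_srs_sketch <- function(y, w) {
  n <- length(y)
  N <- sum(w)
  N^2 * (1 - n / N) * var(y) / n
}

# Assumed definitions of the three design effects.
design_effects <- function(var, var_cur_HT, var_srs_HT) {
  deff_sam <- var_cur_HT / var_srs_HT   # effect of the sample design
  deff_est <- var / var_cur_HT          # effect of the estimator (e.g. calibration)
  c(deff_sam = deff_sam, deff_est = deff_est, deff = deff_sam * deff_est)
}

design_effects(var = 4, var_cur_HT = 2, var_srs_HT = 1)
# deff_sam = 2, deff_est = 2, deff = 4
```

Under these definitions the overall deff equals var / var_srs_HT, so it reads as the combined effect of the design and of the estimator.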
Morris H. Hansen, William N. Hurwitz, William G. Madow, (1953), Sample survey methods and theory Volume I Methods and applications, 257-258, Wiley.
Guillaume Osier and Emilio Di Meglio. The linearisation approach implemented by Eurostat for the first wave of EU-SILC: what could be done from the second wave onwards? 2012
Guillaume Osier, Yves Berger, Tim Goedeme, (2013), Standard error estimation for the EU-SILC indicators of poverty and social exclusion, Eurostat Methodologies and Working papers, URL https://ec.europa.eu/eurostat/documents/3888793/5855973/KS-RA-13-024-EN.PDF.
Eurostat Methodologies and Working papers, Handbook on precision requirements and variance estimation for ESS household surveys, 2013, URL https://ec.europa.eu/eurostat/documents/3859598/5927001/KS-RA-13-029-EN.PDF.
Yves G. Berger, Tim Goedeme, Guillaume Osier (2013). Handbook on standard error estimation and other related sampling issues in EU-SILC, URL https://ec.europa.eu/eurostat/cros/content/handbook-standard-error-estimation-and-other-related-sampling-issues-ver-29072013_en
Jean-Claude Deville (1999). Variance estimation for complex statistics and estimators: linearization and residual techniques. Survey Methodology, 25, 193-203, URL https://www150.statcan.gc.ca/n1/pub/12-001-x/1999002/article/4882-eng.pdf.
domain, lin.ratio, residual_est, var_srs, variance_est
library("data.table")
library("laeken")
data("eusilc")
dataset1 <- data.table(IDd = paste0("V", 1 : nrow(eusilc)), eusilc)
aa <- vardomh(Y = "eqIncome", H = "db040", PSU = "db030",
              w_final = "rb050", ID_level1 = "db030", ID_level2 = "rb030",
              Dom = "db040", period = NULL, N_h = NULL, Z = NULL,
              dataset = dataset1, X = NULL, X_ID_level1 = NULL,
              g = NULL, q = NULL, datasetX = NULL, confidence = 0.95,
              percentratio = 1, outp_lin = TRUE, outp_res = TRUE)

## Not run:
dataset2 <- copy(dataset1)
dataset1$period <- 1
dataset2$period <- 2
dataset1 <- data.table(rbind(dataset1, dataset2))

# by default fh_zero = FALSE (finite population correction applied)
aa2 <- vardomh(Y = "eqIncome", H = "db040", PSU = "db030",
               w_final = "rb050", ID_level1 = "db030", ID_level2 = "rb030",
               Dom = "db040", period = "period", N_h = NULL, Z = NULL,
               dataset = dataset1, X = NULL, X_ID_level1 = NULL,
               g = NULL, q = NULL, datasetX = NULL, confidence = .95,
               percentratio = 1, outp_lin = TRUE, outp_res = TRUE)
aa2

# fh_zero = FALSE set explicitly (same as the default)
aa3 <- vardomh(Y = "eqIncome", H = "db040", PSU = "db030",
               w_final = "rb050", ID_level1 = "db030", ID_level2 = "rb030",
               Dom = "db040", period = "period", N_h = NULL,
               fh_zero = FALSE, Z = NULL, dataset = dataset1, X = NULL,
               X_ID_level1 = NULL, g = NULL, q = NULL, datasetX = NULL,
               confidence = .95, percentratio = 1,
               outp_lin = TRUE, outp_res = TRUE)
aa3

# fh_zero = TRUE: f_h is set to zero, i.e. no finite population correction
aa4 <- vardomh(Y = "eqIncome", H = "db040", PSU = "db030",
               w_final = "rb050", ID_level1 = "db030", ID_level2 = "rb030",
               Dom = "db040", period = "period", N_h = NULL,
               fh_zero = TRUE, Z = NULL, dataset = dataset1, X = NULL,
               X_ID_level1 = NULL, g = NULL, q = NULL, datasetX = NULL,
               confidence = .95, percentratio = 1,
               outp_lin = TRUE, outp_res = TRUE)
aa4
## End(Not run)
Computes the variance estimation by the ultimate cluster method.
variance_est( Y, H, PSU, w_final, N_h = NULL, fh_zero = FALSE, PSU_level = TRUE, PSU_sort = NULL, period = NULL, dataset = NULL, msg = "", checking = TRUE )
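Argument | Description |
---|---|
Y | Variables of interest. Object convertible to data.table or variable names as character, column numbers. |
H | The unit stratum variable. One dimensional object convertible to one-column data.table or variable name as character, column number. |
PSU | Primary sampling unit variable. One dimensional object convertible to one-column data.table or variable name as character, column number. |
w_final | Weight variable. One dimensional object convertible to one-column data.table or variable name as character, column number. |
N_h | Number of primary sampling units in the population for each stratum (and period, if period is not NULL). |
fh_zero | By default FALSE; if TRUE, the sampling fraction f_h is set to zero in each stratum, so no finite population correction is applied. |
PSU_level | By default TRUE; if TRUE, f_h in each stratum is computed at PSU level, as the number of sampled PSUs divided by the number of PSUs in the population. |
PSU_sort | Optional; if PSU_sort is defined, the variance is calculated for a systematic sample. |
period | Optional variable for the survey periods. If supplied, the values for each period are computed independently. Object convertible to data.table or variable names as character, column numbers. |
dataset | Optional individual-level survey dataset convertible to data.table. |
msg | Optional text printed when the function reports an error. |
checking | Optional logical value; if TRUE (the default), the function checks for data preparation errors. |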
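If we assume that $n_h \geq 2$ for all $h$, that is, two or more PSUs are selected from each stratum, then the variance of the estimated total $\hat{Y} = \sum_{h=1}^{H} \sum_{i=1}^{n_h} \sum_{j=1}^{m_{hi}} w_{hij} y_{hij}$ can be estimated from the variation among the estimated PSU totals of the variable $y$:

$$\hat{V}(\hat{Y}) = \sum_{h=1}^{H} (1 - f_h) \frac{n_h}{n_h - 1} \sum_{i=1}^{n_h} \left( y_{hi\bullet} - \bar{y}_{h\bullet\bullet} \right)^2 ,$$

where

- $y_{hi\bullet} = \sum_{j=1}^{m_{hi}} w_{hij} y_{hij}$ is the weighted total of PSU $i$ in stratum $h$, and $\bar{y}_{h\bullet\bullet} = \frac{1}{n_h} \sum_{i=1}^{n_h} y_{hi\bullet}$,
- $f_h$ is the sampling fraction of PSUs within stratum $h$,
- $h$ is the stratum number, with a total of $H$ strata,
- $i$ is the primary sampling unit (PSU) number within stratum $h$, with a total of $n_h$ PSUs,
- $j$ is the household number within cluster $i$ of stratum $h$, with a total of $m_{hi}$ households,
- $w_{hij}$ is the sampling weight for household $j$ in PSU $i$ of stratum $h$,
- $y_{hij}$ denotes the observed value of the analysis variable $y$ for household $j$ in PSU $i$ of stratum $h$.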
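The formula above can be computed directly in base R. This sketch (hypothetical helper `ultimate_cluster_var`) aggregates weighted values to PSU totals and sums the between-PSU variation within each stratum. `f_h` is an optional named vector of sampling fractions; leaving it NULL corresponds to f_h = 0, i.e. no finite population correction. It is meant for checking small cases by hand, not as a replacement for variance_est.

```r
# Hypothetical helper, not part of vardpoor: ultimate cluster variance of a
# weighted total, computed term by term from the formula above.
ultimate_cluster_var <- function(y, w, H, PSU, f_h = NULL) {
  H <- as.character(H)
  key <- paste(H, PSU, sep = "/")                       # one key per (stratum, PSU)
  z_psu <- as.vector(tapply(w * y, key, sum))           # PSU totals y_hi.
  h_psu <- as.vector(tapply(H, key, function(s) s[1]))  # stratum of each PSU
  v <- 0
  for (h in unique(h_psu)) {
    z_hi <- z_psu[h_psu == h]
    n_h <- length(z_hi)
    if (n_h < 2) stop("stratum ", h, " has fewer than two PSUs")
    fh <- if (is.null(f_h)) 0 else f_h[[h]]             # NULL: f_h = 0, no correction
    v <- v + (1 - fh) * n_h / (n_h - 1) * sum((z_hi - mean(z_hi))^2)
  }
  v
}

# Same setting as the variance_est example: 10 units, each its own PSU, one stratum.
set.seed(123)
Ys <- rchisq(10, 3)
ultimate_cluster_var(y = Ys, w = rep(2, 10), H = rep("Strata_1", 10), PSU = 1:10)
```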
A data.table containing the variance estimates of the totals.
Morris H. Hansen, William N. Hurwitz, William G. Madow, (1953), Sample survey methods and theory Volume I Methods and applications, 257-258, Wiley.
Guillaume Osier and Emilio Di Meglio. The linearisation approach implemented by Eurostat for the first wave of EU-SILC: what could be done from the second wave onwards? 2012
Guillaume Osier, Yves Berger, Tim Goedeme, (2013), Standard error estimation for the EU-SILC indicators of poverty and social exclusion, Eurostat Methodologies and Working papers, URL https://ec.europa.eu/eurostat/documents/3888793/5855973/KS-RA-13-024-EN.PDF.
Yves G. Berger, Tim Goedeme, Guillaume Osier (2013). Handbook on standard error estimation and other related sampling issues in EU-SILC, URL https://ec.europa.eu/eurostat/cros/content/handbook-standard-error-estimation-and-other-related-sampling-issues-ver-29072013_en
Eurostat Methodologies and Working papers, Handbook on precision requirements and variance estimation for ESS household surveys, 2013, URL https://ec.europa.eu/eurostat/documents/3859598/5927001/KS-RA-13-029-EN.PDF.
domain, lin.ratio, linarpr, linarpt, lingini, lingini2, lingpg, linpoormed, linqsr, linrmpg, residual_est, vardom, vardomh, varpoord, variance_othstr
Ys <- rchisq(10, 3)
w <- rep(2, 10)
PSU <- 1 : length(Ys)
H <- rep("Strata_1", 10)

# by default fh_zero = FALSE (finite population correction applied)
variance_est(Y = Ys, H = H, PSU = PSU, w_final = w)

## Not run:
# fh_zero = FALSE set explicitly (same as the default)
variance_est(Y = Ys, H = H, PSU = PSU, w_final = w, fh_zero = FALSE)

# fh_zero = TRUE: f_h is set to zero, i.e. no finite population correction
variance_est(Y = Ys, H = H, PSU = PSU, w_final = w, fh_zero = TRUE)
## End(Not run)
Computes s2g and the variance estimate under the new stratification.
variance_othstr( Y, H, H2, w_final, N_h = NULL, N_h2, period = NULL, dataset = NULL, checking = TRUE )
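Argument | Description |
---|---|
Y | Variables of interest. Object convertible to data.table or variable names as character, column numbers. |
H | The unit stratum variable. One dimensional object convertible to one-column data.table or variable name as character, column number. |
H2 | The new stratum variable of the unit. One dimensional object convertible to one-column data.table or variable name as character, column number. |
w_final | Weight variable. One dimensional object convertible to one-column data.table or variable name as character, column number. |
N_h | Optional data object convertible to data.table; in the example below, its first column holds the strata of H and its second column the population size of each stratum. |
N_h2 | Data object convertible to data.table; in the example below, its first column holds the strata of H2 and its second column the population size of each new stratum. |
period | Optional variable for the survey periods. If supplied, the values for each period are computed independently. One dimensional object convertible to one-column data.table or variable name as character, column number. |
dataset | Optional survey data object convertible to data.table. |
checking | Optional logical value; if TRUE (the default), the function checks for data preparation errors. |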
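It is possible to compute the population size $M_g$ of the $g$-th new stratum from the sampling frame. The variance of the $g$-th stratum is

$$S_g^2 = \frac{1}{M_g - 1} \sum_{k=1}^{M_g} \left( y_k - \bar{Y}_g \right)^2 = \frac{1}{M_g - 1} \sum_{k=1}^{M_g} y_k^2 - \frac{M_g}{M_g - 1} \bar{Y}_g^2 ,$$

so $\sum_{k=1}^{M_g} y_k^2$ and $\bar{Y}_g^2$ have to be estimated to estimate $S_g^2$. The estimate of $\sum_{k=1}^{M_g} y_k^2$ is $\sum_{h=1}^{H} \frac{N_h}{n_h} \sum_{k=1}^{n_h} y_k^2 z_{gk}$, where $N_h$ and $n_h$ are the population and sample sizes of the original stratum $h$, $z_{gk} = 1$ if $k \in G_g$ and $z_{gk} = 0$ otherwise, and $G_g$ is the index group of successfully surveyed units belonging to the $g$-th stratum. The estimate of $\bar{Y}_g^2$ is

$$\widehat{\bar{Y}_g^2} = \hat{\bar{Y}}_g^2 - \widehat{Var}\left( \hat{\bar{Y}}_g \right).$$

So the estimate of $S_g^2$ is

$$s_g^2 = \frac{1}{M_g - 1} \sum_{h=1}^{H} \frac{N_h}{n_h} \sum_{k=1}^{n_h} y_k^2 z_{gk} - \frac{M_g}{M_g - 1} \widehat{\bar{Y}_g^2} .$$

Two conditions have to be met to estimate $s_g^2$: $n_h > 1$ for all $h$, and $G_g \neq \emptyset$ for all $g$. The variance of $\hat{Y}$ is

$$Var(\hat{Y}) = \sum_{g=1}^{G} \frac{M_g^2}{m_g} \left( 1 - \frac{m_g}{M_g} \right) S_g^2 ,$$

where $m_g$ is the number of successfully surveyed units in the $g$-th stratum. The estimate of $Var(\hat{Y})$ is

$$\widehat{Var}(\hat{Y}) = \sum_{g=1}^{G} \frac{M_g^2}{m_g} \left( 1 - \frac{m_g}{M_g} \right) s_g^2 .$$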
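Once $M_g$, $m_g$ and $s_g^2$ are available, the last formula is a stratified sum over the new strata. The sketch below (hypothetical helper `var_new_strata`) implements only that step. In its usage example the inputs are illustrative values, and $s_g^2$ is replaced by the plain within-stratum sample variance, which is a simplification of the estimator described above rather than what variance_othstr computes.

```r
# Hypothetical helper, not part of vardpoor: variance of an estimated total
# under stratification by the new strata g, given M_g, m_g and s_g^2.
var_new_strata <- function(M_g, m_g, s2_g) {
  stopifnot(length(M_g) == length(m_g), length(m_g) == length(s2_g), all(m_g >= 1))
  sum(M_g^2 / m_g * (1 - m_g / M_g) * s2_g)
}

# Illustrative inputs; s2_g is the plain within-stratum sample variance here.
y <- c(2.1, 3.4, 1.8, 4.0, 2.9, 3.3)
g <- c("A", "A", "A", "B", "B", "B")
M_g <- c(A = 40, B = 60)
m_g <- as.vector(table(g)[names(M_g)])
s2_g <- as.vector(tapply(y, g, var)[names(M_g)])
var_new_strata(M_g, m_g, s2_g)
```

A fully enumerated new stratum ($m_g = M_g$) contributes nothing to the variance, because its finite population correction is zero.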
The function returns a list with the following objects:
betas - a numeric data.table containing the estimated coefficients of calibration.
s2g - a data.table containing the s_g^2 values.
var_est - a data.table containing the values of the variance estimation.
M. Liberts. (2004) Non-response Analysis and Bias Estimation in a Survey on Transportation of Goods by Road.
domain, lin.ratio, linarpr, linarpt, lingini, lingini2, lingpg, linpoormed, linqsr, linrmpg, residual_est, vardom, vardom_othstr, vardomh, varpoord
library("data.table")
Y <- data.table(matrix(runif(50) * 5, ncol = 5))
H <- data.table(H = as.integer(trunc(5 * runif(10))))
H2 <- data.table(H2 = as.integer(trunc(3 * runif(10))))
N_h <- data.table(matrix(0 : 4, 5, 1))
setnames(N_h, names(N_h), "H")
N_h[, sk := 10]
N_h2 <- data.table(matrix(0 : 2, 3, 1))
setnames(N_h2, names(N_h2), "H2")
N_h2[, sk2 := 4]
w_final <- rep(2, 10)
vo <- variance_othstr(Y = Y, H = H, H2 = H2, w_final = w_final,
                      N_h = N_h, N_h2 = N_h2, period = NULL,
                      dataset = NULL)
vo
Computes variance estimates for indicators of poverty and social exclusion.
varpoord( Y, w_final, age = NULL, pl085 = NULL, month_at_work = NULL, Y_den = NULL, Y_thres = NULL, wght_thres = NULL, ID_level1, ID_level2 = NULL, H, PSU, N_h, PSU_sort = NULL, fh_zero = FALSE, PSU_level = TRUE, sort = NULL, Dom = NULL, period = NULL, gender = NULL, dataset = NULL, X = NULL, periodX = NULL, X_ID_level1 = NULL, ind_gr = NULL, g = NULL, q = NULL, datasetX = NULL, percentage = 60, order_quant = 50, alpha = 20, confidence = 0.95, outp_lin = FALSE, outp_res = FALSE, type = "linrmpg" )
Argument | Description |
---|---|
Y | Study variable (for example equivalised disposable income or gross pension income). One dimensional object convertible to one-column data.table or variable name as character, column number. |
w_final | Weight variable. One dimensional object convertible to one-column data.table or variable name as character, column number. |
age | Age variable. One dimensional object convertible to one-column data.table or variable name as character, column number. |
pl085 | Retirement variable (number of months spent in retirement or early retirement). One dimensional object convertible to one-column data.table or variable name as character, column number. |
Y_den | Denominator variable (for example gross individual earnings). One dimensional object convertible to one-column data.table or variable name as character, column number. |
Y_thres | Variable (for example equivalised disposable income) used for the computation and linearization of the poverty threshold. One dimensional object convertible to one-column data.table or variable name as character, column number. |
wght_thres | Weight variable used for the computation and linearization of the poverty threshold. One dimensional object convertible to one-column data.table or variable name as character, column number. |
ID_level1 | Variable for level1 ID codes. One dimensional object convertible to one-column data.table or variable name as character, column number. |
ID_level2 | Optional variable for unit ID codes. One dimensional object convertible to one-column data.table or variable name as character, column number. |
H | The unit stratum variable. One dimensional object convertible to one-column data.table or variable name as character, column number. |
PSU | Primary sampling unit variable. One dimensional object convertible to one-column data.table or variable name as character, column number. |
N_h | Number of primary sampling units in the population for each stratum (and period, if period is not NULL). |
PSU_sort | Optional; if PSU_sort is defined, the variance is calculated for a systematic sample. |
fh_zero | By default FALSE; if TRUE, the sampling fraction f_h is set to zero in each stratum, so no finite population correction is applied. |
PSU_level | By default TRUE; if TRUE, f_h in each stratum is computed at PSU level, as the number of sampled PSUs divided by the number of PSUs in the population. |
sort | Optional variable to be used as tie-breaker for sorting. One dimensional object convertible to one-column data.table or variable name as character, column number. |
Dom | Optional variables used to define population domains. If supplied, the indicators are calculated for each domain. An object convertible to data.table or variable names as character, column numbers. |
period | Optional variable for the survey period. If supplied, the indicators are calculated for each time period. Object convertible to data.table or variable names as character, column numbers. |
gender | Numerical variable for gender, where 1 stands for males and 2 for females. One dimensional object convertible to one-column data.table or variable name as character, column number. |
dataset | Optional survey data object convertible to data.table. |
X | Optional matrix of the auxiliary variables for the calibration estimator. Object convertible to data.table or variable names as character, column numbers. |
periodX | Optional variable of the survey periods. If supplied, residual estimation of calibration is done independently for each time period. Object convertible to data.table or variable names as character, column numbers. |
X_ID_level1 | Variable for level1 ID codes. One dimensional object convertible to one-column data.table or variable name as character, column number. |
ind_gr | Optional grouping variable; if supplied, the X matrix of auxiliary variables for the calibration is split by this variable and processed independently in each group. One dimensional object convertible to one-column data.table or variable name as character, column number. |
g | Optional variable of the g weights. One dimensional object convertible to one-column data.table or variable name as character, column number. |
q | Variable of positive values accounting for heteroscedasticity. One dimensional object convertible to one-column data.table or variable name as character, column number. |
datasetX | Optional survey data object at household level convertible to data.table. |
percentage | A numeric value in range [0, 100] giving the percentage of the income quantile used as poverty threshold. For example, to compute a poverty threshold equal to 60% of some income quantile, percentage should be set to 60. |
order_quant | A numeric value in range [0, 100] giving the order of the income quantile used for the poverty threshold. For example, to compute a poverty threshold equal to some percentage of median income, order_quant should be set to 50. |
alpha | A numeric value in range …; 20 by default. |
confidence | Optional positive value for the confidence level of the confidence interval; 0.95 by default. |
outp_lin | Logical value. If TRUE, the linearized values (lin_out) are included in the output. |
outp_res | Logical value. If TRUE, the estimated residuals of calibration (res_out) are included in the output. |
type | A character vector naming the indicators to compute, for example "linarpr", "linarpt", "lingpg", "linpoormed", "linrmpg", "lingini", "lingini2", "linqsr", "linarr", "linrmir"; "linrmpg" by default. |
month_at_work | Variable for the total number of months at work (the sum of the number of months spent at full-time work as employee, at part-time work as employee, at full-time work as self-employed (including family worker) and at part-time work as self-employed (including family worker)). One dimensional object convertible to one-column data.table or variable name as character, column number. |
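With the defaults percentage = 60 and order_quant = 50, the at-risk-of-poverty threshold is 60% of the weighted median of Y, and the at-risk-of-poverty rate is the weighted share of persons below it. A minimal base R sketch of these two point estimates (not their linearization) follows. The helper names are hypothetical, the weighted quantile uses one simple definition (the smallest value whose cumulative weight share reaches the requested order) that may differ from the one used by vardpoor or laeken, and counting only incomes strictly below the threshold is an assumption.

```r
# Hypothetical helpers, not part of vardpoor. The weighted quantile below is one
# simple definition and may differ from the one used by vardpoor or laeken.
weighted_quantile <- function(x, w, p) {
  o <- order(x)
  x <- x[o]
  w <- w[o]
  cw <- cumsum(w) / sum(w)       # cumulative weight share
  x[which(cw >= p)[1]]           # smallest x whose cumulative share reaches p
}

poverty_sketch <- function(income, w, percentage = 60, order_quant = 50) {
  arpt <- percentage / 100 * weighted_quantile(income, w, order_quant / 100)
  arpr <- 100 * sum(w[income < arpt]) / sum(w)   # assumption: strictly below the threshold
  c(arpt = arpt, arpr = arpr)
}

poverty_sketch(income = seq(100, 1000, by = 100), w = rep(1, 10))
# arpt = 300 (60% of the median 500), arpr = 20
```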
The function returns a list with the following objects:
lin_out - a data.table containing the linearized values of the ratio estimator with ID_level2 and PSU.
res_out - a data.table containing the estimated residuals of calibration with ID_level1 and PSU.
betas - a numeric data.table containing the estimated coefficients of calibration.
all_result - a data.table containing the following variables:
respondent_count - the count of respondents,
pop_size - the estimated size of the population,
n_nonzero - the count of respondents whose answers are larger than zero,
value - the estimated value,
var - the estimated variance,
se - the estimated standard error,
rse - the estimated relative standard error (coefficient of variation),
cv - the estimated relative standard error (coefficient of variation) in percentage,
absolute_margin_of_error - the estimated absolute margin of error,
relative_margin_of_error - the estimated relative margin of error in percentage,
CI_lower - the estimated lower bound of the confidence interval,
CI_upper - the estimated upper bound of the confidence interval,
confidence_level - the confidence level used for the confidence interval,
S2_y_HT - the estimated variance of the y variable in case of a total, or of the linearised variable in case of the ratio of two totals, using non-calibrated weights,
S2_y_ca - the estimated variance of the y variable in case of a total, or of the linearised variable in case of the ratio of two totals, using calibrated weights,
S2_res - the estimated variance of the regression residuals,
var_srs_HT - the estimated variance of the Horvitz-Thompson (HT) estimator under SRS at household level,
var_cur_HT - the estimated variance of the HT estimator under the current design at household level,
var_srs_ca - the estimated variance of the calibrated estimator under SRS at household level,
deff_sam - the estimated design effect of the sample design at household level,
deff_est - the estimated design effect of the estimator at household level,
deff - the overall estimated design effect of the sample design and estimator at household level.
Eric Graf and Yves Tille, Variance Estimation Using Linearization for Poverty and Social Exclusion Indicators, Survey Methodology, June 2014, Vol. 40, No. 1, pp. 61-79, Statistics Canada, Catalogue no. 12-001-X, URL https://www150.statcan.gc.ca/n1/pub/12-001-x/12-001-x2014001-eng.pdf
Guillaume Osier and Emilio Di Meglio. The linearisation approach implemented by Eurostat for the first wave of EU-SILC: what could be done from the second wave onwards? 2012
Guillaume Osier (2009). Variance estimation for complex indicators of poverty and inequality. Journal of the European Survey Research Association, Vol.3, No.3, pp. 167-195, ISSN 1864-3361, URL https://ojs.ub.uni-konstanz.de/srm/article/view/369.
Guillaume Osier, Yves Berger, Tim Goedeme, (2013), Standard error estimation for the EU-SILC indicators of poverty and social exclusion, Eurostat Methodologies and Working papers, URL https://ec.europa.eu/eurostat/documents/3888793/5855973/KS-RA-13-024-EN.PDF.
Jean-Claude Deville (1999). Variance estimation for complex statistics and estimators: linearization and residual techniques. Survey Methodology, 25, 193-203, URL https://www150.statcan.gc.ca/n1/pub/12-001-x/1999002/article/4882-eng.pdf.
Eurostat Methodologies and Working papers, Handbook on precision requirements and variance estimation for ESS household surveys, 2013, URL https://ec.europa.eu/eurostat/documents/3859598/5927001/KS-RA-13-029-EN.PDF.
Matti Langel, Yves Tille (2011). Corrado Gini, a pioneer in balanced sampling and inequality theory. Metron - International Journal of Statistics, vol. LXIX, n. 1, pp. 45-65, doi:10.1007/BF03263549.
Morris H. Hansen, William N. Hurwitz, William G. Madow, (1953), Sample survey methods and theory Volume I Methods and applications, 257-258, Wiley.
Yves G. Berger, Tim Goedeme, Guillaume Osier (2013). Handbook on standard error estimation and other related sampling issues in EU-SILC, URL https://ec.europa.eu/eurostat/cros/content/handbook-standard-error-estimation-and-other-related-sampling-issues-ver-29072013_en
Working group on Statistics on Income and Living Conditions (2004) Common cross-sectional EU indicators based on EU-SILC; the gender pay gap. EU-SILC 131-rev/04, Eurostat.
library("data.table")
library("laeken")
data("eusilc")
dataset <- data.table(IDd = paste0("V", 1 : nrow(eusilc)), eusilc)
dataset1 <- dataset[1 : 1000]

# use dataset1; by default fh_zero = FALSE (finite population correction applied)
aa <- varpoord(Y = "eqIncome", w_final = "rb050",
               Y_thres = NULL, wght_thres = NULL,
               ID_level1 = "db030", ID_level2 = "IDd",
               H = "db040", PSU = "rb030", N_h = NULL,
               sort = NULL, Dom = NULL, gender = NULL,
               X = NULL, X_ID_level1 = NULL, g = NULL, q = NULL,
               datasetX = NULL, dataset = dataset1,
               percentage = 60, order_quant = 50L, alpha = 20,
               confidence = .95, outp_lin = FALSE, outp_res = FALSE,
               type = "linarpt")
aa

## Not run:
# use dataset1 with fh_zero = TRUE (f_h set to zero, no finite population correction)
aa2 <- varpoord(Y = "eqIncome", w_final = "rb050",
                Y_thres = NULL, wght_thres = NULL,
                ID_level1 = "db030", ID_level2 = "IDd",
                H = "db040", PSU = "rb030", N_h = NULL,
                fh_zero = TRUE, sort = NULL, Dom = "db040",
                gender = NULL, X = NULL, X_ID_level1 = NULL,
                g = NULL, datasetX = NULL, dataset = dataset1,
                percentage = 60, order_quant = 50L, alpha = 20,
                confidence = .95, outp_lin = FALSE, outp_res = FALSE,
                type = "linarpt")
aa2
aa2$all_result

# using the full dataset
aa4 <- varpoord(Y = "eqIncome", w_final = "rb050",
                Y_thres = NULL, wght_thres = NULL,
                ID_level1 = "db030", ID_level2 = "IDd",
                H = "db040", PSU = "rb030", N_h = NULL,
                sort = NULL, Dom = "db040", gender = NULL,
                X = NULL, X_ID_level1 = NULL, g = NULL,
                datasetX = NULL, dataset = dataset,
                percentage = 60, order_quant = 50L, alpha = 20,
                confidence = .95, outp_lin = TRUE, outp_res = TRUE,
                type = "linarpt")
aa4$lin_out[20 : 40]
## End(Not run)