New Zealand Crime and Safety Survey 2006 - Technical Report 


10 Variance estimation and significance tests

 


While sample surveys like the NZCASS provide a practical and cost-effective means of collecting information on victimisation, the survey results are inherently subject to random sampling variation. The size of this variation must be estimated and considered to interpret the results sensibly. Variance estimation for the NZCASS is complicated by the survey’s complex sample design and the large amount of missing data. A balanced repeated replication method (Wolter, 1985) was used to accommodate the sample design and weighting[39], and the effect of imputation was estimated using multiple imputation.

Balanced repeated replication

Balanced repeated replication (BRR), like other resampling methods, uses the variation between the results for many sample "replicates" to estimate sampling variances (excluding imputation effects). BRR essentially uses an experimental design to determine a small "balanced" set of half-samples that contain all the information needed[40] to calculate a variance estimate as

 formula.

formula.
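
For reference, the textbook form of the BRR variance estimator with Fay's adjustment can be written as below. The notation is an assumption and may differ from the report's own formula: R is the number of replicates, the hatted theta is the full-sample estimate, the subscripted version is the estimate from the rth set of replicate weights, and k is Fay's factor (0 <= k < 1, with k = 0 giving standard BRR). Any finite population correction is not shown.

```latex
\hat{V}_{\mathrm{BRR}}(\hat{\theta})
  = \frac{1}{R\,(1-k)^{2}} \sum_{r=1}^{R} \bigl(\hat{\theta}_{r} - \hat{\theta}\bigr)^{2}
```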

The strata from the main and booster samples have been treated as separate, giving 24 strata, and NAUs have been merged to form half-samples within strata using Lumley’s (2006) survey package. The weighting process has been run[41] using each of the 28 sets of BRR replicate weights, including the finite population correction (FPC) and Fay’s adjustments, as input. This accounts for the effect of the full weighting framework.
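
The replicate-weight approach described above can be illustrated with a minimal Python sketch. It is not the NZCASS implementation: the function names, the toy data and the Fay factor of 0.5 are all assumptions, the finite population correction and the re-running of the weighting process are omitted, and the Sylvester construction used here only produces Hadamard matrices whose order is a power of two (so it would give 32 rather than 28 replicates for 24 strata).

```python
def sylvester_hadamard(order):
    """Hadamard matrix (list of rows) for an order that is a power of two."""
    if order < 1 or order & (order - 1):
        raise ValueError("order must be a power of two")
    h = [[1]]
    while len(h) < order:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def fay_brr_weights(weights, stratum, half, fay_k=0.5):
    """One list of replicate weights per Hadamard row.

    stratum: stratum index (0..H-1) of each unit
    half:    0 or 1, the half-sample each unit belongs to within its stratum
    fay_k:   Fay factor (assumed value); 0 gives standard BRR
    """
    n_strata = max(stratum) + 1
    order = 1
    while order <= n_strata:  # column 0 (all ones) is skipped, so order > H
        order *= 2
    replicates = []
    for row in sylvester_hadamard(order):
        rep = []
        for w, s, a in zip(weights, stratum, half):
            sign = row[s + 1] * (1 if a == 1 else -1)
            # The selected half is up-weighted, the other half down-weighted.
            rep.append(w * ((2 - fay_k) if sign > 0 else fay_k))
        replicates.append(rep)
    return replicates


def brr_variance(full_estimate, replicate_estimates, fay_k=0.5):
    """Fay-adjusted BRR variance from the replicate estimates."""
    r = len(replicate_estimates)
    ss = sum((e - full_estimate) ** 2 for e in replicate_estimates)
    return ss / (r * (1 - fay_k) ** 2)


def weighted_total(weights, y):
    return sum(w * v for w, v in zip(weights, y))


# Hypothetical toy data: 3 strata, each split into two half-samples.
stratum = [0, 0, 0, 1, 1, 2, 2, 2]
half = [0, 1, 1, 0, 1, 0, 0, 1]
weights = [10.0, 12.0, 8.0, 20.0, 20.0, 15.0, 5.0, 25.0]
y = [1, 0, 1, 1, 0, 0, 1, 1]

full = weighted_total(weights, y)
rep_estimates = [weighted_total(rw, y)
                 for rw in fay_brr_weights(weights, stratum, half)]
variance = brr_variance(full, rep_estimates)
print(full, variance)
```

For a linear statistic such as this weighted total, the balanced set of replicates reproduces the usual paired-difference variance estimate (the sum over strata of the squared difference between the two half-sample totals), which is the sense in which a small balanced set "contains all the information needed".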

One way of summarising how much the sampling variance is affected by the sample design, weighting and any imputation is to calculate how much smaller a simple random sample with the same variance would be. This is expressed as the ratio of the actual sample size to the size of this simple random sample, and called the design effect. Design effects can vary substantially from one statistic to another, even though these come from the same survey and are based on the same weights.
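
As a rough illustration of this definition, the sketch below computes a design effect for an estimated proportion by comparing its estimated variance with the variance p(1 - p)/n expected under simple random sampling (ignoring any finite population correction). The ratio of variances is equivalent to the ratio of sample sizes described above. The function names and the numbers are hypothetical and are not taken from the NZCASS.

```python
def srs_variance(p, n):
    """Variance of a proportion p under simple random sampling of size n."""
    return p * (1 - p) / n


def design_effect(estimated_variance, p, n):
    """Ratio of the actual variance to the simple-random-sample variance."""
    return estimated_variance / srs_variance(p, n)


def effective_sample_size(n, deff):
    """Size of the simple random sample that would give the same variance."""
    return n / deff


# Hypothetical values: a proportion of 0.2 from 5,000 respondents whose
# replicate-based variance estimate is 6.4e-05.
p, n = 0.2, 5000
estimated_variance = 6.4e-05

deff = design_effect(estimated_variance, p, n)
n_eff = effective_sample_size(n, deff)
print(round(deff, 2), round(n_eff))
```

With these made-up inputs the design effect is about 2, so the 5,000 interviews carry roughly the information of a simple random sample of 2,500.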

Design effects were calculated for the proportion giving each response for a wide range of variables, for person, household, and incident weights. Over 500 such statistics were analysed using person and household weights. For person weights, the design effects had a lower and upper quartile of 0.83 and 2.09 respectively, and an average of 1.64. Design effects were also calculated for the same measures using person weights but restricting the sample to Māori. The design effects for Māori were slightly smaller than for the full sample, having a lower and upper quartile of 0.80 and 1.84 respectively, and an average of 1.51. For household weights, the design effects (for the full sample) were generally smaller again, having a lower and upper quartile of 0.68 and 1.51 respectively, and an average of 1.22. For incident weights, design effects were calculated for over 250 statistics. The lower and upper quartiles were 1.70 and 4.06 respectively, and the average design effect was 3.19.

Some significance tests conducted for chapters 4 and 5 of the Key Findings report used design effects of 2 and 4 (round numbers close to the upper quartiles mentioned above) for analyses of people and incidents respectively.
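
The report does not spell out how these round-number design effects entered the tests. One plausible approach, sketched below purely as an assumption, is to inflate the simple-random-sample variance of each proportion by the design effect before forming a two-sample z statistic. The function names and example figures are hypothetical.

```python
import math


def deff_adjusted_z(p1, n1, p2, n2, deff):
    """z statistic for p1 - p2 with SRS variances inflated by deff."""
    variance = deff * (p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / math.sqrt(variance)


def two_sided_p(z):
    """Two-sided p-value under a standard normal reference distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical comparison of two groups of 1,000 people, using the
# person-level design effect of 2 mentioned in the text.
z = deff_adjusted_z(0.30, 1000, 0.25, 1000, deff=2.0)
print(round(z, 3), round(two_sided_p(z), 4))
```

Because the design effect multiplies the variance, a design effect of 4 halves the z statistic relative to an unadjusted test.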

Multiple imputation

The effect of imputation on the reliability of victimisation rates has been accounted for using multiple imputation (Rubin, 1987). Each stochastic imputation step was repeated 10 times, using parameter values drawn from their maximum likelihood distribution[42]. To produce each variance estimate, the 10 resulting imputed datasets were analysed using each of the 28 sets of replicate weights, producing 280 results. For a particular imputed dataset, say the jth one, the results from all the BRR weights were combined using the BRR formula above to give the complete-data variance estimate

formula.

Once this was done for each imputed dataset, the results were combined using Rubin’s standard combining rules:

formula.

where var

formula.

is the variance among the BRR variance estimates for the 10 imputed datasets. Confidence intervals were calculated using a t distribution with the appropriate degrees of freedom.
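
The combining step can be sketched in Python as follows. This uses the usual textbook statement of Rubin's rules, in which the total variance is the average complete-data variance plus (1 + 1/m) times the variance of the point estimates across the m imputed datasets. The degrees-of-freedom formula is Rubin's (1987) classic one, which is an assumption here since the report only refers to "the appropriate degrees of freedom". All names and numbers are hypothetical.

```python
from statistics import mean, variance


def rubin_combine(estimates, complete_data_variances):
    """Combine m imputed-data results using Rubin's rules.

    estimates:               point estimate from each imputed dataset
    complete_data_variances: variance estimate (e.g. from BRR) for each one
    Returns (combined estimate, total variance, degrees of freedom).
    """
    m = len(estimates)
    q_bar = mean(estimates)                # combined point estimate
    w_bar = mean(complete_data_variances)  # within-imputation variance
    b = variance(estimates)                # between-imputation variance
    total = w_bar + (1 + 1 / m) * b
    if b == 0:
        df = float("inf")                  # no imputation uncertainty
    else:
        df = (m - 1) * (1 + w_bar / ((1 + 1 / m) * b)) ** 2
    return q_bar, total, df


# Hypothetical victimisation rates and BRR variances from 10 imputations.
rates = [0.101, 0.098, 0.104, 0.099, 0.102,
         0.097, 0.103, 0.100, 0.105, 0.096]
brr_variances = [4.0e-05] * 10

estimate, total_variance, df = rubin_combine(rates, brr_variances)
print(estimate, total_variance, df)
```

A confidence interval would then use the combined estimate, the square root of the total variance, and a t quantile with the returned degrees of freedom.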

The variance estimates for victimisation rates assume that the imputation and analysis models are congenial (Meng, 1994). Model misspecification can cause multiple imputation to produce biased variance estimates.

The design effects for victimisation rates have been calculated for over 30 offence types. These design effects include the effect of imputation, and have an average of 2.0, and lower and upper quartiles of 1.2 and 2.2 respectively. The imputation effect alone usually increases their variances by a factor between 1.2 and 1.8 (the lower and upper quartiles). Sexual offences have particularly large imputation effects, ranging from approximately 2 to 10.


Footnotes

39 Standard linearisation software was considered as an alternative, but did not seem capable of reflecting the complex weighting framework used here.

40 At least for linear statistics.

41 However, the same non-response adjustment model was used across all replicates, i.e. model selection was not rerun for each replicate. This may lead to sampling errors being slightly underestimated.

42 Except for the duplication adjustment, where a simple pq/n formula was used for the parameter variance.