New Zealand Crime and Safety Survey 2006 - Technical Report 


A6 Investigation of incident dates 

Investigation of Incident Dates in New Zealand's 2001 and 2006 Crime Victim Surveys

James Reilly

24 August 2006

In both the 2001 and 2006 crime victim surveys, a question early in the victim form asked for the approximate date of the incident. Since the estimates of victimisation rates related to the last full calendar year (called the reference year here), the date information was needed to establish which incidents fell into this year.

Dates were imputed for incidents without victim forms, making the assumption that incidents were spread evenly from the beginning of the reference year up to the date of the interview (as was also done in the earlier 1996 survey). An even spread would be expected if there was little change in victimisation levels within and between years.
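As a rough illustration of the even-spread assumption, the sketch below computes the proportion of incidents that would be expected to fall in the reference year for a single interview date. The function name `expected_ref_year_proportion` and the example interview date are assumptions made for illustration; they are not taken from the survey's processing code.

```python
from datetime import date


def expected_ref_year_proportion(interview_date: date) -> float:
    """Expected share of incidents in the reference year under an even spread.

    The reference year is the last full calendar year before the interview.
    Incidents are assumed to be spread uniformly from 1 January of the
    reference year up to the interview date.
    """
    ref_year = interview_date.year - 1
    ref_start = date(ref_year, 1, 1)
    ref_end = date(ref_year + 1, 1, 1)  # first day after the reference year
    days_in_ref_year = (ref_end - ref_start).days
    days_in_recall_window = (interview_date - ref_start).days
    return days_in_ref_year / days_in_recall_window


# Hypothetical interview date, six months into the interview year.
p = expected_ref_year_proportion(date(2006, 7, 1))
print(f"Expected proportion in reference year: {p:.3f}")
```

Under this assumption the expected proportion starts at 1 for an interview on 1 January and falls steadily as interviews take place later in the year.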


Figure 1. Proportion of victim form incidents falling in the reference year, by interview date. Results are shown for both the 2001 and 2006 surveys, along with a smooth curve fitted to this data. They fall below the expected proportion (the solid red line) throughout, with the shortfall growing larger later in the year.

However, the dates recorded in the victim forms show that the proportion of incidents reported in the reference year is lower than would be expected if they were evenly spread. An investigation has been conducted reviewing possible causes for this and the effect it may have had on victimisation rates. The results will help inform the interpretation of 2005 victimisation rates and comparisons between 2000 and 2005.

In the 2001 survey, only 46% of victim forms recorded that the incident occurred in the reference year, substantially lower than the expected 58%. The shortfall was smaller in the 2006 survey, where 72% of victim forms recorded that the incident occurred in the reference year, compared to the expected 78%. These proportions varied during the course of fieldwork, as shown in Figure 1. Depending on the cause of the shortfall, it may have reduced the estimated victimisation levels in 2000 by 20% or more.
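The scale of the two shortfalls can be compared by expressing each as a fraction of the expected proportion. The sketch below does this with the percentages quoted above. Reading the relative shortfall as the reduction in estimated victimisation is only one simple interpretation, and the helper name `relative_shortfall` is an assumption for illustration.

```python
def relative_shortfall(observed: float, expected: float) -> float:
    """Shortfall of the observed proportion relative to the expected one."""
    return 1.0 - observed / expected


# Proportions quoted in the text (victim forms dated in the reference year).
surveys = {
    "2001 survey": {"observed": 0.46, "expected": 0.58},
    "2006 survey": {"observed": 0.72, "expected": 0.78},
}

shortfalls = {
    name: relative_shortfall(p["observed"], p["expected"])
    for name, p in surveys.items()
}

for name, s in shortfalls.items():
    print(f"{name}: relative shortfall {s:.1%}")
```

The relative shortfalls come to roughly 21% for the 2001 survey and 8% for the 2006 survey.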

These figures relate to the dates recorded in the main victim forms only, not the dates for incidents recorded in the self-completion questionnaire sections. Because the date was recorded only for the most recent incident in each self-completion section, this affects the distribution of the dates, making the situation significantly more complex. In contrast, the incidents for the main victim forms were selected at random within offence type, which should have produced a representative selection of dates.

Possible causes for the shortfall include:

Seasonal crime patterns

Seasonality would induce fluctuations around the red line over the course of a year. We observe steady divergence over a ten-month period, so this hypothesis does not explain the pattern seen in the data. Seasonal fluctuations would also have to be much stronger than those seen in recorded crime over 2004-2005 (which varied from approximately 31,000 to 36,000 offences a month) to produce effects as large as those observed here.

Increase in crime levels

While an increase in crime levels could produce the observed pattern of dates, the steady divergence from the expected proportion would require an ongoing increase in crime. If this increase were linear, the magnitude of the shortfall would have required crime levels in 2001 to be roughly 25% higher on average than in 2000, peaking 50% higher late in 2001. However, recorded crime levels for the two calendar years are very similar (427,230 offences in 2000 versus 426,526 in 2001), which does not support this explanation.
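To put the two recorded crime totals in perspective, the short sketch below compares the actual year-on-year change with the average increase the hypothesis would need. Recorded crime is only a proxy for victimisation, so this is an indicative check rather than a formal test; the variable names are illustrative.

```python
recorded_2000 = 427_230  # recorded offences in 2000 (from the text)
recorded_2001 = 426_526  # recorded offences in 2001 (from the text)

# Year-on-year change actually seen in recorded crime.
recorded_change = (recorded_2001 - recorded_2000) / recorded_2000

# Average increase the linear-growth explanation would need (from the text).
required_change = 0.25

print(f"Recorded change 2000 to 2001: {recorded_change:+.2%}")
print(f"Change required by the hypothesis: {required_change:+.0%}")
```

Recorded crime fell by a fraction of one percent, nowhere near the rise of about 25% that this explanation would need.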

Recency effect

Recency effects are a widely recognised source of bias in retrospective surveys. Respondents often remember events as having occurred more recently than they truly did. This effect, also known as forward telescoping, leads to overestimates of rates when questions ask about all events in a period leading up to the interview. For example, newspaper readership is often estimated based on recall of readership in the last seven days, and recency effects are known to occur in this setting.

The crime survey situation is more complex, since we focus on incidents in the reference year. Recency bias is defined here as a tendency for incidents to be recalled as too recent, affecting both the reported incident dates and the numbers reported at the screeners. Some incidents that actually occurred during the reference year would be reported in the following year (between the end of the reference year and the interview date), and some incidents from the year before the reference year would be reported in the reference year. In other words, some incidents occurring in 2005 would be reported in 2006, and some 2004 incidents would be reported as occurring in 2005. These effects will balance out to some degree, although the former effect is expected to be stronger.

Although it is unclear how large the net effect would be, a thought experiment may help. Think of two adjacent years, 10 years ago. If we asked about the proportion of incidents in each year, we would expect a net recency effect near zero, i.e. equal numbers in each year. Then think of asking about adjacent periods 7 years ago, 5 years ago, and so on. As the years move closer to the present, we would expect to get greater numbers in the more recent year if the recency effect becomes larger as the periods get closer to the present. For two periods running up to the present, with the most recent varying in size, this tendency could produce a pattern generally similar to that shown in Figure 1. For example, the 50% ratio we observed 7 months into the interview year could reflect a 10 month period during which incidents were attributed to 2006 (Sep 05 – June 06, i.e. an extra 4 months) and a 10 month 2005 attribution period (Nov 04 – Aug 05, i.e. an extra 2 months, less the 4 months misattributed to 2006). However, it seems that the recency effect would need to be strong and change sharply to explain the extent of the variation seen in these crime surveys.
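The arithmetic behind the 50% example can be checked by counting the months in the two attribution periods named above. This is only a sketch of that one worked example, with incidents assumed to be evenly spread across months; the helper `months_between` is a hypothetical name.

```python
def months_between(start: tuple, end: tuple) -> int:
    """Count calendar months from start to end, both as (year, month), inclusive."""
    (y0, m0), (y1, m1) = start, end
    return (y1 - y0) * 12 + (m1 - m0) + 1


# Attribution periods taken from the worked example in the text.
attributed_2006 = months_between((2005, 9), (2006, 6))   # Sep 05 to Jun 06
attributed_2005 = months_between((2004, 11), (2005, 8))  # Nov 04 to Aug 05

# Share of incidents that would be dated in the reference year (2005),
# assuming incidents are evenly spread across months.
share_2005 = attributed_2005 / (attributed_2005 + attributed_2006)

print(f"Months attributed to 2006: {attributed_2006}")
print(f"Months attributed to 2005: {attributed_2005}")
print(f"Share dated in 2005: {share_2005:.0%}")
```

Both periods cover ten months, so half of the incidents would be dated in the reference year, which matches the 50% ratio in the example.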


Figure 2. Model for the recency effect. This graph shows an approximate numerical model for the recency effect which would be consistent with the observed results, but requires both a strong peak effect and sharp changes in the extent of the effect. Models without the peak are possible, but require a stronger effect. 

Under this interpretation, our incident counts would include some unwanted incidents (from 2004, but reported as in 2005) and exclude some we want (from 2005, but reported as in 2006). The shortfall pattern seems to require that the latter dominate, so our 2005 counts would be too low. The "even spread" imputation procedures used for the 2001 survey would not have accounted for this effect, leading to underestimated victimisation rates.

Recall bias

Another possible explanation for the shortfall is recall bias. People do not have perfect memories, so not every incident will be reported. Problems with recall generally get worse as one goes further back in time, so we would expect more incidents to go unreported from the reference year than from the more recent period leading up to the interview. This might explain the observed shortfall pattern. Recall problems would also be expected to affect less serious incidents more than highly serious incidents, since serious incidents are likely to be remembered better. In contrast, it is not obvious why recency effects or seasonality might exhibit this behaviour. Figure 3 shows that incidents reported to the police suffer from much less of a shortfall than other incidents, which supports the recall bias theory. A slightly more complex model for the proportion reported in the reference year, which additionally includes whether the respondent regarded the incident as a crime, confirms this effect (see the appendix to this investigation).


Figure 3. Proportion of incidents in the reference year, split by whether the incident was reported to the police. This graph shows that there is very little shortfall from the expected proportion for incidents that were reported to the police, at least until late in the year. For the first half of the year, virtually the entire shortfall appears to be confined to incidents that were not reported.

The shortfall provides only an indirect measure of any recall bias, since a proportion of incidents occurring after the reference year will also be omitted. This lowers the base that the proportion is calculated from by an unknown amount, although it should be quite small for interviews conducted early in the year, and so the shortfall at that point may not be far from the actual recall bias. The data indicate that the shortfall there may be quite small, perhaps only 5%. If recall bias increases steadily over time, this suggests that the level of recall bias in the base would be even smaller, and so the shortfall would be a reasonable approximation to the level of recall bias in the reference year. On this basis, victimisation estimates for 2000 would have been deflated by about 20-25% on average, while victimisation estimates for 2005 would only be 8-10% too low overall.

Of course, some offence types would be more affected than others. It might be possible to refine the above estimates by modelling the proportion in the reference year based on various factors (such as reporting to police, whether seen as a crime, offence type).

Self-completion incidents would also have been affected by the failure of the even-spread imputation assumption, and rates for the types of offences reported there would have been more severely underestimated.

Combinations of the above

While recall bias appears to be the only plausible single cause of the shortfalls, it is certainly possible that some combination of the above effects is to blame. This is much more difficult to quantify, however, and recall bias would probably remain the dominant effect. For these reasons, it is sensible to rely predominantly on the recall interpretation.

Conclusion

Recall bias seems the most likely explanation for the observed incident date patterns. Seasonality cannot explain the observed patterns. Although a recency explanation is mathematically possible, it requires a very substantial and rapidly changing effect, making this explanation seem implausible. A combination of effects is also possible, although it would probably be dominated by recall bias. In the absence of other theories, the recall interpretation should lead our thinking.

Under this interpretation, recall problems appear to have led to fairly substantial underestimation of victimisation rates for non-sexual offences committed by strangers in 2000, deflating these by over 20% on average. However, because fieldwork occurred earlier in the year, recall problems will cause the 2005 rates to be underestimated by less than half this much. Taken together, these biases would produce an apparent increase of over 10% in estimated rates between 2000 and 2005, even if true rates were unchanged. These effects will be heavier for less serious incidents, and lighter for more serious incidents.
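The size of this artificial increase can be checked with a little arithmetic. The sketch below assumes the true victimisation rate is the same in both years and applies deflation levels taken from the ranges quoted earlier (20-25% for 2000 and 8-10% for 2005). The function name `apparent_change` and the particular pairings of values are illustrative assumptions.

```python
def apparent_change(deflation_earlier: float, deflation_later: float) -> float:
    """Apparent change in estimated rates when the true rate is constant.

    Each estimate equals the true rate times (1 - deflation), so the true
    rate cancels out of the ratio.
    """
    return (1.0 - deflation_later) / (1.0 - deflation_earlier) - 1.0


# Illustrative deflation levels drawn from the ranges quoted in the text.
low = apparent_change(deflation_earlier=0.20, deflation_later=0.10)
high = apparent_change(deflation_earlier=0.25, deflation_later=0.08)

print(f"Apparent increase, 2000 to 2005: {low:.1%} to {high:.1%}")
```

Even the more conservative pairing gives an apparent rise above 10% with no change in the underlying rate.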

Future surveys would ideally be conducted as close to the reference period as possible, since a few months can make a substantial difference to the expected amount of bias. However, the same fieldwork period as the preceding survey should be used if comparability is required.

Appendix to dates investigation

The following statistical model predicts the proportion of incidents reported in the reference year, based on the expected proportion for the interview date and indicators of the perceived seriousness of the incidents. The model was fitted using 2006 survey data.

Variables:

in2005 = whether the incident was reported in the reference year (2005)

Q119 = whether the police knew about the incident (reverse scale)

Q137_a_crime = whether the respondent perceived the incident as a crime

lpredRefYr = the naïve expected proportion of incidents in the reference year, based on the interview date, transformed to a logit scale.

Call:
glm(formula = in2005 ~ Q119 + Q137_a_crime + lpredRefYr, family = binomial(),
    data = vfdate)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.0675  -1.3601   0.7074   0.8490   1.1032

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)   -0.19661    0.21640  -0.909   0.3636
Q119          -0.21156    0.08425  -2.511   0.0120 *
Q137_a_crime   0.23821    0.08134   2.929   0.0034 **
lpredRefYr     1.08792    0.11601   9.378   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 4240.0  on 3598  degrees of freedom
Residual deviance: 4129.4  on 3595  degrees of freedom
AIC: 4137.4

Number of Fisher Scoring iterations: 4
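To show how the fitted coefficients translate into predicted proportions, the sketch below applies them on the logit scale and converts back to a probability. It is written in Python rather than R purely for illustration. The numeric codings used for `Q119` and `Q137_a_crime` in the example are assumptions, since the source does not state them, and the function names are hypothetical.

```python
import math

# Coefficient estimates copied from the model output above.
COEF = {
    "(Intercept)": -0.19661,
    "Q119": -0.21156,
    "Q137_a_crime": 0.23821,
    "lpredRefYr": 1.08792,
}


def logit(p: float) -> float:
    """Transform a proportion to the logit (log-odds) scale."""
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    """Transform a log-odds value back to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))


def predicted_in_ref_year(q119: float, q137_a_crime: float,
                          expected_proportion: float) -> float:
    """Fitted probability that an incident is dated in the reference year."""
    linear_predictor = (
        COEF["(Intercept)"]
        + COEF["Q119"] * q119
        + COEF["Q137_a_crime"] * q137_a_crime
        + COEF["lpredRefYr"] * logit(expected_proportion)
    )
    return inv_logit(linear_predictor)


# ASSUMED codings, for illustration only: the source does not give them.
p_low_q119 = predicted_in_ref_year(q119=1, q137_a_crime=1, expected_proportion=0.78)
p_high_q119 = predicted_in_ref_year(q119=2, q137_a_crime=1, expected_proportion=0.78)
print(f"Q119 = 1: {p_low_q119:.3f}   Q119 = 2: {p_high_q119:.3f}")
```

Because the `Q119` coefficient is negative, higher values of `Q119` lower the predicted proportion. If higher values on the reversed scale mean the police did not know about the incident, this agrees with the pattern in Figure 3.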