Sampling and non-sampling issues

This section looks at a number of issues that must be taken into account when analysing the 2002 NATSISS or when interpreting results published elsewhere. It begins with a discussion of sampling error and non-sampling error so that analysts appreciate the relevant issues when interpreting the data. The section then benchmarks data from the 2002 NATSISS against other comparable sources. The remainder of the section looks at a number of non-sampling issues which arise when comparing data items through time, comparing data to other contemporary sources, or interpreting results from ABS publications. These issues do not cover all the potential pitfalls involved in analysing the data; rather, the section uses several examples to highlight that analysts need to be aware of the context of the survey before drawing conclusions from the data.

Sampling error

Sampling error arises from the fact that samples differ from their populations, in that they are usually small sub-sets of the total population. Therefore, survey results should be seen only as estimates. Henry (1990) notes that sampling errors cannot be calculated for non-probability samples, but they can be determined for probability samples such as the 2002 NATSISS. To gauge sampling error, one looks first at the sample size and then at the sampling fraction, the percentage of the population that is surveyed. The more people surveyed, the smaller the error. According to Fox and Tracy (1986), this error can also be reduced by increasing the representativeness of the sample.

The 2002 NATSISS gathered information from 9359 Indigenous Australians. This was estimated to represent about 1 in 30 Indigenous Australians aged 15 years and over at the time of the survey (ABS 2005c). Such a large sample allows reasonably accurate inferences to be made about the population as a whole, as is shown by the quite similar results from the 2002 NATSISS and other data sources (for which sampling error is not an issue). For example, the number of CDEP participants recorded by the 2002 NATSISS is almost identical to administrative records (ABS 2005c: 27). However, irrespective of the concordance of results with administrative or population data sources, all data in the 2002 NATSISS are subject to sampling error, and analysts must therefore be aware of the reliability of estimates.

A standard error measures the extent to which we expect an estimate calculated from a sample of individuals to differ from the value the same calculation would yield if we had information for the whole population. Alternatively, it can be thought of as the amount of variation around the estimated value that arises because only a sample of people was taken.
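
As a stylised illustration of the concept (with hypothetical numbers, and using the textbook simple-random-sampling formula rather than the ABS's design-based methods), the sketch below shows how the standard error of an estimated proportion depends on the sample size.

```python
import math

def se_proportion_srs(p, n):
    """Standard error of a proportion under simple random sampling.

    A textbook approximation only: the 2002 NATSISS uses a complex, clustered
    design, so its true standard errors are larger than this and should be
    taken from the ABS tables or the replicate weights on the CURF.
    """
    return math.sqrt(p * (1 - p) / n)

# Hypothetical example: an estimated proportion of 0.40 from 9359 respondents
print(se_proportion_srs(0.40, 9359))  # ~0.005, i.e. about half a percentage point
```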

Fortunately, sampling error can be quantified with reasonable accuracy. To the credit of the ABS, and in comparison to previous surveys, information is now given in the survey publications and on the CURF to enable outside users to make their own estimates of the possible sampling error. Therefore, in contrast to much of the existing analysis of the 1994 NATSIS, it is now possible to put some figures around how confident we can be about the conclusions we draw from the data. [3] That is, we are able to calculate the standard error of an estimate.

Standard errors for estimates of means and proportions can be used in two ways: constructing confidence intervals and performing hypothesis tests. Strictly speaking, a confidence interval refers to what would happen under repeated sampling. A 95 per cent confidence interval means that, if samples of the same size were drawn repeatedly and an interval constructed from each, then in 95 per cent of samples, or 19 out of 20, the interval would be expected to contain the true population value. Although it is not strictly accurate, a 95 per cent confidence interval can also be thought of as a range around the estimate outside which there is only a 5 per cent chance that the true population value lies, given that a sample of this size produced the estimate that it did.
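
As a minimal worked example, assuming hypothetical values rather than actual NATSISS estimates, a 95 per cent confidence interval can be formed from an estimate and its standard error using the conventional normal-approximation multiplier of 1.96.

```python
def confidence_interval_95(estimate, standard_error):
    """Approximate 95 per cent confidence interval for an estimate,
    using the conventional normal-approximation multiplier of 1.96."""
    margin = 1.96 * standard_error
    return estimate - margin, estimate + margin

# Hypothetical: an estimated proportion of 0.40 with a standard error of 0.02
low, high = confidence_interval_95(0.40, 0.02)
print(f"95% CI: {low:.3f} to {high:.3f}")  # 0.361 to 0.439
```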

Instead of putting a range around the estimate, we may also want to test whether a certain estimate is different from another value. This value may be one we specify ourselves, or another estimated value from the survey. To see whether any difference is significant in a statistical sense, a researcher can perform a hypothesis test: they state a hypothesis, and then decide whether or not it can be rejected on the basis of the estimated value from the sample and the associated standard error. Note that one can only say conclusively that the population value is not a given value, never unequivocally that it is.
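
The sketch below illustrates the logic of such a test for the difference between two independent estimates. The numbers are hypothetical, and the normal approximation (and the treatment of the census benchmark as having negligible sampling error) are assumptions for illustration; this is not the ABS's prescribed testing procedure.

```python
import math

def z_test_difference(est1, se1, est2, se2):
    """Two-sided z-test for whether two independent estimates differ.

    Returns the z statistic; |z| > 1.96 indicates a difference that is
    statistically significant at the 5 per cent level (normal approximation).
    """
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)
    return (est1 - est2) / se_diff

# Hypothetical: a NATSISS proportion of 0.32 (SE 0.02) versus a census
# benchmark of 0.37 (sampling error treated as negligible, so SE set to 0).
z = z_test_difference(0.32, 0.02, 0.37, 0.0)
print(abs(z) > 1.96)  # True: the difference would be significant at the 5% level
```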

There are three main ways in which one can estimate standard errors and hence construct confidence intervals or perform hypothesis tests. First, the ABS provides an approximation of the standard errors in tabular form, which can be used with an Excel spreadsheet created by CAEPR for this conference. Second, the ABS provides a spreadsheet of (relative) standard errors for estimates from the initial results publication, with which users can perform their own calculations. Finally, the CURF includes replicate weights that can be used to construct direct estimates of standard errors, which users can then use to perform hypothesis tests. [4] Formulas for deriving standard errors from replicate weights are provided in ABS (2005c: 19).
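
For readers unfamiliar with replicate weights, the sketch below shows one common 'group jackknife' form of the calculation. The (G-1)/G scaling and the illustrative numbers are assumptions; the exact formula and procedure for the 2002 NATSISS are those given in ABS (2005c: 19).

```python
import math

def jackknife_se(full_estimate, replicate_estimates):
    """Group-jackknife standard error from replicate estimates.

    `full_estimate` is computed with the main survey weight and each entry of
    `replicate_estimates` with one replicate weight. The (G - 1)/G scaling here
    is the usual delete-a-group jackknife form and is an assumption: the exact
    formula for the 2002 NATSISS is given in ABS (2005c: 19).
    """
    g = len(replicate_estimates)
    squared_deviations = sum((r - full_estimate) ** 2 for r in replicate_estimates)
    return math.sqrt((g - 1) / g * squared_deviations)

# Hypothetical usage: a full-sample estimate of 0.40 and a handful of
# illustrative replicate estimates (a real CURF supplies many replicate weights).
print(jackknife_se(0.40, [0.39, 0.41, 0.42, 0.38, 0.40]))
```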

Benchmarking 2002 NATSISS data against population estimates in the 2001 Census

This section benchmarks some selected results from the 2002 NATSISS using similar data from the 2001 Census to provide an indication of the underlying population estimates. The census estimates are calculated using almost the entire Indigenous population (i.e. less a relatively small undercount), and hence are reasonably reliable (from a sampling point of view). As indicated above, 2002 NATSISS data have sampling error associated with them, and hence the benchmarking exercise uses standard errors to calculate the 95 per cent confidence intervals around the results. These confidence intervals are represented as ‘whiskers’ around the 2002 NATSISS estimates in the following figures.

Strictly speaking, one should only compare 2002 NATSISS results to private dwellings in the census. However, the non-private dwelling estimates are also presented to allow us to see who is missing from the sample. Figure 4.1 shows the proportion of the population aged 15 and over with various levels of attainment at secondary school in non-remote areas. Figure 4.2 illustrates the same results for remote areas.

Residents in non-private dwellings in both remote and non-remote areas appear to be more heterogeneous than those in private dwellings. That is, non-private dwellings are more likely to contain both people who did not go to school and people who have Year 12 or equivalent. This reflects the fact that non-private dwellings include a diverse range of residences: prisons and hostels that contain a disproportionate number of people who are extremely disadvantaged; and nursing homes and hotels that may include people with better access to resources (e.g. people with later-year secondary qualifications).

Figure 4.1. School attainment in non-remote areas, 2001 and 2002

Figure 4.2. School attainment in remote areas, 2001 and 2002

In non-remote areas, it would appear that, compared to the census, the 2002 NATSISS recorded slightly lower completion rates for the later years of schooling. That is, there was a significantly lower proportion of Indigenous residents of private dwellings who had completed Year 11 or Year 12, no significant difference for those who had completed Year 9 or Year 10, and a significantly higher proportion reported as having completed Year 8 or less (all at the 5% level of significance). 2002 NATSISS respondents were also significantly more likely to be still at school, but less likely to have never gone to school.

In remote areas, there appeared to be less systematic variation. There is no significant difference between the 2002 NATSISS estimates and those for private dwellings in the census for 'Still at school', 'Year 8 or below', 'Year 10' and 'Year 11'. However, there was a significant difference between the 2002 NATSISS and comparable census results for 'Year 12', 'Year 9' and 'Did not go to school'.

Figures 4.3 and 4.4 report labour force outcomes in the 2001 Census and 2002 NATSISS, once again presented for non-remote and then remote areas. Residents of non-private dwellings tend to have lower labour force participation rates than those in private dwellings, despite the apparent heterogeneity noted above. This probably reflects the fact that being in jail or a nursing home constrains an individual’s opportunities to work or look for employment, irrespective of whether they completed the later years of secondary school.

Figure 4.3. Labour force status in non-remote areas, 2001 and 2002

Figure 4.4. Labour force status in remote areas, 2001 and 2002

In non-remote areas, the proportion employed in the 2002 NATSISS is not significantly different from the 2001 Census estimates for private dwellings. However, compared to the census, more NATSISS respondents are classified as unemployed, and fewer as not in the labour force. This could represent a change in outcomes over time, seasonal differences caused by the respective collections occurring at different times of the year, or the greater opportunity within a face-to-face interview to clarify what is meant by unemployment.

In remote areas, there appears to have been a shift from respondents being not in the labour force in the census to being employed in the 2002 NATSISS. This apparent shift occurs despite the fact that the 2002 NATSISS provides accurate estimates of CDEP scheme employment, which is disproportionately concentrated in remote areas. Once again, it is unclear whether the difference between the 2002 NATSISS and the census results from the differential timing of the collections (although the magnitudes make that unlikely) or from the greater chance of clarification in a face-to-face interview compared to the largely self-enumerated census. What we do know, though, is that the differences reported here in labour market and educational status are unlikely to have been caused by sampling error alone.

Notwithstanding this, overall there appears to be less difference between the labour force status estimates from the 2002 NATSISS and the adjacent census than was evident for the 1994 NATSIS. Hunter and Taylor (2001) showed that employment and unemployment rates were substantially higher in the earlier survey than in the 1996 Census. While participation rates were again higher when the survey methodology was used in 2002, unemployment-to-population ratios were less different in 2002 than they were in 1994. Indeed, unemployment ratios in remote areas were not significantly different between the 2002 NATSISS and the census. This may reflect the fact that CDEP scheme employment appeared to be accurately measured in the 2002 NATSISS, leaving less scope for mis-classification of CDEP scheme employment as unemployment.

Benchmarking these variables revealed a number of differences that are unlikely to be explained by sampling error alone. Although it is only possible to speculate, these differences could have been caused by:

  • the question sequencing in the two data collections

  • the use of face-to-face interviewing in the 2002 NATSISS as opposed to self-reporting in the census

  • real changes in the year or so between the collections, or

  • differences caused by the timing of the census and the survey (i.e. seasonality).

These can all be classified, to a certain extent, as non-sampling error (either in the census or the 2002 NATSISS). Although the differences were mostly small, the remainder of this section looks at three areas where non-sampling error is perhaps a bigger issue.

Non-sampling errors

Non-sampling errors can be defined as errors arising during the course of survey activities rather than resulting from the sampling procedure. [5] Unlike sampling errors, there is no simple and direct method of estimating the size of non-sampling errors. In most surveys, it is not practical to measure the possible effect on the statistics of the various potential sources of error other than the statistical sample. However, there has been a considerable amount of research on the kinds of errors that are likely to arise in different kinds of surveys. By examining the procedures and operations of a specific survey, experienced survey analysts may be able to assess its quality. Rarely will this produce actual error ranges, as it does for sampling errors. In most cases, the analyst can only state that, for example, the errors are probably relatively small and will not affect most conclusions drawn from the survey, or that the errors may be fairly large and inferences should be made with caution. In rare instances, researchers may be able to say with some confidence in which direction the error is likely to lie.

Non-sampling errors can be classified into two groups: random errors, whose effects approximately cancel out if fairly large samples are used; and biases, which tend to create errors in the same direction and thus cumulate over the entire sample. With large samples, systematic errors, and the resultant biases, are the principal causes for concern about the quality of a survey. For example, an error in the questionnaire design could cause problems with respondents’ answers, which, in turn, can create processing errors. These types of errors often lead to a bias in the final results and analyses. In contrast to sampling variance and random non-sampling error, bias caused by systematic non-sampling errors cannot be reduced by increasing the sample size.

Non-sampling errors can occur because of problems in coverage, response, non-response, data processing, estimation and analysis. The following discussion is adapted from an excellent exposition on the subject from the Statistics Canada web site (see the non-sampling error section of Statistics Canada 2005).

An error in coverage occurs when there is an omission, duplication or wrongful inclusion of the units in the population or sample. Omissions are referred to as under-coverage, while duplication and wrongful inclusions are called over-coverage. These errors are caused by defects in the survey frame: inaccuracy, incompleteness, duplication, inadequacy and obsolescence. There may be errors in sample selection, or part of the population may be omitted from the sampling frame, or weights to compensate for disproportionate sampling rates may be omitted. Coverage errors may also occur in field procedures (e.g. if a survey is conducted but the interviewer misses several households or people).

Response errors result from data that have been requested, provided, received or recorded incorrectly. They may occur because of weaknesses in the questionnaire, the interviewer, the respondent or the survey process. Subject matter experts are often in a good position to identify flaws in these aspects of a survey.

Poor questionnaire design is a common source of non-sampling error. It is essential that sample survey or census questions are worded carefully in order to avoid introducing bias. If questions are misleading or confusing, the responses may end up being distorted.

As alluded to above, interviewers and facilitators can influence how a respondent answers the survey questions. This may occur when the interviewer is too friendly or too aloof, or prompts the respondent. To prevent this, interviewers must be trained to remain neutral throughout the interview. They must also pay close attention to the way they ask each question: if an interviewer changes the way a question is worded, it may affect the respondent’s answer.

Respondents can also provide incorrect answers of their own volition. Faulty recollections (recall bias), tendencies to exaggerate or underplay events, and inclinations to give answers that are more ‘socially desirable’ are several reasons why a respondent may provide a false answer. Individuals may conceal the truth out of fear or suspicion of the survey process and the institutions sponsoring it (i.e. governments and their agencies). Other respondent errors may arise through a failure to understand the underlying concepts or a basic lack of knowledge about the information requested.

Non-sampling errors can also arise from the survey process. Using proxy responses (taking answers from someone other than the respondent) or a lack of control over the survey procedures are just two ways of increasing the possibility of response errors. Processing errors sometimes emerge during the preparation of the final data files. For example, errors can occur while data are being coded, captured, edited or imputed. Coder bias is usually a result of poor training or incomplete instructions, variance in coder performance (e.g. tiredness or illness), data entry errors, or machine malfunction (some processing errors are caused by errors in computer programs). Sometimes, errors are incorrectly identified during the editing phase. Even when errors are discovered, they can be corrected improperly because of poor imputation procedures.

Non-response errors, another category of non-sampling error, result from a failure to obtain sufficient answers to survey questions. Complete non-response errors occur when the survey fails to measure some of the units in the selected sample. Reasons for this type of error may be that the respondent is unavailable or temporarily absent, the respondent is unable or refuses to participate in the survey, or the dwelling is vacant. If a significant number of people do not respond to a survey, the results may be biased, since the characteristics of the non-respondents may differ from those of the people who participated. Given the high rates of mobility among Indigenous people, it is difficult to discount these issues in the 2002 NATSISS (and indeed in other surveys involving mobile populations; see Hunter & Smith 2002). Taylor and Kinfu’s chapter on mobility explores these issues in some detail.

Researchers and policy makers need to familiarise themselves with the discussion of non-sampling error in ABS (2004c: 56–7). The discussion of imputation issues for the 2002 NATSISS in ABS (2004c: 61–2) is also relevant: analysts should pay careful attention to the list of variables for which imputed data were used. To summarise that discussion, a small amount of missing data was imputed for a range of educational and training variables because of errors in the CAI instrument. While the ABS has been admirably candid about non-sampling errors in its publications, it is necessary to explore a few specific examples to assist researchers and policy makers in interpreting 2002 NATSISS data.

Possible non-sampling issues for selected variables in 2002 NATSISS

This section looks at three sets of variables in the 2002 NATSISS for which the possibility of non-sampling error is likely to be pronounced. By non-sampling error, we mean aspects of the survey methodology that are likely to result in differences between results from the 2002 NATSISS and other published data. The variables examined are: hunting, gathering and fishing; substance use and high-risk drinking; and education variables. This is not an exhaustive list of potentially problematic variables; rather, the discussion provides several examples of the issues users need to be aware of when interpreting 2002 NATSISS data.

Hunting, gathering and fishing can be an important part of the social life of a person’s community, as well as their wider social relations. Although this may be true for many non-Indigenous Australians (especially in rural areas), it is particularly true for Indigenous Australians. However, for many Indigenous Australians—especially in remote areas—hunting, gathering and fishing also provide an important supplement to cash income and are hence an important part of their economic life. This is also true, although perhaps to a lesser extent, in non-remote areas (Altman, Gray & Halasz 2005).

Altman, Buchanan and Biddle’s chapter on the customary sector illustrates that the information on the customary economy in the 1994 NATSIS was potentially biased by the structure of the questionnaire. Uncertainty about the definition of voluntary work meant that many people who engaged in hunting, fishing and gathering probably did not report it. There was a strong geographic correlation between the incidence of voluntary work and that of hunting, fishing and gathering: in Kununurra, for example, the reported incidence of both was zero, an implausible finding given that the anthropological evidence points to substantial hunting and gathering by Indigenous people in the area. This pattern is probably driven by responses to the voluntary work question. Consequently, the 1994 measures of hunting, fishing and gathering tend to understate the extent of the customary economy.

The way in which information on hunting, gathering and fishing was collected in the 2002 survey also makes it difficult to obtain a meaningful measure of the extent to which individuals participate in such activities. In 2002, the question was asked as part of a group cultural activity, and refers to ‘fishing or hunting in a group’. In the 1994 NATSIS, the questions were asked as part of the category of voluntary work. However, an analysis of the Wallis Lake Catchment (which covers non-remote areas of Forster/Tuncurry) showed that local Indigenous people did not necessarily see hunting, gathering and fishing as being either voluntary work or a group cultural activity (Altman, Gray & Halasz 2005).

The ABS was probably right not to include hunting, gathering and fishing as a question incorporated under voluntary work, but collecting the information as a group cultural activity is not an adequate alternative. Given that different questions were asked in 1994 and 2002, and that both were problematic, the measured change in hunting, fishing and gathering between the two surveys is uninformative, if not meaningless.

The scope for non-sampling error is also apparent in the substance use questions. In NCAs, these questions were asked via a voluntary self-enumerated form. Pilot testing by the ABS concluded that, owing to English literacy problems in CAs, better information could be obtained there by having respondents answer verbally to questions put by the interviewer. However, the very low prevalence of substance use reported in CAs led the ABS to conclude that the information obtained was not reliable and, as a result, substance use information is only available for NCAs.

The prevalence of high-risk drinking in the 2002 NATSISS is substantially lower than that found in both the 2001 NHS (conducted by the ABS) and the 1994 NDSHS Urban Aboriginal & TSI Peoples Supplement, which comprised 3000 confidential interviews in metropolitan and other urban areas.

The relatively low prevalence of alcohol abuse identified in the chapter by Brady and Chikritzhs may have been caused by differences in survey methodology (e.g. the populations sampled, sample size, survey method and alcohol questions). Unlike in the NDSHS, the alcohol questions in the 2002 NATSISS were neither confidential nor self-completed. Although respondents were offered a one-to-one interview, they often answered questions in the presence of other family members and may have been reluctant to give accurate answers; asking other family members to leave the room is itself a signal that sensitive questions are to follow. Another issue may be recall problems when questions refer to a longer time period (e.g. the last 12 months rather than the last two weeks).

The important issue arising from the discussion of substance abuse is that non-sampling error is likely to be particularly important when asking sensitive questions about embarrassing issues or illicit activities.

There were sequencing errors in the section of the NCA questionnaire on education, and the ABS needed to impute values for the affected variables based on responses to other questions and information from the 2001 Census. The first sequencing error affected the 733 respondents aged 20–24 who were not studying full-time. These individuals were not asked whether they were currently studying, nor were they asked the type of educational institution they were attending. Those individuals in that age group who were studying part-time would therefore not have been recorded as such. Using data on Abstudy receipt and distributional information from the census, the ABS coded 4 per cent of the respondents with missing information as studying part-time.

An additional sequencing error occurred with the 1399 respondents who had used employment support services in the last 12 months. These individuals were sequenced past four questions on vocational training. According to the ABS, imputation for the five vocational training questions was conducted using ‘donor records’ (ABS 2005c: 19), where information from another person was matched to records based on sex, age and labour force status. These donor records were derived from other individuals in the sample. The ABS should indicate which categories of individuals were used to impute records so that analysts can assess whether they can discount the possibility of any significant bias arising from the imputation procedure.
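
As background on what donor-based ('hot-deck') imputation involves, the following is a stylised sketch only, not the ABS's actual procedure: the field names and the random selection rule are assumptions, and the matching variables are simply those named above (sex, age and labour force status).

```python
import random
from collections import defaultdict

def donor_impute(records, key_fields, target_field, rng=random.Random(0)):
    """Fill missing values of `target_field` by copying the value from a
    randomly chosen donor with the same combination of `key_fields`.

    A stylised illustration of donor imputation; the ABS's actual matching and
    selection rules for the 2002 NATSISS are not published in this detail.
    """
    donors = defaultdict(list)
    for rec in records:
        if rec.get(target_field) is not None:
            donors[tuple(rec[f] for f in key_fields)].append(rec[target_field])
    for rec in records:
        if rec.get(target_field) is None:
            pool = donors.get(tuple(rec[f] for f in key_fields))
            if pool:  # leave as missing if no matching donor exists
                rec[target_field] = rng.choice(pool)
    return records

# Hypothetical records with sex, age group, labour force status and a training item
sample = [
    {"sex": "F", "age": "25-34", "lfs": "employed", "training": "yes"},
    {"sex": "F", "age": "25-34", "lfs": "employed", "training": None},
]
print(donor_impute(sample, ["sex", "age", "lfs"], "training"))
```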

In general, it would be useful for the ABS to publish the exact imputation techniques for both of these problems, so other researchers using the CURF can test the sensitivity of their results. It may be the case that the methodology used by the ABS is as robust as possible for most purposes, but certain applications or modelling frameworks may require a different technique.

A further issue with the education questions is a difference that cannot be explained by sampling error alone. This is clear when comparing the proportion of people with non-school qualifications in the 2001 Census with the results from the 2002 NATSISS, shown in Figure 4.5, where once again the ‘whiskers’ represent the 95 per cent confidence interval.

Figure 4.5. Non-school qualifications, 2001 and 2002

This discrepancy between the 2002 NATSISS and the 2001 Census data is too great to be explained by the year or so gap between the two data collections. The difference has, however, been noted in ABS (2005c) and is explained as a consequence of census forms often being completed by one person on behalf of others being enumerated. That is, the non-sampling error in this comparison quite possibly arises from the census methodology.