Appendix C: Statistical methods

Comparisons and tests of statistical significance

This report includes statistical tests of the significance of comparisons of rates between population groups. Any statistical comparison applied to 1 variable must take account of any other potentially relevant variables. For example, any comparison of participation by state must also take account of differences in the distribution of age and sex between the states. These other variables are known as ‘confounding’ variables.

Crude rates

A ‘crude rate’ is defined as the number of events over a specified period of time (for example, a year) divided by the total population. (For example, a crude cancer incidence rate is defined as the number of new cases of cancer in a specified period of time, divided by the population at risk.) Crude mortality rates and cancer incidence rates are expressed in this report as number of deaths or new cases per 100,000 population. ‘Crude participation rate’ is expressed as a percentage.
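The calculation can be illustrated with a minimal sketch in Python. The function name `crude_rate`, its `per` parameter and all of the figures are hypothetical and chosen only for illustration; they are not taken from this report.

```python
def crude_rate(events, population, per=100_000):
    """Number of events in the period divided by the population, scaled to `per` people."""
    if population <= 0:
        raise ValueError("population must be positive")
    return events * per / population


# Hypothetical figures, for illustration only.
incidence = crude_rate(450, 1_500_000)            # new cases per 100,000 population
participation = crude_rate(5_600, 10_000, per=100)  # participation as a percentage

print(f"Crude incidence rate: {incidence:.1f} per 100,000")
print(f"Crude participation rate: {participation:.1f}%")
```

Setting `per=100` gives a percentage, which is how a crude participation rate would be expressed.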

Age specific rates

Age specific rates provide information on the incidence of a particular event in an age group, relative to the total number of people at risk of that event in the same age group. They are calculated by dividing the number of events occurring in each specified age group by the corresponding ‘at risk’ population in the same age group, and then multiplying the result by a constant (for example, 100,000) to derive the rate. Age specific rates are often expressed per 100,000 population.
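As a minimal sketch of this calculation in Python, assuming the events and the ‘at risk’ population are held in dictionaries keyed by age group (the function name, age groups and figures below are hypothetical):

```python
def age_specific_rates(events_by_age, population_by_age, per=100_000):
    """Return the rate per `per` people for each age group."""
    rates = {}
    for age_group, population in population_by_age.items():
        # Age groups with no recorded events are treated as having zero events.
        events = events_by_age.get(age_group, 0)
        rates[age_group] = events * per / population
    return rates


# Hypothetical figures, for illustration only.
events = {"50-54": 120, "55-59": 180, "60-64": 260}
population = {"50-54": 80_000, "55-59": 75_000, "60-64": 65_000}

for age_group, rate in age_specific_rates(events, population).items():
    print(f"{age_group}: {rate:.1f} per 100,000")
```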

Age standardised rates

A crude rate provides information on the number of, for example, new cases of cancer or deaths from cancer in the population at risk in a specified period. No age adjustments are made when calculating a crude rate. Since the risk of cancer is heavily dependent on age, crude rates are not suitable for looking at trends or making comparisons across groups in cancer incidence and mortality.

More meaningful comparisons can be made using age standardised rates, which are adjusted for age to facilitate comparisons between populations that have different age structures – for example, between Aboriginal and Torres Strait Islander people and other Australians. This standardisation process effectively removes the influence of age structure on the summary rate.

Two methods are commonly used to adjust for age: direct and indirect standardisation.

In this report, the direct standardisation approach presented by Jensen and others (1991) is used. To age standardise using the direct method, the first step is to obtain population numbers and numbers of cases (or deaths) in age ranges – typically 5 year age ranges.

The second step is to multiply the age specific population numbers for the standard population (in this case, the Australian population at 30 June 2001) by the age specific incidence rates (or death rates) for the population of interest (such as those in a certain socioeconomic group or those who lived in Major cities). The third step is to sum these products across the age groups and divide the sum by the total of the standard population to give an age standardised rate for the population of interest. Finally, this rate is expressed per 10,000 or 100,000, as appropriate.
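The direct method described above can be sketched as follows. This is a minimal illustration in Python, not the code used for this report. The function name `age_standardised_rate`, the 3 broad age groups and all of the numbers are hypothetical; in particular, the standard population shown is not the Australian population at 30 June 2001, and 5 year age ranges would normally be used.

```python
def age_standardised_rate(cases, population, standard_population, per=100_000):
    """Directly age standardised rate per `per` people.

    All three arguments are dictionaries keyed by the same age groups.
    """
    expected = 0.0
    for age_group, std_pop in standard_population.items():
        # Age specific rate in the population of interest.
        rate = cases[age_group] / population[age_group]
        # Events expected if the standard population experienced that rate.
        expected += std_pop * rate
    # Divide by the total standard population and scale.
    return expected / sum(standard_population.values()) * per


# Hypothetical figures, for illustration only.
cases = {"0-39": 10, "40-64": 200, "65+": 600}
population = {"0-39": 500_000, "40-64": 250_000, "65+": 100_000}
standard = {"0-39": 550_000, "40-64": 320_000, "65+": 130_000}

crude = sum(cases.values()) / sum(population.values()) * 100_000
standardised = age_standardised_rate(cases, population, standard)
print(f"Crude rate: {crude:.1f} per 100,000")
print(f"Age standardised rate: {standardised:.1f} per 100,000")
```

In this made-up example the standard population is older than the population of interest, so the age standardised rate is higher than the crude rate.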

Confidence intervals

Population numbers for incidence, mortality and screening vary naturally in any single year, above and below the mean that might be expected over many years. The percentage variability is small for large population numbers but high for small numbers, such as mortality in a young age group. One measure of the likely difference is the standard error, which indicates the extent to which a population number might have varied by chance in only 1 year of data. For a 95% confidence interval, there are around 19 chances in 20 that the difference between the observed number and its expected value will be less than 2 standard errors.
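The relationship between the size of a number and its percentage variability can be illustrated with a short sketch in Python. It assumes that an annual count behaves like a Poisson variable, so that its standard error is approximately the square root of the count; this assumption, the function names and the counts shown are for illustration only.

```python
import math


def poisson_standard_error(count):
    """Approximate standard error of a count, assuming Poisson variation."""
    return math.sqrt(count)


def relative_standard_error(count):
    """Standard error expressed as a percentage of the count."""
    return 100 * poisson_standard_error(count) / count


# Hypothetical annual counts, from a small age group to a large population.
for count in (4, 100, 10_000):
    se = poisson_standard_error(count)
    rse = relative_standard_error(count)
    print(f"count={count:>6}: standard error={se:.0f}, relative standard error={rse:.0f}%")
```

Under this assumption the percentage variability falls as the count grows, because the standard error increases only with the square root of the count.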

There are several methods for calculating confidence intervals. The 95% confidence intervals (CIs) in this report were calculated using a method developed by Dobson and others (1991). This method calculates approximate confidence intervals for a weighted sum of Poisson parameters.
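One way such an interval might be computed for a directly age standardised rate is sketched below in Python. It follows the general form of the Dobson approach: the standardised rate is treated as a weighted sum of age specific counts, and confidence limits for the total count are rescaled to the standardised rate. Two points are assumptions rather than details confirmed by this report: the limits for the total count use Byar's approximation in place of exact Poisson limits, and the function names, age groups and figures are hypothetical.

```python
import math
from statistics import NormalDist


def byar_poisson_limits(count, level=0.95):
    """Approximate confidence limits for a Poisson count (Byar's approximation)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    if count == 0:
        lower = 0.0
    else:
        lower = count * (1 - 1 / (9 * count) - z / (3 * math.sqrt(count))) ** 3
    c1 = count + 1
    upper = c1 * (1 - 1 / (9 * c1) + z / (3 * math.sqrt(c1))) ** 3
    return lower, upper


def dobson_interval(cases, population, standard_population, per=100_000, level=0.95):
    """Return (rate, lower, upper) for a directly age standardised rate."""
    std_total = sum(standard_population.values())
    # Weight applied to each age specific count in the standardised rate.
    weights = {g: standard_population[g] / (std_total * population[g])
               for g in standard_population}
    rate = sum(weights[g] * cases[g] for g in weights)
    variance = sum(weights[g] ** 2 * cases[g] for g in weights)
    total = sum(cases[g] for g in weights)
    if total == 0:
        raise ValueError("interval is undefined when there are no cases")
    lower_count, upper_count = byar_poisson_limits(total, level)
    # Rescale the limits for the total count to the scale of the weighted sum.
    scale = math.sqrt(variance / total)
    lower = rate + scale * (lower_count - total)
    upper = rate + scale * (upper_count - total)
    return rate * per, lower * per, upper * per


# Hypothetical figures, for illustration only.
cases = {"0-39": 10, "40-64": 200, "65+": 600}
population = {"0-39": 500_000, "40-64": 250_000, "65+": 100_000}
standard = {"0-39": 550_000, "40-64": 320_000, "65+": 130_000}

rate, lower, upper = dobson_interval(cases, population, standard)
print(f"{rate:.1f} per 100,000 (95% CI {lower:.1f} to {upper:.1f})")
```

Byar's approximation is used here only to keep the sketch free of external libraries; exact Poisson limits could be substituted in `byar_poisson_limits` without changing the rest of the calculation.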

Interpretation of confidence intervals

Some indicators have a 95% confidence interval presented along with the rates. This is because the observed value of a rate may vary due to chance, even where there is no variation in the underlying value of the rate. The 95% confidence interval represents a range (interval) over which variation in the observed rate is consistent with this chance variation. In other words, intervals calculated in this way would be expected to contain the true value of the rate about 95% of the time.

These confidence intervals can be used as a guide to whether differences in a particular rate are consistent with chance variation. Where the confidence intervals do not overlap, the difference between rates is greater than would be expected from chance variation alone and is regarded as statistically significant.

It is important to note that the overlapping of confidence intervals does not imply that the difference between 2 rates is definitely due to chance. Instead, overlapping confidence intervals indicate a difference in rates that is too small to allow differentiation between a real difference and one that is due to chance variation. It can therefore only be stated that no statistically significant differences were found, and not that no differences exist.
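The overlap rule described above can be expressed as a short sketch in Python. The function names, the wording of the returned messages and the intervals shown are hypothetical and serve only to illustrate the rule.

```python
def intervals_overlap(interval_a, interval_b):
    """Return True if two (lower, upper) confidence intervals overlap."""
    lower_a, upper_a = interval_a
    lower_b, upper_b = interval_b
    return lower_a <= upper_b and lower_b <= upper_a


def describe_comparison(interval_a, interval_b):
    """Summarise a comparison of two rates from their confidence intervals."""
    if intervals_overlap(interval_a, interval_b):
        # Overlap does not show that the rates are equal, only that a
        # difference could not be distinguished from chance variation.
        return "no statistically significant difference found"
    return "statistically significant difference"


# Hypothetical 95% confidence intervals for rates per 100,000.
print(describe_comparison((97.6, 112.2), (115.0, 130.4)))
print(describe_comparison((97.6, 112.2), (108.3, 121.9)))
```

The first comparison has intervals that do not overlap, while the second has overlapping intervals and so supports only the weaker statement that no significant difference was found.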

The approximate comparisons presented might understate the statistical significance of some differences, but they are sufficiently accurate for the purposes of this report.

As with all statistical comparisons, care should be exercised in interpreting the results of the comparison. If 2 rates are statistically significantly different from each other, this means that the difference is unlikely to have arisen by chance. Judgement should, however, be exercised in deciding whether the difference is of any clinical significance.