## Methods

### Age-standardised rates (ASR)

A crude rate is the number of events (for example, new cases of cancer or deaths from cancer) in a specified period divided by the population at risk. No age adjustment is made when calculating a crude rate. Because the risk of cancer depends heavily on age, crude rates are not suitable for examining trends in cancer incidence and mortality or for comparing groups with different age structures.

More meaningful comparisons can be made by using ASRs, with such rates adjusted for age in order to facilitate comparisons between populations that have different age structures—for example, between Indigenous Australians and other Australians. This standardisation process effectively removes the influence of age structure on the summary rate.

There are two methods commonly used to adjust for age: direct and indirect standardisation. This report uses the direct standardisation approach presented by Jensen and colleagues (1991). To age-standardise using the direct method, the first step is to obtain population numbers and numbers of cases (or deaths) by age range, typically 5-year age ranges. Next, the age-specific population numbers for the standard population (in this case, the Australian population as at 30 June 2001) are multiplied by the age-specific incidence rates (or death rates) for the population of interest. These products are then summed across the age groups, and the sum is divided by the total of the standard population to give an ASR for the population of interest. Finally, the ASR is expressed per 10,000 or 100,000 population, as appropriate.
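The direct method described above can be sketched in Python as follows. The function names and the three broad age groups in the usage line are hypothetical and serve only to illustrate the arithmetic; a real calculation would use the report's 5-year age groups and the 2001 Australian standard population.

```python
def age_specific_rates(cases, population):
    """Events per person in each age group (no scaling constant applied)."""
    return [c / p for c, p in zip(cases, population)]


def direct_asr(cases, population, standard_population, per=100_000):
    """Directly age-standardised rate, expressed per `per` population.

    cases, population: counts for the population of interest, by age group.
    standard_population: standard population counts for the same age groups.
    """
    if not len(cases) == len(population) == len(standard_population):
        raise ValueError("all inputs must cover the same age groups")
    rates = age_specific_rates(cases, population)
    # Apply the population of interest's rates to the standard population
    expected = [r * s for r, s in zip(rates, standard_population)]
    # Sum across age groups and divide by the standard population total
    asr = sum(expected) / sum(standard_population)
    # Express per 10,000 or 100,000 population
    return asr * per


# Hypothetical figures for three broad age groups (youngest to oldest)
print(direct_asr([10, 50, 200], [100_000, 50_000, 20_000], [60_000, 30_000, 10_000]))
```

With these illustrative inputs the ASR is about 136 per 100,000, while the crude rate is about 153 per 100,000 (260 cases in 170,000 people). The gap arises because the hypothetical population of interest is slightly older than the standard population.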

### Age-specific rates

Age-specific rates provide information on the incidence of a particular event in an age group relative to the total number of people at risk of that event in the same age group. An age-specific rate is calculated by dividing the number of events occurring in a specified age group by the corresponding ‘at-risk’ population in that age group and then multiplying the result by a constant (for example, 100,000). Age-specific rates are often expressed per 100,000 population.
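Written as a formula, with symbols introduced here only for illustration ($E_a$ events and $P_a$ people at risk in age group $a$):

$$
\text{age-specific rate}_a = \frac{E_a}{P_a} \times 100\,000
$$

For example, 30 cases in an age group of 60,000 people give a rate of 50 per 100,000.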

### Australian Cancer Database

All forms of cancer, except basal and squamous cell carcinomas of the skin, are notifiable diseases in each Australian state and territory. This means there is legislation in each jurisdiction that requires hospitals, pathology laboratories and various other institutions to report all cases of cancer to their central cancer registry. An agreed subset of the data collected by these cancer registries is supplied annually to the AIHW, where it is compiled into the ACD. The ACD currently contains data on all cases of cancer diagnosed from 1982 to 2015 for all states and territories with the exception of 2015 New South Wales data.

Cancer reporting and registration is a dynamic process, and records in the state and territory cancer registries may be modified if new information is received. As a result, the number of cancer cases reported by the AIHW for any particular year may change slightly over time and may not always align with state and territory reporting for that same year.

For more information on the ACD please see the ACD 2014 Data Quality Statement.

### Enhancements and other events impacting incidence data

The development of the new NSW Cancer Registry system delayed the processing of incidence data, so the most recent NSW data available for inclusion in the ACD are for 2014. The 2015 NSW incidence data were therefore estimated by the AIHW (see the next two subsections for details of the procedure). These estimates were combined with the actual data supplied by the other seven state and territory cancer registries to form the 2015 ACD.

### Estimating 2015 cancer incidence for NSW, excluding prostate cancer

With the exception of prostate cancer (which is explained in the following subsection), cancer incidence for NSW in 2015 was estimated by projecting the sex- and age-specific incidence rates observed in NSW during 2005–2014. The time series were stratified by the following variables:

- sex
- 5-year age group (0–4, …, 80–84, 85+)
- 4-character ICD-O-3 topography code (C00.0, …, C80.9)
- 4-digit ICD-O-3.1 histology code (8000, …, 9992).

For each time series, the process was as described below:

- if any of the rates in the series was zero, the mean of the 10 rates was used as the estimate of the 2015 rate
- if none of the rates was zero, least squares linear regression was used to find the straight line of best fit through the time series
- a 5% level of significance was used to test the hypothesis that the slope of the line was different from zero
- if the slope was not statistically significantly different from zero, the mean of the 10 rates was used as the estimate of the 2015 rate
- if the slope was statistically significant and positive, the straight line of best fit was extrapolated one year ahead to obtain the estimate of the 2015 rate
- if the slope was statistically significant and negative, the time series was fitted with a log-linear model (that is, the logs of the rates were fitted with a straight line), the line was extrapolated one year ahead to give an estimate of log(rate) for 2015, and this was converted to an estimate of the rate for 2015
- the estimated incidence rates for 2015 were then multiplied by the Estimated Resident Populations for 2015 to obtain the estimated incidence numbers.
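The decision rules above can be sketched as one Python function applied to a single stratum (sex, age group, topography and histology). This is an illustrative sketch, not the AIHW's code: the function names are hypothetical, and it assumes the significance test is the usual two-sided t-test on the least squares slope, which for a 10-year series has 8 degrees of freedom and a 5% critical value of about 2.306.

```python
import math
from statistics import mean

# Assumed test: two-sided 5% critical value of Student's t with 8 degrees
# of freedom (10 annual rates minus the 2 parameters of a straight line).
T_CRIT_DF8 = 2.306


def fit_line(xs, ys):
    """Least squares fit y = a + b*x; returns (a, b, standard error of b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    se_b = math.sqrt(rss / (len(xs) - 2) / sxx)
    return a, b, se_b


def project_rate(years, rates, target_year):
    """Estimate the rate in target_year from a 10-year series of rates."""
    if len(rates) != 10:
        raise ValueError("this sketch assumes a 10-year baseline")
    if any(r == 0 for r in rates):
        return mean(rates)                    # any zero rate: use the mean
    a, b, se_b = fit_line(years, rates)
    # A perfect fit has zero standard error; treat any non-zero slope as significant
    significant = (b != 0) if se_b == 0 else abs(b) / se_b > T_CRIT_DF8
    if not significant:
        return mean(rates)                    # no significant trend: use the mean
    if b > 0:
        return a + b * target_year            # rising: linear extrapolation
    # Falling: fit log(rate) with a straight line, extrapolate, back-transform
    la, lb, _ = fit_line(years, [math.log(r) for r in rates])
    return math.exp(la + lb * target_year)
```

Multiplying the returned rate by the 2015 Estimated Resident Population for the stratum gives the estimated count, as in the final step above. Using a log-linear model for falling trends keeps the projected rate positive, which a straight line extrapolated downwards does not guarantee.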

### Estimating 2015 prostate cancer incidence for NSW

Due to the effect of PSA testing, prostate cancer incidence rates have fluctuated considerably over time, making the methodology described in the previous subsection unreliable for estimating the incidence of prostate cancer. Instead, the estimates of 2015 prostate cancer incidence for NSW were based on the relationship between the age-specific incidence rates in NSW and those in the other seven states and territories combined. These combined jurisdictions will be referred to as the single jurisdiction OTH in what follows (OTH for ‘other’). The general procedure is as follows.

For a given age group, for each year between 2005 and 2014 divide the age-specific incidence rate in NSW by the age-specific incidence rate in OTH. Use the average of these ten ratios as the estimated ratio for 2015. Multiply the estimated ratio for 2015 by the actual age-specific incidence rate in OTH for 2015. This gives the estimated age-specific incidence rate for 2015 for NSW, which can then be converted to a count by multiplying by the relevant population.

The procedure described in the previous paragraph breaks down if any of the ten incidence rates in OTH is zero. This occurs for each age group from 0–4 to 30–34. In these cases, calculate the age-specific incidence rates in NSW and OTH for the ten years 2005–2014 combined instead of separately. Divide the NSW rate by the OTH rate to obtain the estimate of the 2015 ratio and then proceed as above.

The procedure described in the previous paragraph breaks down if all ten incidence rates in OTH are zero. This occurs for age group 5–9. In this case, calculate the age-specific incidence rate in NSW for the ten years 2005–2014 combined and use it as the estimate of the age-specific incidence rate for 2015. Then proceed as above.
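The three-tier NSW/OTH procedure for a single age group might be sketched as follows. The function and argument names are hypothetical; the sketch assumes annual case counts and populations for 2005–2014 and the actual OTH rate for 2015 are already available as rates per person.

```python
from statistics import mean


def nsw_prostate_rate_2015(nsw_cases, nsw_pop, oth_cases, oth_pop, oth_rate_2015):
    """Estimated 2015 NSW prostate cancer rate for one age group.

    nsw_cases, nsw_pop, oth_cases, oth_pop: annual values for 2005-2014.
    oth_rate_2015: actual 2015 rate in the other jurisdictions combined (OTH).
    """
    nsw_rates = [c / p for c, p in zip(nsw_cases, nsw_pop)]
    oth_rates = [c / p for c, p in zip(oth_cases, oth_pop)]
    nsw_pooled = sum(nsw_cases) / sum(nsw_pop)      # 2005-2014 combined
    if all(r > 0 for r in oth_rates):
        # Usual case: average of the ten annual NSW/OTH ratios
        ratio = mean(n / o for n, o in zip(nsw_rates, oth_rates))
    elif any(r > 0 for r in oth_rates):
        # Some OTH rates are zero: ratio of the pooled 2005-2014 rates
        ratio = nsw_pooled / (sum(oth_cases) / sum(oth_pop))
    else:
        # All OTH rates are zero: use the pooled NSW rate itself
        return nsw_pooled
    return ratio * oth_rate_2015
```

The estimated count for the age group is then this rate multiplied by the 2015 NSW population in that age group.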

### International Classification of Diseases for Oncology (ICD-O)

Cancers were originally classified solely under the ICD classification system, based on topographic site and behaviour. However, during the creation of the Ninth Revision of the ICD in the late 1960s, working parties suggested creating a separate classification for cancers that included improved morphological information. The first edition of the ICD-O was subsequently released in 1976 and, in this classification, cancers were coded by both morphology (histology type and behaviour) and topography (site).

Since the first edition of the ICD-O, a number of revisions have been made, mainly in the area of lymphoma and leukaemia. The current edition, the Third Edition (ICD-O-3), was released in 2000 and is used by most state and territory cancer registries in Australia, as well as by the AIHW for the ACD.

### National Mortality Database

The AIHW National Mortality Database (NMD) contains information provided by the Registries of Births, Deaths and Marriages and the National Coronial Information System—and coded by the ABS—for deaths from 1964 to 2016. Registration of deaths is the responsibility of each state and territory Registry of Births, Deaths and Marriages. These data are then collated and coded by the ABS and are maintained at the AIHW in the NMD.

In the NMD, both the year in which the death occurred and the year in which it was registered are provided. For the purposes of this report, actual mortality data are shown based on the year the death occurred, except for the most recent year (namely 2016) where the number of people whose death was registered is used. Previous investigation has shown that the year of death and its registration coincide for the most part. However, in some instances, deaths at the end of each calendar year may not be registered until the following year. Thus, year of death information for the latest available year is generally an underestimate of the actual number of deaths that occurred in that year.

In this report, deaths registered in 2014 and earlier are based on the final version of cause of death data; deaths registered in 2015 and 2016 are based on revised and preliminary versions, respectively, and are subject to further revision by the ABS.

The data quality statements underpinning the AIHW NMD can be found on abs.gov.au:

- ABS quality declaration summary for Deaths, Australia (ABS cat. no. 3302.0)
- ABS quality declaration summary for Causes of death, Australia (ABS cat. no. 3303.0).

For more information on the AIHW NMD see Deaths data at AIHW.

### Population data

Throughout this report, population data were used to derive rates of, for example, cancer incidence and mortality. The population data were sourced from the ABS using the most up-to-date estimates available at the time of analysis.

To derive its estimates of the resident populations, the ABS uses data from the 5-yearly Census of Population and Housing and adjusts them as follows:

- All respondents in the Census are placed in their state or territory, Statistical Local Area and postcode of usual residence; overseas visitors are excluded.
- An adjustment is made for persons missed in the Census.
- Australians temporarily overseas on Census night are added to the usual residence Census count.

Estimated resident populations are then updated each year from the Census data, using indicators of population change, such as births, deaths and net migration. More information is available from abs.gov.au.

### Prevalence

Limited-duration prevalence is expressed as *N*-year prevalence throughout this report. *N*-year prevalence on a given index date, where *N* is any number 1, 2, 3 and so on, is defined as the number of people alive at the end of that day who had been diagnosed with cancer in the past *N* years. For example:

- 1-year prevalence is the number of living people who were diagnosed in the past year to 31 December 2015
- 5-year prevalence is the number of living people who were diagnosed in the past 5 years to 31 December 2015. This includes the people defined by 1-year prevalence.

Note that prevalence is measured by the number of people diagnosed with cancer, not the number of cancer cases. An individual who was diagnosed with two separate cancers will contribute separately to the prevalence of each cancer. However, this individual will contribute only once to prevalence of all cancers combined. For this reason, the sum of prevalence for individual cancers will not equal the prevalence of all cancers combined.
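To make the distinction between people and cases concrete, the following sketch counts *N*-year prevalence from hypothetical diagnosis records. The record layout, the function name and the treatment of the window boundary (diagnoses after the same calendar date *N* years earlier, up to and including the index date) are assumptions for illustration.

```python
from datetime import date


def n_year_prevalence(diagnoses, deaths, index_date, n_years):
    """Count N-year prevalence by cancer type and for all cancers combined.

    diagnoses: iterable of (person_id, cancer_type, diagnosis_date) records,
               one per cancer diagnosed (a person may have several).
    deaths:    dict mapping person_id to date of death; absent if alive.
    """
    window_start = index_date.replace(year=index_date.year - n_years)
    people_by_cancer = {}
    people_any_cancer = set()
    for person, cancer, diagnosed in diagnoses:
        death = deaths.get(person)
        alive_at_index = death is None or death > index_date
        if alive_at_index and window_start < diagnosed <= index_date:
            # Sets count each person once per cancer type and once overall
            people_by_cancer.setdefault(cancer, set()).add(person)
            people_any_cancer.add(person)
    by_cancer = {cancer: len(people) for cancer, people in people_by_cancer.items()}
    return by_cancer, len(people_any_cancer)


# Hypothetical records: person 1 has two cancers; person 3 died before the index date
records = [
    (1, "breast", date(2014, 3, 1)),
    (1, "bowel", date(2015, 6, 1)),
    (2, "melanoma", date(2012, 1, 1)),
    (3, "lung", date(2015, 2, 1)),
]
print(n_year_prevalence(records, {3: date(2015, 9, 1)}, date(2015, 12, 31), 5))
```

For 5-year prevalence to 31 December 2015 the cancer-specific counts sum to 3, but prevalence of all cancers combined is 2, because person 1 contributes to both breast and bowel cancer yet is counted only once overall.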

Differences in limited-duration prevalence by age are presented in this report. Note that while age for survival and incidence statistics refers to age at diagnosis, age for prevalence refers to age on the index date at which prevalence was calculated (31 December 2015 in this report). Therefore, a person diagnosed with cancer in 1982, the year they turned 50, would be counted as aged 83 in the prevalence statistics (as at the end of 2015).

### Projections—Estimating the incidence of cancer, excluding prostate cancer

Estimates of national incidence in 2016–2021 were calculated using the same approach as described in the section “Estimating 2015 cancer incidence for NSW, excluding prostate cancer” above. Note the following:

- estimates were made for Australia as a whole, not for individual jurisdictions
- instead of using the topography and histology codes to define the cancer groups, ICD-10 codes were used (for example, for breast cancer or melanoma of the skin, as well as for groupings such as head and neck cancers, which consolidate cancers of the lip, tongue, mouth, salivary glands, oropharynx, nasopharynx, hypopharynx and other sites in the pharynx)
- the incidence estimates already made for 2015 for NSW were treated as real data for the purposes of estimating Australian incidence for 2016–2021
- the 10 years of incidence data used as the baseline were 2006–2015
- for populations, the ABS Estimated Resident Populations were used for 2006–2017, and the ABS population projection series 29(B) for 2018–2021 (ABS 2013).

### Projections—Estimating the incidence of prostate cancer

MBS item 66655 (PSA test) enables testing activity for prostate cancer to be quantified. At the time this analysis was undertaken, the number of services of item 66655 was available up to the end of 2017. It has been noted previously that there is a positive correlation between the number of services of item 66655 in a given year and the incidence of prostate cancer in the following year (AIHW & AACR 2012). This relationship was used to derive the estimates of prostate cancer incidence for 2016–2018, as explained below. The data used were as follows:

- year: 2002, …, 2017
- MBS age group: 0–4, then 10-year age groups 5–14, …, 75–84, and 85+
- prostate cancer incidence: number of cases of prostate cancer, 2003–2015
- PSA tests: number of services of item 66655 in 2002–2017 (Medicare Australia). By hypothesis, these data are correlated with incidence in 2003–2018.

The number of cases and number of tests were converted to case rates and claim rates by dividing by the relevant populations. A combination of visual data exploration and linear or log-linear regression was used to model the case rate as a function of the claim rate and/or year. Estimated case rates for 2016–2018 were obtained by applying the model to the PSA data for 2015–2017. Estimated incidence counts for 2016–2018 were obtained by multiplying the case rates by the relevant populations.
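As a simplified illustration of the one-year lag, the sketch below fits a straight line of next year's case rate against this year's PSA claim rate for one age group, then applies it to the claim years that have no matching incidence year. The report's models varied by age group (linear or log-linear, using claim rate and/or year); this sketch assumes a single linear model in claim rate only, and all names are hypothetical.

```python
from statistics import mean


def least_squares_line(xs, ys):
    """Fit y = a + b*x by ordinary least squares; return (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b * x_bar, b


def project_prostate_cases(claim_rates, case_rates, populations):
    """Project prostate cancer counts one year after each unmatched PSA year.

    claim_rates: {year: item 66655 claims per person} for one MBS age group
    case_rates:  {year: prostate cancer cases per person}, same age group
    populations: {year: population} for the years being estimated
    """
    # Pair each year's claim rate with the NEXT year's case rate (one-year lag)
    pairs = [(claim_rates[y], case_rates[y + 1])
             for y in sorted(claim_rates) if y + 1 in case_rates]
    a, b = least_squares_line([c for c, _ in pairs], [r for _, r in pairs])
    estimates = {}
    for year in sorted(claim_rates):
        target = year + 1
        if target not in case_rates:
            # Estimated case rate multiplied by population gives a count
            estimates[target] = (a + b * claim_rates[year]) * populations[target]
    return estimates
```

With claim rates for 2002–2017 and case rates for 2003–2015, the claim years 2015–2017 have no matching incidence year, so the function returns estimates for 2016–2018, mirroring the years projected in the report.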

The final step was to convert the estimated incidence counts for the 10-year MBS age groups (5–14, …, 75–84) to 5-year age groups, consistent with incidence data. For a given 10-year age group the ‘younger age group’ is defined to be the 5-year age group consisting of the first five years of the range and the ‘older age group’ is defined as the other five years. The data used in this step were as follows:

- year: 2003, …, 2018
- 10-year age group: 5–14, 15–24, …, 75–84
- number of cases in each 10-year age group, including estimates for 2016–2018
- number of cases in each younger age group, 5–9, 15–19, …, 75–79 for 2003–2015.

Linear regression was used to model the number of cases in the younger age group as a function of year and number of cases in the 10-year age group. The estimated number of cases in the younger age group for 2016–2018 was obtained by applying the model to those years. The estimated number of cases in the older age group was obtained by subtracting the estimate for the younger age group from the estimate for the 10-year age group.
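Splitting a 10-year MBS age group into its two 5-year halves can be sketched as an ordinary least squares fit with two predictors (year and the 10-year count). The names are hypothetical, and centring the year is a numerical convenience added here rather than part of the described method; it does not change the fitted values.

```python
def solve_linear_system(a, b):
    """Solve a x = b for a small square system (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_split_model(years, ten_year_counts, younger_counts):
    """OLS fit of younger = c0 + c1*(year - centre) + c2*ten_year_count."""
    centre = sum(years) / len(years)
    rows = [[1.0, y - centre, float(t)] for y, t in zip(years, ten_year_counts)]
    # Normal equations: (X'X) c = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * v for r, v in zip(rows, younger_counts)) for i in range(3)]
    return solve_linear_system(xtx, xty), centre


def split_ten_year_group(model, year, ten_year_estimate):
    """Return (younger 5-year estimate, older 5-year estimate) for one year."""
    (c0, c1, c2), centre = model
    younger = c0 + c1 * (year - centre) + c2 * ten_year_estimate
    return younger, ten_year_estimate - younger
```

The model would be fitted on 2003–2015 and then applied to each of 2016–2018 with that year's estimated 10-year count; by construction the two halves always add back to the 10-year estimate.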

At this point, there were incidence estimates available for each 5-year age group for each year from 2016 to 2018. Estimates for 2019 could not be obtained by the same method as there were no PSA data for 2018 available at the time of the analysis. Projections for 2019–2021 were carried out using the methodology that was used for all other cancers except that 2009–2018 was used as the baseline instead of 2006–2015.

### Projections—Estimating the mortality of cancer

This method is the same as the incidence projections with the exceptions that:

- the 10-year baseline for incidence is 2006–2015, while the baseline for mortality is 2007–2016
- NSW mortality data for 2015 were obtained from the NMD and were not estimated.

### Relative survival

Relative survival is a measure of the survival of people with cancer compared with that of the general population. It is the standard approach used by cancer registries to produce population-level survival statistics and is commonly used as it does not require information on cause of death. Relative survival reflects the net survival (or excess mortality) associated with cancer by adjusting the survival experience of those with cancer for the underlying mortality that they would have experienced in the general population.

Relative survival is calculated by dividing observed survival by expected survival, where the numerator and denominator have been matched for age, sex and calendar year.

Observed survival refers to the proportion of people alive for a given amount of time after a diagnosis of cancer; it is calculated from population-based cancer data. Expected survival refers to the proportion of people in the general population alive for a given amount of time and is calculated from life tables of the entire Australian population. (Ideally these life tables should be restricted to the population of Australians who do not have cancer but such life tables are unavailable. It is standard practice around the world to use life tables for the entire population.)

A simplified example of how relative survival is interpreted is shown in Figure G1. Given that 6 in 10 people with cancer are alive 5 years after their diagnosis (observed survival of 0.6) and that 9 in 10 people from the general population are alive after the same 5 years (expected survival of 0.9), the relative survival of people with cancer would be calculated as 0.6 divided by 0.9, which is 0.67. This means that individuals with cancer are 67% as likely to be alive for at least 5 years after their diagnosis as are their counterparts in the general population.

#### Figure 1: How relative survival is calculated

The survival statistics in this report were produced using a modified version of a SAS program written by Dickman (2004) and employed the period method (Brenner and Gefeller 1996) with 1-year intervals. Observed survival was calculated from data in the ACD. Expected survival was calculated using the Ederer II method whereby matched people in the general population are considered to be at risk of death until the corresponding cancer patient dies or is censored (Ederer and Heise 1959).
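The combination of observed and expected survival into relative survival can be sketched as below. This is a simplified Python illustration, not Dickman's SAS program: it assumes a life table with 1-year intervals, uses the actuarial adjustment (people censored during an interval count as at risk for half of it) for observed survival, and applies the Ederer II idea by averaging expected survival only over people still at risk at the start of each interval. Selecting the follow-up that falls within the period window, as the period method requires, is assumed to have been done before the data reach this function.

```python
from statistics import mean


def cumulative_relative_survival(intervals):
    """Cumulative relative survival at the end of each 1-year interval.

    intervals: sequence of (at_risk, deaths, censored, expected_p), where
    expected_p lists the matched (age, sex, calendar year) general-population
    probability of surviving the interval for each person at risk at its start.
    """
    cum_observed = cum_expected = 1.0
    results = []
    for at_risk, deaths, censored, expected_p in intervals:
        # Actuarial method: those censored count as at risk for half the interval
        p_observed = 1 - deaths / (at_risk - censored / 2)
        # Ederer II: average only over people still at risk at the interval start
        p_expected = mean(expected_p)
        cum_observed *= p_observed
        cum_expected *= p_expected
        results.append(cum_observed / cum_expected)
    return results


# One interval: 10 people, 4 deaths, matched expected survival of 0.9 each
print(cumulative_relative_survival([(10, 4, 0, [0.9] * 10)]))
```

The single-interval call reproduces the arithmetic of the 0.6 / 0.9 ≈ 0.67 example above. Because each person's matched expected survival drops out of the average once that person dies or is censored, the expected survival tracks the changing composition of the surviving cohort.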

#### Calculation of conditional relative survival

Conditional survival is the probability of surviving *j* more days, given that an individual has already survived *i* days. It was calculated using the formula:

$$
S(j \mid i) = \frac{S(i + j)}{S(i)}
$$

where

$S(j \mid i)$ is the probability of surviving at least *j* more days given that the person has already survived at least *i* days

$S(i + j)$ is the probability of surviving at least *i* + *j* days

$S(i)$ is the probability of surviving at least *i* days.

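Confidence intervals for conditional survival were calculated using a variation of Greenwood’s (1926) formula for variance (Skuladottir & Olsen 2003):

$$
\operatorname{Var}\left[S(j \mid i)\right] = S(j \mid i)^{2} \sum_{k} \frac{d_k}{r_k \left(r_k - d_k\right)}
$$

where the sum runs over the life-table intervals between *i* and *i* + *j* days, and

$d_k$ is the number of deaths during the *k*th interval

$r_k$ is the number at risk during the *k*th interval.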

The 95% confidence intervals were constructed assuming that conditional survival estimates follow a normal distribution.
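A minimal sketch of the conditional survival estimate and its normal-approximation 95% confidence interval follows. It assumes a life table whose intervals are numbered so that interval *k* ends at time *k*, with *i* and *j* measured in whole intervals; the dictionary inputs and the function name are hypothetical.

```python
import math


def conditional_survival(survival, deaths, at_risk, i, j, z=1.96):
    """Conditional survival S(j|i) with a normal-approximation confidence interval.

    survival: {t: S(t)} at the end of each interval t, with survival[0] = 1
    deaths, at_risk: {k: d_k} and {k: r_k} for each interval k
    i, j: whole numbers of intervals; interval k runs from time k-1 to time k
    """
    s = survival[i + j] / survival[i]
    # Greenwood-type sum restricted to the intervals after i, up to i + j
    total = sum(deaths[k] / (at_risk[k] * (at_risk[k] - deaths[k]))
                for k in range(i + 1, i + j + 1))
    se = s * math.sqrt(total)
    return s, (s - z * se, s + z * se)


# Hypothetical 3-interval life table: 100 at risk initially
survival = {0: 1.0, 1: 0.8, 2: 0.7, 3: 0.65}
deaths = {1: 20, 2: 10, 3: 5}
at_risk = {1: 100, 2: 80, 3: 70}
print(conditional_survival(survival, deaths, at_risk, i=1, j=2))
```

Here the probability of surviving two more intervals, given survival to the end of the first, is 0.65 / 0.8 = 0.8125; only intervals 2 and 3 contribute to its variance.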

### Risk to age 75 or 85

The risk calculations shown in this report approximate the risk of developing (or dying from) cancer before the age of 75 or 85, assuming that the risks at the time of estimation remain unchanged throughout life. Each risk is derived from the cumulative rate through a mathematical relationship.

The cumulative rate is calculated by summing the age-specific rates for each 5-year age group up to 70–74 (for risk to age 75) or 80–84 (for risk to age 85):

$$
\text{cumulative rate} = 100 \times \sum_{a} \frac{5 \times \text{age-specific rate}_a}{100\,000}
$$

The factor of 5 reflects the 5 years of life in each age group, and the factor of 100 presents the result as a percentage. Because age-specific rates are presented per 100,000 population, the sum is divided by 100,000 to convert the rates back to cases per person. Cumulative risk is related to cumulative rate by the expression:

$$
\text{cumulative risk} = 100 \times \left(1 - e^{-\text{cumulative rate}/100}\right)
$$

where the cumulative rate is expressed as a percentage.

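The risk is expressed as a ‘1 in *n*’ proportion by taking the inverse of the above formula, with the risk expressed as a proportion rather than a percentage:

$$
n = \frac{1}{1 - e^{-\text{cumulative rate}/100}}
$$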

For example, if *n* equals 3, the risk of a person in the general population being diagnosed with cancer before the age of 75 (or 85) is 1 in 3. Note that these figures are average risks for the total Australian population. An individual person’s risk may be higher or lower than the estimated figures, depending on their particular risk factors.
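The chain from age-specific rates to cumulative rate, cumulative risk and a ‘1 in *n*’ figure can be sketched as follows. The function names are hypothetical, and the inputs are assumed to be the age-specific rates per 100,000 for the 5-year age groups up to 70–74 (risk to age 75) or 80–84 (risk to age 85).

```python
import math


def cumulative_rate(rates_per_100k):
    """Cumulative rate (%) from 5-year age-specific rates per 100,000."""
    return 100 * sum(5 * rate / 100_000 for rate in rates_per_100k)


def cumulative_risk(cum_rate_percent):
    """Cumulative risk (%) from a cumulative rate expressed as a percentage."""
    return 100 * (1 - math.exp(-cum_rate_percent / 100))


def one_in_n(cum_rate_percent):
    """The n in a '1 in n' statement: the reciprocal of the risk as a proportion."""
    return 1 / (1 - math.exp(-cum_rate_percent / 100))


# Hypothetical: 15 age groups (0-4 to 70-74), each at 500 per 100,000
rate = cumulative_rate([500] * 15)
print(rate, cumulative_risk(rate), one_in_n(rate))
```

With these illustrative rates the cumulative rate is 37.5%, the cumulative risk is about 31%, and the risk of diagnosis before age 75 is about 1 in 3.2. The exponential form keeps the risk below 100% even when the cumulative rate is large.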