Methods

Age-adjusted survival

Comparisons of cancer survival rates over time may be affected by changes in the age composition of those diagnosed. For example, if more older people are diagnosed with cancer over time and older people have lower survival rates, improvements in survival over time may be offset by the increasingly older age of people diagnosed with cancer.

In order to calculate age-adjusted survival we first choose a fixed period, called the base period, and take note of the age composition of the people who were diagnosed with the cancer of interest during that period. We calculate age-adjusted survival for other periods by assuming that the age composition of patients in the other period is the same as that of the base period. Thus the age-adjusted survival is effectively the survival that would have occurred had there been no change in age composition from the base period.
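
The adjustment described above can be sketched as a weighted average: each age group's survival in the period of interest is weighted by that group's share of diagnoses in the base period. This is a minimal illustration under that assumption, not the AIHW's actual program; the function name, age groups and numbers are hypothetical.

```python
def age_adjusted_survival(base_cases_by_age, survival_by_age):
    """Weight each age group's survival by its share of base-period cases."""
    total_base_cases = sum(base_cases_by_age.values())
    return sum(
        (cases / total_base_cases) * survival_by_age[age_group]
        for age_group, cases in base_cases_by_age.items()
    )


# Hypothetical inputs for illustration only (not AIHW data).
base_cases = {"0-49": 200, "50-69": 500, "70+": 300}  # base-period diagnoses
survival_later_period = {"0-49": 0.90, "50-69": 0.80, "70+": 0.60}

adjusted = age_adjusted_survival(base_cases, survival_later_period)
```

Because the weights come from the base period only, a change in the adjusted figure between periods reflects a change in age-specific survival rather than a change in who is being diagnosed.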

Age-adjusted survival is different to age-standardised survival. An age-standardised rate uses the same standard population for all cancers and sexes (including persons). Using a standard population allows meaningful comparisons between different cancers, sexes and across time. In contrast, age-adjusted survival rates use a population relevant to the specific cancer (or cancer group) and sex to allow meaningful comparisons across time. Age-adjusted survival rates are only intended to enhance the understanding of how survival has changed over time for the specific cancer and sex and are not directly comparable with other cancers or sexes.

CdiA does not currently report on age-standardised survival but future releases are expected to contain age-standardised survival rates.

Age-standardised rates (ASR)

A crude rate is the number of events (for example, new cases of cancer or deaths from cancer) divided by the population at risk in a specified period. No age adjustments are made when calculating a crude rate. Since the risk of cancer depends heavily on age, crude cancer incidence and mortality rates are less suitable for examining changes over time or for comparing population groups whose age structures differ.

More meaningful comparisons can be made using ASRs, with such rates adjusted for age in order to facilitate comparisons between populations that have different age structures—for example, between Indigenous Australians and other Australians. This standardisation process effectively removes the influence of age structure on the summary rate.

There are two methods commonly used to adjust for age: direct and indirect standardisation. In this report, the direct standardisation approach presented by Jensen and colleagues (1991) is used. To age-standardise using the direct method, the first step is to obtain population numbers and numbers of cases (or deaths) in age ranges—typically 5-year age ranges. The next step is to multiply the age-specific population numbers for the standard population (in this case, the Australian population as at 30 June 2001) by the age-specific incidence rates (or death rates) for the population of interest. The next step is to sum across the age groups and divide this sum by the total of the standard population to give an ASR for the population of interest. Finally, this is expressed per 100,000 population in this report.
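
The steps of the direct method can be sketched as follows. The function name and the three age groups are hypothetical and the numbers are invented for illustration; in practice there would be one entry per 5-year age group and the standard population would be the 30 June 2001 Australian population.

```python
def direct_age_standardised_rate(cases, population, standard_population, per=100_000):
    """Directly age-standardised rate, expressed per `per` population.

    All three sequences must list the same age groups in the same order.
    """
    if not (len(cases) == len(population) == len(standard_population)):
        raise ValueError("inputs must cover the same age groups")
    # Age-specific rate in the population of interest, applied to the
    # standard population for the same age group, then summed.
    expected_events = sum(
        (c / p) * s for c, p, s in zip(cases, population, standard_population)
    )
    return expected_events / sum(standard_population) * per


# Hypothetical data for three age groups (young, middle, old).
cases = [10, 50, 200]
population = [100_000, 50_000, 20_000]
standard = [60_000, 30_000, 10_000]

asr = direct_age_standardised_rate(cases, population, standard)
crude = sum(cases) / sum(population) * 100_000
```

In this made-up example the crude rate is about 153 per 100,000 while the ASR is about 136 per 100,000, because the population of interest is older than the standard population.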

Age-specific rates

Age-specific rates provide information on the incidence of a particular event in an age group relative to the total number of people at risk of that event in the same age group. They are calculated by dividing the number of events occurring in each specified age group by the corresponding ‘at-risk’ population in the same age group and then multiplying the result by a constant (for example, 100,000) to derive the rate. Age-specific rates are often expressed per 100,000 population.

Australian Cancer Database

All forms of cancer, except basal and squamous cell carcinomas of the skin, are notifiable diseases in each Australian state and territory. This means there is legislation in each jurisdiction that requires hospitals, pathology laboratories and various other institutions to report all cases of cancer to their central cancer registry. An agreed subset of the data collected by these cancer registries is supplied annually to the AIHW, where it is compiled into the ACD. The ACD currently contains data on all cases of cancer diagnosed from 1982 to 2017 for all states and territories with the exception of 2017 Northern Territory data.

Cancer reporting and registration is a dynamic process, and records in the state and territory cancer registries may be modified if new information is received. As a result, the number of cancer cases reported by the AIHW for any particular year may change slightly over time and may not always align with state and territory reporting for that same year.

For more information on the ACD please see the ACD 2017 Data Quality Statement.

Estimating 2017 cancer incidence for the Northern Territory

Northern Territory incidence data for 2017 was not available for inclusion in the 2017 ACD. The AIHW used linear regression to model the trend in NT during the years 2007 to 2016, stratified by sex and cancer site/type. The line was extrapolated to 2017 to obtain the estimated number of cases in 2017. These cases were then allocated pro-rata to various strata on the basis of the number of cases observed in those strata in NT in the years 2012–2016. The strata were diagnosis age group by topography by histology by behaviour.
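
A minimal sketch of this two-step estimate, for a single sex and cancer site/type, might look like the following. The counts and stratum labels are invented, the helper names are hypothetical, and the treatment of fractional cases (left unrounded here) is an assumption, since the text does not say how rounding was handled.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def allocate_pro_rata(total, recent_counts):
    """Split `total` across strata in proportion to recently observed counts."""
    recent_total = sum(recent_counts.values())
    return {key: total * count / recent_total for key, count in recent_counts.items()}


# Hypothetical annual case counts for 2007 to 2016.
years = list(range(2007, 2017))
counts = [40, 42, 44, 46, 48, 50, 52, 54, 56, 58]

slope, intercept = fit_line(years, counts)
estimate_2017 = slope * 2017 + intercept  # extrapolate the fitted line one year

# Hypothetical cases observed per stratum during 2012 to 2016.
recent = {"stratum A": 150, "stratum B": 100, "stratum C": 10}
allocated = allocate_pro_rata(estimate_2017, recent)
```

The allocated values always sum to the extrapolated total, so the stratified estimates stay consistent with the regression estimate.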

Estimating death-certificate-only cases for NSW for 2017

If a person’s death certificate states that they had cancer, in most cases the cancer registry already has other evidence of the cancer. However, in about 1.5% of cases, despite the registry’s subsequent enquiries with relevant institutions, the registry is unable to find any other evidence of the cancer. Such cases are called death-certificate-only (DCO) cases.

The New South Wales Cancer Registry was unable to submit its DCO cases for 2017 in time to be included in the 2017 ACD. The AIHW estimated the number of DCO cases for NSW for 2017 by assuming they would be the same as they were in NSW in 2016, stratified by sex, diagnosis age group, topography, histology and behaviour.

Estimating late registrations of cancer for 2017

Late registrations are cases of cancer that have not been registered by the cancer registry by the time the registry needs to submit its data to the AIHW. Almost all late registrations have a diagnosis year equal to that of the most recent year of the ACD, in this case 2017. Experience has shown that late registrations account for about 1% of cases in that year. For example, it is expected that about 1% of cases for diagnosis year 2017 are not part of the 2017 ACD; they will appear for the first time in the 2018 ACD (with a diagnosis year of 2017). The AIHW has made estimates of these cases based on the late registrations for 2016 that appeared for the first time in the 2017 ACD. Note that in the case of NT the most recent year of data is 2016, not 2017, so estimates of late registrations were made for 2016 for NT.

International Classification of Diseases for Oncology (ICD-O)

Cancers were originally classified solely under the ICD classification system, based on topographic site and behaviour. However, during the creation of the Ninth Revision of the ICD in the late 1960s, working parties suggested creating a separate classification for cancers that included improved morphological information. The first edition of the ICD-O was subsequently released in 1976 and, in this classification, cancers were coded by both morphology (histology type and behaviour) and topography (site).

Since the First Edition of the ICD-O, a number of revisions have been made, mainly in the area of lymphoma and leukaemia. The current edition, the Third Edition (ICD-O-3), was released in 2000 and is used by most state and territory cancer registries in Australia, as well as by the AIHW in regard to the ACD.

National Mortality Database

The AIHW National Mortality Database (NMD) contains information provided by the Registries of Births, Deaths and Marriages and the National Coronial Information System—and coded by the ABS—for deaths from 1964 to 2019. Registration of deaths is the responsibility of each state and territory Registry of Births, Deaths and Marriages. These data are then collated and coded by the ABS and are maintained at the AIHW in the NMD.

In the NMD, both the year in which the death occurred and the year in which it was registered are provided. For the purposes of this report, actual mortality data are shown based on the year the death occurred, except for the most recent year (namely 2019) where the number of people whose death was registered is used. Previous investigation has shown that the year of death and its registration coincide for the most part. However, in some instances, deaths at the end of each calendar year may not be registered until the following year. Thus, year of death information for the latest available year is generally an underestimate of the actual number of deaths that occurred in that year.

In this report, deaths registered in 2016 and earlier are based on the final version of cause of death data; deaths registered in 2017 are based on revised versions and deaths registered in 2018 and 2019 are based on the preliminary versions. Revised and preliminary versions are subject to further revision by the ABS.

The data quality statements underpinning the AIHW NMD can be found on the following ABS internet pages:

For more information on the AIHW NMD see Deaths data at AIHW.

Population Data

Throughout this report, population data were used to derive rates of, for example, cancer incidence and mortality. The population data were sourced from the ABS using the most up-to-date estimates available at the time of analysis.

To derive its estimates of the resident populations, the ABS uses the 5-yearly Census of Population and Housing data and adjusts it as described here:

  • All respondents in the Census are placed in their state or territory, Statistical Local Area and postcode of usual residence; overseas visitors are excluded.
  • An adjustment is made for persons missed in the Census.
  • Australians temporarily overseas on Census night are added to the usual residence Census count.

Estimated resident populations are then updated each year from the Census data, using indicators of population change, such as births, deaths and net migration. More information is available from the ABS website.

Prevalence

Limited-duration prevalence is expressed as N-year prevalence throughout this report. N-year prevalence on a given index date—where N is any number 1, 2, 3 and so on—is defined as the number of people alive at the end of that day who had been diagnosed with cancer in the past N years. For example:

  • 1-year prevalence is the number of living people who were diagnosed in the past year to 31 December 2016
  • 5-year prevalence is the number of living people who were diagnosed in the past 5 years to 31 December 2016. This includes the people defined by 1-year prevalence.

Note that prevalence is measured by the number of people diagnosed with cancer, not the number of cancer cases. An individual who was diagnosed with two separate cancers will contribute separately to the prevalence of each cancer. However, this individual will contribute only once to prevalence of all cancers combined. For this reason, the sum of prevalence for individual cancers will not equal the prevalence of all cancers combined.
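
The person-based counting described above can be illustrated with a short sketch. The record layout, identifiers and dates are hypothetical, and the index date is assumed to be 31 December of the index year, as in the examples given.

```python
from datetime import date


def n_year_prevalence(diagnoses, death_dates, index_year, n_years, cancer=None):
    """Count distinct people alive at the end of 31 December of index_year
    who were diagnosed in the n_years up to and including that date.

    diagnoses: iterable of (person_id, cancer_type, diagnosis_date)
    death_dates: dict of person_id -> date of death (absent if alive)
    cancer: restrict to one cancer type, or None for all cancers combined
    """
    index_date = date(index_year, 12, 31)
    window_start = date(index_year - n_years + 1, 1, 1)
    people = set()  # a set, so each person is counted once
    for person_id, cancer_type, diagnosed_on in diagnoses:
        if cancer is not None and cancer_type != cancer:
            continue
        died_on = death_dates.get(person_id)
        alive = died_on is None or died_on > index_date
        if alive and window_start <= diagnosed_on <= index_date:
            people.add(person_id)
    return len(people)


# Hypothetical records: p1 has two separate cancers, p3 died in 2016.
diagnoses = [
    ("p1", "breast", date(2016, 3, 1)),
    ("p1", "melanoma", date(2014, 6, 1)),
    ("p2", "melanoma", date(2013, 1, 15)),
    ("p3", "breast", date(2015, 9, 9)),
]
death_dates = {"p3": date(2016, 5, 1)}

all_5yr = n_year_prevalence(diagnoses, death_dates, 2016, 5)
breast_5yr = n_year_prevalence(diagnoses, death_dates, 2016, 5, cancer="breast")
melanoma_5yr = n_year_prevalence(diagnoses, death_dates, 2016, 5, cancer="melanoma")
```

Here the 5-year prevalence is 1 for breast cancer and 2 for melanoma, but 2 for all cancers combined, because person p1 contributes to each individual cancer and only once to the combined figure.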

Projections - Estimating the incidence of cancer

Please note that no adjustments have been made to the projections to account for the potential impact of COVID.

National incidence for 2018–2021 was estimated by projecting the sex- and age-specific incidence rates observed in Australia during 2008–2017. The time series were stratified by the following variables:

  • sex
  • 5-year age group (0–4, …, 85–89, 90+)
  • 4-character ICD-O-3 topography code (C00.0, …, C80.9)
  • 4-digit ICD-O-3.1 histology code (8000, …, 9992).

For each time series, the process was as described below:

  • least squares linear regression was used to find the straight line of best fit through the time series
  • if the slope was positive, the straight line of best fit was extrapolated to obtain the estimated rates for 2018–2021
  • if the slope was negative, the time series floor was set to 0
  • the estimated incidence rates for 2018–2021 were then multiplied by the corresponding population figures for those years to obtain the estimated incidence numbers.
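
The process for a single time series can be sketched as below. This is an illustration only: it reads "floor set to 0" as meaning that an extrapolated rate is not allowed to fall below zero, which is an assumption, and the rates, population figure and function name are hypothetical.

```python
def project_rate(years, rates, target_year):
    """Extrapolate the least squares line through (years, rates) to target_year,
    never returning a rate below zero (assumed meaning of the floor)."""
    n = len(years)
    mean_year = sum(years) / n
    mean_rate = sum(rates) / n
    sxx = sum((y - mean_year) ** 2 for y in years)
    sxy = sum((y - mean_year) * (r - mean_rate) for y, r in zip(years, rates))
    slope = sxy / sxx
    intercept = mean_rate - slope * mean_year
    return max(0.0, slope * target_year + intercept)


baseline_years = list(range(2008, 2018))  # the 10-year baseline, 2008 to 2017

# Hypothetical age- and sex-specific rates per 100,000 for one stratum.
rising = [20.0 + k for k in range(10)]   # 20, 21, ..., 29
falling = [9.5 - k for k in range(10)]   # 9.5, 8.5, ..., 0.5

rising_2018 = project_rate(baseline_years, rising, 2018)
falling_2018 = project_rate(baseline_years, falling, 2018)  # line is below 0, so floored

# Convert a projected rate to a number of cases using a hypothetical population.
population_2018 = 1_200_000
cases_2018 = rising_2018 / 100_000 * population_2018
```

Fitting each stratum separately and summing the resulting case counts gives projected totals for any grouping of age, sex or cancer type.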

Note the following:

  • estimates were made for Australia as a whole, not for individual jurisdictions
  • ICD-10 codes, rather than the topography and histology codes, were used to define the cancer groups (for example, breast cancer or melanoma of the skin, as well as groupings such as head and neck cancers, which consolidates cancers of the lip, tongue, mouth, salivary glands, oropharynx, nasopharynx, hypopharynx and other sites in the pharynx)
  • the incidence estimates made for 2017 for the Northern Territory were treated as real data for the purposes of estimating Australian incidence for 2018–2021

  • the 10 years of incidence data used as the baseline were 2008–2017
  • for populations, the ABS Estimated Resident Populations were used for 2008–2019, and the ABS population projection series B for 2020–2021 (ABS 2018).

Projections - Estimating the mortality of cancer

Please note that no adjustments have been made to the projections to account for the potential impact of COVID.

This method is the same as the incidence projections with the exceptions that:

  • the 10-year baseline for incidence is 2008–2017 while the baseline for mortality is 2010–2019
  • Northern Territory 2017 data is obtained from the NMD and is not estimated.

Relative survival

Relative survival is a measure of the survival of people with cancer compared with that of the general population. It is the standard approach used by cancer registries to produce population-level survival statistics and is commonly used as it does not require information on cause of death. Relative survival reflects the net survival (or excess mortality) associated with cancer by adjusting the survival experience of those with cancer for the underlying mortality that they would have experienced in the general population.

Relative survival is calculated by dividing observed survival by expected survival, where the numerator and denominator have been matched for age, sex and calendar year.

Observed survival refers to the proportion of people alive for a given amount of time after a diagnosis of cancer; it is calculated from population-based cancer data. Expected survival refers to the proportion of people in the general population alive for a given amount of time and is calculated from life tables of the entire Australian population. (Ideally these life tables should be restricted to the population of Australians who do not have cancer but such life tables are unavailable. It is standard practice around the world to use life tables for the entire population.)

A simplified example of how relative survival is interpreted is shown in Figure M1. Given that 6 in 10 people with cancer are alive 5 years after their diagnosis (observed survival of 0.6) and that 9 in 10 people from the general population are alive after the same 5 years (expected survival of 0.9), the relative survival of people with cancer would be calculated as 0.6 divided by 0.9, which is 0.67. This means that individuals with cancer are 67% as likely to be alive for at least 5 years after their diagnosis as are their counterparts in the general population.

Figure M1: Simplified example of how relative survival is calculated

Observed survival is 6 out of 10, i.e. 0.6. Expected survival is 90 out of 100, i.e. 0.9. Therefore relative survival is 0.6 divided by 0.9, which is 0.67, or 67%.

The survival statistics in this report were produced using a modified version of a SAS program written by Dickman (2004) and employed the period method (Brenner and Gefeller 1996) with 1-year intervals. Observed survival was calculated from data in the ACD. Expected survival was calculated using the Ederer II method whereby matched people in the general population are considered to be at risk of death until the corresponding cancer patient dies or is censored (Ederer and Heise 1959).
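
As a rough illustration of how interval-based estimates combine, the sketch below divides observed by expected survival within each 1-year interval and multiplies the interval-specific ratios to obtain cumulative relative survival. This is a common life-table approach assumed here for illustration; it is not a transcription of the SAS program, and the inputs are invented.

```python
def cumulative_relative_survival(observed_by_interval, expected_by_interval):
    """Cumulative relative survival at the end of each interval.

    Each input lists interval-specific survival proportions, matched for
    age, sex and calendar year, in follow-up order.
    """
    cumulative = 1.0
    results = []
    for observed, expected in zip(observed_by_interval, expected_by_interval):
        cumulative *= observed / expected  # interval-specific relative survival
        results.append(cumulative)
    return results


# Hypothetical interval-specific survival for the first two years of follow-up.
observed = [0.80, 0.90]
expected = [0.98, 0.97]
relative = cumulative_relative_survival(observed, expected)

# A single-interval call with observed 0.6 and expected 0.9 gives about 0.67.
simple = cumulative_relative_survival([0.6], [0.9])[0]
```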

Calculation of conditional relative survival

Conditional survival is the probability of surviving j more days, given that an individual has already survived i days. It was calculated using the formula:

$$S(j \mid i) = \frac{S(i + j)}{S(i)}$$
where

S(j|i) is the probability of surviving at least j more days given that the person has already survived at least i days

S(i + j) is the probability of surviving at least i + j days

S (i) is the probability of surviving at least i days
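
A minimal sketch of this ratio, using hypothetical cumulative survival estimates keyed by elapsed time (any consistent time unit works):

```python
def conditional_survival(cumulative_survival, i, j):
    """Probability of surviving j more time units given survival to time i."""
    return cumulative_survival[i + j] / cumulative_survival[i]


# Hypothetical cumulative survival at 1 and 5 units of follow-up.
cumulative = {1: 0.80, 5: 0.50}
four_more_given_one = conditional_survival(cumulative, 1, 4)  # 0.50 / 0.80
```

With these invented figures, a person who has already survived 1 unit of time has a probability of about 0.625 of surviving 4 more, higher than the unconditional 0.50.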

Confidence intervals for conditional survival were calculated using a variation of Greenwood's (1926) formula for variance (Skuladottir & Olsen 2003):

$$\operatorname{Var}[S(j \mid i)] = [S(j \mid i)]^2 \sum_{k=i+1}^{i+j} \frac{d_k}{r_k (r_k - d_k)}$$
where

d_k is the number of deaths during the kth interval

r_k is the number at risk during the kth interval.

The 95% confidence intervals were constructed assuming that conditional survival estimates follow a normal distribution.
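
The sketch below shows one way such an interval could be computed. It assumes the Greenwood-type variance [S(j|i)]^2 times the sum of d_k / (r_k (r_k - d_k)) over the intervals after i up to i + j, and for simplicity it works with observed survival from a small invented life table rather than relative survival. The function name and data are hypothetical.

```python
import math


def conditional_survival_ci(deaths, at_risk, i, j, z=1.96):
    """Conditional survival over intervals i+1 .. i+j with a normal-theory CI.

    deaths[k - 1] and at_risk[k - 1] refer to the kth interval (k = 1, 2, ...).
    Returns (estimate, lower, upper).
    """
    survival = 1.0
    variance_sum = 0.0
    for k in range(i + 1, i + j + 1):
        d, r = deaths[k - 1], at_risk[k - 1]
        survival *= 1 - d / r               # interval-specific survival
        variance_sum += d / (r * (r - d))   # Greenwood term for interval k
    standard_error = survival * math.sqrt(variance_sum)
    return survival, survival - z * standard_error, survival + z * standard_error


# Hypothetical life table with three 1-year intervals.
deaths = [20, 10, 5]
at_risk = [100, 80, 70]
estimate, lower, upper = conditional_survival_ci(deaths, at_risk, i=1, j=2)
```

Only the intervals after time i enter the calculation, so uncertainty from the period already survived does not widen the interval.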

Risk

We use 19 age groups, numbered 1 to 19. Age group i (= 1 to 18) is 5 years wide and comprises all ages in the interval [5i - 5, 5i); for example, age group 1 comprises ages 0 to 4. Age group 19 comprises all ages 90 and above. The cancer under consideration is referred to as “the cancer”. This could be a specific cancer, a group of related cancers or all cancers combined. There are two different measures of risk, one adjusted for competing mortality and one not adjusted. For brevity, these are called the adjusted risk (AR) and unadjusted risk (UR). The full notation is as follows, where D is for diagnosis and M is for mortality.

 

ARD(5i) = adjusted risk of being diagnosed with the cancer before age 5i (i = 1 to 18),

ARD(∞) = adjusted lifetime risk of being diagnosed with the cancer,

ARM(5i) = adjusted risk of dying from the cancer before age 5i (i = 1 to 18),

ARM(∞) = adjusted lifetime risk of dying from the cancer,

 

and similarly for URD and URM.

 

For each age group i, the following three rates are used in the risk formulas.

Di = rate of first ever diagnosis of the cancer (the first in one's life, not the first in age group i),

Mi = rate of death from the cancer,

Ai = rate of death from all causes (including the cancer).

 

Note that the denominator of Di is the general population, not the population of people who have never been diagnosed with the cancer.

Risk not adjusted for competing mortality

As this measure of risk is not adjusted for competing mortality, the formulas are relatively simple and do not involve Ai. The formulas come from Day (1987).

 

$$\mathrm{URD}(5i) = 1 - e^{-5 \sum_{k=1}^{i} D_k}, \quad i = 1, 2, \ldots, 18$$

$$\mathrm{URD}(\infty) = 1$$

 

$$\mathrm{URM}(5i) = 1 - e^{-5 \sum_{k=1}^{i} M_k}, \quad i = 1, 2, \ldots, 18$$

$$\mathrm{URM}(\infty) = 1$$

 

Note that the lifetime risks are necessarily 1. Not adjusting for competing mortality is equivalent to the scenario where it is impossible to die of any cause other than the cancer. Hence every person must eventually be diagnosed with the cancer and eventually die from it. This is why it is not informative to report unadjusted lifetime risks.
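
A short sketch of the unadjusted risk before age 5i, assuming the age-specific rates are supplied per person per year (not per 100,000) in age-group order. The function name and the flat illustrative rate are hypothetical.

```python
import math


def unadjusted_risk(rates, i):
    """Unadjusted risk before age 5 * i.

    rates[0] is the rate for age group 1, rates[1] for age group 2, and so on;
    each age group is 5 years wide, hence the factor of 5.
    """
    if not 1 <= i <= 18:
        raise ValueError("i must be between 1 and 18")
    return 1 - math.exp(-5 * sum(rates[:i]))


# Hypothetical flat rate of 1 per 1,000 people per year in every age group.
rates = [0.001] * 18
risk_before_75 = unadjusted_risk(rates, 15)  # 1 - exp(-0.075)
```

The same function applies to diagnosis (using the D rates) and to death (using the M rates).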

Risk adjusted for competing mortality

The formulas in this section come from Fay et al. (2003). The risk of diagnosis is as follows.

$$\mathrm{ARD}(5) = \frac{D_1}{A_1}\left(1 - e^{-5 A_1}\right)$$

$$\mathrm{ARD}(5i) = \mathrm{ARD}(5i - 5) + \frac{D_i}{A_i}\left(1 - e^{-5 A_i}\right) e^{-5 \sum_{k=1}^{i-1} A_k}, \quad i = 2, 3, \ldots, 18$$

$$\mathrm{ARD}(\infty) = \mathrm{ARD}(90) + \frac{D_{19}}{A_{19}}\, e^{-5 \sum_{k=1}^{18} A_k}$$

The formula for risk of death is the same as above except that Mi replaces Di throughout.

$$\mathrm{ARM}(5) = \frac{M_1}{A_1}\left(1 - e^{-5 A_1}\right)$$

$$\mathrm{ARM}(5i) = \mathrm{ARM}(5i - 5) + \frac{M_i}{A_i}\left(1 - e^{-5 A_i}\right) e^{-5 \sum_{k=1}^{i-1} A_k}, \quad i = 2, 3, \ldots, 18$$

$$\mathrm{ARM}(\infty) = \mathrm{ARM}(90) + \frac{M_{19}}{A_{19}}\, e^{-5 \sum_{k=1}^{18} A_k}$$
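
The recursion for adjusted risk can be sketched as a single pass over the 19 age groups. The sketch assumes rates are per person per year and that every all-cause rate is greater than zero; the function name and the flat illustrative rates are hypothetical.

```python
import math


def adjusted_risks(event_rates, all_cause_rates):
    """Adjusted risks [AR(5), AR(10), ..., AR(90), AR(infinity)].

    event_rates holds the D (diagnosis) or M (death from the cancer) rates
    and all_cause_rates holds the A rates, each for age groups 1 to 19.
    """
    if len(event_rates) != 19 or len(all_cause_rates) != 19:
        raise ValueError("expected rates for 19 age groups")
    risks = []
    cumulative = 0.0
    alive_at_start = 1.0  # exp(-5 * sum of A for all earlier age groups)
    for event, all_cause in zip(event_rates[:18], all_cause_rates[:18]):
        cumulative += (event / all_cause) * (1 - math.exp(-5 * all_cause)) * alive_at_start
        risks.append(cumulative)
        alive_at_start *= math.exp(-5 * all_cause)
    # Open-ended final age group (90 and above): everyone alive at 90 eventually dies.
    cumulative += (event_rates[18] / all_cause_rates[18]) * alive_at_start
    risks.append(cumulative)
    return risks


# Hypothetical flat rates: the cancer accounts for one tenth of all deaths.
cancer_death_rates = [0.001] * 19
all_cause_rates = [0.01] * 19
arm = adjusted_risks(cancer_death_rates, all_cause_rates)
lifetime_risk = arm[-1]
```

With these invented rates the lifetime risk works out to 0.1, matching the cancer's one-tenth share of all deaths. If the event rates equal the all-cause rates, the lifetime risk is 1, consistent with the unadjusted case.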

Use of a proxy to calculate risk of diagnosis

In order to calculate the risk of diagnosis we need the age-specific rates, Di, at which people are being diagnosed with the cancer for the first time in their lives. This requires knowledge of each person’s cancer history from birth. As the Australian Cancer Database (ACD) starts from the beginning of 1982, this is impossible for most age groups and will remain impossible for many decades to come. In order to estimate the risk of diagnosis we need a satisfactory proxy for Di.

The best available estimate of Di is obtained by using the entire history of the ACD. That is, instead of counting first ever diagnoses (which is impossible) we count “first from 1/1/1982” diagnoses. However, using such an estimate would mean that we couldn’t produce a consistent time series of risks. This is because each estimate in the time series would be based on a different amount of “lookback time” for previous diagnoses. The estimate in 1982 would be based on at most one year of lookback time, the estimate in 1983 would be based on up to two years of lookback time, and so on.

In order to enable the production of a time series of risks, the AIHW has chosen to use a lookback time of up to one calendar year for both the adjusted and unadjusted risks of diagnosis. That is, for the year for which the risks are being calculated, lookback goes back to the 1st of January of that year. Using this method we are in fact counting the number of people (not cancers) diagnosed in the year under consideration, irrespective of whether they have been diagnosed with the same cancer in a previous year. AIHW analysis has shown that this method provides a satisfactory estimate of Di, except for the group “all cancers combined”. No suitable period of lookback time was identified for this group. As such, AIHW does not produce a time series of risk of diagnosis for all cancers combined. However, the best available estimate for the latest year of data available is produced. This estimate is based on lookback to the beginning of 1982. Based on the analysis referred to above, this estimate is likely to be a few percentage points higher than the true value.