## Methods

### Age-adjusted survival

Comparisons of cancer survival rates over time may be affected by changes in the age composition of those diagnosed. For example, if more older people are diagnosed with cancer over time and older people have lower survival rates, improvements in survival over time may be offset by the increasingly older age of people diagnosed with cancer.

In order to calculate age-adjusted survival we first choose a fixed period, called the base period, and take note of the age composition of the people who were diagnosed with the cancer of interest during that period. We calculate age-adjusted survival for other periods by assuming that the age composition of patients in the other period is the same as that of the base period. Thus the age-adjusted survival is effectively the survival that would have occurred had there been no change in age composition from the base period.
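One way to picture this calculation is as a weighted average of age-specific survival, with weights taken from the age composition of the base period. The sketch below is illustrative only: the function name, the age groups and all figures are hypothetical, and the weighted-average form is an assumption about how the adjustment could be implemented rather than a description of the exact AIHW procedure.

```python
def age_adjusted_survival(survival_by_age, base_counts_by_age):
    """Average age-specific survival using base-period case counts as weights.

    survival_by_age: survival proportion in the period of interest, by age group.
    base_counts_by_age: number of people diagnosed in the base period, by age group.
    """
    total_base_cases = sum(base_counts_by_age.values())
    return sum(
        survival_by_age[age] * base_counts_by_age[age] / total_base_cases
        for age in base_counts_by_age
    )


# Hypothetical figures, for illustration only.
base_counts = {"0-49": 200, "50-69": 500, "70+": 300}
survival_later_period = {"0-49": 0.90, "50-69": 0.80, "70+": 0.60}

adjusted = age_adjusted_survival(survival_later_period, base_counts)
print(f"Age-adjusted survival: {adjusted:.2f}")
```

Because the weights are fixed at the base period, a change in the adjusted figure between periods reflects changes in age-specific survival, not changes in who is being diagnosed.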

Age-*adjusted* survival is different to age-*standardised* survival. An age-standardised rate uses the same standard population for all cancers and sexes (including persons). Using a standard population allows meaningful comparisons between different cancers, sexes and across time. In contrast, age-adjusted survival rates use a population relevant to the specific cancer (or cancer group) and sex to allow meaningful comparisons across time. Age-adjusted survival rates are only intended to enhance the understanding of how survival has changed over time for the specific cancer and sex and are not directly comparable with other cancers or sexes.

CdiA does not currently report on age-standardised survival but future releases are expected to contain age-standardised survival rates.

### Age-standardised rates (ASR)

A crude rate provides information on the number of events (for example, new cases of cancer or deaths from cancer) relative to the population at risk in a specified period. No age adjustments are made when calculating a crude rate. Since the risk of cancer heavily depends on age, crude cancer incidence and mortality rates are not as suitable for looking at changes over time or making comparisons between different population groups if there are differences in those populations’ age structures.

More meaningful comparisons can be made using ASRs, with such rates adjusted for age in order to facilitate comparisons between populations that have different age structures – for example, between Indigenous Australians and other Australians. This standardisation process effectively removes the influence of age structure on the summary rate.

There are two methods commonly used to adjust for age: direct and indirect standardisation. In this report, the direct standardisation approach presented by Jensen and colleagues (1991) is used. To age-standardise using the direct method, the first step is to obtain population numbers and numbers of cases (or deaths) in age ranges – typically 5-year age ranges. The next step is to multiply the age-specific population numbers for the standard population (in this case, the Australian population as at 30 June 2001) by the age-specific incidence rates (or death rates) for the population of interest. The next step is to sum across the age groups and divide this sum by the total of the standard population to give an ASR for the population of interest. Finally, this is expressed per 100,000 population in this report.
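The steps of the direct method can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name is hypothetical, only three broad age groups are used instead of 5-year ranges, and the counts and standard population are made-up numbers, not the 2001 Australian Standard Population.

```python
def direct_asr(cases, population, standard_population, per=100_000):
    """Directly age-standardised rate.

    cases, population: counts for the population of interest, by age group.
    standard_population: counts for the standard population, same age groups.
    """
    if not (len(cases) == len(population) == len(standard_population)):
        raise ValueError("all inputs must cover the same age groups")

    expected_cases = 0.0
    for c, p, s in zip(cases, population, standard_population):
        age_specific_rate = c / p                # rate in the population of interest
        expected_cases += age_specific_rate * s  # applied to the standard population

    # Sum across age groups, divide by the standard total, express per 100,000.
    return expected_cases / sum(standard_population) * per


# Hypothetical inputs for three age groups (youngest to oldest).
cases = [10, 50, 200]
population = [100_000, 50_000, 20_000]
standard = [50_000, 30_000, 20_000]

asr = direct_asr(cases, population, standard)
crude = sum(cases) / sum(population) * 100_000
print(f"ASR: {asr:.1f} per 100,000; crude rate: {crude:.1f} per 100,000")
```

In this made-up example the standard population is older than the population of interest, so the ASR is higher than the crude rate.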

In addition to rates age-standardised to the 2001 Australian Standard Population, the CdiA report also offers rates age-standardised to the population in the year of release. Trends in the two sets of rates are often similar. However, the 2023 population is, overall, much older than the 2001 population. Cancer is more common in older populations and, accordingly, the 2023 age-standardised rates are often higher than the 2001 age-standardised rates and are more relevant to cancer today. The 2001 Australian Standard Population is available as the current Australian standard. World Health Organisation and Segi age-standardised incidence rates are also available in the summary data visualisation as well as in the Excel data tables.

### Age-specific rates

Age-specific rates provide information on the incidence of a particular event in an age group relative to the total number of people at risk of that event in the same age group. They are calculated by dividing the number of events occurring in each specified age group by the corresponding ‘at-risk’ population in the same age group and then multiplying the result by a constant (for example, 100,000) to derive the rate. Age-specific rates are often expressed per 100,000 population.
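As a small illustration of this calculation, the sketch below derives age-specific rates per 100,000 from event counts and at-risk populations. The function name, age groups and counts are hypothetical.

```python
def age_specific_rates(events, at_risk, per=100_000):
    """Return the rate per `per` population for each age group."""
    return {age: events[age] / at_risk[age] * per for age in events}


# Hypothetical counts for two age groups.
events = {"60-64": 120, "65-69": 180}
at_risk = {"60-64": 60_000, "65-69": 45_000}

rates = age_specific_rates(events, at_risk)
for age, rate in rates.items():
    print(f"{age}: {rate:.0f} per 100,000")
```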

### Australian Cancer Database

All forms of cancer, except basal and squamous cell carcinomas of the skin, are notifiable diseases in each Australian state and territory. This means there is legislation in each jurisdiction that requires hospitals, pathology laboratories and various other institutions to report all cases of cancer to their central cancer registry. An agreed subset of the data collected by these cancer registries is supplied annually to the AIHW, where it is compiled into the ACD. The ACD currently contains data on all cases of cancer diagnosed from 1982 to 2019 for all states and territories.

Cancer reporting and registration is a dynamic process, and records in the state and territory cancer registries may be modified if new information is received. As a result, the number of cancer cases reported by the AIHW for any particular year may change slightly over time and may not always align with state and territory reporting for that same year.

For more information on the ACD please see the ACD 2019 Data Quality Statement.

### Estimating late registrations of cancer for 2019

In recent CdiA reports, the most recent year of incidence data included an estimate for late registrations. This year’s release does not include estimates for late registrations. Late registrations are likely to still occur and incidence counts and rates may be understated to some extent.

### International Classification of Diseases for Oncology (ICD-O)

Cancers were originally classified solely under the ICD classification system, based on topographic site and behaviour. However, during the creation of the Ninth Revision of the ICD in the late 1960s, working parties suggested creating a separate classification for cancers that included improved morphological information. The first edition of the ICD-O was subsequently released in 1976 and, in this classification, cancers were coded by both morphology (histology type and behaviour) and topography (site).

Since the First Edition of the ICD-O, a number of revisions have been made, mainly in the area of lymphoma and leukaemia. The current edition, the Third Edition (ICD-O-3), was released in 2000 and is used by most state and territory cancer registries in Australia, as well as by the AIHW in regard to the ACD.

### National Mortality Database

The AIHW National Mortality Database (NMD) contains information provided by the Registries of Births, Deaths and Marriages and the National Coronial Information System – and coded by the ABS – for deaths from 1964 to 2020. Registration of deaths is the responsibility of each state and territory Registry of Births, Deaths and Marriages. These data are then collated and coded by the ABS and are maintained at the AIHW in the NMD.

In the NMD, both the year in which the death occurred and the year in which it was registered are provided. For the purposes of this report, actual mortality data are shown based on the year the death occurred, except for the most recent year (namely 2021) where the number of people whose death was registered is used. Previous investigation has shown that the year of death and its registration coincide for the most part. However, in some instances, deaths at the end of each calendar year may not be registered until the following year. Thus, year of death information for the latest available year is generally an underestimate of the actual number of deaths that occurred in that year.

In this report, deaths registered in 2018 and earlier are based on the final version of cause of death data; deaths registered in 2019, 2020 and 2021 are based on revised and preliminary versions, respectively, and are subject to further revision by the ABS.

The data quality statements underpinning the AIHW NMD can be found on the following ABS internet pages:

- ABS quality declaration summary for *Deaths, Australia*
- ABS quality declaration summary for *Causes of death, Australia*.

For more information on the AIHW NMD see *Deaths data at AIHW*.

### Population data

Throughout this report, population data were used to derive rates of, for example, cancer incidence and mortality. The population data were sourced from the ABS using the most up-to-date estimates available at the time of creating this report.

To derive its estimates of the resident populations, the ABS uses the 5-yearly Census of Population and Housing data and adjusts it as described here:

- All respondents in the Census are placed in their state or territory, Statistical Local Area and postcode of usual residence; overseas visitors are excluded.
- An adjustment is made for persons missed in the Census.
- Australians temporarily overseas on Census night are added to the usual residence Census count.

Estimated resident populations are then updated each year from the Census data, using indicators of population change, such as births, deaths and net migration. More information is available from the ABS website.

The 2023 to 2033 population estimates were sourced from the Centre of Population January 2023 update of the National age and sex structure, 2021–22 to 2032–33.

### Prevalence

Limited-duration prevalence is expressed as *N-year prevalence* throughout this report. *N-year prevalence* on a given index date – where *N* is any number 1, 2, 3 and so on – is defined as the number of people alive at the end of that day who had been diagnosed with cancer in the past *N* years. For example:

- 1-year prevalence is the number of living people who were diagnosed in the past year to 31 December 2019
- 5-year prevalence is the number of living people who were diagnosed in the past 5 years to 31 December 2019. This includes the people defined by 1-year prevalence.

Note that prevalence is measured by the number of people diagnosed with cancer, not the number of cancer cases. An individual who was diagnosed with two separate cancers will contribute separately to the prevalence of each cancer. However, this individual will contribute only once to prevalence of all cancers combined. For this reason, the sum of prevalence for individual cancers will not equal the prevalence of all cancers combined.
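The person-based counting described above can be sketched as follows. This is an illustrative sketch only: the record layout (person identifier, cancer type, diagnosis date, date of death or `None`), the function name and the sample records are all hypothetical, and the index date is assumed to be 31 December.

```python
from datetime import date


def n_year_prevalence(records, index_date, n_years, cancer=None):
    """Count people alive at the end of index_date who were diagnosed in the past N years.

    records: iterable of (person_id, cancer_type, diagnosis_date, death_date or None).
    cancer: restrict to one cancer type; None means all cancers combined.
    """
    window_start = index_date.replace(year=index_date.year - n_years)
    people = set()  # a set, so each person is counted once
    for person_id, cancer_type, diagnosed, died in records:
        if cancer is not None and cancer_type != cancer:
            continue
        alive = died is None or died > index_date
        in_window = window_start < diagnosed <= index_date
        if alive and in_window:
            people.add(person_id)
    return len(people)


# Hypothetical records, for illustration only.
records = [
    ("A", "breast", date(2019, 3, 1), None),
    ("A", "melanoma", date(2016, 6, 1), None),
    ("B", "breast", date(2015, 2, 1), None),
    ("C", "melanoma", date(2019, 8, 1), date(2019, 12, 1)),
    ("D", "melanoma", date(2013, 1, 1), None),
]
index = date(2019, 12, 31)

breast_5y = n_year_prevalence(records, index, 5, "breast")
melanoma_5y = n_year_prevalence(records, index, 5, "melanoma")
all_5y = n_year_prevalence(records, index, 5)
print(breast_5y, melanoma_5y, all_5y)
```

Person A contributes to both the breast cancer and melanoma counts but only once to all cancers combined, which is why the individual counts sum to more than the combined count.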

### Projections - Estimating the incidence of cancer

Please note that no adjustments have been made to the projections to account for the potential impact of COVID.

Estimates of national incidence in 2020–2023 were derived by projecting the sex- and age-specific incidence rates observed in Australia during 2010–2019. The time series were stratified by the following variables:

- sex
- 5-year age group (0–4, …, 85–89, 90+)
- 4-character ICD-O-3 topography code (C00.0, …, C80.9)
- 4-digit ICD-O-3.1 histology code (8000, …, 9992).

For each time series, the process was as described below:

- least squares linear regression was used to find the straight line of best fit through the time series
- if the slope was positive, the straight line of best fit was extrapolated to obtain the estimate of the 2020 rate
- if the slope was negative, the time series floor was set to 0
- the estimated incidence rates for 2020 were then multiplied by the Estimated Resident Populations for 2020 to obtain the estimated incidence numbers.
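The process for a single time series can be sketched as below. This is a rough illustration, not the AIHW implementation: the function names and the rates are hypothetical, and the handling of a negative slope is an assumption, namely that the fitted line is still extrapolated but the projected rate is not allowed to fall below 0.

```python
def project_rate(years, rates, target_year):
    """Fit a least squares straight line to (years, rates) and extrapolate it."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(rates) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, rates))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    projected = intercept + slope * target_year
    # Assumption: a floor of 0 stops a declining trend producing a negative rate.
    return max(projected, 0.0)


def projected_count(rate_per_100k, population):
    """Convert a projected rate per 100,000 into a projected number of cases."""
    return rate_per_100k / 100_000 * population


# Hypothetical rates per 100,000 for one sex/age/cancer stratum, 2010 to 2019.
years = list(range(2010, 2020))
rising = [50 + 2 * (y - 2010) for y in years]
falling = [9 - (y - 2010) for y in years]

rate_2020 = project_rate(years, rising, 2020)
floored_2020 = project_rate(years, falling, 2020)
count_2020 = projected_count(rate_2020, 1_000_000)  # hypothetical population
print(rate_2020, floored_2020, count_2020)
```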

Note the following:

- estimates were made for Australia as a whole, not for individual jurisdictions
- for the majority of cancers, ICD-10 codes were used to define the cancer groups instead of the topography and histology codes (for example, breast cancer or melanoma of the skin, as well as groupings such as head and neck cancers, which consolidates cancers of the lip, tongue, mouth, salivary glands, oropharynx, nasopharynx, hypopharynx and other sites in the pharynx)
- the 10 years of incidence data used as the baseline were 2010–2019
- for populations, the ABS Estimated Resident Populations were used for 2010–2022, and the 2023 population estimates were sourced from the Centre of Population.
- The method for projecting cancer incidence rates relies on the assumption that incidence trends are likely to provide a useful basis to project future cancer incidence rates and counts. For prostate cancer, this has not been the case in more recent years. Prostate cancer incidence trends now use the latest available incidence rates by age, applied to the relevant populations by age, to arrive at projected incidence and counts.

### Projections - Estimating the mortality of cancer

This method is the same as that used for the incidence projections, except that:

- the 10-year baseline for incidence is 2010–2019 while the baseline for mortality from the NMD is 2012–2021 and the baseline for mortality from the ACD is 2009–2018.

### Relative survival

Relative survival is a measure of the survival of people with cancer compared with that of the general population. It is the standard approach used by cancer registries to produce population-level survival statistics and is commonly used as it does not require information on cause of death. Relative survival reflects the net survival (or excess mortality) associated with cancer by adjusting the survival experience of those with cancer for the underlying mortality that they would have experienced in the general population.

Relative survival is calculated by dividing observed survival by expected survival, where the numerator and denominator have been matched for age, sex and calendar year.

Observed survival refers to the proportion of people alive for a given amount of time after a diagnosis of cancer; it is calculated from population-based cancer data. Expected survival refers to the proportion of people in the general population alive for a given amount of time and is calculated from life tables of the entire Australian population. (Ideally these life tables should be restricted to the population of Australians who do not have cancer but such life tables are unavailable. It is standard practice around the world to use life tables for the entire population.)

A simplified example of how relative survival is interpreted is shown in Figure M1. Given that 6 in 10 people with cancer are alive 5 years after their diagnosis (observed survival of 0.6) and that 9 in 10 people from the general population are alive after the same 5 years (expected survival of 0.9), the relative survival of people with cancer would be calculated as 0.6 divided by 0.9, which is 0.67. This means that individuals with cancer are 67% as likely to be alive for at least 5 years after their diagnosis as are their counterparts in the general population.

#### Figure M1: Simplified example of how relative survival is calculated

The survival statistics in this report were produced using a modified version of a SAS program written by Dickman (2004) and employed the period method (Brenner and Gefeller 1996) with 1-year intervals. Observed survival was calculated from data in the ACD. Expected survival was calculated using the Ederer II method whereby matched people in the general population are considered to be at risk of death until the corresponding cancer patient dies or is censored (Ederer and Heise 1959).

### Calculation of conditional relative survival

Conditional survival is the probability of surviving *j* more days, given that an individual has already survived *i* days. It was calculated using the formula:

$$S(j \mid i) = \frac{S(i + j)}{S(i)}$$

where

*S*(*j*|*i*) is the probability of surviving at least *j* more days given that the person has already survived at least *i* days

*S*(*i* + *j*) is the probability of surviving at least *i* + *j* days

*S*(*i*) is the probability of surviving at least *i* days

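Confidence intervals for conditional survival were calculated using a variation of Greenwood's (1926) formula for variance (Skuladottir & Olsen 2003):

$$\operatorname{Var}\left[S(j \mid i)\right] = \left[S(j \mid i)\right]^{2} \sum_{k=i+1}^{i+j} \frac{d_k}{r_k (r_k - d_k)}$$

where

*d*_{k} is the number of deaths and *r*_{k} is the number at risk during the *k*th interval.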

The 95% confidence intervals were constructed assuming that conditional survival estimates follow a normal distribution.
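The sketch below illustrates the calculation on a simple life table. It is a simplified illustration under several assumptions. Conditional survival is taken as *S*(*i* + *j*) divided by *S*(*i*). The variance is a Greenwood-type sum of *d*_{k} / (*r*_{k}(*r*_{k} − *d*_{k})) over the intervals between *i* and *i* + *j*, scaled by the squared estimate. Observed survival stands in for relative survival, and intervals are whole follow-up periods. The function name and the life table figures are hypothetical.

```python
import math


def conditional_survival(deaths, at_risk, i, j, z=1.96):
    """Conditional survival S(j | i) with a normal-approximation confidence interval.

    deaths[k - 1], at_risk[k - 1]: deaths and number at risk in interval k (k = 1, 2, ...).
    Returns (estimate, lower, upper).
    """
    def survival(up_to):
        # Product of interval-specific survival proportions for intervals 1..up_to.
        s = 1.0
        for d, r in zip(deaths[:up_to], at_risk[:up_to]):
            s *= 1 - d / r
        return s

    estimate = survival(i + j) / survival(i)
    # Greenwood-type variance over intervals i + 1 .. i + j (assumed form).
    variance = estimate ** 2 * sum(
        d / (r * (r - d)) for d, r in zip(deaths[i:i + j], at_risk[i:i + j])
    )
    half_width = z * math.sqrt(variance)
    return estimate, max(estimate - half_width, 0.0), min(estimate + half_width, 1.0)


# Hypothetical life table with five intervals and no censoring.
deaths = [20, 10, 5, 5, 4]
at_risk = [100, 80, 70, 65, 60]

cs, lower, upper = conditional_survival(deaths, at_risk, i=1, j=4)
print(f"S(4 | 1) = {cs:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

With no censoring, as in this made-up table, the estimate reduces to the proportion of people alive at the end of interval 1 who are still alive at the end of interval 5 (56 of 80).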

### Risk

We use 19 age groups, numbered 1 to 19. Age group *i* (*i* = 1 to 18) is 5 years wide and comprises all ages in the interval [5*i* − 5, 5*i*). Age group 19 comprises all ages 90 and above. The cancer under consideration is referred to as “the cancer”. This could be a specific cancer, a group of related cancers or all cancers combined. There are two different measures of risk, one adjusted for competing mortality and one not adjusted. For brevity, these are called the adjusted risk (*AR*) and unadjusted risk (*UR*). The full notation is as follows, where *D* is for diagnosis and *M* is for mortality.

*ARD*(5*i*) = adjusted risk of being diagnosed with the cancer before age 5*i* (*i* = 1 to 18),

*ARD*(∞) = adjusted lifetime risk of being diagnosed with the cancer,

*ARM*(5*i*) = adjusted risk of dying from the cancer before age 5*i* (*i* = 1 to 18),

*ARM*(∞) = adjusted lifetime risk of dying from the cancer,

and similarly for *URD* and *URM*.

For each age group *i*, the following three rates are used in the risk formulas.

*D*_{i} = rate of first ever diagnosis of the cancer (the first in one's life, not the first in age group *i*),

*M*_{i} = rate of death from the cancer,

*A*_{i} = rate of death from all causes (including the cancer).

Note that the denominator of *D*_{i} is the general population, not the population of people who have never been diagnosed with the cancer.

#### Risk not adjusted for competing mortality

As this measure of risk is not adjusted for competing mortality, the formulas are relatively simple and do not involve *A*_{i}. The formulas come from Day (1987).

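$$\mathit{URD}(5i) = 1 - \exp\left(-5 \sum_{k=1}^{i} D_k\right), \quad i = 1, 2, \ldots, 18$$

*URD*(∞) = 1.

$$\mathit{URM}(5i) = 1 - \exp\left(-5 \sum_{k=1}^{i} M_k\right), \quad i = 1, 2, \ldots, 18$$

*URM*(∞) = 1.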

Note that the lifetime risks are necessarily 1. Not adjusting for competing mortality is equivalent to the scenario where it is impossible to die of any cause other than the cancer. Hence every person must eventually be diagnosed with the cancer and eventually die from it. This is why it is not informative to report unadjusted lifetime risks.
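The unadjusted risk can be illustrated with the standard cumulative risk calculation, in which the risk before age 5*i* is 1 minus the exponential of minus the cumulative rate (5 times the sum of the age-specific rates for age groups 1 to *i*). The sketch below assumes that form, with rates expressed per person per year and not per 100,000. The function name and the flat rate used in the example are hypothetical.

```python
import math


def unadjusted_risk(rates_per_person_year, i, width=5):
    """Unadjusted risk of the event before age width * i.

    rates_per_person_year: age-specific rates for age groups 1, 2, ... (per person per year).
    i: number of age groups to accumulate (1 to 18 for 5-year groups).
    """
    cumulative_rate = width * sum(rates_per_person_year[:i])
    return 1 - math.exp(-cumulative_rate)


# Hypothetical flat rate of 100 per 100,000 per year in each of the 18 five-year groups.
rates = [100 / 100_000] * 18

risk_before_75 = unadjusted_risk(rates, i=15)
print(f"Unadjusted risk before age 75: {risk_before_75:.3f}")  # about 0.072
```

As the rates are accumulated over more and more years, the cumulative rate grows without bound and the risk approaches 1, which is consistent with the unadjusted lifetime risk being 1.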

#### Risk adjusted for competing mortality

The formulas in this section come from Fay *et al.* (2003). The risk of diagnosis is as follows.

The formula for risk of death is the same as above except that *M*_{i} replaces *D*_{i} throughout.

#### Use of a proxy to calculate risk of diagnosis

In order to calculate the risk of diagnosis we need the age-specific rates, *D*_{i}, at which people are being diagnosed with the cancer for the first time in their lives. This requires knowledge of each person’s cancer history from birth. As the Australian Cancer Database (ACD) starts from the beginning of 1982, this is impossible for most age groups and will remain impossible for many decades to come. In order to estimate the risk of diagnosis we need a satisfactory proxy for *D*_{i}.

The best available estimate of *D*_{i} is obtained by using the entire history of the ACD. That is, instead of counting first ever diagnoses (which is impossible) we count “first from 1/1/1982” diagnoses. However, using such an estimate would mean that we couldn’t produce a consistent time series of risks. This is because each estimate in the time series would be based on a different amount of “lookback time” for previous diagnoses. The estimate in 1982 would be based on at most one year of lookback time, the estimate in 1983 would be based on up to two years of lookback time, and so on.

In order to enable the production of a time series of risks, the AIHW has chosen to use a lookback time of up to one calendar year for both the adjusted and unadjusted risks of diagnosis. That is, for the year for which the risks are being calculated, lookback goes back to the 1st of January of that year. Using this method we are in fact counting the number of people (not cancers) diagnosed in the year under consideration, irrespective of whether they have been diagnosed with the same cancer in a previous year. AIHW analysis has shown that this method provides a satisfactory estimate of *D*_{i}, except for the group “all cancers combined”. No suitable period of lookback time was identified for this group. As such, AIHW does not produce a time series of risk of diagnosis for all cancers combined. However, the best available estimate for the latest year of data available is produced. This estimate is based on lookback to the beginning of 1982. Based on the analysis referred to above, this estimate is likely to be a few percentage points higher than the true value.