Linkage findings

Scope of the data

The first version of the data was made available to approved researchers in December 2022 and had around 250,000 records linked to a range of administrative data sets. The current version (Version 2) expands the data coverage to more than 6 million linked records, with more recent case data (New South Wales), more jurisdictions (Victoria and Queensland) and more data sets, as listed in Table 1 below. The linked data will continue to be updated regularly, with the aim of including all jurisdictions and additional sources of information in future versions.

Please refer to the data variables list for the temporal scope of each of the datasets and how it differs between versions.

Table 1: Data sets included in each version

Version 1 (released in December 2022)

Number of linked records = 250,821

State/territory notifiable disease data on COVID-19 cases from:

  • Australian Capital Territory
  • New South Wales
  • Northern Territory
  • South Australia
  • Tasmania

Linked data sets:

  • Australian Immunisation Register (AIR) – whole of population
  • Medicare Benefits Schedule (MBS) – cases only
  • Medicare Consumer Directory (MCD) – whole of population
  • National Death Index (NDI) – whole of population
  • Pharmaceutical Benefits Scheme (PBS, including Repatriation Schedule of Pharmaceutical Benefits (RPBS) information) – cases only
  • National Notifiable Disease Surveillance System (NNDSS) – cases only
  • National Hospitals Morbidity Database (NHMD) – cases only
  • National Non-Admitted Patient Emergency Department Care Database (NNAPEDCD) – cases only
  • National Aged Care Data Clearinghouse (NACDC) – cases only

Version 2 (released in July 2023)

Number of linked records = 6,415,740

State/territory notifiable disease data on COVID-19 cases from:

  • Australian Capital Territory
  • New South Wales (updated data included)
  • Northern Territory
  • South Australia
  • Tasmania
  • Victoria
  • Queensland

Linked data sets:

  • Australian Immunisation Register (AIR) – whole of population
  • Medicare Benefits Schedule (MBS) – whole of population
  • Medicare Consumer Directory (MCD) – whole of population
  • National Death Index (NDI) – whole of population
  • Pharmaceutical Benefits Scheme (PBS, including Repatriation Schedule of Pharmaceutical Benefits (RPBS) information) – whole of population
  • National Notifiable Disease Surveillance System (NNDSS) – cases only
  • National Hospitals Morbidity Database (NHMD) – whole of population
  • National Non-Admitted Patient Emergency Department Care Database (NNAPEDCD) – whole of population
  • National Aged Care Data Clearinghouse (NACDC) – whole of population
  • Australian and New Zealand Intensive Care Society (ANZICS) Adult Patient Database (APD) – whole of population
  • Australian and New Zealand Paediatric Intensive Care Registry (ANZPICR) – whole of population

Linkage rates by jurisdiction

Generally, linkage results depend on the accuracy and completeness of the linkage variables provided to AIHW: more accurate and complete data result in better linkage rates. For more information on how the data are linked, please refer to the Data and methods section above.

Figure 2 shows, by state and territory, the number of records that were linked and the number that could not be linked. Linkage rates have generally stayed the same or improved slightly: in every jurisdiction, over 90% of the records supplied for the project were linked in both Version 1 and Version 2. The number of records supplied by New South Wales increased notably, with over 3 million records linked in Version 2 compared with just over 70,000 in Version 1, although the New South Wales linkage rate remained similar to Version 1 at 98%. In Version 2, data coverage was expanded to include two more jurisdictions (Victoria and Queensland), and over 95% of the data supplied by these jurisdictions were linked. The lower linkage rate in the Northern Territory (93%) may be due to the limited address information provided with the case data; the AIHW is working with the Northern Territory to improve this rate. New data supply for South Australia, Tasmania, the Australian Capital Territory and the Northern Territory is still ongoing, so the linkage rates for these jurisdictions are unchanged in Version 2.

Figure 2: Number of records and percentage linked by jurisdiction

The segmented horizontal bar chart compares the linkage rates of participating jurisdictions in Version 1 and Version 2. In both versions, every jurisdiction has over 90% of records linked. Tasmania has the highest percentage of linked records (99%), followed by New South Wales and Victoria (98%), while the Northern Territory has the lowest (93%).

Linkage rates by population groups

Table 2 describes linkage rates by age group and sex/gender. Linkage rates can differ between population groups, and it is important to consider this when analysing linked data. For example, individuals who change address while renting may be underrepresented in linkage studies. Table 2 shows that linkage rates largely improved between Version 1 and Version 2, and the rate for every group remains well over 90%, except for the ‘Other’ sex/gender category. Sex is one of the key variables used to link records, so linkage rates are lower where sex is not reported consistently or is reported as neither male nor female (‘Other’ in Table 2 below). The linkage rate for ‘Other’ improved considerably, from 3% in Version 1 to about 77% in Version 2, but remains lower than the rates for males and females. No other large differences in linkage rates were observed across the age groups.

Table 2: Linkage rates by population groups

 

Group           Version 1 [1]                  Version 1 [1]                  Version 2                      Version 2
                No. of records linked (%)      No. of records not linked (%)  No. of records linked (%)      No. of records not linked (%)

Sex/gender [2]
Male            125,673 (96.4%)                4,689 (3.6%)                   3,020,677 (97.7%)              72,564 (2.3%)
Female          125,075 (97.2%)                3,553 (2.8%)                   3,382,173 (97.8%)              75,163 (2.2%)
Other [3]       73 (3.0%)                      2,353 (97.0%)                  13,765 (77.3%)                 4,031 (22.7%)

Age group [4]
0-15            47,241 (96.6%)                 1,675 (3.4%)                   1,141,652 (97.2%)              33,225 (2.8%)
16-29           73,074 (95.1%)                 3,739 (4.9%)                   1,463,851 (97.2%)              42,122 (2.8%)
30-49           79,326 (95.9%)                 3,422 (4.1%)                   2,104,378 (98.3%)              35,554 (1.7%)
50-69           39,433 (96.9%)                 1,252 (3.1%)                   1,253,801 (98.8%)              15,343 (1.2%)
70+             11,747 (95.9%)                 506 (4.1%)                     452,888 (94.7%)                25,205 (5.3%)

  1. Results for Version 1 (released on 16 December 2022) are based on those participating states and territories as detailed in Figure 2 and will not be directly comparable to the figures in the previously released web report ‘Establishing a COVID-19 linked dataset’ which also includes Victoria.
  2. As reported by the state and territory.
  3. Other includes records where sex or gender is not reported, or sex is reported as neither male nor female.
  4. Age group is based on age as at 31 December 2022. Records with missing information on birth date are excluded. Person IDs with more than one year of birth and/or sex were restricted to the record with the most recent notification date (only a small number of records were affected). Where the notification dates were equal, a random record was used.
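
The percentages in Table 2 can be reproduced from the record counts. The following is a minimal sketch in Python, assuming that the linkage rate is the number of linked records divided by the total supplied (linked plus not linked), which is consistent with the published figures. The function name `linkage_rate` and the dictionary layout are illustrative only and are not part of any AIHW tooling; the counts are copied from the sex/gender rows of Table 2.

```python
# Counts copied from the sex/gender rows of Table 2: (linked, not linked).
# The structure and names below are illustrative assumptions.
TABLE_2_SEX = {
    "Version 1": {
        "Male": (125_673, 4_689),
        "Female": (125_075, 3_553),
        "Other": (73, 2_353),
    },
    "Version 2": {
        "Male": (3_020_677, 72_564),
        "Female": (3_382_173, 75_163),
        "Other": (13_765, 4_031),
    },
}


def linkage_rate(linked: int, not_linked: int) -> float:
    """Return the percentage of supplied records that were linked.

    Assumes rate = linked / (linked + not linked), expressed as a percentage.
    """
    total = linked + not_linked
    if total == 0:
        raise ValueError("no records supplied")
    return 100 * linked / total


if __name__ == "__main__":
    for version, groups in TABLE_2_SEX.items():
        for group, (linked, not_linked) in groups.items():
            rate = linkage_rate(linked, not_linked)
            print(f"{version} {group}: {rate:.1f}%")
```

Applying the same calculation to the age group rows gives a quick check on any subgroup an analyst intends to study, since groups with lower linkage rates may be underrepresented in the linked data.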