Data and methods

Ethics approvals

The project has obtained ethics approval from the AIHW Ethics Committee, with additional approvals from the Human Research Ethics Committee of the Northern Territory Department of Health and Menzies School of Health Research, and from the New South Wales Population and Health Services Research Ethics Committee (NSW PHSREC). A National Mutual Acceptance Scheme led by NSW PHSREC is in place for the Australian Capital Territory, South Australia, Tasmania, and Victoria. A data disclosure agreement with Queensland was established for the purpose of this project.

In addition to the ethics approvals outlined above, approval has also been received from the data custodian of each state/territory or national dataset.

How were the data linked?

As an Accredited Data Service Provider, the AIHW is accredited to provide complex integration, de-identification and secure access to linked data. 

COVID-19 case linkage variables (names, addresses, dates of birth and sex) provided by jurisdictions to the AIHW were probabilistically linked to the AIHW National Linkage Spine (NLS). The AIHW NLS combines linkage variables from the Medicare Consumer Directory (MCD), the National Death Index (NDI) and the Australian Immunisation Register (AIR), and uniquely covers almost all of the Australian population who are eligible for Medicare.

Unlike previous versions, Version 2.6 of the linked data only includes individuals originating from the MCD; the linked data therefore do not contain AIR-only and NDI-only cases. This is to align with the AIHW Enhanced Medicare Spine, which only includes individuals eligible for Medicare based on the MCD data.

Probabilistic record linkage is a data linkage method that makes explicit use of probabilities to determine whether or not a pair of records belongs to the same person. Records are matched by name, sex, address and date of birth.
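To illustrate the general idea, the following is a minimal sketch of a Fellegi-Sunter style match weight, a common formulation of probabilistic linkage. It is not the AIHW's linkage implementation: the field names, the m and u probabilities, the exact-agreement comparison and the threshold are all hypothetical values chosen for illustration only.

```python
import math

# Hypothetical per-field probabilities (illustrative values, not AIHW parameters).
# m = P(field agrees | the two records belong to the same person)
# u = P(field agrees | the two records belong to different people)
FIELD_PARAMS = {
    "name": (0.95, 0.01),
    "sex": (0.98, 0.50),
    "address": (0.85, 0.001),
    "date_of_birth": (0.97, 0.003),
}


def match_weight(rec_a, rec_b, params=FIELD_PARAMS):
    """Sum log2 agreement/disagreement weights over the linkage fields."""
    total = 0.0
    for field, (m, u) in params.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # a missing value contributes no evidence either way
        if a == b:
            total += math.log2(m / u)  # agreement raises the weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return total


def is_link(rec_a, rec_b, threshold=10.0):
    """Classify a record pair as a link if its weight reaches the (assumed) threshold."""
    return match_weight(rec_a, rec_b) >= threshold
```

In this sketch, agreement on a highly discriminating field such as address or date of birth adds far more weight than agreement on sex, so a pair that disagrees on one field can still be accepted as a link when the remaining fields agree.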

Including full address information in data linkage helps to manage issues with incomplete or inaccurate names, as the combination of date of birth and address is unique for most Australians. This approach also reduces linkage bias, which can occur when linking methods unintentionally favour certain groups over others. For example, people who do not identify as female may be unintentionally removed during linkage (due to data quality concerns), causing potential bias due to an under-sampling of babies from same-sex couples. Additionally, using full address information enhances linkage accuracy, especially for individuals with common surnames, where names alone may not be sufficient for accurate linkage. The resulting COVID-19 Register does not, however, contain any identifying information.

Analytical information on COVID-19 cases from states and territories and the Commonwealth Department of Health and Aged Care National Notifiable Diseases Surveillance System (NNDSS) was combined with information from the NDI, Medicare Benefits Schedule (MBS), Pharmaceutical Benefits Scheme (PBS), the National Hospital Morbidity Database (NHMD), the National Non-Admitted Patient Emergency Department Care Database (NNAPEDCD), the National Aged Care Data Clearinghouse (NACDC), the Australian and New Zealand Intensive Care Society (ANZICS), the AIR and the National Disability Insurance Scheme (NDIS) to create a de-identified linked research data set. Figure 1 outlines the linkage processes for the current version of the project (Version 2.6).

After both the initial linkage and each re-linkage, date of death and cause of death information from the NDI is released to the states and territories that provide the original notifiable disease data, for incorporation into their local notifiable disease systems. The aim is to improve NNDSS data completeness and utility in a nationally consistent way, and to add to the research potential of both the state and territory collections and the NNDSS.

The AIHW data linkage protocols are based on the Five Safes framework, which reinforces management of the privacy and confidentiality of data. These protocols prescribe strict separation of identifiers and analytical data. This means that, for the duration of the project, AIHW linkage staff do not have access to the personal identifiers and the analytical data at the same time. See the AIHW’s Data Governance framework for more information.
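The separation of identifiers and analytical data can be pictured with the short sketch below. It is an illustration of the general principle only and does not describe the AIHW's systems: the field names, the `record_key` name and the use of a random key are assumptions made for the example.

```python
import secrets

# Assumed identifier fields, taken from the linkage variables named in this section.
IDENTIFIER_FIELDS = {"name", "address", "date_of_birth", "sex"}


def separate(records):
    """Split each record into an identifier part and an analytical part.

    The two parts share only a random, meaningless key (hypothetical name
    `record_key`), so each output can be stored and accessed independently.
    """
    identifiers, content = [], []
    for rec in records:
        key = secrets.token_hex(8)  # random key carrying no personal information
        id_part = {f: v for f, v in rec.items() if f in IDENTIFIER_FIELDS}
        data_part = {f: v for f, v in rec.items() if f not in IDENTIFIER_FIELDS}
        identifiers.append({"record_key": key, **id_part})
        content.append({"record_key": key, **data_part})
    return identifiers, content
```

Under this arrangement, staff performing linkage would work only with the identifier output, while analysts would receive only the content output together with the key produced by linkage.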

Figure 1: COVID-19 linked data flow

The figure shows the flow of data linked for the COVID-19 Register, starting from the COVID-19 case information and content information from the NNDSS.

A longitudinal resource for COVID-19 cases

The COVID-19 Register will be updated to include case data up to December 2022 for all jurisdictions. This date corresponds with the relaxation in COVID-19 testing and reporting requirements, which limits the completeness of data after this time. For deaths data, Australian Bureau of Statistics coded cause of death information will be incorporated as it becomes available. Updating the content provides a growing longitudinal resource for COVID-19 cases and allows research into patients’ health journeys over time. Providing data back to the states and territories enhances the completeness of their notifiable disease systems.