Data and methods

This report has been archived. Content previously included in this report can now be found at COVID-19 register and linked data set and COVID-19 linked data set: Linkage results.

Ethics approvals

Before any data could be linked, the project needed to receive ethics approval and funding. The project has ethics approval from the AIHW Ethics Committee, and additional approval from the Human Research Ethics Committee of the Northern Territory Department of Health and Menzies School of Health Research, and the NSW Population and Health Services Research Ethics Committee (NSW PHSREC). A National Mutual Acceptance Scheme led by NSW PHSREC is in place for the Australian Capital Territory, South Australia, Tasmania, and Victoria.

In addition to the ethics approvals outlined above, the data custodian of each state/territory or national dataset also had to approve data usage in line with any jurisdictional requirements.

How was the data linked?

As a Commonwealth Accredited Data Service Provider, the AIHW has the expertise and infrastructure to undertake complex national data linkage.

Linkage variables for the COVID-19 cases were sourced from participating states and territories. These were then linked with information on AIHW’s linkage spine (Medicare Consumer Directory (MCD), National Death Index (NDI) and Australian Immunisation Register (AIR)), using probabilistic record linkage. Probabilistic record linkage is a data linkage method that makes explicit use of probabilities to determine whether or not a pair of records belongs to the same person. Records are matched on name, sex, address and date of birth.
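The idea behind probabilistic record linkage can be illustrated with a minimal sketch of Fellegi-Sunter style match weights. This is not the AIHW's implementation: the field names, the agreement probabilities (m and u) and the classification thresholds below are all hypothetical values chosen for illustration, and real linkage systems also use approximate string comparison and blocking, which are omitted here.

```python
import math

# Hypothetical agreement probabilities per field (illustrative values only):
#   m = P(field agrees | the two records belong to the same person)
#   u = P(field agrees | the two records belong to different people)
FIELD_PROBS = {
    "name": (0.95, 0.01),
    "sex": (0.99, 0.50),
    "address": (0.85, 0.02),
    "date_of_birth": (0.97, 0.005),
}


def match_weight(rec_a, rec_b, field_probs=FIELD_PROBS):
    """Sum the log2 likelihood ratios over all comparison fields."""
    total = 0.0
    for field, (m, u) in field_probs.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # a missing value contributes no evidence either way
        if a == b:
            total += math.log2(m / u)  # agreement raises the weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return total


def classify(weight, upper=10.0, lower=0.0):
    """Map a total weight to a decision using two hypothetical thresholds."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "possible match"  # would typically go to clerical review


case = {"name": "jane citizen", "sex": "F",
        "address": "1 example st", "date_of_birth": "1980-01-01"}
spine = dict(case)  # a spine record that agrees on every field
print(classify(match_weight(case, spine)))
```

Agreement on a field that rarely agrees by chance (such as date of birth) adds much more weight than agreement on sex, which agrees by chance about half the time. This is why the method tolerates a changed address or a misspelt name without discarding a likely match.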

Analytical information on COVID-19 cases from states and territories and the Commonwealth Department of Health National Notifiable Disease Surveillance System (NNDSS) has been combined with information from the NDI, Medicare Benefits Schedule (MBS) and Pharmaceutical Benefits Scheme (PBS, including Repatriation Schedule of Pharmaceutical Benefits (RPBS) information), the National Hospitals Morbidity Database (NHMD), the National Non-Admitted Patient Emergency Department Care Database (NNAPEDCD), the National Aged Care Data Clearinghouse (NACDC) and the AIR to create a de-identified linked research data set. Figure 1 outlines the linkage processes for the current project.

The AIHW data linkage protocols prescribe strict separation of identifiers and analytical data within the AIHW linkage team. Staff who work with both personal identifiers and analytical data for study participants will not have access to both at the same time at any point during the project.
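One common way to realise this kind of separation is to split each incoming record into an identifier part and an analytical part that share only a random key. The sketch below is an assumption about how this could look, not a description of the AIHW protocol; the field names, the `link_key` name and the `split_record` helper are all hypothetical.

```python
import secrets

# Hypothetical set of fields treated as personal identifiers.
IDENTIFIER_FIELDS = {"name", "address", "date_of_birth"}


def split_record(record):
    """Split one record into (identifiers, content) sharing a random key.

    The key carries no information about the person, so the content part
    cannot be traced back to an individual without the identifier part.
    """
    key = secrets.token_hex(8)
    identifiers = {"link_key": key}
    content = {"link_key": key}
    for field, value in record.items():
        target = identifiers if field in IDENTIFIER_FIELDS else content
        target[field] = value
    return identifiers, content


record = {
    "name": "Jane Citizen",
    "address": "1 Example St",
    "date_of_birth": "1980-01-01",
    "diagnosis_date": "2021-08-15",  # hypothetical analytical field
    "hospitalised": True,            # hypothetical analytical field
}
identifiers, content = split_record(record)
print(sorted(identifiers))
print(sorted(content))
```

In an arrangement like this, the identifier parts would be stored and processed separately from the content parts, so that a staff member working on one has no access to the other.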

Figure 1. COVID-19 linked data flow

The figure shows the flow of data to be linked in the project. Two boxes, one for the states and territories and the other for the Department of Health, point to the AIHW, showing the flow of COVID-19 case information and of content information from the NNDSS respectively. Subsequent boxes show how each of the datasets is added to create a de-identified linked data set, stored in a secure access environment. A feedback loop shows linked deaths data being returned to the jurisdictions after linkage.

How often will the data be updated?

The project aims to re-link information periodically to identify additional deaths, and to update data where available. Australian Bureau of Statistics (ABS) coded cause of death information will be incorporated as it becomes available.

How are linked data being returned to states and territories?

After both the initial and re-linkages, date of death and cause of death information from the NDI will be released to the states and territories that provided the original notifiable disease data, for incorporation into their local notifiable disease systems. The aim of this is to improve NNDSS data completeness and utility, in a nationally consistent way, and add to the research potential of both the state and territory collections and the NNDSS.

What data sets are included?

Data sets available as of December 2022 are listed in Figure 2 below. Future iterations of the project will look to add additional sources of information.

Figure 2: COVID-19 linked datasets available as of March 2023

State/territory notifiable diseases data: COVID-19 cases

Medicare Benefits Schedule (MBS)

Medicare Consumer Directory (MCD)

Pharmaceutical Benefits Scheme (PBS)

National Notifiable Disease Surveillance System (NNDSS)

National Hospital Morbidity Database: admitted patient care data (NHMD)

National Aged Care Data Clearinghouse: aged care data (NACDC)

National Non-Admitted Patient Emergency Department Care Database: emergency department presentations (NNAPEDCD)

National Death Index (NDI)

Australian Immunisation Register (AIR)

How can the data be accessed?

All users who want to access the de-identified research data will be required to submit to the AIHW a project proposal, including a data analysis plan, and a signed Australian Institute of Health and Welfare Act 1987 s29 Undertaking of Confidentiality form. This form protects the privacy of individuals by making it a criminal offence to disclose information about the participants of a study, punishable by fines and/or imprisonment. Data will not be provided to, accessed by, or used by any unauthorised party. Access is strictly controlled within a secure remote access environment, with no access allowed to other project workspaces.

Due to the detailed and sensitive nature of the data, access will only be provided via secure research environments where AIHW can apply appropriate vetting and management processes in line with AIHW’s Five Safes Framework.

In this first stage of the project, only government researchers or those funded by government will be eligible to apply for access. All other researchers will have access to the data in future stages of the project once all relevant ethics and data custodian approvals for access and use arrangements have been obtained and a suitable secure environment is available.