## Methods

### Logistic regression

A set of multivariable logistic regression models was used to estimate odds ratios and 95% confidence intervals (CI) for associations between the long-term health conditions and CALD variables, adjusting for age and social determinants of health variables. The models do not include biomedical risk factors, behavioural risk factors, environmental measures or other factors that influence a person’s health outcomes (AIHW 2022).

Logistic regression is a widely used method for analysing the association between a binary outcome (the dependent variable) and a number of exposure variables (the independent variables). It estimates the probability of an event (such as reporting a long-term health condition) as a function of the exposure variables, based on the data used to fit the model.

It is important to note that the modelling shows associations between variables but does not explain what causes them.

Logistic regression models the association between the outcome variable and the exposure variables, producing odds ratios and 95% confidence intervals:

• The outcome variables in this report were the selected long-term health conditions, constructed in binary form (i.e. reporting or not reporting the condition).
• The exposure variables (covariates) in this report were the selected CALD, age and social determinants of health variables.
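A general form of such a model can be sketched as follows. The notation is introduced here for illustration only and does not come from the report: $Y_i$ equals 1 if person $i$ reports the condition and 0 otherwise, $x_i$ is an indicator for a CALD group (1) versus its reference group (0), and $z_{i1}, \dots, z_{iK}$ are the age and social determinants of health covariates.

$$
\ln\!\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \beta_1 x_i + \sum_{k=1}^{K} \gamma_k z_{ik},
\qquad p_i = \Pr(Y_i = 1 \mid x_i, z_{i1}, \dots, z_{iK}) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_i + \sum_{k} \gamma_k z_{ik})}}
$$

$$
\mathrm{OR} = \frac{\text{odds}(Y = 1 \mid x = 1, z)}{\text{odds}(Y = 1 \mid x = 0, z)}
= \frac{e^{\beta_0 + \beta_1 + \sum_k \gamma_k z_k}}{e^{\beta_0 + \sum_k \gamma_k z_k}} = e^{\beta_1}
$$

Under this form, $e^{\beta_1}$ is the odds ratio for the CALD group relative to the reference group with the covariates held fixed; with no covariates ($K = 0$) it is the unadjusted odds ratio. A CALD variable with several categories would have one indicator, and therefore one odds ratio, per non-reference category.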

### Odds ratios

An odds ratio compares the odds of an outcome (for example, reporting a long-term health condition) among people with a given exposure or characteristic (for example, a certain cultural and linguistic background) with the odds of that outcome among people without that exposure. The odds of an outcome are the probability that it occurs divided by the probability that it does not. Put another way, an odds ratio is the ratio of the odds of the outcome in one group to the odds in another (the reference) group.
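For a single exposure with no covariates (the unadjusted case), the odds ratio can be computed directly from a 2×2 table of counts. The sketch below is illustrative only: the cell labels `a` to `d`, the function name and the counts are hypothetical and do not come from the report.

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Unadjusted odds ratio from a 2x2 table of counts (hypothetical layout).

    a: exposed group (e.g. a CALD group), reporting the condition
    b: exposed group, not reporting the condition
    c: reference group, reporting the condition
    d: reference group, not reporting the condition
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive")
    odds_exposed = a / b      # odds of the outcome in the exposed group
    odds_reference = c / d    # odds of the outcome in the reference group
    return odds_exposed / odds_reference


# Hypothetical counts, for illustration only (not from the report).
print(round(odds_ratio(a=40, b=160, c=100, d=900), 2))  # 0.25 / 0.111... = 2.25
```

An odds ratio of 2.25 here would mean the odds of reporting the condition are 2.25 times as high in the exposed group as in the reference group.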

An odds ratio of less than 1 means that the odds of the outcome for a group with a certain characteristic are lower than those for the reference group.

An odds ratio of 1 means that the odds of the outcome for a group with a certain characteristic do not differ from those for the reference group.

An odds ratio of greater than 1 means that the odds of the outcome for a group with a certain characteristic are higher than those for the reference group.

It is important to note that the odds ratio is a point estimate derived through a statistical process. For this reason, 95% confidence intervals (CI) are also presented to indicate the precision of each estimate and its statistical significance. A confidence interval gives a range of plausible values for the true odds ratio; a wide interval indicates low precision, meaning the true value may differ substantially from the reported estimate. An association is interpreted as statistically significant (that is, unlikely to be due to chance alone) if the value 1.00 does not fall within the confidence interval.
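The significance rule above can be illustrated with an unadjusted odds ratio and a Woolf (log-scale Wald) 95% confidence interval computed from a 2×2 table. This is an assumption for illustration: the report's intervals come from its regression models, and it does not state the interval method used. The function names and counts are hypothetical, and the sketch ignores any survey design or weighting.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(a, b, c, d, level=0.95):
    """Unadjusted odds ratio with a Woolf (log-scale Wald) confidence interval.

    a, b: exposed group with / without the condition
    c, d: reference group with / without the condition
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)    # standard error of ln(OR)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)    # about 1.96 for a 95% interval
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


def is_significant(lower, upper, null=1.0):
    """Treat the association as significant if the interval excludes 1."""
    return not (lower <= null <= upper)


# Hypothetical counts (not from the report).
estimate, lower, upper = odds_ratio_ci(40, 160, 100, 900)
print(f"OR {estimate:.2f}, 95% CI {lower:.2f} to {upper:.2f}, "
      f"significant: {is_significant(lower, upper)}")
```

The interval is built on the log scale and then exponentiated, which is why it is not symmetric around the odds ratio itself.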

### Modelling strategy

Regression modelling may be used to predict future values of the outcome variable, or to develop an explanatory model of exposures that are associated with the outcome. The aim of the modelling strategy for these analyses was to develop a set of binomial logistic regression models to estimate odds ratios and 95% confidence intervals (CI) for associations between the long-term health conditions and CALD variables, adjusting for age and social determinants of health variables. Binomial logistic regression was chosen because the dependent variables being modelled were binary and the data met the criteria for binomial logistic modelling.
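To show how a binomial logistic regression yields an odds ratio, the sketch below fits a model with one binary exposure by Newton-Raphson on grouped counts, in plain Python. It is a minimal illustration under stated assumptions, not the report's implementation: the report's models include many covariates and were presumably fitted with standard statistical software, and the function name and counts here are hypothetical. Stratifying by sex would mean fitting such a model separately to the male and female records.

```python
import math


def fit_logistic_grouped(rows, iterations=25):
    """Fit logit(p) = b0 + b1 * x by Newton-Raphson on grouped (weighted) data.

    rows: list of (x, y, weight) with x in {0, 1} (exposure indicator),
          y in {0, 1} (condition reported) and weight = number of people.
    Returns the maximum likelihood estimates (b0, b1).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0            # score vector (gradient of the log-likelihood)
        h00 = h01 = h11 = 0.0    # information matrix entries
        for x, y, w in rows:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += w * (y - p)
            g1 += w * (y - p) * x
            v = w * p * (1.0 - p)
            h00 += v
            h01 += v * x
            h11 += v * x * x
        det = h00 * h11 - h01 * h01
        if det == 0:
            raise ValueError("information matrix is singular")
        # Newton step: solve [[h00, h01], [h01, h11]] @ step = gradient.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if max(abs(step0), abs(step1)) < 1e-12:
            break
    return b0, b1


# Hypothetical grouped counts (not from the report): x = 1 for a CALD group,
# x = 0 for the reference group; y = 1 if the condition is reported.
rows = [(1, 1, 40), (1, 0, 160), (0, 1, 100), (0, 0, 900)]
b0, b1 = fit_logistic_grouped(rows)
print(round(math.exp(b1), 4))  # 2.25, the same as (40 * 900) / (160 * 100)
```

With a single binary exposure, the exponentiated coefficient reproduces the 2×2 odds ratio exactly; adding covariates turns it into an adjusted odds ratio.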

The modelling strategy consisted of several steps for each long-term health condition to answer the specific questions of this report:

1. Unadjusted associations between each long-term health condition (the dependent variable) and each CALD variable (the independent variable), separately.
2. A series of models where each model from step 1 was adjusted for age and each social determinant of health variable (covariates), separately.
3. A multivariable model for each long-term health condition and CALD variable, adjusted for all covariates together.
4. A set of models that specifically explored the interaction between English proficiency and the number of years since first arriving in Australia.
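The four steps can be sketched as a list of model formulas in R/patsy-style (Wilkinson) notation, generated for each sex. This is a hypothetical illustration: the variable names are invented, step 2 is read here as one model per covariate, and the covariates (if any) that the report included in the step 4 interaction models are not stated, so the caller supplies them.

```python
def model_specifications(condition, cald_var, covariates):
    """Formulas for steps 1 to 3 for one condition and one CALD variable."""
    specs = [("step1", f"{condition} ~ {cald_var}")]           # unadjusted
    for cov in covariates:                                      # one covariate at a time
        specs.append(("step2", f"{condition} ~ {cald_var} + {cov}"))
    specs.append(("step3", f"{condition} ~ {cald_var} + " + " + ".join(covariates)))
    return specs


def interaction_specification(condition, covariates):
    """Step 4: '*' expands to both main effects plus their interaction term."""
    terms = ["english_proficiency * years_since_arrival", *covariates]
    return ("step4", f"{condition} ~ " + " + ".join(terms))


# Hypothetical variable names, for illustration only.
covariates = ["age_group", "income_quintile", "education"]
for sex in ("male", "female"):  # each sex is modelled separately
    for step, formula in model_specifications("asthma", "country_of_birth", covariates):
        print(sex, step, formula)
    print(sex, *interaction_specification("asthma", covariates))
```

Keeping step 1 and step 3 in the same list makes it straightforward to compare how an odds ratio changes from the unadjusted to the fully adjusted model.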

Results were stratified for males and females. Models that used the country of birth or main languages used at home variables included specific countries of birth and languages, respectively. Country and language groups with a denominator of less than 30 or a numerator of less than 20 for a selected long-term health condition were excluded from the analyses. These exclusion criteria produced different sets of country and language groups for females and males depending on the long-term health condition, so sample sizes differed across each set of analyses by sex and long-term health condition.
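The exclusion rule can be sketched as a filter applied separately for each sex and long-term health condition, which is why the retained groups differ between analyses. The thresholds are taken from the text above; reading the numerator as the number of people reporting the condition and the denominator as the group size is an assumption, and the group names and counts are hypothetical.

```python
MIN_DENOMINATOR = 30  # groups with a denominator less than 30 are excluded
MIN_NUMERATOR = 20    # groups with a numerator less than 20 are excluded


def eligible_groups(counts):
    """Keep country or language groups that meet both thresholds.

    counts maps a group name to (numerator, denominator) for one sex and
    one long-term health condition.
    """
    return {
        group: (num, den)
        for group, (num, den) in counts.items()
        if den >= MIN_DENOMINATOR and num >= MIN_NUMERATOR
    }


# Hypothetical counts for one sex and one condition (not from the report).
counts = {"Country A": (25, 400), "Country B": (12, 300), "Country C": (22, 28)}
print(sorted(eligible_groups(counts)))  # Country B fails the numerator rule, Country C the denominator rule
```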

The commentary for the regression modelling component of this report focuses on the odds of reporting long-term health conditions for a population with a certain CALD characteristic compared with the reference group, across the set of regression models from the unadjusted models to the fully adjusted models. The greater the change in the odds ratio (in either direction), the greater the impact of adjusting (or controlling) for the other covariates in the models (i.e. age and social determinants of health). This, in turn, indicates the extent to which a social determinant of health explains associations between CALD variables and long-term health conditions.
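The report does not give a formula for the change in an odds ratio between models, so the sketch below uses one simple summary as an assumption: the percentage change in the odds ratio, plus whether the adjusted estimate lies closer to the null value of 1 on the log scale. The function name and odds ratios are hypothetical.

```python
import math


def compare_odds_ratios(unadjusted, adjusted):
    """Summarise how an odds ratio moves between two models.

    Returns the percentage change in the odds ratio and whether the
    adjusted estimate lies closer to 1 (on the log scale).
    """
    if unadjusted <= 0 or adjusted <= 0:
        raise ValueError("odds ratios must be positive")
    pct_change = (adjusted - unadjusted) / unadjusted * 100
    toward_null = abs(math.log(adjusted)) < abs(math.log(unadjusted))
    return pct_change, toward_null


# Hypothetical odds ratios from an unadjusted and a fully adjusted model.
print(compare_odds_ratios(1.50, 1.20))  # a fall of about 20%, moving toward 1
print(compare_odds_ratios(0.60, 0.75))  # a rise of about 25%, also moving toward 1
```

Checking the direction relative to 1 matters because, for an odds ratio below 1, an increase after adjustment is what signals that the covariates account for part of the association.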