Technical notes

2021 NSHS data collection and reporting methodology

Introduction

This appendix provides an overview of the 2021 National Social Housing Survey (NSHS) data collection and reporting methodology. Further information on the 2021 NSHS methodology, including a copy of the final questionnaire, can be found in the 2021 NSHS methodological report prepared by Lonergan Research, available from the Australian Institute of Health and Welfare website.

Data collection

The data quality statement for the 2021 NSHS is available online. Key information is as follows.

Survey scope

The 2021 NSHS collected information from tenants of 3 social housing programs – public housing (PH), community housing (CH), and state owned and managed Indigenous housing (SOMIH).

Data collection methodology

The COVID‑19 pandemic affected both the timing of the 2021 NSHS (which was postponed from 2020) and the planned method of conducting it.

Among PH, CH and SOMIH tenants (the latter in Queensland, South Australia and Tasmania only), the 2021 NSHS was conducted via a mail-out paper questionnaire, with an option for online completion.

Among SOMIH tenants in New South Wales, and a small number of ACT CH tenants, the 2021 NSHS was conducted via face-to-face interview. Where tenants were not at home, a drop-at-home survey pack was left at the property.

Face-to-face interviews were initially intended to be used to conduct the 2021 NSHS for SOMIH and Indigenous community housing (ICH) tenants in Queensland. Due to COVID‑19, Queensland SOMIH tenants instead received mail-out questionnaires, and the survey was not conducted with ICH tenants.

The 2021 NSHS used the same survey instrument across PH, CH and SOMIH, with the exception of some state-specific additions (for ACT PH and SA PH/SOMIH). Before 2010, the survey content differed slightly across programs, reflecting different areas of interest in relation to each program. Since 2012, the adoption of more consistent survey instruments has allowed greater data comparability across social housing programs. See the NSHS 2021 methodological report for more information.

Each jurisdiction provided information for each tenancy and each social housing program to Lonergan Research. To protect tenant privacy and confidentiality, information was handled in line with relevant legislation. All remoteness areas were included in the sample. For the postal component of the survey, various factors (see ‘Survey and interview response rates’) may have affected the number of responses received from tenants in these areas.

Sample design

Consistent with 2018, stratified sampling was undertaken to reduce sampling error and to maximise the chance that jurisdiction/program sample targets were met. Minimum sample quotas were again employed for remoteness-based strata. For New South Wales, additional stratification was undertaken based on Department of Communities and Justice districts. Quotas were set for each jurisdiction/housing strata, as shown in Table A1. The actual responses received are shown in Table A2.

Table A1: Quotas set for 2021 NSHS, by housing program and state/territory

| Jurisdiction | PH | SOMIH | CH |
|---|---|---|---|
| NSW | 500 | 500 | 540 |
| Vic | 500 | . . | 350 |
| Qld | 1,000 | 500 | 500 |
| WA | 500 | . . | 350 |
| SA | 500 | 300 | 700 |
| Tas | 500 | 200 | 350 |
| ACT | 500 | . . | 200 |
| NT | 500 | n.a. | n.a. |

. . Not applicable (state or territory does not have the program)

n.a. Not available (jurisdiction not in scope for the 2021 NSHS in the program)

Survey and interview response rates

The response rate for the mail-out/online component of the 2021 NSHS was 26%; for face-to-face interviews, it was 52%. Some non-response bias is expected. The ‘Sample alignment with administrative data’ section examines key differences between the sample population and the actual population, providing some indication of the potential for non-response bias. Apart from sample weighting (see ‘Weighting’ following this section), no adjustments have been made for non-response bias.

Changes to the management of tenant privacy for the 2021 NSHS meant that Lonergan Research could not be provided with personal information for PH tenants in New South Wales, Victoria and the Northern Territory. Letters were instead addressed ‘to the tenant’; this particularly affected remote areas, where mail often needs to be personally addressed to be received, and many letters were returned to sender. Where no personal information was provided, tenants could not be sent digital reminders. Response rates dropped sharply in all 3 of these jurisdictions.

Lockdowns and mail delays due to COVID‑19 also impacted on response rates in 2021. Response rates by housing program and jurisdiction are provided in Table A2.

Table A2: 2021 NSHS coverage and response rates (%), by housing program, by state and territory

| Program | NSW | Vic | Qld | WA | SA | Tas | ACT | NT |
|---|---|---|---|---|---|---|---|---|
| PH – Responses (no.) | 487 | 475 | 949 | 513 | 561 | 517 | 583 | 471 |
| PH – Response rate | 18.9 | 21 | 28.5 | 31 | 45.2 | 31.8 | 32 | 20.2 |
| CH – Responses (no.) | 564 | 314 | 509 | 443 | 677 | 342 | 201 | n.a. |
| CH – Response rate | 22.6 | 24.9 | 26.2 | 32.3 | 29.4 | 28.5 | 27.2 | n.a. |
| SOMIH – Responses (no.) | 528 | . . | 522 | . . | 263 | 52 | . . | n.a. |
| SOMIH – Response rate | 52.2 | . . | 19.6 | . . | 20.6 | 24.4 | . . | n.a. |

Notes

  1. For the mail-out/online component, the response rate was calculated as the number of completed surveys returned as a percentage of the total tenants mailed (excluding any that were returned to sender). For SOMIH face-to-face surveys, the response rate was calculated as the number of completed interviews as a percentage of the total number of interviews attempted.
  2. SOMIH tenants were surveyed via face-to-face interviews in New South Wales and via mail-out in Queensland, South Australia and Tasmania. Response rates between the 2 methodologies are not directly comparable.
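The two response-rate definitions in the notes above can be sketched as follows. The counts used here are hypothetical illustrations, not the actual 2021 fieldwork figures.

```python
# Sketch of the Table A2 response-rate calculations (hypothetical counts).

def mailout_response_rate(completed, mailed, returned_to_sender):
    """Completed surveys as a percentage of tenants mailed,
    excluding any letters returned to sender."""
    return 100 * completed / (mailed - returned_to_sender)

def interview_response_rate(completed, attempted):
    """Completed face-to-face interviews as a percentage of
    interviews attempted."""
    return 100 * completed / attempted

# Hypothetical example: 487 completed surveys from 2,700 mailed,
# of which 120 were returned to sender.
print(round(mailout_response_rate(487, 2700, 120), 1))   # prints 18.9
```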

Weighting

Consistent with the 2018 NSHS, a grouped weighting methodology was employed. Population groups were created across 3 variables: housing type, jurisdiction, and remoteness. The weight for each group was calculated as the number of households in that population group divided by the number of usable survey responses from it. All population counts were confirmed by the states and territories.
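The grouped weighting described above can be sketched in a few lines. The strata and counts below are hypothetical placeholders, not the actual population figures.

```python
# Minimal sketch of the grouped (cell) weighting methodology.
# Strata are (program, jurisdiction, remoteness); counts are hypothetical.
population = {
    ("PH", "NSW", "Major cities"): 80_000,   # households in the stratum
    ("PH", "NSW", "Inner regional"): 20_000,
}
responses = {
    ("PH", "NSW", "Major cities"): 400,      # usable survey responses
    ("PH", "NSW", "Inner regional"): 80,
}

# Each respondent in a group carries a weight of population / responses,
# i.e. represents that many households in weighted estimates.
weights = {group: population[group] / responses[group] for group in population}
```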

Sampling error

The estimates are subject to sampling error. Relative standard errors (RSEs) are calculated for findings from the 2021 NSHS to help the reader assess the reliability of the estimates. Only estimates with RSEs of less than 25% are considered sufficiently reliable for most purposes. Results subject to RSEs of between 25% and 50% are marked as such and should be considered with caution. Those with RSEs greater than 50% are considered too unreliable and are not published. To help interpret the results further, 95% confidence intervals (the estimate plus or minus 2 standard errors) are available online as supplementary tables to the 2021 NSHS.
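The reliability rules above (RSE thresholds, and 95% confidence intervals as the estimate plus or minus 2 standard errors) can be sketched as follows; the cut-off handling at exactly 25% and 50% is an assumption for illustration.

```python
# Sketch of the RSE-based reliability rules described above.

def rse(estimate, standard_error):
    """Relative standard error as a percentage of the estimate."""
    return 100 * standard_error / estimate

def reliability_flag(rse_pct):
    """Publication rule assumed from the text: <25% publish,
    25-50% publish with caution, >50% suppress."""
    if rse_pct < 25:
        return "publish"
    if rse_pct <= 50:
        return "publish with caution"
    return "suppress"

def confidence_interval(estimate, standard_error):
    """Approximate 95% CI: estimate plus or minus 2 standard errors."""
    return (estimate - 2 * standard_error, estimate + 2 * standard_error)

# Illustrative estimate of 40% with a standard error of 4 percentage points:
print(rse(40, 4), reliability_flag(rse(40, 4)), confidence_interval(40, 4))
```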

Non-sampling error

The estimates are subject to both sampling and non-sampling errors. The survey findings are based on self-reported data. Non-sampling errors can arise from errors in reporting of responses (for example, failure of respondents’ memories or incorrect completion of the survey form), or the unwillingness of respondents to reveal their true responses. Further non-sampling errors can arise from coverage, interviewer or processing errors. It is also expected that there is some level of non-response error where there are higher levels of non-response from certain subpopulations.

Comparability with previous NSHSs

Surveys in this series began in 2001. Over time, the survey’s methodology and questionnaire design have been modified. The sample design and the questionnaire of the 2021 survey differ in some respects from previous versions of the survey. Full details are available in the NSHS 2021 methodological report.

The revisions of the survey undertaken for the 2021 NSHS were the most substantial since 2012. These revisions included some restructuring of sections, changes to question wording, the addition of COVID‑19-related questions, and new state-specific questions (for SA PH/SOMIH and ACT PH).

The 2021 NSHS sampling and stratification methods were similar to those for the 2018 survey: a sample was randomly selected from each stratum. Some additional location-based stratification was undertaken for New South Wales in 2021.

For the 2021 NSHS, caution should be used when comparing trend data or data between states and territories, due to differences in response rates and non-sampling errors. Substantial decreases in response rates for mail-out surveys were observed in 2021.

As in 2016 and 2018, the data collected for SOMIH were sourced using 2 methodologies: mail-out and face-to-face interview. Since 2016, the mail-out approach has been used for SOMIH tenants in South Australia and Tasmania, and the face-to-face approach for SOMIH tenants in New South Wales. In 2021, however, the approach for these tenants in Queensland changed from face-to-face (used in 2016 and 2018) to mail-out. Different methodologies influence not only the overall response rate but also, potentially, the completion of each question and how tenants perceive and respond to questions. Trend data from before 2016 (and from 2016 and 2018 for Queensland), and comparisons between states and territories, should therefore be interpreted with caution.

Refer to data quality statements for the 2014 NSHS, 2016 NSHS, 2018 NSHS and 2021 NSHS and their accompanying technical reports before comparing data across surveys.

Reporting methodology – respondents versus households

Responses to the NSHS can report either:

  • information about the social housing tenant completing the survey (the respondent), such as age and gender
  • information provided by the respondent:
    • that refers to themselves and other individuals in the social housing household, such as whether there are any adults in the household currently working full time
    • on behalf of all members of their household, such as whether the location of their dwelling meets the needs of the household.

In each instance, this is noted under the relevant chart or table throughout the report.

It is important to distinguish between household-level responses and responses to those questions that specifically target the individual who completed the survey. Responses related to the individual completing the survey may not apply to other members of the household.

It should also be noted that, where survey respondents have provided information on behalf of other household members, they have not been asked if they had consulted members in formulating their responses.

Missing data

Some survey respondents did not answer all questions, because they were either unable or unwilling to provide a response. The survey responses for these people were retained in the sample, and the corresponding values were set to missing. Cleaning rules resulted in the imputation of responses for some missing values. Missing responses were excluded from both the numerator and the denominator of estimates presented in this report.
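The exclusion of missing responses from both numerator and denominator can be illustrated with a small sketch; the responses below are invented for the example.

```python
# Sketch: respondents with a missing answer to a question drop out of both
# the numerator and the denominator for that question's estimate.
# Hypothetical responses to a yes/no question (None = missing).
answers = ["Yes", "No", None, "Yes", None, "Yes"]

valid = [a for a in answers if a is not None]   # 4 usable responses
pct_yes = 100 * valid.count("Yes") / len(valid) # 3 of 4 said Yes
print(pct_yes)  # prints 75.0
```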

Sample alignment with administrative data

As part of the NSHS, tenants who responded to the survey were asked to report the gender and age of all members of their household; they were also asked questions to establish if anyone in the household was Indigenous or had a need for assistance due to disability. Table B1 compares the age and gender distribution of all 2021 NSHS household members with similar information from administrative data collections. The distribution of 2021 NSHS households across selected household-level characteristics is also compared with corresponding information from administrative data collections. For this analysis, the 2021 NSHS data were weighted. Weighting helps account for over- or under-representation of particular groups of tenants in the responding sample, to the extent that these differences reflect differences across jurisdiction by remoteness by housing program categories (these are the groups, or strata, used to determine weights for sample responses).

As Table B1 shows, while there was broad alignment between the 2021 NSHS and administrative data results, there were also some differences, particularly among SOMIH households. This may be partly due to the much smaller size of that program, so that relatively small differences in numbers would lead to greater differences in proportions.

Within PH and CH, older tenants appeared to be over-represented in the NSHS compared with administrative data, while the profile of NSHS SOMIH tenants was younger than in the administrative data. The SOMIH survey was conducted via face-to-face interviews in New South Wales, which contributed more than one-third of the total SOMIH sample. It may be that the different collection methodologies resulted in different response biases.

One characteristic recording a noticeable difference between 2021 NSHS results and the corresponding information drawn from administrative data is household composition. For all programs, the proportion of sole parents with children was markedly higher in the NSHS than in the administrative data collections, and the proportion of group or mixed composition households was lower in the NSHS.

While most of the NSHS analysis in this report drew on information about the entire time a tenant had been living in social housing, in Table B1, NSHS information about time in the current home was used, as that information would more closely compare with information about tenure length from administrative data collections. Even so, it appeared that households who had been in social housing for longer were over-represented in the NSHS, particularly among SOMIH tenants.

Finally, there were some discrepancies between the NSHS and administrative data in the proportions of Indigenous households, and households where there was a household member with disability.

Table B1: Distribution of 2021 NSHS households and occupants across selected characteristics, compared with distribution in 2021 administrative collections (%)
| | PH NSHS 2021 | PH Admin. data | SOMIH NSHS 2021 | SOMIH Admin. data | CH NSHS 2021 | CH Admin. data |
|---|---|---|---|---|---|---|
| Gender (all occupants) | | | | | | |
| Males | 42 | 44 | 41 | 45 | 41 | 44 |
| Females | 53 | 55 | 57 | 55 | 53 | 55 |
| Not stated | 5 | 1 | 2 | 0 | 6 | 1 |
| Age (years) (all occupants) | | | | | | |
| Under 5 | 3 | 5 | 7 | 6 | 4 | 5 |
| 5 to 17 | 15 | 21 | 32 | 31 | 13 | 19 |
| 18 to 24 | 5 | 8 | 10 | 11 | 6 | 9 |
| 25 and over | 69 | 67 | 49 | 52 | 69 | 66 |
| Not stated | 7 | 0 | 3 | 0 | 8 | 1 |
| Household composition | | | | | | |
| Single adult | 57 | 57 | 24 | 22 | 59 | 62 |
| Couple only | 10 | 7 | 6 | 5 | 10 | 6 |
| Sole parent with dependent children | 17 | 13 | 40 | 25 | 17 | 12 |
| Couple with dependent children | 6 | 3 | 9 | 8 | 4 | 3 |
| Group and mixed composition | 4 | 17 | 15 | 40 | 5 | 16 |
| Not stated | 7 | 4 | 6 | 1 | 5 | 3 |
| Tenure length | | | | | | |
| 2 years or less | 15 | 15 | 16 | 17 | 23 | n.a. |
| Over 2 years–5 years | 17 | 18 | 17 | 21 | 19 | n.a. |
| Over 5 years–10 years | 15 | 19 | 19 | 26 | 24 | n.a. |
| Over 10 years–15 years | 13 | 12 | 10 | 12 | 13 | n.a. |
| Over 15 years–20 years | 11 | 10 | 11 | 6 | 6 | n.a. |
| Over 20 years | 25 | 18 | 25 | 9 | 10 | n.a. |
| Not stated | 4 | 8 | 2 | 8 | 4 | n.a. |
| Indigenous household status | | | | | | |
| Indigenous household | 11 | 13 | 96 | 100 | 11 | 11 |
| Not Indigenous household | 78 | 66 | 2 | 0 | 79 | 85 |
| Not determined | 11 | 21 | 2 | 0 | 10 | 5 |
| Household disability status | | | | | | |
| Person/s in household with disability | 28 | 38 | 17 | 19 | 25 | 30 |
| No person in household with disability | 71 | 51 | 83 | 48 | 74 | 64 |
| Not determined | 1 | 10 | 1 | 34 | 1 | 6 |

Note: Components within each characteristic may not add to 100% because of rounding.

Sources: AIHW administrative data collections; NSHS 2021

Regression analysis – details

Regression analysis of NSHS data was used to examine the statistical relationships between multiple explanatory factors and tenant satisfaction. This type of statistical technique shows which individual factors are significantly associated with tenant satisfaction, after simultaneously accounting for the confounding effects of the other factors included in the model (see, for example, Sperandei 2014).

In particular, regression analysis was used to help answer the following key questions:

  • What are the most important factors associated with tenant satisfaction, after accounting for differences in geography, demographics and housing-related factors?
  • Do the factors associated with satisfaction differ depending on the type of housing program?
  • How do we account for apparent differences in satisfaction between different populations? What factors best explain the observed differences?

This appendix provides a detailed description of the regression analysis method and results.

Method

Logistic regression was the statistical technique used for this analysis. Logistic regression is an appropriate analytical technique to use when the outcome variable has 2 categories. In the analysis used for this report, the outcome variable had 2 categories: whether the social housing tenant was satisfied (satisfied or very satisfied) or not satisfied (neither satisfied nor dissatisfied, dissatisfied or very dissatisfied) with the services provided by their housing organisation.

A regression model was developed that included variables available in the NSHS data set (referred to as factors in this report) that had been identified in previous analyses as being potentially related to tenant satisfaction, along with key geographic and sociodemographic factors (Table C1). This model (Model 1) was used to analyse all social housing tenants in the 3 main programs combined – PH, CH and SOMIH. Similar models were used to analyse tenants within each program (Models 2–4). The only differences in Models 2–4 compared with Model 1 were:

  • Models 2–4 did not include 'housing program' as a variable, as each was single-program only.
  • Model 3 (SOMIH) did not include the variable 'Whether Indigenous household' as the SOMIH program is specifically targeted at Indigenous households.

More information about the variables used in the analysis is provided in Table C1. In order to have a point of reference, so that the direction and size of a factor’s relationship with satisfaction can be seen, a base case (reference category) is assigned for each variable in the model (for example, for the variable housing program, the base case is PH). The reference group is a hypothetical group of tenants with all the base case characteristics combined.

Base cases for each variable were selected because they provide a useful point of reference: they were at the bottom or top of a variable range (for example, age group 0–34, education less than Year 10, employed); they represented the most common group (for example, PH, major cities, females, non-Indigenous households, households without disability, no structural problems, 7 working facilities, ‘adequate’ home utilisation, and house as the previous dwelling type); or they represented a benchmark for tenant satisfaction (for example, Queensland, couples without children, and living in social housing for 0–5 years).

The logistic regressions were computed in SAS using PROC SURVEYLOGISTIC, which allows a survey weight to be included. The survey weight was included in these analyses to partly account for over- or under-representation (by housing program, state/territory and remoteness) of particular groups of tenants in the responding sample.

Table C1: Variables and categories used in the regression model

| Variable/category | Variable construction |
|---|---|
| Outcome variable: Tenant satisfaction – Satisfied; Not satisfied | Observations with invalid or missing responses were excluded from the analysis. Satisfied = Very satisfied or Satisfied. Not satisfied = Neither satisfied nor dissatisfied, Dissatisfied, Very dissatisfied. |
| Explanatory variables (factors) | |
| State/territory – NSW, Vic, Qld (base case), WA, SA, Tas, ACT, NT | As recorded. No missing or invalid responses. |
| Remoteness – Major cities (base case), Inner regional, Outer regional, Remote/Very remote | Categories ‘Remote’ and ‘Very remote’ were combined. No missing or invalid responses. |
| Age group (years) – 0–34 (base case), 35–44, 45–54, 55–64, 65 and over | Observations with invalid or missing responses were excluded from the analysis. ‘14 years and under’, ‘15–19 years’, ‘20–24 years’ and ‘25–34 years’ were combined, and ‘65–74’ and ‘75 years or over’ were combined. |
| Sex – Female (base case); Male, non-binary | Observations with invalid or missing responses were excluded from the analysis. |
| Highest level of education – Year 10 (base case), lower than Year 10, Years 11–12, Certificate, Diploma or Advanced Diploma, Bachelor degree or above | Observations with invalid or missing responses were excluded from the analysis. Categories ‘Year 11’ and ‘Year 12’ were combined. Categories ‘Did not go to school’, ‘Year 6 or below’, ‘Year 7’, ‘Year 8’ and ‘Year 9’ were combined. |
| Employment status – Employed (base case), Not employed | Observations with invalid or missing responses were excluded from the analysis. |
| Whether Indigenous household (this factor not in SOMIH model) – Indigenous household; Household not Indigenous (base case) | Observations with invalid or missing responses for any of the relevant questions were excluded from the analysis. Classified as Indigenous if the tenant identified that they or another member of their household were Indigenous. Classified as non-Indigenous if the tenant (a) did not identify any member of their household (including themselves) as Indigenous and (b) identified that they (and any other members of the household) were not Indigenous. |
| Whether person with disability in household – 1 or more persons with disability in household; other households (base case) | Observations with invalid or missing responses for the relevant questions were excluded from the analysis. Classified as at least 1 person with disability in the household if the tenant identified that they or another member of their household had a need for assistance with self-care, body movement or communication activities due to a long-term health condition or disability; otherwise classified as no household members with disability. |
| Living situation – Single person living alone (base case); Single parent (single person with 1 or more children in household); Couple with no children in household; Couple with 1 or more children in household; Other households | Observations with invalid or missing responses were excluded from the analysis. ‘Other households’ were also excluded from the analysis; this represented a very small number of observations. Categories ‘Extended family with 1 or more children in household’, ‘Extended family with no children in household’ and ‘Group of unrelated adults’ were combined. |
| Housing program – Public housing (base case), Community housing, State owned and managed Indigenous housing | As recorded by fieldwork provider. No missing or invalid responses. |
| Number of structural problems – 0 (base case), 1, 2, 3+ | Observations with invalid or missing responses were excluded from the analysis. |
| Number of working facilities – 0–6 (base case), all 7 nominated | Observations with invalid or missing responses were excluded from the analysis. |
| Housing utilisation – Overcrowded, Adequate (base case), Underutilised | Observations with invalid or missing responses to the relevant questions were excluded from the analysis. Refer to the Canadian National Occupancy Standard definition in the Glossary. |
| Time living in social housing (years) – 0–5 (base case), 6–10, 11–15, 16+ | Observations with invalid or missing responses were excluded from the analysis. Categories ‘Less than a year’, ‘1–2 years’ and ‘3–5 years’ were combined, and categories ’16–20’ and ’21 or more’ were combined. |
| Previous dwelling type – House/townhouse/flat (base case); Other than a house/townhouse/flat | Observations with invalid or missing responses were excluded from the analysis. All categories other than ‘House/townhouse/flat’ were combined into a single category, comprising: caravan/cabin/boat/mobile home; no dwelling/improvised dwelling/motor vehicle/tent; and temporary accommodation/institution/other. |

Results

The results from the regression analysis are in the form of predicted probabilities. These are the likelihood, estimated by the models, of a tenant reporting that they are satisfied, given they hold a particular set of characteristics (a category for each of the factors included in the model). This can be compared with the predicted probability for the reference group, who hold all the base case characteristics. A higher probability for a particular category (say, the category CH for the factor housing program), when compared with the reference group, indicates that the category of interest (in this example, CH) is positively associated with tenant satisfaction in comparison with the base case (for housing program, the base case is PH). A negative difference between the category of interest and the reference group indicates a negative association (for example, SOMIH versus the base case of PH).

The predicted probability (expressed as a percentage) was derived from the SAS PROC SURVEYLOGISTIC outputs, which were in the form of odds and odds ratios. This was done as follows (see ABS 2012; Eckel 2008):

Step 1. The predicted probability for the reference group was calculated. The log-odds for the reference group is reported in the SAS output as the model intercept. To convert this to a predicted probability, the log-odds was converted to odds by exponentiating the log-odds. The odds was then converted to a predicted probability using the formula:

Predicted probability = odds ÷ (1 + odds) × 100

Step 2. The odds ratio (reported in the SAS output) for each factor category was applied to the reference group odds (obtained from Step 1) to obtain the odds for that factor (with all other factors having the reference category values). This was then converted to a predicted probability using the formula provided in Step 1.

Step 3. The difference between the predicted probability for the factor category and the reference group was obtained.
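The 3 steps above can be sketched in Python. The intercept and odds ratio values below are illustrative placeholders, not actual outputs from the NSHS models.

```python
import math

# Sketch of Steps 1-3: converting a model intercept (log-odds) and an odds
# ratio into predicted probabilities. Values are illustrative only.

def prob_from_odds(odds):
    """Predicted probability (%) = odds / (1 + odds) * 100."""
    return 100 * odds / (1 + odds)

intercept = 2.4423                            # hypothetical intercept (log-odds)
reference_odds = math.exp(intercept)          # Step 1: log-odds -> odds
reference_prob = prob_from_odds(reference_odds)

odds_ratio = 0.5                              # hypothetical odds ratio for a category
category_odds = reference_odds * odds_ratio   # Step 2: apply the odds ratio
category_prob = prob_from_odds(category_odds)

difference = category_prob - reference_prob   # Step 3: compare with reference group
```

With these illustrative values the reference group's predicted probability is about 92%, and an odds ratio below 1 yields a lower category probability, i.e. a negative difference.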

Table C2 shows the predicted probability of the reference group for each model, and the number of observations for each.

Table C2: Summary of logistic regression models

 

| | M1 – All tenants | M2 – PH only | M3 – SOMIH | M4 – CH |
|---|---|---|---|---|
| Predicted probability of reference group (%) | 92 | 90 | 96 | 95 |
| Number of observations | 5,997 | 2,917 | 1,060 | 2,020 |

Note: See Table C1 for the base case for each variable in the models; these are the characteristics of the reference groups.

Factor by factor, the regression results presented in Table R3 show:

  • the predicted probability of satisfaction for a tenant with the characteristics of the reference group (the base case categories combined), except in the factor of interest (category as shown)
  • the p value of model estimates – this indicates the level of confidence we can have in there being a relationship between a factor category and the outcome (satisfaction). The smaller the p value, the greater the confidence of an association between the factor and the outcome. A common convention is to describe p values of less than 0.05 as statistically significant (a 95% level of confidence). However, some results that do not meet this standard may still be of importance or interest (for example, because they complement or align with other findings, or the magnitude of the association is large). Conversely, not all differences with a p value of less than 0.05 are necessarily important or noteworthy, especially if the effect is small.

An example will illustrate how to use the results from Table R3, examining the factor structural problems using Model 1 (M1). The preceding table (Table C2) shows that the predicted probability of being satisfied for the reference group in M1 is 92%. The base case for the factor structural problems is 0 structural problems in the home. The results presented in Table R3 for the categories 1 structural problem through to 3 or more structural problems show the predicted change in satisfaction when comparing tenants with no structural problems to tenants with 1 or more, while holding all other factors constant. The predicted probability in M1 of being satisfied for tenants living with 3 or more structural problems is 72%. This is substantially lower than the probability of being satisfied for the reference group (92%), whose category is 0 structural problems. Not only is the effect large, it is also statistically significant (p < 0.0001).