Psychometric Properties of the 4-Metre Walk Test in the Canadian Longitudinal Study on Aging

Ava Mehdipour; Marla Beauchamp; Julie Richardson; Ayse Kuspinar

doi:10.1017/S0714980826100592

Published online by Cambridge University Press: 01 April 2026

Ava Mehdipour: Affiliation:
School of Rehabilitation Science, McMaster University Faculty of Health Sciences, Canada
Marla Beauchamp: Affiliation:
School of Rehabilitation Science, McMaster University Faculty of Health Sciences, Canada
Julie Richardson: Affiliation:
School of Rehabilitation Science, McMaster University Faculty of Health Sciences, Canada; Department of Health Research Methods, Evidence, and Impact, McMaster University, Canada
Ayse Kuspinar*: Affiliation:
School of Rehabilitation Science, McMaster University Faculty of Health Sciences, Canada
*Corresponding author: Correspondence and requests for offprints should be sent to Ayse Kuspinar, School of Rehabilitation Science, McMaster University, 1400 Main St. W., IAHS, L8S 1C7, Hamilton, ON, Canada (kuspinaa@mcmaster.ca).

Abstract

Objectives

This study assessed the construct validity, predictive validity, and responsiveness of the 4-metre walk test (4MWT) in community-dwelling older Canadians.

Methods

Baseline and 3-year follow-up data from the Canadian Longitudinal Study on Aging were examined, including participants ≥ 65 years with 4MWT assessments. Secondary outcomes included physical and self-report measures and healthcare utilization (e.g., hospitalization and emergency department visits).

Results

Baseline data on 12,433 and follow-up data on 10,107 participants were analysed. For construct validity, low-to-high correlations with the comparator measures (rho = 0.25 [with the Life Space Assessment] to 0.72 [with the Timed-Up and Go]) and known-groups differences of 0.15 m/s (assistive device use) and 0.04 m/s (falls) were found. For predictive validity, areas under the curve ranged from 0.51 to 0.59 for healthcare utilization, indicating poor prediction. For responsiveness, low-to-moderate correlations between change scores were found (rho = 0.01–0.44).

Conclusions

Findings demonstrated partial support for construct validity and responsiveness and no support for predictive validity.

Keywords

aging; psychometric properties; 4-metre walk test; validity; responsiveness; CLSA

Information

Type: Article
Information: Canadian Journal on Aging / La Revue canadienne du vieillissement, Volume 45, Issue 2, June 2026, pp. 291–298

DOI: https://doi.org/10.1017/S0714980826100592
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial licence (http://creativecommons.org/licenses/by-nc/4.0), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original article is properly cited. The written permission of Cambridge University Press or the rights holder(s) must be obtained prior to any commercial use.
Copyright: © The Author(s), 2026. Published by Cambridge University Press on behalf of The Canadian Association on Gerontology

Background and objectives

The 4-metre walk test (4MWT) is one of the most widely used mobility tests in older adults (Rydwik et al., 2012). Across its protocol variations (i.e., speed and starting protocol), the 4MWT demonstrates sound psychometric properties in older adults. It is reliable over repeated administrations (Kim et al., 2016) and correlates well with measures of lower extremity function (Fusco et al., 2012; Kim et al., 2016; Mansson et al., 2022) and activities of daily living (ADL) (Fusco et al., 2012). Moreover, the 4MWT can discriminate between groups of individuals, such as those with frailty (Lee et al., 2017) and mobility limitations (Riwniak et al., 2020). There is also evidence supporting its predictive validity for frailty (Sutorius et al., 2016), multiple falls (Viccaro et al., 2011), ADL difficulties (Viccaro et al., 2011), and mortality (Veronese et al., 2017) in older adults.

Although the 4MWT has been widely used and evaluated, clinicians and researchers should be cognisant of its psychometric evidence in relation to testing conditions, including its test protocol, environment, and participant population. Protocol variations, such as the instructed speed and the starting protocol, can affect the test's reliability and validity; thus, findings should be interpreted with respect to the protocol used (Mehdipour et al., 2024). Canada's unique environmental factors, such as its harsh winter conditions, can lead to decreased walking, fear of falling, and fall-related emergency department visits (Bergen et al., 2023; Huynh et al., 2021), and its universal healthcare system influences healthcare decisions and policies for older adults (e.g., funding decisions for mobility aids). In 2022, 10.6% of Canadians (aged 15 and older) were reported to have a mobility disability, with prevalence varying by sex and age (Mobility disabilities, 2022, 2024). Therefore, when interpreting 4MWT scores in older Canadians, clinicians and researchers should base their decisions on evidence obtained in older Canadians. For example, if the 4MWT is not predictive of falls, then 4MWT scores may not be informative for decisions regarding fall interventions. In Canadian older adults, the 4MWT has been found to be reliable between raters (intraclass correlation coefficient [ICC] = 0.95–0.98) and within raters (ICC = 0.76–0.91) (Katsoulis et al., 2021). Beauchamp et al. (2021) evaluated the intra-rater reliability of the 4MWT in a subsample of the Canadian Longitudinal Study on Aging (CLSA), a large national study of older adults, and reported an ICC of 0.67–0.69 in individuals aged 65 and older.

To our knowledge, there is limited information regarding the 4MWT's validity in Canadian older adults. Only one study has evaluated its construct validity, finding that the 4MWT could discriminate between non-frail and frail Canadians (accuracy = 94.2%) (Lee et al., 2017). To ensure that the 4MWT can be used in research and clinical practice to evaluate Canadian older adults' mobility status, its ability to accurately capture their mobility needs to be determined. In a previous study using CLSA data, the 4MWT was found to have poor predictive ability in identifying fallers, raising questions about the utility of the test for examining fall risk (Beauchamp et al., 2022). There remains a need to evaluate the extent to which the 4MWT has predictive validity for other adverse outcomes, such as hospitalization and emergency department visits, in Canadian community-dwelling older adults. If found to be predictive of healthcare utilization, such as healthcare professional (HCP) visits, hospitalization, and emergency department visits, the 4MWT could be used as a screening tool for these outcomes. Furthermore, there is limited research on the responsiveness of the 4MWT in older adults. Responsiveness is an important psychometric property because it determines whether a tool can capture change in mobility over time.

The primary aim of this study was to evaluate the psychometric properties of the 4MWT in Canadian community-dwelling older adults. The CLSA data set was used to evaluate the psychometric properties of the 4MWT because it is a large, comprehensive study that provides longitudinal data in a population-based sample of older Canadians (Raina et al., 2009). This study aimed to address the following objectives:

1. To assess the construct validity (convergent and known-groups validity) of the 4MWT for accurately reflecting mobility in Canadian community-dwelling older adults.
2. To determine the predictive validity of the 4MWT for healthcare utilization at 3 years in Canadian community-dwelling older adults.
3. To assess the ability of the 4MWT to detect changes in mobility over time (responsiveness) in Canadian community-dwelling older adults.

Moreover, to ensure appropriateness of scores for both males and females and different age groups, the secondary aim of this study was to evaluate the psychometric properties of the 4MWT in Canadian community-dwelling older adults by sex and age subgroups.

Research design and methods

Sample

Secondary analyses of baseline and first follow-up data from the CLSA comprehensive cohort were performed to address the objectives of this study. Only data from individuals who were ≥ 65 years and completed the 4MWT at baseline were examined. The CLSA is a large, nationwide, longitudinal study of approximately 50,000 people aged 45 to 85 years at recruitment and residing in Canada, who are followed prospectively for various outcomes over 20 years (Raina et al., 2009, 2019). The CLSA included individuals who were community-dwelling at baseline, spoke English or French, and provided consent (Raina et al., 2019). Those who had cognitive impairments, were living on First Nations reserves, were full-time members of the Canadian Armed Forces, or were living in institutions (e.g., long-term care) were excluded (Raina et al., 2019). The CLSA consists of two cohorts: the comprehensive (n ≈ 30,000) and the tracking (n ≈ 20,000) cohorts. Physical tests, including the 4MWT, were administered in the comprehensive cohort; therefore, only data from this cohort were examined. Baseline data collection was completed between 2011 and 2015, and first follow-up data collection between 2015 and 2018. For objective 1, cross-sectional analyses of the baseline data were conducted; for objectives 2 and 3, longitudinal analyses of the baseline and first follow-up data (i.e., 3-year follow-up) were conducted. Individuals with incomplete or missing data were excluded. Ethics approval was obtained from the Hamilton Integrated Research Ethics Board (#15460).

Outcomes

The primary outcome measure was the 4MWT, collected at baseline and first follow-up. Data were collected at one of 11 data collection sites across Canada by research staff using a standardized procedure (Raina et al., 2019; Timed (4-metre) Walk Test, 2014). Participants were instructed to walk a 4-metre distance at their usual pace from a static start (no acceleration zone) and with no deceleration zone, and were timed by an assessor using a stopwatch. Participants started with their feet behind the starting line; the timer started after the staff member said ‘Ready, Set, Go’ and stopped when the participant had completely crossed the 4-metre finish line. Scores were recorded in seconds but converted to metres per second (m/s) for this analysis to represent walking speed.
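Converting the recorded time to walking speed is a single division of course length by walk time. The sketch below assumes a 4-metre course and a time recorded in seconds; the function name is hypothetical and is not taken from the CLSA data dictionary.

```python
def walking_speed_m_per_s(time_s: float, distance_m: float = 4.0) -> float:
    """Convert a timed walk (seconds) to walking speed (metres per second).

    distance_m defaults to the 4-metre course described above (assumption:
    the full course length is timed, with no acceleration or deceleration zone).
    """
    if time_s <= 0:
        raise ValueError("walk time must be a positive number of seconds")
    return distance_m / time_s


# A hypothetical participant who covers 4 m in 4.4 s walks at about 0.91 m/s.
print(round(walking_speed_m_per_s(4.4), 2))  # 0.91
```

Because speed is the reciprocal of time, slower walkers receive lower scores, which is why the 4MWT in m/s correlates negatively with timed measures such as the TUG.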

Outcome measures used to assess convergent validity and responsiveness were the Timed-Up and Go (TUG) (Podsiadlo & Richardson, 1991), the Standing Balance test (Measuring Standing Balance, 2014), the Life-Space Assessment (LSA) (Peel et al., 2005), and the Basic ADL and Instrumental ADL (IADL) questionnaire modified from the Older Americans' Resources and Services (OARS) Multidimensional Assessment Questionnaire (Fillenbaum, 2013). Baseline assessments were used to evaluate convergent validity, and change scores between baseline and first follow-up (i.e., 3 years) were used to assess responsiveness. The TUG is a reliable and valid performance-based measure of mobility and balance in older adults (Podsiadlo & Richardson, 1991; Spagnuolo et al., 2010). Participants are instructed to stand up from a chair, walk 3 metres, turn 180 degrees, walk back, and sit down (Podsiadlo & Richardson, 1991). The TUG is scored in seconds, from when the participant stands up until they sit back down. The standing balance test is a measure of balance that is highly correlated with age (Bohannon et al., 1984). Participants are asked to balance on one foot for as long as possible and then repeat on the other foot (Measuring Standing Balance, 2014). The test is scored in seconds, with the timer stopped when the participant loses their balance or at a maximum of 60 seconds. The better time (left or right leg) was used for this analysis.
The LSA is a psychometrically sound measure of life-space mobility (i.e., the extent of movement within one's environment) in community-dwelling older adults (Kuspinar et al., 2023). The LSA covers five life-space levels, from inside one's home to outside one's town, with scoring that incorporates the frequency of movement to each level and the assistance needed (Peel et al., 2005). The LSA is scored from 0 to 120, with higher scores indicating greater life-space mobility (Peel et al., 2005). The modified OARS questionnaire used in the CLSA consists of 41 items covering ADLs and IADLs, with a derived total score ranging from 1 (no functional impairment) to 5 (total impairment), in which meal preparation is given extra weighting.
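The 0 to 120 range follows from the composite scoring described by Peel et al. (2005): for each life-space level reached in the past 4 weeks, the level number (1 to 5) is multiplied by a frequency score (1 = less than once a week to 4 = daily) and an independence weight (2 = no help, 1.5 = equipment only, 1 = help from another person), and the products are summed. The sketch below assumes that standard scoring; the CLSA's derived LSA variable may be computed differently, and the function and argument names are hypothetical.

```python
# Independence weights from the standard LSA composite score (Peel et al., 2005)
INDEPENDENCE_WEIGHT = {
    "personal_assistance": 1.0,  # needed help from another person
    "equipment_only": 1.5,       # used equipment (e.g., a cane) but no personal help
    "independent": 2.0,          # needed neither equipment nor help
}


def lsa_composite(levels):
    """Return the LSA composite score (0-120) under the standard scoring.

    levels: five entries for life-space levels 1 (rooms of the home other than
    where one sleeps) to 5 (outside one's town). Each entry is None if the level
    was not reached, otherwise a (frequency, assistance) tuple where frequency is
    1 (<1/week), 2 (1-3/week), 3 (4-6/week) or 4 (daily) and assistance is a
    key of INDEPENDENCE_WEIGHT.
    """
    if len(levels) != 5:
        raise ValueError("expected exactly five life-space levels")
    total = 0.0
    for level_number, entry in enumerate(levels, start=1):
        if entry is None:  # a level not reached contributes nothing
            continue
        frequency, assistance = entry
        if frequency not in (1, 2, 3, 4):
            raise ValueError("frequency must be coded 1-4")
        total += level_number * frequency * INDEPENDENCE_WEIGHT[assistance]
    return total


# Maximum: all five levels reached daily without help -> (1+2+3+4+5) * 4 * 2
print(lsa_composite([(4, "independent")] * 5))  # 120.0

# Hypothetical participant: daily and unaided at levels 1-3, 1-3 times/week
# to level 4 with a cane, never outside town
print(lsa_composite([(4, "independent")] * 3 + [(2, "equipment_only"), None]))  # 60.0
```

Weighting by level number means that travel farther from home contributes more to the score than the same frequency of movement within the home.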

For known-groups validity, history of falls (yes/no response to ‘In the past 12 months, did you have any falls?’) and use of assistive devices at baseline were examined. Use of assistive devices was dichotomized into users and non-users (in the past 12 months) based on participants' reported use of the following devices: cane/walking stick, wheelchair, motorized scooter, walker, or leg braces. For predictive validity, healthcare utilization outcomes reported at the 3-year follow-up for the preceding 12 months were examined: HCP contact (family doctor, medical specialist, and rehabilitation specialist), emergency department visits, and hospitalization (i.e., an overnight hospital stay).
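Both known-groups variables are binary. As a minimal sketch, assuming each participant's device use is available as a collection of device names (the string labels below are hypothetical, not CLSA variable names), assistive device use could be derived by checking for any of the listed devices:

```python
# Devices counted as assistive devices in the known-groups analysis
ASSISTIVE_DEVICES = frozenset({
    "cane/walking stick",
    "wheelchair",
    "motorized scooter",
    "walker",
    "leg braces",
})


def is_assistive_device_user(devices_used_past_12_months):
    """True if any of the listed assistive devices was used in the past 12 months."""
    return not ASSISTIVE_DEVICES.isdisjoint(devices_used_past_12_months)


print(is_assistive_device_user({"walker"}))       # True
print(is_assistive_device_user({"hearing aid"}))  # False
print(is_assistive_device_user(set()))            # False
```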

Analysis

Descriptive statistics (mean [standard deviation], median [interquartile range], or frequency [percentage]) were used to summarize sample characteristics and outcomes. Subgroup analyses were performed by sex (males and females) and age (65–74 years and 75+ years), with data stratified by baseline sex and age. To determine the convergent validity and responsiveness of the 4MWT, Pearson's correlation coefficients were computed if data were normally distributed and Spearman's correlation coefficients if data were not normally distributed. For convergent validity, high correlations of at least 0.50 were expected with the TUG, as both the 4MWT and TUG are performance-based measures involving walking, and moderate correlations of 0.30–0.50 were expected with the standing balance test, the LSA, and the modified OARS, as they measure related but dissimilar constructs (Mokkink et al., 2018). Correlations below 0.30 were considered low, indicating unrelated or dissimilar constructs. For responsiveness, correlations among change scores were expected to be 0.10 lower than those hypothesized for convergent validity to account for measurement error from repeated administrations (i.e., high correlations of at least 0.40 with the TUG and moderate correlations of 0.20–0.40 with the other comparators, with correlations below 0.20 considered low) (De Vet et al., 2011). Tables 3 and 6 provide the hypothesized direction and magnitude of each correlation. To determine known-groups validity, differences between groups were tested with independent t-tests if data were normally distributed and Wilcoxon rank-sum tests if data were not normally distributed. Using minimal important change values for the 4MWT (Perera et al., 2006), a difference of at least 0.1 m/s between groups was expected as evidence of known-groups validity. To determine predictive validity, receiver operating characteristic (ROC) curves were computed, and areas under the curve (AUC) of at least 0.70 (on a 0–1 scale) were expected for acceptable discrimination (Mokkink et al., 2018). AUC values below 0.70 indicate inadequate discrimination, with 0.50 indicating discrimination no better than chance. Statistical significance was set at p < 0.05. Stata, version 15.1 (StataCorp, College Station, TX, USA), was used to perform the analyses.
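The a priori correlation hypotheses amount to an expected direction plus a magnitude range, so they can be checked mechanically. The sketch below computes Spearman's rho in pure Python (average ranks for ties, then Pearson's r on the ranks) and evaluates the whole-sample coefficients reported in the Results against the convergent and responsiveness thresholds. The expected directions (negative for the TUG and modified OARS, where higher scores mean worse function; positive for standing balance and the LSA) are inferred from how each measure is scored rather than copied from Tables 3 and 6, and all names are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Ranks 1..n, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # 0-based positions -> 1-based rank
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def pearson_r(x, y):
    """Pearson's r; undefined (ZeroDivisionError) if either variable is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# (expected sign, minimum |rho|, maximum |rho|) for each comparator
CONVERGENT = {"TUG": (-1, 0.50, 1.00), "standing balance": (1, 0.30, 0.50),
              "LSA": (1, 0.30, 0.50), "modified OARS": (-1, 0.30, 0.50)}
# Responsiveness thresholds are 0.10 lower to allow for measurement error
RESPONSIVENESS = {"TUG": (-1, 0.40, 1.00), "standing balance": (1, 0.20, 0.40),
                  "LSA": (1, 0.20, 0.40), "modified OARS": (-1, 0.20, 0.40)}


def hypothesis_met(rho, expected_sign, low, high):
    """True if rho has the expected direction and its magnitude lies in [low, high]."""
    return rho * expected_sign > 0 and low <= abs(rho) <= high


# Whole-sample coefficients reported in the Results
baseline = {"TUG": -0.72, "standing balance": 0.32, "LSA": 0.25, "modified OARS": -0.26}
change = {"TUG": -0.44, "standing balance": 0.01, "LSA": 0.04, "modified OARS": -0.06}
for label, observed, rules in (("convergent", baseline, CONVERGENT),
                               ("responsiveness", change, RESPONSIVENESS)):
    met = [hypothesis_met(observed[m], *rules[m]) for m in rules]
    print(label, f"{100 * sum(met) / len(met):.0f}% of hypotheses met")
# convergent 50% of hypotheses met
# responsiveness 25% of hypotheses met
```

When both variables are approximately normal, Pearson's r on the raw scores would replace `spearman_rho`; the hypothesis check is unchanged. In practice, `scipy.stats.spearmanr` computes the same tie-corrected coefficient.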

Sample size

The CLSA sample size for participants ≥ 65 years in the comprehensive cohort is approximately 12,000. The Consensus-based Standards for the selection of health Measurement Instruments (COSMIN) guidelines recommend larger samples (> 100) for validation studies, as confidence in statistical estimates (e.g., correlations, AUCs) increases with sample size (De Vet et al., 2011). Therefore, a sample of ~12,000 allows for stratified analyses while maintaining confidence in the results.

Results

Sociodemographic characteristics

The number of participants aged 65 and older in the comprehensive cohort was 12,646. However, only 12,433 participants had baseline 4MWT scores and were therefore included in the analyses of convergent and known-groups validity. Follow-up 4MWT data were available for 10,107 of these participants, who were included in the analyses of predictive validity and responsiveness. The sample was predominantly white (96%), married (63%), had completed postsecondary education (72%), and reported a household income over $20,000 (93%). Sociodemographic information for included participants can be found in Table 1 and Supplementary Tables 1 and 2.

Table 1.

Sociodemographic characteristics for entire sample (n = 12,433)

A data table summarizing sociodemographic characteristics for 12,433 participants, including age, sex, race, marital status, dwelling, education, health, mobility aids, income, falls, and chronic conditions.

^a Reported as median (IQR).

4MWT distribution

Table 2 displays the distribution of measure scores for convergent validity and responsiveness (subgroup distributions can be found in Supplementary Tables 3 and 4). Baseline 4MWT mean (SD) scores were 0.91 (0.20) m/s for the entire sample, 0.95 (0.19) m/s for those aged 65–74 years (n = 7,244), 0.86 (0.19) m/s for those aged 75 years and older (n = 5,189), 0.89 (0.20) m/s for females (n = 6,197), and 0.93 (0.20) m/s for males (n = 6,236).

Table 2.

Distribution of the 4MWT and comparator measures for convergent validity and responsiveness for entire sample

A data table comparing 4MWT, TUG, standing balance, LSA, and modified OARS across convergent validity and responsiveness metrics for a large sample.

Note: The scores reported are for participants with baseline 4MWT scores.

4MWT = 4-metre walk test; LSA = Life Space Assessment; OARS = Older Americans’ Resources and Services questionnaire; SD = standard deviation; TUG = Timed-Up and Go.

^a Reported as median (IQR).

^b Scores included as part of the responsiveness analysis (i.e., participants with both baseline and follow-up scores).

Construct validity

Table 3 outlines convergent validity results for the entire sample (Supplementary Tables 5 and 6 display subgroup results). The correlation between the 4MWT and the TUG was rho = −0.72 for the entire sample, ranging from −0.72 to −0.68 across sex and age subgroups. The correlation between the 4MWT and the standing balance test was rho = 0.32 for the entire sample. When analysed by subgroup, the correlation was above 0.30 for females but below 0.30 for both age groups and for males. Correlations between the 4MWT and both the LSA and the modified OARS questionnaires were below the expected 0.30 cut-off for the entire sample (rho = 0.25 and −0.26, respectively) and all subgroups. For convergent validity, 50% of hypotheses were met for the entire sample and the female subgroup, while 25% were met for the other subgroups.

Table 3.

Convergent validity results of the 4MWT for entire sample

A four-column table summarizing convergent validity of the 4MWT with the TUG, standing balance, LSA, and modified OARS, showing correlation values and whether each hypothesis was met.

LSA, Life Space Assessment; OARS, Older Americans’ Resources and Services questionnaire; TUG, Timed-Up and Go.

+, hypothesis met; −, hypothesis not met.

Table 4 outlines known-groups validity results for the entire sample (Supplementary Tables 7 and 8 display subgroup results). Although 4MWT scores differed significantly both between fallers and non-fallers and between assistive device users and non-users, the difference exceeded 0.1 m/s only for assistive device use (0.15 m/s for the entire sample and 0.13–0.17 m/s for subgroups). For known-groups validity, 50% of hypotheses were met for the entire sample and all subgroups. Overall, 50% of results supported the construct validity of the 4MWT for the entire sample, with 33% to 50% of results supporting construct validity within sex and age subgroups.
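The known-groups criterion and the overall construct validity tally can be reproduced from the reported figures. The sketch below assumes a known-groups hypothesis is met when the between-group difference in mean walking speed reaches the 0.1 m/s minimal important change; the significance test is reported separately and is not re-run here. Function names are hypothetical.

```python
MIC_M_PER_S = 0.1  # minimal important change used as the known-groups threshold


def known_groups_met(mean_difference_m_per_s, threshold=MIC_M_PER_S):
    """True if the between-group difference in walking speed is clinically important.

    Rounding guards against floating-point error when the difference is
    computed by subtracting two group means.
    """
    return round(abs(mean_difference_m_per_s), 3) >= threshold


def percent_met(results):
    return 100 * sum(results) / len(results)


# Entire sample: convergent hypotheses for TUG, balance, LSA, OARS (Table 3)
convergent = [True, True, False, False]
# Known-groups: falls (0.04 m/s) and assistive device use (0.15 m/s) (Table 4)
known_groups = [known_groups_met(0.04), known_groups_met(0.15)]

print(percent_met(known_groups))               # 50.0
print(percent_met(convergent + known_groups))  # 50.0
# A subgroup meeting only the TUG convergent hypothesis: 2 of 6 hypotheses met
print(round(percent_met([True, False, False, False] + known_groups), 1))  # 33.3
```

The 33% lower bound reported for subgroups corresponds to the three subgroups in which only the TUG convergent hypothesis and the assistive device hypothesis were met.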

Table 4.

Known-groups validity results of the 4MWT for entire sample

A data table comparing 4MWT scores for falls and assistive device use groups, showing mean differences and statistical significance for each outcome.

SD, standard deviation.

+, hypothesis met; −, hypothesis not met.

Predictive validity

Table 5 outlines predictive validity results for the entire sample (Supplementary Tables 9 and 10 display subgroup results). For the entire sample, AUC values ranged from 0.51 to 0.59 across all healthcare utilization outcomes. None of the a priori predictive validity hypotheses (0%) were met, in either the entire sample or any subgroup, as all AUC values were below 0.70.
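An AUC can be read as the probability that a randomly chosen participant who had the outcome walked more slowly than a randomly chosen participant who did not, with ties counted as one half. The sketch below computes it directly from that pairwise (Mann-Whitney) definition, assuming slower walking is the risk direction. It runs in O(n × m) time and is meant only for illustration; the walking speeds in the example are made up.

```python
def auc_slower_predicts_event(speeds_event, speeds_no_event):
    """Pairwise (Mann-Whitney) estimate of the ROC AUC when slower speed signals risk."""
    if not speeds_event or not speeds_no_event:
        raise ValueError("both groups need at least one participant")
    concordant = 0.0
    for a in speeds_event:
        for b in speeds_no_event:
            if a < b:        # participant with the outcome walked more slowly
                concordant += 1.0
            elif a == b:     # ties count as half a concordant pair
                concordant += 0.5
    return concordant / (len(speeds_event) * len(speeds_no_event))


ACCEPTABLE_AUC = 0.70  # a priori threshold for acceptable discrimination

# Made-up speeds (m/s) for participants with and without a hospitalization
hospitalized = [0.70, 0.90, 1.00]
not_hospitalized = [0.60, 0.80, 0.95, 1.10]
auc = auc_slower_predicts_event(hospitalized, not_hospitalized)
print(auc, auc >= ACCEPTABLE_AUC)  # 0.5 False
```

With real data, a statistics package would normally be used instead; for example, scikit-learn's `roc_auc_score` applied to outcome labels and negated speeds should give the same value.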

Table 5.

Predictive validity results of the 4MWT for entire sample

A data table comparing 4MWT predictive validity for five health outcomes at three years, showing mean scores for Yes and No responses, AUC values, and hypothesis results.

CI, confidence interval; ED, emergency department; SD, standard deviation.

+, hypothesis met; −, hypothesis not met.

Responsiveness

Table 6 outlines responsiveness results for the entire sample (Supplementary Tables 11 and 12 display subgroup results). The correlation between 4MWT and TUG change scores was rho = −0.44 for the entire sample, with similar results for sex and age subgroups. Correlations of change scores between the 4MWT and the other measures (the standing balance test, LSA, and modified OARS) were below the expected 0.20 cut-off for the entire sample (rho = 0.01, 0.04, and −0.06, respectively) and all subgroups. Only the hypotheses regarding correlations with the TUG were met for the entire sample and subgroups, resulting in 25% of results supporting responsiveness.

Table 6.

Responsiveness results of the 4MWT for entire sample

A four-row table compares 4MWT responsiveness to TUG, standing balance, LSA, and modified OARS, showing that only the TUG meets the hypothesized correlation.

LSA, Life Space Assessment; OARS, Older Americans’ Resources and Services questionnaire; TUG, Timed-Up and Go.

+, hypothesis met; −, hypothesis not met.

Discussion and implications

Findings from this psychometric study suggest partial support for the construct validity and responsiveness of the 4MWT in Canadian older adults, with no support for its predictive validity for healthcare utilization. These findings were largely consistent across age and sex subgroups. Results from this study provide important evidence to inform the use of the 4MWT in research and clinical practice.

The construct validity of the 4MWT was evaluated to explore how accurately it reflects mobility in Canadian older adults. Our findings demonstrated moderate to high correlations with other performance-based measures (rho = −0.72 with the TUG and 0.32 with the standing balance test) and lower correlations (<0.30) with self-report measures of mobility and function. Therefore, the 4MWT reflects the construct of locomotor capacity better than it reflects self-reported mobility. Both self-report measures in this study included concepts beyond one's walking ability, which may explain the lower correlations. For example, the LSA and the modified OARS questionnaires include only 1–2 items regarding one's ability to walk (Fillenbaum, 2013; Peel et al., 2005). European studies in older adults found a correlation of rho = 0.41 between walking speed and the LSA (Ullrich et al., 2019) and r = 0.39–0.63 between walking speed tests and measures of ADL and IADL (Fusco et al., 2012; Nybo et al., 2001). Our findings indicate that walking speed may not fully reflect the constructs of ‘life-space mobility’ and ‘daily activities’ in Canadian older adults. Thus, inferences regarding older adults' daily activities, such as housework or self-care, and the extent to which one moves within one's environment should not be drawn from one's walking speed score. Although more hypotheses were met for the construct validity of the 4MWT in females than in males, differences in correlations between the sexes were within 0.10. Good-quality studies have demonstrated the validity of walking speed tests in accurately reflecting older adults' mobility across different countries and in comparison with different health outcome measures (Mehdipour et al., 2024).

To further inform its construct validity, the known-groups validity of the 4MWT was also evaluated, providing information on its ability to discriminate between groups. We tested two hypotheses, concerning differences between (1) fallers and non-fallers and (2) assistive device users and non-users; although both differences were statistically significant, only the difference between assistive device users and non-users was clinically important. Studies evaluating walking speed in older adults have also found a clinically important difference of at least 0.1 m/s between unaided and aided (e.g., using a cane, crutches, or a walker) groups (Kingston et al., 2021; Weiss et al., 2008). Thus, clinical inferences regarding older adults' need for assistive devices can be made through walking speed assessments. Although other studies have found differences of at least 0.10 m/s between fallers and non-fallers, such as 0.13 m/s (Morita et al., 2005), the association between walking speed and falls has not been strongly supported either cross-sectionally or prospectively, with good-quality studies reporting AUCs below 0.70 for known-groups validity (AUC = 0.69) (Middleton et al., 2016) and for predictive validity (AUC = 0.57–0.62) (Abolhassani et al., 2022; Beauchamp et al., 2022).

This was the first study to explore the 4MWT’s predictive validity for healthcare utilization in community-dwelling older Canadians, and for HCP and emergency department visits in community-dwelling older adults more broadly (Mehdipour et al., 2024). None of our a priori hypotheses were met; thus, the 4MWT did not predict healthcare utilization at 3 years in community-dwelling older Canadians. Previous studies evaluating the predictive validity of walking speed tests, including the 4MWT, for hospitalization in community-dwelling older adults also found AUCs below 0.70 (Abolhassani et al., 2022; Viccaro et al., 2011). Together, these findings suggest that walking speed may not have strong predictive validity for prospective hospitalization outcomes. Although the predictive validity of walking speed tests has not previously been tested for HCP and emergency department visits in community-dwelling older adults, studies in other older adult populations, such as primary care patients and patients with cancer, have found walking speed not to be significantly associated with prospective emergency department or general practitioner visits (O’Hoski et al., 2020; Puts et al., 2010). Importantly, the healthcare utilization outcomes examined here, such as hospitalization, were not restricted to mobility-related injuries or conditions, which may partly explain the poor predictive performance. In contrast, the 4MWT has been found to predict mobility-related outcomes such as frailty (AUC = 0.75 and 0.87; Sutorius et al., 2016). Future studies could explore the predictive validity of the 4MWT in relation to mobility-related healthcare utilization.
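The AUC values discussed above summarize how well baseline walking speed separates people who later had an event (e.g., hospitalization) from those who did not. As an illustrative sketch, not the study’s analysis, the Python code below computes the AUC through its Mann–Whitney interpretation: the proportion of (event, no-event) pairs in which the person with the event walked more slowly at baseline, with ties counted as one half. The speeds, function name, and constant are hypothetical; the 0.70 cut-off is the one referred to in the text.

```python
from itertools import product

AUC_THRESHOLD = 0.70  # discrimination cut-off referred to in the text


def auc_slower_predicts_event(speeds_event, speeds_no_event):
    """ROC AUC for 'slower baseline speed predicts the event'.

    Mann-Whitney formulation: the proportion of (event, no-event) pairs in
    which the person with the event walked more slowly at baseline,
    counting ties as one half.
    """
    if not speeds_event or not speeds_no_event:
        raise ValueError("both groups need at least one participant")
    score = 0.0
    for s_event, s_none in product(speeds_event, speeds_no_event):
        if s_event < s_none:
            score += 1.0
        elif s_event == s_none:
            score += 0.5
    return score / (len(speeds_event) * len(speeds_no_event))


# Hypothetical baseline speeds (m/s) for people who were / were not hospitalized.
hospitalized = [0.6, 0.9, 1.1]
not_hospitalized = [0.8, 1.0, 1.2, 0.9]
auc = auc_slower_predicts_event(hospitalized, not_hospitalized)
print(auc, auc >= AUC_THRESHOLD)  # 0.625 False
```

An AUC of 0.5 means no better than chance and 1.0 means perfect separation. The pairwise loop is quadratic in sample size; for a full cohort, a rank-based routine such as scikit-learn’s `roc_auc_score`, given negated speed as the score, yields the same value more efficiently.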

This was the first study to examine the responsiveness of the 4MWT in a Canadian population. Our findings supported the responsiveness of the 4MWT in community-dwelling older adults only in relation to a physical measure of mobility that involves walking (i.e., the TUG), with correlations greater than 0.40. Correlations with balance measures and with self-report measures of mobility and function ranged from 0.01 to 0.08. Therefore, over 3 years, the 4MWT appears to reflect only changes in physical mobility that pertain to walking. Only three other studies have evaluated the responsiveness of walking speed tests in community-dwelling older adults, and only one of these reported correlations of change scores with other health measures (Mehdipour et al., 2024). Mansson et al. (2022) reported correlations of 0.10–0.37 between change scores on the 4MWT and physical tests, and of 0.01–0.26 with self-report measures of mobility, in community-dwelling older adults.

Validity is contextual: inferences should be made not about a measure in isolation but about the measure in its context of testing (Messick, 1995). Our findings should therefore be interpreted in relation not only to Canadian community-dwelling older adults but also to the 4MWT protocol used. Walking speed tests with acceleration and deceleration zones have been found to yield faster and more reliable speeds, as these zones allow participants to speed up and slow down outside the timed distance and thus capture their usual walking speed (Kim & Won, 2019; Mehdipour et al., 2024). Thus, validity findings might differ if the 4MWT is administered using a dynamic start/end protocol.

The strengths of this study include the use of a large Canadian sample with longitudinal data and the inclusion of an array of outcome measures. The breadth of outcomes collected in the CLSA enabled psychometric testing of the 4MWT against different health measures, providing more evidence to inform its use and the types of measurement inferences that HCPs can make (e.g., that walking speed strongly reflects older adults’ physical mobility).

As a secondary analysis of existing data, this study was constrained by a pre-existing data set and a predetermined study design. Consequently, limitations included aspects of the study design (e.g., the absence of acceleration and deceleration zones for the 4MWT) and the availability of comparison measures used to test hypotheses for validity and responsiveness. Moreover, although the CLSA aimed to recruit a sample representative of Canadian older adults, the majority of participants identified as white and had a postsecondary education, and speaking English and/or French was an eligibility criterion. Thus, our findings may not generalize to people from more diverse social backgrounds. Furthermore, our interpretations of construct validity and responsiveness were based on observed correlations rather than on the lower bound of the 95% confidence interval. However, this did not affect our overall conclusions, as the confidence intervals were narrow given the large sample size. Additionally, the 3-year follow-up period may have influenced the psychometric findings. For example, a previous study found higher AUC values for walking speed’s ability to predict hospitalizations at 1 year than at 4 years (Abolhassani et al., 2022). Likewise, AUC values for healthcare utilization in our study may have been higher had they been evaluated at 1 year rather than 3 years.
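The argument that narrow confidence intervals made the observed correlations a safe basis for interpretation can be checked with Fisher’s z-transform, the standard approximate confidence interval for a Pearson correlation. The Python sketch below is illustrative: the correlation of 0.45 is hypothetical, 12,433 is the baseline sample size from Table 1 and is used only for illustration (the analytic n for each correlation may be smaller), and if the study used Spearman correlations this interval is only an approximation.

```python
import math
from statistics import NormalDist


def correlation_ci(r, n, level=0.95):
    """Approximate CI for a Pearson correlation using Fisher's z-transform.

    z = atanh(r) is roughly normal with standard error 1 / sqrt(n - 3);
    the interval is built on the z scale and mapped back with tanh.
    """
    if n <= 3:
        raise ValueError("n must be greater than 3")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Same hypothetical r = 0.45 at two sample sizes.
for n in (12_433, 50):
    lower, upper = correlation_ci(0.45, n)
    print(f"n = {n}: 95% CI {lower:.3f} to {upper:.3f}")
```

At n = 12,433 the interval is roughly 0.44 to 0.46, so its lower bound still exceeds 0.40; at n = 50 the same correlation gives roughly 0.20 to 0.65, where relying on the observed value instead of the lower bound could change the conclusion.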

Using a large, population-based data set, we demonstrated that the 4MWT reflected constructs of physical mobility cross-sectionally and longitudinally. These findings provide evidence for its construct validity and responsiveness in community-dwelling older Canadians with respect to capturing locomotor capacity. Clinicians and researchers can use the 4MWT to track and monitor older adults’ locomotor capacity, but should not use it to make inferences about their daily-life mobility, such as the extent of movement within their environment. Because we did not find evidence for its ability to predict healthcare utilization over 3 years in this population, analysts and policymakers should be cautioned against using the 4MWT to make long-term decisions regarding healthcare resources and costs for older Canadians.

Supplementary material

The supplementary material for this article can be found at http://doi.org/10.1017/S0714980826100592.

Data availability statement

The data underlying this article were provided by the Canadian Longitudinal Study on Aging and cannot be publicly shared. Data are available from the Canadian Longitudinal Study on Aging (www.clsa-elcv.ca) for researchers who meet the criteria for access to de-identified CLSA data.

Acknowledgements

This research was made possible using the data/biospecimens collected by the Canadian Longitudinal Study on Aging (CLSA). Funding for the Canadian Longitudinal Study on Aging (CLSA) is provided by the Government of Canada through the Canadian Institutes of Health Research (CIHR) under grant reference: LSA 94473 and the Canada Foundation for Innovation, as well as the following provinces, Newfoundland, Nova Scotia, Quebec, Ontario, Manitoba, Alberta, and British Columbia. This research has been conducted using the CLSA data set Baseline Comprehensive Dataset version 7.0 and Follow-up 1 Comprehensive Dataset version 4.0, under Application Number 2209021. The CLSA is led by Drs. Parminder Raina, Christina Wolfson, and Susan Kirkland.

Author contribution

A.M., M.B., J.R., and A.K. contributed to the study conceptualization and design. A.M. contributed to data analysis and interpretation. A.M. drafted and edited the manuscript and M.B., J.R., and A.K. provided critical revisions. A.K. was the supervising author on the project.

Competing interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Disclaimer

The opinions expressed in this manuscript are the authors’ own and do not reflect the views of the Canadian Longitudinal Study on Aging.

Footnotes

This project was completed under the affiliation presented above for Ava Mehdipour. Ava Mehdipour’s new affiliation: School of Nursing, Trinity Western University, Langley, BC, Canada.

References

Abolhassani, N., Fustinoni, S., & Henchoz, Y. (2022). Slowness as a predictor of functional decline in older adults: Comparison of Moberg picking-up test and walking speed. Journal of the American Medical Directors Association, 23(10), 1705–1705. https://doi.org/10.1016/j.jamda.2022.07.016

Beauchamp, M. K., Hao, Q., Kuspinar, A., D’Amore, C., Scime, G., Ma, J., … Kirkland, S. (2021). Reliability and minimal detectable change values for performance-based measures of physical functioning in the Canadian longitudinal study on aging. The Journals of Gerontology: Series A, 76(11), 2030–2038.

Beauchamp, M. K., Kuspinar, A., Sohel, N., Mayhew, A., D’Amore, C., Griffith, L. E., & Raina, P. (2022). Mobility screening for fall prediction in the Canadian longitudinal study on aging (CLSA): Implications for fall prevention in the decade of healthy ageing. Age and Ageing, 51(5), afac095.

Bergen, K., Jubenvill, M., Shaw, K., Steen, E., Loewen, H., Mbabaali, S., & Barclay, R. (2023). Factors associated with outdoor winter walking in older adults: A scoping review. Canadian Journal on Aging/La Revue canadienne du vieillissement, 42(2), 316–327.

Bohannon, R. W., Larkin, P. A., Cook, A. C., Gear, J., & Singer, J. (1984). Decrease in timed balance test scores with aging. Physical Therapy, 64(7), 1067–1070.

De Vet, H. C., Terwee, C. B., Mokkink, L. B., & Knol, D. L. (2011). Measurement in medicine: A practical guide. Cambridge University Press.

Fillenbaum, G. G. (2013). Multidimensional functional assessment of older adults: The Duke older Americans resources and services procedures. Psychology Press.

Fusco, O., Ferrini, A., Santoro, M., Lo Monaco, M. R., Gambassi, G., & Cesari, M. (2012). Physical function and perceived quality of life in older persons. Aging Clinical and Experimental Research, 24(1), 68–73. http://doi.org/10.1007/BF03325356

Huynh, D., Tracy, C., Thompson, W., Bang, F., McFaull, S. R., Curran, J., & Villeneuve, P. J. (2021). Associations between meteorological factors and emergency department visits for unintentional falls during Ontario winters. Health Promotion and Chronic Disease Prevention in Canada, 41(12), 401–412.

Katsoulis, K., Mathur, S., & Amara, C. E. (2021). Reliability of lower extremity muscle power and functional performance in healthy, older women. Journal of Aging Research, 1–9. https://doi.org/10.1155/2021/8817231

Kim, H.-J., Park, I., Lee, H. J., & Lee, O. (2016). The reliability and validity of gait speed with different walking pace and distances against general health, physical function, and chronic disease in aged adults. Journal of Exercise Nutrition & Biochemistry, 20(3), 46–50.

Kim, M., & Won, C. W. (2019). Combinations of gait speed testing protocols (automatic vs manual timer, dynamic vs static start) can significantly influence the prevalence of slowness: Results from the Korean frailty and aging cohort study. Archives of Gerontology and Geriatrics, 81, 215–221. https://doi.org/10.1016/j.archger.2018.12.009

Kingston, D. C., Ferwerda, S., Fontaine, C., Keeping, M., Stewart, J., Ward, R., … Zucker-Levin, A. R. (2021). Implications of walking aid selection for nonweightbearing ambulation on stance limb plantar force, walking speed, perceived exertion, and device preference in healthy adults 50 years of age and older. Foot & Ankle Orthopaedics, 6(1), 2473011421998939.

Kuspinar, A., Mehdipour, A., Beauchamp, M. K., Hao, Q., Cino, E., Mikton, C., … Raina, P. (2023). Assessing the measurement properties of life-space mobility measures in community-dwelling older adults: A systematic review. Age and Ageing, 52(Suppl. 4), iv86–iv99.

Lee, L., Patel, T., Costa, A., Bryce, E., Hillier, L. M., Slonim, K., … Molnar, F. (2017). Screening for frailty in primary care: Accuracy of gait speed and hand-grip strength. Canadian Family Physician, 63(1), e51–e57.

Mansson, L., Pettersson, B., Rosendahl, E., Skelton, D. A., Lundin-Olsson, L., & Sandlund, M. (2022). Feasibility of performance-based and self-reported outcomes in self-managed falls prevention exercise interventions for independent older adults living in the community. BMC Geriatrics, 22(1), 147. https://doi.org/10.1186/s12877-022-02851-9

Measuring Standing Balance. (2014). Canadian Longitudinal Study on Aging. https://www.clsa-elcv.ca/wp-content/uploads/2023/06/sop_dcs_0023_v2.1_2014aug20_baseline_final_watermark.pdf

Mehdipour, A., Malouka, S., Beauchamp, M., Richardson, J., & Kuspinar, A. (2024). Measurement properties of the usual and fast gait speed tests in community-dwelling older adults: A COSMIN-based systematic review. Age and Ageing, 53(3), afae055.

Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons’ responses and performances as scientific inquiry into score meaning. American Psychologist, 50(9), 741.

Middleton, A., Fulk, G. D., Herter, T. M., Beets, M. W., Donley, J., & Fritz, S. L. (2016). Self-selected and maximal walking speeds provide greater insight into fall status than walking speed reserve among community-dwelling older adults. American Journal of Physical Medicine & Rehabilitation, 95(7), 475–482. https://doi.org/10.1097/PHM.0000000000000488

Mobility Disabilities, 2022. (2024). Statistics Canada. https://www150.statcan.gc.ca/n1/pub/11-627-m/11-627-m2024056-eng.htm#:~:text=In%202022%2C%2010.6%25%20of%20Canadians,individuals)%20had%20a%20mobility%20disability.

Mokkink, L. B., Prinsen, C., Patrick, D. L., Alonso, J., Bouter, L., de Vet, H. C., … Mokkink, L. (2018). COSMIN methodology for systematic reviews of patient-reported outcome measures (PROMs). User Manual, 78(1), 6–63.

Morita, M., Takamura, N., Kusano, Y., Abe, Y., Moji, K., Takemoto, T.-i., & Aoyagi, K. (2005). Relationship between falls and physical performance measures among community-dwelling elderly women in Japan. Aging Clinical and Experimental Research, 17, 211–216.

Nybo, H., Gaist, D., Jeune, B., McGue, M., Vaupel, J. W., & Christensen, K. (2001). Functional status and self-rated health in 2,262 nonagenarians: The Danish 1905 cohort survey. Journal of the American Geriatrics Society, 49(5), 601–609. https://doi.org/10.1046/j.1532-5415.2001.49121.x

O’Hoski, S., Bean, J. F., Ma, J., So, H. Y., Kuspinar, A., Richardson, J., … Beauchamp, M. K. (2020). Physical function and frailty for predicting adverse outcomes in older primary care patients. Archives of Physical Medicine and Rehabilitation, 101(4), 592–598.

Peel, C., Baker, P. S., Roth, D. L., Brown, C. J., Bodner, E. V., & Allman, R. M. (2005). Assessing mobility in older adults: The UAB study of aging life-space assessment. Physical Therapy, 85(10), 1008–1019.

Perera, S., Mody, S. H., Woodman, R. C., & Studenski, S. A. (2006). Meaningful change and responsiveness in common physical performance measures in older adults. Journal of the American Geriatrics Society, 54(5), 743–749.

Podsiadlo, D., & Richardson, S. (1991). The timed “up & go”: A test of basic functional mobility for frail elderly persons. Journal of the American Geriatrics Society, 39(2), 142–148.

Puts, M., Monette, J., Girre, V., Wolfson, C., Monette, M., Batist, G., & Bergman, H. (2010). Does frailty predict hospitalization, emergency department visits, and visits to the general practitioner in older newly-diagnosed cancer patients? Results of a prospective pilot study. Critical Reviews in Oncology/Hematology, 76(2), 142–151.

Raina, P., Wolfson, C., Kirkland, S., Griffith, L. E., Balion, C., Cossette, B., … Van Den Heuvel, E. (2019). Cohort profile: The Canadian longitudinal study on aging (CLSA). International Journal of Epidemiology, 48(6), 1752–1753j.

Raina, P. S., Wolfson, C., Kirkland, S. A., Griffith, L. E., Oremus, M., Patterson, C., … Hogan, D. (2009). The Canadian longitudinal study on aging (CLSA). Canadian Journal on Aging/La Revue canadienne du vieillissement, 28(3), 221–229.

Riwniak, C., Simon, J. E., Wages, N. P., Clark, L. A., Manini, T. M., Russ, D. W., & Clark, B. C. (2020). Comparison of a multi-component physical function battery to usual walking speed for assessing lower extremity function and mobility limitation in older adults. Journal of Nutrition, Health and Aging, 24(8), 906–913. http://doi.org/10.1007/s12603-020-1432-2

Rydwik, E., Bergland, A., Forsen, L., & Frändin, K. (2012). Investigation into the reliability and validity of the measurement of elderly people’s clinical walking speed: A systematic review. Physiotherapy Theory and Practice, 28(3), 238–256.

Spagnuolo, D. L., Jürgensen, S. P., Iwama, Â. M., & Dourado, V. Z. (2010). Walking for the assessment of balance in healthy subjects older than 40 years. Gerontology, 56(5), 467–473.

Sutorius, F. L., Hoogendijk, E. O., Prins, B. A. H., & van Hout, H. P. J. (2016). Comparison of 10 single and stepped methods to identify frail older persons in primary care: Diagnostic and prognostic accuracy. BMC Family Practice, 17, 102. https://doi.org/10.1186/s12875-016-0487-y

Timed (4-metre) Walk Test. (2014). Canadian Longitudinal Study on Aging. https://www.clsa-elcv.ca/wp-content/uploads/2023/06/sop_dcs_0021_v1.2_2014jul10_baseline_final_watermark.pdf

Ullrich, P., Werner, C., Bongartz, M., Kiss, R., Bauer, J., & Hauer, K. (2019). Validation of a modified life-space assessment in multimorbid older persons with cognitive impairment. The Gerontologist, 59(2), e66–e75.

Veronese, N., Stubbs, B., Fontana, L., Trevisan, C., Bolzetta, F., Rui, M. D., … Sergi, G. (2017). A comparison of objective physical performance tests and future mortality in the elderly people. The Journals of Gerontology: Series A, Biological Sciences and Medical Sciences, 72(3), 362–368. https://doi.org/10.1093/gerona/glw139

Viccaro, L. J., Perera, S., & Studenski, S. A. (2011). Is timed up and go better than gait speed in predicting health, function, and falls in older adults? Journal of the American Geriatrics Society, 59(5), 887–892. https://doi.org/10.1111/j.1532-5415.2011.03336.x

Weiss, C. O., Seplaki, C. L., Wolff, J. L., Kasper, J. D., & Agree, E. M. (2008). Self-selected walking speed was consistent when recorded while using a cane. Journal of Clinical Epidemiology, 61(6), 622–627.

Table 1. Sociodemographic characteristics for entire sample (n = 12,433)

Table 2. Distribution of the 4MWT and comparator measures for convergent validity and responsiveness for entire sample

Table 3. Convergent validity results of the 4MWT for entire sample

Table 4. Known-groups validity results of the 4MWT for entire sample

Table 5. Predictive validity results of the 4MWT for entire sample

Table 6. Responsiveness results of the 4MWT for entire sample

Mehdipour et al. supplementary material

DOI: https://doi.org/10.1017/S0714980826100592.sm001
