Background and objectives
The 4-metre walk test (4MWT) is one of the most widely used mobility tests in older adults (Rydwik et al., 2012). The 4MWT, with different protocol variations (i.e., speed and starting protocol), demonstrates sound psychometric properties in older adults. It is reliable when performed over time (Kim et al., 2016) and correlates well with measures of lower extremity function (Fusco et al., 2012; Kim et al., 2016; Mansson et al., 2022) and activities of daily living (ADL) (Fusco et al., 2012). Moreover, the 4MWT is able to discriminate between different groups of individuals, such as those with frailty (Lee et al., 2017) and mobility limitations (Riwniak et al., 2020). There is also evidence in support of its predictive validity for frailty (Sutorius et al., 2016), multiple falls (Viccaro et al., 2011), ADL difficulties (Viccaro et al., 2011), and mortality (Veronese et al., 2017) in older adults.
Although the 4MWT has been widely used and evaluated, clinicians and researchers should be cognisant of its psychometric evidence in relation to testing conditions, including its test protocol, environment, and participant population. Protocol variations to the test, such as its speed and starting protocol, can affect its reliability and validity; thus, findings should be interpreted with respect to the protocol (Mehdipour et al., 2024). Canada’s unique environmental factors, such as its harsh winter conditions, can lead to decreased walking, fear of falls, and fall-related emergency department visits (Bergen et al., 2023; Huynh et al., 2021), and its universal healthcare system impacts healthcare decisions and policies for older adults (e.g., funding decisions for mobility aids). In 2022, 10.6% of Canadians (above the age of 15) were reported to have a mobility disability, with prevalence varying by sex and age (Mobility disabilities, 2022, 2024). Therefore, when interpreting 4MWT scores in older Canadians, clinicians and researchers should make decisions in relation to evidence found in older Canadians. For example, if the 4MWT is not predictive of falls, then 4MWT scores may not be informative for decisions regarding fall interventions. In Canadian older adults, the 4MWT has been found to be reliable between raters (Intraclass Correlation Coefficient [ICC] = 0.95–0.98) and within raters (ICC = 0.76–0.91) (Katsoulis et al., 2021). Beauchamp et al. (2021) evaluated the intra-rater reliability of the 4MWT in a subsample of the Canadian Longitudinal Study on Aging (CLSA), a large national study on older adults, and reported an ICC of 0.67–0.69 in individuals aged 65 years and older. To our knowledge, there is limited information with regard to the 4MWT’s validity in Canadian older adults. Only one study evaluated its construct validity and found that the 4MWT was able to discriminate between non-frail and frail Canadians (accuracy = 94.2%) (Lee et al., 2017). To ensure that the 4MWT can be employed in research and clinical practice to successfully evaluate Canadian older adults’ mobility status, its ability to accurately capture Canadian older adults’ mobility needs to be determined. In a previous study using CLSA data, the 4MWT was found to have poor predictive ability in identifying fallers, raising questions about the utility of the test for examining fall risk (Beauchamp et al., 2022). There remains a need to evaluate the extent to which the 4MWT has predictive validity for other adverse outcomes, such as hospitalization and emergency department visits, in Canadian community-dwelling older adults. If found to be predictive of healthcare utilization, such as healthcare professional (HCP) visits, hospitalization, and emergency department visits, the 4MWT could be used as a screening tool for these outcomes. Furthermore, there is limited research on the responsiveness of the 4MWT in older adults. Responsiveness is an important psychometric property as it determines whether the tool can capture change in mobility over time.
The primary aim of this study was to evaluate the psychometric properties of the 4MWT in Canadian community-dwelling older adults. The CLSA data set was used to evaluate the psychometric properties of the 4MWT since it is a large, comprehensive study that provides longitudinal data in a population-based sample of older Canadians (Raina et al., 2009). This study aimed to address the following objectives:
1. To assess the construct validity (convergent and known-groups validity) of the 4MWT for accurately reflecting mobility in Canadian community-dwelling older adults.
2. To determine the predictive validity of the 4MWT for healthcare utilization at 3 years in Canadian community-dwelling older adults.
3. To assess the ability of the 4MWT to detect changes in mobility over time (responsiveness) in Canadian community-dwelling older adults.
Moreover, to ensure appropriateness of scores for both males and females and different age groups, the secondary aim of this study was to evaluate the psychometric properties of the 4MWT in Canadian community-dwelling older adults by sex and age subgroups.
Research design and methods
Sample
Secondary data analyses of baseline and first follow-up data from the CLSA comprehensive cohort were performed to address the objectives of this study. Only data from individuals who were ≥ 65 years and completed the 4MWT at baseline were examined. The CLSA is a large, nationwide, longitudinal study of around 50,000 people between 45 and 85 years residing in Canada, who are being tracked prospectively for various outcomes over 20 years (Raina et al., 2009, 2019). The CLSA included individuals who were community-dwelling at baseline, spoke English or French, and provided consent (Raina et al., 2019). Those who had cognitive impairments, were living on First Nations reserves, were full-time members of the Canadian Armed Forces, or were living in institutions (e.g., long-term care) were excluded (Raina et al., 2019). The CLSA consists of two cohorts: the comprehensive (n = ~30,000) and tracking (n = ~20,000) cohorts. Physical tests including the 4MWT were administered in the comprehensive cohort; therefore, only data from this cohort were examined. Baseline data collection was completed between 2011 and 2015, and first follow-up data collection was completed between 2015 and 2018. For objective 1, cross-sectional analyses of the baseline data were conducted, and for objectives 2 and 3, longitudinal analyses of the baseline and first follow-up data (i.e., 3-year follow-up) were conducted. Individuals with incomplete or missing data were excluded. Ethics approval was received from the Hamilton Integrated Research Ethics Board (#15460).
Outcomes
The primary outcome measure was the 4MWT collected at baseline and first follow-up. Data collection was performed at a data collection site (11 sites across Canada) by research staff using a standardized procedure (Raina et al., 2019; Timed (4-metre) Walk Test, 2014). Participants were instructed to walk a 4-metre distance at their usual pace from a static start (no acceleration zone) with no deceleration and were timed by an assessor using a stopwatch. The participant was instructed to start with their feet behind the starting line; the timer was started after the staff member said ‘Ready, Set, Go’ and stopped when the participant completely crossed the 4-metre finish line. Scores were reported in seconds but converted to metres per second for this analysis to represent walking speed.
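The conversion from a timed score to walking speed can be sketched as follows. This is a minimal illustration rather than the study's own code (the analyses were run in Stata); the function name and the stopwatch times are hypothetical, and only the 4-metre distance is taken from the protocol described above.

```python
# Minimal sketch: convert a timed 4-metre walk (seconds) to walking speed (m/s).
# The function name and example times are hypothetical, not CLSA data.

WALK_DISTANCE_M = 4.0  # course length stated in the test protocol


def walking_speed(time_s: float, distance_m: float = WALK_DISTANCE_M) -> float:
    """Return walking speed in metres per second for a timed walk."""
    if time_s <= 0:
        raise ValueError("time must be a positive number of seconds")
    return distance_m / time_s


# Hypothetical stopwatch times in seconds.
times = [4.0, 5.0, 8.0]
speeds = [round(walking_speed(t), 2) for t in times]
print(speeds)  # [1.0, 0.8, 0.5]
```

Because speed is the reciprocal of time scaled by distance, longer times map to lower speeds, which is why correlations with other timed tests such as the TUG are expected to be negative.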
Outcome measures used to assess convergent validity and responsiveness were the Timed-Up and Go (TUG) (Podsiadlo & Richardson, 1991), the Standing Balance test (Measuring Standing Balance, 2014), the Life-Space Assessment (LSA) (Peel et al., 2005), and the Basic ADL and Instrumental ADL (IADL) questionnaire modified from the Older Americans’ Resources and Services (OARS) Multidimensional Assessment Questionnaire (Fillenbaum, 2013). Baseline assessments were used to evaluate convergent validity, and change scores between baseline and first follow-up (i.e., 3 years) were used to assess responsiveness. The TUG is a reliable and valid performance-based measure for older adults that assesses mobility and balance (Podsiadlo & Richardson, 1991; Spagnuolo et al., 2010). Participants are instructed to stand up from a chair, walk 3 metres, turn 180 degrees, walk back, and sit down (Podsiadlo & Richardson, 1991). The TUG is scored in seconds, starting from when the participant stands up and stopping when they sit down. The standing balance test is a measure of balance that is highly correlated with age (Bohannon et al., 1984). Participants are asked to hold their balance on one foot for as long as possible and then repeat on the other foot (Measuring Standing Balance, 2014). The test is scored in seconds up to a maximum of 60 seconds, with the timer stopping when the participant loses their balance. The best time (between left and right leg) was considered for this analysis.
The LSA is a psychometrically sound measure of life-space mobility (i.e., the extent of movement within one’s environment) in community-dwelling older adults (Kuspinar et al., 2023). The LSA consists of five life-space levels, from inside one’s home to outside of one’s town, with scoring that accounts for frequency of visits and assistance (Peel et al., 2005). The LSA is scored from 0 to 120, with higher scores indicating more life-space mobility (Peel et al., 2005). The modified OARS questionnaire used in the CLSA consists of 41 items covering ADLs and IADLs, with a derived total score ranging from 1 (no functional impairment) to 5 (total impairment), where meal preparation is given extra weighting.
For known-groups validity, history of falls (responding yes/no to ‘In the past 12 months, did you have any falls?’) and use of assistive devices at baseline were examined. Use of assistive devices was dichotomized into users and non-users (in the past 12 months) by considering participants’ responses to using the following devices: cane/walking stick, wheelchair, motorized scooter, walker, or leg braces. For predictive validity, healthcare utilization outcomes, such as HCP contact (family doctor, medical specialist, and rehabilitation specialist), emergency department visit, and hospitalization (i.e., hospital stay overnight) in the past 12 months, at 3 years were examined.
Analysis
Descriptive statistics (mean [standard deviation], median [interquartile range], or frequency [percentage]) were used to summarize sample characteristics and outcomes. Subgroup analyses were performed for sex (males and females) and age (65–74 years and 75+ years) groups, and data were stratified based on baseline sex and age. To determine convergent validity and responsiveness of the 4MWT, Pearson’s correlation coefficients were computed if data were normally distributed and Spearman’s correlation coefficients if data were not normally distributed. For convergent validity, high correlations of at least 0.50 were expected with the TUG as both the 4MWT and TUG are performance-based measures involving walking, and moderate correlations of 0.30–0.50 were expected with the standing balance test, the LSA, and the modified OARS as they measure related but dissimilar constructs (Mokkink et al., 2018). Correlations below 0.30 were considered to be low, indicating unrelated and dissimilar constructs. For responsiveness, correlations among change scores were expected to be 0.10 units less than those hypothesized for convergent validity to account for measurement error from multiple administrations (i.e., high correlations of at least 0.40 for TUG and moderate correlations of 0.20–0.40 for the other comparators, with low correlations considered to be <0.20) (De Vet et al., 2011). Tables 3 and 6 provide hypotheses for correlations, including the direction and magnitude of the correlation. To determine known-groups validity, the statistical significance of differences between groups was evaluated with independent t-tests if data were normally distributed and the Wilcoxon test if data were not normally distributed.
Using minimal important change values for the 4MWT (Perera et al., 2006), a difference of at least 0.1 m/s was expected between groups to indicate evidence for known-groups validity. To determine predictive validity, Receiver Operating Characteristic curves were computed and areas under the curve (AUC) of at least 0.70 (on a 0–1 scale) were expected for acceptable discrimination (Mokkink et al., 2018). AUC values below 0.70 were considered to indicate inadequate discrimination, with 0.50 indicating discrimination no better than chance. Statistical significance was set at a p-value less than 0.05. Stata, version 15.1 (StataCorp, College Station, TX, USA), was used to perform the analyses.
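To make the AUC criterion concrete, the sketch below computes an AUC as the proportion of event/non-event pairs in which the event case has the higher risk score, which is equivalent to the area under the ROC curve. It is an illustration only: the study's analyses were performed in Stata, the function name and data are hypothetical, and negated walking speed is used as the risk score on the assumption that slower walking indicates higher risk.

```python
# Minimal sketch of an AUC computed by pairwise comparison (ties count as 0.5).
# Function name and data are hypothetical; this is not the study's Stata code.


def auc(risk_event, risk_no_event):
    """Probability that a random event case has a higher risk score
    than a random non-event case."""
    wins = 0.0
    for e in risk_event:
        for n in risk_no_event:
            if e > n:
                wins += 1.0
            elif e == n:
                wins += 0.5
    return wins / (len(risk_event) * len(risk_no_event))


# Hypothetical baseline walking speeds (m/s) by outcome at follow-up.
speed_hospitalized = [0.6, 0.8, 0.9]
speed_not_hospitalized = [0.7, 1.0, 1.1, 1.2]

# Assumption: slower speed means higher risk, so negate speed to get a risk score.
value = auc([-s for s in speed_hospitalized],
            [-s for s in speed_not_hospitalized])

ACCEPTABLE_AUC = 0.70  # a priori threshold stated in the Analysis section
print(round(value, 2), value >= ACCEPTABLE_AUC)  # 0.83 True
```

In this toy example, 10 of the 12 pairs are ordered correctly, giving 0.83. The values reported in the Results (0.51 to 0.59) would fail the same threshold check.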
Sample size
The CLSA sample size for participants ≥ 65 years in the comprehensive cohort is approximately 12,000. Consensus-based Standards for the selection of health Measurement Instruments (COSMIN) guidelines recommend larger samples (>100) for validation studies as confidence in the statistical values (e.g., correlations, AUCs) increases with larger samples (De Vet et al., 2011). Therefore, a sample of ~12,000 allows for stratified analyses and confidence in results.
Results
Sociodemographic characteristics
The number of participants aged 65 years and older in the cohort was 12,646. However, only 12,433 participants had baseline 4MWT scores and thus were included in the analysis for convergent and known-groups validity. Follow-up 4MWT data were available for 10,107 of those participants, who were therefore included in the analyses of predictive validity and responsiveness. Most participants were white (96%), were married (63%), had completed postsecondary education (72%), and reported a household income over $20,000 (93%). Sociodemographic information of included participants can be found in Table 1 and Supplementary Tables 1 and 2.
Table 1. Sociodemographic characteristics for entire sample (n = 12,433)

a Reported in median (IQR).
4MWT distribution
Table 2 displays the distribution of measure scores for convergent validity and responsiveness (subgroup distributions can be found in Supplementary Tables 3 and 4). Baseline 4MWT mean scores were 0.91 (0.20) m/s for the entire sample, 0.95 (0.19) m/s for those between 65 and 74 years (n = 7,244), 0.86 (0.19) m/s for those 75 years and older (n = 5,189), 0.89 (0.20) m/s for females (n = 6,197), and 0.93 (0.20) m/s for males (n = 6,236).
Table 2. Distribution of the 4MWT and comparator measures for convergent validity and responsiveness for entire sample

Note: The scores reported are for participants with baseline 4MWT scores.
4MWT = 4-metre walk test; LSA = Life Space Assessment; OARS = Older Americans’ Resources and Services questionnaire; SD = standard deviation; TUG = Timed-Up and Go.
a Reported in median (IQR).
b Scores included as part of the responsiveness analysis (i.e., participants with both baseline and follow-up scores).
Construct validity
Table 3 outlines convergent validity results for the entire sample (Supplementary Tables 5 and 6 display subgroup results). The correlation between the 4MWT and the TUG was rho = −0.72 for the entire sample, ranging from −0.72 to −0.68 for sex and age subgroups. The correlation between the 4MWT and the standing balance test was rho = 0.32 for the entire sample. When analysed by subgroup, the correlation was above 0.30 for females but below 0.30 for both age groups and for males. Correlations between the 4MWT and both the LSA and the modified OARS questionnaires were below the expected 0.30 cut-off for the entire sample (rho = 0.25 and −0.26, respectively) and all subgroups. For convergent validity, 50% of hypotheses were met for the entire sample and the female subgroup, while 25% were met for other subgroups.
Table 3. Convergent validity results of the 4MWT for entire sample

LSA, Life Space Assessment; OARS, Older Americans’ Resources and Services questionnaire; TUG, Timed-Up and Go.
+, hypothesis met; −, hypothesis not met.
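The convergent validity criteria can be illustrated with a short sketch that computes Spearman's rho (Pearson's correlation of average ranks) and labels its magnitude using the cut-offs from the Analysis section. The function names and toy data are hypothetical, and treating exactly 0.50 as "high" is an assumption about a boundary the text leaves open; the study itself used Stata.

```python
# Minimal sketch: Spearman's rho and magnitude labels for convergent validity.
# Function names and toy data are hypothetical; cut-offs follow the Analysis section.


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson's correlation applied to average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def strength(rho):
    """Label the magnitude of a correlation, ignoring its direction."""
    magnitude = abs(rho)
    if magnitude >= 0.50:  # assumption: 0.50 itself counts as high
        return "high"
    if magnitude >= 0.30:
        return "moderate"
    return "low"


# Toy data: faster walkers complete the TUG in less time.
speed_m_per_s = [1.2, 1.0, 0.9, 0.7, 0.5]
tug_seconds = [8, 9, 11, 12, 15]
print(round(spearman(speed_m_per_s, tug_seconds), 6))  # -1.0

# Labels for the entire-sample correlations reported in the text.
for name, rho in [("TUG", -0.72), ("Standing balance", 0.32),
                  ("LSA", 0.25), ("Modified OARS", -0.26)]:
    print(name, strength(rho))
```

Applied to the reported values, the labels are high for the TUG, moderate for standing balance, and low for the LSA and modified OARS, which matches the two of four hypotheses (50%) met for the entire sample.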
Table 4 outlines known-groups validity results for the entire sample (Supplementary Tables 7 and 8 display subgroup results). Although 4MWT scores differed significantly both between fallers and non-fallers and between assistive device users and non-users, the difference was greater than 0.1 m/s only for the assistive device comparison (0.15 m/s for the entire sample and 0.13–0.17 m/s for subgroups). For known-groups validity, 50% of hypotheses were met for the entire sample and all subgroups. Overall, 50% of results supported the construct validity of the 4MWT, with 33% to 50% of results supporting construct validity within different sex and age subgroups.
Table 4. Known-groups validity results of the 4MWT for entire sample

SD, Standard Deviation.
+, hypothesis met; −, hypothesis not met.
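The known-groups criterion can be sketched as a comparison of the between-group difference against the 0.1 m/s threshold. This is an illustration under stated assumptions: it compares group means, it omits the significance test that the study also applied, and the function name and speeds are hypothetical rather than CLSA data.

```python
# Minimal sketch: is the between-group difference in walking speed at least 0.1 m/s?
# Function name and data are hypothetical; the significance test is not shown.
from statistics import mean

MIC_M_PER_S = 0.1  # minimal important change used as the a priori threshold


def known_groups_difference(group_a, group_b, threshold=MIC_M_PER_S):
    """Return the absolute difference in mean speed and whether it meets the threshold."""
    diff = abs(mean(group_a) - mean(group_b))
    return diff, diff >= threshold


# Hypothetical walking speeds (m/s).
non_users = [1.0, 0.9, 0.95]
device_users = [0.8, 0.75, 0.85]
non_fallers = [0.95, 0.9, 1.0]
fallers = [0.9, 0.85, 0.95]

diff_devices, met_devices = known_groups_difference(non_users, device_users)
diff_falls, met_falls = known_groups_difference(non_fallers, fallers)
print(round(diff_devices, 2), met_devices)  # 0.15 True
print(round(diff_falls, 2), met_falls)      # 0.05 False
```

The toy numbers mirror the pattern in the Results: a difference can be statistically significant in a large sample and still fall short of the threshold for clinical importance.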
Predictive validity
Table 5 outlines predictive validity results for the entire sample (Supplementary Tables 9 and 10 display subgroup results). For the entire sample, AUC values ranged from 0.51 to 0.59 for all healthcare utilization outcomes. None of the a priori hypotheses (0%) were met for predictive validity, for either the entire sample or the subgroups, with all AUC values being less than 0.70.
Table 5. Predictive validity results of the 4MWT for entire sample

CI, confidence interval; ED, emergency department; SD, standard deviation.
+, hypothesis met; −, hypothesis not met.
Responsiveness
Table 6 outlines responsiveness results for the entire sample (Supplementary Tables 11 and 12 display subgroup results). The correlation between 4MWT and TUG change scores was rho = −0.44 for the entire sample, with similar results for sex and age subgroups. Correlations of change scores between the 4MWT and other measures, such as the standing balance test, LSA, and modified OARS, were below the expected 0.20 cut-off for the entire sample (rho = 0.01, 0.04, and −0.06, respectively) and all subgroups. Only hypotheses regarding correlations with the TUG were met for the entire sample and subgroups, resulting in 25% of results supporting responsiveness.
Table 6. Responsiveness results of the 4MWT for entire sample

LSA, Life Space Assessment; OARS, Older Americans’ Resources and Services questionnaire; TUG, Timed-Up and Go.
+, hypothesis met; −, hypothesis not met.
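The responsiveness tally can be reproduced with a brief sketch that forms change scores and checks each reported change-score correlation against its lowered cut-off. Only magnitude is checked here, which is a simplifying assumption because the hypothesized directions are given in Table 6; the names and the toy speeds are hypothetical.

```python
# Minimal sketch: change scores and the responsiveness hypothesis tally.
# Names and toy speeds are hypothetical; only correlation magnitude is checked.


def change_scores(baseline, follow_up):
    """Follow-up minus baseline for each participant."""
    return [f - b for b, f in zip(baseline, follow_up)]


# Toy walking speeds (m/s) for two participants.
print([round(c, 2) for c in change_scores([1.0, 0.9], [0.9, 0.9])])  # [-0.1, 0.0]

# Minimum expected magnitude per comparator (0.10 below the convergent cut-offs).
minimum_expected = {"TUG": 0.40, "Standing balance": 0.20,
                    "LSA": 0.20, "Modified OARS": 0.20}

# Entire-sample change-score correlations reported in the text.
observed_rho = {"TUG": -0.44, "Standing balance": 0.01,
                "LSA": 0.04, "Modified OARS": -0.06}

met = {name: abs(observed_rho[name]) >= minimum_expected[name]
       for name in minimum_expected}
percent_met = 100 * sum(met.values()) / len(met)
print(met["TUG"], percent_met)  # True 25.0
```

One of four hypotheses is met, which corresponds to the 25% reported for responsiveness.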
Discussion and implications
Findings from this psychometric study suggest partial support for the construct validity and responsiveness of the 4MWT in Canadian older adults, with no support for its predictive validity for healthcare utilization. These findings were largely consistent across age and sex subgroups. Results from this study provide important evidence to inform the use of the 4MWT in research and clinical practice.
The construct validity of the 4MWT was evaluated to explore its accuracy in Canadian older adults. Our findings demonstrated moderate to high correlations (>0.30) with other performance-based mobility measures, such as the TUG and standing balance, and lower correlations (<0.30) with self-report measures of mobility and function. Therefore, the 4MWT reflects the construct of locomotor capacity better than self-report mobility. Both self-report measures in this study included concepts outside of one’s walking ability, which may explain the lower correlations. For example, the LSA and the modified OARS questionnaires only include 1–2 items regarding one’s ability to walk (Fillenbaum, 2013; Peel et al., 2005). European-based studies in older adults found a correlation of rho = 0.41 between walking speed and LSA (Ullrich et al., 2019) and r = 0.39–0.63 between walking speed tests and measures of ADL and IADL (Fusco et al., 2012; Nybo et al., 2001). Our findings indicate that walking speed may not be fully reflective of constructs of ‘life-space mobility’ and ‘daily activities’ in Canadian older adults. Thus, inferences regarding older adults’ daily activities, such as housework or self-care, and the extent to which one moves within their environment should not be made from one’s walking speed score. Even though more hypotheses were met for the construct validity of the 4MWT in females compared to males, differences in correlations were within 0.10. Good-quality studies have demonstrated the validity of walking speed tests in accurately reflecting older adults’ mobility across different countries and in comparison with different health outcome measures (Mehdipour et al., 2024).
To inform its construct validity, known-groups validity of the 4MWT was also evaluated, providing information on its ability to discriminate between groups. We tested two hypotheses concerning differences between (1) fallers and non-fallers and (2) assistive device users and non-users. Although both differences were statistically significant, only the difference between assistive device users and non-users was clinically important. Studies evaluating walking speed in older adults have also found a clinically important difference of at least 0.1 m/s between unaided and aided (e.g., using a cane, crutches, or a walker) groups (Kingston et al., 2021; Weiss et al., 2008). Thus, clinical inferences regarding older adults’ need for assistive devices can be made through walking speed assessments. Although other studies have found a difference of at least 0.10 m/s between fallers and non-fallers, such as 0.13 m/s (Morita et al., 2005), the association between walking speed and falls has not been strongly supported either cross-sectionally or prospectively, with good-quality studies reporting AUCs less than 0.70 for known-groups validity (AUC = 0.69) (Middleton et al., 2016) and for predictive validity (AUC = 0.57–0.62) (Abolhassani et al., 2022; Beauchamp et al., 2022).
This was the first study to explore the 4MWT’s predictive validity for healthcare utilization in community-dwelling older Canadians and for HCP and emergency department visits in community-dwelling older adults more broadly (Mehdipour et al., 2024). Our findings did not meet any of the a priori hypotheses, and thus, the 4MWT failed to predict healthcare utilization at 3 years in community-dwelling older Canadians. Previous studies evaluating the predictive validity of walking speed tests, including the 4MWT, for hospitalization in community-dwelling older adults also found AUCs less than 0.70 (Abolhassani et al., 2022; Viccaro et al., 2011). Findings from the literature and our study suggest that walking speed may not have strong predictive validity for prospective hospitalization outcomes. Although the predictive validity of walking speed tests has not been previously tested for HCP and emergency department visits in community-dwelling older adults, studies have examined the association of walking speed with prospective emergency department or general practitioner visits in other older adult populations, such as primary care patients and cancer patients, and found that walking speed was not significantly associated with such outcomes (O’Hoski et al., 2020; Puts et al., 2010). It is important to note that healthcare utilization outcomes, such as hospitalization, were not exclusive to mobility-related injuries or conditions, which may explain the poor predictive performance. In contrast, the 4MWT has been found to be predictive of mobility-related outcomes such as frailty (AUC = 0.75 and 0.87) (Sutorius et al., 2016). Future studies can explore the predictive validity of the 4MWT in relation to mobility-related healthcare utilization.
This was the first study to examine the responsiveness of the 4MWT in a Canadian population. Our study only demonstrated support for the responsiveness of the 4MWT in community-dwelling older adults over time when compared to a physical measure of mobility that assessed walking (i.e., TUG), with correlations greater than 0.40. Correlations with balance and self-report measures of mobility and function ranged from 0.01 to 0.08. Therefore, only changes in physical mobility as it pertains to walking can be reflected by the 4MWT over 3 years. Only three other studies have evaluated the responsiveness of walking speed tests in community-dwelling older adults, with only one study reporting correlations of change scores with other health measures (Mehdipour et al., 2024). Mansson et al. (2022) reported correlations ranging from 0.10 to 0.37 between change scores on the 4MWT and physical tests and 0.01–0.26 with self-report measures of mobility in community-dwelling older adults.
Validity evidence is contextual and should be interpreted in relation not to the measure itself but to the context of testing (Messick, 1995). Not only should our findings be interpreted in relation to Canadian community-dwelling older adults, but they should also be interpreted in relation to the 4MWT protocol. Walking speed tests with acceleration and deceleration zones have been found to yield faster and more reliable speeds as they allow for speeding up and slowing down to capture one’s usual walking speed (Kim & Won, 2019; Mehdipour et al., 2024). Thus, validity findings might be different if the 4MWT is administered using a dynamic start/end protocol.
The strengths of this study include the use of a large Canadian sample with longitudinal data and the inclusion of an array of outcome measures. The range of outcomes collected in the CLSA enabled psychometric testing of the 4MWT against different health measures, providing more evidence to inform its use and the type of measurement inferences that can be made by HCPs (e.g., walking speed strongly reflecting older adults’ physical mobility).
As a secondary analysis of existing data, this study was constrained by a pre-existing data set and predetermined study design. Consequently, limitations included aspects of the study design (e.g., the absence of acceleration and deceleration zones for the 4MWT) and the availability of comparison measures used to test hypotheses for validity and responsiveness. Moreover, although the CLSA aimed to capture a sample representative of Canadian older adults, the majority of participants identified as white and had a postsecondary education. Also, speaking English and/or French was an eligibility criterion. Thus, findings cannot be generalized to older adults from more diverse social backgrounds. Furthermore, we recognize that our interpretations for construct validity and responsiveness were based on observed correlations and not the lower bound of the 95% confidence interval. However, this did not affect our overall conclusions as our confidence intervals were narrow given the large sample size in this study. Additionally, the 3-year follow-up may have influenced psychometric findings. For example, a previous study found higher AUC values for walking speed’s ability to predict hospitalizations when tested at 1 year, compared to 4 years (Abolhassani et al., 2022). With respect to our evaluation, AUC values for healthcare utilization may have been higher if it had been evaluated at 1 year rather than 3 years.
Using a large, population-based data set, we were able to demonstrate that the 4MWT reflected constructs of physical mobility cross-sectionally and longitudinally, thus providing evidence for its construct validity and responsiveness in community-dwelling older Canadians in relation to capturing the construct of locomotor capacity. Clinicians and researchers can use the 4MWT to track and monitor older adults’ locomotor capacity without making inferences on their daily life mobility, such as the extent of movement within their environment. Since we did not find evidence for its ability to predict healthcare utilization over 3 years in this population, analysts and policymakers should be cautioned against using the 4MWT to make long-term decisions regarding healthcare resources and costs for older Canadians.
Supplementary material
The supplementary material for this article can be found at http://doi.org/10.1017/S0714980826100592.
Data availability statement
The data underlying this article were provided by the Canadian Longitudinal Study on Aging and cannot be publicly shared. Data are available from the Canadian Longitudinal Study on Aging (www.clsa-elcv.ca) for researchers who meet the criteria for access to de-identified CLSA data.
Acknowledgements
This research was made possible using the data/biospecimens collected by the Canadian Longitudinal Study on Aging (CLSA). Funding for the Canadian Longitudinal Study on Aging (CLSA) is provided by the Government of Canada through the Canadian Institutes of Health Research (CIHR) under grant reference: LSA 94473 and the Canada Foundation for Innovation, as well as the following provinces, Newfoundland, Nova Scotia, Quebec, Ontario, Manitoba, Alberta, and British Columbia. This research has been conducted using the CLSA data set Baseline Comprehensive Dataset version 7.0 and Follow-up 1 Comprehensive Dataset version 4.0, under Application Number 2209021. The CLSA is led by Drs. Parminder Raina, Christina Wolfson, and Susan Kirkland.
Author contribution
A.M., M.B., J.R., and A.K. contributed to the study conceptualization and design. A.M. contributed to data analysis and interpretation. A.M. drafted and edited the manuscript and M.B., J.R., and A.K. provided critical revisions. A.K. was the supervising author on the project.
Competing interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Disclaimer
The opinions expressed in this manuscript are the authors’ own and do not reflect the views of the Canadian Longitudinal Study on Aging.