Parallel and identical test–retest reliability of the Tower of London test – Freiburg version

Published online by Cambridge University Press: 12 December 2022

Valentin Schyle, Lena V. Schumacher, Benjamin Rahm and Josef M. Unterrainer*
Affiliation (all authors): Institute of Medical Psychology and Medical Sociology, Faculty of Medicine, University of Freiburg, Germany
*Corresponding author: Josef M. Unterrainer, email: josef.unterrainer@mps.uni-freiburg.de

Abstract

Objectives:

The Tower of London – Freiburg version (TOL-F) was developed in three parallel-test versions (A, B, and C) that differ only in their physical appearance (interchanged ball colors), not in their cognitive demands. We asked whether the test–retest reliability of an identical problem set differs from the parallel test–retest reliability of a structurally identical problem set with a marginally different physical appearance.

Methods:

Reliabilities were assessed in two samples of young adults over a 1-week interval. In the parallel test–retest sample (n = 93; 49 female), half of the participants completed version A in the first session and version B in the second, while the other half started with version B and continued with version A. In the identical test–retest sample (n = 86; 48 female), half of the participants performed version A in both sessions, while the other half performed version B in both sessions.
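The counterbalanced assignment described above can be sketched as follows. This is a minimal illustration only: the abstract does not state how participants were allocated to orders, so the alternating scheme and the function name `assign_versions` are hypothetical. With an odd sample size such as n = 93, an exact 50/50 split is impossible, and this sketch simply gives the extra participant to the first order.

```python
from typing import List, Tuple


def assign_versions(n: int, design: str) -> List[Tuple[str, str]]:
    """Return a (session 1, session 2) TOL-F version pair for each of n participants.

    Hypothetical alternating allocation, for illustration only:
      'parallel'  -> half A then B, half B then A
      'identical' -> half A then A, half B then B
    For odd n, the first order receives one extra participant.
    """
    if design == "parallel":
        orders = [("A", "B"), ("B", "A")]
    elif design == "identical":
        orders = [("A", "A"), ("B", "B")]
    else:
        raise ValueError(f"unknown design: {design!r}")
    # Alternate between the two orders so they stay balanced as n grows.
    return [orders[i % 2] for i in range(n)]


# Usage with the sample sizes reported in the abstract.
schedule = assign_versions(93, "parallel")
print(schedule.count(("A", "B")), schedule.count(("B", "A")))  # 47 46
```

In a real study, participants would more likely be randomized to orders; alternation is used here only because it makes the balance easy to see.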

Results:

For overall planning accuracy, intraclass correlation coefficients (ICCs) for absolute agreement were .501 in the parallel test–retest sample and .605 in the identical test–retest sample, with Pearson correlations of r = .559 and r = .708, respectively. Greatest lower bound (GLB) estimates of reliability were adequate to high in both samples (ranging from .765 to .854), confirming previous studies.
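The Pearson correlations above exceed the absolute-agreement ICCs. One way such a gap can arise is a systematic shift in mean scores between sessions. Pearson r ignores a uniform change in level, whereas an absolute-agreement ICC counts it as disagreement. The sketch below illustrates this under stated assumptions. It uses the two-way, single-measure ICC for absolute agreement (often written ICC(A,1)) for k = 2 sessions. The abstract does not specify which ICC form the authors used, and the scores are hypothetical toy values, not study data. The GLB is not computed here.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def icc_absolute_agreement(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-way, single-measure ICC for absolute agreement (assumed form),
    with participants as rows and k = 2 sessions as columns."""
    n, k = len(x), 2
    rows = list(zip(x, y))  # one row of (session 1, session 2) per participant
    grand = (sum(x) + sum(y)) / (n * k)
    row_means = [(a + b) / k for a, b in rows]
    col_means = [sum(x) / n, sum(y) / n]
    # Two-way ANOVA sums of squares: participants, sessions, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in rows for v in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    # The session (column) variance enters the denominator, so a mean shift lowers the ICC.
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical toy scores: same rank order, every score one point higher in session 2.
session1 = [1, 2, 3, 4]
session2 = [2, 3, 4, 5]
print(round(pearson_r(session1, session2), 3))               # 1.0
print(round(icc_absolute_agreement(session1, session2), 3))  # 0.769
```

With perfectly preserved rank order, Pearson r is 1.0, while the absolute-agreement ICC drops to 10/13 because of the one-point shift in level.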

Conclusions:

Although the TOL-F yielded only moderate intraclass correlations for absolute agreement, it showed some of the highest psychometric indices when compared with repeated assessments of other TOL tests.

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
Copyright © INS. Published by Cambridge University Press, 2022
Table 1. Descriptive and inferential statistics for sample characteristics: demographic information and scores on tests of depressive symptoms and of fluid and crystallized intelligence

Figure 1. Overall planning accuracy (%) in sessions 1 and 2 for both samples (gray bars, parallel test–retest sample; beige bars, identical test–retest sample). Error bars denote the standard error of the mean.

Table 2. Descriptive statistics of the TOL-F for overall planning accuracy