The aims of this study were to field and pilot test the Korean version of the Household Emergency Preparedness Instrument (K-HEPI) and perform psychometric testing of the instrument’s reliability and validity.
Methods
The English to Korean translation followed a symmetrical translation approach utilizing a decentered process (i.e., both the source and target languages were considered equally important), with a focus on the instrument remaining faithful to the content. After translation, the K-HEPI was field tested with 30 bilingual participants, all of whom reported that the instructions were easy to understand and that the items aligned closely with the original English version. The K-HEPI was then pilot tested with 399 Korean-speaking participants in a controlled before-after study utilizing a disaster preparedness educational intervention.
Results
Confirmatory factor analyses supported the K-HEPI retaining the factor structure of the original English version. The K-HEPI was also found to be psychometrically comparable to the original instrument.
Conclusions
The K-HEPI can validly and reliably assess the disaster preparedness of Korean-speaking populations, enabling clinicians, researchers, emergency management professionals, and policymakers to gather accurate data on disaster preparedness levels in Korean communities, identify gaps in preparedness, develop targeted interventions, and evaluate the effectiveness of disaster preparedness interventions over time.
We compare the Emory 10-item, 4-choice Rey Complex Figure (CF) Recognition task with the Meyers and Lange (M&L) 24-item yes/no CF Recognition task in a large cohort of healthy research participants and in patients with heterogeneous movement disorder diagnoses. While both tasks assess CF recognition, they differ in key aspects including the saliency of target and distractor responses, self-selection versus forced-choice formats, and the length of the item sets.
Participants and Methods:
There were 1056 participants from the Emory Healthy Brain Study (EHBS; average MoCA = 26.8, SD = 2.4) and 223 movement disorder patients undergoing neuropsychological evaluation (average MoCA = 24.3, SD = 4.0).
Results:
Both recognition tasks differentiated between healthy and clinical groups; however, the Emory task demonstrated a larger effect size (Cohen’s d = 1.02) compared to the M&L task (Cohen’s d = 0.79). d-prime scoring of M&L recognition showed comparable group discrimination (Cohen’s d = 0.81). Unidimensional two-parameter logistic item response theory analysis revealed that many M&L items had low discrimination values and extreme difficulty parameters, which contributed to the task’s reduced sensitivity, particularly at lower cognitive proficiency levels relevant to clinical diagnosis. Dimensionality analyses indicated the influence of response sets as a potential contributor to poor item performance.
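The d-prime scoring referenced above is the standard signal-detection index, d′ = z(hit rate) − z(false-alarm rate), applied to yes/no recognition. A minimal sketch of that computation (the function, counts, and log-linear correction below are illustrative assumptions, not taken from the study):

```python
from scipy.stats import norm

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Signal-detection d' = z(hit rate) - z(false-alarm rate).

    A log-linear correction (add 0.5 to each cell) keeps the rates
    away from 0 and 1 so the z-transform stays finite.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)

# Hypothetical counts for a yes/no recognition trial
print(round(d_prime(hits=10, misses=2, false_alarms=3, correct_rejections=9), 2))
```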
Conclusions:
The Emory CF Recognition task demonstrates superior psychometric properties and greater sensitivity to cognitive impairment compared to the M&L task. Its ability to more precisely measure lower levels of cognitive functioning, along with its brevity, suggests it may be more effective for diagnostic use, especially in clinical populations with cognitive decline.
The purpose of this study was to measure meal quality in representative samples of schoolchildren in three cities located in different Brazilian regions using the Meal and Snack Assessment Quality (MESA) scale and to examine its associations with weight status, socio-demographic characteristics and behavioural variables. This cross-sectional study analysed data on 5612 schoolchildren aged 7–12 years who resided in cities in Southern, Southeastern and Northeastern Brazil. Dietary intake was evaluated using the WebCAAFE questionnaire. Body weight and height were measured to calculate the BMI. Weight status was classified based on age- and sex-specific Z-scores. Meal quality was measured using the MESA scale. Associations of meal quality with weight status and socio-demographic and behavioural variables were investigated using multinomial regression analysis. Schoolchildren in Feira de Santana, São Paulo and Florianópolis had a predominance of healthy (41·8 %), mixed (44·4 %) and unhealthy (42·7 %) meal quality, respectively. There was no association with weight status. Schoolchildren living in Feira de Santana, those who reported weekday dietary intakes, and those with lower physical activity and screen activity scores showed higher meal quality. Schoolchildren aged 10–12 years, those who reported weekend dietary intakes, and those with higher screen activity scores exhibited lower meal quality.
To assess for differences in low score frequency on cognitive testing amongst older adults with and without a self-reported history of traumatic brain injury (TBI) in the National Alzheimer’s Coordinating Center (NACC) dataset.
Method:
The sample included adults aged 65 or older who completed the Uniform Data Set 3.0 neuropsychological test battery (N = 7,363) and was divided into groups with and without a history of TBI, as well as by cognitive status as measured by the CDR. We compared the TBI− and TBI+ groups on the prevalence of low scores obtained across testing. Three scores falling at or below the 2nd percentile, or four scores at or below the 5th percentile, were the criteria for an atypical number of low scores. Nonparametric tests assessed associations among low score prevalence, demographics, symptoms of depression, and TBI history.
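The low-score criteria above can be applied mechanically once each test score is expressed as a normative percentile rank. A minimal sketch of that rule (the function name and example profile are hypothetical; the thresholds simply restate the criteria):

```python
def atypical_low_scores(percentiles):
    """Return True if a profile of normative percentile ranks meets the
    low-score criteria: >= 3 scores at or below the 2nd percentile, or
    >= 4 scores at or below the 5th percentile."""
    n_at_2nd = sum(p <= 2 for p in percentiles)
    n_at_5th = sum(p <= 5 for p in percentiles)
    return n_at_2nd >= 3 or n_at_5th >= 4

# Hypothetical profile of percentile ranks across a test battery
profile = [34, 4, 1, 2, 55, 12, 5, 67, 1, 40]
print(atypical_low_scores(profile))  # True: three scores fall at or below the 2nd percentile
```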
Results:
Among cognitively normal participants (CDR = 0), older age, male sex and greater levels of depression were associated with low score frequency; among participants with mild cognitive impairment (CDR = 0.5-1), greater levels of depression, shorter duration of time since most recent TBI, and no prior history of TBI were associated with low score frequency.
Conclusions:
Participants with and without a history of TBI largely produced low scores on cognitive testing at similar frequencies. Cognitive status, sex, education, depression, and TBI recency showed variable associations with the number of low scores within subsamples. Future research that includes more comprehensive TBI history is indicated to characterize factors that may modify the association between low scores and TBI history.
People with an intellectual disability are vulnerable to additional disorders such as dementia. Psychometrically sound and specific instruments are needed for assessment of cognitive functioning in cases of suspected dementia.
Aims
To evaluate the construct and item validity, internal consistency and test–retest reliability of a new neuropsychological test battery, the Dementia Test for People with Intellectual Disability (DTIM).
Method
The DTIM was applied to 107 individuals with intellectual disability with (n = 16) and without (n = 91) dementia. The psychometric properties of the DTIM were assessed in a prospective study. The assessors were blinded to the diagnostic assignment.
Results
Confirmatory factor analysis at the scale level showed that a one-factor model fitted the data well (root mean square error of approximation < 0.06, standardised root mean square residual < 0.08, comparative fit index > 0.9). At the domain level, one-factor models showed reasonable-to-good fit indices for five of seven domains. Internal consistency indicated excellent reliability of the overall scale (Cronbach’s α: 0.94 for dementia and 0.95 for controls). Item analysis revealed a wide range of difficulties (0.19–0.75 for dementia, 0.31–0.87 for controls), with minimal floor and ceiling effects. Eleven items (26%) had discrimination values ≤ 0.50. Test–retest reliability (n = 82) was high, with intraclass correlations of 0.95 (total score) and 0.69–0.96 (domains).
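The item difficulty, item discrimination, and Cronbach's α values reported here are standard classical-test-theory quantities that can be computed directly from a persons-by-items score matrix. A minimal sketch with hypothetical dichotomous data (not the DTIM scoring code; difficulty is taken as the proportion correct and discrimination as the corrected item-total correlation):

```python
import numpy as np

def item_analysis(scores):
    """Classical item analysis for a persons x items matrix of 0/1 scores.

    Returns item difficulty (proportion correct), corrected item-total
    discrimination (each item vs. the total of the remaining items),
    and Cronbach's alpha for the full scale.
    """
    scores = np.asarray(scores, dtype=float)
    n_items = scores.shape[1]
    difficulty = scores.mean(axis=0)
    total = scores.sum(axis=1)
    discrimination = np.array([
        np.corrcoef(scores[:, j], total - scores[:, j])[0, 1]
        for j in range(n_items)
    ])
    item_var = scores.var(axis=0, ddof=1)
    alpha = (n_items / (n_items - 1)) * (1 - item_var.sum() / total.var(ddof=1))
    return difficulty, discrimination, alpha

# Hypothetical responses: 6 persons x 4 dichotomous items
data = [[1, 1, 0, 1],
        [1, 0, 0, 0],
        [1, 1, 1, 1],
        [0, 0, 0, 1],
        [1, 1, 1, 1],
        [0, 1, 0, 0]]
difficulty, discrimination, alpha = item_analysis(data)
print(difficulty, discrimination, round(alpha, 2))
```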
Conclusions
The DTIM fits a one-factor model and demonstrates internal and test–retest reliability; thus, it is suitable for use in cases of suspected dementia in people with various intellectual disabilities.
Measurement is an essential activity in neurology, as it allows for collecting and sharing data that can be used for description, comparison and decision making regarding the health status of patients. The adequate assessment of motor and functional signs and symptoms of movement disorders must be performed with instruments that have been developed and tested following a standardized methodology. The validation of a scale or instrument is an iterative process that includes several phases and the testing of a number of psychometric properties following the principles of Classical Test Theory or Latent Trait Theory, each with its own methods and statistical procedures. In this chapter, we review the characteristics and psychometric properties of the main measurement instruments and scales for assessing motor and functional symptoms in movement disorders, particularly those recommended by the Movement Disorders Society.
This methodological study aimed to adapt the DLS, introduced for individuals aged 18-60 years, to those aged 60 years and older and to determine its psychometric properties.
Methods
We collected the data between December 15, 2021 and April 18, 2022. We carried out the study with a sample of adults aged 60 years and older living in the city center of Burdur, Turkey. The sample was selected using snowball sampling, a non-probability sampling technique. We collected the data using a questionnaire booklet comprising an 11-item demographic information form and the DLS. We utilized reliability and validity analyses in the data analysis. The analyses were performed in SPSS 23.0, and a P value < 0.05 was considered statistically significant.
Results
The mean age of the participants was 68.29 years (SD = 6.36). The 61-item measurement tool was reduced to 57 items after 4 items were removed from the scale. Cronbach’s α values were 0.936 for the mitigation/prevention subscale, 0.935 for the preparedness subscale, 0.939 for the response subscale, and 0.945 for the recovery/rehabilitation subscale.
Conclusions
As adapted in this study, the DLS-S can be validly and reliably used for individuals aged 60 years and older.
The last two decades have been marked by excitement for measuring implicit attitudes and implicit biases, as well as optimism that new technologies have made this possible. Despite considerable attention, this movement is marked by weak measures. Current implicit measures do not have the psychometric properties needed to meet the standards required for psychological assessment or necessary for reliable criterion prediction. Some of the creativity that defines this approach has also introduced measures with unusual properties that constrain their applications and limit interpretations. We illustrate these problems by summarizing our research using the Implicit Association Test (IAT) as a case study to reveal the challenges these measures face. We consider such issues as reliability, validity, model misspecification, sources of both random and systematic method variance, as well as unusual and arbitrary properties of the IAT’s metric and scoring algorithm. We then review and critique four new interpretations of the IAT that have been advanced to defend the measure and its properties. We conclude that the IAT is not a viable measure of individual differences in biases or attitudes. Efforts to prove otherwise have diverted resources and attention, limiting progress in the scientific study of racism and bias.
In this paper, I will review some aspects of psychometric projects that I have been involved in, emphasizing the nature of the work of the psychometricians involved, especially the balance between the statistical and scientific elements of that work. The intent is to seek to understand where psychometrics, as a discipline, has been and where it might be headed, in part at least, by considering one particular journey (my own). In contemplating this, I also look to psychometrics journals to see how psychometricians represent themselves to themselves, and in a complementary way, look to substantive journals to see how psychometrics is represented there (or perhaps, not represented, as the case may be). I present a series of questions in order to consider what the appropriate foci of the psychometric discipline are. As an example, I present one recent project at the end, where the roles of the psychometricians and the substantive researchers have had to become intertwined in order to make satisfactory progress. In the conclusion I discuss the consequences of such a view for the future of psychometrics.
This paper analyzes the theoretical, pragmatic, and substantive factors that have hampered the integration between psychology and psychometrics. Theoretical factors include the operationalist mode of thinking which is common throughout psychology, the dominance of classical test theory, and the use of “construct validity” as a catch-all category for a range of challenging psychometric problems. Pragmatic factors include the lack of interest in mathematically precise thinking in psychology, inadequate representation of psychometric modeling in major statistics programs, and insufficient mathematical training in the psychological curriculum. Substantive factors relate to the absence of psychological theories that are sufficiently strong to motivate the structure of psychometric models. Following the identification of these problems, a number of promising recent developments are discussed, and suggestions are made to further the integration of psychology and psychometrics.
Human abilities in perceptual domains have conventionally been described with reference to a threshold that may be defined as the maximum amount of stimulation which leads to baseline performance. Traditional psychometric links, such as the probit, logit, and t, are incompatible with a threshold as there are no true scores corresponding to baseline performance. We introduce a truncated probit link for modeling thresholds and develop a two-parameter IRT model based on this link. The model is Bayesian and analysis is performed with MCMC sampling. Through simulation, we show that the model provides for accurate measurement of performance with thresholds. The model is applied to a digit-classification experiment in which digits are briefly flashed and then subsequently masked. Using parameter estimates from the model, individuals’ thresholds for flashed-digit discrimination are estimated.
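For orientation, the standard two-parameter probit (normal-ogive) IRT model writes the probability of a correct response as Φ applied to a linear function of ability; a threshold link must additionally map all abilities at or below an item's threshold to baseline performance. The second equation below is one illustrative way to express such a truncation with baseline rate γ, not necessarily the authors' exact parameterization:

```latex
% Standard two-parameter probit (normal-ogive) IRT model
\[ P(X_{ij} = 1 \mid \theta_i) = \Phi\bigl(a_j(\theta_i - b_j)\bigr) \]

% Illustrative truncated variant: abilities with a_j(\theta_i - b_j) \le 0
% yield only the baseline (chance) rate \gamma
\[ P(X_{ij} = 1 \mid \theta_i) = \gamma + (1 - \gamma)\,
   \max\bigl\{0,\; 2\Phi\bigl(a_j(\theta_i - b_j)\bigr) - 1\bigr\} \]
```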
Borsboom (2006) attacks psychologists for failing to incorporate psychometric advances in their work, discusses factors that contribute to this regrettable situation, and offers suggestions for ameliorating it. This commentary applauds Borsboom for calling the field to task on this issue and notes additional problems in the field regarding measurement that he could add to his critique. It also chastises Borsboom for occasionally being unnecessarily pejorative in his critique, noting that negative rhetoric is unlikely to make converts of offenders. Finally, it exhorts psychometricians to make their work more accessible and points to Borsboom, Mellenbergh, and Van Heerden (2003) as an excellent example of how this can be done.
Educational assessment concerns inference about students' knowledge, skills, and accomplishments. Because data are never so comprehensive and unequivocal as to ensure certitude, test theory evolved in part to address questions of weight, coverage, and import of data. The resulting concepts and techniques can be viewed as applications of more general principles for inference in the presence of uncertainty. Issues of evidence and inference in educational assessment are discussed from this perspective.
A taxonomy of latent structure assumptions (LSAs) for probability matrix decomposition (PMD) models is proposed which includes the original PMD model (Maris, De Boeck, & Van Mechelen, 1996) as well as a three-way extension of the multiple classification latent class model (Maris, 1999). It is shown that PMD models involving different LSAs are actually restricted latent class models with latent variables that depend on some external variables. For parameter estimation a combined approach is proposed that uses both a mode-finding algorithm (EM) and a sampling-based approach (Gibbs sampling). A simulation study is conducted to investigate the extent to which information criteria, specific model checks, and checks for global goodness of fit may help to specify the basic assumptions of the different PMD models. Finally, an application is described with models involving different latent structure assumptions for data on hostile behavior in frustrating situations.
The Psychometric Society is “devoted to the development of Psychology as a quantitative rational science”. Engineering is often set in contradistinction with science; art is sometimes considered different from science. Why, then, juxtapose the words in the title: psychometric, engineering, and art? Because an important aspect of quantitative psychology is problem-solving, and engineering solves problems. And an essential aspect of a good solution is beauty—hence, art. In overview and with examples, this presentation describes activities that are quantitative psychology as engineering and art—that is, as design. Extended illustrations involve systems for scoring tests in realistic contexts. Allusions are made to other examples that extend the conception of quantitative psychology as engineering and art across a wider range of psychometric activities.
Designers rely on many methods and strategies to create innovative designs. However, design research often overlooks the personality and attitudinal factors influencing method utility and effectiveness. This article defines and operationalizes the construct of design mindset and introduces the Design Mindset Inventory (D-Mindset0.1), allowing us to measure it and to leverage statistical analyses to advance our understanding of its role in design. The inventory’s validity and reliability are evaluated by analyzing a large sample of engineering students (N = 473). Using factor analysis, we identified four underlying factors of D-Mindset0.1 related to the theoretical concepts: Conversation with the Situation, Iteration, Co-Evolution of Problem–Solution and Imagination. The latter part of the article finds statistically and theoretically meaningful relationships between design mindset and the three design-related constructs of sensation-seeking, self-efficacy and ambiguity tolerance. Ambiguity tolerance and self-efficacy emerge as positively correlated with design mindset. Sensation-seeking is significantly correlated only with subconstructs of D-Mindset0.1, showing both negative and positive correlations. These relationships lend validity to D-Mindset0.1 and, by drawing on previously established relationships between the three personality traits and specific behaviors, facilitate further investigations of what its subconstructs capture.
With the increased use of computer-based tests in clinical and research settings, assessing retest reliability and reliable change of NIH Toolbox-Cognition Battery (NIHTB-CB) and Cogstate Brief Battery (Cogstate) is essential. Previous studies used mostly White samples, but Black/African Americans (B/AAs) must be included in this research to ensure reliability.
Method:
Participants were B/AA consensus-confirmed healthy controls (HCs; n = 49) or adults with mild cognitive impairment (MCI; n = 34), aged 60–85 years, who completed the NIHTB-CB and Cogstate for laptop at two timepoints within 4 months. Intraclass correlations, the Bland-Altman method, t-tests, and the Pearson correlation coefficient were used. Cut scores indicating reliable change are provided.
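Reliable-change cut scores of this kind are commonly derived with the Jacobson–Truax reliable change index, RCI = (X₂ − X₁) / SEdiff, where SEdiff is built from the baseline SD and the test–retest reliability. A minimal sketch under that standard formulation (illustrative only; the values and the exact procedure used in the study are not assumed here):

```python
import math

def reliable_change_index(x1, x2, sd_baseline, retest_r):
    """Jacobson-Truax RCI: the change score divided by the standard error of
    the difference, SEdiff = sqrt(2) * SEM, with SEM = SD * sqrt(1 - r)."""
    sem = sd_baseline * math.sqrt(1 - retest_r)
    se_diff = math.sqrt(2) * sem
    return (x2 - x1) / se_diff

# Hypothetical composite scores at two timepoints
rci = reliable_change_index(x1=100, x2=92, sd_baseline=15, retest_r=0.85)
print(round(rci, 2), abs(rci) > 1.645)  # |RCI| > 1.645 is a common cutoff for reliable change
```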
Results:
NIHTB-CB composite reliability ranged from .81 to .93 (95% CIs [.37–.96]). The Fluid Composite demonstrated a significant difference between timepoints and was less consistent than the Crystallized Composite. Subtests were less consistent for MCIs (ICCs = .01–.89, CIs [−1.00–.95]) than for HCs (ICCs = .69–.93, CIs [.46–.92]). A moderate correlation was found for MCIs between timepoints and performance on the Total Composite (r = -.40, p = .03), Fluid Composite (r = -.38, p = .03), and Pattern Comparison Processing Speed (r = -.47, p = .006).
On Cogstate, HCs had lower reliability (ICCs = .47–.76, CIs [.05–.86]) than MCIs (ICCs = .65–.89, CIs [.29–.95]). Identification reaction time significantly improved between testing timepoints across samples.
Conclusions:
The NIHTB-CB and Cogstate for laptop show promise for use in research with B/AAs and were reasonably stable up to 4 months. Still, differences were found between those with MCI and HCs. It is recommended that race and cognitive status be considered when using these measures.
Adequate measurement of psychological phenomena is a fundamental aspect of theory construction and validation. Forming composite scales from individual items has a long and honored tradition, although, for predictive purposes, the power of using individual items should be considered. We outline several fundamental steps in the scale construction process, including (1) choosing between prediction and explanation; (2) specifying the construct(s) to measure; (3) choosing items thought to measure these constructs; (4) administering the items; (5) examining the structure and properties of composites of items (scales); (6) forming, scoring, and examining the scales; and (7) validating the resulting scales.
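The observation that individual items can carry predictive power beyond a unit-weighted composite is easy to examine empirically. A minimal sketch comparing the two predictive strategies on simulated data (the data-generating model and all names below are illustrative assumptions, not from the chapter):

```python
import numpy as np

# Hypothetical data: 8 items and an external criterion with item-specific weights
rng = np.random.default_rng(0)
items = rng.normal(size=(200, 8))
criterion = items @ rng.normal(size=8) + rng.normal(0.0, 1.0, size=200)

def r_squared(X, y):
    """R^2 of an ordinary least squares fit of y on X (with intercept)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - resid.var() / y.var()

composite = items.mean(axis=1, keepdims=True)  # unit-weighted composite scale
print(round(r_squared(composite, criterion), 2))  # prediction from the composite
print(round(r_squared(items, criterion), 2))      # prediction from individual items
```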
In this chapter we review advanced psychometric methods for examining the validity of self-report measures of attitudes, beliefs, personality style, and other social psychological and personality constructs that rely on introspection. The methods include confirmatory-factor analysis to examine whether measurements can be interpreted as meaningful continua, and measurement invariance analysis to examine whether items are answered the same way in different groups of people. We illustrate the methods using a measure of individual differences in openness to political pluralism, which includes four conceptual facets. To understand how the facets relate to the overall dimension of openness to political pluralism, we compare a second-order factor model and a bifactor model. We also check to see whether the psychometric patterns of item responses are the same for males and females. These psychometric methods can both document the quality of obtained measurements and inform theorists about nuances of their constructs.
This study evaluated the validity and reliability of the Persian version of the Disaster Resilience Measuring Tool (DRMT-C19).
Methods
The research was a methodological, psychometric study. Standard translation processes were performed. Face validity and content validity were determined along with construct and convergent validity. To determine the final version of the questionnaire, 483 health care rescuers were selected using a consecutive sampling method. Other resilience-related questionnaires were used to assess concurrent validity. All quantitative data analyses were conducted using SPSS 22 and Jamovi 2.3.28 software.
Results
Content validity and reliability were supported: the scale content validity ratio (S-CVR) was 0.92 and the scale content validity index (S-CVI) was 0.93. The comprehensiveness of the measurement tool was 0.875. Cronbach’s α was 0.89, and test–retest reliability, assessed using intraclass correlation coefficients (ICC), ranged from 0.68 to 0.92. Exploratory factor analysis identified 4 factors, which accounted for more than 58.54% of the variance among the items. Confirmatory factor analysis determined 12 factors. The concurrent validity between the DRMT-C19 and the Connor-Davidson Resilience Scale (CD-RISC) was r = 0.604 (P ≤ 0.0001).
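For reference, the content validity ratio and content validity index reported above are conventionally computed with Lawshe's formula and by averaging item-level indices; the scale-level values are then summaries across items. The definitions below are the standard ones and may differ in detail from the authors' exact procedure:

```latex
% Lawshe's content validity ratio for one item, rated by N experts,
% n_e of whom judge the item "essential"
\[ \text{CVR} = \frac{n_e - N/2}{N/2} \]

% Item-level content validity index (proportion of experts rating the item
% relevant) and the scale-level average across k items
\[ \text{I-CVI} = \frac{n_{\text{relevant}}}{N}, \qquad
   \text{S-CVI}_{\text{Ave}} = \frac{1}{k}\sum_{j=1}^{k} \text{I-CVI}_j \]
```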
Conclusions
The DRMT-C19 has satisfactory psychometric properties and is a valid, reliable, and valuable tool for assessing resilience against disasters in Iran’s Persian-speaking health care rescuers.