This chapter offers readers a transparent view into the research methodology used to investigate mathematics anxiety and to assess the impact of a targeted pedagogical intervention on students’ reported anxiety and attitudes towards statistics and quantitative research methods. It provides a detailed account of the research participants, ethical considerations, and the mixed-methods approach employed. The chapter also critiques the validity, reliability, and trustworthiness of the research design and findings, ensuring methodological rigour. A candid discussion of the study’s limitations further strengthens its credibility. It is essential reading for educators, researchers, and anyone committed to evidence-based improvements in mathematics education.
The chapter discusses the creation and maintenance of databases offering accurate, research-ready data for multidisciplinary use. It draws on experience with the IUROPA CJEU Database Project (IUROPA), which has collected data about the decision-makers and the decisions of the Court of Justice of the European Union (CJEU). IUROPA and similar multi-user databases must meet four criteria, as proposed by Weinshall and Epstein. First, they must address real-world problems. Second, they must be open and accessible. Third, they must deliver reliable and reproducible data. Fourth, they must be ageless and easily calibrated to research purposes unknown at the time of data collection and cleaning. These criteria involve trade-offs. First, the quest for reliability may precipitate difficult choices, such as whether to discard or improve upon ‘imperfect’ data, or may tempt creators to endlessly postpone the publication of ‘incomplete’ data. Second, sustainability and human intervention are inversely proportional when it comes to database maintenance. Finally, a fledgling discipline like empirical legal studies in EU law imposes a disproportionate time commitment and financial responsibility on a small group of researchers.
Empirical legal studies in EU law routinely, if not inevitably, engage with text. From the decisions of national courts applying EU law and applicants’ case filings to the Court’s own jurisprudence, these texts are an invaluable source of information for researchers seeking to understand the dynamics involved in the shaping of EU law and its broader societal impact. Distilling relevant information from legal texts, however, is anything but trivial. Intended to serve as a reference manual, the chapter offers detailed guidelines to researchers in both law and political science interested in employing a text-as-data approach to the study of EU law. To this end, we elaborate on how to conceptualise real-life phenomena in a way that renders them measurable, providing practical guidance on hand-coding and the use of deep learning classifiers. Further, we address potential challenges arising in the specific context of EU law, including limited access to relevant documents and the difficulty of ensuring inter-coder reliability in data collection efforts that require specialised legal expertise.
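To make the inter-coder reliability check concrete, here is a minimal Python sketch of chance-corrected agreement between two hand-coders using Cohen’s kappa; the coding scheme, labels, and data below are hypothetical, not taken from the chapter.

```python
# Minimal sketch: inter-coder agreement on hand-coded legal texts.
# The binary coding scheme and both coders' labels are hypothetical.
from sklearn.metrics import cohen_kappa_score

# Two coders' labels for the same ten documents (illustrative data).
coder_a = ["cites_ecj", "no_cite", "cites_ecj", "no_cite", "cites_ecj",
           "cites_ecj", "no_cite", "no_cite", "cites_ecj", "no_cite"]
coder_b = ["cites_ecj", "no_cite", "no_cite", "no_cite", "cites_ecj",
           "cites_ecj", "no_cite", "cites_ecj", "cites_ecj", "no_cite"]

# Cohen's kappa corrects raw agreement for chance agreement;
# values above roughly 0.8 are conventionally read as strong reliability.
kappa = cohen_kappa_score(coder_a, coder_b)
print(f"Cohen's kappa: {kappa:.2f}")
```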
Despite the wide use and multiple validations of the EURO-D scale, its factor structure is still under debate. Exploratory Graph Analysis (EGA), a novel network psychometric method, offers a promising approach to examining dimensionality. Methodology: Data came from 45,390 participants (mean age = 71.27; 57.4% women) in 26 European countries. The sample was randomly split into a derivation sample (n = 22,823) and a cross-validation sample (n = 22,567). EGA was applied to the derivation sample to determine the structure of the EURO-D scale, utilizing two estimation methods: Graphical Least Absolute Shrinkage and Selection Operator (GLASSO) and Triangulated Maximally Filtered Graph (TMFG). The identified factor structures were then tested via Confirmatory Factor Analysis (CFA) in the cross-validation sample for model fit. Results: EGA consistently revealed a two-factor structure, with minor differences in the placement of the suicidality and fatigue items across estimation methods. CFA results confirmed an adequate model fit for both solutions. Conclusion: This study combines exploratory (EGA) and confirmatory (CFA) approaches, supporting a two-factor structure for the EURO-D scale with alternative placements for the fatigue and suicidality items. Results are discussed in contrast to previous studies reporting two- and three-factor solutions with different assignments of these items.
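As a rough illustration of the network-psychometric idea behind EGA (estimate a regularized partial-correlation network over items, then read detected communities as candidate dimensions), here is a Python sketch on simulated two-factor data. The item counts, simulated loadings, and the greedy-modularity community step are assumptions for illustration; published EGA analyses typically use the R package EGAnet with walktrap or louvain detection.

```python
# Illustrative sketch of the EGA idea: estimate a GLASSO network over
# item responses, then treat detected communities as candidate factors.
# All data are simulated; this is not the EURO-D analysis itself.
import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities
from sklearn.covariance import GraphicalLassoCV

rng = np.random.default_rng(0)
n, n_items = 1000, 12
# Two latent factors, six items each (a hypothetical two-factor structure).
f = rng.normal(size=(n, 2))
loadings = np.zeros((2, n_items))
loadings[0, :6], loadings[1, 6:] = 0.7, 0.7
items = f @ loadings + rng.normal(scale=0.7, size=(n, n_items))

# GLASSO yields a sparse precision matrix; off-diagonal entries encode
# (negative) partial correlations between items.
glasso = GraphicalLassoCV().fit(items)
prec = glasso.precision_
d = np.sqrt(np.diag(prec))
partial = -prec / np.outer(d, d)
np.fill_diagonal(partial, 0)

# Build a weighted graph and detect communities (a stand-in for the
# walktrap/louvain step used in EGA software).
G = nx.from_numpy_array(np.abs(partial))
communities = greedy_modularity_communities(G, weight="weight")
print([sorted(c) for c in communities])  # expect items 0-5 and 6-11 to split
```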
Scholars engaged in comparative research on democratic regimes are in sharp disagreement over the choice between a dichotomous or graded approach to the distinction between democracy and nondemocracy. This choice is substantively important because it affects the findings of empirical research. It is methodologically important because it raises basic issues, faced by both qualitative and quantitative analysts, concerning appropriate standards for justifying choices about concepts. Generic claims that the concept of democracy should inherently be treated as dichotomous or graded are incomplete. The burden of demonstration should instead rest on more specific arguments linked to the goals of research. This chapter thus takes the pragmatic position that how scholars understand and operationalize a concept can and should depend in part on what they are going to do with it. The chapter considers justifications focused on the conceptualization of democratization as an event, the conceptual requirements for analyzing subtypes of democracy, the empirical distribution of cases, normative evaluation, the idea of regimes as bounded wholes, and the goal of achieving sharper analytic differentiation.
The challenge of finding appropriate tools for measurement validation is an abiding concern in political science. This chapter considers four traditions of validation, using examples from cross-national research on democracy: the levels-of-measurement approach, structural-equation modeling with latent variables, the pragmatic tradition, and the case-based method. Methodologists have sharply disputed the merits of alternative traditions. The chapter encourages scholars – and certainly analysts of democracy – to pay more attention to these disputes and to consider strengths and weaknesses in the validation tools they adopt. An appendix summarizes the evaluation of six democracy data sets from the perspective of alternative approaches to validation.
This study assessed the construct validity, predictive validity, and responsiveness of the 4-metre walk test (4MWT) in community-dwelling older Canadians.
Methods
Baseline and 3-year follow-up data from the Canadian Longitudinal Study on Aging were examined, including participants aged ≥ 65 years with 4MWT assessments. Secondary outcomes included physical and self-report measures and healthcare utilization (e.g., hospitalization and emergency department visits).
Results
Baseline data on 12,433 and follow-up data on 10,107 participants were analysed. For construct validity, low-to-high correlations with the comparator measures (rho = 0.25 [with the Life Space Assessment] to 0.72 [with the Timed Up and Go]) and known-groups differences of 0.15 m/s (assistive device use) and 0.04 m/s (falls) were found. For predictive validity, areas under the curve ranged from 0.51 to 0.59 for healthcare utilization, indicating poor prediction. For responsiveness, low-to-moderate correlations between change scores were found (rho = 0.01–0.44).
Conclusions
Findings demonstrated partial support for construct validity and responsiveness and no support for predictive validity.
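As a concrete sketch of the two validity statistics reported above (Spearman’s rho against a comparator measure, and the area under the ROC curve for a healthcare-utilization outcome), here is a minimal Python example on simulated data; the gait-speed distribution, outcome rate, and variable names are illustrative, not CLSA data.

```python
# Sketch of the two validity computations reported above, on toy data:
# Spearman's rho for construct validity and ROC AUC for predictive validity.
import numpy as np
from scipy.stats import spearmanr
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
gait_speed = rng.normal(1.0, 0.2, 500)                    # 4MWT speed, m/s
tug_time = 10 - 3 * gait_speed + rng.normal(0, 1, 500)    # comparator (TUG)

rho, p = spearmanr(gait_speed, tug_time)
print(f"construct validity (rho vs. TUG): {rho:.2f}")

# Predictive validity: does baseline gait speed discriminate later
# hospitalization? The outcome is independent in this toy example, so the
# AUC should land near 0.5, echoing the poor prediction reported above.
hospitalized = rng.binomial(1, 0.2, 500)
auc = roc_auc_score(hospitalized, -gait_speed)  # slower speed -> higher risk
print(f"predictive validity (AUC): {auc:.2f}")
```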
Microaggressions have been a topic of significant debate in the psychological and social sciences. Despite an extensive body of empirical evidence, numerous misconceptions persist. This paper deconstructs common misconceptions surrounding microaggressions and addresses their origins, underlying biases, and empirical refutations. We explain the mechanisms that cause and maintain microaggressions through a CBT lens. We examine widely propagated misconceptions, including claims that microaggressions lack scientific validity, are too subjective to measure, and are not indicative of racism or other forms of prejudice. Drawing on the substantial literature base, including validated psychometric scales, experimental studies, and cross-cultural analyses, we demonstrate that microaggressions are not only real but also have significant psychological and social consequences. Empirical evidence links microaggressions to outcomes such as depression, anxiety, and lower self-esteem, reinforcing their relevance in clinical, educational, and workplace settings. CBT models provide a useful lens for understanding how individuals navigate the psychological complexities associated with microaggressive behaviours, helping explain why some people resist acknowledging microaggressions and their consequences. Lastly, we highlight the importance of education for reducing the prevalence of microaggressions and mitigating their harmful effects. Our goal is to provide clinicians with accurate information so that they may skilfully and empathetically help clients experiencing microaggressions, and to ensure that microaggressions are no longer dismissed as harmless or misunderstood. By debunking these misconceptions, this work contributes to a more scientifically grounded understanding of microaggressions, emphasizing the necessity of continued research and intervention efforts to address the impact of discrimination in society.
Key learning aims
(1) To build awareness of the various misconceptions associated with microaggressions.
(2) To understand why these misconceptions exist, where they came from, and why they are important to consider and refute.
(3) To refute misconceptions with scientific explanations and evidence.
(4) To understand how CBT clinicians can better prevent and respond to microaggressions.
This study aimed to culturally adapt the Self-Blame Attributions for Cancer Scale (SBAC) into Turkish and evaluate its psychometric properties, including validity and reliability.
Method
This methodological study enrolled 161 patients from both inpatient and outpatient oncology departments of a university hospital during a 1-year observation period (March 2024–March 2025). Participant data were obtained using 2 instruments: a demographic questionnaire and the adapted Turkish version of the SBAC.
Results
Confirmatory factor analysis revealed strong factor loadings ranging from 0.670 to 0.850, indicating good item reliability. Model fit statistics demonstrated excellent psychometric properties (χ2/df = 2.00; root mean square error of approximation = 0.079; Comparative Fit Index = 0.99; standardized root mean square residual = 0.042; Tucker–Lewis Index = 0.98; root mean square residual = 0.042). The scale showed high internal consistency, with a total Cronbach’s α of 0.93 and subscale α coefficients ranging from 0.85 to 0.90. The original 2-factor structure of the SBAC was supported.
Conclusion
The study confirmed the bidimensional structure (11 items) of the Turkish version of the SBAC, with excellent validity and reliability indices supporting its cultural and psychometric adequacy for Turkish samples.
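For readers unfamiliar with the internal-consistency statistic reported above, here is a minimal Python sketch of Cronbach’s alpha computed from scratch; the sample size and item count echo the study (161 respondents, 11 items), but the responses are simulated, not the study’s data.

```python
# Minimal sketch: Cronbach's alpha for a scale, computed from scratch.
# item_scores is a respondents-by-items matrix of simulated responses.
import numpy as np

def cronbach_alpha(item_scores: np.ndarray) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total)."""
    k = item_scores.shape[1]
    item_vars = item_scores.var(axis=0, ddof=1).sum()
    total_var = item_scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

rng = np.random.default_rng(2)
latent = rng.normal(size=(161, 1))                        # shared factor
items = latent + rng.normal(scale=0.6, size=(161, 11))    # 11 related items
print(f"alpha: {cronbach_alpha(items):.2f}")  # high for cohesive items
```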
Beginning with the eerie history of Edinburgh’s South Bridge vaults, Chapter 3 investigates how supernatural encounters are often reported in places associated with death, decay, and sensory uncertainty. Here, we explore the connection between electromagnetic fluctuations, ambiguous sensory experiences, and supernatural perceptions. The chapter examines the human tendency to assign meaning to ambiguous stimuli and introduces key concepts in measurement science, such as reliability and validity. It also addresses the limited evidence for human sensitivity to EMF changes. Disruptions in spatial and body awareness in the brain can lead to experiences like feeling a presence or seeing a shadow figure. Together, these ideas offer plausible brain-based explanations for some ghostly encounters and demonstrate how the brain strives to make sense of the unknown when sensory information is unclear.
Bilinguals vary in their daily-life language use and switching behaviours, which are also frequently studied in relation to other processes (e.g., executive control). Measuring daily-life language use and switching often relies on self-reported questionnaires, but little is known about the validity of these questionnaires. Here, we present two studies examining test–retest reliability and validity of language-use questionnaires (relative to Ecological Momentary Assessment, Study 1) and language-switching questionnaires and tasks (relative to recorded daily-life conversations, small-scale Study 2). Test–retest reliability and validity of the LSBQ (Anderson et al., 2018) were high and moderate, respectively, suggesting this questionnaire can capture daily-life language use well. Although only examined with a small sample size, Study 2 suggested relatively low validity of most language-switching questionnaires, with short language-production tasks potentially offering a more valid assessment. Together, these studies suggest that tools are available to reliably capture language use and switching with (a certain degree of) validity.
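A test-retest reliability check of the kind examined above amounts to correlating scores from two administrations of the same questionnaire. Below is a minimal Python sketch on simulated self-reported language-use percentages; the sample size, noise level, and variable names are illustrative assumptions, not data from either study.

```python
# Sketch of a test-retest reliability check: correlate questionnaire
# scores from two administrations of the same instrument.
# Data are simulated; the studies above validate against EMA and
# recorded daily-life conversations.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(3)
true_usage = rng.uniform(0, 100, 120)       # % daily L2 use per participant
time1 = true_usage + rng.normal(0, 8, 120)  # first administration
time2 = true_usage + rng.normal(0, 8, 120)  # retest weeks later

r, p = pearsonr(time1, time2)
print(f"test-retest r = {r:.2f}")  # high r indicates stable self-reports
```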
Mental health conditions among youths are increasing rapidly, a trend that must be understood in light of their biological, psychological and social development in a time of technological advancement and its associated challenges. Therefore, this study examined the psychometric properties of eight mental health scales among Ghanaian youth. A total of 708 youths (62.1% females; 10–29 years) from junior high schools, senior high schools and a university were recruited to respond to measures on depression, anxiety, somatic symptoms, obsessive–compulsive symptoms, insomnia, smartphone application-based addiction, internet addiction, life satisfaction, stress and cognitive fatigue. Confirmatory factor analysis (CFA) and Pearson’s r were used to analyse the data. The findings indicated acceptable CFA fit for all scales (comparative fit index [CFI] >0.9, Tucker–Lewis index [TLI] >0.9, root mean square error of approximation [RMSEA] <0.08 and standardized root mean square residual [SRMR] <0.08), and internal reliability was satisfactory (Cronbach’s α = 0.774–0.868 and McDonald’s ω = 0.775–0.870). Correlation analyses showed significant relationships between all the measures except for life satisfaction and internet addiction, and stress and life satisfaction. Both the CFA indices and correlation analyses indicate that all the mental health measures demonstrate acceptable initial evidence of reliability and construct validity.
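McDonald’s omega, reported alongside Cronbach’s alpha above, can be computed from standardized factor loadings using the common one-factor formula; the Python sketch below uses illustrative loading values, not the study’s estimates.

```python
# Sketch: McDonald's omega from standardized one-factor loadings.
# Loadings below are illustrative values, not estimates from the study.
import numpy as np

def mcdonald_omega(loadings: np.ndarray) -> float:
    """omega = (sum lambda)^2 / ((sum lambda)^2 + sum(1 - lambda^2)),
    where 1 - lambda^2 is each item's error variance under
    standardized loadings."""
    s = loadings.sum()
    errors = (1 - loadings**2).sum()
    return s**2 / (s**2 + errors)

loadings = np.array([0.70, 0.65, 0.72, 0.60, 0.68, 0.74])
print(f"omega: {mcdonald_omega(loadings):.3f}")
```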
This chapter explores how to obtain and prepare quantitative data prior to analysis. Use theory to identify the unit of analysis for your study, then determine the population and sample. Be sure to capture appropriate variation in the DV and be alert for selection bias in how cases enter the sample. Issues of validity and reliability can potentially cause major problems with your analysis. Again, use your theory to carefully match indicators to concepts to minimize the risk of these problems. Think through the data collection process and plan ahead to maximize efficiency; gather all data for control variables and robustness checks in a single sweep, if possible. Much data, particularly for standard indicators of common concepts, is freely available online through a variety of sources, and your library probably also subscribes to other quantitative databases. Collecting new data is substantially more time-consuming than using previously gathered data, but it is often necessary to test novel theories. Whether you use existing data or novel data, be sure to define your data needs list before beginning data collection, allow sufficient time, and document and back up everything.
Accountability in grant-making requires a valid, fair and transparent selection process. This study proposes a four-step framework for validating such a process: determine standards for qualified applicants, assess inter-reviewer reliability, assess factorial validity, and assess measurement reliability. This framework is applied to the Corporation for National and Community Service’s 2013 RSVP grant-making process. The standards were close to the highest points of reliability. Inter-reviewer reliability was above 0.90, a common threshold for high-stakes measurement. After confirmatory factor analysis, the final model merged two of the original five domains of selection criteria, resulting in four domains. The final model showed strict measurement invariance, high convergent validity, and measurement reliability between 0.88 and 0.93 for all domains. The results validated the 2013 review process and indicated that the scores exhibited high degrees of reliability, giving public assurance that the process was sufficiently objective and accurately reflected program priorities.
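Inter-reviewer reliability of the kind reported above is often quantified as an intraclass correlation. The Python sketch below uses the pingouin library on simulated long-format review scores; the number of reviewers, score distribution, and column names are illustrative assumptions, not CNCS data.

```python
# Sketch: inter-reviewer reliability as an intraclass correlation,
# computed with the pingouin library on simulated review scores.
import numpy as np
import pandas as pd
import pingouin as pg

rng = np.random.default_rng(4)
n_apps, n_reviewers = 40, 3
true_quality = rng.normal(70, 10, n_apps)  # latent application quality
rows = [
    {"application": a, "reviewer": r,
     "score": true_quality[a] + rng.normal(0, 3)}  # reviewer noise
    for a in range(n_apps) for r in range(n_reviewers)
]
icc = pg.intraclass_corr(data=pd.DataFrame(rows), targets="application",
                         raters="reviewer", ratings="score")
# ICC2k ("average random raters") is a common choice for panel scores.
print(icc[["Type", "ICC"]])
```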
This study examines the psychometric properties of a new self-report instrument measuring organisational connectedness (the Four-Dimensional Connectedness Scale; 4DCS) in two volunteer samples: state emergency service volunteers and volunteer ambulance workers. Confirmatory factor analyses in both studies supported the proposed four-factor structure of the 4DCS (other workers, recipients, task and values). In addition, confirmatory factor analyses showed that connectedness, commitment and engagement were separate constructs: a three-factor model with a Connectedness factor, a Commitment factor and an Engagement factor fitted the data best. Moreover, hierarchical multiple regression analyses revealed that connectedness and engagement each shared unique variance with job satisfaction and intention to continue. The results confirm the factorial, discriminant and predictive validity of connectedness relative to engagement and commitment. It is concluded that the 4DCS has acceptable psychometric properties and that the instrument can be used to study volunteer wellbeing.
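The hierarchical regression step described above, testing whether connectedness adds explained variance beyond engagement, can be sketched in Python with statsmodels as follows; the simulated effect sizes and variable names are assumptions for illustration only.

```python
# Sketch of a hierarchical regression step: does connectedness add
# explained variance in job satisfaction beyond engagement?
# All data are simulated; effect sizes are arbitrary.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 300
engagement = rng.normal(size=n)
connectedness = 0.5 * engagement + rng.normal(size=n)  # related construct
satisfaction = 0.4 * engagement + 0.3 * connectedness + rng.normal(size=n)

# Step 1: engagement only; Step 2: add connectedness.
X1 = sm.add_constant(np.column_stack([engagement]))
X2 = sm.add_constant(np.column_stack([engagement, connectedness]))
m1 = sm.OLS(satisfaction, X1).fit()
m2 = sm.OLS(satisfaction, X2).fit()
print(f"R2 step 1: {m1.rsquared:.3f}, step 2: {m2.rsquared:.3f}, "
      f"Delta R2: {m2.rsquared - m1.rsquared:.3f}")
```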
Expert surveys have been used to measure a wide variety of phenomena in political science, ranging from party positions, to corruption, to the quality of democracy and elections. However, expert judgments raise important validity concerns, both about the object being measured and about the experts themselves. It is argued in this article that the context of evaluation is also important to consider when assessing the validity of expert surveys. This is all the more important for expert surveys with a comprehensive, worldwide scope, such as democracy or corruption indices. This article tests the validity of expert judgments about election integrity – a topic of increasing concern to both the international community and academics. Evaluating expert judgments of election integrity provides an important contribution to the literature on the validity of expert surveys as instruments of measurement because: (1) the object under study is multifaceted and particularly complex to define; and (2) election integrity is measured in widely varying institutional contexts, ranging from electoral autocracies to liberal democracies. Three potential sources of bias are analysed (the object, the experts and the context), using a unique new dataset on election integrity entitled the ‘Perceptions of Electoral Integrity’ dataset. The data include over 800 experts in 66 parliamentary and presidential elections worldwide. It is found that the validity of expert judgments about election integrity increases if experts are asked to provide factual information (rather than evaluative judgments), and if they are asked to evaluate election day (rather than pre-election) integrity. It is also found that ideologically polarised elections and elections of lower integrity increase expert disagreement about election integrity. The article concludes with suggestions on how researchers using expert survey data on election integrity can check the validity of their data and adjust their analyses accordingly, and outlines some remaining challenges for future data collection using expert surveys.
Democracy measurement is an ever-growing and increasingly important research area. Nevertheless, lively discussions concerning the qualities of different measurement approaches are seldom combined with an adequate perspective on the underlying methodological framework. This article argues that a substantial theoretical perspective is a necessary but not sufficient condition for improving contemporary democracy measurement: theoretical considerations have to be accompanied by an equally well-developed measurement concept. On the basis of examples taken from prominent approaches, potential for improvement becomes obvious. Such improvement is not just an end in itself but necessary if these measures are to be used as variables in all areas of research.
Over recent decades, comparative political scientists have developed, at a rapid rate, new measures that evaluate the quality of democratic regimes. These indices have been broadly applied to assess the quality of democracy cross-nationally and to test the generalisability of theories regarding its causes and effects. However, the validity of these inferences is jeopardised by the fact that the quality of democracy is an abstract and contested concept. To address this problem, researchers constructing indices that measure the quality of democracy, as well as researchers applying these indices, should critically examine the quality of the indices themselves. Given the absence of a standardised framework that is both suitable for the evaluation of contested concepts and that includes explicit coding rules so as to be directly applicable, this article seeks to fill this gap. The application of our framework is demonstrated by an evaluation of the Sustainable Governance Indicators, the Global Democracy Ranking and the Democracy Barometer. As indicated by our evaluation, the framework is a practical tool that helps to assess the conceptual foundation, validity, reliability and replicability of indices. In addition, it can be used to study the quality of indices in a comparable manner.
While macro data are used abundantly in the social sciences, little attention is given to the sources or the construction of these data. Owing to the restricted number of available indices or items, researchers most often apply the ‘available data at hand’. Since the opportunities to analyse data are constantly increasing and the availability of macro indicators is improving as well, one may be enticed to incorporate even qualitatively inferior indicators for the sake of statistically significant results. The pitfalls of applying biased indicators or using instruments with unknown methodological characteristics are biased estimates, false statistical inferences and, as one potential consequence, the derivation of misleading policy recommendations. This Special Issue assembles contributions that attempt to stimulate the missing debate about the criteria for assessing aggregate data and their measurement properties for comparative analyses.
In recent years, the L2 Motivational Self System has faced increasing scrutiny over its theoretical clarity and empirical rigor. One element of this model, the L2 Learning Experience, remains ambiguously defined and theoretically underdeveloped. This study examined the content validity of the L2 Learning Experience scale and its potential overlap with intrinsic motivation, a cornerstone of self-determination theory. Using a panel of experts, we assessed the extent to which items traditionally associated with the L2 Learning Experience scale align with their intended construct. Findings revealed that the items were predominantly identified as measuring intrinsic motivation, not the L2 Learning Experience. These results suggest a significant overlap between the two constructs and raise concerns about a potential jangle fallacy. Our results also underscore the need for greater theoretical and terminological clarity in the field. Aligning language learning motivation research with broader psychological frameworks could lead to more parsimonious and robust theoretical models.