This chapter offers readers a transparent view into the research methodology used to investigate mathematics anxiety and assess the impact of a targeted pedagogical intervention on students’ reported anxiety and attitudes towards statistics and quantitative research methods. It provides a detailed account of the research participants, ethical considerations, and the multi-mixed methods approach employed. The chapter also critiques the validity, reliability, and trustworthiness of the research design and findings, ensuring methodological rigour. A candid discussion of the study’s limitations further strengthens its credibility. It is essential reading for educators, researchers, and anyone committed to evidence-based improvements in mathematics education.
Edited by
Daniel Naurin, University of Oslo; Urška Šadl, European University Institute, Florence; Jan Zglinski, London School of Economics and Political Science
The chapter discusses the creation and maintenance of databases offering accurate, research-ready data for multidisciplinary use. It draws on the experience with the IUROPA CJEU Database Project (IUROPA), which has collected data about the decision-makers and the decisions of the Court of Justice of the European Union (CJEU). IUROPA and similar multi-user databases must live up to four criteria for databases, as proposed by Weinshall and Epstein. First, they must address real-world problems. Second, they must be open and accessible. Third, they must deliver reliable and reproducible data. Fourth, they must be ageless and easily calibrated to research purposes unknown at the time of data collection and cleaning. These criteria involve trade-offs: the quest for reliability may, first, precipitate difficult choices such as whether to discard or improve upon ‘imperfect’ data or tempt creators to endlessly postpone publication of ‘incomplete’ data; second, sustainability and human intervention are inversely proportionate when it comes to database maintenance; finally, a fledgling discipline like empirical legal studies in EU law imposes a disproportionate time commitment and financial responsibility on a small group of researchers.
Edited by
Daniel Naurin, University of Oslo; Urška Šadl, European University Institute, Florence; Jan Zglinski, London School of Economics and Political Science
Empirical legal studies in EU law routinely, if not inevitably, engage with text. From the decisions of national courts applying EU law, applicants’ case filings, to the Court’s own jurisprudence, these texts are an invaluable source of information for researchers seeking to understand the dynamics involved in the shaping of EU law and its broader societal impact. Distilling relevant information from legal texts, however, is anything but trivial. Intended to serve as a reference manual, the chapter offers detailed guidelines to researchers of both law and political science interested in employing a text-as-data approach to the study of EU law. To this end, we elaborate on how to conceptualise real-life phenomena in a way that renders them conducive to measurement, providing practical guidance on hand-coding and the use of deep learning classifiers. Further, we address potential challenges arising in the specific context of EU law. This includes limitations to access to relevant documents, as well as ensuring inter-coder reliability in data collection efforts that require specialised legal expertise.
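Inter-coder reliability for hand-coded categories is often summarised with a chance-corrected agreement statistic. The sketch below computes Cohen's kappa for two coders; the abstract does not say which statistic the chapter recommends, so the choice of kappa, the function name, and the case labels are all illustrative assumptions.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who labelled the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(coder_a)
    # Share of items on which the two coders agree.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance, from each coder's label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels assigned by two coders to six decisions.
coder_a = ["infringement", "annulment", "infringement",
           "preliminary", "annulment", "infringement"]
coder_b = ["infringement", "annulment", "preliminary",
           "preliminary", "annulment", "infringement"]
kappa = cohens_kappa(coder_a, coder_b)
```

Raw percentage agreement would overstate reliability here because some agreement arises by chance when a few categories dominate; kappa subtracts that expected share.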
Intraclass correlation coefficient (ICC) estimates are necessary for several statistical techniques. Researchers need accurate ICC estimates when conducting prospective power analyses for clustered data scenarios. In addition, meta-analysts require reasonable ICC values when adjusting effect size estimates to account for clustered primary study data or to correct for psychometric artifacts when using the ICC as a reliability measure. The validity of these analyses hinges on the accuracy of the ICC estimate. Beyond these secondary analyses, ICC estimates have been used as the focal outcome of meta-analysis itself to obtain a pooled measure of agreement, reliability, or the influence of a cluster’s effect. This study evaluates how well meta-analytically pooled ICC estimates recover the population ICC parameter value when using different ICC variance formulas as the inverse variance weights used in the pooling. We found that the variance formula that uses a normalizing transformation performs best across most conditions.
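To make the pooling idea concrete, the sketch below combines ICC estimates with inverse-variance weights on a transformed scale. It assumes Fisher's normalizing transformation for the ICC, a commonly quoted large-sample variance for it, and a common cluster size `k` across studies; the abstract does not give the study's exact formulas, so these are illustrative assumptions, as are the function names.

```python
import math


def icc_to_z(icc, k):
    """Fisher's normalizing transformation of an ICC with cluster size k."""
    return 0.5 * math.log((1 + (k - 1) * icc) / (1 - icc))


def z_to_icc(z, k):
    """Inverse of icc_to_z."""
    e = math.exp(2 * z)
    return (e - 1) / (e + k - 1)


def z_variance(n_clusters, k):
    """Assumed large-sample variance of z: k / (2 (k - 1) (n - 2))."""
    return k / (2 * (k - 1) * (n_clusters - 2))


def pool_icc(studies, k):
    """Inverse-variance pooled ICC; studies is a list of (icc, n_clusters)."""
    weights = [1 / z_variance(n, k) for _, n in studies]
    zs = [icc_to_z(icc, k) for icc, _ in studies]
    pooled_z = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    return z_to_icc(pooled_z, k)


# Hypothetical primary studies: (ICC estimate, number of clusters).
pooled = pool_icc([(0.10, 20), (0.30, 80)], k=5)
```

On the transformed scale the weights depend only on the number of clusters and the cluster size, not on the estimated ICC itself, which is one reason a normalizing transformation can behave well as a weighting scheme.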
This study aimed to culturally adapt the Self-Blame Attributions for Cancer Scale (SBAC) into Turkish and evaluate its psychometric properties, including validity and reliability.
Method
This methodological study enrolled 161 patients from both inpatient and outpatient oncology departments of a university hospital during a 1-year observation period (March 2024–March 2025). Participant data were obtained using 2 instruments: a demographic questionnaire and the adapted Turkish version of the SBAC.
Results
Confirmatory factor analysis revealed strong factor loadings ranging from 0.670 to 0.850, indicating good item reliability. Model fit statistics demonstrated excellent psychometric properties (χ²/df = 2.00; root mean square error of approximation = 0.079; Comparative Fit Index = 0.99; standardized root mean square residual = 0.042; Tucker–Lewis Index = 0.98; root mean square residual = 0.042). The scale showed high internal consistency, with a total Cronbach’s α of 0.93 and subscale α coefficients ranging from 0.85 to 0.90. The original 2-factor structure of the SBAC was supported.
Conclusion
The study confirmed the bidimensional structure (11 items) of SBAC’s Turkish version with excellent validity and reliability indices, supporting its cultural and psychometric adequacy for Turkish samples.
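For readers unfamiliar with the internal-consistency statistic reported above, the following sketch computes Cronbach's α from a respondents-by-items score table using the standard formula α = k/(k−1) · (1 − Σ item variances / variance of total scores). The response data and the function name are hypothetical and are not taken from the study.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha; responses holds one list of item scores per respondent."""
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 5-point responses: four respondents, three items.
responses = [
    [1, 2, 1],
    [2, 3, 3],
    [4, 4, 3],
    [5, 5, 5],
]
alpha = cronbach_alpha(responses)
```

The function assumes complete data and a non-zero variance of total scores; a real analysis would also need to handle missing responses and reverse-scored items.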
Beginning with the eerie history of Edinburgh’s South Bridge vaults, Chapter 3 investigates how supernatural encounters are often reported in places associated with death, decay, and sensory uncertainty. Here, we explore the connection between electromagnetic fluctuations, ambiguous sensory experiences, and supernatural perceptions. The chapter explores the human tendency to assign meaning to ambiguous stimuli and introduces key concepts in measurement science, such as reliability and validity. It also addresses the limited evidence for human sensitivity to EMF changes. Disruptions in spatial and body awareness in the brain can lead to experiences like feeling a presence or seeing a shadow figure. Together, these ideas offer plausible brain-based explanations for some ghostly encounters and demonstrate how the brain strives to make sense of the unknown when sensory information is unclear.
The Canadian Ultra-Processed Product Screener (CUPS) was developed to rapidly assess ultra-processed food (UPF) and drink product intake among Canadian adults. The CUPS is an online self-administered screener that includes twenty-eight questions and assesses the intake of a variety of UPF available in Canada, both in French and English. This study aimed to assess the construct validity and reliability of the CUPS among a sample of adults in Canada.
Design:
Cross-sectional study (between July and November 2023).
Settings:
Participants completed the online CUPS screener in three versions (1-d (twice), 7-d and 30-d CUPS) and three 24-h dietary recalls (24HR) (the reference measure) over the course of 26–28 d.
Participants:
354 Canadians aged 18–60 years.
Results:
The CUPS had an acceptable construct validity, with moderate correlation coefficients between the CUPS score and UPF consumption level measured using multiple 24HR (from 0·33 to 0·44). Reproducibility was also acceptable (intraclass correlation = 0·61) and internal consistency ranged from good to excellent (Cronbach’s α = 0·72 for the 1-d and 0·86 for the 30-d CUPS). CUPS scores were also associated with higher intake of added sugars, saturated fats and Na.
Conclusions:
This study provides evidence supporting the construct validity and reliability of the CUPS among Canadian adults. The CUPS is useful for identifying low and high consumers of UPF and could serve as a proxy measure for one key dimension of diet quality, which is the type of food processing.
Punching shear failure in slab-column connections is a brittle collapse mode that threatens the safety of flat reinforced concrete (RC) slabs. Conventional design provisions are generally conservative but exhibit inconsistencies across geometric and material variations. This study develops an eXtreme Gradient Boosting (XGBoost) model to predict the ultimate punching shear capacity of flat RC slabs, using a database of experimental results categorized into four geometric domains: square slab with square column, circular slab with circular column, square slab with circular column, and circular slab with square column. The input parameters cover geometric, material-strength, and reinforcement properties. The model achieved high predictive accuracy across the domains, with coefficient of determination (R²) values > 0.930 on unseen testing datasets, minimal bias (0.994–1.006), and reduced scatter. Model interpretability, addressed through SHapley Additive exPlanations analysis, confirmed slab thickness and average effective depth as the most critical predictors of shear capacity, followed by concrete strength and reinforcement parameters, while boundary condition parameters showed negligible influence due to the predominance of interior column cases. These findings demonstrate that XGBoost provides accurate, reliable, and interpretable predictions of punching shear capacity, offering a data-driven alternative to code-based methods and supporting safer and more consistent design of flat RC slabs.
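The two headline metrics can be illustrated with a small sketch. It assumes that "bias" means the mean ratio of predicted to measured capacity, which the abstract does not define, and it uses hypothetical capacities rather than data from the study; the model itself is not reproduced here.

```python
from statistics import mean


def r_squared(measured, predicted):
    """Coefficient of determination of predictions against measurements."""
    m = mean(measured)
    ss_res = sum((y - p) ** 2 for y, p in zip(measured, predicted))
    ss_tot = sum((y - m) ** 2 for y in measured)
    return 1 - ss_res / ss_tot


def mean_ratio(measured, predicted):
    """Assumed bias measure: average predicted-to-measured ratio."""
    return mean(p / y for y, p in zip(measured, predicted))


# Hypothetical punching shear capacities in kN.
measured = [300.0, 450.0, 520.0, 610.0]
predicted = [310.0, 440.0, 530.0, 600.0]

r2 = r_squared(measured, predicted)
bias = mean_ratio(measured, predicted)
```

A mean ratio near 1.0 indicates that predictions are neither systematically high nor low, while R² summarises how much of the variation in measured capacity the predictions capture.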
Following a trend across the sciences, recent studies in lithic analysis have embraced the ideal of replicability. Recent large-scale studies have demonstrated that high replicability is achievable under controlled conditions and have proposed strategies to improve it in lithic data recording. Although this focus has yielded important methodological advances, we argue that an overemphasis on replicability risks narrowing the scope of archaeological inquiry. More specifically, we show (1) that replicability alone does not guarantee reliability, interpretive value, or cost effectiveness, and (2) that archaeological data often involve unavoidable ambiguity due to preservation, analyst background, and the nature of lithic variability itself. Instead of allowing replicability to dictate research priorities, we advocate for a problem-driven, pluralistic approach that tailors methods to research questions and balances replicable measures with interpretive depth. This has practical implications for training, publishing, and funding policy. We conclude that Paleolithic archaeology must engage with the replicability movement on its own terms—preserving methodological diversity while maintaining scientific credibility.
Bilinguals vary in their daily-life language use and switching behaviours, which are also frequently studied in relation to other processes (e.g., executive control). Measuring daily-life language use and switching often relies on self-reported questionnaires, but little is known about the validity of these questionnaires. Here, we present two studies examining test–retest reliability and validity of language-use questionnaires (relative to Ecological Momentary Assessment, Study 1) and language-switching questionnaires and tasks (relative to recorded daily-life conversations, small-scale Study 2). Test–retest reliability and validity of the LSBQ (Anderson et al., 2018) were high and moderate, respectively, suggesting this questionnaire can capture daily-life language use well. Although only examined with a small sample size, Study 2 suggested relatively low validity of most language-switching questionnaires, with short language-production tasks potentially offering a more valid assessment. Together, these studies suggest that tools are available to reliably capture language use and switching with (a certain degree of) validity.
Mental health conditions among youths are increasing rapidly, taking into consideration their biological, psychological and social development in the time of technological advancement with its associated challenges. Therefore, this study examined the psychometric properties of eight mental health scales among Ghanaian youth. A total of 708 youths (62.1% females; 10–29 years) from junior high schools, senior high schools and a university were recruited to respond to measures on depression, anxiety, somatic symptoms, obsessive–compulsive symptoms, insomnia, smartphone application-based addiction, internet addiction, life satisfaction, stress and cognitive fatigue. Confirmatory factor analysis (CFA) and Pearson’s r were used to analyse the data. The findings indicated acceptable CFA fit for all scales (comparative fit index [CFI] >0.9, Tucker–Lewis index [TLI] >0.9, root mean square error of approximation [RMSEA] <0.08 and standardized root mean square residual [SRMR] <0.08), and internal reliability was satisfactory (Cronbach’s α = 0.774–0.868 and McDonald’s ω = 0.775–0.870). Correlation analyses showed significant relationships between all the measures except for life satisfaction and internet addiction, and stress and life satisfaction. Both the CFA indices and correlation analyses indicate that all the mental health measures demonstrate acceptable initial evidence of reliability and construct validity.
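The fit criteria quoted above amount to a simple set of threshold checks. The sketch below applies them to one set of indices; the function name and the example values are hypothetical, and the cut-offs are only those stated in the abstract.

```python
def acceptable_fit(cfi, tli, rmsea, srmr):
    """Apply the cut-offs quoted in the abstract to one model's fit indices."""
    checks = {
        "CFI": cfi > 0.9,
        "TLI": tli > 0.9,
        "RMSEA": rmsea < 0.08,
        "SRMR": srmr < 0.08,
    }
    return all(checks.values()), checks


# Hypothetical indices for a single scale.
ok, detail = acceptable_fit(cfi=0.95, tli=0.93, rmsea=0.06, srmr=0.05)
```

Returning the individual checks alongside the overall verdict makes it easy to see which index is responsible when a scale fails.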
This chapter explores how to get and prepare quantitative data prior to analysis. Use theory to identify the unit of analysis for your study, then determine the population and sample for your study. Be sure to capture appropriate variation in the DV and be alert for selection bias in how cases enter the sample. Issues of validity and reliability can potentially cause major problems with your analysis. Again, use your theory to carefully match indicators to concepts to minimize the risk of these problems. Think through the data collection process and plan ahead to maximize efficiency; gather all data for control variables and robustness checks in a single sweep, if possible. Much data, particularly for standard indicators of common concepts, is freely available online through a variety of sources, and your library probably also subscribes to other quantitative databases. Collecting new data is substantially more time-consuming than using previously-gathered data, but it is often necessary to test novel theories. Whether you use existing data or novel data, be sure to define your data needs list before beginning data collection, allow sufficient time, and document and back up everything.
We revisit the question of how to include parameter uncertainty in univariate parametric models of losses and loss ratios. We first review the statistical theory for including parameter uncertainty based on right Haar priors (RHPs), which applies to many commonly used models. In this theory, the prior is chosen in such a way as to ensure matching between predicted probabilities and the relative frequencies of future outcomes in repeated tests. This property is known as reliability, or calibration. We then test priors for including parameter uncertainty in a number of models not covered by RHP theory. For these models, we find priors that generate predictions that are more reliable than predictions based on maximum likelihood, although they are not perfectly reliable. We discuss numerical schemes that can be used to generate Bayesian predictions, including a novel use of asymptotic expansions, and we include an example in which we show the impact of including parameter uncertainty in the modeling of extreme hurricane losses. The tail loss estimates show material increases due to the inclusion of parameter uncertainty. Finally, we describe a new software library that makes it straightforward to apply the methods we describe.
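The reliability property described above can be checked by simulation. The sketch below uses an exponential loss model, chosen here purely for illustration and not taken from the paper, and compares a maximum likelihood plug-in quantile with the Bayesian predictive quantile under the scale-invariant prior 1/θ. All function names and settings are assumptions; the paper's software library is not used.

```python
import math
import random


def ml_quantile(sample, p):
    """Plug-in (maximum likelihood) loss level exceeded with probability p."""
    return -(sum(sample) / len(sample)) * math.log(p)


def rhp_quantile(sample, p):
    """Bayesian predictive loss level under the prior 1/theta on the scale."""
    n = len(sample)
    return sum(sample) * (p ** (-1.0 / n) - 1.0)


def exceedance_frequency(quantile_fn, p, n=5, trials=100_000, seed=0):
    """Relative frequency with which a future loss exceeds the predicted level."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.expovariate(1.0) for _ in range(n)]  # training losses
        future = rng.expovariate(1.0)                      # out-of-sample loss
        if future > quantile_fn(sample, p):
            hits += 1
    return hits / trials


freq_ml = exceedance_frequency(ml_quantile, p=0.01)
freq_rhp = exceedance_frequency(rhp_quantile, p=0.01)
```

With five training losses and a nominal exceedance probability of 0.01, the plug-in level is exceeded with frequency (1 + ln(100)/5)^(-5), about 0.038, whereas the predictive level under the 1/θ prior is exceeded with frequency 0.01 up to simulation noise. The gap is the understatement of tail risk that arises when parameter uncertainty is ignored.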
Democracy measurement is an ever-growing and increasingly important research area. Nevertheless, lively discussions concerning the qualities of different measurement approaches are seldom combined with an adequate perspective on the underlying methodological framework. This article argues that a substantial theoretical perspective is a necessary but not a sufficient condition for improving contemporary democracy measurement. Theoretical considerations have to be accompanied by an equally well-developed measurement concept. Examples taken from prominent approaches show where there is potential for improvement. Such improvement is not an end in itself: it is necessary whenever these measures are used as variables in other areas of research.
The reliability of volunteers is a major concern for many nonprofit organizations. To address this problem in more detail, we develop a theoretical model of volunteer reliability based on psychological contract theory. By taking this perspective as a starting point, we explore how individual volunteer characteristics, organizational factors, and sociological developments shape the exchange of inducements and contributions between volunteers and nonprofit organizations. We discuss how these factors can create tensions in the psychological contract and determine the extent to which volunteers behave reliably. As such, we develop a theoretical framework for addressing the reliability problem in volunteer management.
Over recent decades, comparative political scientists have developed new measures at a rate of knots that evaluate the quality of democratic regimes. These indices have been broadly applied to assess the quality of democracy cross-nationally and to test the generalisability of theories regarding its causes and effects. However, the validity of these inferences is jeopardised by the fact that the quality of democracy is an abstract and contested concept. To address this problem, researchers who construct indices of the quality of democracy, as well as researchers who apply them, should critically examine the quality of the indices. No standardised framework exists that is both suitable for evaluating contested concepts and equipped with explicit coding rules that make it directly applicable; this article seeks to fill that gap. The application of our framework is demonstrated by an evaluation of the Sustainable Governance Indicators, the Global Democracy Ranking and the Democracy Barometer. As indicated by our evaluation, the framework is a practical tool that helps to assess the conceptual foundation, validity, reliability and replicability of indices. In addition, it can be used to study the quality of indices in a comparable manner.
While macro data are used abundantly in the social sciences, little attention is given to the sources or the construction of these data. Owing to the restricted number of indices or items, researchers most often apply the ‘available data at hand’. Since the opportunities to analyse data are constantly increasing and the availability of macro indicators is improving as well, one may be enticed to incorporate even qualitatively inferior indicators for the sake of statistically significant results. Applying biased indicators or using instruments with unknown methodological characteristics risks biased estimates, false statistical inferences and, as one potential consequence, the derivation of misleading policy recommendations. This Special Issue assembles contributions that attempt to stimulate the missing debate about the criteria for assessing aggregate data and their measurement properties for comparative analyses.
Disasters significantly challenge societal resilience, individual psychological health, and sustainable development. This study aimed to culturally adapt the Disaster Adaptation and Resilience Scale (DARS) into Turkish and evaluate its psychometric properties for use in Türkiye. Participants (N = 335) aged 18 and older who had experienced a disaster in the past 5 years completed the Turkish version of the DARS following rigorous translation and expert review procedures. Exploratory and confirmatory factor analyses revealed a 5-factor structure: Problem-Solving, Optimism, Stress Management, Social Resources, and Physical Resources, accounting for 61.3% of the total variance. Internal consistency was high (Cronbach’s Alpha = 0.910), with subscale values ranging from 0.785 to 0.901. Test-retest reliability and discriminant validity were also established. The Turkish DARS is a valid and reliable tool for evaluating disaster-related adaptation and resilience. Its implementation supports sustainable mental health responses and community preparedness in disaster-prone regions.
The 10-item Beliefs About Penis Size Scale (BAPS; Veale et al., 2014) measures boys’ and men’s beliefs about masculinity and shame related to their penis size. Penis size is a primary appearance concern of men, and these concerns may result in penile dysmorphic disorder, which is a form of body dysmorphic disorder specifically focused on being preoccupied with and distressed by one’s penis size. The BAPS can be administered online or in-person to adolescents and adults and is free to use. This chapter discusses the development of the BAPS and provides evidence of its psychometrics. Findings suggest that the BAPS is a unidimensional measure. Internal consistency reliability as well as convergent, concurrent, and discriminant validity support the use of the BAPS with boys and men. This chapter provides the BAPS items in their entirety, instructions for administering the BAPS to participants, item response scale, and scoring procedure. Logistics of use, such as permissions, copyright, and contact information, are provided for readers.
The Photographic Figure Rating Scale (PFRS; Swami et al., 2008) is a figural rating scale developed to assess body dissatisfaction (actual-ideal body size discrepancy) and consists of 10 photographic images of real women varying in body mass index from emaciated to “obese”. The PFRS can be administered online or in-person to women and is free to use for non-commercial purposes. This chapter discusses the development of the original PFRS, before providing evidence of its psychometric properties. Specifically, scores on the PFRS have been found to have adequate test-retest reliability and good patterns of convergent and criterion-related validity. Next, this chapter provides the PFRS images, as well as full instructions for administration to participants, the suggested questions, and the scoring procedure. Known translations are described and logistics of use are provided for readers.