In a recent article in this journal, Stockemer characterizes fuzzy-set Qualitative Comparative Analysis (fsQCA), in comparison with Ordinary Least Squares (OLS) regression, as a ‘poor methodological choice’ because of its ‘suboptimal nature’ for the study of descriptive female representation in national assemblies across the globe. This article seeks to demonstrate that his judgments rest on two misconceptions: first, a misunderstanding of set-theoretic thinking in general and of Qualitative Comparative Analysis in particular; and second, a misinformed application of several steps of fsQCA, such as the calibration process, the analysis of necessary and sufficient conditions, and the interpretation of the results. In pointing out the weaknesses of Stockemer’s application of OLS, we argue – in contrast to Stockemer – that fsQCA can be a valuable tool for the comparative study of social phenomena, one that offers a fundamentally different analytical perspective from standard quantitative techniques.
Unlike theories in physics, social theories are usually tested by applying statistical hypothesis testing to observational data. This approach has more problems than commonly realised, owing to false positives and model uncertainty, and it inhibits the development of social science. A better way to test a theory is to make definitive predictions that, if confirmed, strongly support the theory against alternative explanations.
To provide reference points for discussing the menu of mathematical tools available to political science, I review the most common types of mathematical formulae used in physics and chemistry, as well as some mathematical advances in economics. Several issues appear relevant: variables should be well defined and measurable; the relationships between variables may be non-linear; and the direction of causality should be clearly identified rather than assumed on a priori grounds. On these bases, theoretically driven equations on political matters can be validated by empirical tests and can predict observable phenomena.
In this rebuttal to Buche et al., I reiterate my criticism of fuzzy-set analysis as a method poorly suited to the study of women’s representation in parliament and other similarly complex phenomena. The use of fuzzy-set analysis is problematic from the outset, because the method asks researchers to distinguish meaningful from non-meaningful variation and to set benchmarks for high women’s representation, the cross-over point, and low or non-high representation. Yet, in the study of women’s representation, distinguishing meaningful from non-meaningful variation and setting these benchmarks is problematic, if not impossible, even with case-specific knowledge. For example, in Asia and Latin America, can we speak of high women’s representation at 30 per cent women deputies, at 35 per cent, or at 40 per cent? Neither the literature nor Buche et al. answer this question. The problem of arbitrarily set benchmarks is magnified by the method’s non-robust findings and low coverage. It is disturbing that we obtain a completely different combination of conditions if we slightly change the benchmark for high women’s representation and/or that of some of the conditions or independent variables. For these reasons, researchers should refrain from using fuzzy-set analysis when explaining variation in the number of women deputies in parliament.
In this article I compare the results of fuzzy-set Qualitative Comparative Analysis (fsQCA), applied to a medium-sized data set on women's legislative representation in Asian and Latin American countries, to those of regression analysis based on the same data set. I find that both methods are suboptimal. In explaining the outcome of high women's representation, fsQCA suggests complex configurations of conditions with low empirical coverage and high sensitivity to coding. OLS regression analysis, while not without shortcomings, performs somewhat better than fsQCA. On the one hand, it identifies two statistically significant and substantively relevant variables (quota rules and communist regimes), which strongly increase the percentage of women deputies. On the other hand, the model's interpretation is not completely clear-cut, as scholars may disagree over the relevance of the one marginally statistically and substantively significant variable, the longevity of democracy.
Social scientists often face a fundamental problem: Did I leave something causally important out of my explanation? How do I diagnose this? Where do I look for solutions to this problem? We build bridges between regression models and qualitative comparative analysis by comparing diagnostics and solutions to the problem of omitted variables and conditions. We then discuss various approaches and tackle the theoretical issues around causality which must be addressed before attending to technical fixes. In the conclusions, we reflect on the bridges built between the two traditions and draw more general lessons about the logic of social science research.
This article investigates how early modern migrants articulated identification with their host society in the context of the late eighteenth-century Dutch Republic, a period preceding modern nationalism. Drawing on a unique dataset derived from the Prize Papers – a collection of testimonies from captured sailors interrogated by British Admiralty courts – we analyze migrants’ declarations of sovereign allegiance. We assess how factors such as duration of residence, local citizenship (poorterschap), occupational rank, and marital status influenced migrants’ identification with their adopted polity. Using logistic regression, we find that civic institutional embeddedness, reflected in city citizenship, and occupational rank, especially among ship captains, significantly predicted identification with the Dutch Republic. In contrast, duration of residence and marital status had weak and statistically insignificant effects. Our findings highlight that pre-national forms of identification were deeply embedded in civic and institutional contexts rather than simply reflecting modern nationalist sentiments. By combining quantitative analysis with targeted archival research into individual biographies, this study demonstrates the complex interplay between institutional opportunities and personal networks in shaping migrants’ allegiances, thereby offering a nuanced historical perspective relevant to contemporary debates on civic integration.
This article uses amphora quantification and regression analysis to trace economic changes in the Mediterranean between the Principate (27 BC to AD 284) and Late Antiquity. It indicates that, during the Principate, there was a clear pattern of amphora distribution across the Mediterranean, which can be explained by the predominance of market forces among the factors governing trade. In contrast, the weak correlation between exports and prices observed in Late Antiquity suggests a significant shift in the underlying principles of trade during this period.
This chapter covers essential inferential statistical tests used in applied linguistics research, focusing on their purpose, assumptions, and interpretation. It classifies the tests into parametric and nonparametric categories, starting with parametric tests. Key tests such as t-tests (independent and paired samples), ANOVAs (one-way, two-way, and repeated measures), and regression analyses (simple linear, multiple, and logistic) are explained in detail, highlighting their importance in comparing group means, analyzing variance, and predicting outcomes. The chapter also covers assumptions such as normality, homogeneity of variance, and linearity, explaining how to assess them and handle violations. Hypothetical scenarios are used to illustrate their application to real-world research questions. Step-by-step instructions for using SPSS to run these tests are provided, along with guidance on interpreting outputs including p-values, effect sizes, and regression coefficients. By the end, you will understand when and how to use parametric tests and analyze data effectively in SPSS.
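As a minimal illustration of one of the tests covered in the chapter, the following Python sketch computes Welch's independent-samples t statistic by hand (the chapter itself works in SPSS; the two groups and their scores here are invented purely for illustration):

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's independent-samples t statistic and approximate df."""
    na, nb = len(a), len(b)
    se2a = variance(a) / na          # squared standard error, group a
    se2b = variance(b) / nb          # squared standard error, group b
    t = (mean(a) - mean(b)) / math.sqrt(se2a + se2b)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (se2a + se2b) ** 2 / (se2a ** 2 / (na - 1) + se2b ** 2 / (nb - 1))
    return t, df

group_a = [1, 2, 3, 4, 5]   # e.g. scores under condition A
group_b = [2, 3, 4, 5, 6]   # e.g. scores under condition B
t, df = welch_t(group_a, group_b)
```

The resulting t statistic would then be compared against a t distribution with the computed degrees of freedom to obtain the p-value that SPSS reports directly.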
Metabolic syndrome (MetS) is associated with an increased risk of CVD, type 2 diabetes and death from all causes. Dietary factors correlate with MetS, making diet a potential target for intervention. We used data from the 2012–2016 Korea National Health and Nutrition Examination Survey (KNHANES, n 12 122) to identify a dietary pattern (DP) using thirty-nine predefined food groups as predictors. MetS components were used as the response variables with the food groups in reduced rank regression, followed by stepwise linear regression analyses. We then externally verified the DP in the Cardiovascular Disease Association Study (CAVAS, n 8277) and the Health EXAminees (HEXA) study (n 48 610). The DP score, which included twenty food groups, showed significant positive associations with all MetS components and with a higher prevalence ratio among KNHANES participants (P < 0·0001). Although the score was not significant in CAVAS (P = 0·0913), it showed a strong positive association with MetS prevalence in HEXA (P < 0·0001). We thus identified and tested a DP associated with MetS in Korean populations, which may be a useful tool for assessing MetS risk. Although the score was linked to higher MetS risk, particularly in the predominantly urban population of the HEXA study, further validation in more diverse populations is needed.
In this chapter, we review approaches to model climate-related migration including the multiple goals of modeling efforts and why modeling climate-related migration is of interest to researchers, commonly used sources of climate and migration data and data-related challenges, and various modeling methods used. The chapter is not meant to be an exhaustive inventory of approaches to modeling climate-related migration, but rather is intended to present the reader with an overview of the most common approaches and possible pitfalls associated with those approaches. We end the chapter with a discussion of some of the future directions and opportunities for data and modeling of climate-related migration.
Chapter 3 establishes that the Dutch had economic incentives to continue holding slaves. Slavery in Dutch New York was not just a cultural choice, but was reinforced by economic considerations. From archival sources and published secondary sources, I have compiled a unique dataset of prices for over 3,350 slaves bought, sold, assessed for value, or advertised for sale in New York and New Jersey. These data have been coded by sex, age, county, price, and type of record, among other categories. It is, as far as I know, the only database of slave prices in the Northern states yet assembled. Regression analysis allows us to compute the average price of Northern slaves over time, the relative price difference between male and female slaves, the price trend relative to known prices in the American South, and other variables such as the price differential between New York City slaves and slaves in other counties in the state. Slave prices in New York and New Jersey appear relatively stable over time, but declined in the nineteenth century. The analysis shows that slaveholders in Dutch New York were motivated by profit, and they sought strength and youth in purchasing slaves.
In many regression applications, users are often faced with difficulties due to nonlinear relationships, heterogeneous subjects, or time series which are best represented by splines. In such applications, two or more regression functions are often necessary to best summarize the underlying structure of the data. Unfortunately, in most cases, it is not known a priori which subset of observations should be approximated with which specific regression function. This paper presents a methodology which simultaneously clusters observations into a preset number of groups and estimates the corresponding regression functions' coefficients, all to optimize a common objective function. We describe the problem and discuss related procedures. A new simulated annealing-based methodology is described as well as program options to accommodate overlapping or nonoverlapping clustering, replications per subject, univariate or multivariate dependent variables, and constraints imposed on cluster membership. Extensive Monte Carlo analyses are reported which investigate the overall performance of the methodology. A consumer psychology application is provided concerning a conjoint analysis investigation of consumer satisfaction determinants. Finally, other applications and extensions of the methodology are discussed.
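The paper's methodology optimizes cluster membership and per-cluster coefficients jointly via simulated annealing; as a much simpler sketch of the underlying idea (alternating between refitting one OLS line per cluster and reassigning each observation to the line that fits it best), consider the following Python toy, with invented noiseless data drawn from two lines. Neither step can increase the total squared error, though unlike simulated annealing this greedy scheme can stall in local optima:

```python
def ols_line(pts):
    """Least-squares slope and intercept for a list of (x, y) points."""
    if not pts:
        return (0.0, 0.0)            # degenerate fallback for an empty cluster
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    b = sxy / sxx if sxx > 0 else 0.0
    return (b, my - b * mx)

def sq_err(pt, line):
    """Squared residual of a point under a (slope, intercept) line."""
    b, a = line
    x, y = pt
    return (y - (a + b * x)) ** 2

def clusterwise(points, labels, k, iters=20):
    """Alternate OLS refits and nearest-line reassignments."""
    for _ in range(iters):
        lines = [ols_line([p for p, g in zip(points, labels) if g == j])
                 for j in range(k)]
        labels = [min(range(k), key=lambda j: sq_err(p, lines[j]))
                  for p in points]
    return labels, lines

# Points generated from y = 2x (first four) and y = -x + 10 (last four),
# started from a deliberately wrong initial partition
points = [(1, 2), (2, 4), (3, 6), (4, 8), (1, 9), (2, 8), (3, 7), (4, 6)]
labels, lines = clusterwise(points, [0, 0, 0, 1, 1, 1, 1, 1], 2)
```

On this noiseless toy the scheme recovers the generating partition and both lines exactly; the paper's simulated annealing exists precisely to escape the local optima this greedy version can get trapped in on real data.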
This paper presents an analysis, based on simulation, of the stability of principal components. Stability is measured by the expectation of the absolute inner product of the sample principal component with the corresponding population component. A multiple regression model to predict stability is devised, calibrated, and tested using simulated Normal data. Results show that the model can provide useful predictions of individual principal component stability when working with correlation matrices. Further, the predictive validity of the model is tested against data simulated from three non-Normal distributions. The model predicted very well even when the data departed from normality, thus demonstrating the robustness of the proposed measure. Used in conjunction with other existing rules, this measure will help the user in determining the interpretability of principal components.
Guttman's assumption underlying his definition of “total images” is rejected: Partial images are not generally convergent everywhere. Even divergence everywhere is shown to be possible. The convergence type always found on partial images is convergence in quadratic mean; hence, total images are redefined as quadratic mean-limits. In determining the convergence type in special situations, the asymptotic properties of certain correlations are important, implying, in some cases, convergence almost everywhere, which is also effected by a countable population or multivariate normality or independent variables. The interpretations of a total image as a predictor, and a “common-factor score”, respectively, are made precise.
Whenever statistical analyses are applied to multiply imputed datasets, specific formulas are needed to combine the results into one overall analysis, also called combination rules. In the context of regression analysis, combination rules for the unstandardized regression coefficients, the t-tests of the regression coefficients, and the F-tests for testing R² for significance have long been established. However, there is still no general agreement on how to combine the point estimators of R² in multiple regression applied to multiply imputed datasets. Additionally, no combination rules for standardized regression coefficients and their confidence intervals seem to have been developed at all. In the current article, two sets of combination rules for the standardized regression coefficients and their confidence intervals are proposed, and their statistical properties are discussed. Additionally, two improved point estimators of R² in multiply imputed data are proposed, which in their computation use the pooled standardized regression coefficients. Simulations show that the proposed pooled standardized coefficients produce only small bias and that their 95% confidence intervals produce coverage close to the theoretical 95%. Furthermore, the simulations show that the newly proposed pooled estimates for R² are less biased than two earlier proposed pooled estimates.
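The long-established combination rules for unstandardized coefficients referred to here are Rubin's rules; as a point of reference for the pooling problem the article addresses, the following Python sketch applies them to one coefficient estimated in m imputed datasets (the numbers are invented):

```python
def rubin_pool(estimates, variances):
    """Pool m per-imputation estimates of one regression coefficient
    with Rubin's rules: the pooled estimate is the mean of the
    per-imputation estimates, and the total variance is
    T = W + (1 + 1/m) * B, where W is the mean within-imputation
    variance and B the between-imputation variance."""
    m = len(estimates)
    qbar = sum(estimates) / m                              # pooled estimate
    w = sum(variances) / m                                 # within-imputation
    b = sum((q - qbar) ** 2 for q in estimates) / (m - 1)  # between-imputation
    return qbar, w + (1 + 1 / m) * b

# One coefficient estimated in m = 3 imputed datasets
qbar, total_var = rubin_pool([0.50, 0.60, 0.55], [0.010, 0.012, 0.011])
```

The square root of the total variance gives the pooled standard error used in the combined t-test; the article's contribution is that no analogous, agreed-upon rules existed for standardized coefficients or for R².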
This chapter uses the Strategic Displacement in Civil Conflict dataset to conduct a cross-national analysis of displacement by state actors, who it finds are the predominant perpetrators. The statistical tests provide an indirect test of the arguments by revealing where strategic displacement in general, and forced relocation in particular, tends to occur, and by identifying the factors associated with the use of these strategies across conflicts. It also evaluates the observable implications of several alternative explanations for state-induced displacement, including ethnic nationalism, rebel threat/desperation, and collective punishment. The results show that, consistent with the theory, different displacement strategies occur in different contexts and seem to follow different logics. Cleansing is more likely in conventional civil wars, where territorial conquest takes primacy, while forced relocation is more likely in irregular wars, where information and identification problems are most acute. The evidence indicates that cleansing follows a logic of punishment. The results for relocation, however, are consistent with the implications of the assortative theory: It is more likely to be employed by resource-constrained incumbents fighting insurgencies in “illegible” areas – rural, peripheral territories – and when incumbents lack group-level information about wartime loyalties.
SARS-CoV-2 transmission dynamics within households involving children are complex. We examined the association between paediatric index case (PIC) age and subsequent household SARS-CoV-2 transmission among cases reported to the Minnesota Department of Health between March 2021 and February 2022. In our primary analysis, we used logistic regression to estimate odds ratios adjusted for race/ethnicity, sex, geographic region, and disease severity among households with an unvaccinated PIC. We performed a secondary analysis among households where the PIC was eligible for vaccination adjusting for the same covariates plus time since the last vaccination. Both analyses were stratified by variant wave. During the Alpha wave, PICs of all age groups had similar odds of subsequent transmission. During Delta and Omicron waves, PICs aged 16–17 had higher odds of subsequent transmission than PICs aged 0–4 (Delta OR, 1.32; [95% CI, 1.16–1.51], Omicron OR, 4.21; [95% CI, 3.25–5.45]). In the secondary analysis, unvaccinated PICs had higher odds of subsequent transmission than vaccinated PICs (Delta OR 2.89 [95% CI, 2.18–3.84], Omicron OR 1.35 [95% CI, 1.21–1.50]). Enhanced preventative measures, especially for 12–17-year-olds, may limit SARS-CoV-2 transmission within households involving children.
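The odds ratios reported in this study come from a logistic regression adjusted for several covariates; as a simpler, purely hypothetical illustration of what an odds ratio expresses, the following Python sketch computes an unadjusted odds ratio and its Wald 95% confidence interval from an invented 2×2 table (the counts bear no relation to the Minnesota data):

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald 95% CI from a 2x2 table:
    a/b = events/non-events in group 1, c/d = the same in group 2."""
    or_ = (a * d) / (b * c)
    # Standard error of log(OR) from the cell counts
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - z * se_log)
    hi = math.exp(math.log(or_) + z * se_log)
    return or_, lo, hi

# Hypothetical counts: 30 of 100 households with onward transmission for
# one index-case group vs. 15 of 100 for another
or_, lo, hi = odds_ratio_ci(30, 70, 15, 85)
```

A confidence interval excluding 1, as in this toy table, indicates a statistically distinguishable difference in the odds of onward transmission; adjustment via logistic regression, as in the study, additionally holds the listed covariates constant.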
In this chapter, I test the effects of post-Stalinist transitions on two important measures of agency capacity: officers employed and individuals registered as secret informants by coercive agencies. I present an original cross-national dataset on officer and informant numbers for every coercive agency in communist Central and Eastern Europe from 1945 to 1989. I show that countries that experienced post-Stalinist transitions had similarly sized coercive agencies to other states before 1953, but these agencies shrank thereafter while others continued to grow. I then estimate a series of difference-in-difference models to test the effect of post-Stalinist transitions on agency size. I find that agencies under post-Stalinist regimes had significantly smaller coercive agencies after Stalin’s death. This confirms the theoretical logic laid out in Chapter 2 in a broader setting than the comparative historical analyses of Poland and East Germany in Chapters 4 and 5. Although the number of cases and coverage of data here are limited, my results suggest that the logic of elite cohesion and coercive capacity laid out in Chapter 2 is applicable to a wide range of authoritarian regimes.
This chapter follows the definition of ‘empirical legal studies’ as research which applies quantitative methods to questions about the relationship between law and society, in particular with the aim of drawing conclusions about causal connections between variables. Comparative law does not typically phrase its research as being interested in questions of causal inference. Yet, implicitly, it is very much interested in such topics as it explores, for example, the determinants of legal differences between countries or when it evaluates how far it may be said that one of the legal solutions is preferable. It is thus valuable that significant progress has been made in empirical approaches to comparative law that may be able to show robust causal links about the relationship between law and society. This chapter outlines the main types of such studies: experiments, cross-sectional studies, panel data analysis and quasi-experiments. However, it also shows that such studies face a number of methodological problems. This chapter concludes that often it may be most promising to combine different methods in order to reach a valid empirical result.