A valid model is one from which the inferences drawn are true. Many factors can threaten the validity of a model, including imprecise or inaccurate measurements, bias in study design or in sampling, and misspecification of the model itself.
A key way to validate a model is to replicate the findings. The best method of replication is collecting new data. However, when that is not possible, a replication can be performed by dividing the sample using a split-group, jackknife, or bootstrap method. Of these three methods, split-group is the strongest but requires a dataset large enough to split the sample. The bootstrap is the weakest method of replication, but it produces more valid confidence intervals than a simple model.
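As a minimal sketch of the bootstrap approach, assuming synthetic data and illustrative variable names (not taken from the text), a percentile bootstrap of logistic regression coefficients might look like this in Python:

```python
# Minimal sketch: bootstrap replication of a logistic regression.
# The data and variable names here are purely illustrative.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic data standing in for a real study sample.
n = 500
df = pd.DataFrame({
    "age": rng.normal(60, 10, n),
    "smoker": rng.integers(0, 2, n),
})
logit = -5 + 0.05 * df["age"] + 0.8 * df["smoker"]
df["y"] = rng.random(n) < 1 / (1 + np.exp(-logit))

X_cols = ["age", "smoker"]
boot_coefs = []
for _ in range(200):
    # Resample subjects with replacement and refit the model.
    sample = df.sample(n=len(df), replace=True)
    model = LogisticRegression().fit(sample[X_cols], sample["y"])
    boot_coefs.append(model.coef_[0])

boot_coefs = np.array(boot_coefs)
# Percentile bootstrap 95% confidence interval for each coefficient.
lo, hi = np.percentile(boot_coefs, [2.5, 97.5], axis=0)
for name, l, h in zip(X_cols, lo, hi):
    print(f"{name}: 95% bootstrap CI for log-odds coefficient [{l:.3f}, {h:.3f}]")
```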
Multivariable techniques produce two major kinds of information: information about how well the model (all the independent variables together) fits the data, and information about the relationship of each independent variable to the outcome (with adjustment for all other independent variables in the analysis). Common measures of the strength of the relationship between an independent variable and the outcome are the odds ratio, relative hazard, and relative risk. Adjusting for multiple comparisons is challenging; most important is to decide ahead of time whether there will be adjustment for multiple comparisons. A common convention is not to adjust the primary outcome but to adjust secondary outcomes for multiple comparisons.
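As a brief sketch of the second kind of information, an adjusted odds ratio is obtained by exponentiating a logistic regression coefficient (OR = exp(β)). The example below uses illustrative data and variable names, not values from the text:

```python
# Minimal sketch: adjusted odds ratios from a multivariable logistic regression.
# Each adjusted odds ratio is exp(coefficient), with the other predictors in the model.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 400
df = pd.DataFrame({
    "exposure": rng.integers(0, 2, n),
    "age": rng.normal(55, 12, n),
})
p = 1 / (1 + np.exp(-(-4 + 0.9 * df["exposure"] + 0.04 * df["age"])))
df["outcome"] = (rng.random(n) < p).astype(int)

X = sm.add_constant(df[["exposure", "age"]])
fit = sm.Logit(df["outcome"], X).fit(disp=0)

# exp(beta) gives the adjusted odds ratio; exponentiating the confidence
# limits gives the confidence interval for the odds ratio.
odds_ratios = np.exp(fit.params)
ci = np.exp(fit.conf_int()).rename(columns={0: "2.5%", 1: "97.5%"})
print(pd.concat([odds_ratios.rename("OR"), ci], axis=1))
```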
The strength of multivariable analysis is its ability to determine how multiple independent variables, which are related to one another, are related to an outcome. However, if two variables are so closely correlated that knowing the value of one tells you the value of the other, multivariable analysis cannot separately assess the impact of these two variables on the outcome. This problem is called multicollinearity.
To assess whether there is multicollinearity, investigators should first run a correlation matrix. However, the matrix only shows the relationship between any two independent variables; harder to detect is whether a combination of variables accounts for another variable’s value. Two related measures of multicollinearity are tolerance and its reciprocal, the variance inflation factor. If you have variables that are highly related, you can omit one or more of the variables, use an “and/or” clause, or create a scale.
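As a minimal, hypothetical sketch of this screening step (the variables and data below are illustrative, not from the text), the correlation matrix and variance inflation factors might be computed as follows:

```python
# Minimal sketch: screening for multicollinearity with a correlation matrix
# and variance inflation factors (VIF = 1 / tolerance).
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
n = 300
weight = rng.normal(80, 15, n)
height = rng.normal(170, 10, n)
# BMI is a near-deterministic combination of weight and height, so it can be
# nearly redundant even when no single pairwise correlation looks extreme.
bmi = weight / (height / 100) ** 2 + rng.normal(0, 0.1, n)

X = pd.DataFrame({"weight": weight, "height": height, "bmi": bmi})

# Pairwise correlations catch only two-variable redundancy.
print(X.corr().round(2))

# VIF for each variable: how well the combination of the others predicts it.
X_const = sm.add_constant(X)
for i, col in enumerate(X.columns, start=1):  # index 0 is the constant
    vif = variance_inflation_factor(X_const.values, i)
    print(f"{col}: VIF = {vif:.1f}  (tolerance = {1 / vif:.3f})")
```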
Sensitivity analysis tests how robust the results are to changes in the underlying assumptions of your analysis. In other words, if you made plausible changes in your assumptions, would you still draw the same conclusions? The changes could be a more restrictive or more inclusive sample, a different way of measuring your variables, a different way of handling missing data, or a change to some other feature of your analysis. With sensitivity analysis you cannot lose: if you vary the assumptions of your analysis and get the same result, you will have more confidence in the conclusions of your study; conversely, if plausible changes in your assumptions lead to a different conclusion, you will have learned something important. A common assumption tested in sensitivity analysis is that there are no unmeasured confounders, which can be tested with E-values or falsification analysis. Other assumptions commonly tested are that losses to follow-up are random, that the sample is unbiased, that the exposure and follow-up periods are correct, that the predictors and outcome are measured without bias, and that the model is correctly specified.
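For instance, the E-value of VanderWeele and Ding for an observed risk ratio RR > 1 is RR + sqrt(RR(RR − 1)); it is the minimum strength of association an unmeasured confounder would need with both exposure and outcome to explain away the observed estimate. A minimal sketch with illustrative numbers:

```python
# Minimal sketch: the E-value for unmeasured confounding (VanderWeele & Ding).
# For an observed risk ratio RR > 1, E = RR + sqrt(RR * (RR - 1));
# a risk ratio below 1 is first inverted.
import math

def e_value(rr: float) -> float:
    if rr < 1:
        rr = 1 / rr
    return rr + math.sqrt(rr * (rr - 1))

# Worked example with an illustrative point estimate and confidence limit.
print(f"E-value for RR = 2.0: {e_value(2.0):.2f}")        # ~3.41
print(f"E-value for lower CI = 1.3: {e_value(1.3):.2f}")  # ~1.92
```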
Multivariable analysis is used for four major types of studies: observational studies of etiology, randomized and nonrandomized intervention studies, studies of diagnosis, and studies of prognosis.
For observational studies, whether etiologic or interventional, the most important reason to do multivariable analysis is to eliminate confounding, since in observational studies the groups are not randomly assigned. With randomized studies, multivariable analysis is used to adjust for baseline differences that occurred by chance and to identify other independent predictors of the outcome besides the randomized group assignment.
With studies of diagnosis, multivariable analysis is used to identify the best combination of diagnostic information to determine whether a person has a particular disease. Multivariable analysis can also be used to predict the prognosis of a group of patients with a particular set of known prognostic factors.
This nationwide retrospective study in Japan aimed to identify risk factors and diagnostic indicators for congenital syphilis (CS) and improve diagnostic accuracy. Data were collected from 230 pregnant women diagnosed with syphilis and their infants between 2015 and 2024. Of these, 49 infants were diagnosed with definite or highly probable CS, while 73 infants with excluded CS served as the control group. Multivariable logistic regression analysis revealed two significant risk factors for CS: maternal treatment not completed more than 4 weeks before delivery (odds ratio [OR]: 7.20; 95% confidence interval [CI]: 1.38–37.56; p = 0.02) and elevated total IgM levels in the infant (>20 mg/dL) (OR: 65.31; 95% CI: 4.53–941.39; p = 0.002). When using infant rapid plasma reagin (RPR) ≥1 as a diagnostic indicator, sensitivity was 93.8% (n = 48). In contrast, the infant-to-mother RPR ratio ≥1 showed a lower sensitivity of 34.3%, with fewer cases available for analysis (n = 35) due to limited maternal data. These findings indicate that delayed maternal treatment and high total IgM levels in the infant are significant risk factors, while the infant’s RPR titre serves as a useful diagnostic indicator for CS.
Australian public sector agencies want to improve access to public sector data to support better-informed policy analysis and research, and have passed legislation to that end. Much of this public sector data also contains personal or health information and is therefore governed by state and federal privacy law, which places conditions on the use of personal and health information. This paper therefore analyses how these data sharing laws compare with one another, and whether they substantially change the grounds on which public sector data can be shared. It finds that data sharing legislation, by itself, does not substantially change the norms embedded in privacy and health information management law governing the sharing of personal and health information. However, the paper notes that there can still be breaches of social licence even where data sharing occurs lawfully, and that there are several inconsistencies between data sharing legislation across Australia. The paper therefore proposes reform, policy, and technical strategies to address these inconsistencies.
We consider spline-based additive models for estimation of conditional treatment effects. To handle the uncertainty due to variable selection, we propose a method of model averaging with weights obtained by minimizing a J-fold cross-validation criterion, in which nearest neighbor matching is used to approximate the unobserved potential outcomes. We show that the proposed method is asymptotically optimal in the sense of achieving the lowest possible squared loss in some settings and assigning all weight to the correctly specified models if such models exist in the candidate set. Moreover, consistency properties of the optimal weights and model averaging estimators are established. A simulation study and an empirical example demonstrate the superiority of the proposed estimator over other methods.
European Union (EU) public opinion research is a rich field of study. However, as citizens often have little knowledge of the EU, the question remains to what extent their attitudes are grounded in coherent, ideologically informed belief systems. As survey research is not well equipped to study this question, this paper explores the value of the method of cognitive mapping (CM) for public opinion research by studying the cognitive maps of 504 Dutch citizens regarding the Eurozone crisis. The paper shows that respondents perceive the Eurozone crisis predominantly as a governmental debt crisis. Moreover, the concept of bureaucracy unexpectedly plays a key role in their belief systems, exerting an ambiguous but overall negative effect on the Eurozone and trust in the EU. Contrary to expectations, the attitudes of the respondents are more solidly grounded in (ordoliberal) ideology than those of the Dutch elite. Finally, the paper introduces new ways to measure ambivalence, prompting a reevaluation of the significance of different forms of ambivalence and their impact on political behavior. Overall, the results of this study suggest that CM forms a promising addition to the toolbox of public opinion research.
Since 2017, Digital Twins (DTs) have gained prominence in academic research, with researchers actively conceptualising, prototyping, and implementing DT applications across disciplines. The transformative potential of DTs has also attracted significant private sector investment, leading to substantial advancements in their development. However, their adoption in politics and public administration remains limited. While governments fund extensive DT research, their application in governance is often seen as a long-term prospect rather than an immediate priority, hindering their integration into decision-making and policy implementation. This study bridges the gap between theoretical discussions and practical adoption of DTs in governance. Using the Technology Readiness Level (TRL) and Technology Acceptance Model (TAM) frameworks, we analyse key barriers to adoption, including technological immaturity, limited institutional readiness, and scepticism regarding practical utility. Our research combines a systematic literature review of DT use cases with a case study of Germany, a country characterised by its federal governance structure, strict data privacy regulations, and strong digital innovation agenda. Our findings show that while DTs are widely conceptualised and prototyped in research, their use in governance remains scarce, particularly within federal ministries. Institutional inertia, data privacy concerns, and fragmented governance structures further constrain adoption. We conclude by emphasising the need for targeted pilot projects, clearer governance frameworks, and improved knowledge transfer to integrate DTs into policy planning, crisis management, and data-driven decision-making.
The limited stop-loss transform, along with the stop-loss and limited loss transforms – which are special or limiting cases of the limited stop-loss transform – is one of the most important transforms used in insurance, and it also appears extensively in many other fields including finance, economics, and operations research. When the distribution of the underlying loss is uncertain, the worst-case risk measure for the limited stop-loss transform plays a key role in many quantitative risk management problems in insurance and finance. In this paper, we derive expressions for the worst-case distortion risk measure of the limited stop-loss transform, as well as for the stop-loss and limited loss transforms, when the distribution of the underlying loss is uncertain and lies in a general $k$-order Wasserstein ball that contains a reference distribution. We also identify the worst-case distributions under which the worst-case distortion risk measures are attained. Additionally, our results recover the findings of Guan et al. ((2023) North American Actuarial Journal, 28(3), 611–625) regarding the worst-case stop-loss premium over a $k$-order Wasserstein ball. Furthermore, we use numerical examples to illustrate the worst-case distributions and the worst-case risk measures derived in this paper. We also examine the effects of the reference distribution, the radius of the Wasserstein ball, and the retention levels of limited stop-loss reinsurance on the premium for this type of reinsurance.
In recent years, a wide range of mortality models has been proposed to address the diverse factors influencing mortality rates, which has highlighted the need to perform model selection. Traditional mortality model selection methods, such as AIC and BIC, often require fitting multiple models independently and ranking them based on these criteria. This process can fail to account for uncertainties in model selection, which can lead to overly optimistic prediction intervals, and it disregards the potential insights from combining models. To address these limitations, we propose a novel Bayesian model selection framework that integrates model selection and parameter estimation into the same process. This requires creating a model-building framework that gives rise to different models by choosing different parametric forms for each term. Inference is performed using the reversible jump Markov chain Monte Carlo algorithm, which is devised to allow for transitions between models of different dimensions, as is the case for the models considered here. We develop modeling frameworks for data stratified by age and period and for data stratified by age, period, and product. Our results are presented in two case studies.
The escalating complexity of global migration patterns makes evident the limitations of traditional reactive governance approaches and the urgent need for anticipatory, forward-thinking strategies. This Special Collection, “Anticipatory Methods in Migration Policy: Forecasting, Foresight, and Other Forward-Looking Methods in Migration Policymaking,” gathers scholarly works and practitioners’ contributions dedicated to the state of the art of anticipatory approaches. It showcases significant methodological evolution, highlighting innovations ranging from advanced quantitative forecasting using machine learning to predict displacement, irregular border crossings, and asylum trends, to rich, in-depth insights generated through qualitative foresight, participatory scenario building, and hybrid methodologies that integrate diverse forms of knowledge. The contributions collectively emphasize the power of methodological pluralism, address a spectrum of migration drivers, including conflict and climate change, and critically examine the opportunities, ethical imperatives, and governance challenges associated with novel data sources, such as mobile phone data. By focusing on translating predictive insights and foresight into actionable policies and humanitarian action, this collection aims both to advance academic discourse and to provide tangible guidance for policymakers and practitioners. It underscores the importance of navigating inherent uncertainties and strengthening ethical frameworks to ensure that innovations in anticipatory migration policy enhance preparedness and resource allocation while upholding human dignity in an era of increasing global migration.
Time series of counts often display complex dynamic and distributional characteristics. For this reason, we develop a flexible framework combining the integer-valued autoregressive (INAR) model with a latent Markov structure, leading to the hidden Markov model-INAR (HMM-INAR). First, we illustrate conditions for the existence of an ergodic and stationary solution and derive closed-form expressions for the autocorrelation function and its components. Second, we show consistency and asymptotic normality of the conditional maximum likelihood estimator. Third, we derive an efficient expectation–maximization algorithm with steps available in closed form which allows for fast computation of the estimator. Fourth, we provide an empirical illustration and estimate the HMM-INAR on the number of trades of the Standard & Poor’s Depositary Receipts S&P 500 Exchange-Traded Fund Trust. The combination of the latent HMM structure with a simple INAR$(1)$ formulation not only provides better fit compared to alternative specifications for count data, but it also preserves the economic interpretation of the results.
We prove that determining the weak saturation number of a host graph $F$ with respect to a pattern graph $H$ is computationally hard, even when $H$ is the triangle. Our main tool establishes a connection between weak saturation and the shellability of simplicial complexes.
For a multidimensional Itô semimartingale, we consider the problem of estimating integrated volatility functionals. Jacod and Rosenbaum (2013, The Annals of Statistics 41(3), 1462–1484) studied a plug-in type of estimator based on a Riemann sum approximation of the integrated functional and a spot volatility estimator with a forward uniform kernel. Motivated by recent results that show that spot volatility estimators with general two-sided kernels of unbounded support are more accurate, in this article, an estimator using a general kernel spot volatility estimator as the plug-in is considered. A biased central limit theorem for estimating the integrated functional is established with an optimal convergence rate. Central limit theorems for properly de-biased estimators are also obtained both at the optimal convergence regime for the bandwidth and when applying undersmoothing. Our results show that one can significantly reduce the estimator’s bias by adopting a general kernel instead of the standard uniform kernel. Our proposed bias-corrected estimators are found to maintain remarkable robustness against bandwidth selection in a variety of sampling frequencies and functions.
In this article, we develop a novel high-dimensional coefficient estimation procedure based on high-frequency data. Unlike usual high-dimensional regression procedures such as LASSO, we additionally handle the heavy-tailedness of high-frequency observations as well as time variations of coefficient processes. Specifically, we employ the Huber loss and a truncation scheme to handle heavy-tailed observations, while $\ell _{1}$-regularization is adopted to overcome the curse of dimensionality. To account for the time-varying coefficient, we estimate local coefficients which are biased due to the $\ell _{1}$-regularization. Thus, when estimating integrated coefficients, we propose a debiasing scheme to enjoy the law of large numbers property and employ a thresholding scheme to further accommodate the sparsity of the coefficients. We call this the robust thresholding debiased LASSO (RED-LASSO) estimator. We show that the RED-LASSO estimator can achieve a near-optimal convergence rate. In the empirical study, we apply the RED-LASSO procedure to the high-dimensional integrated coefficient estimation using high-frequency trading data.