In this paper, we explore the optimal risk sharing problem in the context of peer-to-peer insurance. Using the criterion of minimizing total variance, we find that the optimal risk sharing strategy should take a linear form. Although linear risk sharing strategies have been examined in the literature, our study uncovers a significant finding: to minimize total variance, the linear strategy should be applied to the residual risks rather than the original risks, as is commonly done in existing studies. By comparing with existing models, we demonstrate the advantage of the linear residual risk sharing model in variance reduction and robustness. Furthermore, we develop and study a number of new models that incorporate constraints reflecting desirable properties required by the market. With those constraints, the optimal strategies turn out to favor market development, for example by incentivizing participation and guaranteeing fairness. A related model is considered last, which establishes the connection among multiple optimization problems and provides insight into how the models can be extended to a more general setup.
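To fix ideas, here is a minimal sketch in notation of our own choosing (not necessarily the paper's): with individual losses $X_1,\dots,X_n$, a linear rule applied to the residual risks $X_i - \mathbb{E}[X_i]$, rather than to the $X_i$ themselves, takes the form
$$ R_i \;=\; \mathbb{E}[X_i] \;+\; \sum_{j=1}^{n} a_{ij}\bigl(X_j - \mathbb{E}[X_j]\bigr), \qquad \sum_{i=1}^{n} a_{ij} = 1 \ \text{ for each } j, $$
where $R_i$ is the post-sharing risk borne by participant $i$, the column-sum condition ensures that the aggregate loss is fully redistributed, and the coefficients $a_{ij}$ are chosen to minimize the total variance $\sum_{i=1}^{n} \operatorname{Var}(R_i)$.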
Forecasting international migration is a challenge that, despite its political and policy salience, has seen limited success so far. In this proof-of-concept paper, we employ a range of macroeconomic data to represent different drivers of migration. We also take into account the relatively consistent set of migration policies within the European Common Market, with its constituent freedom of movement of labour. Using panel vector autoregressive (VAR) models for mixed-frequency data, we forecast migration over short- and long-term horizons for 26 of the 32 countries within the Common Market. We demonstrate how the methodology can be used to assess the possible responses of other macroeconomic variables to unforeseen migration events, and vice versa. Our results indicate reasonable in-sample performance of migration forecasts, especially in the short term, although with varying levels of accuracy. They also underline the need to take country-specific factors into account when constructing forecasting models, with different variables being important across the regions of Europe. For the longer term, the proposed methods, despite high prediction errors, can still be useful as tools for setting coherent migration scenarios and analysing responses to exogenous shocks.
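As an illustration of the general workflow (not the paper's mixed-frequency panel specification), the following sketch fits a single-country VAR on quarterly drivers, produces a short-term forecast, and computes impulse responses; the file name, column names, and lag settings are assumptions.

```python
# Minimal sketch: a single-country VAR on quarterly macroeconomic drivers and
# net migration, fit and forecast with statsmodels. Illustrative inputs only.
import pandas as pd
from statsmodels.tsa.api import VAR

df = pd.read_csv("country_quarterly.csv", index_col="date", parse_dates=True)
data = df[["net_migration", "gdp_growth", "unemployment", "wage_index"]].dropna()

model = VAR(data)
res = model.fit(maxlags=4, ic="aic")              # lag order chosen by AIC

# Short-term forecast: 4 quarters ahead from the last observed lags
forecast = res.forecast(data.values[-res.k_ar:], steps=4)

# Impulse responses approximate responses to unforeseen migration shocks
irf = res.irf(periods=8)
print(forecast)
```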
This enthusiastic introduction to the fundamentals of information theory builds from classical Shannon theory through to modern applications in statistical learning, equipping students with a uniquely well-rounded and rigorous foundation for further study. Introduces core topics such as data compression, channel coding, and rate-distortion theory using a unique finite block-length approach. With over 210 end-of-part exercises and numerous examples, students are introduced to contemporary applications in statistics, machine learning and modern communication theory. This textbook presents information-theoretic methods with applications in statistical learning and computer science, such as f-divergences, PAC Bayes and variational principle, Kolmogorov's metric entropy, strong data processing inequalities, and entropic upper bounds for statistical estimation. Accompanied by a solutions manual for instructors, and additional standalone chapters on more specialized topics in information theory, this is the ideal introductory textbook for senior undergraduate and graduate students in electrical engineering, statistics, and computer science.
This paper proposes an option pricing model that incorporates stochastic volatility, stochastic interest rates, and stochastic jump intensity. Market shocks are modeled using a jump process, with each jump governed by an asymmetric double-exponential distribution. The model also integrates a Markov regime-switching framework for volatility and the risk-free rate, allowing the market to alternate between a finite number of distinct economic states. A closed-form solution for European option pricing is derived. To demonstrate the significance of the proposed model, a comparison with various other models is performed, and the sensitivity of option prices to the various model parameters is illustrated.
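For concreteness, the asset dynamics in a model of this type can be sketched, in notation of our own choosing rather than the paper's, as
$$ \frac{dS_t}{S_{t^-}} = \bigl(r(Z_t) - \lambda_t \kappa\bigr)\,dt + \sqrt{V_t}\,dW_t + d\!\Bigl(\sum_{i=1}^{N_t} \bigl(e^{Y_i}-1\bigr)\Bigr), $$
where $V_t$ is a stochastic variance process, $r(Z_t)$ is the risk-free rate modulated by a finite-state Markov chain $Z_t$ (the economic regime), $N_t$ is a jump counter with stochastic intensity $\lambda_t$, the jump sizes $Y_i$ follow an asymmetric double-exponential law, and $\kappa = \mathbb{E}[e^{Y_1}] - 1$ compensates the jumps.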
We study a version of the stochastic control problem of minimizing the sum of running and control costs, where control opportunities are restricted to independent Poisson arrival times. Under a general setting driven by a general Lévy process, we show the optimality of a periodic barrier strategy, which moves the process upward to the barrier whenever it is observed to be below it. The convergence of the optimal solutions to those in the continuous-observation case is also shown.
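The following Monte Carlo sketch illustrates the strategy itself rather than the paper's analysis: the underlying Lévy process is specialized to Brownian motion with drift, and the barrier level, observation rate, discount rate, and cost functions are illustrative assumptions.

```python
# Minimal sketch: simulate a periodic barrier strategy in which the process is
# only observed at Poisson arrival times and is pushed up to a barrier b
# whenever it is seen below it. All parameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def run_cost(x):              # convex running cost, e.g. quadratic
    return x**2

def simulate(x0=0.0, b=0.5, obs_rate=2.0, mu=-0.3, sigma=1.0,
             unit_push_cost=1.0, horizon=50.0, q=0.05):
    """Approximate discounted cost of the periodic barrier strategy at level b."""
    t, x, total = 0.0, x0, 0.0
    while t < horizon:
        dt = rng.exponential(1.0 / obs_rate)          # next Poisson observation
        total += np.exp(-q * t) * run_cost(x) * dt    # rough running-cost term
        x += mu * dt + sigma * np.sqrt(dt) * rng.normal()
        t += dt
        if x < b:                                     # push up to the barrier
            total += np.exp(-q * t) * unit_push_cost * (b - x)
            x = b
    return total

costs = [simulate() for _ in range(2000)]
print("estimated discounted cost:", np.mean(costs))
```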
Effective enforcement of laws and regulations hinges heavily on robust inspection policies. While data-driven approaches to testing the effectiveness of these policies are gaining popularity, they suffer significant drawbacks, particularly a lack of explainability and generalizability. This paper proposes an approach to crafting inspection policies that combines data-driven insights with behavioral theories to create an agent-based simulation model that we call a theory-infused phenomenological agent-based model (TIP-ABM). Moreover, this approach outlines a systematic process for combining theories and data to construct a phenomenological ABM, beginning with defining macro-level empirical phenomena. Illustrated through a case study of the Dutch inland shipping sector, the proposed methodology enhances explainability by illuminating inspectors’ tacit knowledge while iterating between statistical data and underlying theories. The broader generalizability of the proposed approach beyond the inland shipping context requires further research.
In this paper, we propose a network model to explain the implications of the pressure to share resources. Individuals use the network to establish social interactions that allow them to increase their income. They also use the network as a safety net, asking for assistance in case of need. The network is therefore a system characterized by social pressure to share and redistribute surplus resources among members. The main result is that the potential redistributive pressure from other network members causes individuals to behave inefficiently. The number of social interactions used to employ workers displays a non-monotonic pattern with respect to the number of neighbors (degree): it increases for intermediate degree and decreases for high degree. Relative to a benchmark case without social pressure, individuals with few (many) network members interact more (less). Finally, we show that these predictions are consistent with the results obtained in a set of field experiments run in rural Tanzania.
Deep learning (DL) has become the most effective machine learning solution for addressing and accelerating complex problems in a variety of fields, from computer vision and natural language processing to many others. Training well-generalized DL models requires large amounts of data, which allow the model to learn the complexity of the task it is being trained to perform. Consequently, performance optimization of DL models has concentrated on complex architectures with a large number of tunable parameters, in other words, on model-centric techniques. To enable training such large models, significant effort has also gone into high-performance computing and big-data handling. However, adapting DL to specialized domain-related data and problems in real-world settings presents unique challenges that model-centric techniques alone cannot address. In this paper, we tackle the problem of developing DL models for seismic imaging using complex seismic data. We specifically address developing and deploying DL models for salt interpretation using seismic images. Most importantly, we discuss how looking beyond model-centric techniques and leveraging data-centric strategies for optimizing DL model performance was crucial to significantly improving salt interpretation. This approach was also key to developing production-quality, robust, and generalizable models.
A liquefied natural gas (LNG) facility often incorporates replicate liquefaction trains. The performance of equivalent units across trains, designed using common numerical models, might be expected to be similar. In this article, we discuss statistical analysis of real plant data to test this assumption. Analysis of operational data for end flash vessels from a pair of replicate trains at an LNG facility indicates that one train produces 2.8%–6.4% more end flash gas than the other. We then develop statistical models for train operation, facilitating reduced flaring and hence a reduction of up to 45% in CO2-equivalent flaring emissions, noting that flaring emissions for a typical LNG facility account for ~4%–8% of overall facility emissions. We recommend that operational data-driven models be considered more generally to improve the performance of LNG facilities and reduce their CO2 footprint, particularly when replicate units are present.
This study introduces an advanced reinforcement learning (RL)-based control strategy for heating, ventilation, and air conditioning (HVAC) systems, employing a soft actor-critic agent with a customized reward mechanism. This strategy integrates time-varying outdoor temperature-dependent weighting factors to dynamically balance thermal comfort and energy efficiency. Our methodology has undergone rigorous evaluation across two distinct test cases within the building optimization testing (BOPTEST) framework, an open-source virtual simulator equipped with standardized key performance indicators (KPIs) for performance assessment. Each test case is strategically selected to represent distinct building typologies, climatic conditions, and HVAC system complexities, ensuring a thorough evaluation of our method across diverse settings. The first test case is a heating-focused scenario in a residential setting. Here, we directly compare our method against four advanced control strategies: an optimized rule-based controller inherently provided by BOPTEST, two sophisticated RL-based strategies leveraging BOPTEST’s KPIs as reward references, and a model predictive control (MPC)-based approach specifically tailored for the test case. Our results indicate that our approach outperforms the rule-based and other RL-based strategies and achieves outcomes comparable to the MPC-based controller. The second scenario, a cooling-dominated environment in an office setting, further validates the versatility of our strategy under varying conditions. The consistent performance of our strategy across both scenarios underscores its potential as a robust tool for smart building management, adaptable to both residential and office environments under different climatic challenges.
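A minimal sketch of the kind of reward shaping described (not the BOPTEST KPIs or the paper's exact formula): the weight on thermal discomfort is made a function of outdoor temperature, with all thresholds and coefficients illustrative.

```python
# Minimal sketch: a custom RL reward trading off thermal discomfort against
# energy use, with a comfort weight that varies with outdoor temperature.
import numpy as np

def comfort_weight(t_out_c: float) -> float:
    """Assumed shape: comfort matters most in mild weather and is weighted
    less in extreme weather, when more energy use is unavoidable."""
    return float(np.clip(1.0 - abs(t_out_c - 18.0) / 30.0, 0.2, 1.0))

def reward(t_zone_c, t_set_c, power_kw, t_out_c, dt_h=0.25, w_energy=0.1):
    discomfort = abs(t_zone_c - t_set_c)          # K, deviation from setpoint
    energy = power_kw * dt_h                      # kWh over the control step
    return -(comfort_weight(t_out_c) * discomfort + w_energy * energy)

# Example: mild day, small deviation, moderate power draw
print(reward(t_zone_c=21.3, t_set_c=21.0, power_kw=2.0, t_out_c=15.0))
```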
Since 1996, the incidence of rickettsiosis has been increasing in Yucatán, Mexico, but recent prevalence data are lacking. This study aimed to determine exposure to Spotted Fever Group (SFG) and Typhus Group (TG) rickettsiae in human serum samples from patients with suspected tick-borne diseases (TBD) collected between 2015 and 2022. A total of 620 samples were analysed using an indirect immunofluorescence assay (IFA) to detect IgG antibodies against SFG (Rickettsia rickettsii) and TG (Rickettsia typhi), with a titer of ≥64 considered positive. Results showed that 103 samples (17%) were positive for R. rickettsii and 145 (24%) for R. typhi, while 256 (41%) and 229 (37%) were negative, respectively. Cross-reactions were observed in 244 samples (39%). Exposure was significantly associated with contact with vectors such as ticks and fleas (p = 0.0010). The study suggests a high prevalence of rickettsiosis and recommends prospective studies to assess the disease burden and strengthen surveillance and prevention in Yucatán, considering factors such as temperature and ecological changes.
Numerical solutions of partial differential equations require expensive simulations, limiting their application in design optimization, model-based control, and large-scale inverse problems. Surrogate modeling techniques aim to decrease computational expense while retaining dominant solution features and characteristics. Existing frameworks based on convolutional neural networks and snapshot-matrix decomposition often rely on lossy pixelization and data-preprocessing, limiting their effectiveness in realistic engineering scenarios. Recently, coordinate-based multilayer perceptron networks have been found to be effective at representing 3D objects and scenes by regressing volumetric implicit fields. These concepts are leveraged and adapted in the context of physical-field surrogate modeling. Two methods toward generalization are proposed and compared: design-variable multilayer perceptron (DV-MLP) and design-variable hypernetworks (DVH). Each method utilizes a main network which consumes pointwise spatial information to provide a continuous representation of the solution field, allowing discretization independence and a decoupling of solution and model size. DV-MLP achieves generalization through the use of a design-variable embedding vector, while DVH conditions the main network weights on the design variables using a hypernetwork. The methods are applied to predict steady-state solutions around complex, parametrically defined geometries on non-parametrically-defined meshes, with model predictions obtained in less than a second. The incorporation of random Fourier features greatly enhanced prediction and generalization accuracy for both approaches. DVH models have more trainable weights than a similar DV-MLP model, but an efficient batch-by-case training method allows DVH to be trained in a similar amount of time as DV-MLP. A vehicle aerodynamics test problem is chosen to assess the method’s feasibility. Both methods exhibit promising potential as viable options for surrogate modeling, being able to process snapshots of data that correspond to different mesh topologies.
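A simplified sketch of the DV-MLP idea with random Fourier features (the layer sizes, frequency scale, three-dimensional coordinate input, and design-variable dimension are assumptions, not the article's configuration):

```python
# Minimal sketch: a coordinate MLP that takes random-Fourier-encoded spatial
# points plus a design-variable vector and regresses the field value pointwise.
import torch
import torch.nn as nn

class FourierFeatures(nn.Module):
    def __init__(self, in_dim=3, n_freq=64, scale=10.0):
        super().__init__()
        # Fixed random projection B; encoding is [sin(2*pi*xB), cos(2*pi*xB)]
        self.register_buffer("B", torch.randn(in_dim, n_freq) * scale)

    def forward(self, x):
        proj = 2.0 * torch.pi * x @ self.B
        return torch.cat([torch.sin(proj), torch.cos(proj)], dim=-1)

class DVMLP(nn.Module):
    def __init__(self, n_design=8, n_freq=64, width=256, out_dim=1):
        super().__init__()
        self.ff = FourierFeatures(in_dim=3, n_freq=n_freq)
        self.net = nn.Sequential(
            nn.Linear(2 * n_freq + n_design, width), nn.GELU(),
            nn.Linear(width, width), nn.GELU(),
            nn.Linear(width, out_dim),
        )

    def forward(self, coords, design_vars):
        # coords: (N, 3) points on any mesh; design_vars: (N, n_design)
        return self.net(torch.cat([self.ff(coords), design_vars], dim=-1))

model = DVMLP()
pred = model(torch.rand(1024, 3), torch.rand(1024, 8))   # (1024, 1) field values
```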
The association between economic variables and the frequency and duration of disability income insurance (DII) claims is well established. Across many jurisdictions, heightened levels of unemployment have been associated with both a higher incidence and a longer duration of DII claims. This motivated us to derive an asset portfolio whose total value moves in line with the level of unemployment, thus providing a natural match for the DII portfolio liabilities. To achieve this, we develop an economic tracking portfolio in which the asset weights are chosen so that the portfolio value changes in a way that reflects, as closely as possible, the level of unemployment. To the best of our knowledge, this is the first paper applying economic tracking portfolios to hedge economic risk in DII. The methodology put forward to establish this asset-liability matching portfolio is illustrated using DII data from the UK between 2004 and 2016. The benefits of our approach for claims reserving in DII portfolios are illustrated using a simulation study.
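A minimal sketch of one way to build such a tracking portfolio (not the paper's estimator): regress changes in the unemployment rate on candidate asset returns and use the fitted coefficients as portfolio weights. The file and asset names are illustrative.

```python
# Minimal sketch: least-squares tracking-portfolio weights so that portfolio
# returns move with changes in unemployment. Illustrative data columns only.
import numpy as np
import pandas as pd

df = pd.read_csv("uk_quarterly.csv", parse_dates=["date"], index_col="date")
d_unemp = df["unemployment_rate"].diff().dropna()                      # target
returns = df[["equities", "gilts", "credit", "cash"]].pct_change().loc[d_unemp.index]

X = returns.to_numpy()
y = d_unemp.to_numpy()
w, *_ = np.linalg.lstsq(X, y, rcond=None)     # ordinary least-squares weights

weights = pd.Series(w, index=returns.columns)
weights /= weights.abs().sum()                # normalize to unit gross exposure
print(weights)
```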
Rapid urbanization poses several challenges, especially in the absence of a controlled urban development plan, and often leads to anarchic occupation and expansion of cities, resulting in the phenomenon of urban sprawl (US). To support sustainable decision-making in urban planning and policy development, more effective approaches to US simulation and prediction are essential. Although deep learning (DL) methods have been used to simulate US indicators, almost no work has assessed what has already been done, the potential, the issues, and the challenges ahead. By synthesising existing research, we assess the current landscape of DL use in modelling US. The article begins by demystifying US and its multifaceted challenges and implications. It then examines DL methodologies and their effectiveness in capturing the complex spatial patterns and relationships associated with US, as well as the synergy between DL and conventional methods, highlighting their respective advantages and disadvantages. It emerges that the use of DL in the simulation and forecasting of US indicators is increasing and holds considerable promise for guiding strategic decisions to control and mitigate this phenomenon, although major challenges remain, both in terms of data and models and in terms of strategic city planning policies.
We introduce a comprehensive data-driven framework aimed at enhancing the modeling of physical systems, employing inference techniques and machine-learning enhancements. As a demonstrative application, we pursue the modeling of cathodic electrophoretic deposition, commonly known as e-coating. Our approach illustrates a systematic procedure for enhancing physical models by identifying their limitations through inference on experimental data and introducing adaptable model enhancements to address these shortcomings. We begin by tackling the issue of model parameter identifiability, which reveals aspects of the model that require improvement. To address generalizability, we introduce modifications, which also enhance identifiability. However, these modifications do not fully capture essential experimental behaviors. To overcome this limitation, we incorporate interpretable yet flexible augmentations into the baseline model. These augmentations are parameterized by simple fully-connected neural networks, and we leverage machine-learning tools, particularly neural ordinary differential equations, to learn these augmentations. Our simulations demonstrate that the machine-learning-augmented model more accurately captures observed behaviors and improves predictive accuracy. Nevertheless, we contend that while the model updates offer superior performance and capture the relevant physics, we can reduce off-line computational costs by eliminating certain dynamics without compromising accuracy or interpretability in downstream predictions of quantities of interest, particularly film thickness predictions. The entire process outlined here provides a structured approach to leverage data-driven methods by helping us comprehend the root causes of model inaccuracies and by offering a principled method for enhancing model performance.
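A minimal sketch of the augmentation idea under stated assumptions (a stand-in baseline ODE, explicit Euler integration instead of a dedicated neural-ODE solver, and illustrative sizes):

```python
# Minimal sketch: a baseline ODE right-hand side augmented by a small fully
# connected network, integrated with explicit Euler so the augmentation can be
# trained end-to-end against measured trajectories.
import torch
import torch.nn as nn

class AugmentedRHS(nn.Module):
    def __init__(self, state_dim=2, width=32):
        super().__init__()
        self.aug = nn.Sequential(
            nn.Linear(state_dim, width), nn.Tanh(),
            nn.Linear(width, state_dim),
        )
        self.k = nn.Parameter(torch.tensor(0.1))    # baseline physical parameter

    def forward(self, y):
        baseline = -self.k * y                      # stand-in physical model
        return baseline + self.aug(y)               # learnable augmentation

def rollout(rhs, y0, dt=0.01, steps=500):
    ys, y = [y0], y0
    for _ in range(steps):
        y = y + dt * rhs(y)                         # explicit Euler step
        ys.append(y)
    return torch.stack(ys)

rhs = AugmentedRHS()
traj = rollout(rhs, torch.tensor([1.0, 0.0]))
loss = traj[-1].pow(2).sum()                        # placeholder objective
loss.backward()                                     # gradients reach the augmentation
```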
The global number of individuals experiencing forced displacement has reached its highest level in the past decade. In this context, the provision of services for those in need requires timely and evidence-based approaches. How can mobile phone data (MPD) based analyses address the knowledge gap on mobility patterns and needs assessments in forced displacement settings? To answer this question, in this paper, we examine the capacity of MPD to function as a tool for anticipatory analysis, particularly in response to natural disasters and conflicts that lead to internal or cross-border displacement. The paper begins with a detailed review of the processes involved in acquiring, processing, and analyzing MPD in forced displacement settings. Following this, we critically assess the challenges associated with employing MPD in policy-making, with a specific focus on issues of user privacy and data ethics. The paper concludes by evaluating the potential benefits of MPD analysis for targeted and effective policy interventions and discusses future research avenues, drawing on recent studies and ongoing collaborations with mobile network operators.
In 2020, the COVID-19 pandemic resulted in a rapid response from governments and researchers worldwide, but information-sharing mechanisms were variable, and many early efforts were insufficient for the purpose. In semi-structured interviews, we spoke with fifteen data professionals located around the world who work with COVID-19-relevant data types. Interviews covered both challenges and positive experiences with data in multiple domains and formats, including medical records, social deprivation, hospital bed capacity, and mobility data. We analyze this qualitative corpus of experiences for content and themes and identify four sequential barriers a researcher may encounter: (1) knowing data exist, (2) being able to access those data, (3) data quality, and (4) the ability to share data onwards. A fifth barrier, (5) human throughput capacity, is present throughout all four stages. Examples of these barriers range from challenges faced by single individuals, to non-existent records of historic mingling/social-distancing laws, to systemic geopolitical data suppression. Finally, we recommend that governments and local authorities explicitly create machine-readable temporal “law as code” for changes in laws, such as mobility/mingling laws, and for changes in geographical regions.
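As a purely illustrative sketch of what machine-readable temporal “law as code” could look like (this is an assumption, not a published standard; the field names, rule identifiers, and dates are examples only):

```python
# Minimal sketch: encode a mobility/mingling rule as a record with an explicit
# validity interval and the region it applies to, then query rules by date.
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class MobilityRule:
    rule_id: str
    jurisdiction: str                  # e.g. an ISO 3166-2 region code
    max_gathering_size: Optional[int]  # None = no numeric cap recorded
    travel_allowed: bool
    valid_from: date
    valid_to: Optional[date]           # None = still in force

rules = [
    MobilityRule("R-2020-017", "GB-ENG", 6, True, date(2020, 9, 14), date(2020, 11, 5)),
    MobilityRule("R-2020-021", "GB-ENG", None, False, date(2020, 11, 5), date(2020, 12, 2)),
]

def rules_in_force(all_rules, on):
    """Return the rules applicable on a given date."""
    return [r for r in all_rules
            if r.valid_from <= on and (r.valid_to is None or on < r.valid_to)]

print(rules_in_force(rules, date(2020, 10, 1)))
```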
Surrogate models of turbulent diffusive flames could play a strategic role in the design of liquid rocket engine combustion chambers. The present article introduces a method to obtain data-driven surrogate models for coaxial injectors by leveraging an inductive transfer learning strategy over a U-Net with available multifidelity Large Eddy Simulation (LES) data. The resulting models preserve reasonable accuracy while reducing the offline computational cost of data generation. First, a database of about 100 low-fidelity LES of shear-coaxial injectors, operating with gaseous oxygen and gaseous methane as propellants, has been created. The design of experiments explores three variables: the chamber radius, the recess length of the oxidizer post, and the mixture ratio. Subsequently, U-Nets were trained on this dataset to provide reasonable approximations of the time-averaged two-dimensional flow field. Although neural networks are efficient non-linear data emulators, in purely data-driven approaches their quality is directly limited by the fidelity of the data they are trained on. Thus, a high-fidelity (HF) dataset has also been created, comprising about 10 simulations obtained at a much greater cost per sample. The amalgamation of low- and high-fidelity data during the transfer-learning process improves the surrogate model's fidelity without excessive additional cost.
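A sketch of the transfer-learning step under stated assumptions (the stand-in architecture, synthetic data, fine-tuning recipe, and the choice to freeze the encoder are placeholders, not the article's implementation):

```python
# Minimal sketch: pre-train a surrogate on many low-fidelity cases, then
# fine-tune on a few high-fidelity cases with a reduced learning rate and a
# frozen encoder. TinySurrogate stands in for the actual U-Net.
import torch
import torch.nn as nn

class TinySurrogate(nn.Module):
    def __init__(self, n_params=3, latent=64, out_pixels=64 * 64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_params, latent), nn.ReLU())
        self.decoder = nn.Linear(latent, out_pixels)

    def forward(self, x):
        return self.decoder(self.encoder(x))

def train(model, dataset, epochs, lr, loss_fn=nn.MSELoss()):
    opt = torch.optim.Adam((p for p in model.parameters() if p.requires_grad), lr=lr)
    for _ in range(epochs):
        for params, flow_field in dataset:   # (design parameters, mean flow field)
            opt.zero_grad()
            loss_fn(model(params), flow_field).backward()
            opt.step()

# Synthetic placeholder data: ~100 low-fidelity and ~10 high-fidelity cases
lf_data = [(torch.rand(8, 3), torch.rand(8, 64 * 64)) for _ in range(100)]
hf_data = [(torch.rand(8, 3), torch.rand(8, 64 * 64)) for _ in range(10)]

model = TinySurrogate()
train(model, lf_data, epochs=5, lr=1e-3)     # pre-train on low-fidelity data

for p in model.encoder.parameters():         # keep low-fidelity features fixed
    p.requires_grad = False
train(model, hf_data, epochs=5, lr=1e-4)     # fine-tune on high-fidelity data
```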
Currently, artificial intelligence (AI) is integrated across various segments of the public sector, in a scattered and fragmented manner, with the aim of enhancing the quality of people's lives. While AI adoption has proven impactful, several aspects hamper its utilization in public administration. A broad set of initiatives has therefore been designed to promote the adoption of reliable AI, with documentation as a key driver, and the AI community has been proactively recommending documentation practices to that end. While currently proposed AI documentation artifacts play a crucial role in increasing the transparency and accountability of AI systems, we propose a code-bound declarative documentation framework, the AI Product Card, that aims to support the responsible deployment of AI-based solutions. Our framework addresses the need to shift the focus from data and models considered in isolation to the reuse of AI solutions as a whole. By introducing a formalized approach to describing adaptation and optimization techniques, we aim to enhance existing documentation alternatives. Furthermore, its use in public administration aims to foster the rapid adoption of AI-based applications through open access to common use cases in the public sector. We showcase our proposal with a public sector-specific use case, a legal text classification task, and demonstrate how the AI Product Card enables its reuse through the interaction of the formal documentation specifications with the modular code references.
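Purely as an illustration, and not the actual AI Product Card schema, a declarative record of this kind might look as follows; every field name and value here is an assumption:

```python
# Minimal sketch: a declarative record documenting an AI solution as a whole
# and binding it to the code and model it reuses. Illustrative fields only.
from dataclasses import dataclass, field

@dataclass
class AIProductCard:
    name: str
    task: str
    intended_use: str
    base_model_ref: str                               # pointer to the reused model/code
    adaptation: dict = field(default_factory=dict)    # e.g. fine-tuning recipe
    optimization: dict = field(default_factory=dict)  # e.g. quantization settings
    evaluation: dict = field(default_factory=dict)

card = AIProductCard(
    name="legal-text-classifier",
    task="legal text classification",
    intended_use="routing citizen requests to the competent public body",
    base_model_ref="git+https://example.org/models/base-encoder@v1.2",
    adaptation={"method": "fine-tuning", "epochs": 3},
    evaluation={"metric": "macro F1 on a held-out public-sector corpus"},
)
print(card)
```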
Model selection (MS) and model averaging (MA) are two popular approaches when many candidate models exist. Theoretically, the estimation risk of an oracle MA is no larger than that of an oracle MS because the former is more flexible, but a foundational issue remains: does MA offer a substantial improvement over MS? Recently, seminal work by Peng and Yang (2022) answered this question under nested models with linear orthonormal series expansion. In the current paper, we further address this question under linear nested regression models. A more general nested framework, heteroscedastic and autocorrelated random errors, and sparse coefficients are allowed, giving a scenario that is more common in practice. A remarkable implication is that MS can be significantly improved by MA under certain conditions. In addition, we compare MA techniques with different weight sets. Simulation studies illustrate the theoretical findings in a variety of settings.
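In generic notation (not necessarily the paper's), given candidate estimators $\hat\mu_1,\dots,\hat\mu_M$ of a regression mean $\mu$, model selection picks a single $\hat\mu_{\hat m}$, whereas model averaging combines them as
$$ \hat\mu(w) = \sum_{m=1}^{M} w_m\,\hat\mu_m, \qquad w \in \Delta_M = \Bigl\{ w : w_m \ge 0,\ \sum_{m=1}^{M} w_m = 1 \Bigr\}, $$
so the comparison concerns the risks $\inf_{w\in\Delta_M} \mathbb{E}\,\lVert \hat\mu(w)-\mu\rVert^2$ versus $\min_m \mathbb{E}\,\lVert \hat\mu_m-\mu\rVert^2$, with different MA techniques corresponding to different restrictions of the weight set $\Delta_M$.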