We prove a new lower bound for the almost 20-year-old problem of determining the smallest possible size of an essential cover of the $n$-dimensional hypercube $\{\pm 1\}^n$, that is, the smallest possible size of a collection of hyperplanes that forms a minimal cover of $\{\pm 1\}^n$ and such that, furthermore, every variable appears with a non-zero coefficient in at least one of the hyperplane equations. We show that such an essential cover must consist of at least $10^{-2}\cdot n^{2/3}/(\log n)^{2/3}$ hyperplanes, improving previous lower bounds of Linial–Radhakrishnan, of Yehuda–Yehudayoff, and of Araujo–Balogh–Mattos.
We provide explicit small-time formulae for the at-the-money implied volatility, skew, and curvature in a large class of models, including rough volatility models and their multi-factor versions. Our general setup encompasses both European options on a stock and VIX options, thereby providing new insights on their joint calibration. The tools used are essentially based on Malliavin calculus for Gaussian processes. We develop a detailed theoretical and numerical analysis of the two-factor rough Bergomi model and provide insights on the interplay between the different parameters for joint SPX–VIX smile calibration.
Introduction to Probability and Statistics for Data Science provides a solid course in the fundamental concepts, methods and theory of statistics for students in statistics, data science, biostatistics, engineering, and physical science programs. It teaches students to understand, use, and build on modern statistical techniques for complex problems. The authors develop the methods from both an intuitive and mathematical angle, illustrating with simple examples how and why the methods work. More complicated examples, many of which incorporate data and code in R, show how the method is used in practice. Through this guidance, students get the big picture about how statistics works and can be applied. This text covers more modern topics such as regression trees, large-scale hypothesis testing, bootstrapping, MCMC, time series, and fewer theoretical topics like the Cramér–Rao lower bound and the Rao–Blackwell theorem. It features more than 250 high-quality figures, 180 of which involve actual data. Data and R code are available on our website so that students can reproduce the examples and do hands-on exercises.
The performance of, and confidence in, fault detection and diagnostic systems can be undermined by data pipelines that feature multiple compounding sources of uncertainty. These issues further inhibit the deployment of data-based analytics in industry, where variable data quality and lack of confidence in model outputs are already barriers to their adoption. The methodology proposed in this paper supports trustworthy data pipeline design and transfers knowledge gained from one fully observed data pipeline to a similar, under-observed case. The transfer of uncertainties provides insight into uncertainty drivers without repeating the computational or cost overhead of fully redesigning the pipeline. A SHAP-based human-readable explainable AI (XAI) framework was used to rank and explain the impact of each choice in a data pipeline, allowing the decoupling of positive and negative performance drivers to facilitate the successful selection of high-performing pipelines. This empirical approach is demonstrated in bearing fault classification case studies using well-understood open-source data.
Our study aimed to identify high-risk areas of neonatal mortality associated with bacterial sepsis in the state of São Paulo, Southeast Brazil. We used a population-based study applying retrospective spatial scan statistics with data extracted from birth certificates linked to death certificates. All live births from mothers residing in São Paulo State from 2004 to 2020 were included. Spatial analysis using the Poisson model was adopted to scan high-rate clusters of neonatal mortality associated with bacterial sepsis (WHO-ICD10 A32.7, A40, A41, P36, P37.2 in any line of the death certificate). We found a prevalence of neonatal death associated with bacterial sepsis of 2.3/1000 live births. Clusters of high neonatal mortality associated with bacterial sepsis were identified mainly in the southeast region of the state, with four of them appearing as cluster areas for all birth weight categories (<1500 g, 1500 to <2500 g and ≥ 2500 g). The spatial analysis according to birth weight showed some overlap among the detected clusters, suggesting shared risk factors that need to be explored. Our study highlights the ongoing challenge of neonatal sepsis in the most developed state of a middle-income country and the importance of employing statistical techniques, including spatial methods, for enhancing surveillance and intervention strategies.
In our digitalized modern society where cyber-physical systems and internet-of-things (IoT) devices are increasingly commonplace, it is paramount that we are able to assure the cybersecurity of the systems that we rely on. As a fundamental policy, we join the advocates of multilayered cybersecurity measures, where resilience is built into IoT systems by relying on multiple defensive techniques. While existing legislation such as the General Data Protection Regulation (GDPR) also takes this stance, the technical implementation of these measures is left open. This invites research into the landscape of multilayered defensive measures, and within this problem space, we focus on two defensive measures: obfuscation and diversification. In this study, through a literature review, we situate these measures within the broader IoT cybersecurity landscape and show how they operate with other security measures built on the network and within IoT devices themselves. Our findings highlight that obfuscation and diversification show promise in contributing to a cost-effective robust cybersecurity ecosystem in today’s diverse cyber threat landscape.
Gas furnaces are the prevalent heating systems in Europe, but efforts to decarbonize the energy sector advocate for their replacement with heat pumps. However, this transition poses challenges for power grids due to increased electricity consumption. Estimating this consumption relies on the seasonal performance factor (SPF) of heat pumps, a metric that is complex to model and hard to measure accurately. We propose using an unpaired dataset of smart meter data at the building level to model the heat consumption and the SPF. We compare the distributions of the annual gas and heat pump electricity consumption by applying either the Jensen–Shannon Divergence or the Kolmogorov–Smirnov test. Through evaluation of a real-world dataset, we prove the ability of the methodology to predict the electricity consumption of future heat pumps replacing existing gas furnaces with a focus on single- and two-family buildings. Our results indicate anticipated SPFs ranging between 2.8 and 3.4, based on the Kolmogorov–Smirnov test. However, it is essential to note that the analysis reveals challenges associated with interpreting results when there are single-sided shifts in the input data, such as those induced by external factors like the European gas crisis in 2022. In summary, this extended version of a conference paper shows the viability of utilizing smart meter data to model heat consumption and seasonal performance factor for future retrofitted heat pumps.
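One possible reading of the distribution comparison described above can be sketched as follows: divide each building's annual gas consumption by a candidate SPF and keep the candidate whose implied electricity distribution lies closest, in the Kolmogorov–Smirnov sense, to the observed heat pump electricity distribution. This is an illustrative sketch only, not the authors' procedure. The function names `ks_statistic` and `estimate_spf`, the candidate grid, and the synthetic lognormal data are all assumptions, and furnace efficiency is ignored for simplicity.

```python
import random


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past every value <= x, then compare the CDFs at x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def estimate_spf(gas_kwh, hp_kwh, candidates):
    """Pick the candidate SPF for which gas / SPF best matches hp_kwh (hypothetical helper)."""
    return min(
        candidates,
        key=lambda spf: ks_statistic([g / spf for g in gas_kwh], hp_kwh),
    )


# Synthetic, unpaired data (assumption): both groups share one heat-demand
# distribution, and the heat pumps run with a true SPF of 3.0.
rng = random.Random(0)
gas_kwh = [rng.lognormvariate(9.5, 0.4) for _ in range(2000)]
hp_kwh = [rng.lognormvariate(9.5, 0.4) / 3.0 for _ in range(2000)]

candidates = [2.0 + 0.1 * k for k in range(21)]  # grid from 2.0 to 4.0
spf_hat = estimate_spf(gas_kwh, hp_kwh, candidates)
print(f"estimated SPF: {spf_hat:.1f}")
```

Because the two samples are unpaired, only their distributions are compared. A one-sided shift in either input, such as a drop in gas consumption during a crisis year, would move the estimate directly, which matches the interpretive caveat raised in the abstract.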
In this paper we consider positional games where the winning sets are edge sets of tree-universal graphs. Specifically, we show that in the unbiased Maker-Breaker game on the edges of the complete graph $K_n$, Maker has a strategy to claim a graph which contains copies of all spanning trees with maximum degree at most $cn/\log (n)$, for a suitable constant $c$ and $n$ large enough. We also prove an analogous result for Waiter-Client games. Both of our results show that the building player can play at least as well as suggested by the random graph intuition. Moreover, they improve on a special case of earlier results by Johannsen, Krivelevich, and Samotij as well as Han and Yang for Maker-Breaker games.
The rat lungworm Angiostrongylus cantonensis is a zoonotic metastrongyloid nematode currently considered an emerging pathogen. Originating in Southeast Asia, this nematode has spread to tropical and subtropical parts of the world via its invasive rodent and gastropod hosts.
On the island of Tenerife in the Canary archipelago, the A. cantonensis invasion was recognized more than a decade ago. The endemic lizard Gallotia galloti has been identified as a paratenic host of this nematode in the Canary Island ecosystem. Because this lizard species is the most abundant reptile in Tenerife, we tested its suitability as a possible sentinel for A. cantonensis presence. Lizards were captured alive in nine localities, spanning an environmental gradient across the island. Tail muscle tissue was obtained by provoked caudal autotomy and tested for the nematode infection by a species-specific qPCR. Infection intensities were assessed by detecting A. cantonensis DNA quantities based on a calibrated standard curve. Of the 129 samples tested, 31 were positive. The prevalence varied among localities, with the highest (63.6%) recorded in a humid laurel forest. Even though the prevalence in Valle San Lorenzo was the lowest, this is the first record of A. cantonensis from the arid south of Tenerife. Variation in prevalence at different localities was significantly and positively correlated with increasing vegetation cover and negatively correlated with seasonal variability of precipitation, as determined by Spearman correlation coefficients. Fisher’s exact test was used to determine the variation in the prevalence of A. cantonensis among adult males, females, and juveniles and showed no significant difference. Also, there was no significant difference in infection intensity between males and females (as determined by GEE-g). We demonstrated that provoking caudal autotomy can be an effective non-lethal method of A. cantonensis mapping in island ecosystems with abundant lizard species, particularly those with a sharp climatic and vegetation gradient, from xeric to humid conditions.
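The overall prevalence implied by the counts above (31 positive out of 129 samples) is about 24%. As a minimal sketch, the snippet below computes that proportion together with a Wilson score interval. The interval is an illustrative addition and is not reported in the abstract; the function name `wilson_interval` is hypothetical.

```python
import math


def wilson_interval(positives, n, z=1.96):
    """Return (proportion, lower, upper) using the Wilson score interval.

    z = 1.96 corresponds to an approximate 95% confidence level.
    """
    p = positives / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return p, centre - half_width, centre + half_width


# Counts taken from the abstract: 31 qPCR-positive lizards out of 129 tested.
prevalence, low, high = wilson_interval(31, 129)
print(f"prevalence = {prevalence:.1%}, 95% CI = ({low:.1%}, {high:.1%})")
```

The Wilson interval is used here instead of the simpler normal approximation because it behaves better for moderate sample sizes and proportions away from 50%. Per-locality intervals would be wider, since each locality contributes only part of the 129 samples.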
Transfer learning has been highlighted as a promising framework to increase the accuracy of data-driven models in the case of data sparsity, specifically by applying pretrained knowledge to the training of the target model. The objective of this study is to evaluate whether the number of requisite training samples can be reduced with the use of various transfer learning models for predicting, for example, the chemical source terms of the data-driven reduced-order modeling (ROM) that represents the homogeneous ignition of a hydrogen/air mixture. Principal component analysis is applied to reduce the dimensionality of the hydrogen/air mixture in composition space. Artificial neural networks (ANNs) are used to regress the reaction rates of principal components, and subsequently, a system of ordinary differential equations is solved. As the number of training samples decreases in the target task, the ROM fails to predict the ignition evolution of a hydrogen/air mixture. Three transfer learning strategies are then applied to the training of the ANN model with a sparse dataset. The performance of the ROM with a sparse dataset is remarkably enhanced if the training of the ANN model is restricted by a regularization term that controls the degree of knowledge transfer from source to target tasks. To this end, a novel transfer learning method is introduced, Parameter control via Partial Initialization and Regularization (PaPIR), whereby the amount of knowledge transferred is systematically adjusted in terms of the initialization and regularization schemes of the ANN model in the target task.
This study aimed to develop a predictive tool for identifying individuals with high antibody titres, which is crucial for recruiting COVID-19 convalescent plasma (CCP) donors, and to assess the quality and storage changes of CCP. A convenience sample of 110 plasma donors was recruited, of which 75 met the study criteria. Using univariate logistic regression and random forest, 6 significant factors were identified, leading to the development of a nomogram. Receiver operating characteristic curves, calibration plots, and decision curve analysis (DCA) were used to evaluate the nomogram’s discrimination, calibration, and clinical utility. The nomogram indicated that females aged 18 to 26, blood type O, receiving 1 to 2 COVID-19 vaccine doses, experiencing 2 symptoms during infection, and donating plasma 41 to 150 days after symptom onset had higher likelihoods of high antibody titres. The nomogram’s AUC was 0.853, with good calibration. DCA showed clinical benefit within thresholds of 9% to 90%. CCP met the quality requirements, with stable antibody titres over 6 months (P > 0.05). These findings highlight the value of developing predictive tools to identify suitable CCP donors and emphasize the stability of CCP quality over time, suggesting its potential for long-term storage.
Despite global efforts to end tuberculosis (TB), the goal of preventing catastrophic health expenditure (CHE) due to TB remains unmet. This cross-sectional study was conducted in Guizhou Province, Southwest China. Data were collected from the Hospital Information System and a survey of TB patients who had completed standardized antituberculosis treatment between January and March 2021. Among the 2 283 participants, the average total expenditure and out-of-pocket expenditure were $1 506.6 (median = $760.5) and $683.6 (median = $437.8), respectively. Health insurance reimbursement reduced CHE by 16.8%, with a contribution rate of 24.9%, and the concentration index changed from -0.070 prereimbursement to -0.099 postreimbursement. However, the contribution of health insurance varied significantly across different economic strata, with contribution rates of 6.4% for the lowest economic group and 53.1% for the highest group. For patients from lower socioeconomic strata, health insurance contributed 10.7% to CHE in the prediagnostic phase and 23.5% during treatment. While social health insurance alleviated the financial burden for TB patients, it did not provide sufficient protection for those in lower economic strata or during the prediagnostic stage. This study underscores the need for more effective and equitable subsidy policies for TB patients.
We show that the twin-width of every $n$-vertex $d$-regular graph is at most $n^{\frac{d-2}{2d-2}+o(1)}$ for any fixed integer $d \geq 2$ and that almost all $d$-regular graphs attain this bound. More generally, we obtain bounds on the twin-width of sparse Erdős–Rényi and regular random graphs, complementing the bounds in the denser regime due to Ahn, Chakraborti, Hendrey, Kim, and Oum.
The high prevalence of long COVID symptoms has emerged as a significant public health concern. This study investigated the associations between three doses of COVID-19 vaccines and the presence of any and ≥3 types of long COVID symptoms among people with a history of SARS-CoV-2 infection in Hong Kong, China. This is a secondary analysis of a cross-sectional online survey among Hong Kong adult residents conducted between June and August 2022. This analysis was based on a sub-sample of 1,542 participants with confirmed SARS-CoV-2 infection during the fifth wave of the COVID-19 outbreak in Hong Kong (December 2021 to April 2022). Among the participants, 40.9% and 16.1% self-reported having any and ≥3 types of long COVID symptoms, respectively. After adjusting for significant variables related to sociodemographic characteristics, health conditions and lifestyles, and SARS-CoV-2 infection, receiving at least three doses of COVID-19 vaccines was associated with lower odds of reporting any long COVID symptoms compared with receiving two doses (adjusted odds ratio [AOR]: 0.69, 95% CI: 0.54, 0.87, P = .002). Three doses of inactivated and mRNA vaccines had similar protective effects against long COVID symptoms. It is important to strengthen the coverage of COVID-19 vaccination booster doses, even in the post-pandemic era.
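The reported adjusted odds ratio, its confidence interval, and the P value can be checked against each other with a short calculation. The sketch below assumes the interval is a standard Wald interval, symmetric on the log-odds scale, which the abstract does not state explicitly; the function name `p_value_from_ci` is hypothetical.

```python
import math


def p_value_from_ci(estimate, lower, upper, z_crit=1.96):
    """Recover the standard error, z statistic, and two-sided p-value
    from an odds ratio and its 95% CI, assuming a Wald interval on the log scale."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(estimate) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return se, z, p


# Values taken from the abstract: AOR 0.69, 95% CI 0.54 to 0.87.
se, z, p = p_value_from_ci(0.69, 0.54, 0.87)
print(f"SE(log OR) = {se:.3f}, z = {z:.2f}, p = {p:.4f}")
```

Under this assumption the recovered p-value is about 0.002, in line with the reported P = .002. Small differences are expected because the published figures are rounded to two decimals.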
Let $T$ be a tree on $t$ vertices. We prove that for every positive integer $k$ and every graph $G$, either $G$ contains $k$ pairwise vertex-disjoint subgraphs each having a $T$ minor, or there exists a set $X$ of at most $t(k-1)$ vertices of $G$ such that $G-X$ has no $T$ minor. The bound on the size of $X$ is best possible and improves on an earlier $f(t)k$ bound proved by Fiorini, Joret, and Wood (2013) with some fast-growing function $f(t)$. Moreover, our proof is short and simple.
Cervical cancer, closely linked to human papillomavirus (HPV) infection, is a major global health concern. Our study aims to fill the gap in understanding HPV vaccine awareness and acceptance in the Middle East, where national immunization programs are often lacking and cultural perceptions hinder acceptance. This systematic review and meta-analysis adhered to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines. A comprehensive literature search across several databases was conducted on 5 September 2023. We included quantitative studies on HPV vaccine awareness and acceptance in Middle Eastern countries. Data extraction and quality assessment were conducted independently by multiple reviewers to ensure accuracy. Statistical analyses, including subgroup analyses, were performed using R to calculate pooled estimates and to assess heterogeneity and publication bias. We reviewed 159 articles from 15 Middle Eastern countries, focusing on 93,730 participants, predominantly female and healthcare workers. HPV vaccine awareness was found to be 41.7% (95% CI 37.4%–46.1%), with higher awareness among healthcare workers. The pooled acceptance rate was 45.6% (95% CI 41.3%–50.1%), with similar rates between healthcare and non-healthcare workers. Our study highlights the critical need for increased HPV vaccine awareness and acceptance in the Middle East, emphasizing the importance of integrating the vaccine into national immunization programs and addressing cultural and religious factors to improve public health outcomes.
The expanding application of advanced analytics in insurance has generated numerous opportunities, such as more accurate predictive modeling powered by machine learning and artificial intelligence (AI) methods, the utilization of novel and unstructured datasets, and the automation of key operations. Significant advances in these areas are being made through novel applications and adaptations of predictive modeling techniques for insurance purposes, while, concurrently, rapid advances in machine learning methods are being made outside of the insurance sector. However, these innovations also bring substantial challenges, particularly around the transparency, explanation, and fairness of complex algorithmic models and the economic and societal impacts of their adoption in decision-making. As insurance is a highly regulated industry, models may be required by regulators to be explainable, in order to enable analysis of the basis for decision making. Due to the societal importance of insurance, significant attention is being paid to ensuring that insurance models do not discriminate unfairly. In this special issue, we feature papers that explore key issues in insurance analytics, focusing on prediction, explainability, and fairness.