Multivariate linear regression is a method for modeling linear relations between two random vectors, say X and Y. Common reasons for using multivariate regression include (1) predicting Y given X, (2) testing hypotheses about the relation between X and Y, and (3) projecting Y onto prescribed time series or spatial patterns. Special cases of multivariate regression models include Linear Inverse Models (LIMs) and Vector Autoregressive Models. Multivariate regression also is fundamental to other statistical techniques, including canonical correlation analysis, discriminant analysis, and predictable component analysis. This chapter introduces multivariate linear regression and discusses estimation, measures of association, hypothesis testing, and model selection. In climate studies, model selection often involves selecting Y as well as X. For instance, Y may be a set of principal components that need to be chosen, which is not a standard selection problem. This chapter introduces a criterion for selecting X and Y simultaneously, called the Mutual Information Criterion (MIC).
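As an illustrative sketch (not taken from the chapter), the least-squares estimate of the coefficient matrix B in a multivariate regression Y = XB + E can be computed with NumPy. The function name, the synthetic noise-free data, and the dimensions are assumptions made only for this example.

```python
import numpy as np

def fit_multivariate_regression(X, Y):
    """Least-squares estimate of B in Y = X B + E.

    X: (n, p) predictors (include a column of ones for an intercept).
    Y: (n, q) predictands.
    Returns B with shape (p, q) and the residual matrix with shape (n, q).
    """
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    residuals = Y - X @ B
    return B, residuals

# Hypothetical synthetic data: noise-free, so B is recovered up to rounding.
rng = np.random.default_rng(0)
n, p, q = 50, 3, 2
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
B_true = np.array([[1.0, -2.0], [0.5, 0.0], [0.0, 3.0]])
Y = X @ B_true
B_hat, resid = fit_multivariate_regression(X, Y)
```

Each column of B_hat is the ordinary least-squares fit of the corresponding column of Y, so the multivariate estimate can be read as q univariate regressions that share the same predictors.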
The method of least squares will fit any model to a data set, but is the resulting model "good"? One criterion is that the model should fit the data significantly better than a simpler model with fewer predictors. After all, if the fit is not significantly better, then the model with fewer predictors is almost as good. For linear models, this approach is equivalent to testing if selected regression parameters vanish. This chapter discusses procedures for testing such hypotheses. In interpreting such hypotheses, it is important to recognize that a regression parameter for a given predictor quantifies the expected rate of change of the predictand while holding the other predictors constant. Equivalently, the regression parameter quantifies the dependence between two variables after controlling or regressing out other predictors. These concepts are important for identifying a confounding variable, which is a third variable that influences two variables to produce a correlation between those two variables. This chapter also discusses how detection and attribution of climate change can be framed in a regression model framework.
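The comparison between a model and a simpler nested model can be sketched with the standard F statistic for nested linear models. This is a minimal illustration under the usual Gaussian-noise assumptions, not the chapter's own code; the function names and the synthetic data are hypothetical.

```python
import numpy as np

def sse(X, y):
    """Sum of squared residuals of the least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)

def nested_f_statistic(X_full, X_reduced, y):
    """F statistic for testing that the predictors dropped from X_full vanish.

    X_reduced must contain a subset of the columns of X_full.
    Returns the statistic and its (numerator, denominator) degrees of freedom.
    """
    n, p_full = X_full.shape
    k = p_full - X_reduced.shape[1]          # number of parameters tested
    sse_full = sse(X_full, y)
    sse_red = sse(X_reduced, y)
    f = ((sse_red - sse_full) / k) / (sse_full / (n - p_full))
    return f, (k, n - p_full)

# Hypothetical data: y depends on x1 but not on x2.
rng = np.random.default_rng(1)
n = 100
x1, x2 = rng.standard_normal(n), rng.standard_normal(n)
y = 1.0 + 2.0 * x1 + rng.standard_normal(n)
X_full = np.column_stack([np.ones(n), x1, x2])
f_x1, dof = nested_f_statistic(X_full, X_full[:, [0, 2]], y)  # drop x1
f_x2, _ = nested_f_statistic(X_full, X_full[:, [0, 1]], y)    # drop x2
```

Under the null hypothesis that the dropped parameters vanish, the statistic follows an F distribution with the returned degrees of freedom, so a large value (as expected for x1 here) indicates that the fuller model fits significantly better.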
Atmospheric simulation data present richer information in terms of spatiotemporal resolution, spatial dimension, and the number of physical quantities compared to observational data; however, such simulations do not perfectly correspond to the real atmospheric conditions. Additionally, extensive simulation data aids machine learning-based image classification in atmospheric science. In this study, we applied a machine learning model for tropical cyclone detection, which was trained using both simulation and satellite observation data. Consequently, the classification performance was significantly lower than that obtained with the application of simulation data. Owing to the large gap between the simulation and observation data, the classification model could not be practically trained only on the simulation data. Thus, the representation capability of the simulation data must be analyzed and integrated into the observation data for application in real problems.
Large data sets are difficult to grasp. To make progress, we often seek a few quantities that capture as much of the information in the data as possible. In this chapter, we discuss a procedure called Principal Component Analysis (PCA), also called Empirical Orthogonal Function (EOF) analysis, which finds the components that minimize the sum of squared differences between the components and the data. The components are ordered such that the first approximates the data the best (in a least squares sense), the second approximates the data the best among all components orthogonal to the first, and so on. In typical climate applications, a principal component consists of two parts: (1) a fixed spatial structure, called an Empirical Orthogonal Function (EOF), and (2) its time-dependent amplitude, called a PC time series. The EOFs are orthogonal and the PC time series are uncorrelated. Principal components often are used as input to other analyses, such as linear regression, canonical correlation analysis, predictable components analysis, or discriminant analysis. The procedure for performing area-weighted PCA is discussed in detail in this chapter.
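One common way to carry out area-weighted PCA is through a singular value decomposition of the anomaly matrix after scaling each grid point by the square root of its area weight. The sketch below assumes cosine-of-latitude weights and a particular normalization (PC time series carry the amplitude, EOFs are orthonormal under the area-weighted inner product); the chapter's own conventions may differ, and the function name and random test field are hypothetical.

```python
import numpy as np

def area_weighted_pca(field, lat_deg):
    """Area-weighted PCA of a space-time field via SVD.

    field:   (n_time, n_space) data matrix.
    lat_deg: (n_space,) latitude in degrees of each grid point.
    Returns PC time series, EOFs, and the fraction of variance per component.
    """
    anomalies = field - field.mean(axis=0)            # remove the time mean
    sqrt_w = np.sqrt(np.cos(np.deg2rad(lat_deg)))     # assumed area weights
    U, s, Vt = np.linalg.svd(anomalies * sqrt_w, full_matrices=False)
    pcs = U * s                                       # (n_time, n_modes)
    eofs = Vt.T / sqrt_w[:, None]                     # (n_space, n_modes)
    explained = s**2 / np.sum(s**2)
    return pcs, eofs, explained

# Hypothetical random field on 12 latitudes, 40 time steps.
rng = np.random.default_rng(2)
lat = np.linspace(-75.0, 75.0, 12)
field = rng.standard_normal((40, 12))
pcs, eofs, explained = area_weighted_pca(field, lat)
```

With this normalization the anomalies are recovered as pcs @ eofs.T, the PC time series are mutually uncorrelated, and the components are ordered by decreasing explained variance.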
This chapter introduces the power spectrum. The power spectrum is the Fourier Transform of the autocovariance function, and the autocovariance function is the (inverse) Fourier Transform of the power spectrum. As such, the power spectrum and autocovariance function offer two complementary but mathematically equivalent descriptions of a stochastic process. The power spectrum quantifies how variance is distributed over frequencies and is useful for identifying periodic behavior in time series. The discrete Fourier transform of a time series can be summarized in a periodogram, which provides a starting point for estimating power spectra. Estimation of the power spectrum can be counterintuitive because the uncertainty in periodogram elements does not decrease with increasing sample size. To reduce uncertainty, periodogram estimates are averaged over a frequency interval called the bandwidth. Trends and discontinuities in time series can lead to similar low-frequency structure despite very different temporal characteristics. Spectral analysis provides a particularly insightful way to understand the behavior of linear filters.
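The periodogram and its smoothing over a bandwidth can be sketched as follows. The normalization shown is one common choice (conventions differ by constant factors), the running-mean smoother is a simple stand-in for the averaging described above, and the function names and the test signal are assumptions for illustration only.

```python
import numpy as np

def periodogram(x, dt=1.0):
    """Raw periodogram of a real time series at the non-negative Fourier frequencies."""
    x = np.asarray(x, dtype=float)
    n = x.size
    xf = np.fft.rfft(x - x.mean())          # remove the mean before transforming
    freqs = np.fft.rfftfreq(n, d=dt)        # cycles per unit time
    power = (np.abs(xf) ** 2) * dt / n      # one common normalization
    return freqs, power

def smooth_periodogram(power, half_width):
    """Running mean over 2*half_width + 1 adjacent frequencies.

    Averaging reduces the variance of the estimate at the cost of frequency
    resolution; values near the ends are biased low by zero padding.
    """
    width = 2 * half_width + 1
    kernel = np.ones(width) / width
    return np.convolve(power, kernel, mode="same")

# Hypothetical signal: a sinusoid at 0.1 cycles per time step plus white noise.
rng = np.random.default_rng(3)
t = np.arange(200)
x = np.sin(2 * np.pi * 0.1 * t) + 0.5 * rng.standard_normal(t.size)
freqs, power = periodogram(x)
smoothed = smooth_periodogram(power, half_width=2)
```

The raw periodogram peaks at the frequency of the sinusoid, while the smoothed version trades some of that sharpness for a less noisy background, which is the bandwidth trade-off described above.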
In low-income countries like the Democratic Republic of the Congo (DRC), where data is scarce and national statistics offices are often under-resourced, aggregated and anonymised mobile operators’ data can provide vital insights for decision-makers to promptly respond to both prevailing and new pandemics, such as COVID-19. Yet, while research on possible applications of mobile big data (MBD) analytics for COVID-19 is growing, there is still little evidence on how such use cases are actually being adopted by governmental authorities and how MBD insights can effectively be turned into informed public health actions in times of crises. This four-part commentary paper aims to bridge such literature gaps by sharing lessons learnt from the DRC, where Congolese public health authorities, through a steep learning curve, have initiated a public–private sector dialogue with local mobile network operators (MNOs) and their ecosystem partners to leverage population mobility insights for COVID-19 policy-making. After setting the scene on the policy relevance of MBD analytics in the context of the DRC in the first section, the paper details four key enablers that have contributed, since March 2020, to accelerating Congolese authorities’ uptake of MBD, thus effectively increasing preparedness for future pandemics. Thirdly, we showcase concrete use cases where “readiness-to-use” has translated into actual “usage” and “adoption” for decision-making, while introducing other use cases currently under development. Finally, we explore challenges when harnessing telco big data for decision-making, with the ultimate aim of sharing lessons to replicate the successes and steer the development of MBD for social good in other low-income countries.
In this paper, we study credit default swap (CDS) pricing with counterparty risk in a reduced form model. The default jump intensities of the reference firm and the counterparty are both assumed to follow mean-reverting CIR processes, each with its own independent jumps and a common jump. Approximate closed-form solutions for the joint survival probability density and the probability density of the first default can be obtained by using the PDE method. Then, with the expressions for the probability densities, we obtain the formula for the CDS price with counterparty risk in a reduced form model with a common jump. In the numerical analysis part, we find that the default of the reference asset has a greater impact on the CDS price than the default of the counterparty after introducing the common jump process.
In April 2020, Belgium experienced high numbers of fatal COVID-19 cases among nursing home (NH) residents. In response, a mass testing campaign was organised to test all NH residents and staff. We analysed the data of Flemish NHs to identify institutional factors associated with increased SARS-CoV-2 infection rates among NH residents. A cross-sectional study was conducted between 8 April and 15 May 2020. Data collected included demographics, group category (i.e. staff or resident), symptom status and test result. We retrieved additional data: number of beds and staff, type of beds (level of dependency of residents) and ownership (public, private for profit/non-profit institutions). Risk factor analysis was performed using negative binomial regression. In total, 695 NHs were included, of which 282 (41%) had at least one resident who tested positive. A higher infection rate among residents was associated with a higher fraction of RVT beds, generally occupied by more dependent residents (incidence rate ratio (IRR) 1.97; 95% CI 1.00–3.86), and with a higher staff infection rate (IRR 1.89; 95% CI 1.68–2.12). No relationship was found between other investigated NH characteristics and infection rate among residents. Staff-resident interactions are key in SARS-CoV-2 transmission dynamics. Vaccination, regular staff testing and assessment of infection prevention and control strategies in all NHs are needed to face future SARS-CoV-2 epidemics in these settings.
Mycobacterium tuberculosis is the cause of tuberculosis (TB), a granulomatous illness that mostly affects the lungs. Pakistan is one of the eight nations that account for two-thirds of all new cases of developing TB. TB has long been an endemic disease in Pakistan. According to World Health Organization (WHO) estimates, the nation has over 500 000 incident TB infections per year, with a rising number of drug-resistant cases. Recently, the coexistence of COVID-19 and TB in Pakistan has posed a problem for doctors. Fever or chills, cough, shortness of breath or difficulty breathing are all signs of COVID-19. After SARS-CoV-2 infection, cough might persist for weeks or months and it is frequently accompanied by persistent tiredness, cognitive impairment, dyspnoea or pain – a group of long-term consequences known as post-COVID syndrome or protracted COVID. Coughing with mucus or blood, and coughing that continues over 2 months are indications of TB. The overlapping clinical presentation makes it difficult for healthcare personnel to effectively evaluate the illness and prevent the spread of these fatal diseases. Pakistan lacks the necessary healthcare resources to tackle two contagious diseases at the same time. To counteract the sudden increase in TB cases, appropriate management and effective policies must be implemented. Thus, in order to prevent the spread of these infectious diseases, it is critical to recognise and address the problems that the healthcare sector faces, as well as to create an atmosphere in which the healthcare sector can function at its full potential.
As a Bayesian approach to fitting motorway traffic flow models remains rare in the literature, we empirically explore the sampling challenges this approach poses, which stem from the strong correlations and multimodality of the posterior distribution. In particular, we provide a unified statistical model to estimate, from motorway data, both boundary conditions and fundamental diagram parameters in a motorway traffic flow model due to Lighthill, Whitham, and Richards known as LWR. This allows us to provide a traffic flow density estimation method that is shown to be superior to two methods found in the traffic flow literature. To sample from this challenging posterior distribution, we use a state-of-the-art gradient-free function space sampler augmented with parallel tempering.
Latent position network models are a versatile tool in network science; applications include clustering entities, controlling for causal confounders, and defining priors over unobserved graphs. Estimating each node’s latent position is typically framed as a Bayesian inference problem, with Metropolis within Gibbs being the most popular tool for approximating the posterior distribution. However, it is well-known that Metropolis within Gibbs is inefficient for large networks; the acceptance ratios are expensive to compute, and the resultant posterior draws are highly correlated. In this article, we propose an alternative Markov chain Monte Carlo strategy—defined using a combination of split Hamiltonian Monte Carlo and Firefly Monte Carlo—that leverages the posterior distribution’s functional form for more efficient posterior computation. We demonstrate that these strategies outperform Metropolis within Gibbs and other algorithms on synthetic networks, as well as on real information-sharing networks of teachers and staff in a school district.
SARS-CoV-2 serological tests are used to assess the infection seroprevalence within a population. This study aims at assessing potential biases in estimating infection prevalence amongst healthcare workers (HCWs) when different diagnostic criteria are considered. A multi-site cross-sectional study was carried out in April–September 2020 amongst 1367 Italian HCWs. SARS-CoV-2 prevalence was assessed using three diagnostic criteria: RT-PCR on nasopharyngeal swab, point-of-care fingerprick serological test (POCT) result and COVID-19 clinical pathognomonic presentation. A logistic regression model was used to estimate the probability of POCT-positive result in relation to the time since infection (RT-PCR positivity). Among 1367 HCWs, 69.2% were working in COVID-19 units. Statistically significant differences in age, role and gender were observed between COVID-19/non-COVID-19 units. Prevalence of SARS-CoV-2 infection varied according to the criterion considered: 6.7% for POCT, 8.1% for RT-PCR, 10.0% for either POCT or RT-PCR, 9.6% for infection pathognomonic clinical presentation and 17.6% when at least one of the previous criteria was present. The probability of POCT-positive result decreased by 1.1% every 10 days from the infection. This study highlights potential biases in estimating SARS-CoV-2 point-prevalence data according to the criteria used. Although informative on infection susceptibility and herd immunity level, POCT serological tests are not the best predictors of previous COVID-19 infections for public health monitoring programmes.
Listeriosis is a rare but serious foodborne disease caused by Listeria monocytogenes. This matched case–control study (1:1 ratio) aimed to identify the risk factors associated with food consumption and food-handling habits for the occurrence of sporadic listeriosis in Beijing, China. Cases were defined as patients from whom Listeria was isolated, in addition to the presence of symptoms, including fever, bacteraemia, sepsis and other clinical manifestations corresponding to listeriosis, which were reported via the Beijing Foodborne Disease Surveillance System. Basic patient information and possible risk factors associated with food consumption and food-handling habits were collected through face-to-face interviews. One hundred and six cases were enrolled from 1 January 2018 to 31 December 2020, including 52 perinatal cases and 54 non-perinatal cases. In the non-perinatal group, the consumption of Chinese cold dishes increased the risk of infection by 3.43-fold (95% confidence interval 1.27–9.25, χ2 = 5.92, P = 0.02). In the perinatal group, the risk of infection reduced by 95.2% when raw and cooked foods were well-separated (χ2 = 5.11, P = 0.02). These findings provide important scientific evidence for preventing infection by L. monocytogenes and improving the dissemination of advice regarding food safety for vulnerable populations.
Estimating tail risk measures for portfolios of complex variable annuities is an important enterprise risk management task which usually requires nested simulation. In the nested simulation, the outer simulation stage involves projecting scenarios of key risk factors under the real-world measure, while the inner simulations are used to value pay-offs under guarantees of varying complexity, under a risk-neutral measure. In this paper, we propose and analyse an efficient simulation approach that dynamically allocates the inner simulations to the specific outer scenarios that are most likely to generate larger losses. These scenarios are identified using a proxy calculation that is used only to rank the outer scenarios, not to estimate the tail risk measure directly. As the proxy ranking will not generally provide a perfect match to the true ranking of outer scenarios, we calculate a measure based on the concomitant of order statistics to test whether further tail scenarios are required to ensure, with given confidence, that the true tail scenarios are captured. This procedure, which we call the dynamic importance allocated nested simulation approach, automatically adjusts for the relationship between the proxy calculations and the true valuations and also signals when the proxy is not sufficiently accurate.
The COVID-19 pandemic requires that actuaries track short-term mortality fluctuations in the portfolios they manage. This demands methods that not only operate over much shorter time periods than a year but that also deal with reporting delays. In this paper, we consider a semi-parametric approach for tracking portfolio mortality levels in continuous time. We identify both seasonal patterns and mortality shocks, thus providing a comparison benchmark for the impact of COVID-19 in terms of a portfolio’s own past experience. A parametric model is presented to allow for the average impact of seasonal variation and also reporting delays. We find that an estimate of mortality reporting delays can be made from a single extract of experience data. This can be used to forecast unreported deaths and improve estimates of recent mortality levels. Results are given for annuity portfolios in France, the UK and the USA.
This paper concentrates on the fundamental concepts of entropy, information and divergence in the case where the distribution function and the respective survival function play the central role in their definition. The main aim is to provide an overview of these three categories of measures of information and their cumulative and survival counterparts. It also aims to introduce and discuss Csiszár-type cumulative and survival divergences and the analogous Fisher-type information on the basis of cumulative and survival functions.
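As a concrete illustration of a survival-function-based measure, the sketch below estimates the cumulative residual entropy, defined for a non-negative random variable as minus the integral of S(x) log S(x) over x, where S is the survival function. It is a minimal example built on the empirical survival function and is not taken from the paper; the function name and the exponential test sample are assumptions. For an exponential distribution with rate 1 the exact value is 1.

```python
import numpy as np

def empirical_cumulative_residual_entropy(samples):
    """Plug-in estimate of -integral S(x) log S(x) dx for non-negative samples.

    The empirical survival function equals 1 - i/n on the interval between
    the i-th and (i+1)-th order statistics, so the integral is a finite sum.
    """
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    surv = 1.0 - np.arange(1, n) / n      # values (n-1)/n, ..., 1/n, all positive
    gaps = np.diff(x)                     # lengths of the intervals
    return float(-np.sum(gaps * surv * np.log(surv)))

# Hypothetical sample: evenly spaced quantiles of an exponential(1) distribution.
n = 2000
u = (np.arange(n) + 0.5) / n
sample = -np.log(1.0 - u)
cre = empirical_cumulative_residual_entropy(sample)
```

Replacing the survival function with the distribution function in the same formula gives the corresponding cumulative entropy, which is the kind of cumulative counterpart the overview refers to.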
Nosocomial transmission of COVID-19 among immunocompromised hosts can have a serious impact on COVID-19 severity, underlying disease progression and SARS-CoV-2 transmission to other patients and healthcare workers within hospitals. We experienced a nosocomial outbreak of COVID-19 in the setting of a daycare unit for paediatric and young adult cancer patients. Between 9 and 18 November 2020, 473 individuals (181 patients, 247 caregivers/siblings and 45 staff members) were exposed to the index case, who was a member of the nursing staff. Among them, three patients and four caregivers were infected. Two 5-year-old cancer patients with COVID-19 were not severely ill, but a 25-year-old cancer patient showed prolonged shedding of SARS-CoV-2 RNA for at least 12 weeks and probably infected his mother at home approximately 7–8 weeks after the initial diagnosis. Except for this case, no secondary transmission was observed from the confirmed cases in either the hospital or the community. To conclude, in the daycare setting of immunocompromised children and young adults, the rate of in-hospital transmission of SARS-CoV-2 was 1.6% when applying a stringent policy of infection prevention and control, including universal mask application and rapid and extensive contact investigation. Severely immunocompromised children/young adults with COVID-19 would have to be carefully managed after the mandatory isolation period while keeping the possibility of prolonged shedding of live virus in mind.
This paper obtains an optimal strategy over a finite time horizon for the portfolio of a defined contribution (DC) pension fund for an investor with a CRRA utility function. It employs the optimal stochastic control method in a financial market with two assets, one risk-free and one risky, where the jumps of the risky asset follow either a finite- or an infinite-activity Lévy process. The sensitivity of the results to the jump parameters in an uncertain financial market is also studied.