We study a stochastic differential equation with an unbounded drift and general Hölder continuous noise of order $\lambda \in (0,1)$. The equation turns out to have a unique solution that, depending on the particular shape of the drift, either stays above some continuous function or has continuous upper and lower bounds. Under some mild assumptions on the noise, we prove that the solution has moments of all orders. In addition, we provide its connection to the solution of some Skorokhod reflection problem. As an illustration of our results and motivation for applications, we also suggest two stochastic volatility models that we regard as generalizations of the CIR and CEV processes. We complete the study by providing a numerical scheme for the solution.
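The paper's own numerical scheme is not reproduced here. As a rough, hedged illustration of the kind of dynamics involved, the sketch below simulates a CIR-type equation $dX_t = (a/X_t - bX_t)\,dt + dB^H_t$ driven by fractional Brownian motion with Hurst index $H = \lambda$ using a naive Euler step; the drift form, the choice of fBm as the Hölder noise, and all parameter values are assumptions made purely for illustration.

```python
import numpy as np

def fbm_increments(n, H, T=1.0, seed=0):
    """Increments of fractional Brownian motion on [0, T] via the Cholesky method."""
    rng = np.random.default_rng(seed)
    t = np.linspace(T / n, T, n)
    # Covariance of fBm: 0.5 * (s^{2H} + t^{2H} - |t - s|^{2H})
    cov = 0.5 * (t[:, None] ** (2 * H) + t[None, :] ** (2 * H)
                 - np.abs(t[:, None] - t[None, :]) ** (2 * H))
    path = np.linalg.cholesky(cov) @ rng.standard_normal(n)
    return np.diff(np.concatenate(([0.0], path)))

def euler_cir_like(x0, a, b, H, n=1000, T=1.0, eps=1e-6, seed=0):
    """Naive Euler scheme for dX_t = (a/X_t - b X_t) dt + dB^H_t (illustrative drift)."""
    dt = T / n
    dz = fbm_increments(n, H, T, seed)
    x = np.empty(n + 1)
    x[0] = x0
    for k in range(n):
        xk = max(x[k], eps)            # keep the singular drift well defined
        x[k + 1] = xk + (a / xk - b * xk) * dt + dz[k]
    return x

path = euler_cir_like(x0=1.0, a=0.5, b=1.0, H=0.7)
print(path[-1])
```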
This paper deals with ergodic theorems for particular time-inhomogeneous Markov processes, whose time-inhomogeneity is asymptotically periodic. Under a Lyapunov/minorization condition, it is shown that, for any measurable bounded function f, the time average $\frac{1}{t} \int_0^t f(X_s)ds$ converges in $\mathbb{L}^2$ towards a limiting distribution, starting from any initial distribution for the process $(X_t)_{t \geq 0}$. This convergence can be improved to an almost sure convergence under an additional assumption on the initial measure. This result is then applied to show the existence of a quasi-ergodic distribution for processes absorbed by an asymptotically periodic moving boundary, satisfying a conditional Doeblin condition.
To describe the trend in the cumulative incidence of coronavirus disease 2019 (COVID-19) and undiagnosed cases over the course of the pandemic, through the emergence of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants, among healthcare workers in Tokyo, we analysed data from repeated serological surveys and an in-house COVID-19 registry among the staff of the National Center for Global Health and Medicine. Participants were asked to donate venous blood and complete a questionnaire about COVID-19 diagnosis and vaccination. Positive serology was defined as a positive result on the Roche or Abbott assay against SARS-CoV-2 nucleocapsid protein, and cumulative infection was defined as either being seropositive or having a history of COVID-19. Cumulative infection increased from 2.0% in June 2021 (pre-Delta) to 5.3% in December 2021 (post-Delta). After the emergence of the Omicron variant, it increased substantially during 2022 (16.9% in June and 39.0% in December). As of December 2022, 30% of those who had been infected were unaware of their infection. These results indicate that SARS-CoV-2 infection expanded rapidly during the Omicron-variant epidemic among healthcare workers in Tokyo and that a sizable proportion of infections went undiagnosed.
The two-part framework and the Tweedie generalized linear model (GLM) have traditionally been used to model loss costs for short-term insurance contracts. For most portfolios of insurance claims, there is typically a large proportion of zero claims, leading to an imbalance that lowers the prediction accuracy of these traditional approaches. In this article, we propose tree-based methods with a hybrid structure involving a two-step algorithm as an alternative approach. The first step is the construction of a classification tree to build the probability model for claim frequency. The second step is the application of elastic net regression models at each terminal node of the classification tree to build the distribution models for claim severity. This hybrid structure captures the benefits of tuning hyperparameters at each step of the algorithm, which allows for improved prediction accuracy, and tuning can be performed to meet specific business objectives. Another major advantage of this hybrid structure is improved model interpretability. We examine and compare the predictive performance of this hybrid structure relative to the traditional Tweedie GLM using both simulated and real datasets. Our empirical results show that these hybrid tree-based methods produce more accurate and informative predictions.
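A minimal sketch of the two-step hybrid structure described above, assuming scikit-learn and numpy-array inputs; the function names and hyperparameters (max_depth, alpha, l1_ratio) are illustrative choices, not the authors' implementation. A classification tree models the probability of a claim, an elastic net fitted within each terminal node models severity, and the expected loss cost is the product of the two.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import ElasticNet

def fit_hybrid(X, claim_amount, max_depth=3, alpha=0.1, l1_ratio=0.5):
    """Step 1: classification tree for P(claim > 0); step 2: elastic net per leaf for severity."""
    has_claim = (claim_amount > 0).astype(int)
    tree = DecisionTreeClassifier(max_depth=max_depth).fit(X, has_claim)
    leaves = tree.apply(X)
    severity_models = {}
    for leaf in np.unique(leaves):
        mask = (leaves == leaf) & (has_claim == 1)
        if mask.sum() >= 10:  # fit only where enough positive claims exist
            m = ElasticNet(alpha=alpha, l1_ratio=l1_ratio).fit(X[mask], claim_amount[mask])
            severity_models[leaf] = m
    return tree, severity_models

def predict_loss_cost(tree, severity_models, X):
    """Expected loss cost = P(claim) * predicted severity in the observation's leaf."""
    p = tree.predict_proba(X)[:, 1]
    leaves = tree.apply(X)
    sev = np.zeros(len(X))
    for i, leaf in enumerate(leaves):
        if leaf in severity_models:
            sev[i] = severity_models[leaf].predict(X[i:i + 1])[0]
    return p * sev
```

Tuning max_depth in the first step and the elastic net penalties in the second step separately is what the hybrid structure's step-wise hyperparameter tuning refers to.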
We present an efficient algorithm to generate a discrete uniform distribution on a set of p elements, for prime p, using a biased random source. The algorithm generalizes Von Neumann’s method and improves the computational efficiency of Dijkstra’s method. In addition, the algorithm is extended to generate a discrete uniform distribution on any finite set based on the prime factorization of integers. The average running time of the proposed algorithm is overall sublinear: $\operatorname{O}\!(n/\log n)$.
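For context, here is a sketch of the classical Von Neumann trick that the algorithm generalizes (the paper's p-element construction and the improvement over Dijkstra's method are not reproduced here): biased bits are drawn in pairs and only the unequal pairs are kept, which yields unbiased bits because the two unequal outcomes are equally likely whatever the bias.

```python
import random

def von_neumann_fair_bit(biased_bit):
    """Extract a fair bit from a biased 0/1 source (the baseline the paper generalizes)."""
    while True:
        a, b = biased_bit(), biased_bit()
        if a != b:
            return a        # P(0,1) == P(1,0), so the output is unbiased

# Example: a source emitting 1 with probability 0.9.
biased = lambda: 1 if random.random() < 0.9 else 0
print(sum(von_neumann_fair_bit(biased) for _ in range(10_000)) / 10_000)  # ~0.5
```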
Multilayer networks are a focus of current research on complex networks. In such networks, multiple types of links may exist, as well as many attributes for nodes. To make full use of multilayer (and other types of complex) networks in applications, merging various data with topological information yields a powerful analysis. First, we suggest a simple way of representing network data in a data matrix where rows correspond to the nodes and columns correspond to the data items. The number of columns is allowed to be arbitrary, so the data matrix can be easily expanded by adding columns. The data matrix can be chosen according to the targets of the analysis and may vary a lot from case to case. Next, we partition the rows of the data matrix into communities using a method that allows maximal compression of the data matrix. For compressing a data matrix, we suggest extending the so-called regular decomposition method to non-square matrices. We illustrate our method for several types of data matrices, in particular distance matrices and matrices obtained by augmenting a distance matrix with a column of node degrees, or by concatenating several distance matrices corresponding to layers of a multilayer network. We illustrate our method with synthetic power-law graphs and two real networks: an Internet autonomous systems graph and a world airline graph. We compare the outputs of different community recovery methods on these graphs and discuss how incorporating node degrees as a separate column in the data matrix leads our method to identify community structures well aligned with the tiered hierarchical structures commonly encountered in complex scale-free networks.
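As an illustration of the kind of data matrix described above (the regular decomposition step itself is not reproduced), the sketch below builds an $n \times (n+1)$ matrix whose rows are nodes and whose columns are shortest-path distances plus a node-degree column, assuming NetworkX; the graph and the helper's name are illustrative.

```python
import numpy as np
import networkx as nx

def distance_degree_matrix(G):
    """Rows = nodes; columns = shortest-path distances to every node, plus a degree column."""
    nodes = list(G.nodes())
    n = len(nodes)
    index = {v: i for i, v in enumerate(nodes)}
    D = np.full((n, n), np.inf)          # inf marks unreachable pairs
    for src, lengths in nx.all_pairs_shortest_path_length(G):
        for dst, d in lengths.items():
            D[index[src], index[dst]] = d
    deg = np.array([G.degree(v) for v in nodes], dtype=float).reshape(-1, 1)
    return np.hstack([D, deg]), nodes    # n x (n + 1) data matrix

G = nx.barabasi_albert_graph(200, 2, seed=1)   # synthetic power-law graph
M, nodes = distance_degree_matrix(G)
print(M.shape)   # (200, 201)
```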
We study a general model of recursive trees where vertices are equipped with independent weights and, at each time-step, a vertex is sampled with probability proportional to its fitness function, which is a function of its weight and degree, and connects to $\ell$ new-coming vertices. Under a certain technical assumption, applying the theory of Crump–Mode–Jagers branching processes, we derive formulas for the limiting distributions of the proportion of vertices with a given degree and weight, and of the proportion of edges whose endpoint has a certain weight. As an application, we rigorously prove observations of Bianconi related to the evolving Cayley tree (Phys. Rev. E 66, 036116, 2002). We also study the process in depth in a particular case where the technical condition can fail, namely when the fitness function is affine, a model we call ‘generalised preferential attachment with fitness’. We show that this model can exhibit condensation, where a positive proportion of edges accumulates around vertices with maximal weight, or, more drastically, can have a degenerate limiting degree distribution, where the entire proportion of edges accumulates around these vertices. Finally, we prove stochastic convergence for the degree distribution under a different assumption, namely a strong law of large numbers for the partition function associated with the process.
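A minimal simulation sketch of the model in an affine-fitness case of the kind the abstract calls ‘generalised preferential attachment with fitness’; the specific fitness form $f(w,d) = w(d + \beta)$ and the uniform weight distribution are assumptions chosen purely for illustration.

```python
import random

def simulate_gpaf(steps, ell=2, beta=1.0, weight=lambda: random.random()):
    """Generalised preferential attachment with fitness f(w, d) = w * (d + beta) (illustrative)."""
    weights = [weight()]     # vertex 0
    degrees = [0]
    edges = []
    for _ in range(steps):
        # Sample an existing vertex with probability proportional to its fitness.
        fitness = [w * (d + beta) for w, d in zip(weights, degrees)]
        parent = random.choices(range(len(weights)), weights=fitness, k=1)[0]
        # The sampled vertex connects to ell new-coming vertices.
        for _ in range(ell):
            child = len(weights)
            weights.append(weight())
            degrees.append(1)
            degrees[parent] += 1
            edges.append((parent, child))
    return weights, degrees, edges

w, d, e = simulate_gpaf(1000)
print(max(d), len(e))
```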
The epidemiology of invasive meningococcal disease (IMD) is unpredictable, varies by region and age group and continuously evolves. This review aimed to describe trends in the incidence of IMD and serogroup distribution by age group and global region over time. Data were extracted from 90 subnational, national and multinational grey literature surveillance reports and 22 published articles related to the burden of IMD from 2010 to 2019 in 77 countries. The global incidence of IMD was generally low, with substantial variability between regions in circulating disease-causing serogroups. The highest incidence was usually observed in infants, generally followed by young children and adolescents/young adults, as well as older adults in some countries. Globally, serogroup B was a predominant cause of IMD in most countries. Additionally, there was a notable increase in the number of IMD cases caused by serogroups W and Y from 2010 to 2019 in several regions, highlighting the unpredictable and dynamic nature of the disease. Overall, serogroups A, B, C, W and Y were responsible for the vast majority of IMD cases, despite the availability of vaccines to prevent disease due to these serogroups.
The 'data revolution' offers many new opportunities for research in the social sciences. Increasingly, social and political interactions can be recorded digitally, leading to vast amounts of new data available for research. This poses new challenges for organizing and processing research data. This comprehensive introduction covers the entire range of data management techniques, from flat files to database management systems. It demonstrates how established techniques and technologies from computer science can be applied in social science projects, drawing on a wide range of different applied examples. This book covers simple tools such as spreadsheets and file-based data storage and processing, as well as more powerful data management software like relational databases. It goes on to address advanced topics such as spatial data, text as data, and network data. This book is one of the first to discuss questions of practical data management specifically for social science projects. This title is also available as Open Access on Cambridge Core.
The Defining Issues Test (DIT) has been widely used in psychological experiments to assess one’s developmental level of moral reasoning in terms of postconventional reasoning. However, there have been concerns regarding whether the tool is biased across people with different genders and political and religious views. To address these concerns, in the present study, I tested the validity of the brief version of the test, the behavioral DIT, in terms of measurement invariance and differential item functioning (DIF). I found no significant non-invariance at the test level and no item demonstrating practically significant DIF at the item level. The findings indicate that neither the test nor any of its items showed a significant bias toward any particular group. As a result, the collected validity evidence supports the use of the test scores across different groups, enabling researchers who intend to examine participants’ moral reasoning development across heterogeneous groups to draw conclusions based on the scores.
We determine the distributions of some random variables related to a simple model of an epidemic with contact tracing and cluster isolation. This enables us to apply general limit theorems for super-critical Crump–Mode–Jagers branching processes. Notably, we compute explicitly the asymptotic proportion of isolated clusters with a given size amongst all isolated clusters, conditionally on survival of the epidemic. Somewhat surprisingly, the latter differs from the distribution of the size of a typical cluster at the time of its detection, and we explain the reasons behind this seeming paradox.
Seven varieties of forage oats from China were evaluated in the temperate environment of Bhutan for morphological traits, dry matter production, and forage quality. The oat variety Qingyin No. 1 had the greatest plant height (61 cm) and the largest number of tillers per plant (five). The leaf-stem ratio (LSR) was highest for Longyan No. 2 (0.73). At the late-winter harvest, Longyan No. 2 had the greatest plant height (64 cm) and the highest number of tillers per plant (seven), followed by Qingyin No. 1. The top three varieties, with high LSRs of 1.49, 1.31, and 1.35, were Longyan No. 1, 2, and 3, respectively. In both summer and winter, Longyan No. 2 had the highest forage yields, of around 5.00 and 4.00 DM t/ha, respectively. Qingyin No. 1 was the second-largest forage producer, with under 5.00 DM t/ha in summer and under 3.00 DM t/ha in winter. For forage quality, Longyan No. 2 and Longyan No. 3 had the highest crude protein levels (15%) in summer, whereas during late winter the Linna variety had the highest crude protein content (13%). The overall results of the field experiments suggest that Longyan No. 2 and Qingyin No. 1 are promising new oat varieties for winter fodder production in the temperate environments of Bhutan.
Financial models are an inescapable feature of modern financial markets. Yet over-reliance on these models, and the failure to test them properly, is now widely recognized as one of the main causes of the financial crisis of 2007–2011. Since this crisis, there has been an increase in the amount of scrutiny and testing applied to such models, and validation has become an essential part of model risk management at financial institutions. The book covers all of the major risk areas that a financial institution is exposed to and uses models for, including market risk, interest rate risk, retail credit risk, wholesale credit risk, compliance risk, and investment management. The book discusses current practices and pitfalls that model risk users need to be aware of and identifies areas where validation can be advanced in the future. It provides the first unified framework for validating risk management models.
While the Poisson distribution is a classical statistical model for count data, the distributional model hinges on the constraining property that its mean equal its variance. This text instead introduces the Conway-Maxwell-Poisson distribution and motivates its use in developing flexible statistical methods based on its distributional form. This two-parameter model not only contains the Poisson distribution as a special case but, in its ability to account for data over- or under-dispersion, encompasses both the geometric and Bernoulli distributions. The resulting statistical methods serve in a multitude of ways, from an exploratory data analysis tool, to a flexible modeling impetus for varied statistical methods involving count data. The first comprehensive reference on the subject, this text contains numerous illustrative examples demonstrating R code and output. It is essential reading for academics in statistics and data science, as well as quantitative researchers and data analysts in economics, biostatistics and other applied disciplines.
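For reference (a standard formulation of the distribution, not taken from the book itself): the Conway-Maxwell-Poisson probability mass function is $P(X = x) = \lambda^{x} / \big((x!)^{\nu} Z(\lambda, \nu)\big)$ for $x = 0, 1, 2, \ldots$, with normalizing constant $Z(\lambda, \nu) = \sum_{j=0}^{\infty} \lambda^{j}/(j!)^{\nu}$. Setting $\nu = 1$ recovers the Poisson distribution, $\nu = 0$ (with $\lambda < 1$) the geometric, and the limit $\nu \to \infty$ the Bernoulli with success probability $\lambda/(1+\lambda)$; $\nu < 1$ captures over-dispersion and $\nu > 1$ under-dispersion relative to the Poisson.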
Around 0.4% of pregnant women in England have chronic hepatitis B virus (HBV) infection and need services to prevent vertical transmission. In this national audit, sociodemographic, clinical and laboratory information was requested from all maternity units in England for hepatitis B surface antigen-positive women initiating antenatal care in 2014. We describe these women's characteristics and indicators of access to/uptake of healthcare. Of 2542 pregnancies in 2538 women, median maternal age was 31 [IQR 27, 35] years, 94% (1986/2109) were non-UK born (25% (228/923) having arrived into the UK <2 years previously) and 32% (794/2473) had ⩾2 previous live births. In 39%, English levels were basic/less than basic. Antenatal care was initiated at median 11.3 [IQR 9.6, 14] gestation weeks, and ‘late’ (⩾20 weeks) in 10% (251/2491). In 70% (1783/2533) of pregnancies, HBV had been previously diagnosed and 11.8% (288/2450) had ⩾1 marker of higher infectivity. Missed specialist appointments were reported in 18% (426/2339). Late antenatal care and/or missed specialist appointments were more common in pregnancies among women lacking basic English, arriving in the UK ⩽2 years previously, newly HBV diagnosed, aged <25 years and/or with ⩾2 previous live births. We show overlapping groups of pregnant women with chronic HBV vulnerable to delayed or incomplete care.
We study a sceptical rumour model on the non-negative integer line. The model starts with two spreaders at sites 0, 1 and sceptical ignorants at all other natural numbers. Then each sceptic transmits the rumour, independently, to the individuals within a random distance on its right after s/he receives the rumour from at least two different sources. We say that the process survives if the size of the set of vertices which heard the rumour in this fashion is infinite. We calculate the probability of survival exactly, and obtain some bounds for the tail distribution of the final range of the rumour among sceptics. We also prove that the rumour dies out among non-sceptics and sceptics, under the same condition.
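A minimal simulation sketch of the dynamics described above, assuming a small bounded uniform transmission radius purely for illustration (the paper works with a general radius distribution and exact survival probabilities): sites 0 and 1 spread unconditionally, and a sceptic at site $i \geq 2$ spreads only after hearing the rumour from at least two distinct sources to its left.

```python
import random

def sceptic_rumour(n_sites=10_000, radius=lambda: random.randint(1, 3)):
    """Rightmost informed site in one run of the sceptical rumour process (illustrative radius)."""
    heard_from = [0] * n_sites
    informed = [False] * n_sites

    def spread(i):
        informed[i] = True
        for j in range(i + 1, min(i + 1 + radius(), n_sites)):
            heard_from[j] += 1

    spread(0)
    spread(1)
    for i in range(2, n_sites):      # transmission only goes right, so left-to-right order suffices
        if heard_from[i] >= 2:       # sceptics need two independent sources
            spread(i)
    return max(i for i in range(n_sites) if informed[i])

print(sceptic_rumour())
```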
Donor organizations and multilaterals require ways to measure progress toward the goal of creating an open internet and to condition assistance on recipient governments maintaining access to information online. Because the internet is increasingly a leading tool for exchanging information, authoritarian governments around the world often seek methods to restrict citizens’ access. Two of the most common methods for restricting the internet are shutting down internet access entirely and filtering specific content. We conduct a systematic literature review of articles on the measurement of internet censorship and find that little work has been done comparing the tradeoffs of different methods for measuring censorship on a global scale. We compare the tradeoffs between measuring these phenomena using expert analysis (as measured by Freedom House and V-Dem) and remote measurement with manual oversight (as measured by Access Now and the OpenNet Initiative [ONI]) for donor organizations that want to incentivize and measure good internet governance. We find that remote measurement with manual oversight is less likely to include false positives, and therefore may be preferable for donor organizations that value verifiability. We also find that expert analysis is less likely to include false negatives, particularly for very repressive regimes in the Middle East and Central Asia; these data may therefore be preferable for advocacy organizations that want to ensure very repressive regimes cannot avoid accountability, or for organizations working primarily in those regions.
There is an increasing gap between the speed of the policy cycle and that of technological and social change. This gap is becoming broader and more prominent in robotics, that is, movable machines that perform tasks either automatically or with a degree of autonomy. This is because current legislation was unprepared for machine learning and autonomous agents. As a result, the law often lags behind and does not adequately frame robot technologies. This state of affairs inevitably increases legal uncertainty. It is unclear what regulatory frameworks developers have to follow to comply, often resulting in technology that does not perform well in the wild, is unsafe, and can exacerbate biases and lead to discrimination. This paper explores these issues and considers the background, key findings, and lessons learned of the LIAISON project, which stands for “Liaising robot development and policymaking” and aims to devise an alignment model for the legal appraisal of robots, channelling robot policy development from a hybrid top-down/bottom-up perspective, in order to resolve this mismatch. As such, LIAISON seeks to uncover to what extent compliance tools could be used as data generators for robot policy purposes, so as to arrive at an optimal regulatory framing for existing and emerging robot technologies.