We study a general model of recursive trees where vertices are equipped with independent weights and at each time-step a vertex is sampled with probability proportional to its fitness function, a function of its weight and degree, and is joined to $\ell$ newly arriving vertices. Under a certain technical assumption, applying the theory of Crump–Mode–Jagers branching processes, we derive formulas for the limiting distributions of the proportion of vertices with a given degree and weight, and of the proportion of edges whose endpoint has a given weight. As an application of this theorem, we rigorously prove observations of Bianconi related to the evolving Cayley tree (Phys. Rev. E 66, paper no. 036116, 2002). We also study the process in depth when the technical condition can fail, in the particular case when the fitness function is affine, a model we call ‘generalised preferential attachment with fitness’. We show that this model can exhibit condensation, where a positive proportion of edges accumulates around vertices with maximal weight, or, more drastically, can have a degenerate limiting degree distribution, where the entire proportion of edges accumulates around these vertices. Finally, we prove stochastic convergence for the degree distribution under a different assumption of a strong law of large numbers for the partition function associated with the process.
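For illustration, a minimal simulation sketch of this growth rule follows; the affine fitness f(w, d) = w + d, the uniform weight law, and all names are illustrative assumptions rather than the paper's specification.

```python
import random

def simulate_tree(n_steps, ell=1, seed=0, fitness=lambda w, d: w + d):
    """Grow a weighted recursive tree: at each step one existing vertex is
    sampled with probability proportional to fitness(weight, degree) and
    joined to `ell` new vertices, each carrying a fresh i.i.d. weight."""
    rng = random.Random(seed)
    weights = [rng.uniform(0.5, 1.5)]   # root; uniform weight law is illustrative
    degrees = [0]
    for _ in range(n_steps):
        fits = [fitness(w, d) for w, d in zip(weights, degrees)]
        parent = rng.choices(range(len(weights)), weights=fits)[0]
        for _ in range(ell):
            degrees[parent] += 1
            weights.append(rng.uniform(0.5, 1.5))
            degrees.append(1)
    return weights, degrees
```

Empirical degree and weight proportions from long runs of such a sketch can then be compared against the limiting formulas obtained from the Crump–Mode–Jagers embedding.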
The epidemiology of invasive meningococcal disease (IMD) is unpredictable, varies by region and age group and continuously evolves. This review aimed to describe trends in the incidence of IMD and serogroup distribution by age group and global region over time. Data were extracted from 90 subnational, national and multinational grey literature surveillance reports and 22 published articles related to the burden of IMD from 2010 to 2019 in 77 countries. The global incidence of IMD was generally low, with substantial variability between regions in circulating disease-causing serogroups. The highest incidence was usually observed in infants, generally followed by young children and adolescents/young adults, as well as older adults in some countries. Globally, serogroup B was a predominant cause of IMD in most countries. Additionally, there was a notable increase in the number of IMD cases caused by serogroups W and Y from 2010 to 2019 in several regions, highlighting the unpredictable and dynamic nature of the disease. Overall, serogroups A, B, C, W and Y were responsible for the vast majority of IMD cases, despite the availability of vaccines to prevent disease due to these serogroups.
The 'data revolution' offers many new opportunities for research in the social sciences. Increasingly, social and political interactions can be recorded digitally, leading to vast amounts of new data available for research. This poses new challenges for organizing and processing research data. This comprehensive introduction covers the entire range of data management techniques, from flat files to database management systems. It demonstrates how established techniques and technologies from computer science can be applied in social science projects, drawing on a wide range of different applied examples. This book covers simple tools such as spreadsheets and file-based data storage and processing, as well as more powerful data management software like relational databases. It goes on to address advanced topics such as spatial data, text as data, and network data. This book is one of the first to discuss questions of practical data management specifically for social science projects. This title is also available as Open Access on Cambridge Core.
The Defining Issues Test (DIT) has been widely used in psychological experiments to assess one’s developmental level of moral reasoning in terms of postconventional reasoning. However, there have been concerns regarding whether the tool is biased across people with different genders and political and religious views. To address these concerns, in the present study, I tested the validity of the brief version of the test, the behavioral DIT, in terms of measurement invariance and differential item functioning (DIF). I found no significant non-invariance at the test level and no item demonstrating practically significant DIF at the item level. The findings indicate that neither the test nor any of its items showed a significant bias toward any particular group. The collected validity evidence thus supports the use of test scores across different groups, enabling researchers who intend to examine participants’ moral reasoning development across heterogeneous groups to draw conclusions based on the scores.
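As background, item-level DIF of the kind tested here is often assessed with the logistic-regression procedure of Swaminathan and Rogers: regress each item response on a matching score, group membership, and their interaction, then compare nested models. A minimal sketch on synthetic data (not the study's code; all variable names are hypothetical):

```python
import numpy as np
import statsmodels.api as sm

# synthetic data: 1000 respondents, one dichotomous item, two groups
rng = np.random.default_rng(0)
n = 1000
group = rng.integers(0, 2, n).astype(float)      # group membership (0/1)
score = rng.normal(size=n)                       # matching variable, e.g. total score
item = rng.binomial(1, 1 / (1 + np.exp(-0.8 * score)))  # item generated without DIF

# nested logistic models: score only; + group (uniform DIF); + interaction
X0 = sm.add_constant(np.column_stack([score]))
X1 = sm.add_constant(np.column_stack([score, group]))
X2 = sm.add_constant(np.column_stack([score, group, score * group]))
ll = [sm.Logit(item, X).fit(disp=0).llf for X in (X0, X1, X2)]

# likelihood-ratio statistics, each ~ chi2(1) under the null of no DIF
print("uniform DIF LR:", 2 * (ll[1] - ll[0]))      # compare to 3.84 (5% level)
print("non-uniform DIF LR:", 2 * (ll[2] - ll[1]))
```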
We determine the distributions of some random variables related to a simple model of an epidemic with contact tracing and cluster isolation. This enables us to apply general limit theorems for super-critical Crump–Mode–Jagers branching processes. Notably, we compute explicitly the asymptotic proportion of isolated clusters with a given size amongst all isolated clusters, conditionally on survival of the epidemic. Somewhat surprisingly, the latter differs from the distribution of the size of a typical cluster at the time of its detection, and we explain the reasons behind this seeming paradox.
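For background, the limit theorems invoked are of Nerman's ratio type: for a supercritical Crump–Mode–Jagers process with Malthusian parameter $\alpha$, populations counted with two random characteristics $\varphi$ and $\psi$ satisfy, almost surely on survival,

\[
\frac{Z^{\varphi}_t}{Z^{\psi}_t} \;\longrightarrow\; \frac{\int_0^\infty e^{-\alpha s}\,\mathbb{E}[\varphi(s)]\,\mathrm{d}s}{\int_0^\infty e^{-\alpha s}\,\mathbb{E}[\psi(s)]\,\mathrm{d}s} \qquad (t \to \infty),
\]

so asymptotic proportions, such as the fraction of isolated clusters with a given size, reduce to ratios of Laplace transforms of expected characteristics evaluated at $\alpha$.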
Seven varieties of forage oats from China were evaluated in the temperate environment of Bhutan for morphological traits, dry matter production, and forage quality. The oat variety Qingyin No. 1 had the greatest plant height (61 cm) and the largest number of tillers per plant (five per plant). The leaf-stem ratio (LSR) was highest for Longyan No. 2 (LSR 0.73). At the late-winter harvest, Longyan No. 2 had the greatest plant height (64 cm) and the highest number of tillers per plant (seven per plant), followed by Qingyin No. 1. The three varieties with the highest LSRs were Longyan No. 1, 2, and 3, at 1.49, 1.31, and 1.35, respectively. In both summer and winter, Longyan No. 2 had the highest forage yields, of around 5.00 and 4.00 DM t/ha, respectively. Qingyin No. 1 was the second largest forage producer, with under 5.00 DM t/ha in summer and under 3.00 DM t/ha in winter. For forage quality, Longyan No. 2 and Longyan No. 3 had the highest levels of crude protein (15%) in summer, whereas in late winter the Linna variety had the highest crude protein content (13%). The overall results of the field experiments suggest that Longyan No. 2 and Qingyin No. 1 are promising new oat varieties for winter fodder production in the temperate environments of Bhutan.
Financial models are an inescapable feature of modern financial markets. Yet over-reliance on these models, and the failure to test them properly, is now widely recognized as one of the main causes of the financial crisis of 2007–2011. Since this crisis, there has been an increase in the scrutiny and testing applied to such models, and validation has become an essential part of model risk management at financial institutions. The book covers all of the major risk areas that a financial institution is exposed to and uses models for, including market risk, interest rate risk, retail credit risk, wholesale credit risk, compliance risk, and investment management. It discusses current practices and pitfalls that model risk users need to be aware of and identifies areas where validation can be advanced in the future. The book provides the first unified framework for validating risk management models.
While the Poisson distribution is a classical statistical model for count data, the distributional model hinges on the constraining property that its mean equal its variance. This text instead introduces the Conway-Maxwell-Poisson distribution and motivates its use in developing flexible statistical methods based on its distributional form. This two-parameter model not only contains the Poisson distribution as a special case but, in its ability to account for data over- or under-dispersion, encompasses both the geometric and Bernoulli distributions. The resulting statistical methods serve in a multitude of ways, from an exploratory data analysis tool, to a flexible modeling impetus for varied statistical methods involving count data. The first comprehensive reference on the subject, this text contains numerous illustrative examples demonstrating R code and output. It is essential reading for academics in statistics and data science, as well as quantitative researchers and data analysts in economics, biostatistics and other applied disciplines.
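For reference, the Conway–Maxwell–Poisson pmf is $P(X = x) = \lambda^x / ((x!)^{\nu} Z(\lambda, \nu))$ with normalizing constant $Z(\lambda,\nu)=\sum_{j\ge 0}\lambda^j/(j!)^{\nu}$. A minimal numerical sketch (function name and truncation level are illustrative):

```python
from math import exp, lgamma, log

def cmp_pmf(x, lam, nu, terms=500):
    """Conway–Maxwell–Poisson pmf: lam**x / ((x!)**nu * Z(lam, nu)).
    Z is an infinite series, truncated at `terms` and summed in
    log-space for numerical stability."""
    logs = [j * log(lam) - nu * lgamma(j + 1) for j in range(terms)]
    m = max(logs)
    logz = m + log(sum(exp(t - m) for t in logs))   # log-sum-exp
    return exp(x * log(lam) - nu * lgamma(x + 1) - logz)

# nu = 1 recovers Poisson(lam); nu = 0 with lam < 1 is geometric
print(cmp_pmf(2, 3.0, 1.0))   # = exp(-3) * 3**2 / 2!  ~ 0.2240
print(cmp_pmf(2, 0.4, 0.0))   # = (1 - 0.4) * 0.4**2   = 0.0960
```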
Around 0.4% of pregnant women in England have chronic hepatitis B virus (HBV) infection and need services to prevent vertical transmission. In this national audit, sociodemographic, clinical and laboratory information was requested from all maternity units in England for hepatitis B surface antigen-positive women initiating antenatal care in 2014. We describe these women's characteristics and indicators of access to/uptake of healthcare. Of 2542 pregnancies in 2538 women, median maternal age was 31 [IQR 27, 35] years, 94% (1986/2109) were non-UK born (25% (228/923) having arrived in the UK <2 years previously) and 32% (794/2473) had ⩾2 previous live births. In 39%, English levels were basic/less than basic. Antenatal care was initiated at median 11.3 [IQR 9.6, 14] gestation weeks, and ‘late’ (⩾20 weeks) in 10% (251/2491). In 70% (1783/2533) of pregnancies, HBV had been previously diagnosed and 11.8% (288/2450) had ⩾1 marker of higher infectivity. Missed specialist appointments were reported in 18% (426/2339). Late antenatal care and/or missed specialist appointments were more common in pregnancies among women lacking basic English, arriving in the UK ⩽2 years previously, newly HBV diagnosed, aged <25 years and/or with ⩾2 previous live births. We show overlapping groups of pregnant women with chronic HBV vulnerable to delayed or incomplete care.
We study a sceptical rumour model on the non-negative integer line. The model starts with two spreaders at sites 0, 1 and sceptical ignorants at all other natural numbers. Then each sceptic transmits the rumour, independently, to the individuals within a random distance on its right after s/he receives the rumour from at least two different sources. We say that the process survives if the size of the set of vertices which heard the rumour in this fashion is infinite. We calculate the probability of survival exactly, and obtain some bounds for the tail distribution of the final range of the rumour among sceptics. We also prove that the rumour dies out among non-sceptics and sceptics, under the same condition.
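A Monte Carlo sketch of these dynamics follows; the Geometric(p) radius law, the function names, and the parameter values are illustrative assumptions (the paper treats general radius distributions).

```python
import random

def rumour_rightmost(n, p=0.3, seed=None):
    """One run on sites 0..n.  Sites 0 and 1 start as spreaders; sceptic
    i >= 2 is informed once at least two informed sites j < i reach it,
    i.e. j + R_j >= i.  Radii R_j are i.i.d. Geometric(p) on {1, 2, ...},
    an illustrative choice.  Returns the rightmost informed site."""
    rng = random.Random(seed)

    def radius():
        r = 1
        while rng.random() > p:
            r += 1
        return r

    reaches = [radius(), 1 + radius()]   # rightmost sites reached by 0 and 1
    last = 1
    for i in range(2, n + 1):
        reaches = [r for r in reaches if r >= i]   # drop spreaders that fall short
        if len(reaches) < 2:
            break          # fewer than two sources ever reach i: rumour dies
        reaches.append(i + radius())
        last = i
    return last

# crude survival estimate: fraction of runs that inform site 10_000
runs = 200
print(sum(rumour_rightmost(10_000, p=0.5, seed=s) == 10_000 for s in range(runs)) / runs)
```

Since no new spreaders appear once some site fails to collect two sources, a single left-to-right sweep with early stopping suffices.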
Donor organizations and multilaterals require ways to measure progress toward the goal of creating an open internet, and to condition assistance on recipient governments maintaining access to information online. Because the internet is increasingly a leading tool for exchanging information, authoritarian governments around the world often seek methods to restrict citizens’ access. Two of the most common methods for restricting the internet are shutting down internet access entirely and filtering specific content. We conduct a systematic literature review of articles on the measurement of internet censorship and find that little work has been done comparing the tradeoffs of different methods for measuring censorship on a global scale. We compare the tradeoffs between measuring these phenomena using expert analysis (as measured by Freedom House and V-Dem) and remote measurement with manual oversight (as measured by Access Now and the OpenNet Initiative [ONI]) for donor organizations that want to incentivize and measure good internet governance. We find that remote measurement with manual oversight is less likely to include false positives, and therefore may be preferable for donor organizations that value verifiability. We also find that expert analysis is less likely to include false negatives, particularly for very repressive regimes in the Middle East and Central Asia; these data may therefore be preferable for advocacy organizations that want to ensure very repressive regimes cannot avoid accountability, or for organizations working primarily in these regions.
There is an increasing gap between the speed of the policy cycle and that of technological and social change. This gap is becoming broader and more prominent in robotics, that is, movable machines that perform tasks either automatically or with a degree of autonomy. This is because current legislation was unprepared for machine learning and autonomous agents. As a result, the law often lags behind and does not adequately frame robot technologies. This state of affairs inevitably increases legal uncertainty. It is unclear what regulatory frameworks developers have to follow to comply, often resulting in technology that does not perform well in the wild, is unsafe, and can exacerbate biases and lead to discrimination. This paper explores these issues and considers the background, key findings, and lessons learned of the LIAISON project, which stands for “Liaising robot development and policymaking,” and which aims to devise an alignment model for the legal appraisal of robots, channeling robot policy development from a hybrid top-down/bottom-up perspective, to resolve this mismatch. As such, LIAISON seeks to uncover the extent to which compliance tools could be used as data generators for robot policy purposes, to unravel an optimal regulatory framing for existing and emerging robot technologies.
Supreme audit institutions (SAIs) are touted as an integral component of anticorruption efforts in developing nations. SAIs review governmental budgets and report fiscal discrepancies in publicly available audit reports. These documents contain valuable information on budgetary discrepancies and missing resources, and may even report fraud and corruption. Existing research on anticorruption efforts relies on information published by national-level SAIs while mostly ignoring audits from subnational SAIs, because their information is not published in accessible formats. I collect publicly available audit reports published by a subnational SAI in Mexico, the Auditoria Superior del Estado de Sinaloa, and build a pipeline for extracting the monetary value of discrepancies detected in municipal budgets. I systematically convert scanned documents into machine-readable text using optical character recognition, and I then train a classification model to identify paragraphs with relevant information. From the relevant paragraphs, I extract the monetary values of budgetary discrepancies by developing a named entity recognizer that automates the identification of this information. In this paper, I explain the steps for building the pipeline and detail the procedures for replicating it in different contexts. The resulting dataset contains the official amounts of discrepancies in municipal budgets for the state of Sinaloa. This information is useful to anticorruption policymakers because it quantifies discrepancies in municipal spending, potentially motivating reforms that mitigate misappropriation. Although I focus on a single state in Mexico, this method can be extended to any context where audit reports are publicly available.
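The three pipeline stages can be sketched as follows. The library choices (pdf2image, pytesseract, scikit-learn), the file name, the two-document training sample, and the regex standing in for the trained named entity recognizer are all illustrative assumptions, not the author's implementation.

```python
import re
import pytesseract
from pdf2image import convert_from_path
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# 1) OCR: convert a scanned audit report into machine-readable paragraphs
pages = convert_from_path("informe_auditoria.pdf")       # hypothetical file name
text = "\n".join(pytesseract.image_to_string(pg, lang="spa") for pg in pages)
paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]

# 2) Classification: flag paragraphs that report budgetary discrepancies;
#    in practice the training sample would be many hand-labelled paragraphs
train_texts = ["se observó una diferencia de $1,200,000 en el rubro...",
               "la sesión ordinaria del cabildo dio inicio a las..."]
train_labels = [1, 0]
vec = TfidfVectorizer()
clf = LogisticRegression().fit(vec.fit_transform(train_texts), train_labels)
relevant = [p for p in paragraphs if clf.predict(vec.transform([p]))[0] == 1]

# 3) Extraction: a regex stands in here for the trained named entity
#    recognizer that identifies the monetary value of each discrepancy
money = re.compile(r"\$\s?\d[\d.,]*")
amounts = [m for p in relevant for m in money.findall(p)]
```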
Persons experiencing homelessness (PEH) or rough sleeping are a vulnerable population, likely to be disproportionately affected by the coronavirus disease 2019 (COVID-19) pandemic. The impact of COVID-19 infection on this population is yet to be fully described in England. We present a novel method to identify COVID-19 cases in this population and describe its findings. A phenotype was developed and validated to identify PEH or rough sleeping in a national surveillance system. Confirmed COVID-19 cases in England from March 2020 to March 2022 were address-matched to known homelessness accommodations and shelters. Further cases were identified using address-based indicators, such as NHS pseudo postcodes. In total, 1835 cases were identified by the phenotype. Most were <39 years of age (66.8%) and male (62.8%). The proportion of cases was highest in London (29.8%). The proportion of cases of a minority ethnic background and deaths were disproportionately greater in this population, compared to all COVID-19 cases in England. This methodology provides an approach to track the impact of COVID-19 on a subset of this population and will be relevant to policy making. Future surveillance systems and studies may benefit from this approach to further investigate the impact of COVID-19 and other diseases on select populations.
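A minimal sketch of the address-based case-finding step follows; the normalisation rule, the ZZ99 pseudo-postcode check, and all names are illustrative assumptions, not the study's validated phenotype.

```python
import re

def normalise(addr: str) -> str:
    """Crude address normalisation for exact matching (illustrative)."""
    addr = re.sub(r"[^A-Z0-9 ]", " ", addr.upper())
    return re.sub(r"\s+", " ", addr).strip()

def flag_homelessness(case_address, case_postcode, shelter_addresses):
    # (1) address match against known homelessness accommodation/shelters
    if normalise(case_address) in {normalise(a) for a in shelter_addresses}:
        return True
    # (2) address-based indicator: NHS pseudo postcodes, e.g. the ZZ99
    #     range used for no fixed abode (an illustrative rule)
    return case_postcode.replace(" ", "").upper().startswith("ZZ99")
```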
A quantification of the financial implications of the design of a funded, collective defined contribution (CDC) pension scheme is presented and illustrated. It is done through an attribution analysis, which allows the importance of various elements of CDC scheme design to be determined. The model of a CDC scheme analysed is loosely based on the first CDC scheme set to be approved in the UK. In the CDC scheme analysed, contributions are fixed and the initial benefit accrued by each contribution is fixed. Once accrued, benefits are subsequently adjusted annually in response to changes in assumptions and returns. An attribution of the benefit payments shows that this design gives higher benefits to the first generations and lower benefits to the last generations, for a scheme which starts with no members. The contributions paid also affect the balance of benefits paid between generations. Too high a contribution is to the advantage of the first generations. Too low a contribution is in the interests of the later generations. The conclusion, within the simple model considered, is that a constant benefit accrual is an important design choice. Its financial consequences across all generations should be carefully analysed, if it is intended to be implemented. Additionally, contributions should be reviewed regularly in such a CDC scheme, to ensure that cross-subsidies are not borne excessively by particular generations.
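A heavily stylized sketch of the adjustment mechanism described above, not the paper's model: fixed contributions buy a fixed accrual, and accrued benefits are then scaled each year so that a flat annuity-factor valuation of benefits matches assets. All parameter values and names are illustrative.

```python
def project_cdc(years=40, contribution=1.0, accrual_per_contribution=0.08,
                annuity_factor=15.0, returns=None):
    """Stylized CDC sketch (illustrative only): each year's fixed
    contribution buys a fixed benefit accrual; accrued benefits are then
    adjusted by the funding ratio, i.e. assets divided by a flat
    annuity-factor valuation of accrued benefits."""
    returns = returns if returns is not None else [0.04] * years  # flat 4% assumption
    assets, benefit = 0.0, 0.0
    for r in returns:
        assets = assets * (1 + r) + contribution
        benefit += accrual_per_contribution              # fixed initial accrual
        benefit *= assets / (benefit * annuity_factor)   # annual adjustment
    return assets, benefit

print(project_cdc())   # final assets and fully adjusted accrued annual benefit
```

In this stylization the adjustment always restores full funding; an attribution analysis then asks how returns, contribution levels, and the accrual rate split that value between cohorts.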
The notion of ordered system signature, originally defined for independent and identical coherent systems, is first extended to the case of independent and non-identical coherent systems, and some key properties that help simplify its computation are then established. Through its use, a dynamic ordered system signature is defined next, which facilitates a systematic study of dynamic properties of several coherent systems under a life test. The theoretical results established here are then illustrated through some specific examples. Finally, the usefulness of the concepts introduced in the evaluation of aging used systems is demonstrated.
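For background, the classical signature of a coherent system with $n$ independent and identically distributed component lifetimes $X_1,\dots,X_n$ and system lifetime $T$ is the vector $s=(s_1,\dots,s_n)$ with

\[
s_i = \mathbb{P}(T = X_{i:n}), \qquad i = 1,\dots,n,
\]

where $X_{i:n}$ denotes the $i$th order statistic; the ordered system signature extends this idea by ranking the lifetimes of several such systems placed simultaneously on a life test.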
Quantifying tail dependence is an important issue in insurance and risk management. The prevalent tail dependence coefficient (TDC), however, is known to underestimate the degree of tail dependence and it does not capture non-exchangeable tail dependence since it evaluates the limiting tail probability only along the main diagonal. To overcome these issues, two novel tail dependence measures called the maximal tail concordance measure (MTCM) and the average tail concordance measure (ATCM) are proposed. Both measures are constructed based on tail copulas and possess clear probabilistic interpretations in that the MTCM evaluates the largest limiting probability among all comparable rectangles in the tail, and the ATCM is a normalized average of these limiting probabilities. In contrast to the TDC, the proposed measures can capture non-exchangeable tail dependence. Analytical forms of the proposed measures are also derived for various copulas. A real data analysis reveals striking tail dependence and tail non-exchangeability of the return series of stock indices, particularly in periods of financial distress.
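For reference, the (lower) TDC of a copula $C$ is

\[
\lambda_L = \lim_{u \downarrow 0} \mathbb{P}(U_1 \le u \mid U_2 \le u) = \lim_{u \downarrow 0} \frac{C(u,u)}{u},
\]

which, as noted above, sees only the main diagonal. The tail copula $\Lambda(x,y) = \lim_{u \downarrow 0} C(ux, uy)/u$ underlying the MTCM and ATCM records the limiting probability over all rectangles $[0,ux]\times[0,uy]$, and can therefore detect non-exchangeability, i.e. $\Lambda(x,y) \neq \Lambda(y,x)$.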
Today, technological developments are ever-growing yet fragmented. Alongside inconsistent digital approaches and attitudes across city administrations, such developments have made it difficult to reap the benefits of city digital twins. Bringing together experiences from five research projects, this paper discusses these digital twins based on two digital integration methodologies: systems integration and semantic integration. We revisit the nature of the underlying technologies and their implications for interoperability and compatibility in the context of planning processes and smart urbanism. Semantic approaches present a new opportunity for bidirectional data flows that can inform both governance processes and technological systems to co-create, cross-pollinate, and support optimal outcomes. Building on this opportunity, we suggest that treating the technological dimension as a new addition to the trifecta of economic, environmental, and social sustainability goals that guide planning processes can help governments address this conundrum of fragmentation, interoperability, and compatibility.
The principle of maximum entropy is a well-known approach to produce a model for data-generating distributions. In this approach, if partial knowledge about the distribution is available in terms of a set of information constraints, then the model that maximizes entropy under these constraints is used for the inference. In this paper, we propose a new three-parameter lifetime distribution using the maximum entropy principle under the constraints on the mean and a general index. We then present some statistical properties of the new distribution, including hazard rate function, quantile function, moments, characterization, and stochastic ordering. We use the maximum likelihood estimation technique to estimate the model parameters. A Monte Carlo study is carried out to evaluate the performance of the estimation method. In order to illustrate the usefulness of the proposed model, we fit the model to three real data sets and compare its relative performance with respect to the beta generalized Weibull family.
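For context, maximizing the differential entropy $h(f) = -\int f \ln f$ subject to normalization, a fixed mean, and a constraint $\mathbb{E}[T(X)] = \tau$ on a general index $T$ yields, by the standard Lagrangian argument, an exponential-family density

\[
f(x) = \exp\{-\lambda_0 - \lambda_1 x - \lambda_2 T(x)\},
\]

with the multipliers $\lambda_0, \lambda_1, \lambda_2$ determined by the three constraints; the paper's three-parameter lifetime distribution arises from a specific choice of the index $T$.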
Let $\{X_n\}_{n\in{\mathbb{N}}}$ be an ${\mathbb{X}}$-valued iterated function system (IFS) of Lipschitz maps defined as $X_0 \in {\mathbb{X}}$ and for $n\geq 1$, $X_n\;:\!=\;F(X_{n-1},\vartheta_n)$, where $\{\vartheta_n\}_{n \ge 1}$ are independent and identically distributed random variables with common probability distribution $\mathfrak{p}$, $F(\cdot,\cdot)$ is Lipschitz continuous in the first variable, and $X_0$ is independent of $\{\vartheta_n\}_{n \ge 1}$. Under parametric perturbation of both F and $\mathfrak{p}$, we are interested in the robustness of the V-geometrical ergodicity property of $\{X_n\}_{n\in{\mathbb{N}}}$, of its invariant probability measure, and finally of the probability distribution of $X_n$. Specifically, we propose a pattern of assumptions for studying such robustness properties for an IFS. This pattern is implemented for the autoregressive processes with autoregressive conditional heteroscedastic errors, and for IFS under roundoff error or under thresholding/truncation. Moreover, we provide a general set of assumptions covering the classical Feller-type hypotheses for an IFS to be a V-geometrical ergodic process. An accurate bound for the rate of convergence is also provided.
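For reference, a Markov chain with transition kernel $P$ and invariant probability measure $\pi$ is $V$-geometrically ergodic when there exist $C > 0$ and $\rho \in (0,1)$ such that

\[
\sup_{|g| \le V} \big| \mathbb{E}_x[g(X_n)] - \pi(g) \big| \;\le\; C\, V(x)\, \rho^{n} \qquad \text{for all } x \in \mathbb{X},\ n \in \mathbb{N},
\]

for a Lyapunov-type function $V \ge 1$; the robustness question studied here is whether this property, the invariant measure $\pi$, and the law of $X_n$ vary continuously under the parametric perturbations of $F$ and $\mathfrak{p}$.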