This experimental study aimed to determine the activity of a near-UVA (405 nm) LED ceiling system against the SARS-CoV-2 virus. The ceiling system comprised 17 near-UVA LED lights, each with a radiant power of 1.1 W, centred at a 405 nm wavelength. A 96-multiwell plate, fixed to a wooden base, was inoculated with suspensions of VERO E6 cell cultures infected with SARS-CoV-2 virus and irradiated at a distance of 40 cm with a dose of 20.2 J/cm² for 120 min. The collected suspensions were transferred to VERO cell culture plates and incubated for 3 days. The maximum measurable log reduction obtained, starting from a concentration of 10^7.2 TCID50/mL, was 3.0 log10 and indicated inhibition of SARS-CoV-2 replication by the near-UVA LED ceiling system. Near-UVA light at a 405 nm wavelength is emerging as a potential alternative treatment for localised infections and environmental decontamination because it is far less harmful to living organisms’ cells than UV-C irradiation.
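For a rough sense of scale (a back-of-the-envelope check based only on the dose and exposure time quoted above, not a value reported by the study), the implied average irradiance at the 40 cm working distance is
$$E \;\approx\; \frac{20.2\ \mathrm{J/cm^2}}{120 \times 60\ \mathrm{s}} \;\approx\; 2.8\ \mathrm{mW/cm^2}.$$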
Blastocystis is a protist of controversial pathogenicity inhabiting the gut of humans and other animals. Despite a century of intense study, understanding of the epidemiology of Blastocystis remains fragmentary. Here, we aimed to explore its prevalence, stability of colonisation and association with various factors in a rural elementary school in northern Thailand. One hundred and forty faecal samples were collected from 104 children at two time points (tp) 105 days apart. For tp2, samples were also obtained from 15 animals residing on campus and seven water locations. Prevalence in children was 67% at tp1 and 89% at tp2; at tp2 it was 63% in chickens, 86% in pigs, and 57% in water. Ten STs were identified, two of which were shared between humans and animals, one between animals and water, and three between humans and water. Eighteen children (out of 36) carried the same ST over both time points, indicating stable colonisation. Presence of Blastocystis (or ST) was not associated with body mass index, ethnicity, birth delivery mode, or milk source as an infant. This study advances understanding of Blastocystis prevalence in an understudied age group, the role of the environment in transmission, and the ability of specific STs to stably colonise children.
Edge flipping is a non-reversible Markov chain on a given connected graph, as defined in Chung and Graham (2012). In the same paper, the eigenvalues and stationary distributions of edge flipping were identified for some classes of graphs. We further study the spectral properties of edge flipping to obtain a lower bound for the rate of convergence in the case of regular graphs. Moreover, we show by a coupling argument that a cutoff occurs at $\frac{1}{4} n \log n$ for edge flipping on the complete graph.
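For intuition, here is a minimal Python sketch of the edge-flipping dynamics, assuming the Chung–Graham definition in which a uniformly random edge is chosen at each step and both of its endpoints receive the same freshly flipped colour; the function name and the fair-coin choice are illustrative, not taken from the paper.

```python
import math
import random

def edge_flipping_on_Kn(n, steps, p=0.5, seed=0):
    """Run edge flipping on the complete graph K_n: at each step pick a
    uniformly random edge, flip a p-coin, and colour both endpoints
    blue (1) on heads and red (0) on tails."""
    rng = random.Random(seed)
    colour = [0] * n                                   # arbitrary start state
    edges = [(u, v) for u in range(n) for v in range(u + 1, n)]
    for _ in range(steps):
        u, v = rng.choice(edges)
        c = 1 if rng.random() < p else 0
        colour[u] = colour[v] = c
    return colour

# The cutoff claimed in the abstract for K_n is at about (1/4) n log n steps.
n = 50
state = edge_flipping_on_Kn(n, steps=int(0.25 * n * math.log(n)))
print(sum(state), "blue vertices out of", n)
```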
Improvements in computational and experimental capabilities are rapidly increasing the amount of scientific data that are routinely generated. In applications that are constrained by memory and computational intensity, excessively large datasets may hinder scientific discovery, making data reduction a critical component of data-driven methods. Datasets are growing in two directions: the number of data points and their dimensionality. Whereas dimension reduction typically aims at describing each data sample in a lower-dimensional space, the focus here is on reducing the number of data points. A strategy is proposed to select data points such that they uniformly span the phase-space of the data. The algorithm proposed relies on estimating the probability map of the data and using it to construct an acceptance probability. An iterative method is used to accurately estimate the probability of the rare data points when only a small subset of the dataset is used to construct the probability map. Instead of binning the phase-space to estimate the probability map, its functional form is approximated with a normalizing flow. Therefore, the method naturally extends to high-dimensional datasets. The proposed framework is demonstrated as a viable pathway to enable data-efficient machine learning when abundant data are available.
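A rough Python sketch of the selection idea, under the assumption that points are accepted with probability inversely proportional to their estimated density; a Gaussian KDE stands in for the normalizing-flow density model and the iterative rare-point refinement described above, so the helper below is illustrative only.

```python
import numpy as np
from scipy.stats import gaussian_kde

def phase_space_downsample(x, n_keep, seed=0):
    """Down-select rows of x so the retained points cover phase space more
    uniformly: estimate the density at every sample and accept each point
    with probability inversely proportional to that density (capped at 1).
    n_keep is only a target, since acceptance is stochastic."""
    rng = np.random.default_rng(seed)
    density = gaussian_kde(x.T)(x.T)              # p(x_i) for every sample
    accept = 1.0 / np.maximum(density, 1e-12)
    accept *= n_keep / accept.sum()               # calibrate the expected count
    keep = rng.random(len(x)) < np.minimum(accept, 1.0)
    return x[keep]

# Example: a heavily oversampled Gaussian blob; the kept subset is far more
# uniform over the region the data actually occupy.
data = np.random.default_rng(1).normal(size=(20000, 2))
subset = phase_space_downsample(data, n_keep=2000)
print(subset.shape)
```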
We study the coupon collector’s problem with group drawings. Assume there are n different coupons. At each time precisely s of the n coupons are drawn, where all choices are supposed to have equal probability. The focus lies on the fluctuations, as $n\to\infty$, of the number $Z_{n,s}(k_n)$ of coupons that have not been drawn in the first $k_n$ drawings. Using a size-biased coupling construction together with Stein’s method for normal approximation, a quantitative central limit theorem for $Z_{n,s}(k_n)$ is shown for the case that $k_n=({n/s})(\alpha\log(n)+x)$, where $0<\alpha<1$ and $x\in\mathbb{R}$. The same coupling construction is used to retrieve a quantitative Poisson limit theorem in the boundary case $\alpha=1$, again using Stein’s method.
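The fluctuations in question are easy to probe numerically; the following sketch (with illustrative parameter values) simulates $Z_{n,s}(k_n)$ in the regime $k_n=({n/s})(\alpha\log(n)+x)$ studied above.

```python
import numpy as np

def simulate_Z(n, s, k, trials=2000, seed=0):
    """Simulate Z_{n,s}(k): the number of coupons not drawn in the first k
    group drawings, where each drawing is s coupons chosen uniformly
    without replacement from n."""
    rng = np.random.default_rng(seed)
    out = np.empty(trials, dtype=int)
    for t in range(trials):
        seen = np.zeros(n, dtype=bool)
        for _ in range(k):
            seen[rng.choice(n, size=s, replace=False)] = True
        out[t] = n - seen.sum()
    return out

# Regime of the central limit theorem: k_n = (n/s) * (alpha*log(n) + x).
n, s, alpha, x = 2000, 5, 0.5, 0.0
k = int((n / s) * (alpha * np.log(n) + x))
z = simulate_Z(n, s, k)
print(z.mean(), z.std())      # compare with a fitted normal approximation
```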
For a fixed infinite graph $H$, we study the largest density of a monochromatic subgraph isomorphic to $H$ that can be found in every two-colouring of the edges of $K_{\mathbb{N}}$. This is called the Ramsey upper density of $H$ and was introduced by Erdős and Galvin in a restricted setting, and by DeBiasio and McKenney in general. Recently [4], the Ramsey upper density of the infinite path was determined. Here, we find the value of this density for all locally finite graphs $H$ up to a factor of 2, answering a question of DeBiasio and McKenney. We also find the exact density for a wide class of bipartite graphs, including all locally finite forests. Our approach relates this problem to the solution of an optimisation problem for continuous functions. We show that, under certain conditions, the density depends only on the chromatic number of $H$, the number of components of $H$ and the expansion ratio $|N(I)|/|I|$ of the independent sets of $H$.
The prevalence of Chagas disease (CD) and HIV coinfection in Brazil is between 1.3% and 5%. Serological tests for detecting CD use total antigen, which presents cross-reactivity with other endemic diseases, such as leishmaniasis. A specific test is therefore urgently needed to determine the real prevalence of T. cruzi infection in people living with HIV/AIDS (PLWHA). Here, we evaluated the prevalence of T. cruzi infection in a cohort of 240 PLWHA living in the urban area of São Paulo, Brazil. An enzyme-linked immunosorbent assay using epimastigote alkaline extract antigen from T. cruzi (ELISA EAE) returned a prevalence of 2.0%. However, by immunoblotting using trypomastigote excreted–secreted antigen from T. cruzi (TESA blot), we detected a prevalence of 0.83%. We consider the real prevalence of T. cruzi infection in PLWHA to be 0.83%, lower than reported in the literature; this is due to the specificity of the TESA blot, which probably excludes false positives in CD immunodiagnosis. Our results demonstrate a real need for diagnostic tests with high sensitivity and specificity that can help assess the current status of CD/HIV coinfection in Brazil, in order to stratify the effective risk of reactivation and consequently decrease mortality.
We propose various semiparametric estimators for nonlinear selection models, where slope and intercept can be separately identified. When the selection equation satisfies a monotonic index restriction, we suggest a local polynomial estimator, using only observations for which the marginal cumulative distribution function of the instrument index is close to one. Data-driven procedures such as cross-validation may be used to select the bandwidth for this estimator. We then consider the case in which the monotonic index restriction does not hold and/or the set of observations with a propensity score close to one is thin so that convergence occurs at a rate that is arbitrarily close to the cubic rate. We explore the finite sample behavior in a Monte Carlo study and illustrate the use of our estimator using a model for count data with multiplicative unobserved heterogeneity.
The notion of a cross-intersecting set pair system of size $m$, $ (\{A_i\}_{i=1}^m, \{B_i\}_{i=1}^m )$ with $A_i\cap B_i=\emptyset$ and $A_i\cap B_j\ne \emptyset$ for $i\ne j$, was introduced by Bollobás, and it has become an important tool of extremal combinatorics. His classical result states that $m\le\binom{a+b}{a}$ if $|A_i|\le a$ and $|B_i|\le b$ for each $i$. Our central problem is to see how this bound changes with the additional condition $|A_i\cap B_j|=1$ for $i\ne j$. Such a system is called $1$-cross-intersecting. We show that these systems are related to perfect graphs, clique partitions of graphs, and finite geometries. We prove that their maximum size is:
at least $5^{n/2}$ for $n$ even, $a=b=n$,
equal to $\bigl (\lfloor \frac{n}{2}\rfloor +1\bigr )\bigl (\lceil \frac{n}{2}\rceil +1\bigr )$ if $a=2$ and $b=n\ge 4$,
at most $|\cup _{i=1}^m A_i|$,
asymptotically $n^2$ if $\{A_i\}$ is a linear hypergraph ($|A_i\cap A_j|\le 1$ for $i\ne j$),
asymptotically ${1\over 2}n^2$ if $\{A_i\}$ and $\{B_i\}$ are both linear hypergraphs.
We derive a new theoretical lower bound for the expected supremum of drifted fractional Brownian motion with Hurst index $H\in(0,1)$ over an (in)finite time horizon. Extensive simulation experiments indicate that our lower bound outperforms the Monte Carlo estimates based on very dense grids for $H\in(0,\tfrac{1}{2})$. Additionally, we derive the Paley–Wiener–Zygmund representation of a linear fractional Brownian motion in the general case and give an explicit expression for the derivative of the expected supremum at $H=\tfrac{1}{2}$ in the sense of Bisewski, Dȩbicki and Rolski (2021).
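A grid-based Monte Carlo estimate of the expected supremum, of the kind the lower bound is benchmarked against, can be sketched as follows; the covariance formula is the standard one for fractional Brownian motion, while the grid size, drift, and function name are illustrative.

```python
import numpy as np

def mc_expected_sup(H, c, T=1.0, n_grid=500, trials=2000, seed=0):
    """Monte Carlo estimate of E[sup_{0<=t<=T} (B_H(t) + c*t)] on a uniform
    grid, sampling fBm exactly on the grid via a Cholesky factor of its
    covariance 0.5*(s^{2H} + t^{2H} - |s-t|^{2H}).  Discretisation biases
    the estimate downward, which is why very dense grids are needed."""
    rng = np.random.default_rng(seed)
    t = np.linspace(T / n_grid, T, n_grid)        # exclude t=0, where B_H(0)=0
    s, u = np.meshgrid(t, t)
    cov = 0.5 * (s**(2*H) + u**(2*H) - np.abs(s - u)**(2*H))
    L = np.linalg.cholesky(cov + 1e-12 * np.eye(n_grid))
    sups = np.empty(trials)
    for i in range(trials):
        path = L @ rng.standard_normal(n_grid) + c * t
        sups[i] = max(path.max(), 0.0)            # the sup over [0,T] includes t=0
    return sups.mean()

print(mc_expected_sup(H=0.3, c=-1.0))
```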
State-of-the-art machine-learning-based models are a popular choice for modeling and forecasting energy behavior in buildings because, given enough data, they are good at finding spatiotemporal patterns and structures even in scenarios where the complexity prohibits analytical descriptions. However, their architecture typically holds no physical correspondence to the mechanistic structures linked with the governing physical phenomena. As a result, their ability to generalize to unobserved timesteps depends on how representative the data are of the dynamics underlying the observed system, which is difficult to guarantee in real-world engineering problems such as control and energy management in digital twins. In response, we present a framework that combines lumped-parameter models in the form of linear time-invariant (LTI) state-space models (SSMs) with unsupervised reduced-order modeling in a subspace-based domain adaptation (SDA) approach, a type of transfer-learning (TL) technique. Traditionally, SDA is adopted to exploit labeled data from one domain to predict in a different but related target domain for which labeled data are limited. We introduce a novel SDA approach in which, instead of labeled data, we leverage the geometric structure of the LTI SSM, governed by well-known heat transfer ordinary differential equations, to forecast unobserved timesteps beyond the available measurement data by geometrically aligning the physics-derived and data-derived embedded subspaces. In this initial exploration, we evaluate the physics-based SDA framework on a demonstrative heat conduction scenario, varying the thermophysical properties of the source and target systems to demonstrate the transferability of mechanistic models from physics to observed measurement data.
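To make the physics-derived side concrete, here is a minimal lumped-parameter (2R2C-style) heat-conduction model written as an LTI state-space model and discretised with a zero-order hold; all parameter values, variable names, and the two-node topology are assumptions for illustration, not the configuration used in the study.

```python
import numpy as np
from scipy.linalg import expm

# States: x = [zone temperature, wall temperature]
# Inputs: u = [heating power, ambient temperature]
C1, C2 = 1.0e5, 5.0e4            # thermal capacitances [J/K] (assumed)
R12, R2a = 0.05, 0.10            # thermal resistances [K/W] (assumed)

A = np.array([[-1/(R12*C1),                1/(R12*C1)],
              [ 1/(R12*C2), -1/(R12*C2) - 1/(R2a*C2)]])
B = np.array([[1/C1, 0.0       ],
              [0.0,  1/(R2a*C2)]])

dt = 60.0                        # 1-minute sampling, zero-order hold
Ad = expm(A * dt)
Bd = np.linalg.solve(A, Ad - np.eye(2)) @ B

x = np.array([20.0, 20.0])       # initial temperatures [degC]
u = np.array([500.0, 10.0])      # constant heating power [W] and ambient [degC]
for _ in range(240):             # simulate four hours
    x = Ad @ x + Bd @ u
print(x)
```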
A proliferation of data-generating devices, sensors, and applications has led to unprecedented amounts of digital data. We live in an era of datafication, one in which life is increasingly quantified and transformed into intelligence for private or public benefit. When used responsibly, this offers new opportunities for public good. The potential of data is evident in the possibilities offered by open data and data collaboratives—both instances of how wider access to data can lead to positive and often dramatic social transformation. However, three key forms of asymmetry currently limit this potential, especially for already vulnerable and marginalized groups: data asymmetries, information asymmetries, and agency asymmetries. These asymmetries limit human potential, both in a practical and a psychological sense, leading to feelings of disempowerment and eroding public trust in technology. Existing methods to limit asymmetries (such as open data or consent), as well as some alternatives under consideration (data ownership, collective ownership, personal information management systems), are limited in their ability to adequately address the challenges at hand. A new principle and practice of digital self-determination (DSD) is therefore required. The study and practice of DSD remain in their infancy. The characteristics we have outlined here are only exploratory, and much work remains to be done so as to better understand what works and what does not. We suggest the need for a new research framework or agenda to explore DSD and how it can address the asymmetries, imbalances, and inequalities—both in data and society more generally—that are emerging as key public policy challenges of our era.
This paper studies large N and large T conditional quantile panel data models with interactive fixed effects. We propose a nuclear norm penalized estimator of the coefficients on the covariates and the low-rank matrix formed by the interactive fixed effects. The estimator solves a convex minimization problem, not requiring pre-estimation of the (number of) interactive fixed effects. It also allows the number of covariates to grow slowly with N and T. We derive an error bound on the estimator that holds uniformly in the quantile level. The order of the bound implies uniform consistency of the estimator and is nearly optimal for the low-rank component. Given the error bound, we also propose a consistent estimator of the number of interactive fixed effects at any quantile level. We demonstrate the performance of the estimator via Monte Carlo simulations.
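One plausible way to compute such an estimator is sketched below, assuming a plain proximal-subgradient scheme with singular-value thresholding as the nuclear-norm proximal map; the step size, iteration count, and function names are illustrative and this is not the algorithm implemented in the paper.

```python
import numpy as np

def svt(M, thresh):
    """Singular-value thresholding: the proximal map of the nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - thresh, 0.0)) @ Vt

def quantile_panel_nuclear(Y, X, tau, lam, step=0.5, iters=2000):
    """Sketch of a nuclear-norm penalised quantile panel estimator:
    minimise the average check loss of Y - sum_k X_k*beta_k - L plus
    lam*||L||_* by subgradient steps on (beta, L) followed by the
    nuclear-norm prox on L.  Y is N x T, X is K x N x T; step and iters
    need tuning in practice."""
    K, N, T = X.shape
    beta, L = np.zeros(K), np.zeros((N, T))
    for _ in range(iters):
        U = Y - np.tensordot(beta, X, axes=1) - L          # residuals
        G = (tau - (U < 0)) / (N * T)                      # averaged check-loss subgradient
        beta += step * np.tensordot(X, G, axes=([1, 2], [0, 1]))
        L = svt(L + step * G, step * lam)
    return beta, L
```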
Pseudo cross-variograms appear naturally in the context of multivariate Brown–Resnick processes, and are a useful tool for analysis and prediction of multivariate random fields. We give a necessary and sufficient criterion for a matrix-valued function to be a pseudo cross-variogram, and further provide a Schoenberg-type result connecting pseudo cross-variograms and multivariate correlation functions. By means of these characterizations, we provide extensions of the popular univariate space–time covariance model of Gneiting to the multivariate case.
Fuelled by the big data explosion, a new methodology to estimate sub-annual death probabilities has recently been proposed, opening new insurance business opportunities. This new approach exploits all the detailed information available from millions of microdata records to develop seasonal-ageing indexes (SAIs), from which sub-annual (quarterly) life tables can be derived from annual tables. In this paper, we explore whether a shortcut could be taken in the estimation of SAIs and (life insurance) sub-annual death rates. We propose three different approximations, in which estimates are obtained using just a few thousand data records, and assess their impact on several competitive markets defined from an actual portfolio of life insurance policies. Our analyses clearly point to the shortcuts as good practical alternatives that can be used in real-life insurance markets. Notably, we see that embracing the new quarterly based approach, even using only an approximation (shortcut), is economically preferable to using the associated annual table, offering a significant competitive advantage to the company adopting this innovation.
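For context, the simplest fractional-age baseline (a standard constant-force-of-mortality assumption, used here only as a point of comparison and not the SAI methodology of the paper) splits an annual death probability $q_x$ into a quarterly one as
$${}_{1/4}q_x \;=\; 1-(1-q_x)^{1/4},$$
so, for example, $q_x=0.02$ gives ${}_{1/4}q_x\approx 0.00504$; the seasonal-ageing indexes refine such a uniform split by letting mortality vary with season and age within the year.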
Given a graph $G$ and an integer $\ell \ge 2$, we denote by $\alpha _{\ell }(G)$ the maximum size of a $K_{\ell }$-free subset of vertices in $V(G)$. A recent question of Nenadov and Pehova asks for determining the best possible minimum degree conditions forcing clique-factors in $n$-vertex graphs $G$ with $\alpha _{\ell }(G) = o(n)$, which can be seen as a Ramsey–Turán variant of the celebrated Hajnal–Szemerédi theorem. In this paper we find the asymptotically sharp minimum degree threshold for $K_r$-factors in $n$-vertex graphs $G$ with $\alpha _\ell (G)=n^{1-o(1)}$ for all $r\ge \ell \ge 2$.
A random two-cell embedding of a given graph $G$ is obtained by choosing a random local rotation around every vertex. We analyse the expected number of faces of such an embedding, which is equivalent to studying its average genus. In 1991, Stahl [5] proved that the expected number of faces in a random embedding of an arbitrary graph of order $n$ is at most $n\log (n)$. While there are many families of graphs whose expected number of faces is $\Theta (n)$, none are known where the expected number would be super-linear. This led the authors of [1] to conjecture that there is a linear upper bound. In this note we confirm their conjecture by proving that for any $n$-vertex multigraph, the expected number of faces in a random two-cell embedding is at most $2n\log (2\mu )$, where $\mu$ is the maximum edge-multiplicity. This bound is best possible up to a constant factor.
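The quantity being bounded is straightforward to estimate by simulation; the sketch below (illustrative function name, simple complete graphs only rather than general multigraphs) draws a uniformly random rotation at every vertex and counts faces with the usual face-tracing rule for rotation systems.

```python
import random

def average_faces(n, trials=200, seed=0):
    """Monte Carlo estimate of the expected number of faces of a random
    2-cell embedding of the complete graph K_n: choose a uniformly random
    rotation (cyclic order of darts) at each vertex, then count the orbits
    of the face-tracing map."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        succ = {}                                    # successor in the rotation at each vertex
        for v in range(n):
            darts = [(v, u) for u in range(n) if u != v]
            rng.shuffle(darts)
            for i, d in enumerate(darts):
                succ[d] = darts[(i + 1) % len(darts)]
        # Face tracing: from dart (u, v), the next dart on the same face is
        # the successor of the reversed dart (v, u) in the rotation at v.
        unvisited = {(u, v) for u in range(n) for v in range(n) if u != v}
        faces = 0
        while unvisited:
            start = d = next(iter(unvisited))
            while True:
                unvisited.discard(d)
                d = succ[(d[1], d[0])]
                if d == start:
                    break
            faces += 1
        total += faces
    return total / trials

print(average_faces(6))   # compare with the abstract's bound 2*n*log(2*mu), mu = 1 here
```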
One of the drivers for pushing open data as a form of corruption control stems from the belief that making government operations more transparent would make it possible to hold public officials accountable for how public resources are spent. These large datasets would then be open to the public for scrutiny and analysis, resulting in lower levels of corruption. Though data quality has been widely studied and many advancements have been made, that work has not been extensively applied to open data, with some aspects of data quality receiving more attention than others. One key aspect, however, seems to have been overlooked: accuracy. This gap motivated our inquiry: how is accurate open data produced, and how might breakdowns in this process introduce opportunities for corruption? We study a government agency within the Brazilian Federal Government in order to understand in what ways accuracy is compromised. Adopting a distributed cognition (DCog) theoretical framework, we found that the production of open data is not a neutral activity; instead, it is a distributed process performed by individuals and artifacts. This distributed cognitive process creates opportunities for data to be concealed and misrepresented. Two models mapping data production were generated, the combination of which provided insight into how cognitive processes are distributed; how data flow and are transformed, stored, and processed; and where opportunities for data inaccuracies and misrepresentations arise. The results obtained have the potential to aid policymakers in improving data accuracy.
Representative school data on past SARS-CoV-2 infection are scarce, and differences between pupils and staff remain ambiguous. We performed a nationwide prospective seroprevalence study among pupils and staff over time and in relation to determinants of infection, using Poisson regression and generalised estimating equations. A cluster random sample was selected with allocation by region and sociodemographic (SES) background. Surveys and saliva samples were collected in December 2020, March 2021, and June 2021, and also in October and December 2021 for primary pupils. We recruited 885 primary and 569 secondary pupils and 799 staff in 84 schools. Cumulative seroprevalence (95% CI) among primary pupils increased from 11.0% (7.6; 15.9) at baseline to 60.4% (53.4; 68.3) in December 2021. Group estimates were similar at baseline; however, in June they were significantly higher among primary staff (38.9% (32.5; 46.4)) compared to pupils and secondary staff (24.2% (20.3; 28.8)). Infections were asymptomatic in 48–56% of pupils and 28% of staff. Seropositivity was associated with individual SES in pupils, and with school level, school SES and language network in staff in June. Associations with behavioural characteristics were inconsistent. Seroconversion rates increased two- to four-fold after self-reported high-risk contacts, especially with adults. Seroprevalence studies using non-invasive sampling can inform public health management.