In this paper we study one-sided hypothesis testing under random sampling without replacement, which frequently appears in cryptographic problem settings, including the verification of measurement-based quantum computation. Suppose that $n+1$ binary random variables $X_1,\ldots, X_{n+1}$ follow a permutation invariant distribution and $n$ binary random variables $X_1,\ldots, X_{n}$ are observed. Then, we propose randomized tests with a randomization parameter for the expectation of the $(n+1)$th random variable $X_{n+1}$ under a given significance level $\delta>0$. Our randomized tests significantly improve the upper confidence limit over deterministic tests. Beyond cryptographic scenarios, our problem setting also commonly appears in machine learning when adversarial examples are considered. Such studies are essential for expanding the applicable area of statistics. Although this paper addresses only binary random variables, a similar significant improvement by randomized tests can be expected for general non-binary random variables.
In this paper we consider a dynamic Erdős–Rényi graph in which edges, according to an alternating renewal process, change from present to absent and vice versa. The objective is to estimate the on- and off-time distributions while only observing the aggregate number of edges. This inverse problem is dealt with, in a parametric context, by setting up an estimator based on the method of moments. We provide conditions under which the estimator is asymptotically normal, and we point out how the corresponding covariance matrix can be identified. We also demonstrate how to adapt the estimation procedure if alternative subgraph counts are observed, such as the number of wedges or triangles.
This paper investigates the asymptotic properties of parameter estimation for the Ewens–Pitman partition with parameters $0\lt\alpha\lt1$ and $\theta\gt-\alpha$. Specifically, we show that the maximum-likelihood estimator (MLE) of $\alpha$ is $n^{\alpha/2}$-consistent and converges to a variance mixture of normal distributions, where the variance is governed by the Mittag-Leffler distribution. Moreover, we show that a proper normalization involving a random statistic eliminates the randomness in the variance. Building on this result, we construct an approximate confidence interval for $\alpha$. Our proof relies on a stable martingale central limit theorem, which is of independent interest.
This paper introduces a method for pricing insurance policies using market data. The approach is designed for scenarios in which an insurance company seeks to enter a new market (in our case, pet insurance) for which it lacks historical data. The methodology involves an iterative two-step process. First, a suitable parameter is proposed to characterize the underlying risk. Second, the resulting pure premium is linked to the observed commercial premium using an isotonic regression model. To validate the method, comprehensive testing is conducted on synthetic data, followed by its application to a dataset of actual pet insurance rates. To facilitate practical implementation, we have developed an R package called IsoPriceR. By addressing the challenge of pricing insurance policies in the absence of historical data, this method helps enhance pricing strategies in emerging markets.
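The second step above, linking a pure premium to an observed commercial premium through an isotonic (monotone) regression, can be illustrated with a minimal pool-adjacent-violators sketch. This is not the IsoPriceR implementation: the function name `isotonic_fit` and the toy premiums are hypothetical, and the observations are assumed to be already sorted by increasing pure premium.

```python
def isotonic_fit(y, w=None):
    """Least-squares non-decreasing fit to y (pool adjacent violators)."""
    n = len(y)
    w = [1.0] * n if w is None else list(w)
    # Each block stores [weighted mean, total weight, number of points].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Merge backwards while the monotonicity constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, c1 + c2])
    # Expand each block back to one fitted value per observation.
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


# Hypothetical commercial premiums, ordered by increasing pure premium.
commercial = [10.0, 12.0, 11.0, 15.0, 14.0, 18.0]
fit = isotonic_fit(commercial)
print(fit)
```

Neighbouring observations that break the ordering are pooled into their weighted mean, so the fitted curve is a non-decreasing step function of the pure premium.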
The cumulative residual extropy has been proposed recently as an alternative measure of extropy to the cumulative distribution function of a random variable. In this paper, the concept of cumulative residual extropy has been extended to cumulative residual extropy inaccuracy (CREI) and dynamic cumulative residual extropy inaccuracy (DCREI). Some lower and upper bounds for these measures are provided. A characterization problem for the DCREI measure under the proportional hazard rate model is studied. Nonparametric estimators for CREI and DCREI measures based on kernel and empirical methods are suggested. Also, a simulation study is presented to evaluate the performance of the suggested measures. Simulation results show that the kernel-based estimator performs better than the empirical-based estimator. Finally, applications of the DCREI measure for model selection are provided using two real data sets.
Reliability analysis of stress–strength models usually assumes that the stress and strength variables are independent. However, in numerous real-world scenarios, stress and strength variables exhibit dependence. This paper investigates reliability estimation in a multicomponent stress–strength model for a parallel-series system, assuming that the dependence between stress and strength is described by the Clayton copula. The estimators for the unknown parameters and system reliability are derived using the two-step maximum likelihood estimation and the maximum product spacing methods. Additionally, confidence intervals are constructed using asymptotic normality theory and the bootstrap method. Furthermore, Monte Carlo simulations are conducted to compare the effectiveness of the proposed inference methods. Finally, a real dataset is analyzed for illustrative purposes.
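As a rough illustration of the dependence structure named above, the sketch below evaluates the bivariate Clayton copula $C(u,v)=(u^{-\theta}+v^{-\theta}-1)^{-1/\theta}$ for $\theta>0$ and draws dependent uniform pairs by conditional inversion. It is not the paper's estimation procedure; the function names, the value of `theta`, and the seed are assumptions made only for this example.

```python
import random


def clayton_cdf(u, v, theta):
    """Bivariate Clayton copula C(u, v), valid for theta > 0 and u, v in (0, 1]."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def sample_clayton(n, theta, seed=0):
    """Draw n dependent pairs (u, v) on (0, 1]^2 by conditional inversion."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        u = 1.0 - rng.random()  # uniform on (0, 1]
        w = 1.0 - rng.random()  # uniform on (0, 1], plays the role of C(v | u)
        # Solve dC/du = w for v.
        v = ((w ** (-theta / (1.0 + theta)) - 1.0) * u ** -theta + 1.0) ** (-1.0 / theta)
        pairs.append((u, v))
    return pairs


theta = 2.0  # hypothetical dependence parameter
pairs = sample_clayton(1000, theta)
```

Transforming `u` and `v` through the inverse marginal distribution functions of stress and strength would then give dependent stress–strength draws for a Monte Carlo study.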
In this paper, we use an information theoretic approach called cumulative residual extropy (CRJ) to compare mixed used systems. We establish mixture representations for the CRJ of mixed used systems and then explore the measure and comparison results among these systems. We compare the mixed used systems based on stochastic orders and stochastically ordered conditional coefficients vectors. Additionally, we derive bounds for the CRJ of mixed used systems with independent and identically distributed components. We also propose the Jensen-cumulative residual extropy (JCRJ) divergence to calculate the complexity of systems. To demonstrate the utility of these results, we calculate and compare the CRJ and JCRJ divergence of mixed used systems in the exponential model. Furthermore, we determine the optimal system configuration based on signature under a criterion function derived from JCRJ in the exponential model.
Graph-based semi-supervised learning methods combine the graph structure and labeled data to classify unlabeled data. In this work, we study the effect of a noisy oracle on classification. In particular, we derive the maximum a posteriori (MAP) estimator for clustering a degree corrected stochastic block model when a noisy oracle reveals a fraction of the labels. We then propose an algorithm derived from a continuous relaxation of the MAP, and we establish its consistency. Numerical experiments show that our approach achieves promising performance on synthetic and real data sets, even in the case of very noisy labeled data.
Inference in spatial and spatio-temporal models can be challenging for a variety of reasons. For example, non-Gaussianity often leads to analytically intractable integrals; we may be in a ‘big’ data setting, whereby the number of observations renders traditional methods too computationally expensive; we may wish to make inferences over spatial supports that are different to those of our measurements; or, we may wish to use a statistical model whose likelihood function is either unavailable or computationally intractable. In this thesis, I develop several techniques that help to alleviate these challenges.
We investigate some aspects of the problem of the estimation of birth distributions (BDs) in multi-type Galton–Watson trees (MGWs) with unobserved types. More precisely, we consider two-type MGWs called spinal-structured trees. This kind of tree is characterized by a spine of special individuals whose BD $\nu$ is different from the other individuals in the tree (called normal, and whose BD is denoted by $\mu$). In this work, we show that even in such a very structured two-type population, our ability to distinguish the two types and estimate $\mu$ and $\nu$ is constrained by a trade-off between the growth-rate of the population and the similarity of $\mu$ and $\nu$. Indeed, if the growth-rate is too large, large deviation events are likely to be observed in the sampling of the normal individuals, preventing us from distinguishing them from special ones. Roughly speaking, our approach succeeds if $r\lt \mathfrak{D}(\mu,\nu)$, where $r$ is the exponential growth-rate of the population and $\mathfrak{D}$ is a divergence measuring the dissimilarity between $\mu$ and $\nu$.
We study the community detection problem on a Gaussian mixture model, in which vertices are divided into $k\geq 2$ distinct communities. The major difference in our model is that the intensities for Gaussian perturbations are different for different entries in the observation matrix, and we do not assume that every community has the same number of vertices. We explicitly find the necessary and sufficient conditions for the exact recovery of the maximum likelihood estimation, which can give a sharp phase transition for the exact recovery even though the Gaussian perturbations are not identically distributed; see Section 7. Applications include the community detection on hypergraphs.
In this paper, we introduce a novel way to quantify the remaining inaccuracy of order statistics by utilizing the concept of extropy. We explore various properties and characteristics of this new measure. Additionally, we expand the notion of inaccuracy for ordered random variables to a dynamic version and demonstrate that this dynamic information measure provides a unique determination of the distribution function. Moreover, we investigate specific lifetime distributions by analyzing the residual inaccuracy of the first-order statistics. Nonparametric kernel estimation of the proposed measure is suggested. Simulation results show that the kernel estimator with bandwidth selection using the cross-validation method has the best performance. Finally, an application of the proposed measure to model selection is provided.
In this paper we study the drift parameter estimation for reflected stochastic linear differential equations of a large signal. We discuss the consistency and asymptotic distribution of the trajectory fitting estimator (TFE).
Several information measures have been proposed and studied in the literature. One such measure is extropy, a complementary dual function of entropy. Its meaning and related aging notions have not yet been studied in great detail. In this paper, we first illustrate that extropy information ranks the uniformity of a wide array of absolutely continuous families. We then discuss several theoretical merits of extropy. We also provide a closed-form expression of it for finite mixture distributions. Finally, the dynamic versions of extropy are also discussed, specifically the residual extropy and past extropy measures.
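For orientation, the extropy of an absolutely continuous random variable with density $f$ is commonly defined as $J(X)=-\tfrac12\int f(x)^2\,dx$, and the residual extropy at time $t$ as $J(X;t)=-\tfrac{1}{2\bar F(t)^2}\int_t^\infty f(x)^2\,dx$, where $\bar F$ is the survival function. These are the standard definitions from the literature, not formulas quoted from the abstract. The sketch below approximates both by a midpoint rule; the function names, the integration limits, and the choice of distributions are assumptions made for illustration.

```python
import math


def extropy(pdf, lo, hi, steps=100_000):
    """Approximate J(X) = -0.5 * integral of pdf(x)^2 over [lo, hi] (midpoint rule)."""
    h = (hi - lo) / steps
    total = sum(pdf(lo + (i + 0.5) * h) ** 2 for i in range(steps))
    return -0.5 * total * h


def residual_extropy(pdf, sf, t, hi, steps=100_000):
    """Approximate J(X; t) = -0.5 * integral_t^hi pdf(x)^2 dx / sf(t)^2."""
    h = (hi - t) / steps
    total = sum(pdf(t + (i + 0.5) * h) ** 2 for i in range(steps))
    return -0.5 * total * h / sf(t) ** 2


rate = 2.0  # hypothetical exponential rate


def exp_pdf(x):
    return rate * math.exp(-rate * x)


def exp_sf(x):
    return math.exp(-rate * x)


j_uniform = extropy(lambda x: 0.25, 0.0, 4.0)             # Uniform(0, 4): exact value -1/8
j_exp = extropy(exp_pdf, 0.0, 40.0)                       # Exponential(rate): exact value -rate/4
j_exp_res = residual_extropy(exp_pdf, exp_sf, 1.0, 41.0)  # also -rate/4 (memoryless)
```

For the exponential distribution the residual extropy does not depend on $t$, which reflects the memoryless property.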
Let $(Z_n)_{n\geq0}$ be a supercritical Galton–Watson process. Consider the Lotka–Nagaev estimator for the offspring mean. In this paper we establish self-normalized Cramér-type moderate deviations and Berry–Esseen bounds for the Lotka–Nagaev estimator. The results are believed to be optimal or near-optimal.
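The Lotka–Nagaev estimator of the offspring mean is the ratio $Z_{n+1}/Z_n$ of consecutive generation sizes. The sketch below simulates a Galton–Watson process and evaluates this ratio; the offspring law (1, 2, or 3 children with probabilities 1/4, 1/2, 1/4, so mean 2 and no extinction), the number of generations, and the seed are assumptions chosen only for illustration.

```python
import random


def lotka_nagaev(z_n, z_next):
    """Lotka-Nagaev estimator of the offspring mean: Z_{n+1} / Z_n."""
    if z_n <= 0:
        raise ValueError("estimator is undefined when Z_n = 0")
    return z_next / z_n


def simulate_gw(generations, offspring, z0=1, seed=0):
    """Return the generation sizes Z_0, ..., Z_generations of a Galton-Watson process."""
    rng = random.Random(seed)
    sizes = [z0]
    for _ in range(generations):
        # Each individual of the current generation reproduces independently.
        sizes.append(sum(offspring(rng) for _ in range(sizes[-1])))
    return sizes


def offspring(rng):
    """Hypothetical offspring law: 1, 2, or 3 children w.p. 1/4, 1/2, 1/4 (mean 2)."""
    return rng.choice([1, 2, 2, 3])


sizes = simulate_gw(12, offspring)
estimate = lotka_nagaev(sizes[-2], sizes[-1])
print(sizes[-2], sizes[-1], estimate)
```

Because $Z_n$ grows geometrically in the supercritical case, the ratio concentrates around the true offspring mean as $n$ increases.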
Consider the problem of determining the Bayesian credibility mean $E(X_{n+1}|X_1,\cdots, X_n)$ when the random claims $X_1,\cdots, X_n$, given the parameter vector $\boldsymbol{\Psi}$, are sampled from a $K$-component mixture family of distributions whose members are the union of different families of distributions. This article begins by deriving a recursive formula for such a Bayesian credibility mean. Moreover, under the assumption that, using additional information $Z_{i,1},\cdots,Z_{i,m}$, one may probabilistically determine whether a random claim $X_i$ belongs to a given population (or distribution), the recursive formula simplifies to an exact Bayesian credibility mean whenever all components of the mixture distribution belong to the exponential families of distributions. For situations where a 2-component mixture family of distributions is an appropriate choice for data modelling, the article shows, using the logistic regression model, how one may employ such additional information to derive a Bayesian credibility model for a finite mixture of distributions, called the Logistic Regression Credibility (LRC) model. The LRC model is then compared with its competitor, the Regression Tree Credibility (RTC) model. More precisely, it is shown that, under the squared error loss function, the LRC’s risk function dominates the RTC’s risk function at least in an interval around $0.5$. Several examples are given to illustrate the practical application of our findings.
We introduce a new measure of inaccuracy based on extropy between distributions of the $n$th upper (lower) record value and parent random variable and discuss some of its properties. A characterization problem for the proposed extropy inaccuracy measure is studied. It is also shown that the defined measure of inaccuracy is invariant under scale but not under location transformation. We characterize certain specific lifetime distribution functions. Nonparametric estimators based on the empirical and kernel methods for the proposed measures are also obtained. The performance of the estimators is also discussed using a real dataset.
Let $\{X_n\}_{n\in{\mathbb{N}}}$ be an ${\mathbb{X}}$-valued iterated function system (IFS) of Lipschitz maps defined as $X_0 \in {\mathbb{X}}$ and for $n\geq 1$, $X_n\;:\!=\;F(X_{n-1},\vartheta_n)$, where $\{\vartheta_n\}_{n \ge 1}$ are independent and identically distributed random variables with common probability distribution $\mathfrak{p}$, $F(\cdot,\cdot)$ is Lipschitz continuous in the first variable, and $X_0$ is independent of $\{\vartheta_n\}_{n \ge 1}$. Under parametric perturbation of both $F$ and $\mathfrak{p}$, we are interested in the robustness of the $V$-geometrical ergodicity property of $\{X_n\}_{n\in{\mathbb{N}}}$, of its invariant probability measure, and finally of the probability distribution of $X_n$. Specifically, we propose a pattern of assumptions for studying such robustness properties for an IFS. This pattern is implemented for the autoregressive processes with autoregressive conditional heteroscedastic errors, and for IFS under roundoff error or under thresholding/truncation. Moreover, we provide a general set of assumptions covering the classical Feller-type hypotheses for an IFS to be a $V$-geometrical ergodic process. An accurate bound for the rate of convergence is also provided.