1. Introduction
Competing and complementary risk (CCR) problems typically focus on the failure of a system composed of multiple components or a system with several, sometimes even a countably infinite number of, risk factors that can cause its failure. Here we think of components and risk factors as having random lifetimes, denoted by a sequence of independent and identically distributed (i.i.d.) positive random variables
$X_1, X_2, \ldots.$
At the end of its lifetime the component fails, or the risk occurs.
CCR systems are either sequential or parallel. In sequential systems, the whole system fails at the occurrence of the first among an unknown number N of risk factors; the risks compete with each other to cause failure, and the observed failure time is the minimum of the lifetimes of the risk factors. In parallel systems, the system fails only after all of an unknown number N of risk factors have occurred, and the observed failure time is the maximum of the lifetimes of these risk factors. In both settings, the lifetimes of the components or risks are modelled as random, as is the number of risks N, which is assumed to be independent of the lifetimes. These two CCR settings are frequently studied together, as
$\max(X_1, \ldots, X_N) = - \min(\!-X_1, \ldots, -X_N)$
; see [Reference Basu and Klein6].
CCR settings arise in many fields, such as industrial reliability, demography, biomedical studies, public health, and actuarial sciences. For example, in a study of vertical transmission of HIV from an infected mother to her newborn child, several factors increase the risk of transmission, such as high maternal virus load and low birth weight. Exactly which factors determine the timing of transmission is an ongoing area of research. Even with a list of known risk factors, there are possibly many unknown risk factors involved. The problem is therefore modelled as a CCR problem in [Reference Cancho, Louzada-Neto and Barriga8], [Reference Louzada, Bereta and Franco20], [Reference Louzada, Ramos and Perdoná21], and [Reference Tojeiro, Louzada, Roman and Borges35]. The number of successive failures of the air conditioning system of each member of a fleet of 13 Boeing 720 jet aeroplanes was modelled as a CCR problem in [Reference Kus17], with potential risk factors including defective components in the air conditioning system and errors during the production process. In [Reference Adamidis and Loukas3] the period between successive coal-mining disasters is modelled as a CCR problem, with construction faults or human errors committed by inexperienced miners as examples of risks. The daily ozone concentrations in New York during May–September 1973 are treated as a CCR problem in [Reference Jayakumar, Babu and Bakouch16].
We call the distributions used to model the lifetimes in CCR problems the CCR family of distributions. Examples from this family that have been studied in the literature include the exponential–Poisson distribution [Reference Kus17], the Poisson–exponential (PE) lifetime distribution [Reference Cancho, Louzada-Neto and Barriga8], the Weibull–Poisson distribution [Reference Lu and Shi22], the extended Weibull–Poisson distribution, and the extended generalised extreme value Poisson distribution [Reference Ramos, Dey, Louzada and Lachos29]. In [Reference Tahir and Cordeiro34] a detailed review is given of CCR distributions, and additional ones are proposed therein.
The probability mass function of a CCR distribution can be quite unwieldy. Hence, it is of interest to approximate a CCR distribution by a simpler CCR distribution such as the PE distribution or the exponential–geometric distribution. For these two distributions, the number of risks or components follows a zero-truncated Poisson or geometric distribution, respectively. To assess such approximations, we develop Stein’s method for CCR distributions.
The seminal work of Charles Stein [Reference Stein32] derives bounds on the approximation error for normal approximations. In [Reference Chen10], Stein’s method is adapted to Poisson approximation; see also [Reference Arratia, Goldstein and Gordon4] and [Reference Barbour, Holst and Janson5]. Generalisations to many other distributions and dimensions are available; see, for example, [Reference Chen, Goldstein and Shao9], [Reference Mijoule, Raič, Reinert and Swan24], and [Reference Nourdin and Peccati25]. The Stein operators in this paper are based on the density approach; see [Reference Ley, Reinert and Swan18]. A main difficulty in distributional comparison problems is comparing a discrete and a continuous distribution; related works on Stein’s method include [Reference Goldstein and Reinert15], which bounds the Wasserstein distance between a beta distribution and the distribution of the number of white balls in a Pólya–Eggenberger urn, and [Reference Germain and Swan14], where standardisations related to the ones we propose are used. Maxima of a fixed number of random variables were treated with Stein’s method in [Reference Feidt13]; Stein’s method applied to a random sum of random variables can be found in [Reference Peköz, Röllin and Ross28]. Here we propose a comparison of the distributions of a maximum (or minimum) of discrete random variables and a maximum (or minimum) of continuous random variables when the number of these random variables is itself an independent random variable. In future work, the Stein characterisations derived in the present paper could be employed to construct Stein-based goodness-of-fit statistics as in [Reference Betsch and Ebner7].
The remainder of this paper is organised as follows. Section 2 gives a brief introduction to Stein’s method using the density approach and applies it to obtain a general representation for the CCR class of distributions through a Stein operator. Section 3 develops Stein’s method for comparing CCR distributions; as an alternative approach, it also provides a comparison based on a Lindeberg argument. As the main illustration of our results, Section 4 details Stein’s method for the PE distribution and uses it to bound the total variation distance between a PE distribution and a distribution from the CCR family, as well as between a PE distribution and a distribution not from the CCR family. As a second illustration, in Section 5 we develop Stein’s method for the exponential–geometric distribution and perform distributional comparisons in total variation distance. Section 6 gives a bound on the bounded Wasserstein distance between the distribution of the maximum waiting time of sequence patterns in Bernoulli trials and a PE distribution. Proofs that are standard but which would disturb the flow of the argument are postponed to the appendix.
2. Stein’s method for CCR distributions
We use the notation
$\mathbb{R}^+_{{>0}} = (0, \infty)$
and
$\mathbb{N} = \{1, 2, 3, \ldots\}.$
The backward difference operator
$\Delta^-$
operates on a function
$g\,:\, \mathbb{R} \rightarrow \mathbb{R}$
by
$\Delta^-g(x) = g(x) - g(x-1)$
. We note that
$\Delta^- (gq)(y) = q(y)\Delta^- g(y) + g(y-1) \Delta^- q(y)$
. For a function
$h \in \textrm{Lip}_b(1)$
, its derivative is denoted by $h'$ and exists almost everywhere (by Rademacher’s theorem).
For two probability distributions q and p on
$\mathbb{R}^+_{{>0}}$
, we seek bounds on distances of the form
\begin{align*} d_{\mathcal{H}} (q, p) = \sup_{h \in \mathcal{H}} | \mathbb{E} h(Z) - \mathbb{E} h(X) |, \qquad (2.1) \end{align*}
where
$\mathcal{H}$
is a set of test functions,
$Z\sim q$
, and
$X \sim p$
. The sets of functions
$\mathcal{H}$
in (2.1) are, for the total variation distance
$d_{\rm TV}$
,
$\mathcal{H} = \{\mathbb{I}[\cdot\in A] \,:\, A \in \mathcal{B} ( \mathbb{R})\}$
and, for the bounded Wasserstein distance
$d_{\rm BW}$
,
\begin{align*} \mathcal{H} = \textrm{Lip}_b(1) = \{ h\,:\, \mathbb{R} \rightarrow \mathbb{R} \mbox{ such that } \|h\| \le 1 \mbox{ and } \|h'\| \le 1 \}. \end{align*}
Here
$\mathcal{B} ( \mathbb{R})$
denotes the Borel sets of
$\mathbb{R}$
and
$\| \cdot \|$
is the supremum norm in
$\mathbb{R}$
. We note here the alternative formulation for total variation distance based on Borel-measurable functions
$h\,:\, \mathbb{R} \rightarrow \mathbb{R}$
,
\begin{align*} d_{\rm TV} (q, p) = \frac12 \sup_{\|h\| \le 1} | \mathbb{E} h(Z) - \mathbb{E} h(X) |. \qquad (2.2) \end{align*}
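The equivalence of the two formulations of the total variation distance can be checked directly on a small discrete example. The following sketch (the PMFs are illustrative choices of ours) computes the supremum over events, i.e. over indicator test functions, and the rescaled $\ell_1$ form, which is attained at $h = \mbox{sign}(p - q)$.

```python
from itertools import chain, combinations

# Two illustrative PMFs on {0, 1, 2}.
support = [0, 1, 2]
p = {0: 0.5, 1: 0.3, 2: 0.2}
q = {0: 0.4, 1: 0.4, 2: 0.2}

# d_TV as the supremum over events A, i.e. over indicator test functions.
events = chain.from_iterable(combinations(support, r) for r in range(len(support) + 1))
d_tv_events = max(abs(sum(p[i] for i in A) - sum(q[i] for i in A)) for A in events)

# Half the supremum over ||h|| <= 1 is attained at h = sign(p - q),
# giving the familiar (1/2) * l1 formula.
d_tv_l1 = 0.5 * sum(abs(p[i] - q[i]) for i in support)

print(d_tv_events, d_tv_l1)
```

Both computations return the same value, as the two formulations require.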
2.1. Stein’s method for distributional comparisons
To obtain explicit bounds on the distance between a probability distribution
$\mathcal{L}(X)$
of interest and a usually well-understood approximating distribution
$\mathcal{L}_0(Z)$
, often called the target distribution, Stein’s method connects the test function
$h \in \mathcal{H}$
in (2.1) to the distribution of interest through a Stein equation
\begin{align*} \mathcal{T} g_h (x) = h(x) - \mathbb{E} h(Z). \qquad (2.3) \end{align*}
In (2.3),
$\mathcal{T}$
is a Stein operator for the distribution
$\mathcal{L}_0 (Z)$
, with an associated Stein class
$ \mathcal{F}(\mathcal{T})$
of functions such that
$\mathbb{E}[\mathcal{T}g(Z)] = 0 \mbox{ for all } g \in \mathcal{F}(\mathcal{T}) $
if and only if
$ Z \sim \mathcal{L}_0 (Z);$
thus, a Stein operator characterises the distribution. The distance (2.1) can then be bounded by
$ d_{\mathcal{H}} (\mathcal{L}(X),\mathcal{L}_0(Z)) \le \sup_{g \in \mathcal{F}(\mathcal{H})}|\mathbb{E}\mathcal{T}g(X)| $
where
$ \mathcal{F}(\mathcal{H}) = \{g_h \,:\, h \in \mathcal{H}\}$
is the set of solutions of the Stein equation (2.3) for the set of test functions
$h \in \mathcal{H}$
.
The Stein operator
$\mathcal{T}$
for a probability distribution is not unique; see e.g. [Reference Ley, Reinert and Swan18]. In this paper, we employ the so-called density method, which uses the score function of a probability distribution to construct a Stein operator, called a score Stein operator. Following [Reference Ley and Swan19, Reference Stein, Diaconis, Holmes and Reinert33], a score Stein operator for a continuous distribution with probability density function (PDF) p and support
$[a,b] \subseteq \mathbb{R}$
acts on functions g for which the derivative exists by
\begin{align*} \mathcal{T}_p g(x) = \frac{(gp)'(x)}{p(x)}, \qquad x \in [a,b], \qquad (2.4) \end{align*}
where we take
$0/0 =0$
. For differentiable g and p, (2.4) simplifies to
${\mathcal T}_p g (x) = g'(x) + g(x) \rho (x)$
, where
$\rho = p'/p$
is the score function of p. The Stein class
${\mathcal F}(\mathcal{T}_p)$
is the collection of functions
$ g\, :\, \mathbb{R} \rightarrow \mathbb{R}$
such that g(x) p(x) is differentiable with integrable derivative and
$ \lim_{x \downarrow a} g(x)p(x) = \lim_{x \uparrow b} g(x)p(x) = 0 $
. It is straightforward to see that for a PDF p on
${[a,b] \subset \mathbb{R}}$
,
\begin{align*} g_h(x) = \frac{1}{p(x)} \int_a^x \bigl( h(t) - \mathbb{E} h(Z) \bigr) p(t) \, \mathrm{d} t, \qquad Z \sim p, \qquad (2.5) \end{align*}
solves (2.3) for h and that if h is bounded then
$g \in \mathcal{F}(\mathcal{T}_p)$
.
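As a numerical illustration (a sketch of ours, not from the references), take the target $\mathrm{Exp}(1)$ with density $p(x) = {\mathrm{e}}^{-x}$ and score $\rho \equiv -1$, and the illustrative test function $h(x) = {\mathrm{e}}^{-x}$ with $\mathbb{E} h(X) = 1/2$; quadrature confirms that the solution formula satisfies the Stein equation $g'(x) - g(x) = h(x) - \mathbb{E} h(X)$.

```python
import math

# Target Exp(1): p(x) = exp(-x), score rho = -1, so T g(x) = g'(x) - g(x).
# Illustrative test function h(x) = exp(-x), with E h(X) = 1/2.
h = lambda x: math.exp(-x)
Eh = 0.5

def g(x, steps=20000):
    # Solution formula: g(x) = e^x * int_0^x (h(t) - E h(X)) e^{-t} dt (midpoint rule).
    dx = x / steps
    integral = sum((h((k + 0.5) * dx) - Eh) * math.exp(-(k + 0.5) * dx) for k in range(steps)) * dx
    return math.exp(x) * integral

max_residual = 0.0
for x in [0.5, 1.0, 2.0]:
    eps = 1e-4
    g_prime = (g(x + eps) - g(x - eps)) / (2 * eps)  # central difference
    max_residual = max(max_residual, abs(g_prime - g(x) - (h(x) - Eh)))
print(max_residual)  # close to 0
```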
To use the Stein equation for a distributional comparison, let X and Y be two random variables with PDFs
$p_X$
and
$p_Y$
, defined on the same probability space and with nested supports
${\textrm{supp}}(p_Y) \subset {\textrm{supp}}(p_X) =[a,b] \subset \mathbb{R}$
, score functions
$\rho_X$
and
$\rho_Y$
, and corresponding score Stein operators
$\mathcal T_X$
and
$\mathcal T_Y$
. Then
\begin{align*} \mathbb{E} h(Y) - \mathbb{E} h(X) = \mathbb{E} \bigl[ \bigl( \rho_X(Y) - \rho_Y(Y) \bigr) g_{X, h}(Y) \bigr], \qquad (2.6) \end{align*}
where
$g_{X, h}(x)$
is the solution of the Stein equation for h and
$\mathcal T_X$
. Equation (2.6) is a special case of Equation 23 in Section 2.5 of [Reference Ley, Reinert and Swan18].
For a discrete distribution with probability mass function (PMF) q having support
$\mathcal{I}=[a,b] \subset {\mathbb{N}}$
, the discrete backward score function is
$\frac{ \Delta^- q(y)}{q(y)}$
, and, as in Remark 3.2 and Example 3.13 of [Reference Ley, Reinert and Swan18], a discrete backward score Stein operator is
\begin{align*} \mathcal{T}_q g(y) = \frac{\Delta^- (gq)(y)}{q(y)} = \Delta^- g(y) + g(y-1) \frac{\Delta^- q(y)}{q(y)}. \qquad (2.7) \end{align*}
With an abuse of notation, we often refer to a Stein operator for the distribution of X as the Stein operator for X, and similarly refer to the score function of the distribution of X as the score function of X. Further, if
$X \sim p$
, we also write
$\mathcal{T}_X$
for
$\mathcal{T}_p$
.
2.2. CCR distributions
Let
$N \in \mathbb{N}$
be a random variable with finite second moment and let
${\textbf{Y}}= (Y_1, Y_2, \ldots)$
be a sequence of positive i.i.d. random variables, with cumulative distribution function (CDF)
$F_Y$
independent of N. Then
\begin{align*} W_{\alpha} = W_{\alpha}(N, {\textbf{Y}}) = \begin{cases} \max(Y_1, \ldots, Y_N) & \mbox{if } \alpha = 1, \\ \min(Y_1, \ldots, Y_N) & \mbox{if } \alpha = -1, \end{cases} \qquad (2.8) \end{align*}
is called a CCR random variable with type indicator
$\alpha \in \{-1, 1\}.$
Setting
\begin{align*} U_Y^{\alpha}(x) = \begin{cases} F_Y(x) & \mbox{if } \alpha = 1, \\ 1 - F_Y(x) & \mbox{if } \alpha = -1, \end{cases} \qquad (2.9) \end{align*}
and writing
$G_N(x) = \mathbb{E}\, x^N$
, the probability generating function of N,
$W_{\alpha} >0$
has CDF
\begin{align*} F_{W_1}(w) = G_N(U_Y^{1}(w)) = G_N(F_Y(w)), \qquad (2.10) \end{align*}
\begin{align*} F_{W_{-1}}(w) = 1 - G_N(U_Y^{-1}(w)) = 1 - G_N(1 - F_Y(w)). \qquad (2.11) \end{align*}
If the
$Y_i$
have a continuous distribution with PDF
$f_Y$
, then
$W_\alpha$
has PDF
\begin{align*} f_{W_{\alpha}}(w) = f_Y(w)\, G_N'(U_Y^{\alpha}(w)), \qquad w > 0; \qquad (2.12) \end{align*}
see also [Reference Tahir and Cordeiro34]. Here we have used that
$(U_Y^\alpha)'(w) = \alpha f_Y(w)$
.
If the
$Y_i$
are discrete with PMF
$p_Y$
on
$\mathbb{N},$
the resulting random variable
$W_{\alpha}$
has PMF
$p_{W_{\alpha}}$
, which can be expressed in terms of
$G_N(\cdot)$
as
\begin{align*} p_{W_{\alpha}}(w) = \alpha \bigl( G_N(U_Y^{\alpha}(w)) - G_N(U_Y^{\alpha}(w-1)) \bigr), \qquad w \in \mathbb{N}. \qquad (2.13) \end{align*}
Example 1.
-
(i) If N is a zero-truncated Poisson random variable with parameter
$\theta$ , then (2.12) simplifies to
$ f_{W_{\alpha}}(w) = f_Y(w) \frac{\theta }{(1-{\mathrm{e}}^{-\theta})} {\mathrm{e}}^{-\theta(1-U_Y^{\alpha}(w))} $ for
$w > 0.$ This is the PDF of the extended Poisson family of distributions given in Equation 3 of [Reference Ramos, Dey, Louzada and Lachos29], where it is called the G-Poisson class of distributions [Reference Tahir and Cordeiro34].
-
(ii) If N is a
$\mathrm{Geometric}(p)$ random variable with PMF
$\mathbb{P}(N=n) = (1-p)^{(n-1)} p$ for
$n \in \mathbb{N} $ , then (2.12) yields
$f_{W_{\alpha}}(w) = f_Y(w) {p}/{(1-(1-p)U_Y^{\alpha}(w))^2}$ for
$ w > 0,$ which gives the distributions in Equations 5.2 and 5.3 of [Reference Marshall and Olkin23].
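For the zero-truncated Poisson case in (i), the closed form can be cross-checked numerically against the defining series $\mathbb{P}(W_1 \le w) = \sum_{n \ge 1} \mathbb{P}(N = n) F_Y(w)^n$; the parameter values in this sketch are illustrative.

```python
import math

# For alpha = 1, N zero-truncated Poisson(theta), and Y ~ Exp(lam), the CDF
# G_N(F_Y(w)) should match the defining series sum_n P(N = n) F_Y(w)^n.
theta, lam, w = 2.0, 1.5, 0.8
F_Y = 1 - math.exp(-lam * w)

# Truncated series over the zero-truncated Poisson PMF.
series = sum(
    math.exp(-theta) * theta**n / math.factorial(n) / (1 - math.exp(-theta)) * F_Y**n
    for n in range(1, 60)
)
# Closed form via G_N(u) = e^{-theta} (e^{theta u} - 1) / (1 - e^{-theta}).
closed = math.exp(-theta) * (math.exp(theta * F_Y) - 1) / (1 - math.exp(-theta))
print(series, closed)
```

The two values agree to numerical precision.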
2.3. Stein’s method for the CCR class of distributions
To obtain a Stein operator for the CCR random variable
$W_{\alpha} = W_{\alpha}(N, \textbf{Y})$
in (2.8), we use the density method. First, we assume that the
$Y_i$
are continuous with differentiable PDF
$f_Y$
. From (2.12), the score function for the distribution of
$W_{{\alpha}}$
is
\begin{align*} \rho_{W_{\alpha}}(w) = \rho_Y(w) + \alpha f_Y(w) \frac{G_N''(U_Y^{\alpha}(w))}{G_N'(U_Y^{\alpha}(w))}, \qquad (2.14) \end{align*}
where
$\rho_Y = {f_Y'}/{f_Y}$
is the score function of Y. Hence
$\mathcal{T}_{W_{\alpha}}$
given by
\begin{align*} \mathcal{T}_{W_{\alpha}} g(w) = g'(w) + g(w) \biggl( \rho_Y(w) + \alpha f_Y(w) \frac{G_N''(U_Y^{\alpha}(w))}{G_N'(U_Y^{\alpha}(w))} \biggr) \qquad (2.15) \end{align*}
for differentiable g is a Stein operator acting on the functions
$g \in {\mathcal F}(\mathcal{T}_{W_{\alpha}})$
. For a test function
$h \in \mathcal{H}$
, the corresponding Stein equation is
\begin{align*} g'(w) + g(w) \biggl( \rho_Y(w) + \alpha f_Y(w) \frac{G_N''(U_Y^{\alpha}(w))}{G_N'(U_Y^{\alpha}(w))} \biggr) = h(w) - \mathbb{E} h(W_{\alpha}). \qquad (2.16) \end{align*}
Thus, for any random variable X, the distance
$d_{\mathcal{H}}$
from (2.1) between the distributions of X and
$W_{\alpha}$
can be bounded by bounding the expectation of the left-hand side of (2.16). For
$Y_i$
taking values in
$\mathbb{N}$
with PMF
$p_Y$
, the backward score function is
\begin{align*} \rho_{W_{\alpha}}(y) = \frac{\Delta^- p_{W_{\alpha}}(y)}{p_{W_{\alpha}}(y)}, \qquad (2.17) \end{align*}
with corresponding discrete backward score Stein operator
$\mathcal{T}_{W_{\alpha}}$
operating as
\begin{align*} \mathcal{T}_{W_{\alpha}} g(y) = \Delta^- g(y) + g(y-1) \frac{\Delta^- p_{W_{\alpha}}(y)}{p_{W_{\alpha}}(y)}. \qquad (2.18) \end{align*}
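The characterising property $\mathbb{E}[\mathcal{T}_{W_{\alpha}} g(W_{\alpha})] = 0$ can be checked by quadrature. The sketch below (our own illustration) does so for the maximum of a zero-truncated Poisson$(\theta)$ number of $\mathrm{Exp}(\lambda)$ lifetimes, for which $\rho_Y = -\lambda$ and $G_N''/G_N' = \theta$, with the illustrative choice $g(w) = 1 - {\mathrm{e}}^{-w}$, which vanishes at the lower boundary $w = 0$.

```python
import math

# Quadrature check of E[T_{W_alpha} g(W_alpha)] = 0 for the maximum of a
# zero-truncated Poisson(theta) number of Exp(lam) lifetimes.
theta, lam = 2.0, 1.0

def pdf(w):
    # f_W(w) = f_Y(w) * G_N'(F_Y(w)) with G_N the zero-truncated Poisson pgf.
    return lam * math.exp(-lam * w) * theta * math.exp(-theta * math.exp(-lam * w)) / (1 - math.exp(-theta))

def T_g(w):
    # Stein operator applied to the illustrative g(w) = 1 - exp(-w).
    g = 1 - math.exp(-w)
    g_prime = math.exp(-w)
    score = -lam + theta * lam * math.exp(-lam * w)  # rho_Y + alpha f_Y G''/G'
    return g_prime + g * score

# Midpoint rule for E[T g(W)] over (0, 40]; the tail beyond 40 is negligible.
n, upper = 200000, 40.0
dw = upper / n
integral = sum(T_g((k + 0.5) * dw) * pdf((k + 0.5) * dw) for k in range(n)) * dw
print(integral)  # approximately 0
```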
3. A general comparison approach
To illustrate the use of Stein’s method for CCR distributions, we compare the distributions of two maxima or two minima of a random number of i.i.d. random variables
$W_{{\alpha}}(N, \textbf{Y})$
and
$W_{{\alpha}}(M, \textbf{Z})$
.
Proposition 1. Let
$W_{\alpha_1}(N, \textbf{Y})$
and
$W_{\alpha_2}(M, \textbf{Z})$
for
$\alpha_1, \alpha_2 \in \{-1, 1\}$
be CCR random variables with PDFs
$f_Y$
and
$f_Z$
and score functions
$\rho_Y$
and
$\rho_Z$
. Then for any test function h such that the
$W_{\alpha_1}(N, \textbf{Y})$
Stein equation (2.16) for h has a solution
$g=g_h$
,
\begin{align*} \mathbb{E} h(W) - \mathbb{E} h(W_{\alpha_1}(N, \textbf{Y})) = \mathbb{E} \biggl[ \biggl( \rho_Y(W) - \rho_Z(W) + \alpha_1 f_Y(W) \frac{G_N''(U_Y^{\alpha_1}(W))}{G_N'(U_Y^{\alpha_1}(W))} - \alpha_2 f_Z(W) \frac{G_M''(U_Z^{\alpha_2}(W))}{G_M'(U_Z^{\alpha_2}(W))} \biggr) g(W) \biggr], \qquad (3.1) \end{align*}
where
$W = W_{\alpha_{2}}(M, \textbf{Z})$
and
$U_{\cdot}^{\alpha}$
is as in (2.9).
If the
$Y_i$
are discrete with PMF
$f_Y$
and the
$Z_i$
are discrete with PMF
$f_Z$
, then
\begin{align*} \mathbb{E} h(W) - \mathbb{E} h(W_{\alpha_1}(N, \textbf{Y})) = \mathbb{E} \biggl[ \biggl( \frac{\Delta^- p_{W_{\alpha_1}(N, \textbf{Y})}(W)}{p_{W_{\alpha_1}(N, \textbf{Y})}(W)} - \frac{\Delta^- p_{W_{\alpha_2}(M, \textbf{Z})}(W)}{p_{W_{\alpha_2}(M, \textbf{Z})}(W)} \biggr) g(W-1) \biggr]. \qquad (3.2) \end{align*}
Proof. Substitute the score functions (2.14) and (2.17) into (2.6); simplifying then gives (3.1) and (3.2) respectively.
To compare a discrete random variable and a continuous random variable, we use the concept of standardised Stein equations as in [Reference Germain and Swan14] and [Reference Ley, Reinert and Swan18]. For a continuous random variable W with score function
$\rho_W$
and a differentiable function
$c\,:\, \mathbb{R} \rightarrow \mathbb{R}$
, define a c-standardised Stein operator
${\mathcal{T}_{W}^{(c)}}$
by
\begin{align*} \mathcal{T}_W^{(c)} g(w) = \mathcal{T}_W (cg)(w) = (cg)'(w) + \rho_W(w) (cg)(w). \qquad (3.3) \end{align*}
For a random variable
$V \in \mathbb{N}$
with discrete backward score function
$\rho_V$
and a function
$d\,:\, \mathbb{N} \rightarrow \mathbb{R}$
, define a d-standardised Stein operator for V by
\begin{align*} \mathcal{T}_V^{(d)} g(v) = \Delta^- (dg)(v) + \rho_V(v) (dg)(v-1). \qquad (3.4) \end{align*}
For CCR random variables
$W_{\alpha}(N, {\textbf{Y}})$
and
$W_{\alpha}(M, {\textbf{Z}})$
, when the
$Y_i$
are continuous on
$\mathbb{R}^+_{{>0}} $
with differentiable PDF
$f_y$
and the
$Z_i$
take values in
$\mathbb{N}$
, we rescale
$W_{\alpha}(M, \textbf{Z})$
by dividing it by n, to obtain
$W_n = W_{\alpha, n} = \frac1n W_{\alpha}(M, \textbf{Z})$
. If
${W_{\alpha}(M, \textbf{Z})} \in \mathbb{N}$
has PMF p and backward score function
$\rho$
, then
$W_n$
has PMF
$ \mathbb{P} (W_n = z) = p (nz)$
and backward score function
$\tilde{\rho}_n(z) = \rho (nz)$
. We note here that the ratio
$\rho(nz) = \frac{ p(nz) - p(nz-1)}{p(nz)} $
is the score function of
${W_{\alpha}(M, \textbf{Z})}$
evaluated at nz, which for
$n \ne 1$
does not equal the score function of
$W_n$
. With
$ \Delta^{-n} f(x) \,:\!= f(x) - f(x-1/n)$
, we obtain the Stein operator
$ {\mathcal T}_{n}^{(d)}$
given by
\begin{align*} \mathcal{T}_n^{(d)} g(z) = \Delta^{-n} (dg)(z) + \tilde{\rho}_n(z) (dg)\Bigl( z - \frac1n \Bigr). \qquad (3.5) \end{align*}
Proposition 2. Let
${W_{\alpha}(M, \textbf{Z})}$
be a discrete CCR random variable with discrete backward score function
${\rho_W}$
; for
$n \in \mathbb{N}$
set
$W_n = {W_{\alpha}(M, \textbf{Z})}/n$
and
$\tilde{\rho}_n(z) = \rho_W(nz)$
. Let
${W = W_{\alpha}(N, \textbf{Y})}$
be a continuous CCR random variable with score function
$\rho$
. Let
$h \in \mathcal{H}$
be a test function such that the
${\mathcal{L}}({W})$
Stein equation (2.3) has solution
$g=g_h$
. Then for any differentiable function
$c\,:\, \mathbb{R}^+_{{>0}} \rightarrow \mathbb{R}$
and any function
$d\,:\, \mathbb{N} \rightarrow \mathbb{R}$
,
\begin{align*} \mathbb{E} h(W_n) - \mathbb{E} h(W) = \mathbb{E} \bigl[ \mathcal{T}_W^{(c)} g(W_n) - n\, \mathcal{T}_n^{(d)} g(W_n) \bigr]. \qquad (3.6) \end{align*}
Proof. To compare the two distributions, for a given test function h we have
\begin{align*} \mathbb{E} h(W_n) - \mathbb{E} h(W) = \mathbb{E} \bigl[ \mathcal{T}_W^{(c)} g(W_n) \bigr], \qquad (3.7) \end{align*}
with g solving the continuous Stein equation (3.3) for h. Next, we note that for the Stein operator given in (3.5),
$\mathbb{E} {\mathcal T}_{n}^{(d)} (g) ({W_n}) = 0$
by construction. Hence, also
$n \mathbb{E} {\mathcal T}_n^{(d)} (g) (W_n) = 0.$
Thus, (3.7) yields
\begin{align*} \mathbb{E} h(W_n) - \mathbb{E} h(W) = \mathbb{E} \bigl[ \mathcal{T}_W^{(c)} g(W_n) \bigr] - n\, \mathbb{E} \bigl[ \mathcal{T}_n^{(d)} g(W_n) \bigr]. \qquad (3.8) \end{align*}
Rearranging gives the assertion.
Adaptation to other deterministic scaling functions should be straightforward; as in our examples, we concentrate on the case where we scale by dividing by n. As an aside, while such standardisations and scalings could perhaps be used for a ‘bespoke derivative’ as in [Reference Germain and Swan14], the connection is not obvious.
An alternative comparison of CCR distributions can be achieved using a Lindeberg argument, to arrive at the following result.
Proposition 3. Let
$W_{\alpha}(M, \textbf{X})$
and
$W_{\alpha}(N, \textbf{E})$
be as in (2.8). Then for functions
$h \in \textrm{ Lip}_b(1)$
,
\begin{align*} | \mathbb{E} h(W_{\alpha}(M, \textbf{X})) - \mathbb{E} h(W_{\alpha}(N, \textbf{E})) | \le 2 \|h\| \, \mathbb{P}(M \ne N) + \mathbb{E}[N]\, \mathbb{E} |X_1 - E_1|. \qquad (3.9) \end{align*}
If M and N are identically distributed random variables, then
\begin{align*} d_{\rm BW} \bigl( \mathcal{L}(W_{\alpha}(M, \textbf{X})), \mathcal{L}(W_{\alpha}(N, \textbf{E})) \bigr) \le \mathbb{E}[N]\, \mathbb{E} |X_1 - E_1|. \end{align*}
Proof. We employ a Lindeberg argument, as follows. Defining
$X_1, X_2, \ldots$
and
$E_1, E_2, \ldots$
on the same probability space, we have
\begin{align*} \mathbb{E} h(W_{\alpha}(M, \textbf{X})) - \mathbb{E} h(W_{\alpha}(N, \textbf{E})) & = \mathbb{E} \bigl[ h(W_{\alpha}(M, \textbf{X})) - h(W_{\alpha}(N, \textbf{X})) \bigr] \qquad (3.10) \\ & \quad + \mathbb{E} \bigl[ h(W_{\alpha}(N, \textbf{X})) - h(W_{\alpha}(N, \textbf{E})) \bigr]. \qquad (3.11) \end{align*}
To bound (3.10), we simply note that
\begin{align*} \bigl| \mathbb{E} \bigl[ h(W_{\alpha}(M, \textbf{X})) - h(W_{\alpha}(N, \textbf{X})) \bigr] \bigr| \le 2 \|h\| \, \mathbb{P}(M \ne N). \qquad (3.12) \end{align*}
Now, to bound (3.11), if
$\alpha =1$
, then
\begin{align*} \bigl| \mathbb{E} \bigl[ h(W_{1}(N, \textbf{X})) - h(W_{1}(N, \textbf{E})) \bigr] \bigr| \le \mathbb{E} \Bigl| \max_{i \le N} X_i - \max_{i \le N} E_i \Bigr| \le \mathbb{E} \sum_{i=1}^{N} |X_i - E_i| = \mathbb{E}[N]\, \mathbb{E} |X_1 - E_1|. \qquad (3.13) \end{align*}
Similarly, the minimum case ($\alpha =-1$) follows since $|{\min}_{i \le N} X_i - {\min}_{i \le N} E_i| \le {\max}_{i \le N} |X_i - E_i| \le \sum_{i=1}^N |X_i - E_i|$
. Adding (3.12) and (3.13) and taking the supremum over all
$h \in \textrm{Lip}_b(1)$
gives the first assertion. The second assertion follows from coupling M and N so that
$M=N$
almost surely, in which case (3.12) vanishes.
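The path-wise inequality underlying (3.13) can be observed in simulation: coupling the $X_i$ and $E_i$ through common uniforms and taking $M = N$, the averaged difference of test-function values is dominated by the averaged coupling cost. The rates, the choice $h = \sin$ (bounded and 1-Lipschitz), and the sample size below are all illustrative.

```python
import math, random

# Monte Carlo illustration of the coupling behind the Lindeberg argument.
random.seed(7)
lam_x, lam_e, theta = 1.2, 1.0, 2.0
h = math.sin  # bounded and 1-Lipschitz

def zt_poisson(theta):
    # Zero-truncated Poisson sampler: inverse transform plus rejection of 0.
    while True:
        u = random.random()
        n, p_n, cdf = 0, math.exp(-theta), math.exp(-theta)
        while u > cdf:
            n += 1
            p_n *= theta / n
            cdf += p_n
        if n >= 1:
            return n

lhs_sum, rhs_sum, reps = 0.0, 0.0, 20000
for _ in range(reps):
    n = zt_poisson(theta)
    us = [random.random() for _ in range(n)]
    xs = [-math.log(1.0 - u) / lam_x for u in us]  # Exp(lam_x), inverse transform
    es = [-math.log(1.0 - u) / lam_e for u in us]  # Exp(lam_e), same uniforms
    lhs_sum += h(max(xs)) - h(max(es))
    rhs_sum += sum(abs(x - e) for x, e in zip(xs, es))

lhs, rhs = abs(lhs_sum / reps), rhs_sum / reps
print(lhs, rhs)  # lhs <= rhs holds path-wise, hence also for the averages
```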
Propositions 1, 2 and 3 complement each other; the first two yield bounds in
$d_{\rm TV}$
distance, for a general variable W, while the last result can be translated into a bound in
$d_{\rm BW}$
distance, for comparing CCR distributions; see Remark 4 for more details. We note that Proposition 1 can be used to compare a maximum and a minimum, whereas Propositions 2 and 3 require the same value of
$\alpha$
.
4. Application to the Poisson–exponential distribution
Cancho et al. [Reference Cancho, Louzada-Neto and Barriga8] introduced the PE distribution as a distribution of the maximum of N i.i.d. exponential random variables from an infinite sequence
${\textbf{E}}= (E_1, E_2, \ldots)$
such that
$E_i \sim \mathrm{Exp}(\lambda)$
with parameter
$\lambda$
(having mean
$1/{\lambda}$
) and N follows a zero-truncated Poisson distribution with parameter
$\theta$
, independently of
${\textbf{E}}$
. This maximum has the PE distribution with parameters
$\theta, \lambda >0$
, denoted by
$\mathrm{PE}(\theta, \lambda)$
, which has the differentiable PDF
\begin{align*} f(w) = \frac{\theta \lambda \, {\mathrm{e}}^{-\lambda w - \theta {\mathrm{e}}^{-\lambda w}}}{1 - {\mathrm{e}}^{-\theta}}, \qquad w > 0. \qquad (4.1) \end{align*}
To obtain a Stein operator for
$\mathrm{PE}(\theta, \lambda)$
, we use (2.14) with
$G_N({u}) = \frac{{\mathrm{e}}^{-\theta}}{1 - {\mathrm{e}}^{-\theta}} ( {\mathrm{e}}^{\theta{u}} - 1 )$
and
$\frac{G_N''({u})}{G_N'({u})} = \theta,$
yielding the score function
\begin{align*} \rho(w) = \lambda ( \theta {\mathrm{e}}^{-\lambda w} - 1 ). \qquad (4.2) \end{align*}
Equation (2.15) gives
\begin{align*} \mathcal{T} g(w) = g'(w) + \lambda ( \theta {\mathrm{e}}^{-\lambda w} - 1 ) g(w). \qquad (4.3) \end{align*}
For a bounded test function
$h\in \mathcal{H}$
, the score
$\mathrm{PE}(\theta, \lambda)$
Stein equation is
\begin{align*} f'(w) + \lambda ( \theta {\mathrm{e}}^{-\lambda w} - 1 ) f(w) = h(w) - \mathbb{E} h(W), \qquad W \sim \mathrm{PE}(\theta, \lambda), \qquad (4.4) \end{align*}
with the solution
$f_h$
, given by (2.5), satisfying the following bounds.
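Before turning to these bounds, a quick numerical sanity check (with illustrative parameter values) confirms that the PE density integrates to one and is consistent with the closed-form CDF $F(w) = ({\mathrm{e}}^{-\theta {\mathrm{e}}^{-\lambda w}} - {\mathrm{e}}^{-\theta})/(1 - {\mathrm{e}}^{-\theta})$.

```python
import math

# Sanity check on the PE(theta, lam) density
# f(w) = theta*lam*exp(-lam*w - theta*exp(-lam*w)) / (1 - exp(-theta)).
theta, lam = 3.0, 0.7

def pdf(w):
    return theta * lam * math.exp(-lam * w - theta * math.exp(-lam * w)) / (1 - math.exp(-theta))

def cdf(w):
    return (math.exp(-theta * math.exp(-lam * w)) - math.exp(-theta)) / (1 - math.exp(-theta))

def integrate(a, b, steps=200000):
    # Composite midpoint rule.
    dw = (b - a) / steps
    return sum(pdf(a + (k + 0.5) * dw) for k in range(steps)) * dw

total = integrate(0.0, 80.0)   # approximately 1
F_at_2 = integrate(0.0, 2.0)   # approximately cdf(2.0)
print(total, F_at_2, cdf(2.0))
```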
Lemma 1. Let
$ h\,:\, \mathbb{R}^+_{{>0}} \rightarrow \mathbb{R} $
be bounded and let f denote the solution (2.5) of the Stein equation (4.4) for h. Let
$\tilde{h}(w) = h(w)-\mathbb{E} h(W)$
for
$W \sim \mathrm{PE}(\theta, \lambda)$
. Then for all
$w>0$
,





If in addition
$h \in \textrm{Lip}_b(1)$
, then at all points w at which h′ exists,

Proof. We write p for the PDF of
$\mathrm{PE}(\theta, \lambda)$
. To prove (4.5) and (4.6), we bound

and (4.5) follows. From
$1-{\mathrm{e}}^{-y} < 1$
for all
$ y>0$
, we get (4.6).
Proof of (4.7). Case 1:
$ \theta {\mathrm{e}}^{-\lambda w} - 1 > 0$
. In this case
$0< w < \frac{\ln \theta} {\lambda}$
and we have

As
$p'(t) = \lambda (\theta {\mathrm{e}}^{-\lambda t}-1) p(t)\ge \lambda (\theta {\mathrm{e}}^{-\lambda w}-1) p(t)$
for
$ 0< t < w < \frac{\ln \theta }{\lambda}$
, it follows that

Hence we obtain the bound (4.7) for
$0< w < \frac{\ln\theta} {\lambda}$
.
Case 2:
$ \theta {\mathrm{e}}^{-\lambda w} - 1 \le 0$
. In this case
$w \ge \frac{\ln\theta}{\lambda}$
and

Using (2.5) gives

Hence the bound (4.7) follows for all
$w>0$
.
Proof of (4.8)–(4.10). As
$ \lambda(\theta {\mathrm{e}}^{-\lambda w} -1)f(w) = \lambda\theta {\mathrm{e}}^{-\lambda w} f(w) - \lambda f(w),$
the triangle inequality along with (4.7) and (4.6) gives (4.8). To show (4.9), using the triangle inequality, from (4.4) we obtain
$|f'({w})| \le |h({w})-\mathbb{E} h({W})|+|\lambda (\theta {\mathrm{e}}^{-\lambda {w}}-1)f({w})|$
, and using (4.7) yields the bound (4.9).
Now, for h differentiable at w, taking the first-order derivative in (4.4) gives
$ |f''(w)| \le |h'(w)| + |\lambda(\theta {\mathrm{e}}^{-\lambda w} -1)f'(w)| + |\theta \lambda^2 {\mathrm{e}}^{-\lambda w}f(w)|. $
Using (4.6) and (4.9), we obtain the bound (4.10) through

This completes the proof.
Remark 1. If
$\theta \rightarrow 0$
, the PE distribution converges to the exponential distribution
$\mathrm{Exp} (\lambda)$
; when
$\lambda = 1$
, (4.4) reduces to the Stein equation (4.2) in [Reference Peköz and Röllin27]. For this simplified version, the bound in [Reference Peköz and Röllin27] is only one-half of the bound (4.9); this discrepancy arises through our use of the triangle inequality for
$\theta > 0$
.
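The convergence noted in Remark 1 can also be seen numerically: for fixed $\lambda$, the supremum over a grid of the gap between the $\mathrm{PE}(\theta, \lambda)$ CDF and the $\mathrm{Exp}(\lambda)$ CDF shrinks as $\theta$ decreases. Grid and parameter values in this sketch are illustrative.

```python
import math

# As theta -> 0, the PE(theta, lam) CDF approaches the Exp(lam) CDF.
lam = 1.0

def pe_cdf(w, theta):
    return (math.exp(-theta * math.exp(-lam * w)) - math.exp(-theta)) / (1 - math.exp(-theta))

ws = [0.1 * k for k in range(1, 60)]
gaps = [
    max(abs(pe_cdf(w, theta) - (1 - math.exp(-lam * w))) for w in ws)
    for theta in (1.0, 0.1, 0.01)
]
print(gaps)  # decreasing towards 0
```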
In the following subsections, we compare a PE distribution with a distribution of a maximum of a random number of i.i.d. random variables, with a generalised Poisson–exponential distribution, and with a Poisson–geometric distribution.
4.1. Approximating the distribution of the maximum of a random number of i.i.d. random variables by a PE distribution
Let
$M\in \mathbb{N}$
be independent of
${\textbf{X}} = (X_1, X_2, \ldots)$
, a sequence of i.i.d. random variables, and let
$W=W_1(M, {\textbf{X}})$
have a PDF of the form (2.12) for
$\alpha = 1$
. Our first comparison result employs Stein’s method.
Corollary 1. Assume that the
$X_i$
have differentiable PDF
$p_X$
, CDF
$F_X$
, and score function
$\rho_X$
. Let
$W = W_1(M, {\textbf{X}}) = \max \{X_1, \ldots, X_M\}$
. Then

If M is a zero-truncated Poisson
$(\theta_M)$
random variable, then (4.11) reduces to

Proof. We employ Proposition 1. For a zero-truncated Poisson random variable N with parameter
$\theta$
such that
$G_N''(\cdot) / G_N'(\cdot) = \theta$
and for
$Y \sim \mathrm{Exp}(\lambda)$
with PDF
$f_Y(y) = \lambda {\mathrm{e}}^{-\lambda y}$
, using (3.1) along with (4.8) and taking h to be an indicator function so that
$\| \tilde{h} \| \le 2 \|h\| \le 2$
gives (4.11). The simplification when M is a zero-truncated Poisson random variable follows from
$G_M''(\cdot) / G_M'(\cdot) = \theta_M$
.
Remark 2. As
$-\lambda$
is the score function of the exponential distribution
$\mathrm{Exp}(\lambda)$
, for M being a zero-truncated
$\mathrm{Poisson}(\theta)$
random variable, the bound in Corollary 1 is close to zero if the density and the score function of X are close to those of
$\mathrm{Exp}(\lambda)$
.
The next result is based on the Lindeberg argument.
Corollary 2. Let
$W_1(N, {\textbf{E}}) \sim \mathrm{PE}(\theta, \lambda)$
and let
$W_{1}(M, {\textbf{X}}) = \max(X_1, \ldots, X_M)$
have a CCR distribution. Then

where
$H_n$
is the nth harmonic number.
If M is also a zero-truncated
$\mathrm{Poisson}(\theta)$
random variable, then

Proof. Proposition 3 gives

Since the random variables in
${\textbf{X}}$
are i.i.d., those in
${\textbf{E}}$
are i.i.d. exponential with parameter
$\lambda$
, and the expectation of the maximum of $n$ exponential random variables with parameter $\lambda$ is
$\frac{1}{\lambda} \sum_{i=1}^n \frac1i = {H_n}/{\lambda}$
, the assertion follows.
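The harmonic-number identity used here, $\mathbb{E} \max(E_1, \ldots, E_n) = H_n$ for i.i.d. standard exponentials, can be verified by integrating the survival function $1 - (1 - {\mathrm{e}}^{-x})^n$ of the maximum (the quadrature settings below are illustrative).

```python
import math

def expected_max(n, upper=50.0, steps=200000):
    # E max = int_0^infty P(max > x) dx, with P(max > x) = 1 - (1 - e^{-x})^n.
    dx = upper / steps
    return sum(1 - (1 - math.exp(-(k + 0.5) * dx)) ** n for k in range(steps)) * dx

# Compare the numerical integral with H_n = sum_{i=1}^n 1/i for a few n.
results = {n: (expected_max(n), sum(1 / i for i in range(1, n + 1))) for n in [1, 2, 5]}
print(results)
```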
4.2. Approximating the generalised Poisson–exponential distribution
The PE distribution has an increasing or constant failure rate. To model decreasing failure rate as well, Fatima & Roohi [Reference Fatima and Roohi12] introduced the family of generalised Poisson–exponential (GPE) distributions. The differentiable PDF of a GPE distribution with parameters
$\beta, \theta,\lambda > 0 $
, denoted by
$\mathrm{GPE}(\theta,\lambda,\beta)$
, is

For
$\beta = 1$
the density of the GPE distribution simplifies to that of the PE distribution given in (4.1). For
${0 < \beta < 1}$
the PDF of the GPE distribution is monotonically decreasing, while for
$\beta \ge 1$
it is unimodal positively skewed with skewness depending on the shape parameters
$\beta$
and
$\theta$
. The shape of the hazard function also depends on these two shape parameters. For example, for
$\theta = 1$
and
$\lambda =2$
, the failure rate is decreasing for
$ 0 < \beta <1$
and increasing for
$\beta \ge 1$
, as in Figure 2 of [Reference Fatima and Roohi12].
For a data set from [Reference Aarset1] which consists of 50 observations on the time to first failure of devices, in [Reference Fatima and Roohi12] it was shown that the GPE distribution provides a better fit than the PE and some other candidate distributions. However, a GPE distribution is not as easy to manipulate and interpret as a PE distribution. Therefore, a natural question is how to quantify the sacrifice when approximating a GPE distribution with a PE distribution. Here we note that GPE distributions are not from the CCR family, and hence Proposition 3 cannot be applied. Instead we use Stein’s method to bound the approximation error.
For such an approximation to be intuitive, the failure rate of the approximating distribution should be qualitatively similar. Hence we restrict attention to the case of
$\beta \ge 1$
, for which both the GPE and the PE distributions have an increasing failure rate. As an aside, we note that for
$\beta \ge 1$
, the limit of the PDF at 0 is 0, while for
$0 < \beta < 1$
it is undefined; the latter condition leads to a Stein class for the GPE Stein score operator which differs from the Stein class for the PE Stein score operator.
We note here that if
$X \sim \mathrm{GPE}(\theta, \lambda, \beta)$
, then for
$\beta \ge 1$
we have

The proof of (4.13) is given in the appendix.
The GPE random variable is not of the CCR form (2.8), but its score function, given in (4.14) below, can be derived from its PDF (4.12) with parameters
$\beta, \theta, \lambda > 0$
as

In the following theorem we bound the distance between a GPE with
$\beta \ge 1$
and a PE distribution, using their corresponding score Stein operators in (2.6).
Theorem 1. Let
$W \sim \mathrm{PE}(\theta_1, \lambda_1)$
and
$X \sim \mathrm{GPE}(\theta_2, \lambda_2, \beta)$
, let
$h\,:\, \mathbb{R}^+_{>0} \rightarrow \mathbb{R}$
be bounded, and let
$\tilde h(x) = h(x) - \mathbb{E} h(W)$
. Then for
$\lambda_1 \le \lambda_2$
and
$\beta \ge 1$
,

Proof. Let
$p_W$
and
$p_X$
denote the PDFs of W and X, and let
$\mathcal{T}_{X}$
denote a score Stein operator for a GPE distribution. To employ (2.6), we first check that
$f_W$
as in (2.5) for the PE distribution, for h bounded, is in the Stein class of
$\mathcal{T}_X$
. Invoking Lemma 1,
$f_W$
is bounded. Now
$ {\mathbb{E}} [{\mathcal T}_X\ f_W (X) ]= \int_0^\infty \frac{(f_W p_X)' (x)}{p_X(x)} p_X (x) \,\mathrm{d} x= \lim_{x \to \infty } (f_W p_X) (x) - \lim_{x \rightarrow 0} (f_W p_X) (x),$
and for
$\beta \ge 1$
we have that
$ {(f_Wp_{X})(x)} \rightarrow 0$
as
$x \rightarrow 0$
and as
$x \rightarrow \infty$
, showing that
$f_W$
is in the Stein class for
$\mathcal{T}_{X}$
. Applying (2.6) with the score functions from (4.2) and (4.14), we have

Next,

since
$|{\mathrm{e}}^{-x} - 1 | \le x$
for all
$x \ge 0$
. With
${\mathrm{e}}^{(\lambda_1 - \lambda_2) x} \le 1$
for
$\lambda_1 \le \lambda_2$
, we write

Now, using (4.5) as well as
${\mathrm{e}}^{x} - 1 \ge x$
and
$1-{\mathrm{e}}^{-x} \le x$
for
$x \ge 0$
, we have

Using (4.6), (4.8), (4.13) and (4.18) in (4.17) gives the bound (4.15).
Remark 3. For
$\lambda_1=\lambda_2$
and
$\theta_1=\theta_2$
in Theorem 1, the bound (4.15) can be improved by a factor of 2, using that with
$f_{W}$
being the solution of the PE score Stein equation,

With (4.5) and
$\tilde h(x) = h(x) - \mathbb{E} h(W)$
we obtain

This bound depends solely on the parameter
$\beta$
and tends to 0 as
$\beta\rightarrow 1$
, which is in line with the fact that for
$\beta \rightarrow 1$
,
$\mathrm{GPE}(\theta, \lambda, \beta)$
converges to
$\mathrm{PE}(\theta, \lambda)$
; see [Reference Fatima and Roohi12].
The next result follows immediately from Theorem 1 and (4.13).
Corollary 3. Let
$W_1 \sim \mathrm{PE}(\theta_1, \lambda_1)$
and
$W_2 \sim \mathrm{PE}(\theta_2, \lambda_2)$
with
$\lambda_1 \le \lambda_2$
. Let
$\mathcal{H} = \{h\,:\, \mathbb{R}^+_{>0} \rightarrow \mathbb{R}, \,\|h\| \le 1\}$
. Then for all
$h \in \mathcal{H}$
, letting
$\tilde h(w) = h(w) - \mathbb{E} h(W)$
, we have from (4.15) with
$\beta = 1$
that

Remark 4.
-
(i) For
$\|h\| \le 1$ such that
$\|\tilde{h}\| \le 2$ , the bounds can easily be converted into bounds in total variation distance using (2.2).
-
(ii) To bound
$d_{\mathcal{H}}(\mathrm{PE}(\theta_1,\lambda_1), \mathrm{GPE}(\theta_2,\lambda_2,\beta))$ when
$\lambda_1 > \lambda_2$ , we can use
\begin{align*} d_{\mathcal{H}} \bigl(\mathrm{GPE}(\theta_2,\lambda_2,\beta) , \mathrm{PE}(\theta_1,\lambda_1)\bigr) & \le d_{\mathcal{H}} \bigl(\mathrm{GPE}(\theta_2,\lambda_2,\beta) ,\mathrm{PE}(\theta_2,\lambda_2)\bigr) \\ &\quad + d_{\mathcal{H}}\bigl( \mathrm{PE}(\theta_2,\lambda_2), \mathrm{PE}(\theta_1,\lambda_1)\bigr)\end{align*}
4.3. Approximating the Poisson–geometric distribution
Next we consider the distribution of
$W_G = \max\{T_1, \ldots, T_M\}$
, where
$T_1, T_2, \ldots \in \{1, 2, \ldots\}$
are i.i.d. Geometric(p) random variables and M is an independent zero-truncated Poisson(
$\theta$
) random variable; the distribution of
$W_G$
is the Poisson–geometric (PG) distribution; it has PMF
\begin{align*} p_{W_G}(k) = \frac{{\mathrm{e}}^{-\theta (1-p)^{k}} - {\mathrm{e}}^{-\theta (1-p)^{k-1}}}{1 - {\mathrm{e}}^{-\theta}}, \qquad k \in \mathbb{N}, \end{align*}
and the discrete backward score function
\begin{align*} \rho_{W_G}(k) = \frac{\Delta^- p_{W_G}(k)}{p_{W_G}(k)} = 1 - \frac{{\mathrm{e}}^{-\theta (1-p)^{k-1}} - {\mathrm{e}}^{-\theta (1-p)^{k-2}}}{{\mathrm{e}}^{-\theta (1-p)^{k}} - {\mathrm{e}}^{-\theta (1-p)^{k-1}}}. \end{align*}
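These expressions follow from the CDF $\mathbb{P}(W_G \le k) = G_M(1 - (1-p)^k)$, with $G_M$ the zero-truncated Poisson probability generating function. The following sketch (illustrative parameters) confirms that differencing this CDF yields a PMF whose total mass telescopes to one.

```python
import math

# W_G = max{T_1, ..., T_M}: M zero-truncated Poisson(theta), T_i Geometric(p).
theta, p = 2.0, 0.3
q = 1 - p

def G_M(u):
    # Zero-truncated Poisson pgf.
    return math.exp(-theta) * (math.exp(theta * u) - 1) / (1 - math.exp(-theta))

def pg_pmf(k):
    # PMF by differencing the CDF G_M(1 - q^k).
    return G_M(1 - q ** k) - G_M(1 - q ** (k - 1))

total = sum(pg_pmf(k) for k in range(1, 400))
# Equivalent closed form (e^{-theta q^k} - e^{-theta q^{k-1}}) / (1 - e^{-theta}), at k = 5:
closed_form = (math.exp(-theta * q ** 5) - math.exp(-theta * q ** 4)) / (1 - math.exp(-theta))
print(total, pg_pmf(5), closed_form)
```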
For
$T \sim \mathrm{Geometric}(\lambda/n)$
, the distribution of
$n^{-1} {T}$
converges to
$\mathrm{Exp}(\lambda)$
in distribution, and hence it is plausible to approximate the distribution of
$Z_n={W_{G,n}}/n$
, for
${W_{G,n}} \sim \mathrm{PG}(\theta, \lambda/n)$
, by a corresponding PE distribution. With
$q_n = 1-\lambda/n $
and
$\tilde{\rho}_n (z) = \rho_{W_G} (nz)$
,
\begin{align*} \tilde{\rho}_n(z) = 1 - \frac{{\mathrm{e}}^{-\theta q_n^{nz-1}} - {\mathrm{e}}^{-\theta q_n^{nz-2}}}{{\mathrm{e}}^{-\theta q_n^{nz}} - {\mathrm{e}}^{-\theta q_n^{nz-1}}}. \end{align*}
This function is the ratio of two exponential functions, complicating the comparison using (2.6). To simplify the comparison, we use Proposition 2. From (3.4), a standardised PG Stein operator for
$Z_n$
is
\begin{align*} \mathcal{T}_n^{(d)} g(z) = \Delta^{-n} (dg)(z) + \tilde{\rho}_n(z) (dg)\Bigl( z - \frac1n \Bigr); \end{align*}
here we choose
\begin{align*} d(z) = {\mathrm{e}}^{-\theta q_n^{nz+1}} - {\mathrm{e}}^{-\theta q_n^{nz}}, \end{align*}
so that
$ \tilde{\rho}_n(z) d\left( z - \frac1n \right) = {\mathrm{e}}^{-q_n^{nz }\theta}- 2 {\mathrm{e}}^{-q_n^{nz-1}\theta} + {\mathrm{e}}^{-q_n^{nz -2}\theta}.$
As
$nd(z)\rightarrow \lambda \theta {\mathrm{e}}^{-\lambda z - \theta {\mathrm{e}}^{-\lambda z}}$
as
$n \rightarrow \infty$
, for the approximating PE distribution we choose
\begin{align*} c(w) = \frac{\lambda \theta}{n}\, {\mathrm{e}}^{-\lambda w - \theta {\mathrm{e}}^{-\lambda w}} \end{align*}
as a standardisation function in (3.3), giving rise to the standardised PE Stein equation
\begin{align*} (cg)'(w) + \lambda ( \theta {\mathrm{e}}^{-\lambda w} - 1 ) (cg)(w) = h(w) - \mathbb{E} h(W) \qquad (4.24) \end{align*}
for
$W \sim \mathrm{PE}(\theta, \lambda)$
. Again we can bound the solution of this Stein equation as follows.
Lemma 2. Let
$W \sim \mathrm{PE}(\theta, \lambda)$
, let g(w) denote the solution of the Stein equation (4.24) for a bounded differentiable test function h with
$\|h\| \le 1$
and
$\|h'\| \le 1$
, write
$\tilde{h} (x) = h(x) - \mathbb{E} h(W)$
, and let
$ c(w) = \frac{\lambda \theta}{n}\,{\mathrm{e}}^{-\lambda w - \theta {\mathrm{e}}^{-\lambda w}}$
. Then we have





and

Proof. In (4.24),
$cg = f$
is the solution of the Stein equation (4.4), so we use the bounds for f to bound g. The bound (4.6) in Lemma 1 immediately gives (4.25). Also
$c'(w) = c(w) \lambda (\theta {\mathrm{e}}^{-\lambda w} - 1) $
so that
$c'(w)g(w) = \lambda (\theta {\mathrm{e}}^{-\lambda w} - 1) f(w)$
; using (4.7) we get (4.26). Combining (4.25), (4.26), and the triangle inequality gives (4.27).
Since
$(cg)' = cg' + c'g$
, rearranging and using (4.9) and (4.7) with the triangle inequality gives (4.28). For
$(cg)'' - c''g - 2c'g' = cg''$
, using (4.28) we have

Since
$c''(w)g(w) = \lambda^2(\theta {\mathrm{e}}^{-\lambda w} -1)^2 c(w)g(w) - \lambda^2\theta {\mathrm{e}}^{-\lambda w}c(w)g(w)$
, using the triangle inequality, (4.26), and (4.25) yields

These two results along with (4.10) give (4.29). Now, for any
$0 < \rho < 1$
,

for
$\rho = -1$
we have
$\frac{c(x)}{ c\left( x - \frac{1}{n}\right)} = {\mathrm{e}}^{- \frac{\lambda}{n} + \theta {\mathrm{e}}^{-\lambda x} ({\mathrm{e}}^{ \frac{\lambda}{n}} -1) } \le {\mathrm{e}}^{ \theta ({\mathrm{e}}^{\frac{\lambda }{n}} -1)}$
, yielding (4.30).
Theorem 2. Let
$W_{G,n} \sim \mathrm{PG}(\theta, p_n) $
with
$p_n = \lambda/n$
where
$ 0< \lambda < n$
, and let
$W \sim \mathrm{PE}(\theta,\lambda)$
. Then for the scaled PG random variable
$Z_n = W_{G,n}/n$
and any bounded function h with bounded first derivative, we have

where

Remark 5.
-
(i) For fixed
$\lambda $ and
$\theta$ , the bound (4.31) is of
$O(n^{-1})$ . As
$n \rightarrow \infty$ , for
$\lambda = \lambda(n)$ and
$\theta = \theta(n)$ the bound decreases to 0 as long as
${\lambda(n) \theta(n)}/{n} \rightarrow 0$ and
${\lambda(n)}/{n} \rightarrow 0$ .
-
(ii) Equation (4.31) can be translated into a bound in the bounded Wasserstein distance using
$\mathrm{Lip}_b(1)$ as the class of test functions.
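The $O(n^{-1})$ rate in part (i) can be illustrated numerically. The sketch below compares the exact CDFs of $Z_n$ and of $\mathrm{PE}(\theta, \lambda)$ on a grid; both closed forms follow by conditioning on the zero-truncated Poisson count and are stated here as assumptions.

```python
import math

def pe_cdf(x, theta, lam):
    # PE(theta, lam) CDF: max of a zero-truncated Poisson(theta) number of Exp(lam)'s
    return (math.exp(-theta * math.exp(-lam*x)) - math.exp(-theta)) / (1.0 - math.exp(-theta))

def scaled_pg_cdf(x, n, theta, lam):
    # P(Z_n <= x) = P(W_{G,n} <= floor(n x)) for W_{G,n} ~ PG(theta, lam/n)
    q = 1.0 - lam / n
    k = math.floor(n * x)
    return (math.exp(-theta * q**k) - math.exp(-theta)) / (1.0 - math.exp(-theta))

def grid_distance(n, theta=2.0, lam=1.5):
    # maximum CDF discrepancy over a grid in (0, 10]
    grid = [0.05 * i for i in range(1, 201)]
    return max(abs(scaled_pg_cdf(x, n, theta, lam) - pe_cdf(x, theta, lam)) for x in grid)
```

The grid discrepancy shrinks as n grows, in line with the bound.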
Proof. We employ Proposition 2. First we note that
$ c(w) \rho(w) = c'(w)$
and that
$ {\tilde \rho}_n (w) d\bigl( w - \frac1n \bigr)= d \bigl( w - \frac1n \bigr) - d \bigl( w - \frac2n \bigr)$
. Thus, for (3.6), with
$h \in \textrm{Lip}_b(1)$
we have



To bound the term (4.32), for some
$0 < \rho < 1$
we write

Then (4.29) and (4.30) give the bound

To bound (4.33), we let
$\tau(z) = {\mathrm{e}}^{\lambda z + \theta {\mathrm{e}}^{-\lambda z}}$
, so that
$\tau^{-1} g = \frac{n}{\theta\lambda}cg$
can be bounded as in Lemma 2, and write

where
$l(Z_n) = \tau(Z_n)\bigl(d(Z_n) - c(Z_n) -\frac{2}{n} c'(Z_n)\bigr)$
. We show in the Appendix that

By Taylor expansion, for some
$0 < \epsilon < 1$
we have

Using (4.36) and (4.37) along with (4.28), (4.29) and (4.30) yields

Finally, to bound (4.34), we show in the Appendix that

Note that

and that, using (4.27) and (4.30),

Combining this with (4.39) and (4.40), we bound (4.34) as

To bound
$\mathbb{E} Z_n $
, we argue as for (4.13);
$W_{G,n} = \max \{ T_1, \ldots, T_M\} \le \sum_{i=1}^{M} T_i$
and so
$ \mathbb{E} {W_{G,n}} \le \mathbb{E} \sum_{i=1}^{M} T_i = \frac{n\theta}{\lambda(1-{\mathrm{e}}^{-\theta})}$
, giving that
$ \mathbb{E} Z_n = \frac{1}{n} \mathbb{E} W_{G,n}\le \frac{\theta}{\lambda(1-{\mathrm{e}}^{-\theta})}.$
Adding (4.35), (4.38), and (4.41) and simplifying gives (4.31).
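The bound $\mathbb{E} Z_n \le \theta/(\lambda(1-{\mathrm{e}}^{-\theta}))$ used in this step can be checked against the exact mean computed from the tail sum; the PG CDF used below is an assumption consistent with the construction in this section.

```python
import math

def pg_mean(theta, p, k_max=100000):
    # E[W_{G,n}] = sum_{k>=0} P(W_{G,n} > k), using the assumed CDF
    # P(W <= k) = (exp(-theta*q^k) - exp(-theta)) / (1 - exp(-theta))
    q = 1.0 - p
    norm = 1.0 - math.exp(-theta)
    total = 0.0
    for k in range(k_max):
        tail = 1.0 - (math.exp(-theta * q**k) - math.exp(-theta)) / norm
        total += tail
        if tail < 1e-15:  # geometric decay: truncate once the tail is negligible
            break
    return total
```

The exact mean of $Z_n$ indeed falls below the stated bound, as it must since the maximum is dominated by the sum.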
The next result instead uses Proposition 3 to bound the distance between a PG and a PE distribution.
Corollary 4. Let
$W = W_1(M,{\boldsymbol{E}}) \sim \mathrm{PE}(\theta, \lambda)$
and
${W_{G,n}} = W_1(M,{\boldsymbol{G}}) \sim \mathrm{PG}(\theta, \frac{\lambda}{n})$
with
$G_i \sim \mathrm{Geometric}(\lambda/n)$
be two CCR random variables, and let
$Z_n = \frac{{W_{G,n}}}{n}$
. Then, for all bounded Lipschitz functions
$h\,:\, \mathbb{R}^+_{>0} \rightarrow \mathbb{R}$
,

Proof. Using (3.9) in Proposition 3 gives

With the coupling
$\tilde{G} = \lceil n{E}\rceil \sim \mathrm{Geometric}(1-{\mathrm{e}}^{-\lambda/n})$
,
$\tilde{G}$
is stochastically greater than or equal to G and
$\mathbb{E} |n{E} - G| \le \mathbb{E} |n{E}-\tilde{G}| + \mathbb{E} (\tilde{G} - G)$
. Moreover,

and
$\mathbb{E}(\tilde{G} - G) = ({1-{\mathrm{e}}^{-\frac{\lambda}{n}}})^{-1} - {n}/{\lambda}.$
Hence the bound (4.42) follows.
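The distributional identity behind this coupling, $\lceil n E \rceil \sim \mathrm{Geometric}(1-{\mathrm{e}}^{-\lambda/n})$ for $E \sim \mathrm{Exp}(\lambda)$, can be verified directly:

```python
import math

def ceil_exp_pmf(k, n, lam):
    # P(ceil(n E) = k) = P((k-1)/n < E <= k/n) for E ~ Exp(lam)
    return math.exp(-lam * (k - 1) / n) - math.exp(-lam * k / n)

def geometric_pmf(k, p):
    # Geometric(p) on {1, 2, ...}
    return (1.0 - p)**(k - 1) * p
```

The two PMFs agree term by term, since ${\mathrm{e}}^{-\lambda(k-1)/n}(1-{\mathrm{e}}^{-\lambda/n}) = ({\mathrm{e}}^{-\lambda/n})^{k-1}(1-{\mathrm{e}}^{-\lambda/n})$.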
Remark 6. To compare (4.31) and (4.42), first note that as no fixed bound on
$\| h\| $
is assumed, through rescaling h we can make
$\|\tilde{h}\|$
as small as desired. Therefore, for a comparison we focus on the terms involving
$\|h'\|$
In this comparison, (4.31) outperforms the bound (4.42) for large n, since
${\lim}_{n \rightarrow \infty} {\mathrm{e}}^{\frac{\lambda}{n}} \bigl(1+\frac{{\mathrm{e}}^{ \frac{\theta\lambda}{n {\mathrm{e}}}}}{n}B_1(\theta, \lambda)\bigr) = 1$
and
${\lim}_{n \rightarrow \infty}\frac{\theta}{1 - {\mathrm{e}}^{-\theta}} 2\Bigl(\frac{n - {\mathrm{e}}^{\frac{\lambda}{n}}(n-\lambda)}{\lambda({\mathrm{e}}^{\frac{\lambda}{n}}-1)}\Bigr) = \frac{\theta}{(1-{\mathrm{e}}^{-\theta})}.$
In particular, for any
$n \ge n_0$
the right-hand side of (4.42) is larger than the coefficient of
$\|h'\|$
in the bound (4.31), where
$n_0 =n_0(\theta, \lambda)$
is the smallest n satisfying
$ {\mathrm{e}}^{\frac{\lambda}{n}} \le \frac{2n\theta}{\lambda(1 - {\mathrm{e}}^{-\theta})({\mathrm{e}}^{\frac{\lambda}{n}}-1)} \Bigl(\frac{n - {\mathrm{e}}^{\frac{\lambda}{n}}(n-\lambda)}{n+{\mathrm{e}}^{ \frac{\theta\lambda}{n {\mathrm{e}}}}B_1(\theta, \lambda)}\Bigr).$
Table 1 shows such values of
$n_0$
.
5. Application to the exponential–geometric distribution
The exponential–geometric (EG) distribution
$\mathrm{EG}(\lambda, p)$
introduced in [Reference Adamidis and Loukas3] is the distribution of the minimum of N i.i.d. random variables
$\textbf{E}= (E_1, E_2,\ldots)$
with
$E_i \sim \mathrm{Exp}(\lambda)$
for
$i \in \mathbb{N}$
, where N is a Geometric(p) random variable and independent of all the
$E_i$
. We set
$q=1-p$
. An EG random variable
$W = W_{-1}(N, {\textbf{E}})$
is thus a CCR random variable. As
$G_N(u) = \frac{up}{1-qu}$
,
$G'_N(u) = \frac{p}{(1-qu)^2}$
, and
$G''_N(u) = \frac{2pq}{(1-qu)^3}$
, the PDF of
$W \sim \text{EG}(\lambda, p)$
is
$p_W(w) = \frac{p\lambda\, {\mathrm{e}}^{-\lambda w}}{(1-q{\mathrm{e}}^{-\lambda w})^2}, \qquad w > 0,$
with score function
$\rho_W(w) = -\lambda\, \frac{1+q{\mathrm{e}}^{-\lambda w}}{1-q{\mathrm{e}}^{-\lambda w}}.$
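The EG density and its geometric-mixture representation can be cross-checked numerically; the closed forms below (density $p\lambda{\mathrm{e}}^{-\lambda w}(1-q{\mathrm{e}}^{-\lambda w})^{-2}$ and survival function $G_N({\mathrm{e}}^{-\lambda w})$) are stated as assumptions consistent with the construction above.

```python
import math

def eg_pdf(w, lam, p):
    # assumed EG(lam, p) density: p*lam*e^{-lam w} / (1 - q e^{-lam w})^2
    q = 1.0 - p
    return p * lam * math.exp(-lam*w) / (1.0 - q * math.exp(-lam*w))**2

def eg_survival(w, lam, p):
    # closed form P(W > w) = G_N(e^{-lam w}) with G_N(u) = p u / (1 - q u)
    q = 1.0 - p
    u = math.exp(-lam*w)
    return p * u / (1.0 - q * u)

def eg_survival_series(w, lam, p, m_max=5000):
    # direct mixture over N ~ Geometric(p): P(W > w) = sum_m p q^{m-1} e^{-lam w m}
    q = 1.0 - p
    return sum(p * q**(m - 1) * math.exp(-lam * w * m) for m in range(1, m_max + 1))
```

The closed-form survival function matches the mixture series, and its negative numerical derivative recovers the density.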
This gives the score Stein operator

and the Stein equation

The next lemma bounds the solution (2.5) of this EG Stein equation.
Lemma 3. For any bounded test function
$h\,:\, \mathbb{R}^+_{>0} \rightarrow \mathbb{R}$
, let
$\tilde{h}(w) = h(w)-\mathbb{E} h(W)$
for
$W \sim \mathrm{EG}(\lambda,p)$
. Then the solution
$g=g_h$
of the EG Stein equation (5.4) satisfies


Proof. For the solution (2.5) of the EG Stein equation (5.4),

The last inequality follows since with
$0 \le {\mathrm{e}}^{-\lambda w} \le 1$
and
$0 \le q \le 1$
, we have
$1 -q \le 1 - q{\mathrm{e}}^{-\lambda w} \le 1$
and
$1 \le 1 + q{\mathrm{e}}^{-\lambda w} \le 1+q$
. For a bound on
$| g'(w)| $
, we use the Stein equation (5.4) and the triangle inequality to obtain

Simplifying the last inequality gives (5.6).
5.1. The minimum of a geometric number of i.i.d. random variables
Let
$N\in \mathbb{N}$
be a random variable which is independent of
${\textbf{X}} = (X_1, X_2, \ldots)$
, a sequence of i.i.d. random variables, and let
$W=W_{-1}(N, {\textbf{X}})$
have PDF of the form (2.12) for
$\alpha = -1$
. In this subsection we approximate its distribution by an EG distribution. First we apply Proposition 1.
Corollary 5. Assume that the
$X_i$
have CDF
$F_X$
, differentiable PDF
$p_X$
, and score function
$\rho_X$
, and let
$N \sim \mathrm{Geometric}(p)$
. Then

Proof. Substituting
$p_X(w)\,\frac{G_N''(1-F_X(w))}{G_N'(1-F_X(w))} = \frac{2(1-p)p_X(w)}{1-(1-p)(1-F_X(w))}$
and the score function of the exponential distribution into (3.1), taking h an indicator function so that
$\| \tilde{h} \| \le 2 \|h\| \le 2$
, and using (5.5) gives the bound.
Next, we instead use Proposition 3; the corollary follows immediately from (3.9).
Corollary 6. For
$W = W_{-1}(N, {\boldsymbol{X}})$
with
$N \sim \mathrm{Geometric}(p)$
,

5.2. Approximating the extended exponential–geometric distribution
Motivated by population heterogeneity, Adamidis et al. [Reference Adamidis, Dimitrakopoulou and Loukas2] developed the extended exponential–geometric (EEG) distribution by assuming that individual units in a population have increasing failure rates that depend on a random scale parameter A. Their lifetimes
, conditionally on
$A = \alpha$
, are modelled by a modified extreme value distribution with PDF
$f(x|\alpha;\beta) = \alpha \beta {\mathrm{e}}^{\beta x + \alpha(1-{\mathrm{e}}^{\beta x})}$
, where
$x, \alpha, \beta \in \mathbb{R}^+_{>0}$
; it is assumed that A has an
$\mathrm{Exp}(\gamma)$
distribution. Then the unconditional lifetime X has PDF
$p_X(x) = \frac{\beta\gamma\, {\mathrm{e}}^{-\beta x}}{(1-(1-\gamma){\mathrm{e}}^{-\beta x})^2}, \qquad x > 0,$
with
$\beta, \gamma \in \mathbb{R}^+_{>0}$
; we use the notation
$X \sim \mathrm{EEG}(\beta, \gamma)$
. Its score function is
$\rho_X(x) = -\beta\, \frac{1+(1-\gamma){\mathrm{e}}^{-\beta x}}{1-(1-\gamma){\mathrm{e}}^{-\beta x}}.$
This distribution is not in the CCR family. However, the EG distribution
$\mathrm{EG}(\beta, \gamma)$
is a special case when
$\gamma \in (0,1)$
. To assess the total variation distance between the distributions, we use the general approach developed in Section 2.
Theorem 3. For
$X \sim \mathrm{EEG}(\beta, \gamma)$
and
$W \sim \mathrm{EG}(\lambda, p)$
, with
$\beta, \gamma, \lambda \in \mathbb{R}^+_{>0}$
and
$p \in (0,1)$
, and for a bounded test function h, we have

Proof. Using the score functions (5.2) and (5.8) in (2.6) yields

To bound the expectation in the above equation, we let

Case 1:
$0 < \gamma <1$
. In this case we decompose
$|R(x)|$
as



and bound these terms separately. For (5.11) we use that
$ \frac{1 + \alpha q}{1 - \alpha q} \le \frac{1+q}{1-q}$
when
$\alpha \in [0,1]$
and
$q \in (0,1).$
Hence

For (5.12), Taylor expansion about
$1-\gamma$
of the function
$f(q) = \frac{1+aq }{1-aq} $
with
$a = {\mathrm{e}}^{-\lambda x} \in (0,1)$
and
$q=1-p$
gives
$f'(q) = \frac{a}{1 - aq} + \frac{a(1+aq)}{(1-aq)^2} = \frac{2a}{(1-aq)^2}>0$
. Moreover, for
$0< a< 1$
we have
$ \frac{2a}{(1-aq)^2} \le \frac{2}{(1-q)^2}$
, and hence for
$\theta \in (0,1)$
we have
$0< f'(\theta (1-p) + (1- \theta) (1-\gamma)) \le \frac{2}{(1- \max(1-\gamma, 1-p))^2} = \frac{2}{(\min (\gamma, p))^2}$
. Therefore,

For (5.13), first-order Taylor expansion of the function
$f(\beta) = \frac{1+(1-\gamma) {\mathrm{e}}^{-\beta x}}{1-(1-\gamma) {\mathrm{e}}^{-\beta x}} $
gives

for some
$\theta \in (0,1).$
Now the function
$f'(\beta) = \frac{{2 x (1-\gamma)} {\mathrm{e}}^{-\beta x}}{(1 - (1-\gamma) {\mathrm{e}}^{-\beta x})^2}$
is positive; moreover,
$x {\mathrm{e}}^{-\beta x} \le ({\mathrm{e}} \beta)^{-1}$
for
$x \ge 0$
, so that
$f'(\beta) \le \frac{1}{\beta {\mathrm{e}}} \frac{2(1-\gamma)}{ (1 - (1-\gamma) {\mathrm{e}}^{-\beta x})^2} \le \frac{2(1-\gamma)}{{\mathrm{e}} \beta \gamma^2}.$
Hence we can bound

This gives the overall bound

Substituting this bound into (5.10) gives the final bound for the case where
$0 < \gamma <1$
.
Case 2: For
$\gamma \ge 1$
, we have

Substituting this result into (5.10) gives the bound for
$\gamma \ge 1$
in (5.9).
6. The maximum waiting time of sequence patterns in Bernoulli trials
This application is motivated by the results in Section 4 of [Reference Peköz26], which gives bounds on the distribution of the number of trials preceding the first appearance of a pattern in dependent Bernoulli trials. Here we are interested in the distribution of the maximum of such random variables.
Consider M independent parallel systems
$(X_1^{(i)},X_2^{(i)},\ldots)$
, for
$i=1,2,\ldots ,M,$
of possibly dependent Bernoulli(a) trials, which are jointly independent of
$M \in \mathbb{N} $
. For each sequence i, let
$I_j^{(i)}$
be the indicator function that a fixed non-overlapping binary sequence pattern of length k occurs starting at
$X_j^{(i)}$
; the pattern may be specific to sequence i. Let
$V_i =\min\{j\,:\, I_j^{(i)}=1\}$
denote the first occurrence of the pattern of interest in the ith system; assume that
$\mathbb{P}(V_i = 1) = p$
for all
$i \in \mathbb{N} $
. We denote the maximum waiting time for the occurrence of a corresponding sequence pattern in all M parallel systems by
$W = W_1(M,{\textbf{V}})$
where
${\textbf{V}} = (V_1, V_2, \ldots)$
.
Example 2. If the Bernoulli trials are independent and the pattern of interest is a run of ones of length k, starting with a zero and followed by k ones, then
$\mathbb{P}(V_i = 1) = p = (1-a)a^k$
, as given in Corollary 2 of [Reference Peköz26]. Intuitively, the waiting time for the occurrence of this pattern in an individual sequence is approximately geometric with parameter p. For an approximation by a PE distribution we are particularly interested in instances where we can write the probability
$\mathbb{P}(V_i = 1) = p$
as
$p = \lambda/n$
; here this leads to scaling the run length k as
$k \sim \log_{1/a} n$
.
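A minimal sketch checking Example 2: the exact law of V for the pattern $0\,1^k$ can be computed with a small pattern-matching automaton (a KMP-style construction introduced here only for illustration), recovering $\mathbb{P}(V=1) = (1-a)a^k$:

```python
def waiting_time_pmf(a, k, j_max):
    # exact law of V (first start position of the pattern 0 1^k in i.i.d.
    # Bernoulli(a) trials) via a KMP-style automaton; state s = matched prefix length
    L = k + 1                              # pattern length
    probs = [0.0] * (L + 1)
    probs[0] = 1.0
    pmf = {}
    for step in range(1, j_max + k + 1):
        new = [0.0] * (L + 1)
        for s in range(L):
            for c, pc in ((0, 1.0 - a), (1, a)):
                want = 0 if s == 0 else 1  # next pattern character
                if c == want:
                    new[s + 1] += probs[s] * pc
                else:                       # fall back: a '0' restarts a match, a '1' does not
                    new[1 if c == 0 else 0] += probs[s] * pc
        start = step - k                    # a match ending now started at this trial
        if start >= 1:
            pmf[start] = new[L]
        new[L] = 0.0                        # remove absorbed mass: first occurrence only
        probs = new
    return pmf
```

The resulting PMF puts mass $(1-a)a^k$ on $V = 1$ and sums to 1, confirming the formula for p.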
Corollary 7. In the above setting, let
$U_n = W/n$
with
$p =\frac{\lambda}{n}$
,
$W_1(M', {\textbf{E}}) \sim \mathrm{PE}(\theta, \lambda)$
with M′ being a zero-truncated Poisson random variable, and
$E_i \sim \mathrm{Exp}(\lambda)$
. Then we have

where
$B_1(\theta, \lambda)$
,
$B_2(\theta, \lambda)$
, and
$B_3(\theta, \lambda)$
are as given in Theorem 2.
Example 3. In the above example of independent Bernoulli trials and the pattern of interest being a run of ones of length
$k \sim \log_{1/a} n$
, the bound in Corollary 7 is of order
$(\log n)^{-1}.$
Proof. We couple
$U_n = \frac1n \max\{V_1, \ldots, V_M\}$
and
$Z_n = \frac1n \max\{T_1, \ldots, T_M\}$
, where
$T_i \sim \mathrm{Geometric}(\frac{\lambda}{n})$
for
$i = 1, \ldots, M$
, by using the same random variable M. Then

Taking a union bound, we have

where in the last step we have used Corollary 1 from [Reference Peköz26]. With (2.2),

Now


The term (6.4) is bounded in (4.31). To bound (6.3), (3.12) gives

The expectation of the maximum of m Geometric(p) variables satisfies

as given in [Reference Eisenberg11, p. 136]. Hence

Combining this result with (4.31) and (6.2) in (6.1), we obtain the assertion.
Remark 8. For
$M=M'$
, the bound in Corollary 7 reduces to

The assumption of i.i.d. sequences can be weakened to that of a Markov chain by applying Theorem 5.5 from [Reference Reinert, Schbath and Waterman30] with
$M=M'$
. This theorem gives a Poisson process approximation for the number of ‘declumped’ counts of each pattern, which in turn yields that the waiting time for each pattern to occur is approximately exponentially distributed. The theorem also gives an explicit bound on the approximation, but this result requires considerable notation and hence we do not pursue it here.
Acknowledgements
We thank Christina Goldschmidt, David Steinsaltz, and Tadas Temcinas for helpful discussions. We would also like to thank the editor and the anonymous reviewers for their suggestions, which have led to overall improvements of the paper.
Funding information
AF is supported by the Commonwealth Scholarship Commission, United Kingdom, and in part by EPSRC grant EP/X002195/1. GR is supported in part by EPSRC grants EP/T018445/1, EP/R018472/1, EP/X002195/1, and EP/Y028872/1. For the purpose of Open Access, the authors have applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission.
Competing interests
There were no competing interests to declare which arose during the preparation or publication process of this article.
Appendix. Further proofs
Proof of (4.13)
For
$X \sim \text{GPE}(\theta, \lambda, \beta)$
with
$\lambda,\theta >0$
and
$\beta \ge 1$
, we have
$(1-{\mathrm{e}}^{-\theta + \theta {\mathrm{e}}^{-\lambda x}})^{\beta-1} \le 1$
, and hence

Proof of the inequality (4.36)
First note that for
$n \in \mathbb{N}$
and
$|x| < n$
,

and
$x {\mathrm{e}}^{-x} \le {\mathrm{e}}^{-1}$
for
$x > 0$
. Hence, for
$0 < \lambda < n$
and
$z > 0$
,


Also, we can write
${\mathrm{e}}^{-\theta \left(1-\frac{\lambda}{n}\right)^{nz}}= {\mathrm{e}}^{\theta \left[ {\mathrm{e}}^{-\lambda z}- \left(1-\frac{\lambda}{n}\right)^{nz}\right]} {\mathrm{e}}^{-\theta {\mathrm{e}}^{-\lambda z}}$
, and so

Now, to bound
$l = \tau (d - c - \frac1n c')$
, we have

and

Thus,


Here

To bound (A.5), we use (A.3) and series expansion, recalling that
$0 < \lambda < n$
, to get


To bound (A.6), we first bound

Here we have used Property 4 from [Reference Salas31], that for all
$x>0$
we have
$ \left(1+\frac{x}{n}\right)^n - 1 \le x{\mathrm{e}}^{x} $
and
$ {\mathrm{e}}^x - 1 \le x {\mathrm{e}}^x$
, along with (A.2) and (A.4). Thus

Combining (A.7), (A.9), and (A.11), we get

Replacing z by
$Z_n$
gives (4.36).
Bounding (4.39)
For
$z>0$
,

where, with
$a = \theta \left(1-\frac{\lambda}{n} \right)^{nz} $
,

with
$R_1$
being the remainder term from the Taylor expansion for k(b) about 1,

for some
$1 < \xi \le b = \frac{n}{n-\lambda}$
. To bound
$R_1$
, we use that
$1<\xi \le \frac{n}{n-\lambda}$
so that
${\mathrm{e}}^{-a \xi} \le {\mathrm{e}}^{-a}$
and
${\mathrm{e}}^{-a \xi^2} \le {\mathrm{e}}^{-a};$
also
${\mathrm{e}}^{-\frac{a}{\xi}} \le {\mathrm{e}}^{-a \frac{n-\lambda}{n}}.$
Hence, with the crude bounds
$a \le \theta$
and
${\mathrm{e}}^{-a} \le 1,$

Substituting the expressions for a and b gives

where we have used
${\mathrm{e}}^{-\theta \left(1-\frac{\lambda}{n}\right)^{nz}} = {\mathrm{e}}^{\theta \left[ {\mathrm{e}}^{-\lambda z}- \left(1-\frac{\lambda}{n}\right)^{nz}\right]} {\mathrm{e}}^{-\theta {\mathrm{e}}^{-\lambda z}}.$
Hence, with (A.4),

Thus,

with

Using (A.10), we obtain the bound

Combining (A.12), (A.13), and (A.14) and simplifying then gives the bound

Multiplying the above inequality by
$n\left|g\left(z-\frac1n\right)\right|$
gives (4.39).