Algebraic Statistics techniques are used to define a new class of probability models which encode the notion of category distinguishability and refine the existing approaches. We study such models from both a geometric and a statistical point of view. In particular, we provide an effective characterisation of the sufficient statistic.
Introduction
In this work we focus on a problem coming from rater agreement studies. We consider two independent raters. They classify n subjects using the same ordinal scale with I categories. The data are organised in a square contingency table which summarises the classifications. The cell (i, j) contains the number of items classified i by the first observer and j by the second observer.
Many applications deal with ordinal scales whose categories are partly subjective. In most cases, the ordinal scale is the discretisation of an underlying quantity continuous in nature. Classical examples in the field of medical applications are the classification of a disease in different grades through the reading of diagnostic images, or the classification of the grade of a psychiatric disease based on the observation of some behavioural traits of the patients. An example of such a problem is presented in detail in (Garrett-Mayer et al. 2004), based on data about pancreatic neoplasia. Other relevant applications are, for instance, in lexical investigations, see e.g. (Bruce and Wiebe 1998) and (Bruce and Wiebe 1999). In these papers, category distinguishability is used as a tool to study when the definitions of the different meanings of a word in a dictionary can be considered unambiguous. Table 6.1 presents a numerical example from (Agresti 1988).
We consider Markov bases arising from regular fractional factorial designs with three-level factors. They are used in a Markov chain Monte Carlo procedure to estimate p-values for various conditional tests. For designed experiments with a single observation per run, we formulate a generalised linear model and consider a sample space consisting of the outcomes with the same value of the sufficient statistic for the parameters under the null model as the observed data. Each model is characterised by a covariate matrix, which is constructed from the main and interaction effects. We investigate fractional factorial designs with 3^{p−q} runs and underline a correspondence with models for 3^{p−q} contingency tables.
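As a small illustration of the objects in this abstract, the following Python sketch builds a regular 3^{3−1} fraction and its main-effects covariate matrix. The defining relation x3 = x1 + x2 (mod 3) and the linear/quadratic contrast coding are illustrative choices, not the chapter's specific construction.

```python
from itertools import product

def fraction_3_3_1():
    """Runs of a regular 3^{3-1} fraction: x3 = x1 + x2 (mod 3)."""
    return [(x1, x2, (x1 + x2) % 3) for x1, x2 in product(range(3), repeat=2)]

def covariate_matrix(design):
    """Main-effects model matrix, two contrast columns per 3-level factor."""
    lin = {0: -1, 1: 0, 2: 1}    # linear contrast
    quad = {0: 1, 1: -2, 2: 1}   # quadratic contrast
    rows = []
    for run in design:
        row = [1]                               # intercept
        for level in run:
            row += [lin[level], quad[level]]    # two d.f. per factor
        rows.append(row)
    return rows

D = fraction_3_3_1()
X = covariate_matrix(D)
print(len(D), len(X[0]))   # 9 runs, 7 columns (1 + 3 factors x 2 contrasts)
```

Because every level of every factor occurs equally often in the fraction, each contrast column sums to zero, which is what makes the main effects estimable by least squares.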
Introduction
In the past decade, a new application of computational algebraic techniques to statistics has developed rapidly. (Diaconis and Sturmfels 1998) introduced the notion of a Markov basis and presented a procedure for sampling from discrete conditional distributions by constructing a connected, aperiodic and reversible Markov chain on a given sample space. Since then, many works on Markov bases have been published by both algebraists and statisticians. Contributions of the present authors on Markov bases can be found in (Aoki and Takemura 2003, Aoki and Takemura 2005, Aoki and Takemura 2006, Aoki and Takemura 2008a, Aoki and Takemura 2008b, Aoki et al. 2008, Hara et al. 2009, Takemura and Aoki 2004) and (Takemura and Aoki 2005).
Goodness-of-fit tests based on chi-square approximations are commonly used in the analysis of contingency tables. Results from algebraic statistics combined with MCMC methods provide alternatives to the chi-square approximation. However, within a model selection procedure a large number of models is usually considered, and extensive simulations would be necessary. We show how the simulation effort can be reduced by an appropriate analysis of the Gröbner bases involved.
Introduction
Categorical data occur in many different areas of statistical applications. The analysis usually concentrates on the detection of the dependence structure between the involved random variables. Log-linear models are adopted to describe such association patterns, see (Bishop et al. 1995, Agresti 2002), and model selection methods are used to find the model from this class which fits the data best in a given sense. Often, goodness-of-fit tests for log-linear models are applied, which involve chi-square approximations for the distribution of the test statistic. If the table is sparse, such an approximation might fail. By combining methods from computational commutative algebra and from statistics, (Diaconis and Sturmfels 1998) provide the background for alternative tests. They use the MCMC approach to get a sample from a conditional distribution of a discrete exponential family with given sufficient statistic. In particular, Gröbner bases are used for the construction of the Markov chain. This approach has been applied to a number of tests for the analysis of contingency tables (Rapallo 2003, Rapallo 2005, Krampe and Kuhnt 2007). Such tests have turned out to be a valuable addition to traditional exact and asymptotic tests.
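For the simplest case, a two-way table under the independence model, the Markov basis consists of the familiar ±1 moves on 2×2 minors, and the conditional (hypergeometric) distribution given the margins can be targeted with a Metropolis step. A minimal Python sketch of such a chain, estimating an exact p-value for the chi-square statistic (function names and the step count are illustrative):

```python
import math
import random

def chisq(table, rows, cols):
    """Pearson chi-square statistic against independence."""
    n = sum(map(sum, table))
    r = [sum(row) for row in table]
    c = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i in range(rows):
        for j in range(cols):
            e = r[i] * c[j] / n
            stat += (table[i][j] - e) ** 2 / e
    return stat

def log_weight(table):
    # log of the unnormalised hypergeometric weight: -sum_ij log n_ij!
    return -sum(math.lgamma(x + 1) for row in table for x in row)

def exact_pvalue(table, steps=20000, seed=1):
    random.seed(seed)
    rows, cols = len(table), len(table[0])
    cur = [row[:] for row in table]
    obs = chisq(table, rows, cols)
    hits = total = 0
    for _ in range(steps):
        i, k = random.sample(range(rows), 2)
        j, l = random.sample(range(cols), 2)
        prop = [row[:] for row in cur]
        prop[i][j] += 1; prop[k][l] += 1   # basic move on a 2x2 minor:
        prop[i][l] -= 1; prop[k][j] -= 1   # margins stay fixed
        if min(prop[i][l], prop[k][j]) >= 0:
            # Metropolis acceptance for the hypergeometric target
            if math.log(random.random()) < log_weight(prop) - log_weight(cur):
                cur = prop
        total += 1
        if chisq(cur, rows, cols) >= obs - 1e-9:
            hits += 1
    return hits / total

table = [[11, 3], [2, 9]]
print(round(exact_pvalue(table), 3))
```

For more complex log-linear models the moves are no longer these simple swaps; that is where the Gröbner basis computations mentioned above come in.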
A basic application of algebraic statistics to the design and analysis of experiments considers a design as a zero-dimensional variety and identifies it with the ideal of the variety. Then, a subset of a standard basis of the design ideal is used as the support for identifiable regression models. Estimation of the model parameters is performed by standard least squares techniques. We consider this identifiability problem in the case where more than one measurement is taken at a design point.
Introduction
The application of algebraic geometry to design and analysis of experiments started with (Pistone and Wynn 1996). There a design D, giving settings for experiments, is seen as a finite set of distinct points in ℝ^k. This is interpreted as the zero set of a system of polynomial equations, which in turn is seen as a generating set of a polynomial ideal (see Chapter 1). The design D is uniquely identified with this ideal, called the design ideal and indicated with Ideal(D). Operations over designs find a correspondence in operations over ideals, e.g. union of designs corresponds to intersection of ideals; problems of confounding are formulated in algebraic terms, and computer algebra software is an aid in finding their solutions; and a large class of linear regression models identifiable by D is given by vector space bases of the quotient ring modulo Ideal(D), indicated as R/Ideal(D). This was the beginning of a successful stream of research which, together with the application of algebraic geometry to contingency table analysis covered in the first part of this volume, went under the heading of Algebraic Statistics (Pistone et al. 2001).
It might seem natural that where a statistical model can be defined in algebraic terms it would be useful to use the full power of modern algebra to help with the description of the model and the associated statistical analysis. Until the mid 1990s this had been carried out, but only in some specialised areas. Examples are the use of group theory in experimental design and group invariant testing, and the use of vector space theory and the algebra of quadratic forms in fixed and random effect linear models. The newer area which has been given the name ‘algebraic statistics’ is concerned with statistical models that can be described, in some way, via polynomials. Of course, polynomials were there from the beginning of the field of statistics in polynomial regression models and in multiplicative models derived from independence models for contingency tables, or to use a more modern terminology, models for categorical data. Indeed these two examples form the bedrock of the new field. (Diaconis and Sturmfels 1998) and (Pistone and Wynn 1996) are basic references.
Innovations have entered from the use of the apparatus of polynomial rings: algebraic varieties, ideals, elimination, quotient operations and so on. See Appendix 1.7 of this chapter for useful definitions. The growth of algebraic statistics has coincided with the rapid developments of fast symbolic algebra packages such as CoCoA, Singular, 4ti2 and Macaulay 2.
If the first theme of this volume, algebraic statistics, relies upon computational commutative algebra, the other one is pinned upon differential geometry.
We present algebraic methods for studying connectivity of Markov moves with margin positivity. The purpose is to develop Markov sampling methods for exact conditional inference in statistical models where a Markov basis is hard to compute. In some cases positive margins are shown to allow a set of Markov connecting moves that are much simpler than the full Markov basis.
Introduction
Advances in algebra have had a fundamental impact on the study of exponential families of probability distributions. In the 1990s, computational methods of commutative algebra were brought into statistics to solve both classical and new problems in the framework of exponential family models. In some cases, the computations are of an algebraic nature or could be made algebraic with some work, as in the cumulant methods of (Pistone and Wynn 1999). In other cases, the computations are ultimately Monte Carlo averages and the algebra plays a secondary role in designing algorithms. This is the nature of the work of (Diaconis and Sturmfels 1998). Commutative algebra is also used in statistics for experimental design (Pistone et al. 2001), where exponential families are not the focus.
(Diaconis and Sturmfels 1998) showed how computing a generating set for a toric ideal is fundamental to irreducibility of a Markov chain on a set of constrained tables. This theory gives a method for obtaining Markov chain moves, such as the genotype sampling method of (Guo and Thompson 1992), extensions to graphical models (Geiger et al. 2006) and beyond (Hosten and Sullivant 2004).
The aim of information geometry is to introduce a suitable geometrical structure on families of probability distributions or quantum states. For parametrised statistical models, such structure is based on two fundamental notions: the Fisher information and the exponential family with its dual mixed parametrisation, see for example (Amari 1985, Amari and Nagaoka 2000).
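For a finite-dimensional model these two ingredients can be recalled in one line; the following are standard textbook formulas, included here only as a reminder:

```latex
p(x;\theta) = \exp\Bigl(\textstyle\sum_{i=1}^{k} \theta_i T_i(x) - \psi(\theta)\Bigr),
\qquad
g_{ij}(\theta) = \partial_i \partial_j \psi(\theta) = \mathrm{Cov}_\theta(T_i, T_j),
```

where g is the Fisher information metric, and the dual (mixed) coordinates are the expectations η_i = E_θ[T_i] = ∂_i ψ(θ).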
For the non-parametric situation, the solution was given by Pistone and Sempi (Pistone and Sempi 1995, Pistone and Rogantin 1999), who introduced a Banach manifold structure on the set P of probability distributions equivalent to a given one. For each µ ∈ P, the authors considered the non-parametric exponential family at µ. As it turned out, this provides a C∞-atlas on P, with the exponential Orlicz spaces L^φ(µ) as the underlying Banach spaces, where φ is the Young function φ(x) = cosh(x) − 1.
The present contribution deals with the case of quantum states: we want to introduce a similar manifold structure on the set of faithful normal states of a von Neumann algebra M. Since there is no suitable definition of a non-commutative Orlicz space with respect to a state φ, it is not clear how to choose the Banach space for the manifold. Of course, there is a natural Banach space structure, inherited from the predual M∗. But, as was already pointed out in (Streater 2004), this structure is not suitable for defining the geometry of states: for example, any neighbourhood of a state φ contains states whose relative entropy with respect to φ is infinite.
Contingency tables represent the joint distribution of categorical variables. In this chapter we use modern algebraic geometry to update the geometric representation of 2 × 2 contingency tables first explored in (Fienberg 1968) and (Fienberg and Gilbert 1970). Then we use this geometry for a series of new ends, including various characterisations of the joint distribution in terms of combinations of margins, conditionals and odds ratios. We also consider incomplete characterisations of the joint distribution and the link to latent class models and to the phenomenon known as Simpson's paradox. Many of the ideas explored here generalise rather naturally to I × J and higher-way tables. We end with a brief discussion of generalisations and open problems.
Introduction
(Pearson 1956) in his presidential address to the Royal Statistical Society was one of the earliest statistical authors to write explicitly about the role of geometric thinking for the theory of statistics, although many authors previously, such as (Edgeworth 1914) and (Fisher 1921), had relied heuristically upon geometric characterisations.
For contingency tables, beginning with (Fienberg 1968) and (Fienberg and Gilbert 1970), several authors have exploited the geometric representation of contingency table models, in terms of quantities such as margins and odds ratios, both for the proof of statistical results and to gain deeper understanding of models used for contingency table representation. For example, see (Fienberg 1970) for the convergence of the iterative proportional fitting procedure, (Diaconis 1977) for the geometric representation of exchangeability, and (Kenett 1983) for uses in exploratory data analysis.
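A concrete instance of the characterisations by margins and odds ratios: the two one-dimensional margins together with the odds ratio determine a 2 × 2 probability table uniquely, via Plackett's closed-form root of a quadratic. A small Python sketch (function names are illustrative):

```python
import math

def summaries(p):
    """Margins and odds ratio of a 2x2 probability table."""
    (p11, p12), (p21, p22) = p
    r, c = p11 + p12, p11 + p21          # row-1 and column-1 margins
    theta = (p11 * p22) / (p12 * p21)    # odds ratio
    return r, c, theta

def table_from(r, c, theta):
    """Rebuild the 2x2 table from margins and odds ratio (Plackett's formula)."""
    if abs(theta - 1.0) < 1e-12:
        p11 = r * c                      # independence: theta = 1
    else:
        a = theta - 1.0
        b = 1.0 + a * (r + c)
        # root of (theta-1) p11^2 - b p11 + theta r c = 0 lying in [0, min(r, c)]
        p11 = (b - math.sqrt(b * b - 4.0 * a * theta * r * c)) / (2.0 * a)
    return [[p11, r - p11], [c - p11, 1.0 - r - c + p11]]

p = [[0.3, 0.2], [0.1, 0.4]]
r, c, theta = summaries(p)
q = table_from(r, c, theta)
print(all(abs(q[i][j] - p[i][j]) < 1e-9 for i in range(2) for j in range(2)))
```

Geometrically, fixing the margins traces out a segment in the probability simplex, and the odds ratio picks out a single point on it.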
Information Geometry and Algebraic Statistics are brought together in this volume to suggest that the interaction between them is possible and auspicious.
To meet this aim, we couple expository material with more advanced research topics sometimes within the same chapter, cross-reference the various chapters, and include many examples both in the printed volume and in the on-line supplement, held at the Cambridge University Press web site at www.cambridge.org/9780521896191. The on-line part includes proofs that are instructive but long or repetitive, computer codes and detailed development of special cases.
Chapter 1 gives a brief introduction to both Algebraic Statistics and Information Geometry based on the simplest possible examples and on selected topics that, to the editors, seem most promising for the interlacing between them. Then, the volume splits naturally into two lines. Part I, on contingency tables, and Part II, on designed experiments, are authored by researchers active mainly within Algebraic Statistics, while Part III includes chapters on both classical and quantum Information Geometry. This material comes together in Part IV, which consists of only one chapter by Giovanni Pistone, to whom the volume is dedicated, and provides examples of the interplay between Information Geometry and Algebraic Statistics.
Statistical models with latent structure have a history going back to the 1950s and have seen widespread use in the social sciences and, more recently, in computational biology and in machine learning. Here we study the basic latent class model proposed originally by the sociologist Paul F. Lazarsfeld for categorical variables, and we explain its geometric structure. We draw parallels between the statistical and geometric properties of latent class models and we illustrate geometrically the causes of many problems associated with maximum likelihood estimation and related statistical inference. In particular, we focus on issues of non-identifiability and determination of the model dimension, of maximisation of the likelihood function and on the effect of symmetric data. We illustrate these phenomena with a variety of synthetic and real-life tables, of different dimension and complexity. Much of the motivation for this work stems from the ‘100 Swiss Francs’ problem, which we introduce and describe in detail.
Introduction
Latent class (LC) or latent structure analysis models were introduced in the 1950s in the social science literature to model the distribution of dichotomous attributes based on a survey sample from a population of individuals organised into distinct homogeneous classes on the basis of an unobservable attitudinal feature. See (Anderson 1954, Gibson 1955, Madansky 1960) and, in particular, (Henry and Lazarsfeld 1968). These models were later generalised in (Goodman 1974, Haberman 1974, Clogg and Goodman 1984) as models for the joint marginal distribution of a set of manifest categorical variables, assumed to be conditionally independent given an unobservable or latent categorical variable, building upon the then recently developed literature on log-linear models for contingency tables.