This paper puts forward a theoretically derived measure of firm-level political influence defined over a sample of firms from a diverse set of countries, permitting new inferences into state-business relations. We derive this measure from original surveys of 27,613 firms in 41 countries, which include information on several interactions with political actors. Using a Bayesian item response theory measurement model that incorporates non-ignorable missing data, we estimate influence scores that incorporate survey data on diverse mechanisms by which firms attempt to obtain influence. From the measurement model, we learn that membership in a business association contains the most positive information about a firm’s influence, while bribes, state ownership, firm size, and a reliance on collective lobbying tend to be substitutes for influence in equilibrium. Empirically, we are able to show for the first time how such influence is distributed across different types of political regimes using a measurement model, leading to intriguing hypotheses about how the costs and benefits of political activity structure corporate influence-seeking.
This chapter discusses how to build probabilistic models that include both discrete and continuous variables. Mathematically, this is achieved by defining them as random variables within the same probability space. In practice, the variables are manipulated using their marginal and conditional distributions. We define the conditional pmf of a discrete random variable given a continuous variable, and the conditional probability density of a continuous random variable given a discrete variable. We use these objects to build mixture models and apply them to model height in a population. Next, we describe Gaussian discriminant analysis, a classification method based on mixture models with Gaussian conditional distributions, and apply it to diagnose Alzheimer's disease. Then, we explain how to perform clustering using Gaussian mixture models and leverage the approach to cluster NBA players. Finally, we introduce the framework of Bayesian statistics which enables us to explicitly encode our uncertainty about model parameters, and use it to analyze poll data from the 2020 United States presidential election.
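As a rough illustration of the Gaussian discriminant analysis idea mentioned in this abstract, the sketch below applies Bayes' rule to a one-dimensional mixture with Gaussian conditional densities. The class priors, means, and standard deviations are hypothetical values chosen for the example, not figures from the chapter, and the function names are our own.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a univariate Gaussian evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def class_posterior(x, priors, means, stds):
    """Posterior pmf of the discrete class given the continuous value x.

    Bayes' rule: p(c | x) is proportional to p(c) * f(x | c).
    """
    joint = [p * gaussian_pdf(x, m, s) for p, m, s in zip(priors, means, stds)]
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical two-class height model (values are assumptions, in cm).
priors = [0.5, 0.5]
means = [160.0, 175.0]
stds = [7.0, 7.0]

# Halfway between the two means, with equal priors and equal variances,
# neither class is favoured.
posterior = class_posterior(167.5, priors, means, stds)
print(posterior)
```

With equal priors and equal variances the decision boundary sits at the midpoint of the two means; unequal variances would make the boundary quadratic in x.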
Mass polarization is one of the defining features of politics in the twenty-first century, but efforts to understand its causes and effects are often hindered by empirical challenges related to measurement and data availability. To address these challenges and provide a common standard of analysis for researchers, this Element presents the Polarization in Comparative Attitudes Project (PolarCAP). PolarCAP clearly defines polarization as a property of group relations and uses a Bayesian measurement model to estimate smooth panels of ideological and affective polarization across ninety-two countries and forty-nine years. The author uses these data to provide a descriptive account of mass polarization across time and space. They further show how PolarCAP facilitates substantive inference by applying it to three sets of variables often hypothesized as causes or consequences of polarization: institutional design, economic crisis, and democracy. Open-source software makes PolarCAP easily accessible to scholars and practitioners.
The polyserial and point polyserial correlations are discussed as generalizations of the biserial and point biserial correlations. The relationship between the polyserial and point polyserial correlation is derived. The maximum likelihood estimator of the polyserial correlation is compared with a two-step estimator and with a computationally convenient ad hoc estimator. All three estimators perform reasonably well in a Monte Carlo simulation. Some practical applications of the polyserial correlation are described.
The maximum likelihood estimator dominates the estimation of general structural equation models. Noniterative, equation-by-equation estimators for factor analysis have received some attention, but little has been done on such estimators for latent variable equations. I propose an alternative 2SLS estimator of the parameters in LISREL-type models and contrast it with the existing ones. The new 2SLS estimator allows observed and latent variables to originate from nonnormal distributions, is consistent, has a known asymptotic covariance matrix, and is estimable with standard statistical software. Diagnostics for evaluating instrumental variables are described. An empirical example illustrates the estimator.
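For readers unfamiliar with two-stage least squares, the following is a minimal sketch of the generic technique with a single instrument on simulated data. It is not the latent variable estimator proposed in the abstract; the data-generating process, coefficient values, and function names are all assumptions made for illustration.

```python
import random

def covariance(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)

def simulate(n, seed=0):
    """Simulate an instrument z, a regressor x, and an outcome y.

    x is correlated with the equation error through the unobserved u,
    so OLS is biased; the true slope of y on x is 2.0 (an assumed value).
    """
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0.0, 1.0)
        ui = rng.gauss(0.0, 1.0)  # unobserved confounder
        xi = zi + ui + rng.gauss(0.0, 1.0)
        yi = 2.0 * xi + 3.0 * ui + rng.gauss(0.0, 1.0)
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y

def ols_slope(x, y):
    """Ordinary least squares slope of y on x."""
    return covariance(x, y) / covariance(x, x)

def two_stage_slope(z, x, y):
    """2SLS slope: regress x on z, then regress y on the fitted values."""
    first_stage = covariance(z, x) / covariance(z, z)
    mean_z = sum(z) / len(z)
    mean_x = sum(x) / len(x)
    x_hat = [mean_x + first_stage * (zi - mean_z) for zi in z]
    return covariance(x_hat, y) / covariance(x_hat, x_hat)

z, x, y = simulate(20000)
ols = ols_slope(x, y)
tsls = two_stage_slope(z, x, y)
print(ols, tsls)
```

In this setup the OLS slope converges to 3 rather than the true value of 2, while the two-stage estimate is consistent because z is unrelated to the confounder.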
Markov chains are probabilistic models for sequences of categorical events, with applications throughout scientific psychology. This paper provides a method for analyzing data consisting of event sequences and covariate observations. It is assumed that each sequence is a Markov process characterized by a distinct transition probability matrix. The objective is to use the covariate data to explain differences between individuals in the transition probability matrices characterizing their sequential data. The elements of the transition probability matrices are written as functions of a vector of latent variables, with variation in the latent variables explained through a multivariate regression on the covariates. The regression is estimated using the EM algorithm, and requires the numerical calculation of a multivariate integral. An example using simulated cognitive developmental data is presented, which shows that the estimation of individual variation in the parameters of a probability model may have substantial theoretical importance, even when individual differences are not the focus of the investigator's concerns.
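To make the notion of a transition probability matrix concrete, here is a small sketch that computes the maximum likelihood estimate for a single observed sequence by counting transitions. It covers only this basic building block, not the latent variable regression or EM estimation described in the abstract; the example sequence and the function name are hypothetical.

```python
from collections import Counter

def transition_matrix(sequence, states):
    """Estimate first-order transition probabilities from one sequence.

    The maximum likelihood estimate of P(a -> b) is the number of observed
    a -> b transitions divided by the number of transitions leaving a.
    """
    counts = Counter(zip(sequence, sequence[1:]))
    matrix = {}
    for a in states:
        row_total = sum(counts[(a, b)] for b in states)
        matrix[a] = {
            b: (counts[(a, b)] / row_total if row_total else 0.0)
            for b in states
        }
    return matrix

# Hypothetical sequence of categorical events for one individual.
sequence = list("AABABBBAAB")
estimate = transition_matrix(sequence, states=["A", "B"])
print(estimate)
```

A separate matrix of this kind per individual is what the covariates would then be asked to explain.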
In this paper, linear structural equation models with latent variables are considered. It is shown how many common models arise from incomplete observation of a relatively simple system. Subclasses of models with conditional independence interpretations are also discussed. Using an incomplete data point of view, the relationships between the incomplete and complete data likelihoods, assuming normality, are highlighted. For computing maximum likelihood estimates, the EM algorithm and alternatives are surveyed. For the alternative algorithms, simplified expressions for computing function values and derivatives are given. Likelihood ratio tests based on complete and incomplete data are related, and an example on using their relationship to improve the fit of a model is given.
Methodological development of the model-implied instrumental variable (MIIV) estimation framework has proved fruitful over the last three decades. Major milestones include Bollen’s (Psychometrika 61(1):109–121, 1996) original development of the MIIV estimator and its robustness properties for continuous endogenous variable SEMs, the extension of the MIIV estimator to ordered categorical endogenous variables (Bollen and Maydeu-Olivares in Psychometrika 72(3):309, 2007), and the introduction of a generalized method of moments estimator (Bollen et al., in Psychometrika 79(1):20–50, 2014). This paper furthers these developments by making several unique contributions not present in the prior literature: (1) we use matrix calculus to derive the analytic derivatives of the PIV estimator, (2) we extend the PIV estimator to apply to any mixture of binary, ordinal, and continuous variables, (3) we generalize the PIV model to include intercepts and means, (4) we devise a method to input known threshold values for ordinal observed variables, and (5) we enable a general parameterization that permits the estimation of means, variances, and covariances of the underlying variables to use as input into a SEM analysis with PIV. An empirical example illustrates a mixture of continuous variables and ordinal variables with fixed thresholds. We also include a simulation study to compare the performance of this novel estimator to WLSMV.
The paper proposes a composite likelihood estimation approach that uses bivariate instead of multivariate marginal probabilities for ordinal longitudinal responses using a latent variable model. The model considers time-dependent latent variables and item-specific random effects to be accountable for the interdependencies of the multivariate ordinal items. Time-dependent latent variables are linked with an autoregressive model. Simulation results have shown composite likelihood estimators to have a small amount of bias and mean square error and as such they are feasible alternatives to full maximum likelihood. Model selection criteria developed for composite likelihood estimation are used in the applications. Furthermore, lower-order residuals are used as measures-of-fit for the selected models.
We consider models which combine latent class measurement models for categorical latent variables with structural regression models for the relationships between the latent classes and observed explanatory and response variables. We propose a two-step method of estimating such models. In its first step, the measurement model is estimated alone, and in the second step the parameters of this measurement model are held fixed when the structural model is estimated. Simulation studies and applied examples suggest that the two-step method is an attractive alternative to existing one-step and three-step methods. We derive estimated standard errors for the two-step estimates of the structural model which account for the uncertainty from both steps of the estimation, and show how the method can be implemented in existing software for latent variable modelling.
Graphical models have received an increasing amount of attention in network psychometrics as a promising probabilistic approach to study the conditional relations among variables using graph theory. Despite recent advances, existing methods on graphical models usually assume a homogeneous population and focus on binary or continuous variables. However, ordinal variables are very popular in many areas of psychological science, and the population often consists of several different groups based on the heterogeneity in ordinal data. Driven by these needs, we introduce the finite mixture of ordinal graphical models to effectively study the heterogeneous conditional dependence relationships of ordinal data. We develop a penalized likelihood approach for model estimation, and design a generalized expectation-maximization (EM) algorithm to solve the significant computational challenges. We examine the performance of the proposed method and algorithm in simulation studies. Moreover, we demonstrate the potential usefulness of the proposed method in psychological science through a real application concerning the interests and attitudes related to fan avidity for students in a large public university in the United States.
Although the Bock–Aitkin likelihood-based estimation method for factor analysis of dichotomous item response data has important advantages over classical analysis of item tetrachoric correlations, a serious limitation of the method is its reliance on fixed-point Gauss-Hermite (G-H) quadrature in the solution of the likelihood equations and likelihood-ratio tests. When the number of latent dimensions is large, computational considerations require that the number of quadrature points per dimension be few. But with large numbers of items, the dispersion of the likelihood, given the response pattern, becomes so small that the likelihood cannot be accurately evaluated with the sparse fixed points in the latent space. In this paper, we demonstrate that substantial improvement in accuracy can be obtained by adapting the quadrature points to the location and dispersion of the likelihood surfaces corresponding to each distinct pattern in the data. In particular, we show that adaptive G-H quadrature, combined with mean and covariance adjustments at each iteration of an EM algorithm, produces an accurate fast-converging solution with as few as two points per dimension. Evaluations of this method with simulated data are shown to yield accurate recovery of the generating factor loadings for models of up to eight dimensions. Unlike an earlier application of adaptive Gibbs sampling to this problem by Meng and Schilling, the simulations also confirm the validity of the present method in calculating likelihood-ratio chi-square statistics for determining the number of factors required in the model. Finally, we apply the method to a sample of real data from a test of teacher qualifications.
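The contrast between fixed and adaptive Gauss-Hermite quadrature can be sketched with a toy example. The code below assumes a standard normal prior and a sharply peaked, Gaussian-shaped likelihood in one dimension, for which the marginal integral has a closed form. This is an illustrative stand-in, not the item response likelihood or the EM procedure of the paper; the values of m and s and all function names are assumptions. It uses the standard three-point rule with nodes 0 and ±sqrt(3/2).

```python
import math

SQRT_PI = math.sqrt(math.pi)
# Three-point Gauss-Hermite rule for integrals of exp(-x^2) * g(x).
NODES = [-math.sqrt(1.5), 0.0, math.sqrt(1.5)]
WEIGHTS = [SQRT_PI / 6.0, 2.0 * SQRT_PI / 3.0, SQRT_PI / 6.0]

def prior_pdf(theta):
    """Standard normal density for the latent variable."""
    return math.exp(-0.5 * theta * theta) / math.sqrt(2.0 * math.pi)

def likelihood(theta, m, s):
    """Toy Gaussian-shaped likelihood centred at m with dispersion s."""
    return math.exp(-((theta - m) ** 2) / (2.0 * s * s))

def fixed_gh(m, s):
    """Fixed-point rule: nodes are placed according to the prior only."""
    total = sum(w * likelihood(math.sqrt(2.0) * x, m, s)
                for x, w in zip(NODES, WEIGHTS))
    return total / SQRT_PI

def adaptive_gh(m, s):
    """Adaptive rule: nodes are recentred and rescaled to the posterior."""
    post_mean = m / (1.0 + s * s)
    post_sd = s / math.sqrt(1.0 + s * s)
    total = 0.0
    for x, w in zip(NODES, WEIGHTS):
        theta = post_mean + math.sqrt(2.0) * post_sd * x
        total += w * math.exp(x * x) * prior_pdf(theta) * likelihood(theta, m, s)
    return math.sqrt(2.0) * post_sd * total

def exact(m, s):
    """Closed-form value of the marginal integral for this toy model."""
    return s / math.sqrt(1.0 + s * s) * math.exp(-m * m / (2.0 * (1.0 + s * s)))

m, s = 1.5, 0.1  # assumed location and (small) dispersion of the likelihood
print(fixed_gh(m, s), adaptive_gh(m, s), exact(m, s))
```

Because the integrand here is exactly Gaussian, the adaptive rule reproduces the closed form up to rounding, whereas the fixed nodes largely miss the narrow peak. With real item response likelihoods the posterior is only approximately Gaussian, so the adaptive rule would be accurate rather than exact.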
This paper presents a new polychoric instrumental variable (PIV) estimator to use in structural equation models (SEMs) with categorical observed variables. The PIV estimator is a generalization of Bollen’s (Psychometrika 61:109–121, 1996) 2SLS/IV estimator for continuous variables to categorical endogenous variables. We derive the PIV estimator and its asymptotic standard errors for the regression coefficients in the latent variable and measurement models. We also provide an estimator of the variance and covariance parameters of the model, asymptotic standard errors for these, and test statistics of overall model fit. We examine this estimator via an empirical study and also via a small simulation study. Our results illustrate the greater robustness of the PIV estimator to structural misspecifications than the system-wide estimators that are commonly applied in SEMs.
A method is developed for finding the maximum likelihood estimates of the parameters in a multivariate normal model in which some of the component variables are observable only in polytomous form. The main stratagem used is a reparameterization which converts the corresponding log likelihood function to an easily handled one. The maximum likelihood estimates are found by a Fletcher-Powell algorithm, and their standard error estimates are obtained from the information matrix. When the dimension of the random vector observable only in polytomous form is large, obtaining the maximum likelihood estimates is computationally expensive. Therefore, a more efficient method, the partition maximum likelihood method, is proposed. These estimation methods are demonstrated with real and simulated data, and are compared by means of a simulation study.
The common maximum likelihood (ML) estimator for structural equation models (SEMs) has optimal asymptotic properties under ideal conditions (e.g., correct structure, no excess kurtosis, etc.) that are rarely met in practice. This paper proposes model-implied instrumental variable – generalized method of moments (MIIV-GMM) estimators for latent variable SEMs that are more robust than ML to violations of both the model structure and distributional assumptions. Under less demanding assumptions, the MIIV-GMM estimators are consistent, asymptotically unbiased, asymptotically normal, and have an asymptotic covariance matrix. They are “distribution-free,” robust to heteroscedasticity, and have overidentification goodness-of-fit J-tests with asymptotic chi-square distributions. In addition, MIIV-GMM estimators are “scalable” in that they can estimate and test the full model or any subset of equations, and hence allow better pinpointing of those parts of the model that fit and do not fit the data. An empirical example illustrates MIIV-GMM estimators. Two simulation studies explore their finite sample properties and find that they perform well across a range of sample sizes.
This paper presents a hierarchical Bayes circumplex model for ordinal ratings data. The circumplex model was proposed to represent the circular ordering of items in psychological testing by imposing inequalities on the correlations of the items. We provide a specification of the circumplex, propose identifying constraints and conjugate priors for the angular parameters, and accommodate theory-driven constraints in the form of inequalities. We investigate the performance of the proposed MCMC algorithm and apply the model to the analysis of value priorities data obtained from a representative sample of Dutch citizens.
Structural equation models with latent variables are sometimes estimated using an intuitive three-step approach, here denoted factor score regression. Consider a structural equation model composed of an explanatory latent variable and a response latent variable related by a structural parameter of scientific interest. In this simple example, estimation of the structural parameter proceeds as follows: First, common factor models are separately estimated for each latent variable. Second, factor scores are separately assigned to each latent variable, based on the estimates. Third, ordinary linear regression analysis is performed among the factor scores, producing an estimate for the structural parameter. We investigate the asymptotic and finite sample performance of different factor score regression methods for structural equation models with latent variables. It is demonstrated that the conventional approach to factor score regression performs very badly. Revised factor score regression, using regression factor scores for the explanatory latent variables and Bartlett scores for the response latent variables, produces consistent estimators for all parameters.
In survey research it is not uncommon to ask questions of the following type: “How many times did you undertake action a in reference period T of length τ?” The relationship is established between τ and the correlation of the number of reported actions with some background variable. To this end it is assumed that the process of actions satisfies a renewal model with individual heterogeneity. A model also has to be formulated for possible recall effects. Applications are given in the field of medical consumption.
In behavioral, biomedical, and psychological studies, structural equation models (SEMs) have been widely used for assessing relationships between latent variables. Regression-type structural models based on parametric functions are often used for such purposes. In many applications, however, parametric SEMs are not adequate to capture subtle patterns in the functions over the entire range of the predictor variable. A different but equally important limitation of traditional parametric SEMs is that they are not designed to handle mixed data types—continuous, count, ordered, and unordered categorical. This paper develops a generalized semiparametric SEM that is able to handle mixed data types and to simultaneously model different functional relationships among latent variables. A structural equation of the proposed SEM is formulated using a series of unspecified smooth functions. The Bayesian P-splines approach and Markov chain Monte Carlo methods are developed to estimate the smooth functions and the unknown parameters. Moreover, we examine the relative benefits of semiparametric modeling over parametric modeling using a Bayesian model-comparison statistic, called the complete deviance information criterion (DIC). The performance of the developed methodology is evaluated using a simulation study. To illustrate the method, we used a data set derived from the National Longitudinal Survey of Youth.
A unifying framework for generalized multilevel structural equation modeling is introduced. The models in the framework, called generalized linear latent and mixed models (GLLAMM), combine features of generalized linear mixed models (GLMM) and structural equation models (SEM) and consist of a response model and a structural model for the latent variables. The response model generalizes GLMMs to incorporate factor structures in addition to random intercepts and coefficients. As in GLMMs, the data can have an arbitrary number of levels and can be highly unbalanced with different numbers of lower-level units in the higher-level units and missing data. A wide range of response processes can be modeled including ordered and unordered categorical responses, counts, and responses of mixed types. The structural model is similar to the structural part of a SEM except that it may include latent and observed variables varying at different levels. For example, unit-level latent variables (factors or random coefficients) can be regressed on cluster-level latent variables. Special cases of this framework are explored and data from the British Social Attitudes Survey are used for illustration. Maximum likelihood estimation and empirical Bayes latent score prediction within the GLLAMM framework can be performed using adaptive quadrature in gllamm, a freely available program running in Stata.