A common way to handle non-linearity in complex time series data is to try splitting the data up into a number of simpler segments. Sometimes we have domain knowledge to support this piecewise modelling approach, for example in condition monitoring applications. In such problems, the evolution of some observed data is governed by a number of hidden factors that switch between different modes of operation. In real-world data, e.g. from medicine, robotic control or finance, we might be interested in factors which represent pathologies, mechanical failure modes, or economic conditions respectively. Given just the monitoring data, we are interested in recovering the state of the factors that gave rise to it.
A good model for this type of problem is the switching linear dynamical system (SLDS), which has been discussed in previous chapters. A latent ‘switch’ variable in this type of model selects between different linear-Gaussian state spaces. In this chapter we consider a generalisation, the factorial switching linear dynamical system (FSLDS), where instead of a single switch setting there are multiple discrete factors that collectively determine the dynamics. In practice there may be a very large number of possible factors, and we may only have explicit knowledge of commonly occurring ones.
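As a concrete (and deliberately simplified) illustration of the generative process just described, the sketch below simulates a toy FSLDS in which two independent binary factors jointly select among four scalar linear-Gaussian dynamics. All parameter values here are invented for illustration and are not taken from the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two binary factors; their joint setting selects the effective dynamics.
n_factors, T = 2, 100
trans = np.array([[0.95, 0.05], [0.05, 0.95]])  # per-factor transition matrix

# One scalar linear-Gaussian regime per joint switch setting (2**2 = 4).
A = [0.9 + 0.02 * k for k in range(4)]  # dynamics coefficient per regime
Q, R, C = 0.1, 0.05, 1.0                # process noise, obs. noise, emission

f = np.zeros(n_factors, dtype=int)      # current factor states
x, ys = 0.0, []
for t in range(T):
    for i in range(n_factors):          # each factor evolves independently
        f[i] = rng.choice(2, p=trans[f[i]])
    s = f[0] * 2 + f[1]                 # joint switch index in {0, ..., 3}
    x = A[s] * x + rng.normal(0, np.sqrt(Q))
    ys.append(C * x + rng.normal(0, np.sqrt(R)))

print(len(ys))  # 100 observations
```

Inference in the FSLDS then amounts to recovering the factor states f from the observations ys alone, which is what the monitoring application requires.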
We illustrate how the FSLDS can be used in the physiological monitoring of premature babies in intensive care. This application is a useful introduction because it has complex observed data, a diverse range of factors affecting the observations, and the challenge of many ‘unknown’ factors.
Time series are studied in a variety of disciplines and appear in many modern applications such as financial time series prediction, video-tracking, music analysis, control and genetic sequence analysis. This widespread interest at times obscures the commonalities in the developed models and techniques. A central aim of this book is to attempt to make modern time series techniques, specifically those based on probabilistic modelling, accessible to a broad range of researchers.
In order to achieve this goal, leading researchers who span the more traditional disciplines of statistics, control theory, engineering and signal processing, as well as the more recent areas of machine learning and pattern recognition, have been brought together to discuss advancements and developments in their respective fields. In addition, the book makes extensive use of the graphical models framework. This framework facilitates the representation of many classical models and provides insight into the computational complexity of their implementation. Furthermore, it makes it easy to envisage new models tailored to a particular environment. For example, the book discusses novel state space models and their application in signal processing, including condition monitoring and tracking. The book also describes modern developments in the machine learning community applied to more traditional areas of control theory.
The effective application of probabilistic models in the real world is gaining pace, largely through increased computational power which brings more general models into consideration through carefully developed implementations.
Sensor networks have recently generated a great deal of research interest within the computer and physical sciences, and their use for the scientific monitoring of remote and hostile environments is increasingly commonplace. While early sensor networks were a simple evolution of existing automated data loggers that collected data for later offline scientific analysis, more recent sensor networks typically make current data available through the Internet, and thus are increasingly being used for the real-time monitoring of environmental events such as floods or storms (see [10] for a review of such environmental sensor networks).
Using real-time sensor data in this manner presents many novel challenges. Most significantly for us, many of the information processing tasks that would previously have been performed offline by the owner or single user of an environmental sensor network (such as detecting faulty sensors, fusing noisy measurements from several sensors, and deciding how frequently readings should be taken) must now be performed in real time on the mobile computers and PDAs carried by the system's multiple users, who may have different goals and may be using sensor readings for very different tasks. Importantly, it may also be necessary to use the trends and correlations observed in previous data to predict the value of environmental parameters in the future, or to predict the reading of a sensor that is temporarily unavailable (e.g. due to network outages).
Optimising a sequence of actions to attain some future goal is the general topic of control theory [26, 9]. It views an agent as an automaton that seeks to maximise expected reward (or minimise cost) over some future time period. Two typical examples that illustrate this are motor control and foraging for food.
As an example of a motor control task, consider a human throwing a spear to kill an animal. This requires the execution of a motor program such that, at the moment the hand releases the spear, the spear has the correct speed and direction to hit the desired target. A motor program is a sequence of actions, and this sequence can be assigned a cost that generally consists of two terms: a path cost, which specifies the energy consumed by contracting the muscles to execute the motor program, and an end cost, which specifies whether the spear kills the animal, merely wounds it, or misses it altogether. The optimal control solution is a sequence of motor commands that kills the animal with minimal physical effort. If x denotes the state of the system (the positions and velocities of the muscles), the optimal control solution is a function u(x, t) that depends both on the actual state of the system at each time t and also explicitly on time.
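The structure of this cost, a path term plus an end term, can be illustrated with a scalar discrete-time analogue solved by dynamic programming (a standard LQR-style backward recursion, not the spear model itself). The dynamics, cost weights and target below are invented purely for illustration.

```python
import numpy as np

# Scalar discrete-time analogue: dynamics x_{t+1} = a x_t + b u_t,
# path cost r u_t^2 (effort), end cost q_T (x_T - target)^2.
a, b, r, q_T, target, T = 1.0, 1.0, 0.1, 10.0, 5.0, 20

# Backward Riccati recursion yields the time-dependent gain of u(x, t).
P = q_T
gains = []
for t in reversed(range(T)):
    K = (b * P * a) / (r + b * P * b)   # optimal feedback gain at time t
    P = r * K**2 + (a - b * K) ** 2 * P # cost-to-go update (Joseph form)
    gains.append(K)
gains.reverse()

# Forward pass in shifted coordinates z = x - target (valid since a = 1).
z = 0.0 - target
for t in range(T):
    u = -gains[t] * z                   # u(x, t): state- and time-dependent
    z = a * z + b * u
print(round(z + target, 2))             # final state very close to the target 5.0
```

Note that the gain K changes over time, which is exactly the explicit time dependence of u(x, t) mentioned above.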
Hidden Markov models (HMMs) are a rich family of probabilistic time series models with a long and successful history of applications in natural language processing, speech recognition, computer vision, bioinformatics, and many other areas of engineering, statistics and computer science. A defining property of HMMs is that the time series is modelled in terms of a number of discrete hidden states. Usually, the number of such states is specified in advance by the modeller, but this limits the flexibility of HMMs. Recently, attention has turned to Bayesian methods which can automatically infer the number of states in an HMM from data. A particularly elegant and flexible approach is to assume a countable but unbounded number of hidden states; this is the nonparametric Bayesian approach to hidden Markov models first introduced by Beal et al. [4] and called the infinite HMM (iHMM). In this chapter, we review the literature on Bayesian inference in HMMs, focusing on nonparametric Bayesian models. We show the equivalence between the Polya urn interpretation of the infinite HMM and the hierarchical Dirichlet process interpretation of the iHMM in Teh et al. [35]. We describe efficient inference algorithms, including the beam sampler which uses dynamic programming. Finally, we illustrate how to use the iHMM on a simple sequence labelling task and discuss several extensions.
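To give a flavour of the Polya urn view, the following is a deliberately simplified sketch in which each state keeps its own urn of past transitions: an existing destination is reused with probability proportional to its count, and a brand-new state is created with probability proportional to a concentration parameter. The full iHMM additionally shares destination states across urns through a hierarchical Dirichlet process, which this toy version omits.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha = 2.0          # concentration: propensity to visit a new state
T = 200

# counts[i][j] = number of transitions i -> j generated so far
counts = {}
states = [0]
n_states = 1
for t in range(1, T):
    prev = states[-1]
    row = counts.setdefault(prev, {})
    total = sum(row.values())
    # Reuse state j with prob. count(prev -> j) / (total + alpha);
    # create a new state with prob. alpha / (total + alpha).
    probs = np.array([row.get(j, 0) for j in range(n_states)] + [alpha], float)
    probs /= total + alpha
    nxt = rng.choice(n_states + 1, p=probs)
    if nxt == n_states:
        n_states += 1
    row[nxt] = row.get(nxt, 0) + 1
    states.append(nxt)

print(n_states)  # number of states is unbounded a priori, grows slowly with T
```

The "rich get richer" reuse of frequent transitions is what lets the data, rather than the modeller, determine the effective number of states.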
Gaussian processes (GPs) have a long history in statistical physics and mathematical probability. Two of the most well-studied stochastic processes, Brownian motion [12, 47] and the Ornstein–Uhlenbeck process [43], are instances of GPs. In the context of regression and statistical learning, GPs have been used extensively in applications that arise in geostatistics and experimental design [26, 45, 7, 40]. More recently, in the machine learning literature, GPs have been considered as general estimation tools for solving problems such as non-linear regression and classification [29]. In the context of machine learning, GPs offer a flexible nonparametric Bayesian framework for estimating latent functions from data and they share similarities with neural networks [23] and kernel methods [35].
In standard GP regression, where the likelihood is Gaussian, the posterior over the latent function (given data and hyperparameters) is itself a GP that can be obtained analytically. In all other cases, where the likelihood function is non-Gaussian, exact inference is intractable and approximate inference methods are needed. Deterministic approximation methods are currently widely used for inference in GP models [48, 16, 8, 29, 19, 34]. However, they are limited by the assumption that the likelihood function factorises. In addition, these methods usually do not treat the hyperparameters of the model (the parameters that appear in the likelihood and the kernel function) in a fully Bayesian way, providing only point estimates.
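The analytic posterior in the Gaussian-likelihood case follows from standard Gaussian conditioning, as in the sketch below. The squared-exponential kernel, toy data and noise level are illustrative choices, not specifics from the chapter.

```python
import numpy as np

def rbf(X1, X2, ell=1.0, sf2=1.0):
    """Squared-exponential kernel (an illustrative choice of covariance)."""
    d = X1[:, None] - X2[None, :]
    return sf2 * np.exp(-0.5 * (d / ell) ** 2)

# Noisy observations of a latent function (toy data).
rng = np.random.default_rng(0)
X = np.linspace(0, 5, 20)
y = np.sin(X) + 0.1 * rng.normal(size=20)
Xs = np.linspace(0, 5, 50)   # test inputs
sn2 = 0.01                   # Gaussian noise variance

# With a Gaussian likelihood the posterior is again a GP, obtained in
# closed form by conditioning the joint Gaussian on the observations.
K = rbf(X, X) + sn2 * np.eye(20)
L = np.linalg.cholesky(K)
a = np.linalg.solve(L.T, np.linalg.solve(L, y))
Ks = rbf(X, Xs)
mean = Ks.T @ a                                     # posterior mean
v = np.linalg.solve(L, Ks)
var = np.diag(rbf(Xs, Xs)) - np.sum(v**2, axis=0)   # posterior variance

print(mean.shape, var.shape)
```

For a non-Gaussian likelihood, no such closed-form conditioning exists, which is precisely where the approximate methods discussed above come in.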
Variational methods are a key component of the approximate inference and learning toolbox. These methods fill an important middle ground, retaining distributional information about uncertainty in latent variables, unlike maximum a posteriori methods, and yet generally requiring less computational time than Markov chain Monte Carlo methods. In particular the variational expectation maximisation (vEM) and variational Bayes algorithms, both involving variational optimisation of a free-energy, are widely used in time series modelling. Here, we investigate the success of vEM in simple probabilistic time series models. First we consider the inference step of vEM, and show that a consequence of the well-known compactness property of variational inference is a failure to propagate uncertainty in time, thus limiting the usefulness of the retained distributional information. In particular, the uncertainty may appear to be smallest precisely when the approximation is poorest. Second, we consider parameter learning and analytically reveal systematic biases in the parameters found by vEM. Surprisingly, simpler variational approximations (such as mean-field) can lead to less bias than more complicated structured approximations.
The variational approach
We begin this chapter with a brief theoretical review of the variational expectation maximisation algorithm, before illustrating the important concepts with a simple example in the next section. The vEM algorithm is an approximate version of the expectation maximisation (EM) algorithm [4]. Expectation maximisation is a standard approach to finding maximum likelihood (ML) parameters for latent variable models, including hidden Markov models and linear or non-linear state space models (SSMs) for time series.
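Before turning to the variational version, it may help to see the two steps of plain EM on the simplest possible latent variable model: a two-component Gaussian mixture with known unit variances, equal weights and unknown means (a toy example of our own, not one from the chapter).

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data from two Gaussians with means -2 and +2, unit variance.
x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(2, 1, 200)])

mu = np.array([-0.5, 0.5])   # crude initial estimates of the means
for _ in range(50):
    # E-step: posterior responsibility of each component for each point.
    logp = -0.5 * (x[:, None] - mu[None, :]) ** 2
    r = np.exp(logp - logp.max(axis=1, keepdims=True))
    r /= r.sum(axis=1, keepdims=True)
    # M-step: re-estimate each mean as a responsibility-weighted average.
    mu = (r * x[:, None]).sum(axis=0) / r.sum(axis=0)

print(np.round(np.sort(mu), 1))   # recovers means near -2 and +2
```

In vEM, the exact E-step posterior is replaced by a tractable variational distribution, and it is this substitution whose consequences for uncertainty propagation and parameter bias the chapter examines.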