This glossary gives brief definitions of all the key terms used in the book.
adjusted R²: a measure of how well a model fits the sample data that automatically penalises models with large numbers of parameters.
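As a minimal sketch of how the penalty works, assuming the common textbook form adjusted R² = 1 − (1 − R²)(T − 1)/(T − k), with T observations and k estimated parameters including the intercept. The function and argument names are illustrative only.

```python
def adjusted_r2(r2: float, n_obs: int, n_params: int) -> float:
    """Adjusted R-squared, assuming n_params counts every estimated
    coefficient (including the intercept) and n_obs > n_params."""
    if n_obs <= n_params:
        raise ValueError("need more observations than parameters")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_params)


# Same raw fit (R-squared = 0.50), but the larger model is penalised more.
small = adjusted_r2(0.50, n_obs=50, n_params=2)
large = adjusted_r2(0.50, n_obs=50, n_params=10)
```

With the same raw R², the model with ten parameters receives the lower adjusted value.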
Akaike information criterion (AIC): a metric that can be used to select the best-fitting model from a set of competing models and that incorporates a weak penalty term for including additional parameters.
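A short sketch of model selection by AIC, assuming one commonly used form, AIC = ln(RSS/T) + 2k/T, where RSS is the residual sum of squares, T the number of observations and k the number of parameters. Other equivalent scalings exist; the model names and RSS figures below are hypothetical.

```python
import math


def aic(rss: float, n_obs: int, n_params: int) -> float:
    """AIC in the form ln(rss / n_obs) + 2 * n_params / n_obs (lower is better)."""
    return math.log(rss / n_obs) + 2.0 * n_params / n_obs


# Hypothetical (RSS, number of parameters) pairs for three candidate models.
candidates = {"AR(1)": (120.0, 2), "AR(2)": (100.0, 3), "AR(3)": (99.5, 4)}
n_obs = 100

scores = {name: aic(rss, n_obs, k) for name, (rss, k) in candidates.items()}
best = min(scores, key=scores.get)  # model with the smallest criterion value
```

Here the third lag reduces the RSS only slightly, so the penalty term outweighs the gain in fit and the two-lag model is chosen.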
alternative hypothesis: a formal expression as part of a hypothesis testing framework that encompasses all of the remaining outcomes of interest aside from that incorporated into the null hypothesis.
arbitrage: a concept from finance that refers to the situation where profits can be made without taking any risk (and without using any wealth).
asymptotic: a property that applies as the sample size tends to infinity.
autocorrelation: a standardised measure, which must lie between −1 and +1, of the extent to which the current value of a series is related to its own previous values.
autocorrelation function: a set of estimated values showing the strength of association between a variable and its previous values as the lag length increases.
autocovariance: an unstandardised measure of the extent to which the current value of a series is related to its own previous values.
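The three entries above are closely linked: dividing each autocovariance by the variance (the lag-0 autocovariance) gives the autocorrelation, and the autocorrelations collected over successive lags form the autocorrelation function. A minimal sketch, assuming the usual sample estimator that divides by the full sample size (function names are illustrative):

```python
def autocovariance(series, lag):
    """Sample autocovariance at a given lag, dividing by the full sample size."""
    n = len(series)
    mean = sum(series) / n
    return sum((series[t] - mean) * (series[t - lag] - mean)
               for t in range(lag, n)) / n


def acf(series, max_lag):
    """Autocorrelations for lags 0..max_lag: autocovariance scaled by the variance."""
    gamma0 = autocovariance(series, 0)
    return [autocovariance(series, k) / gamma0 for k in range(max_lag + 1)]


y = [1.0, 2.0, 3.0, 4.0, 5.0]
rho = acf(y, 2)  # lag 0 is always 1 by construction
```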
autoregressive conditional heteroscedasticity (ARCH) model: a time series model for volatilities.
autoregressive (AR) model: a time series model where the current value of a series is fitted with its previous values.
autoregressive moving average (ARMA) model: a time series model where the current value of a series is fitted with its previous values (the autoregressive part) and the current and previous values of an error term (the moving average part).
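To make the two parts concrete, the sketch below builds an ARMA(1,1) series, y_t = mu + phi*y_{t-1} + theta*u_{t-1} + u_t, from a supplied list of shocks. The parameter values, the zero pre-sample shock and the function name are assumptions chosen for illustration; setting theta to zero gives a pure AR(1).

```python
def arma11_path(shocks, mu=0.0, phi=0.5, theta=0.3, y0=0.0):
    """Build y_t = mu + phi*y_{t-1} + theta*u_{t-1} + u_t from shocks u_t.
    The pre-sample shock is assumed to be zero."""
    path, prev_y, prev_u = [], y0, 0.0
    for u in shocks:
        y = mu + phi * prev_y + theta * prev_u + u
        path.append(y)
        prev_y, prev_u = y, u
    return path


# Response to a single unit shock at the first date.
path = arma11_path([1.0, 0.0, 0.0])
```

The unit shock feeds through the moving average term for one further period, after which only the autoregressive part carries it forward.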
autoregressive volatility (ARV) model: a time series model where the current volatility is fitted with its previous values.
auxiliary regression: a second stage regression that is usually not of direct interest in its own right, but rather is conducted in order to test the statistical adequacy of the original regression model.
balanced panel: a dataset where the variables have both time series and cross-sectional dimensions, and where there are equally long samples for each cross-sectional entity (i.e. no missing data).
Bayes information criterion: see Schwarz’s Bayesian information criterion (SBIC).
BDS test: a test for whether there are patterns in a series, predominantly used for determining whether there is evidence for nonlinearities.
BEKK model: a multivariate model for volatilities and covariances between series that ensures the variance–covariance matrix is positive definite.
BHHH algorithm: a technique that can be used for solving optimisation problems including maximum likelihood.
backshift operator: see lag operator.
Bera–Jarque test: a widely employed test for determining whether a series closely approximates a normal distribution.
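A minimal sketch of the statistic, assuming the standard form W = T[S²/6 + (K − 3)²/24], where S and K are the sample skewness and kurtosis coefficients. Under the null of normality the statistic is compared with a chi-squared distribution with 2 degrees of freedom. The function name is illustrative.

```python
def bera_jarque(series):
    """Bera-Jarque statistic from sample skewness and kurtosis
    (central moments computed with divisor n)."""
    n = len(series)
    mean = sum(series) / n
    m2 = sum((x - mean) ** 2 for x in series) / n
    m3 = sum((x - mean) ** 3 for x in series) / n
    m4 = sum((x - mean) ** 4 for x in series) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return n * (skewness ** 2 / 6.0 + (kurtosis - 3.0) ** 2 / 24.0)


stat = bera_jarque([-2.0, -1.0, 0.0, 1.0, 2.0])
```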
best linear unbiased estimator (BLUE): an estimator that is unbiased and that has the lowest sampling variance among all linear unbiased estimators.
between estimator: an estimator used in the context of a fixed effects panel model, which involves running a cross-sectional regression on the time-averaged values of all the variables in order to reduce the number of parameters requiring estimation.
biased estimator: an estimator whose expected value is not equal to the true value of the parameter being estimated.
bid–ask spread: the difference between the amount paid for an asset (the ask or offer price) when it is purchased and the amount received if it is sold (the bid).
binary choice: a discrete choice situation with only two possible outcomes.
bivariate regression: a regression model where there are only two variables – the dependent variable and a single independent variable.
bootstrapping: a technique for constructing standard errors and conducting hypothesis tests that requires no distributional assumptions and works by resampling from the data.
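A minimal sketch of the resampling idea, assuming independent observations and simple resampling with replacement, applied here to the standard error of a sample mean. The data, the number of resamples and the function name are hypothetical.

```python
import random
import statistics


def bootstrap_se_of_mean(data, n_resamples=2000, seed=0):
    """Standard error of the mean, estimated by resampling with replacement."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(data)
    means = []
    for _ in range(n_resamples):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        means.append(sum(resample) / n)
    # The spread of the resampled means estimates the sampling variability.
    return statistics.stdev(means)


returns = [0.012, -0.004, 0.007, 0.021, -0.015, 0.003, 0.009, -0.008, 0.014, 0.001]
se = bootstrap_se_of_mean(returns)
```

For serially dependent data, resampling individual observations would destroy the dependence structure, so a different scheme (for example, resampling blocks or residuals) would be needed.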
Box–Jenkins approach: a methodology for estimating ARMA models.
Box–Pierce Q-statistic: a general measure of the extent to which a series is autocorrelated.
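A minimal sketch, assuming the standard form Q = T × (sum of the squared sample autocorrelations at lags 1 to m). Under the null of no autocorrelation, Q is compared with a chi-squared distribution with m degrees of freedom. The function name is illustrative.

```python
def box_pierce_q(series, max_lag):
    """Box-Pierce Q: sample size times the sum of squared autocorrelations."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    total = 0.0
    for k in range(1, max_lag + 1):
        num = sum((series[t] - mean) * (series[t - k] - mean)
                  for t in range(k, n))
        total += (num / denom) ** 2  # squared autocorrelation at lag k
    return n * total


q = box_pierce_q([1.0, 2.0, 3.0, 4.0, 5.0], max_lag=2)
```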
break date: the date at which a structural change occurs in a time series or in a model’s parameters.
Breusch–Godfrey test: a test for autocorrelation of any order in the residuals from an estimated regression model, based on an auxiliary regression of the residuals on the original explanatory variables plus lags of the residuals.
broken trend: a process which is a deterministic trend with a structural break.
calendar effects: the systematic tendency for a series, especially stock returns, to be higher at certain times than others.
capital asset pricing model (CAPM): a financial model for determining the expected return on stocks as a function of their level of market risk.
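In its usual form the CAPM can be written as E(R_i) = R_f + beta_i × (E(R_m) − R_f). A small sketch with hypothetical inputs (the function name and the numbers are illustrative):

```python
def capm_expected_return(risk_free, beta, expected_market_return):
    """Expected return = risk-free rate + beta * market risk premium."""
    return risk_free + beta * (expected_market_return - risk_free)


# Hypothetical inputs: 2% risk-free rate, 8% expected market return, beta of 1.5.
expected = capm_expected_return(0.02, 1.5, 0.08)
```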
capital market line (CML): a straight line showing the risks and returns of all combinations of a risk-free asset and an optimal portfolio of risky assets.
Carhart model: a time series model for explaining the performance of mutual funds or trading rules based on four factors: excess market returns, size, value and momentum.
causality tests: a way to examine whether one series leads or lags another.
censored dependent variable: where values of the dependent variable above or below a certain threshold cannot be observed, while the corresponding values for the independent variables are still available.
central limit theorem: the result that the distribution of the mean of a sample of data drawn from any distribution (with finite variance) converges upon a normal distribution as the sample size tends to infinity.
chaos theory: an idea taken from the physical sciences whereby although a series may appear completely random to the naked eye or to many statistical tests, in fact there is an entirely deterministic set of non-linear equations driving its behaviour.
Chow test: an approach to determine whether a regression model contains a change in behaviour (structural break) part-way through based on splitting the sample into two parts, assuming that the break-date is known.
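A minimal sketch of the test statistic, assuming the usual F-form: the pooled (restricted) residual sum of squares is compared with the sum of the RSS values from the two sub-samples, with k parameters and T observations. The statistic is compared with an F(k, T − 2k) distribution. The function name and the RSS values are hypothetical.

```python
def chow_f_stat(rss_pooled, rss_1, rss_2, n_obs, n_params):
    """Chow F-statistic from the pooled RSS and the two sub-sample RSS values."""
    rss_unrestricted = rss_1 + rss_2
    numerator = (rss_pooled - rss_unrestricted) / n_params
    denominator = rss_unrestricted / (n_obs - 2 * n_params)
    return numerator / denominator


f_stat = chow_f_stat(rss_pooled=0.5, rss_1=0.2, rss_2=0.25, n_obs=100, n_params=2)
```

A large value indicates that allowing the parameters to differ across the two sub-samples improves the fit by more than chance would suggest.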
Cochrane–Orcutt procedure: an iterative approach that corrects standard errors for a specific form of autocorrelation.
coefficient of multiple determination: see R2.
cointegration: a concept whereby time series have a fixed relationship in the long run.
cointegrating vector: the set of parameters that describes the long-run relationship between two or more time series.
common factor restrictions: the conditions on the parameter estimates that are implicitly assumed when an iterative procedure such as Cochrane–Orcutt is employed to correct for autocorrelation.
conditional expectation: the value of a random variable that is expected for time t + s (s = 1, 2, . . .) given information available until time t.
conditional mean: the mean of a series at a point in time t fitted given all information available until the previous point in time t − 1.
conditional variance: the variance of a series at a point in time t fitted given all information available until the previous point in time t − 1.
confidence interval: a range of values within which we are confident to a given degree (e.g. 95% confident) that the true value of a given parameter lies.
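A minimal sketch, assuming the familiar construction estimate ± critical value × standard error. The default critical value of 1.96 is the two-sided 5% point of the standard normal distribution; in small samples a t-distribution critical value would be used instead. The names and numbers are illustrative.

```python
def confidence_interval(estimate, std_error, critical_value=1.96):
    """Symmetric interval: estimate plus or minus critical_value * std_error."""
    half_width = critical_value * std_error
    return estimate - half_width, estimate + half_width


# Hypothetical coefficient estimate of 0.50 with a standard error of 0.10.
lower, upper = confidence_interval(0.50, 0.10)
```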
confidence level: one minus the significance level (expressed as a proportion rather than a percentage) for a hypothesis test.
consistency: the desirable property of an estimator whereby the calculated value of a parameter converges upon the true value as the sample size increases.
contemporaneous terms: those variables that are measured at the same time as the dependent variable – i.e. both are at time t.
continuous variable: a random variable that can take on any value (possibly within a given range).
convergence criterion: a pre-specified rule that tells an optimiser when to stop looking further for a solution and to stick with the best one it has already found.
copulas: a flexible way to link together the distributions for individual series in order to form joint distributions.
correlation: a standardised measure, bounded between −1 and +1, of the strength of association between two variables.
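A minimal sketch of the calculation, assuming the usual Pearson coefficient: the covariance divided by the product of the two standard deviations, which is what confines the result to the range −1 to +1. The function name and data are illustrative.

```python
import math


def correlation(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


r_pos = correlation([1, 2, 3, 4], [2, 4, 6, 8])  # exact positive linear relation
r_neg = correlation([1, 2, 3, 4], [8, 6, 4, 2])  # exact negative linear relation
```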
correlogram: see autocorrelation function.
cost of carry (COC) model: a model showing the equilibrium relationship between spot and corresponding futures prices, where the spot price is adjusted for the cost of ‘carrying’ the spot asset forward to the maturity date.
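A minimal sketch, assuming continuous compounding and the common form F = S × exp((r − d)(T − t)), where r is the risk-free rate, d is the yield earned from holding the spot asset (for example, a dividend yield) and T − t is the time to maturity in years. The function name and the inputs are hypothetical.

```python
import math


def fair_futures_price(spot, risk_free_rate, carry_yield, years_to_maturity):
    """Futures price implied by cost of carry under continuous compounding."""
    return spot * math.exp((risk_free_rate - carry_yield) * years_to_maturity)


# When the yield on the asset equals the financing rate, there is no net carry.
no_carry = fair_futures_price(100.0, 0.05, 0.05, 0.5)
# A financing rate above the yield puts the futures price above spot.
with_carry = fair_futures_price(100.0, 0.05, 0.02, 0.5)
```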
covariance matrix: see variance–covariance matrix.
covariance stationary process: see weakly stationary process.
covered interest parity (CIP): states that exchange rates should adjust so that borrowing funds in one currency and investing them in another would not be expected to earn abnormal profits.
credit rating: an evaluation made by a ratings agency of the ability of a borrower to meet its obligations to pay interest and to make capital repayments when due.
critical values (CV): key points in a statistical distribution that determine whether, given a calculated value of a test statistic, the null hypothesis will be rejected or not.
cross-equation restrictions: a set of restrictions needed for a hypothesis test that involves more than one equation within a system.
cross-sectional regression: a regression involving series that are measured only at a single point in time but across many entities.
cumulative distribution: a function giving the probability that a random variable will take on a value less than or equal to some pre-specified value.
CUSUM and CUSUMSQ tests: tests for parameter stability in an estimated model based on the cumulative sum of residuals (CUSUM) or cumulative sum of squared residuals (CUSUMSQ) from a recursive regression.
daily range estimator: a crude measure of volatility calculated as the difference between the day’s lowest and highest observed prices.
damped sine wave: a pattern, especially in an autocorrelation function plot, where the values cycle from positive to negative in a declining manner as the lag length increases.
data generating process (DGP): the true relationship between the series in a model.
data mining: looking very intensively for patterns in data and relationships between series without recourse to financial theory, possibly leading to spurious findings.
data revisions: changes to series, especially macroeconomic variables, that are made after they are first published.
data snooping: see data mining.
day-of-the-week effect: the systematic tendency for stock returns to be higher on some days of the week than others.
degrees of freedom: a parameter that affects the shape of a statistical distribution and therefore its critical values. Some distributions have one degree of freedom parameter, while others have more.
degree of persistence: the extent to which a series is positively related to its previous values.
dependent variable: the variable, usually denoted by y, that the model tries to explain.
deterministic: a process that has no random (stochastic) component.
Dickey–Fuller (DF) test: an approach to determining whether a series conta