We present numerical evidence for the blow-up of solutions of the Euler equations. Our approximate solutions are Taylor polynomials in the time variable of an exact solution, and we believe that, in terms of the exact solution, the blow-up will be rigorously proved.
We consider a general loaded arch problem with a small thickness. To approximate the solution of this problem, a conforming mixed finite element method which takes into account an approximation of the middle line of the arch is given. But for a very small thickness such a method gives poor error bounds; the conforming Galerkin method is then enriched with residual-free bubble functions.
Chapter 10 provided an overview of Monte Carlo methods and dealt solely with the problem of generating from the uniform distribution. Since the uniform distribution is the fundamental distribution, we're now prepared to deal with the postponed problem of generating from other distributions. Given the results of Chapter 10, this problem should be viewed as transforming a source sequence of IID uniform random variables {Ui} to an IID sequence of random variables {Xi} with cumulative distribution function (cdf) F. A discussion of general methods for generating from continuous distributions forms Section 11.2. Specific algorithms designed for various distributions, such as the normal and Student's t, follow in Section 11.3. General methods for discrete distributions are discussed in Section 11.4, with specific cases in Section 11.5. Special problems, including random sampling from a population, are handled in Section 11.6. The problem of accuracy in Monte Carlo is tackled in Section 11.7.
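As a small illustration of this transformation (an illustrative sketch, not code from the text), the inverse-cdf method sets X = F⁻¹(U): if U is uniform on (0, 1) and F is a continuous cdf, then F⁻¹(U) has cdf F. The exponential distribution is used below only because its cdf inverts in closed form.

```python
import numpy as np

def exponential_from_uniform(u, rate=1.0):
    """Inverse-cdf transform: if U ~ Uniform(0,1), then
    X = -log(1 - U) / rate has an Exponential(rate) distribution,
    since F(x) = 1 - exp(-rate * x) and F^{-1}(u) = -log(1 - u) / rate."""
    return -np.log(1.0 - u) / rate

rng = np.random.default_rng(seed=11)
u = rng.uniform(size=100000)          # source sequence {U_i}
x = exponential_from_uniform(u, 2.0)  # transformed sequence {X_i}
print(x.mean())                       # should be near 1/rate = 0.5
```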
Some general remarks are in order before pursuing the problem at hand. Algorithms for generating random variables should always be simple, fast, and exact. Simplicity is paramount, since users must often code and debug their own programs. Finding errors in random output is very difficult (see Exercises 11.14 and 11.20). If an algorithm is simple, most mistakes will bring consequences so severe that the error can be easily discovered. Speed is not so important, since the computational effort in generation is usually only a small fraction of the total effort in the Monte Carlo experiment.
Maximum likelihood is generally regarded as the best all-purpose approach for statistical analysis. Outside of the most common statistical procedures, when the “optimal” or “usual” method is unknown, most statisticians follow the principle of maximum likelihood for parameter estimation and statistical hypothesis tests. Bayesian statistical methods also rely heavily on maximum likelihood. The main reason for this reliance is that following the principle of maximum likelihood usually leads to very reasonable and effective estimators and tests. From a theoretical viewpoint, under very mild conditions, maximum likelihood estimators (MLEs) are consistent, asymptotically unbiased, and efficient. Moreover, MLEs are invariant under reparameterizations or transformations: the MLE of a function of the parameter is the function of the MLE. From a practical viewpoint, the estimates and test statistics can be constructed without a great deal of analysis, and large-sample standard errors can be computed. Overall, experience has shown that maximum likelihood works well most of the time.
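As a brief worked illustration of the invariance property (an added example, not taken from the chapter), consider an IID exponential sample with rate λ:

```latex
% Invariance illustrated with an exponential sample x_1,\dots,x_n and rate \lambda:
\ell(\lambda) = n\log\lambda - \lambda \sum_{i=1}^{n} x_i,
\qquad
\ell'(\hat\lambda) = \frac{n}{\hat\lambda} - \sum_{i=1}^{n} x_i = 0
\;\Longrightarrow\;
\hat\lambda = \frac{1}{\bar{x}} .
% For the mean \mu = g(\lambda) = 1/\lambda, invariance gives the MLE directly:
\hat\mu = g(\hat\lambda) = \frac{1}{\hat\lambda} = \bar{x} .
```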
The biggest computational challenge comes from the naive expectation that any statistical problem can be solved if the maximum of some function is found. Instead of relying solely on the unconstrained optimization methods presented in Chapter 8 to meet this unrealistic expectation, the nature of the likelihood function can be exploited in ways that are more effective for computing MLEs. Since the exploitable properties of likelihood functions follow from the large-sample theory, this chapter will begin with a summary of the consistency and asymptotic normality properties of MLEs.
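The following sketch shows one such exploitation, under the assumption of a Cauchy location model chosen purely for illustration: Fisher scoring replaces the observed information in Newton's method with the expected (Fisher) information supplied by large-sample theory (for n unit-scale Cauchy observations it equals n/2). The function names are illustrative, not the book's code.

```python
import numpy as np

def cauchy_score(theta, x):
    """Score function l'(theta) for the Cauchy location log-likelihood
    l(theta) = -sum log(1 + (x - theta)^2), constants dropped."""
    u = x - theta
    return np.sum(2.0 * u / (1.0 + u**2))

def fisher_scoring_mle(x, tol=1e-10, max_iter=100):
    """Fisher scoring: Newton's method with the observed information
    replaced by the expected (Fisher) information, here n/2."""
    n = len(x)
    theta = np.median(x)            # robust starting value
    for _ in range(max_iter):
        step = cauchy_score(theta, x) / (n / 2.0)
        theta += step
        if abs(step) < tol:
            break
    return theta

rng = np.random.default_rng(8)
x = rng.standard_cauchy(size=500) + 3.0
print(fisher_scoring_mle(x))        # should land near the true location, 3.0
```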
One of the main advantages of Monte Carlo integration is a rate of convergence that is unaffected by increasing dimension, but a more important advantage for statisticians is the familiarity of the technique and its tools. Although Markov chain Monte Carlo (MCMC) methods are designed to integrate high-dimensional functions, the ability to exploit distributional tools makes these methods much more appealing to statisticians. In contrast to importance sampling with weighted observations, MCMC methods produce observations that are no longer independent; rather, the observations come from a stationary distribution and so time-series methods are needed for their analysis. The emphasis here will be on using MCMC methods for Bayesian problems with the goal of generating a series of observations whose stationary distribution π(t) is proportional to the unnormalized posterior p*(t). Standard statistical methods can then be used to gain information about the posterior.
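A minimal random-walk Metropolis–Hastings sketch (illustrative names and target, not the book's code) makes the role of the unnormalized posterior explicit: only the ratio p*(proposal)/p*(current) is ever needed, so the normalizing constant never appears.

```python
import numpy as np

def metropolis(log_p_star, t0, n_draws, step=1.0, seed=0):
    """Random-walk Metropolis: the chain's stationary distribution is
    proportional to exp(log_p_star), i.e. the unnormalized posterior p*."""
    rng = np.random.default_rng(seed)
    t = t0
    draws = np.empty(n_draws)
    for i in range(n_draws):
        proposal = t + step * rng.normal()
        # accept with probability min(1, p*(proposal) / p*(t))
        if np.log(rng.uniform()) < log_p_star(proposal) - log_p_star(t):
            t = proposal
        draws[i] = t
    return draws

# Illustrative target: an unnormalized N(2, 1) "posterior".
draws = metropolis(lambda t: -0.5 * (t - 2.0) ** 2, t0=0.0, n_draws=20000)
print(draws[2000:].mean())   # discard burn-in; the mean should be near 2
```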
The two general approaches covered in this chapter are known as Gibbs sampling and the Metropolis–Hastings algorithm, although the former can be written as a special case of the latter. Gibbs sampling shows the potential of MCMC methods for Bayesian problems with hierarchical structure, also known as random effects or variance components. The key ingredient in Gibbs sampling is the ability to generate from the conditional distribution of each variable given the others; in the case of three components, generating from f(x | Y = y, Z = z), f(y | X = x, Z = z), and f(z | X = x, Y = y).
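A compact two-component Gibbs sketch, assuming a standard bivariate normal target whose full conditionals are the familiar ones, stands in for the three-component description above (illustrative code, not the book's):

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_draws, seed=0):
    """Gibbs sampling for a standard bivariate normal with correlation rho,
    alternating draws from the two full conditionals:
    X | Y = y ~ N(rho * y, 1 - rho^2) and Y | X = x ~ N(rho * x, 1 - rho^2)."""
    rng = np.random.default_rng(seed)
    x, y = 0.0, 0.0
    sd = np.sqrt(1.0 - rho**2)
    draws = np.empty((n_draws, 2))
    for i in range(n_draws):
        x = rng.normal(rho * y, sd)   # draw from f(x | Y = y)
        y = rng.normal(rho * x, sd)   # draw from f(y | X = x)
        draws[i] = x, y
    return draws

draws = gibbs_bivariate_normal(rho=0.8, n_draws=20000)
print(np.corrcoef(draws[1000:].T)[0, 1])   # should be near 0.8
```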
Discussing algorithms before computers emphasizes the point that algorithms are valuable mathematical constructs in themselves and exist even in the absence of computers. An algorithm is a list of instructions for the completion of a task. The best examples of algorithms in everyday life are cooking recipes, which specify how a list of ingredients is to be manipulated to produce a desired result. For example, consider the somewhat trivial recipe for cooking a three-minute egg. The recipe (or algorithm) for such a task exemplifies the need to take nothing for granted in the list of instructions.
Algorithm Three-Minute Egg
1. Put water in a pan.
2. Turn on the heat.
3. When the water boils, flip over the egg timer.
4. When the timer has run out, turn off the heat.
5. Pour some cold water in the pan to cool the water.
6. Remove egg.
Although this algorithm may appear trivial to most readers, detailed examination further emphasizes (or belabors) how clear and unambiguous an algorithm must be. First, the receiver of these instructions, call it the actor, must recognize all of the jargon of food preparation: water, pan, egg, egg timer, boil, and so forth. The actor must recognize constructions: “put ____ in ____”; “when ____, do ____.” Some parts of these instructions are unnecessary: “to cool the water.” If the actor is an adult who understands English, this may be a fine algorithm. To continue to belabor the point, the actor can only do what the instructions say.
One of the advantages of Monte Carlo methods, as highlighted in Chapter 10, is that the whole array of statistical tools is available to analyze the results and assess the accuracy of any estimate. Sadly, the statistical analysis of many Monte Carlo experiments has been absent, and others have been poorly done. Quite simply, statisticians do not always practice what they preach. One rationalization with some validity is that the statistical tools for analyzing these data are beyond the mainstream of statistical methodology; one of the goals of this chapter is to remove this as a possible excuse. Some of the fundamental statistical tools are reviewed in Section 12.2. Density estimation, long an object of theoretical discourse, becomes an important tool in expressing the results of Monte Carlo studies; a brief discussion of the highlights of density estimation is included in this section. The most common statistical tests for these data involve testing whether a sample arises from a specified distribution; a brief discussion of goodness-of-fit tests forms Section 12.3. Importance sampling, discussed briefly in Chapter 10, presents a class of statistical problems with weighted observations. This requires some minor modifications of common statistical tools that are outlined in Section 12.4. An attendant problem with importance sampling is concern for the distribution of the weights; tests on the behavior of the distribution of weights are discussed in Section 12.5.
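As a hedged sketch of what "weighted observations" entails (the target and envelope below are illustrative stand-ins, not the chapter's example), the self-normalized importance-sampling estimate of a posterior mean is simply a weighted mean with weights w_i = p*(t_i)/g(t_i):

```python
import numpy as np

rng = np.random.default_rng(12)

# Illustrative target: unnormalized posterior p*(t) proportional to N(1, 0.5^2).
def log_p_star(t):
    return -0.5 * ((t - 1.0) / 0.5) ** 2

# Importance sampler g: a wider N(0, 2^2) envelope.
t = rng.normal(0.0, 2.0, size=50000)
log_g = -0.5 * (t / 2.0) ** 2 - np.log(2.0)

log_w = log_p_star(t) - log_g
w = np.exp(log_w - log_w.max())              # stabilized, unnormalized weights
posterior_mean = np.sum(w * t) / np.sum(w)   # self-normalized weighted estimate
print(posterior_mean)                        # should be near 1
```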
The other goal of this chapter is to introduce some specialized integration tools. In Section 12.6, Laplace's method provides an asymptotic approximation for moments of a posterior based mainly on the large-sample normal approximation to the posterior.
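For reference, a standard first-order statement of Laplace's method in the univariate case (generic notation, not necessarily the chapter's):

```latex
% Laplace's method: h smooth with a unique interior maximum at \hat\theta,
% so h'(\hat\theta) = 0 and h''(\hat\theta) < 0.
\[
\int e^{\,n h(\theta)}\, d\theta
\;\approx\;
e^{\,n h(\hat\theta)}\,
\sqrt{\frac{2\pi}{\,n\,\lvert h''(\hat\theta)\rvert\,}} .
\]
% When e^{n h(\theta)} is the unnormalized posterior, a posterior moment is a
% ratio of two such integrals, each approximated by Laplace's method:
\[
E\!\left[g(\theta)\mid \text{data}\right]
= \frac{\int g(\theta)\, e^{\,n h(\theta)}\, d\theta}
       {\int e^{\,n h(\theta)}\, d\theta} .
\]
```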
In recent years, linear algebra has become as fundamental a mathematical tool as calculus. Since its role in statistics is so prominent, matrix computations and the solution of linear equations are fundamental to the computing of statistics. Hence the treatment in this chapter is rather traditional. The study of one particular statistical problem, regression, is postponed, and some problems arising in time-series analysis are discussed in the next chapter.
Numerical analysts always talk about the solution to a system of equations, Ax = b, for the thought of computing an inverse is considered (for reasons often unstated) naive and gauche. Although the tone is haughty, the reasoning is sound, and while the mathematics of A⁻¹B speaks of inverses, its computation means solving systems of equations with several right-hand sides. To emphasize, although the algebra may be written in terms of inverses, careful analysis to convert the computations to solving systems of equations with many right-hand sides may lead to substantial savings in computing time.
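A small numerical illustration of this point, assuming NumPy's linear-algebra routines (not the book's own code): A⁻¹B is obtained by solving AX = B with ten right-hand sides rather than by forming the inverse explicitly.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(500, 500))
B = rng.normal(size=(500, 10))

X_inverse = np.linalg.inv(A) @ B   # the "naive" route: form the inverse
X_solve = np.linalg.solve(A, B)    # solve AX = B for the 10 right-hand sides

# Both give A^{-1}B, but the solve route is cheaper and numerically safer.
print(np.allclose(X_inverse, X_solve))
```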
The systems of equations to be treated here will always be square and consistent – that is, they will always have a solution. When this assumption is violated, the problem of solving a system of equations changes its nature, sometimes to a regression problem (discussed in Chapter 5) or to an eigenproblem (Chapter 6).
The first topic to be covered is an introduction to the computational and storage tricks that are so useful in matrix computations. Other acts of computational sleight of hand will be introduced as they arise.
The theme of this chapter is a simple one: there may be better, faster ways of computing something than you may ever have thought of. One of the maxims of computer science is that just a few programs use most of the resources. Early in the history of computing, people recognized that much of the computing resources went into the common task of sorting a list of numbers. If this task could be done faster, then everything could be done better. A concentrated effort on improving sorting algorithms led to several breakthroughs, all following the principle known as “divide and conquer.” In simple terms, to solve a large task, break it into smaller tasks of the same kind. Cleverly done, the resulting algorithm can be more efficient than anyone would have expected, earning the jargon adjective “fast.” The principle of divide and conquer will be discussed in the next section, followed by a discussion of fast algorithms for sorting. Section 14.4 comprises statistical applications of divide and conquer. Another great breakthrough, the fast Fourier transform (FFT), will be discussed in Section 14.5. Using the FFT to compute convolutions will be discussed in Section 14.6, followed by some interesting applications of the FFT to statistics in Section 14.7. This chapter will close with some topics that are important but don't really fit elsewhere: algorithms for constructing permutations and combinations.
Divide and Conquer
The general principle of divide and conquer is to break a difficult task into subtasks, solve the subtasks, and then put the solutions together to solve the original task.
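A compact merge-sort sketch (illustrative code, not from the text) shows the principle applied to the sorting task mentioned earlier: split the list, sort each half by the same method, and merge the sorted halves. Done this way the work is O(N log N) rather than the O(N²) of naive sorting, which is the sense in which such algorithms earn the adjective "fast."

```python
def merge_sort(a):
    """Divide and conquer: split the list, sort each half recursively,
    then merge the two sorted halves in linear time."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]

print(merge_sort([5, 2, 9, 1, 5, 6]))   # [1, 2, 5, 5, 6, 9]
```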
Most of the time, we wish to be blissfully ignorant of the inner workings of any complicated machine. When we drive an automobile, traffic and road conditions demand our concentration, and we would prefer that our attention wander to a favorite song on the radio rather than to the oil pressure gauge. Only when trouble arises need we concern ourselves with the internal combustion engine, air pressure in the tires, the lever arms in the steering system, or a lug wrench. With as complicated a machine as a computer, most of the time we can likewise treat its inner workings as a collection of black boxes. However, researchers regularly operate a computer at its limits in the same way a race-car driver takes an automobile to the limits of its capabilities. In order to drive safely bumper-to-bumper at 200 mph, a race-car driver must understand the operation of every system of the machine. A researcher must understand the inner workings of the arithmetic of the computer; otherwise, Overflow and Underflow become mysterious demons. Knowledge will not only dispel the fears brought on by ignorance, it will also permit the researcher to control his or her computational destiny and not fall victim to “roundoff error” any more than to “racing luck.”
The first three sections of this chapter present a brief overview of the mechanics of computer arithmetic. Although necessary for ground-level knowledge, they should be skimmed at first reading because the interesting details of the problems can easily sidetrack the reader.
This book grew out of notes for my Statistical Computing course that I have been teaching for the past 20 years at North Carolina State University. The goal of this course is to prepare doctoral students with the computing tools needed for statistical research, and I have augmented this core with related topics that through the years I have found useful for colleagues and graduate students. As a result, this book covers a wide range of computational issues, from arithmetic, numerical linear algebra, and approximation, which are typical numerical analysis topics, to optimization and nonlinear regression, to random number generation, and finally to fast algorithms. I have emphasized numerical techniques, but restricted the scope to those regularly employed in the field of statistics, and dropped some traditional numerical analysis topics such as differential equations. Many of the exercises in this book arose from questions posed to me by colleagues and students.
Most of the students that I have taught come with a graduate-level understanding of statistics, no experience in numerical analysis, and little skill in a programming language. Consequently, I cover only about half of this material in a one-semester course. For those with a background in numerical analysis, a basic understanding of two statistical topics, regression and maximum likelihood, would be necessary.
I would advise any instructor of statistical computing not to shortchange the fundamental topic of arithmetic.
The juxtaposition of these two topics may appear strange to many readers. Upon further reflection, the common thread of spreading points in space may become apparent. My point in combining these topics is to emphasize that this thread is not weak. Monte Carlo should be viewed as just another way to compute an integral; numerical integration should be viewed as just another way to sample points in space. Great gains can be made by exploiting the strengths of one approach when the other is floundering. Only with the willingness to adjust one's viewpoint and use these tools in combination can the full array of techniques be brought to bear on a difficult problem.
Tools such as Riemann sums and Simpson's rule characterize the set of tools known as fixed quadrature or simply quadrature. A viewpoint of these methods as a discretization of the continuous problem of integration is indeed naive. The points are spread in a fixed way in space, with the number of points set in advance. Most of these methods employ a weighting scheme, so that the points (abscissas) where a function is to be evaluated have varying importance. For estimating an integral by evaluating a function at N points in one dimension, the error converges to zero at a rate of O(N⁻²) or better, depending on the smoothness of the function. In higher dimensions, however, this rate slows considerably. Assessing the accuracy of quadrature methods is extremely difficult, and most assessments are quite naive.
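A one-dimensional comparison, with an integrand and sample size chosen purely for illustration: composite Simpson's rule against a crude Monte Carlo estimate using the same number of function evaluations.

```python
import numpy as np

f = lambda x: np.exp(-x**2)     # smooth integrand on [0, 1], so quadrature excels
exact = 0.7468241328124271      # reference value of the integral

# Composite Simpson's rule with N + 1 equally spaced abscissas (N even).
N = 16
x = np.linspace(0.0, 1.0, N + 1)
w = np.ones(N + 1)
w[1:-1:2] = 4.0                 # odd-indexed abscissas get weight 4
w[2:-1:2] = 2.0                 # interior even-indexed abscissas get weight 2
simpson = (1.0 / (3.0 * N)) * np.sum(w * f(x))

# Crude Monte Carlo with the same number of function evaluations.
rng = np.random.default_rng(7)
mc = f(rng.uniform(size=N + 1)).mean()

print(abs(simpson - exact), abs(mc - exact))   # quadrature error is far smaller here
```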
This chapter serves as an appetizer to the main course, maximum likelihood and nonlinear least squares. This is stated so boldly because many statistical problems of this type originate in estimation problems with maximum likelihood (or a similar criterion) as the goal. Our discussion begins with some of the background calculus and definitions. Next, the discussion turns to the safe and slow methods for optimization in a single variable, for which the statistical term “nonparametric” has the correct connotations. Next, the root-finding problem is addressed with the standard techniques, Newton and secant methods, followed by a brief presentation of convergence rates. After a short digression on stopping and condition, the multivariate problem is first approached with Newton's methods. After a second digression on numerical differentiation, quasi-Newton methods are discussed for optimization and nonlinear equations. Discussions of condition, scaling, and implementation conclude the chapter.
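As a hedged sketch of the two root-finding workhorses just named (the test equation and function names are illustrative, not the chapter's): Newton's method uses the derivative, while the secant method replaces it with the slope through the last two iterates.

```python
def newton(f, fprime, x0, tol=1e-12, max_iter=50):
    """Newton's method for f(x) = 0: x_{k+1} = x_k - f(x_k) / f'(x_k)."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x -= step
        if abs(step) < tol:
            break
    return x

def secant(f, x0, x1, tol=1e-12, max_iter=50):
    """Secant method: approximate f' by the difference quotient of the
    last two iterates, avoiding an explicit derivative."""
    for _ in range(max_iter):
        x2 = x1 - f(x1) * (x1 - x0) / (f(x1) - f(x0))
        x0, x1 = x1, x2
        if abs(x1 - x0) < tol:
            break
    return x1

# Solve x^3 - 2 = 0; both should return the cube root of 2 (about 1.2599).
f = lambda x: x**3 - 2.0
print(newton(f, lambda x: 3.0 * x**2, x0=1.5), secant(f, 1.0, 2.0))
```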
Some topics are not addressed in this discussion. One problem is the solution of polynomial equations, which arise rarely in an isolated form in statistics. Constrained optimization can often be avoided through reparameterization. The specialized problem of nonlinear regression is postponed until the next chapter, to be treated as a special topic in maximum likelihood.
Before attacking the problems at hand, it is wise to review some foundations to gain a clearer perspective of the situation. The cornerstones for everything are the first results of calculus, the primary tools in applied mathematics. These results will first be stated in their univariate form.
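Two of the univariate results most commonly meant by this phrase, stated here for concreteness (the chapter's own selection may differ), are the mean value theorem and Taylor's theorem with remainder:

```latex
% Mean value theorem: if f is continuous on [a, b] and differentiable on (a, b),
% there exists \xi \in (a, b) with
\[
f(b) - f(a) = f'(\xi)\,(b - a).
\]
% Taylor's theorem with remainder: if f has n + 1 continuous derivatives near x,
\[
f(x + h) = \sum_{k=0}^{n} \frac{f^{(k)}(x)}{k!}\,h^{k}
           + \frac{f^{(n+1)}(\xi)}{(n+1)!}\,h^{n+1}
\quad\text{for some } \xi \text{ between } x \text{ and } x + h.
\]
```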