Be aware of algorithms that can be useful to you. There are many textbooks on algorithms and data structures. One of the more encyclopedic is the book by Cormen, Leiserson, Rivest, and Stein [23].
Scientific and engineering software almost always needs to be efficient in both time and memory. You should first consider optimizing at a high level, choosing data structures and algorithms that are inherently efficient. At a lower level, you should understand how computers work, and how to efficiently use modern computer architectures.
Choosing good algorithms and data structures is the foundation of designing efficient software. Without this, no amount of low-level tweaking of the code will make it run fast. This is especially true for large problems. By tuning your code you could gain an improvement of anything from, say, a factor of two to a factor of ten in the speed of an algorithm. But moving from an algorithm that is O(n³) in time and O(n²) in memory to one that is O(n²) in time and O(n) in memory can give you a much greater benefit, especially if n ≈ 10 000 or larger. Sometimes you can get approximate algorithms that are fast, but the speed will depend on how accurately you want the answer. Here you need to know the problem well in order to see how large an error can be tolerated.
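To make that comparison concrete: for n ≈ 10 000, an O(n³) algorithm performs on the order of 10¹² operations while an O(n²) algorithm performs on the order of 10⁸, roughly a factor of n fewer. Likewise, O(n²) storage means about 10⁸ numbers (around 800 MB in double precision), against about 10⁴ numbers (around 80 KB) for O(n) storage. The constants hidden in the O-notation matter, of course, but at this scale they rarely close a gap of four orders of magnitude.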
Numerical algorithms
Since the development of electronic digital computers (and even well before then) there has been a great deal of development of numerical algorithms. Many are highly efficient, accurate, and robust.
In the middle of a routine you realize that you need some scratch space n floating point numbers long, and n is a complicated function of the inputs to the routine. What do you do? Do you add a scratch space argument to your routine, along with its length, and check that it is big enough (stopping the program immediately if it is not)? Often a better idea is to allocate the memory needed. In Fortran 90 this is done using the allocate command; in C you use the malloc function; in C++ and Java the new operator will do this task; in Pascal the allocation is done by the new operator, but the syntax is different from C++ or Java. All of these mechanisms dynamically allocate memory and return, or set, a pointer to the allocated block of memory.
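Here is a minimal sketch of this pattern in C; the routine, its formula for the scratch size n, and all names are hypothetical, and only the allocate-check-use-free sequence is the point.

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical routine: needs a scratch array of n doubles, where n
   stands in for some complicated function of the inputs. */
double sum_of_squares(const double *x, size_t m)
{
    size_t n = 2 * m + 1;                      /* input-dependent scratch size   */
    double *scratch = malloc(n * sizeof *scratch);
    if (scratch == NULL) {                     /* always check the allocation    */
        fprintf(stderr, "sum_of_squares: out of memory\n");
        exit(EXIT_FAILURE);
    }

    double sum = 0.0;
    for (size_t i = 0; i < m; i++) {
        scratch[i] = x[i] * x[i];              /* use the scratch space          */
        sum += scratch[i];
    }

    free(scratch);                             /* release the block when done    */
    return sum;
}

In Fortran 90 the same pattern uses allocate and deallocate, with a stat= argument to detect allocation failure.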
The allocated block of memory is taken from a global list of available memory which is ultimately controlled by the operating system. The block remains allocated until it is explicitly de-allocated, until it becomes inaccessible (if garbage collection is used), or until the program terminates. So dynamically allocated memory can be used to hold return values or returned data structures. Dynamically allocated memory can be passed to other routines and treated, in most respects, like memory that has been statically allocated or allocated on a stack.
The data structure that controls the allocation and de-allocation of memory is called a memory heap. A memory heap may contain a pair of linked lists of pointers to blocks of memory.
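As a rough illustration (real allocators are considerably more elaborate, and the field names here are invented for the sketch), each block on such lists might carry a small header:

#include <stddef.h>   /* for size_t */

/* Sketch of one node in a heap's linked lists of memory blocks. */
struct block_header {
    size_t size;                 /* number of usable bytes in this block      */
    int    in_use;               /* nonzero if allocated, zero if free        */
    struct block_header *next;   /* next block on the free or allocated list  */
};

Allocation then amounts to searching the free list for a block that is large enough, and de-allocation to returning the block to the free list, possibly coalescing it with adjacent free blocks.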
Optimizing the performance of software requires working with at least two different points of view. One is the global view where the questions are about how to structure the overall architecture of the system. Another view is how the individual routines are written, and even how each line of code is written. All of these views are important. Selecting good algorithms is as important as selecting good hardware and implementing algorithms efficiently.
When the term “optimization” is used in computing, it is often taken to mean something like picking compiler options or “writing tight code” that uses the fewest clock cycles or operations. This is clearly important in writing fast, effective software, but it is only part of the process, which begins early in the design stage.
Usually the first part of the design stage is the design of the central data structures and databases that will be used. These should be chosen so that there are well-known efficient algorithms for handling these data structures, preferably with readily available implementations that perform correctly, reliably and efficiently. Then the algorithms to carry out the main tasks need to be selected. An important guide to selecting them is their asymptotic complexity or estimate of the time needed. However, this is not the only guide; see the last section of this chapter for more information about how to refine this information and to make sure that the asymptotic estimates are relevant in practice. The last part is the detailed design process where the algorithms are implemented, and this should be done with an eye on efficiency to ensure that the whole system works well.
The development of the Internet has contributed to the growth of public libraries of scientific software. Some of this development has occurred through the efforts of individuals, through Internet collaborations (as with the development of Linux), and through government-supported software development by academics and others (as with the development of LAPACK). There is a wide range of other software packages for scientific computing, many now written in C/C++, although Fortran and other languages are used for parts of many of these systems: PETSc (which supports the use of MPI for parallel computation), IML++ (an iterative methods library in C++), SparseLib++ (for handling sparse matrices in C++), and PLTMG (for solving partial differential equations).
In parallel with this, there has also been a tremendous development of commercial numerical software. Beginning in 1970 the Numerical Algorithms Group (NAG), based in the UK, developed libraries which have been sold commercially as the NAG libraries since 1976; the Harwell library was also developed in the UK; the IMSL libraries were developed commercially in the US. Another set of numerical libraries, called SLATEC, was developed by the Sandia and Los Alamos US National Laboratories and the US Air Force. These are available through netlib (see the next section). Perhaps the most spectacular example of commercial numerical software is the development of MATLAB. Initially a collection of Fortran 77 routines based on the early LINPACK and EISPACK libraries for dense matrix computations with a text interface, MATLAB has evolved into a full-featured interactive programming language with special support for numerical computation and scientific visualization.
Discontinuous Galerkin Time Domain (DGTD) methods are now popular for the solution of wave propagation problems. Able to deal with unstructured, possibly locally refined meshes, they easily handle complex geometries and remain fully explicit, with easy parallelization and extension to high orders of accuracy. Non-dissipative versions exist, in which a discrete electromagnetic energy is exactly conserved. However, the stability limit of the methods, related to the smallest elements in the mesh, calls for the construction of local time-stepping algorithms. Such schemes have already been developed for N-body mechanical problems and are known as symplectic schemes. They are applied here to DGTD methods for wave propagation problems.
We consider the Stokes problem provided with non-standard boundary conditions which involve the normal component of the velocity and the tangential components of the vorticity. We write a variational formulation of this problem with three independent unknowns: the vorticity, the velocity, and the pressure. Next we propose a discretization by spectral element methods which relies on this formulation. A detailed numerical analysis leads to optimal error estimates for the three unknowns, and numerical experiments confirm the interest of the discretization.
We consider models based on conservation laws. For the optimization of such systems, a sensitivity analysis is essential to determine how changes in the decision variables influence the objective function. Here we study the sensitivity with respect to the initial data of objective functions that depend upon the solution of Riemann problems with piecewise linear flux functions. We present representations for the one-sided directional derivatives of the objective functions. The results can be used in the numerical method called Front-Tracking.
In recent years several papers have been devoted to stability and smoothing properties in the maximum norm of finite element discretizations of parabolic problems. Using the theory of analytic semigroups it has been possible to rephrase such properties as bounds for the resolvent of the associated discrete elliptic operator. In all these cases the triangulations of the spatial domain have been assumed to be quasiuniform. In the present paper we show a resolvent estimate, in one and two space dimensions, under weaker conditions on the triangulations than quasiuniformity. In the two-dimensional case, the bound for the resolvent contains a logarithmic factor.
In this paper we develop a residual-based a posteriori error analysis for an augmented mixed finite element method applied to the problem of linear elasticity in the plane. More precisely, we derive a reliable and efficient a posteriori error estimator for the case of pure Dirichlet boundary conditions. In addition, several numerical experiments confirming the theoretical properties of the estimator, and illustrating the capability of the corresponding adaptive algorithm to localize the singularities and the large stress regions of the solution, are also reported.
This paper proposes and analyzes a BEM-FEM scheme to approximate a time-harmonic diffusion problem in the plane with non-constant coefficients in a bounded domain. The model is set as a Helmholtz transmission problem with absorption and with non-constant coefficients in a bounded domain. We reformulate the problem as a four-field system. For the temperature and the heat flux we use piecewise constant functions and lowest order Raviart-Thomas elements associated to a triangulation approximating the bounded domain. For the boundary unknowns we take spaces of periodic splines. We show how to transmit information from the approximate boundary to the exact one in an efficient way and prove well-posedness of the Galerkin method. Error estimates are provided and experimentally corroborated at the end of the work.
The SGI Power C compiler (PCA) does not allow more threads than processors (cf. the document “Multiprocessing C Compiler Directives”). In this sense, programs execute much as in the fork() programming model.
The keyword critical corresponds most closely with mutex in that only one thread at a time can execute this code and all threads execute it. The keyword synchronize corresponds most closely with barrier in that all threads must arrive at this point before any thread can go on.
There are also additional directives. The directive one processor means that the first thread to reach this code executes it while the other threads wait; after the first thread has executed it, the code is skipped by subsequent threads. There is also an enter gate directive and a corresponding exit gate directive. Threads must wait at the exit gate until all threads have passed the matching enter gate.
Loops to run in parallel must be marked with the pfor directive. It takes the argument iterate (start index; number of times through the loop; increment/decrement amount).
A reduction variable is local to each thread, and the per-thread contributions must then be combined in a critical section.
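The following is a minimal sketch of how these directives can fit together to sum an array. The clause spellings (shared, local) and pragma layout are recalled from the SGI documentation cited above and should be checked against “Multiprocessing C Compiler Directives” rather than taken as definitive.

/* Sketch: parallel array summation with SGI Power C (PCA) directives.
   iterate takes (start index; number of times through the loop; increment). */
double sum_array(const double *a, int n)
{
    double sum = 0.0;          /* shared result                             */
    double mysum;              /* reduction variable, local to each thread  */
    int i;

#pragma parallel shared(a, n, sum) local(i, mysum)
    {
        mysum = 0.0;
#pragma pfor iterate(i = 0; n; 1)
        for (i = 0; i < n; i++)
            mysum += a[i];

#pragma critical               /* one thread at a time adds its contribution */
        sum += mysum;
    }
    return sum;
}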
The need for speed
Since the beginning of the era of the modern digital computer in the early 1940s, computing power has increased at an exponential rate (see Fig. 1). Such exponential growth is predicted by the well-known “Moore's Law,” first advanced in 1965 by Gordon Moore of Intel, asserting that the number of transistors per square inch on integrated circuits will double every 18 months. Clearly there has been a great need for ever more computation, and this need continues unabated today. The calculations performed by those original computers were in the fields of ballistics, nuclear fission, and cryptography. Today these same fields, in the form of computational fluid dynamics, advanced simulation for nuclear testing, and cryptography, are among computing's Grand Challenges.
In 1991, the U.S. Congress passed the High Performance Computing Act, which authorized the Federal High Performance Computing and Communications (HPCC) Program. A class of problems developed in conjunction with the HPCC Program was designated “Grand Challenge Problems” by Dr. Ken Wilson of Cornell University. These problems were characterized as “fundamental problems in science and engineering that have broad economic or scientific impact and whose solution can be advanced by applying high performance computing techniques and resources.” Since then various scientific and engineering committees and governmental agencies have added problems to the original list. As a result, today there are many Grand Challenge problems in engineering, mathematics, and all the fundamental sciences. Recent Grand Challenge efforts have set ambitious goals: to
build more energy-efficient cars and airplanes,
design better drugs,
forecast weather and predict global climate change,