Numerical homogenization beyond scale separation

Numerical homogenization is a methodology for the computational solution of multiscale partial differential equations. It aims at reducing complex large-scale problems to simplified numerical models valid on some target scale of interest, thereby accounting for the impact of features on smaller scales that are otherwise not resolved. While constructive approaches in the mathematical theory of homogenization are restricted to problems with a clear scale separation, modern numerical homogenization methods can accurately handle problems with a continuum of scales. This paper reviews such approaches embedded in a historical context and provides a unified variational framework for their design and numerical analysis. Apart from prototypical elliptic model problems, the class of partial differential equations covered here includes wave scattering in heterogeneous media and serves as a template for more general multi-physics problems.


Introduction
This article reviews numerical homogenization as an important tool for computer-aided simulation of complex multiscale processes involving characteristic features at a large range of possibly non-separable length scales.

Multiscale problems
Multiscale problems appear in all kinds of disciplines, such as physics, engineering or the geosciences. Within the latter category, one of the most influential examples for numerical homogenization is two-phase flow in highly heterogeneous porous media. Examples are the simulation of groundwater flow (with fluids air and water) or the modelling of carbon sequestration, i.e. the storage of carbon dioxide in deep geological layers (with fluids CO₂ and brine). These problems have a strong multiscale character due to the strongly heterogeneous geological subsurface formations which consist of mixtures of different rock and soil types; see Helmig (1997). The mathematical model of the flow of two immiscible and incompressible fluids is driven by the law of mass balance, which yields Θ(x) ∂_t S_α(x, t) + ∇ · v_α(x, t) = q_α(x) for α = w, n. The unknowns S_α (with 0 ≤ S_α(x, t) ≤ 1) describe the saturations of the wetting (α = w) and non-wetting (α = n) fluid and v_α are the corresponding fluxes. Furthermore, Θ describes the porosity and q_w and q_n are two source terms. Darcy's law relates the fluxes with the two unknown pressures p_w and p_n by v_α = −K (k_α(S_α)/µ_α) (∇p_α − ρ_α g) for α = w, n. (1.1) Here K denotes the hydraulic conductivity, k_α the relative permeabilities depending on the saturations, µ_α the viscosities, ρ_α the densities and g the gravity vector. The saturations are coupled via S_n + S_w = 1, and a relation between the two pressures is typically given by the capillary pressure relation P_c(S_w) = p_n − p_w for a monotonically decreasing capillary pressure curve P_c. In this case we obtain the full two-phase flow system, which consists of two strongly coupled, possibly degenerate parabolic equations. The multiscale character is encoded in the conductivity K, the permeabilities k_α, the porosity Θ and the capillary pressure P_c, which strongly vary on several scales down to the pore scale.
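Collecting the relations above, the full two-phase flow system can be written compactly as:

```latex
\begin{aligned}
&\Theta(x)\,\partial_t S_\alpha(x,t) + \nabla\cdot v_\alpha(x,t) = q_\alpha(x),
&& \alpha \in \{w,n\}, \\
&v_\alpha = -K\,\frac{k_\alpha(S_\alpha)}{\mu_\alpha}\,\bigl(\nabla p_\alpha - \rho_\alpha g\bigr),
&& \alpha \in \{w,n\}, \\
&S_n + S_w = 1, \qquad P_c(S_w) = p_n - p_w .
\end{aligned}
```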
The interplay of these oscillatory, non-smooth, large-amplitude coefficients plus the non-linear nature of the problem makes its numerical solution extremely challenging. The computational domains appearing in hydrogeological simulations typically span horizontally over several square kilometres and have thicknesses (in the vertical direction) of a few hundred metres. On the other hand, suitable geological and hydrological measurements allow resolutions of rock and soil parameters that are of the order of centimetres. These fine-scale variations have to be resolved by numerical methods in order to get accurate macroscopic approximations on the kilometre scale; see Niessner and Helmig (2006) and the references therein.
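To put these scales into proportion, consider illustrative numbers (assumed here, not taken from measured data): a horizontal extent of 5 km × 5 km, a thickness of 200 m and a uniform resolution of 1 cm. A grid resolving this variation would require

```latex
N \;\approx\; (5\cdot 10^{5}) \times (5\cdot 10^{5}) \times (2\cdot 10^{4})
\;=\; 5\times 10^{15} \ \text{cells},
```

which is far beyond the reach of direct numerical simulation on any foreseeable hardware.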
Besides the natural emergence of multiscale problems in the geosciences, the interaction of effects across scales can also trigger astonishing physical phenomena. Metamaterials are engineered multiscale and multiphase materials with electromagnetic or other wave-related properties that are otherwise unattainable in nature. They can exhibit surprising effects such as negative optical refraction (Efros 2003, Efros and Pokrovsky 2004) or its acoustic and seismic analogues. In wave optics, the engineering of such materials paves the way to perfect flat lenses without restrictions on the obtainable spatial resolution of the imaged objects (Pendry 2000) or invisibility cloaking (Pendry, Schurig and Smith 2006). On a mathematical level, negative optical refraction can be modelled in terms of Maxwell's equations and additional material laws. In the frequency domain, this leads to partial differential equations (PDEs) for the electric and magnetic field intensities E and H, namely ∇ × E = iκµH and ∇ × H = −iκσE + J, given the current density J and the wave number κ. The permeability µ and the permittivity σ are strongly heterogeneous coefficients that vary between some background material and certain built-in resonators with highly contrasting wave-propagation properties. The negative index is the result of a smart combination of the choice, scaling and arrangement of large arrays of microscopic resonators in a macroscopic component. For perfectly periodic arrangements (i.e. periodic µ and σ) negative refraction can be predicted from the mathematical models via homogenization theory; see Schweizer (2013, 2016) and Lipton and Schweizer (2018). However, any deviation from such idealized geometrical restriction, for example to study the effect of the unavoidable imperfection of production processes, is computationally challenging.
In many other applications, it is exactly the deviation from any geometric structure -often referred to as disorder -that causes desirable physical effects. A representative example is Anderson localization (Anderson 1958), originally referring to the possible electron localization in a sufficiently disordered (random) lattice, which implies the absence of diffusion of electron waves in the lattice. Just like negative refraction, Anderson localization is an almost universal wave phenomenon known for sound (Hu et al. 2008), electromagnetic waves such as light (Wiersma, Bartolini, Lagendijk and Righini 1997) and also matter waves in the context of Bose-Einstein condensates (Billy et al. 2008, Roati et al. 2008).

Limitation of direct numerical simulation
In the applications above and many others, scientists are confronted with heterogeneous non-periodic microstructures on several non-separable scales or even a whole continuum of scales. Although engineering, physics and other applied sciences often provide proper PDE models that implicitly describe the phenomena arising from these microstructures, the problems are intractable for an analytical solution as soon as idealized periodic (or stationary) regimes are left. Hence their understanding is intrinsically tied to numerical simulation. However, the coexistence of heterogeneous microstructures and high contrast, which underlies the target (physical) phenomena, creates an almost unbreakable barrier for direct numerical simulations. Even for the simplest deterministic model problem (e.g. an elliptic PDE modelling heat diffusion in a composite material), the attempt to resolve the whole range of scales in a direct numerical simulation leads to an almost intractable number of degrees of freedom. For non-linear, non-stationary or non-deterministic problems and the corresponding eigenvalue problems, the required computational work of direct simulations easily exceeds today's computing resources by multiple orders of magnitude. The truncation of microscopic information or its naive averaging would make the simulation tractable but would lead to questionable results. While in engineering practice this crude oversimplification can be compensated to some extent by calibration to experimental data, it has severe risks and clear limitations when new phenomena are to be explored or real data are rare due to the high complexity or cost of physical experiments.
We shall illustrate the phenomenon of under-resolution in the prototypical example of linear diffusion with a rapidly oscillating diffusion coefficient A. To be precise, let us seek u ∈ V := H¹₀(D) as the weak solution of −∇ · (A(x)∇u(x)) = f(x) in D. (1.2) Here D is a polyhedral Lipschitz domain, f ∈ L²(D) denotes a given source term, and A ≥ α > 0 is a bounded, scalar coefficient that varies on a very fine scale. Despite its simplicity, this linear heterogeneous diffusion problem is an important building block of the aforementioned more complex PDE models, and it is a necessary benchmark to be passed by any numerical method. For example, problems of the form (1.2) appear in the two-phase flow equations as part of Darcy's law (1.1) or in acoustic wave propagation. This shows the practical relevance of the model problem beyond academic benchmarking. The finite element method (FEM) is certainly the most popular and sophisticated approach to the computational solution of such problems. Its simplest standard first-order variant (P1-FEM) seeks a globally continuous and piecewise affine approximation with respect to a regular simplicial mesh (triangulation) T_H, i.e. u_H ∈ U_H := P1(T_H) ∩ V, where P1(T_H) is the space of piecewise affine functions with respect to T_H. Since the gradients of these finite element functions are constant on each T ∈ T_H, the finite element solution u_H satisfies, for all v_H ∈ U_H, ∫_D A ∇u_H · ∇v_H dx = Σ_{T ∈ T_H} (⨏_T A dx) ∫_T ∇u_H · ∇v_H dx = ∫_D B ∇u_H · ∇v_H dx, where B denotes any coefficient such that ⨏_T A dx = ⨏_T B dx on each T ∈ T_H. Obviously the FEM cannot be uniformly accurate for coefficients with this property, since the potential variations of A (or B) are fully averaged out and remain hidden from the finite element approximation. Note that neither the amplitude of the variations nor the speed of variations is seen by ⨏_T A dx, even though they are crucial for an accurate solution.
Hence a standard finite element approach like the one above will lead to reasonable results only if the coefficient A is well approximated by its integral means on the elements of the mesh T H . This statement is easily illustrated and validated in the one-dimensional situation.
For equidistant partitions T_H of D into intervals of width H > 0, where H is an integer multiple of ε ≪ 1, the integral means of these two coefficients agree on all T ∈ T_H, that is, the corresponding P1-FEM cannot distinguish them. For the right-hand side f ≡ 1, the exact solutions of the model problem (1.2) for the two coefficients can be computed in closed form. Since their global amplitudes differ roughly by a factor of √3/2, the distance u_A − u_B between the two solutions is large in the H¹(D)-norm, but also in the energy norm and the L²(D)-norm. Due to its nodal exactness in the constant-coefficient case, the obtained FEM approximation will be a nodal interpolation of u_B. Hence, although the FEM provides the energy-best approximation to both u_A and u_B, it totally fails to approximate u_A in the regime H ≥ ε of under-resolution. In the classical a priori error analysis, this observation correlates with a large H²(D)-norm of u_A, which scales like ε⁻¹ and appears as a multiplicative factor in the usual error estimates. For more details on this one-dimensional example, we refer to Målqvist and Peterseim (2020, Chapter 2).
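The indistinguishability of A and its elementwise average B can be checked in a few lines. The following sketch (with an assumed oscillatory coefficient A; the concrete choice is illustrative, not the one from the example above) assembles the 1D P1-FEM stiffness matrices for A and for its piecewise constant element mean B and confirms that they coincide:

```python
import numpy as np

eps = 2 ** -7                              # oscillation length of A
H = 2 ** -4                                # coarse mesh width, integer multiple of eps
nodes = np.linspace(0.0, 1.0, int(round(1 / H)) + 1)
nq = 256                                   # fine midpoint quadrature per element

# assumed rapidly oscillating scalar coefficient
A = lambda x: 1.0 / (2.0 + np.cos(2 * np.pi * x / eps))

def element_quad_points(t):
    """Midpoint quadrature points inside element t of the coarse mesh."""
    return nodes[t] + (np.arange(nq) + 0.5) * H / nq

# element means of A define the averaged piecewise constant coefficient B
means = np.array([np.mean(A(element_quad_points(t)))
                  for t in range(len(nodes) - 1)])

def p1_stiffness(coeff_at):
    """1D P1 stiffness matrix for -(c u')' with homogeneous Dirichlet BCs;
    coeff_at(t) returns coefficient values at the quadrature points of
    element t.  Only the element mean of the coefficient enters, since
    P1 gradients are constant per element."""
    n = len(nodes) - 1
    K = np.zeros((n + 1, n + 1))
    for t in range(n):
        c_mean = np.mean(coeff_at(t))
        K[t:t + 2, t:t + 2] += c_mean / H * np.array([[1.0, -1.0],
                                                      [-1.0, 1.0]])
    return K[1:-1, 1:-1]                   # eliminate boundary nodes

KA = p1_stiffness(lambda t: A(element_quad_points(t)))
KB = p1_stiffness(lambda t: np.full(nq, means[t]))   # averaged coefficient B

print(np.max(np.abs(KA - KB)))             # 0 up to round-off
```

Since the two stiffness matrices (and the load vectors, which do not involve the coefficient) agree, the discrete solutions for A and B are identical, although the exact solutions u_A and u_B differ substantially.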
We would like to emphasize that the failure of the FEM for rapidly oscillating coefficients is a general observation. This is due to the fact that the method only incorporates discrete information about the coefficient A, where the amount of information is at most proportional to the number of degrees of freedom. In Example 1.1 these were the integral means over mesh elements. It might as well be some point values when numerical quadrature is used. Even for higher-order methods, where additional information per element is used, the fundamental problem of under-resolution remains (Peterseim and Sauter 2012). Unless all oscillations of the coefficient are sufficiently resolved by the underlying mesh, the FEM fails to resolve the solution, capturing neither microscopic features nor macroscopic behaviour. In fact, given the universality of polynomials or other problem-independent shape functions in standard FEMs, better behaviour cannot be expected. Quite the contrary: using variable coefficients, one can provoke arbitrarily slow convergence of the FEM (Babuška and Osborn 2000). Clearly, further complications of the problem, for example non-smooth and high-contrast coefficients or singular perturbations in the PDE, will strengthen effects that take the direct numerical simulation by standard FEMs beyond easy reach.

Numerical homogenization
The only way out of this limitation is adaptation of the discretization methodology to the particular problem at hand, i.e. to the coefficients of the PDE. For problems with periodically oscillating coefficients, this can be achieved by mathematical homogenization. Homogenization is a technique from mathematical modelling that does not consider the PDE as a single fixed equation but rather as a member of a whole family of problems. These problems are all of the same form and they are parametrized by a characteristic fine-scale parameter ε, which enters through some coefficient A = A_ε. In such a setting, mathematical homogenization theory studies suitable limits as ε tends to zero and aims to identify a macroscopic limit problem that is easy to solve. The list of standard references is long and includes monographs such as those by Bensoussan, Lions and Papanicolaou (1978), Jikov, Kozlov and Oleinik (1994) and Cioranescu and Donato (1999).
Mathematical homogenization is constructive (i.e. it characterizes the limit problem explicitly or implicitly) for locally periodic coefficients and certain types of stochastic problems. For prototypical linear elliptic PDEs with perfectly periodic coefficients, mathematical homogenization provides an effective (macroscopic) PDE with a constant (effective) coefficient, which is implicitly described by an additional PDE on the microscopic periodic cell. Although the theory of homogenization can also treat fairly general non-periodic coefficients in the framework of Γ-, G- and H-convergence (Spagnolo 1968, De Giorgi 1975, Murat and Tartar 1978), the constructive results based on cell problems rely on strong structural assumptions such as (local) periodicity and scale separation. More precisely, in deterministic settings, constructive representations of the homogenized coefficient require coefficients of the form A(x, x/ε), where the first variable is a (macroscopic) slow variable and where A(x, •/ε) is ε-periodic and describes the rapidly oscillating fine-scale behaviour. Hence the scale separation of the problem needs to be naturally embedded in the coefficient by an explicit separation into slow and fast variables. Numerical methods such as the heterogeneous multiscale method (HMM) are computationally meaningful generalizations of mathematical homogenization; see E and Engquist (2003), Abdulle (2005), Abdulle, E, Engquist and Vanden-Eijnden (2012) and the references therein. The scope of these empirically very successful approaches is, however, tied to settings with scale separation or moderate perturbations of locally periodic structures.
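For illustration, in the perfectly periodic case A_ε(x) = A(x/ε) with Y-periodic A on the unit cell Y = (0, 1)^d, the constructive characterization takes the well-known form of cell problems: for j = 1, …, d, find Y-periodic correctors χ_j with

```latex
\nabla_y \cdot \bigl(A(y)\,(e_j + \nabla_y \chi_j(y))\bigr) = 0
\quad \text{in } Y,
\qquad
(A_{\mathrm{hom}})_{ij}
  = \int_Y e_i \cdot A(y)\,\bigl(e_j + \nabla_y \chi_j(y)\bigr)\,\mathrm{d}y,
```

and the homogenized problem −∇ · (A_hom ∇u₀) = f replaces (1.2) in the limit ε → 0. It is precisely this explicit construction that breaks down once the separation into slow and fast variables is lost.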
To shift the range of tractable multiscale problems beyond periodicity and scale separation, the notion of numerical homogenization of this paper will be much more general. Rather than studying the limit of infinite scale separation, we will take the perspective of numerical approximation and instead employ the flexibility of variational numerical methods such as the FEM. We will replace universal (typically polynomial) shape functions with problem-dependent shape functions that incorporate the required subscale information. This idea goes back to Babuška and Osborn (1983). The advantage of this viewpoint is that the 'coarse scale' is no longer some unique notion that can be directly extracted from the problem, but rather a design choice made by the user, depending on the available computing resources and the desired accuracy or resolution level. Hence this perspective does not require there to be a natural scale separation in the considered problem and, at the same time, fine-scale information is always accessible via modified shape functions.
It is rather easy to identify some generalized finite element shape functions that lead to universally accurate numerical homogenization. Natural candidates are eigenfunctions of the partial differential operator under consideration or images of simple functions under its inverse. The true art in numerical homogenization is to identify a localized basis of the generalized finite element space that preserves the favourable sparsity and complexity properties of standard FEMs without affecting the universal accuracy. As will be outlined below, the search for localized bases turned out to be delicate and remained an open problem in the community for many years. A rigorous existence result for an exponentially localized basis for the model problem (1.2) was first given by Grasedyck, Greff and Sauter (2012). The first constructive result followed a few years later by Målqvist and Peterseim (2014) and is based on the variational multiscale method of Hughes, Feijóo, Mazzei and Quincy (1998). It is now known as the localized orthogonal decomposition (LOD), and it inspired many of the recent developments that motivate this review paper. To connect the various approaches and to present the main concepts of modern numerical homogenization techniques in a unified way, we shall present an abstract variational framework that forms the core of this work.
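The effect of such problem-adapted shape functions can be made concrete in one dimension. The following sketch (assumed model data: an oscillatory coefficient, a fine reference mesh, an ideal corrector computed globally without localization) corrects standard hat functions by their energy projection onto the fine-scale space of functions vanishing at the coarse nodes, and compares the resulting Galerkin method with the standard coarse FEM:

```python
import numpy as np

# Fine-scale P1-FEM for -(A u')' = 1 on (0,1) with u(0) = u(1) = 0.
eps, Nh = 2 ** -7, 2 ** 10            # oscillation length, fine elements
h = 1.0 / Nh
xf = np.linspace(0.0, 1.0, Nh + 1)
xm = 0.5 * (xf[:-1] + xf[1:])         # fine element midpoints
A = 1.0 / (2.0 + np.cos(2 * np.pi * xm / eps))   # assumed coefficient

main = (A[:-1] + A[1:]) / h           # fine stiffness, interior nodes only
K = np.diag(main) - np.diag(A[1:-1] / h, 1) - np.diag(A[1:-1] / h, -1)
F = np.full(Nh - 1, h)                # load vector for f = 1
u_fine = np.linalg.solve(K, F)        # fine reference solution

# Coarse hat functions (H = 1/16 >> eps), prolongated to the fine mesh.
m = 2 ** 4
r = Nh // m
P = np.zeros((Nh - 1, m - 1))
for j in range(1, m):
    P[:, j - 1] = np.maximum(0.0, 1.0 - np.abs(xf[1:-1] - j / m) * m)

u_H = P @ np.linalg.solve(P.T @ K @ P, P.T @ F)   # standard coarse FEM

# Ideal corrected basis: subtract, from each hat, its energy projection
# onto the fine-scale space W of fine functions vanishing at coarse nodes.
fine_only = [i - 1 for i in range(1, Nh) if i % r != 0]
C = np.zeros((Nh - 1, len(fine_only)))
C[fine_only, np.arange(len(fine_only))] = 1.0
Q = np.linalg.solve(C.T @ K @ C, C.T @ K @ P)     # corrector coefficients
B = P - C @ Q                                     # corrected multiscale basis
u_ms = B @ np.linalg.solve(B.T @ K @ B, B.T @ F)

err_H = np.max(np.abs(u_H - u_fine))
err_ms = np.max(np.abs(u_ms - u_fine))
print(err_H, err_ms)                  # corrected space is far more accurate
```

The corrected space has the same dimension as the coarse FEM space but approximates the fine reference solution much more accurately on the under-resolved mesh. The localization question discussed in the text concerns making the (here global) computation of Q local without losing this accuracy.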

Outline
Before presenting the general framework, we will give a broader survey of the history of numerical homogenization methods in Section 2. An abstract unified framework, which is based on a decomposition of the solution into fine-scale and problem-specific coarse-scale features, is then presented in Section 3. We prove an abstract exponential decay result for suitable corrector functions, which then leads to a localized numerical homogenization method. The subject of Section 4 is an illustrative application of the framework to the prototypical example of linear diffusion with a rapidly oscillating diffusion coefficient. Further wave-type applications such as the Helmholtz or Maxwell problem are discussed in Section 5. Finally, we investigate connections of the abstract framework to early variational concepts in Section 6 before we provide a short conclusion and identify open problems in Section 7.

Historical survey
Numerical homogenization has a long history which dates back to at least the 1970s, when considerations for stationary multiscale problems were first found in Babuška (1976), based on analytical homogenization theory. In the 1980s, early approaches for oscillatory time-dependent problems were investigated by Scheid Jr (1983) and Engquist (1987), by exploiting scale separation and the particular structure of evolution problems. The first milestone in the development of multiscale methods for one-dimensional problems without scale separation goes back to Babuška and Osborn (1983), who proposed constructing generalized finite element methods by incorporating fine-scale features of the problem directly into the discrete solution space. The idea was later generalized to two-dimensional problems in Babuška, Caloz and Osborn (1994).
The true dawn of numerical homogenization methods, however, was in the mid-1990s, when numerous different approaches emerged and evolved with increasing scientific interest. One important development was the seminal work by Hughes (1995), who proposed the so-called variational multiscale method (typically abbreviated VMM, VMS[cheme] or VMF[ramework]). In fact, this fundamental work can be identified as the ancestor of many of today's commonly used multiscale methods. Hughes (1995) discussed the possibility of stabilizing the conventional FEM for multiscale problems by constructing local Green's functions and using them to incorporate under-resolved scales into a coarse-scale variational formulation. Brezzi, Franca, Hughes and Russo (1997) revisited this approach and showed that it is equivalent to the concept of residual-free bubbles. Residual-free bubbles are smooth element bubbles that vanish on the boundary of grid elements and, at that time, they were a well-established tool for stabilizing finite element-based Galerkin methods (e.g. in the context of advection-dominated problems). Essentially, Brezzi et al. (1997) proved in their work that residual-free bubbles can also be used to stabilize Galerkin methods for multiscale problems, where the bubbles are constructed by integrating the local Green's functions from Hughes (1995). Another approach that emerged at the same time was the multiscale finite element method (MsFEM) introduced in Hou and Wu (1997). The framework of the MsFEM is also built upon the use of residual-free bubbles; however, contrary to Brezzi et al. (1997), they are interpreted and justified by exploiting analytical homogenization theory.
Hence, in 1997, three largely equivalent main concepts for numerical multiscale methods were available: the VMS, the residual-free bubble approach and the MsFEM. We classify all these methods as variational approaches as they all aim for a correction or stabilization of conventional discretizations by incorporating information from unresolved scales globally into a discrete problem. The diverse advancement of these methods to the state-of-the-art technology in numerical homogenization is briefly outlined below.

History of variational approaches
Hughes' initial work from 1995 was still based on the availability of particular meshes whose elements strongly localize the fine-scale information (hence making it sufficient to compute local Green's functions on each element). These somewhat unrealistic assumptions made the approach quite restrictive. However, this initial shortcoming was fixed just three years later, when Hughes et al. (1998) refined and generalized the approach by making the framework more abstract and by introducing global corrector Green's functions (also called fine-scale Green's functions) that are constructed from fine-scale residuals. This work can hence be considered as the first general formulation of the VMS. In fact, it was already sufficiently general to be applicable not only to FEM-based realizations but also to wavelet spaces, as simultaneously proposed by Dorobantu and Engquist (1998) and later refined by Engquist and Runborg (2002) and Chertock and Levy (2005).
One issue of the initial VMS was how to make proper choices for the coarse-scale and the fine-scale space, so that the direct sum of these two spaces equals the full solution space for the original problem. Hughes and Sangalli (2007) discovered that projections are particularly useful in this context. They proposed introducing a projection onto a conventional coarse space, for example a finite element space on a coarse mesh, which makes the coarse-scale space identical to the image of the projection, while the fine-scale space can be defined as the kernel of the projection. This setting allowed for an explicit representation of the global corrector Green's functions depending on the regular Green's functions of the original problem. The operator representation of these corrector Green's functions can then be used to 'correct' regular (coarse) basis functions and hence incorporate fine-scale information into the discrete solution space. Since the ideal corrector is global, this formally leads to global basis functions for the corrected space. However, it was observed numerically early on that these ideal corrected basis functions are quasi-local in the sense that they show an exponential decay and can in fact be truncated to small nodal environments; see Målqvist (2005), Larson and Målqvist (2005), Målqvist (2011) and Hughes and Sangalli (2007). This was successfully exploited in the construction of a posteriori error estimators in a series of papers by Larson and Målqvist (2005, 2007, 2009) and in Nolen, Papanicolaou and Pironneau (2008). The first rigorous proof of this exponential decay for elliptic problems was established by Målqvist and Peterseim (2014), and an efficient localization strategy that exploits the decay without numerical pollution effects was suggested by Henning and Peterseim (2013).
Later, the strategy was generalized by Henning and Målqvist (2014) and Henning, Morgenstern and Peterseim (2015) in terms of the admissible finite element spaces and boundary conditions, but still for symmetric elliptic problems. At this point, the name localized orthogonal decomposition (LOD) was introduced for these fully localized realizations of the VMS. Peterseim (2016) generalized the localized framework yet again so that it would also include non-symmetric and inf-sup stable problems. An alternative localization strategy motivated by the theory of additive Schwarz methods was given in Kornhuber and Yserentant (2016) and Kornhuber, Peterseim and Yserentant (2018). The step from conforming to non-conforming spaces was taken by Elfverson, Georgoulis, Målqvist and Peterseim (2013b) in the context of discontinuous Galerkin methods and later in a more general way by Maier (2020). Finally, we also note that the VMS/LOD concept can be interpreted as a particular realization of a game-theoretical approach to numerical homogenization, as established by Owhadi (2017) and Owhadi and Scovel (2019).
Going back historically to the year 1997, residual-free bubbles, as suggested by Brezzi et al. (1997), became a popular tool in the construction of multiscale methods. However, it was more the homogenization-based interpretation of these bubbles (Hou and Wu 1997) that really triggered increasing interest, so that the MsFEM became one of the most popular multiscale methods for years to come. Important steps in its rigorous error analysis for elliptic homogenization problems (with periodicity and scale separation) were established by Hou, Wu and Cai (1999), Efendiev, Hou and Wu (2000), Hou, Wu and Zhang (2004), Efendiev and Hou (2009) and Le Bris, Legoll and Thomines (2014b). Important asymptotic convergence results in non-periodic homogenization settings were established by Gloria (2006, 2008, 2012). A Crouzeix-Raviart MsFEM that is particularly suited to problems in perforated domains was developed by Le Bris, Legoll and Lozinski (2013, 2014a). A posteriori error estimators for MsFEM approximations were derived by Henning, Ohlberger and Schweizer (2014c) and Chamoin and Legoll (2018). The interpretation of the residual-free bubbles as special correctors from analytical homogenization theory gave rise to several new realizations and modifications of the MsFEM. One such modification, proposed in Allaire and Brizzi (2005), suggested that if the MsFEM is indeed mimicking a homogenization process, then the residual-free bubbles should not be interpreted as an additive modification of standard finite element basis functions, but rather as a change of coordinates that transforms a homogenized function into its multiscale counterpart (i.e. the multiscale basis function converges strongly in L² to the conventional basis function if the speed of the fine-scale variations tends to infinity). In this interpretation, the bubbles are the result of a concatenation of a nodal basis function with a coordinate transformation.
Indeed, for the case of piecewise linear basis functions, an additive modification and a concatenative modification yield the same results, which is why the corresponding methods are identical in this case. However, the approach by Allaire and Brizzi also allowed for a straightforward generalization to higher-order finite element spaces, and it was extended by oversampling in Hesthaven, Zhang and Zhu (2014). Owhadi and Zhang (2007) proposed that it is not even necessary to interpret the change of coordinates as a transformation that involves the homogenized coefficient; it can instead be a general harmonic coordinate transformation that no longer relies on homogenization theory. In fact, the proposed harmonic coordinate transformation was historically the first approach that could be analysed without the assumption of scale separation, which was a significant result. A rigorous localization still remained open. In general, coordinate transformations such as the one in Allaire and Brizzi (2005) or Owhadi and Zhang (2007) can be considered as limited global information, that is, some precomputed a priori knowledge that simplifies the computation of multiscale approximations at a later stage. However, the global information is not necessarily precomputed but could also be induced from measured data.
Aarnes (Aarnes 2004, Aarnes, Efendiev and Jiang 2008) proposed that such limited global information could be used to construct multiscale basis functions by imposing it as an oscillatory boundary condition on the grid cells. In Aarnes (2004) this was done in the context of mixed finite elements. Later, this idea was transferred to Lagrange finite elements (Jiang, Efendiev and Ginting 2007, 2010, Jiang and Efendiev 2012), following a strategy that is closer to the original spirit of the MsFEM, where stabilization is based on additive bubbles. More precisely, the MsFEM using limited global information does not exploit a change of coordinates as in Allaire and Brizzi (2005) and Owhadi and Zhang (2007). Instead, the global information is combined with the original nodal basis functions in an additive or multiplicative way. This ensures that the multiscale basis functions keep their original local support. However, the method can no longer be rigorously analysed.
To overcome this issue, Owhadi and Zhang (2011) proposed using flux-transfer as an alternative to a coordinate transformation. The main advantage of a flux-transfer is that its computation can be localized to small domains, whereas this can be an issue for the often strongly non-local coordinate transformations. Owhadi, Zhang and Berlyand (2014) then proposed the concept of rough polyharmonic splines based on (local) constrained energy-minimization problems. The approach has many conceptual similarities to the LOD, and both fit into the more recent game-theoretic approach to numerical homogenization (Owhadi 2015, 2017) that ultimately led to the general framework of operator-adapted wavelets (gamblets) (Owhadi and Scovel 2019). For further works in this context we refer to Hou, Li and Zhang (2017b,a), Hou and Zhang (2017) and Hou, Huang, Lam and Zhang (2018).
Other variational multiscale approaches that emerged in the past decade are the localized reduced basis multiscale method (Albrecht, Haasdonk, Kaulmann and Ohlberger 2012, Ohlberger and Schindler 2014), the multiscale hybrid-mixed method (Araya, Harder, Paredes and Valentin 2013), FETI-DP methods (finite element tearing and interconnect - dual-primal) with modified coarse spaces (see Pechstein, Sarkis and Scheichl 2013, Klawonn, Radtke and Rheinbach 2015, Klawonn, Kühn and Rheinbach 2018), and the hybrid high-order methods proposed in Cicuttin, Ern and Lemaire (2019). For a review of the combination of multiscale and reduced-order techniques we refer to Buhr et al. (2021).

History of homogenization-based approaches
A very different concept that is wholly unrelated to Hughes' variational approach goes back to the celebrated work by E and Engquist (2003), introducing the heterogeneous multiscale method (HMM). Contrary to previous approaches, the HMM is not based on extracting fine-scale information in the whole computational domain, but only locally in small representative cells that just cover a very small fraction of the computational domain. From these local computations, fine-scale information is extracted and then upscaled. This allows us to determine a macroscopic differential operator that describes the effective coarse-scale behaviour of the underlying process to be modelled.
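In its finite element version (FE-HMM), this construction can be sketched as follows (notation assumed here for illustration): around each macroscopic quadrature point x_K one places a small sampling domain Y_δ(x_K) = x_K + δ(−1/2, 1/2)^d with δ on the order of the microscopic period, and assembles the macroscopic bilinear form

```latex
a_H(u_H, v_H)
  = \sum_{K \in \mathcal{T}_H} \frac{|K|}{|Y_\delta(x_K)|}
    \int_{Y_\delta(x_K)} A(x)\,
      \nabla u_h^{x_K}(x) \cdot \nabla v_h^{x_K}(x)\,\mathrm{d}x,
```

where u_h^{x_K} solves a local fine-scale cell problem on Y_δ(x_K), coupled to u_H through boundary (or periodicity) constraints. Only the small sampling domains need to be resolved on the fine scale, which explains the efficiency of the method in the presence of scale separation.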
This approach strongly relies on scale separation and the availability of representative cells, which is typically the case in engineering applications. In such a setting, the HMM is computationally highly efficient, since it only requires us to process fine-scale information in very small subregions. The approach is inspired by classical homogenization theory, where the local HMM problems can be seen as cell problems that determine the homogenized coefficients of the differential equation. Exploiting this relation, a priori error estimates were first obtained in Abdulle (2005) and a posteriori error estimates in Ohlberger (2005). An important computational improvement of the HMM was proposed in Abdulle and Bai (2012). Here the methodology is combined with a reduced basis strategy that allows us to determine an appropriate number (and location) of the cell problems to make the method as efficient as possible. Since the HMM is extensively presented in the Acta Numerica article of Abdulle, E, Engquist and Vanden-Eijnden (2012), we will not discuss its history further. For another survey of the methodology we refer to Abdulle (2009), and for various applications we refer to Henning and Ohlberger (2009), Abdulle and Budáč (2015), Abdulle and Pouchon (2016), Henning, Ohlberger and Verfürth (2016), Arjmand and Runborg (2017), Abdulle and Pouchon (2018), Hochbruck, Maier and Stohrer (2019), Verfürth (2019a) and the references therein. More recent developments that focus on the reduction of resonance errors can be found in Gloria (2012), Arjmand and Stohrer (2016), Arjmand and Runborg (2016) and Abdulle, Arjmand and Paganoni (2019).
Other homogenization approaches make use of two-scale expansions of the solution (Matache and Schwab 2002) or, more generally, are based on particular multiscale homogenized equations (Hoang and Schwab 2005) which can be obtained with the concepts of two-scale convergence (Allaire 1992) or generalized multiscale convergence (Allaire and Briane 1996). This allows us to incorporate effects on a hierarchy of different scales and leads to a problem posed in N d dimensions, where N is the number of different scales and d the space dimension. To overcome the curse of dimensionality that would otherwise be prohibitive in such a setting, Hoang and Schwab (2005) proposed discretizing the multiscale homogenized equation using sparse grids. Later the concept was generalized and extended to other settings; see Hoang (2008), Harbrecht and Schwab (2011), Xia and Hoang (2014), Chu and Hoang (2018), Hoang (2019, 2020) and the references therein.

History of spectral approaches
At the same time as direct variational methods and homogenization-based methods, another approach evolved based on identifying relevant fine-scale information via (local) spectral problems and incorporating this information in problem-specific basis functions that are then used in a variational formulation. Clearly this point of view shares many similarities with the approaches discussed in Section 2.1. However, since (local) spectral decompositions form the cornerstone of the following approaches, they are conceptually different to the previous methods and can, arguably, be classified as a third way to numerical homogenization. The first results in this direction go back to Babuška and Lipton (2011) and Efendiev, Galvis and Hou (2013). Hence they were historically part of the main wave of new methods in the 2010s that helped numerical homogenization overcome the barrier of periodicity.
The underlying question of spectral-based approaches is the following: What is the optimal subspace of a given dimension n so that the image of a compact operator P between two Hilbert spaces is approximated as accurately as possible? The operator P is typically related to the multiscale differential operator, and the optimality of subspaces is measured in terms of the Kolmogorov n-width. It is known (see Pinkus 1985) that the optimal subspace of dimension n is spanned by the eigenfunctions associated with the n largest eigenvalues of the self-adjoint operator P* • P. Even though this potentially yields a computational strategy, it is not directly applicable to multiscale problems since it would involve the computation of global problems, which increases rather than decreases the computational complexity. A suitable localization that avoids this issue was first proposed by Babuška and Lipton (2011), who suggested dividing the global domain into small subdomains. On each of these subdomains a suitable (local) compact operator is considered that accounts for fine-scale information. After that, a finite-dimensional subspace is constructed on each of these subdomains by solving local spectral problems as mentioned above. Through multiplication with a partition of unity, the local subspaces are made conforming so that the final set of (local) basis functions can be used in a standard Galerkin approach. The authors proved quasi-optimality of the final multiscale space in the sense that Galerkin approximations to symmetric elliptic problems converge exponentially fast depending on the dimension of the space. The approach was later generalized to other applications, for which we refer to Smetana and Patera (2016), Babuška, Lipton, Sinz and Stuebner (2020) and the references therein.
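In finite dimensions this optimality statement reduces to the singular value decomposition. The following NumPy sketch (our illustration; the matrix P and all sizes are arbitrary choices) verifies that the span of the n leading right singular vectors of P, i.e. the eigenvectors of P*P associated with the n largest eigenvalues, achieves a worst-case image-approximation error equal to the (n + 1)-st singular value.

```python
import numpy as np

# Discrete analogue of the optimal-subspace result: the best n-dimensional
# approximation space for the image of P (Kolmogorov n-width sense) is spanned
# by the n leading right singular vectors of P, i.e. the eigenvectors of P*P
# associated with its n largest eigenvalues.
rng = np.random.default_rng(0)
P = rng.standard_normal((40, 40)) @ np.diag(0.5 ** np.arange(40))  # decaying spectrum

U, s, Vh = np.linalg.svd(P)
n = 5
V_opt = Vh[:n].T                       # optimal n-dimensional subspace (input side)

# Worst-case error of approximating P x (over ||x|| <= 1) by elements of the
# image P(V_opt): it equals the (n+1)-st singular value of P.
Q, _ = np.linalg.qr(P @ V_opt)         # orthonormal basis of P(V_opt)
E = P - Q @ (Q.T @ P)                  # residual after projecting onto P(V_opt)
worst_case = np.linalg.norm(E, 2)
assert np.isclose(worst_case, s[n])
```

The same computation with any other n-dimensional subspace in place of V_opt can only increase `worst_case`, which is exactly the n-width optimality referred to above.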
As an alternative to the spectral construction of basis functions, it is also possible to use randomized sampling algorithms to approximate the optimal subspaces with provably nearly optimal convergence rate; see Buhr and Smetana (2018). This can be computationally more efficient and allows us to exploit modern computer architectures in a parallel implementation. A similar approach was proposed by Chen, Li, Lu and Wright (2020). The strategy of identifying important fine-scale features by solving local spectral problems was also applied in combination with variational approaches to obtain even higher accuracy. One of the first works that hinted in this direction is based on component mode synthesis, and was proposed by Hetmaniuk and Lehoucq (2010). This approximate component mode synthesis shares important features with the VMS framework as well as with the MsFEM framework. Spectral problems are solved on grid cells as well as on interfaces (edges) to identify local fine-scale features, and regular multiscale finite element basis functions are used to introduce a nodal coupling. Analytical results and implementation details can be found in Hetmaniuk and Klawonn (2014) and Heinlein, Hetmaniuk, Klawonn and Rheinbach (2015). The concept of problem-specific coarse-scale spaces is rooted in the field of fast algebraic solvers, where suitable coarse spaces are used for the construction of good preconditioners. Two of the many other examples are the partial inversion of hierarchical matrices (Hackbusch 2015) and the GenEO coarse spaces (generalized eigenproblems in the overlap), as introduced by Spillane et al. (2014).
Another spectral approach was proposed by Efendiev et al. (2013). This generalized multiscale finite element method (GMsFEM) combines characteristics of the conventional MsFEM (Hou and Wu 1997) with the aforementioned spectral approach (Babuška and Lipton 2011). The classical MsFEM requires computation of local residual-free bubbles with a homogeneous Dirichlet boundary condition (on boundaries of coarse grid elements). In order to increase the accuracy of the resulting multiscale space, one could think of additionally computing bubbles with boundary conditions that are induced by fine nodal basis functions. Since this

Variational framework for numerical homogenization
This section presents a unified variational framework for the numerical homogenization of a general class of linear PDEs in divergence form. The framework merges different concepts that have been successfully applied to multiscale problems. More precisely, one may recognize ideas from the LOD as presented in Peterseim (2016) and Maier (2020) as well as from operator-adapted wavelets (Owhadi and Scovel 2019).

Abstract model problem
Let D ⊆ R^d denote a bounded Lipschitz domain and V some closed subspace of H^1(D; C), where v̄ denotes the complex conjugate of v. Throughout this section, ‖ • ‖_V denotes the full H^1(D)-norm, independent of the particular choice of the subspace. Accordingly, we write ‖ • ‖_{V,ω} for the H^1-norm on a subdomain ω ⊆ D.
The variational model problem as it typically appears for the weak formulation of second-order PDEs then reads: seek u ∈ V such that

a(u, v) = F(v) for all v ∈ V, (3.1)

with a continuous sesquilinear form a: V × V → C (with continuity constant C_a) and a continuous (anti)linear functional F ∈ V*. The sesquilinear form is assumed to satisfy an inf-sup condition of the form

α_V := inf_{u ∈ V\{0}} sup_{v ∈ V\{0}} |a(u, v)| / (‖u‖_V ‖v‖_V) = inf_{v ∈ V\{0}} sup_{u ∈ V\{0}} |a(u, v)| / (‖u‖_V ‖v‖_V) > 0. (3.2)

Note that the two values are indeed equal as long as they do not vanish; see Demkowicz (2006). The well-posedness of the abstract problem (3.1) along with the a priori estimate ‖u‖_V ≤ α_V^{-1} ‖F‖_{V*} then follows from Babuška (1971). Later on, we will also use the equivalent operator form of the variational problem, namely A u = F in V*, where the operator A: V → V* is defined by ⟨A u, v⟩ := a(u, v). Note that problem (3.1) not only includes the prototypical Poisson-type problem (1.2), which we consider in more detail in Section 4, but also admits non-Hermitian and non-elliptic problems with complex-valued solutions, such as the Helmholtz problem; see Section 5.1.
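As a concrete orientation (our sketch of the standard elliptic instance, with the coefficient bounds as our assumptions): for a uniformly elliptic coefficient A ∈ L^∞(D; R^{d×d}) with α|ξ|² ≤ A(x)ξ · ξ ≤ β|ξ|² and f ∈ L²(D), take V = H¹₀(D) and

```latex
a(u, v) := \int_D A \nabla u \cdot \nabla \bar{v} \,\mathrm{d}x,
\qquad
F(v) := \int_D f \bar{v} \,\mathrm{d}x .
```

Coercivity, Re a(v, v) ≥ α ‖∇v‖²_{L²(D)}, combined with the Friedrichs inequality on H¹₀(D), yields the inf-sup condition (3.2) with α_V bounded from below in terms of α, so the inf-sup framework subsumes the coercive elliptic case.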

Quantities of interest and fine-scale space
Keeping the multiscale problems of the Introduction in mind, our goal is to split the solution u into a coarse-scale contribution and a fine-scale part. We tackle this task from a more abstract perspective and define a number of macroscopic quantities of interest, which extract the desired information from the exact solution. These functionals are denoted by q j ∈ V * , j ∈ J, for some finite index set J of size N := |J |. Throughout the article we assume that these quantities are (pairwise) linearly independent. The aim of this subsection is to construct a (discrete) method, which approximates the quantities q j (u) as well as possible. Whereas in the LOD framework (Peterseim 2016) such quantities are implicitly encoded within a suitable quasi-interpolation operator, the functionals are used more explicitly in the framework of operator-adapted wavelets (Owhadi and Scovel 2019), where they are combined to yield a so-called information operator u → [q j (u)] j ∈J .
Examples of the macroscopic quantities of interest q_j are local integral means of functions over suitable entities of some finite element mesh, or local averaging procedures much as they are used for finite element quasi-interpolation operators. We would like to mention four possibilities in more detail.

Example 3.1 (finite element-based quantities of interest).
For simplicity, we let V = H 1 0 (D). Consider some regular subdivision T H of the domain D into closed simplices with the set of interior vertices N H . We will refer to T H as a finite element mesh. The parameter H refers to the mesh size that could represent the macroscopic length scale of interest. Further, let Π p T denote the local L 2 (T)-orthogonal projector on the space of polynomials of degree (at most) p on an element T ∈ T H . Suitable examples of the q j are given as follows.
(a) The integral mean on an element: q_T(v) := ⨏_T v dx := |T|^{-1} ∫_T v dx for T ∈ T_H. In this example the set J corresponds to the elements of the mesh T_H. Accordingly, N equals the number of elements (simplices).
(b) The previous choice can be generalized to higher orders. To this end, let p ≥ 0 denote the polynomial degree. On a simplex T ∈ T_H, we let b_{T,k}, k = 1, ..., K(d, p) := (d + p)!/(d! p!), denote a set of L²(T)-orthogonal basis functions of the space of polynomials of degree p. We then define the quantities of interest q_{T,k} as suitably scaled L²(T)-inner products of the argument with the basis functions b_{T,k}. Note that the particular scaling of the quantities of interest is not important, but it is meaningful in the sense that the definition of the q_{T,k} may, for example, be invariant under rescaling of the domain D and the underlying mesh. Here the scaling is chosen to be in agreement with the case p = 0 in (a). Further note that the b_{T,k} may be generalized to non-polynomial bases of any kind. Here N is the product of K(d, p) and the number of simplices in T_H.
(c) One example, which is based on interior vertices rather than elements, is given by q_z(v) := |T(z)|^{-1} Σ_{T ∈ T(z)} (Π¹_T v)(z) for z ∈ N_H. Here T(z) := {T ∈ T_H | z ∈ T} is the set of triangles with node z and |T(z)| is the number of triangles in this set. In this example, J equals the set of interior nodes N_H and N its cardinality.
(d) Finally, we mention the seemingly canonical choice q_z(v) := (v, Λ_z)_{L²(D)} / (1, Λ_z)_{L²(D)} for z ∈ N_H with the nodal basis functions Λ_z (hat functions) of the P1 finite element space. Here, again, N equals the number of interior nodes.
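Quantities of interest of this averaging type are straightforward to realize in code. The following 1D sketch (our illustration; the mesh, the sampling grid and the test function are arbitrary choices, and integrals are approximated by the trapezoidal rule) implements element integral means as in (a) and vertex-based averages of neighbouring element means as a simple vertex-centred variant in the spirit of (c).

```python
import numpy as np

# Quantities of interest on a uniform coarse mesh of D = (0, 1):
# element integral means and vertex-centred averages of neighbouring means.
# Functions are represented by samples on a fine grid.
H = 0.25                                   # coarse mesh size
coarse = np.arange(0.0, 1.0 + H, H)        # coarse vertices
x = np.linspace(0.0, 1.0, 1001)            # fine sampling grid

def q_element(v, T):
    """Integral mean of v over the element T = (t0, t1) (trapezoidal rule)."""
    t0, t1 = T
    mask = (x >= t0 - 1e-12) & (x <= t1 + 1e-12)
    xs, vs = x[mask], v[mask]
    integral = np.sum((vs[:-1] + vs[1:]) * np.diff(xs)) / 2.0
    return integral / (t1 - t0)

def q_vertex(v, z):
    """Average of the integral means over all elements containing the vertex z."""
    patch = [(t0, t1) for t0, t1 in zip(coarse[:-1], coarse[1:]) if t0 <= z <= t1]
    return sum(q_element(v, T) for T in patch) / len(patch)

v = x * (1.0 - x)                          # smooth test function
assert abs(q_element(v, (0.0, 0.25)) - 5.0 / 48.0) < 1e-4   # exact mean is 5/48
assert abs(q_vertex(v, 0.25) - 1.0 / 6.0) < 1e-4            # (5/48 + 11/48) / 2
```

The two reference values follow by hand from ∫ x(1 − x) dx over the respective elements; the assertions only check the quadrature up to the sampling error.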
Note that many other choices of quantities of interest are possible. In particular, they could be based on local eigenfunctions in the spirit of the spectral approaches to numerical homogenization discussed in Section 2.3. The examples above are based on local volume averaging. Averaging with respect to lower-dimensional subdomains is possible as well, provided that the corresponding functionals are bounded on V. Following the non-conforming Crouzeix-Raviart finite element method, one may think of averages over (d − 1)-dimensional mesh entities (edges or faces). This includes point evaluation in the one-dimensional case. As analysed by Owhadi and Scovel (2019), point evaluation is also a valid option for problems posed in higher-order Sobolev spaces, which are beyond the scope of the present paper. With the macroscopic quantities of interest at hand, we define the fine-scale space

W := {v ∈ V | q_j(v) = 0 for all j ∈ J}. (3.3)

Note that W is closed as it is defined as the intersection of (finitely many) kernels of continuous functionals q_j ∈ V*. Two projections C: V → W and C*: V → W are defined through the variational problems

a(Cv, w) = a(v, w) and a(w, C*v) = a(w, v) (3.4)

for any v ∈ V and all w ∈ W. Since these projection operators are defined in terms of the sesquilinear form a, they are obviously problem-dependent. Further, we have C = C* in the case of a Hermitian problem. We need to discuss the well-posedness of the two operators, which is directly connected to the well-posedness of the variational problems (3.4) and hence an inf-sup condition as in (3.2) but restricted to the closed subspace W ⊆ V. To be precise, we assume

α_W := inf_{w ∈ W\{0}} sup_{v ∈ W\{0}} |a(w, v)| / (‖w‖_V ‖v‖_V) = inf_{v ∈ W\{0}} sup_{w ∈ W\{0}} |a(w, v)| / (‖w‖_V ‖v‖_V) > 0. (3.5)

Remark 3.2. If a is a coercive sesquilinear form, then the inf-sup stability (3.2) is automatically satisfied for V and all closed subspaces of V, which includes (3.5).
In the general case, however, this stability property may be lost when restricting to a subspace.

Lemma 3.3 (fine-scale projections).
Let the inf-sup condition (3.5) be satisfied with constant α_W and recall that C_a is the continuity constant of a. Then C and C* are well-defined, surjective and continuous projections onto W. Moreover, we have

‖Cv‖_V ≤ C_a α_W^{-1} ‖v‖_V and ‖C*v‖_V ≤ C_a α_W^{-1} ‖v‖_V for all v ∈ V.

Proof. The well-posedness of C and C* follows again by the continuity of a, the assumed inf-sup stability (3.5) and Babuška (1971). Further, C and C* equal the identity operator on W, which implies the surjectivity. For v ∈ V, Cv ∈ W is defined by

a(Cv, w) = a(v, w) = ⟨A v, w⟩

for all w ∈ W, where ⟨ •, • ⟩ denotes the duality pairing. In particular, we have the stability estimate

α_W ‖Cv‖_V ≤ sup_{w ∈ W\{0}} |a(Cv, w)| / ‖w‖_V = sup_{w ∈ W\{0}} |a(v, w)| / ‖w‖_V ≤ C_a ‖v‖_V.

The estimate for C* follows in the same way.

Remark 3.4 (characterization of the fine-scale space). Since C and C* are linear projections, we directly conclude that

W = ran C = ran C* and V = W ⊕ ker C = W ⊕ ker C*.

Based on the kernel space W and the projections C and C*, we construct the (finite-dimensional) trial and test spaces

Ũ_H := (1 − C) V and Ṽ_H := (1 − C*) V. (3.6)

In the following, these spaces will also be denoted as the problem-specific coarse-scale or multiscale spaces. The tilde in the notation represents the incorporation of oscillatory fine-scale features. We will show that (under certain assumptions) both spaces have dimension N and that they give rise to stable decompositions of the space V.

Theorem 3.5 ('a-orthogonal' decompositions).
Let the quantities of interest q_j ∈ V*, j ∈ J, be (pairwise) linearly independent. Under the assumption (3.5), the spaces Ũ_H and Ṽ_H have dimension N = |J| and define conforming decompositions of the overall space, namely

V = Ũ_H ⊕ W with a(Ũ_H, W) = 0 and V = W ⊕ Ṽ_H with a(W, Ṽ_H) = 0.

Proof. Since C and C* define surjective and continuous projections, we know that V can be decomposed into

V = ker C ⊕ ran C = Ũ_H ⊕ W and V = ker C* ⊕ ran C* = Ṽ_H ⊕ W.

Let v_j ∈ V denote the Riesz representatives of the functionals q_j, i.e. (v_j, v)_V = q_j(v) for all v ∈ V. Obviously, the functions v_j, j ∈ J, are linearly independent, so that their span defines an N-dimensional space. Further note that v ∈ V is orthogonal to v_j (in the given inner product of V) if and only if q_j(v) = 0. Hence we can regard W as a V-orthogonal complement, namely

W = span{v_j | j ∈ J}^⊥.

This yields the decomposition

V = span{v_j | j ∈ J} ⊕ W.

As a consequence, Ũ_H and Ṽ_H must have dimension N. Finally, the two relations a(Ũ_H, W) = 0 and a(W, Ṽ_H) = 0 follow directly by the definition of the projections C and C*, respectively.

Remark 3.6 (a-orthogonal decomposition). In the special case of a Hermitian and coercive sesquilinear form, a defines an inner product on V. Then Theorem 3.5 means that Ũ_H = Ṽ_H is a-orthogonal to the space W.
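In a discrete setting the a-orthogonal decomposition can be checked directly. The following 1D sketch (our construction; all sizes, the oscillatory coefficient and the lumped nodal form of the element means are arbitrary illustrative choices) assembles a P1 stiffness matrix for a(u, v) = ∫ A u'v' dx, builds W as the kernel of the constraint matrix and verifies a(Ũ_H, W) = 0 for the corrected complement.

```python
import numpy as np

# 1D sketch: a(u, v) = \int_0^1 A u' v' dx on V = H^1_0(0, 1), P1 elements on a
# fine mesh; quantities of interest are lumped integral means over nc coarse
# cells. Build W = ker Q, correct complement representatives and verify the
# a-orthogonality of Theorem 3.5.
nf, nc = 256, 8                                   # fine / coarse intervals
h = 1.0 / nf
xm = (np.arange(nf) + 0.5) * h                    # fine element midpoints
A = 1.0 + 0.9 * np.sin(2 * np.pi * xm / 0.01)     # oscillatory coefficient

d = A[:-1] + A[1:]                                # P1 stiffness matrix (interior nodes)
K = (np.diag(d) - np.diag(A[1:-1], 1) - np.diag(A[1:-1], -1)) / h

Q = np.zeros((nc, nf - 1))                        # Q[k, i] ~ q_k(phi_i), lumped
for i in range(nf - 1):
    Q[int((i + 1) * h * nc), i] += h * nc

Wb = np.linalg.svd(Q)[2][nc:].T                   # orthonormal basis of W = ker Q
U = Q.T                                           # any complement of W serves here

# corrector C: a-orthogonal projection onto W, applied column-wise to U
CU = Wb @ np.linalg.solve(Wb.T @ K @ Wb, Wb.T @ K @ U)
U_tilde = U - CU                                  # basis of the ideal trial space

assert np.max(np.abs(U_tilde.T @ K @ Wb)) < 1e-6  # a(\tilde U_H, W) = 0
```

Since the problem is symmetric, Ũ_H and Ṽ_H coincide here; the assertion is the algebraic counterpart of a(Ũ_H, W) = 0.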

Remark 3.7 (characterizations of U H and V H ).
Two alternative characterizations of the coarse spaces are given below.
(i) Theorem 3.5 gives rise to the following definitions of the discrete coarse spaces:

Ũ_H = {ũ ∈ V | a(ũ, w) = 0 for all w ∈ W}, Ṽ_H = {ṽ ∈ V | a(w, ṽ) = 0 for all w ∈ W}.

(ii) A less obvious characterization of the discrete ansatz space is given by

Ũ_H = span{A^{-1} q_j | j ∈ J}. (3.7)

To see this, first note that both spaces have dimension N. A function ũ ∈ Ũ_H moreover satisfies ⟨A ũ, w⟩ = a(ũ, w) = 0 for all w ∈ W, i.e. the functional A ũ vanishes on W and is therefore a linear combination of the q_j. Characterizations of this form have already been considered for the motivation and construction of suitable coarse spaces for numerical homogenization; see Grasedyck et al. (2012) and Owhadi et al. (2014). Note, however, that the basis functions A^{-1} q_j used in (3.7) have global support without computationally relevant decay properties for the macroscopic quantities of interest that we have in mind. The constructive choice of an approximately local basis is the major achievement for numerical homogenization that will be discussed in Section 3.5 below.

Ideal numerical homogenization
The discrete variational problem now reads: seek ũ_H ∈ Ũ_H such that

a(ũ_H, ṽ_H) = F(ṽ_H) for all ṽ_H ∈ Ṽ_H. (3.8)

Note that the trial and test space do not coincide in general, that is, we consider a Petrov-Galerkin scheme. Its well-posedness follows from the inf-sup condition

inf_{ũ_H ∈ Ũ_H\{0}} sup_{ṽ_H ∈ Ṽ_H\{0}} |a(ũ_H, ṽ_H)| / (‖ũ_H‖_V ‖ṽ_H‖_V) ≥ α_V / ‖1 − C*‖ = α_V / ‖C*‖,

which follows from (3.2) and the identity a(ũ_H, v) = a(ũ_H, (1 − C*)v) for all v ∈ V. Note that in the last step we used the fact that the complementary projections C* and (1 − C*) have the same norm in the Hilbert space V; see Szyld (2006). The above argument is almost verbatim the same as in Peterseim (2017), where a special case of the present framework was discussed. Experts in the theory of mixed finite elements may recognize that (1 − C*) acts as a Fortin operator; see Boffi, Brezzi and Fortin (2008). We call (3.8) the ideal discrete problem as it satisfies the following desirable approximation property.
Lemma 3.8 (conservation of quantities of interest). The error between the exact and discrete solution satisfies u − ũ_H = Cu ∈ W. Hence it holds that

q_j(ũ_H) = q_j(u) for all j ∈ J.

Moreover, in the Hermitian coercive case, the energy error satisfies the best-approximation property ‖u − ũ_H‖_a = min_{v_H ∈ Ũ_H} ‖u − v_H‖_a.

Proof. Due to the conformity Ṽ_H ⊆ V, we have a Galerkin orthogonality of the form

a(u − ũ_H, ṽ_H) = 0 for all ṽ_H ∈ Ṽ_H.

Hence u − ũ_H ∈ W by Theorem 3.5, and the decomposition u = (1 − C)u + Cu with (1 − C)u ∈ Ũ_H shows ũ_H = (1 − C)u, i.e. u − ũ_H = Cu. Since all quantities of interest vanish on W, the assertion follows.

We have thus proved that the Petrov-Galerkin method (3.8) preserves the quantities of interest of the exact solution. Under further assumptions, this readily implies global error estimates. In order to quantify the quality of the approximation, we introduce the abstract (discretization) parameter H. As seen in Example 3.1, this may correspond to a mesh size. More generally, H represents a selectable lower bound of length scales to be captured.

Theorem 3.9 (global error estimates). Assume that the exact and discrete solutions u and ũ_H are well-defined. Further assume that there exists a constant C_W > 0, which does not depend on H, such that

‖w‖_{L²(D)} ≤ C_W H ‖w‖_V for all w ∈ W. (3.9)

Then the difference of the exact and the discrete solution satisfies

‖u − ũ_H‖_V ≤ α_W^{-1} ‖F‖_{W*} and ‖u − ũ_H‖_{L²(D)} ≤ C_W H ‖u − ũ_H‖_V.

Proof. Let e := u − ũ_H denote the error. Since e ∈ W by Lemma 3.8, we directly conclude the second estimate from (3.9). For the first estimate we use that all q_j vanish on W and that a(ũ_H, w) = 0 for all w ∈ W, leading to

α_W ‖e‖_V ≤ sup_{w ∈ W\{0}} |a(e, w)| / ‖w‖_V = sup_{w ∈ W\{0}} |a(u, w)| / ‖w‖_V = sup_{w ∈ W\{0}} |F(w)| / ‖w‖_V = ‖F‖_{W*}.

According to this theorem, accuracy of the method is achieved by two properties of the quantities of interest, that is, the spectral filtering of highly oscillatory functions, represented by a large Rayleigh quotient of the kernel functions, and favourable approximation properties in W*; see (3.9). In fact, in many cases the two properties are closely related. We shall quantify the abstract error estimates in terms of discretization parameters such as the coarse mesh size for the choices introduced in Example 3.1.
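The conservation property of Lemma 3.8 can be observed numerically. In the following 1D sketch (our construction with arbitrary illustrative choices; the multiscale space is built directly from the characterization (3.7), i.e. as the span of the functions A^{-1}q_j), the Galerkin solution in the ideal multiscale space reproduces the coarse quantities q_j(u) of the fine-scale reference solution up to round-off.

```python
import numpy as np

# -(A u')' = 1 on (0, 1), u(0) = u(1) = 0, P1 elements on a fine mesh.
# The ideal multiscale space is span{K^{-1} q_j} as in (3.7); the Galerkin
# solution in this space preserves the quantities of interest q_j(u).
nf, nc = 256, 8
h = 1.0 / nf
xm = (np.arange(nf) + 0.5) * h
A = 1.0 + 0.9 * np.sin(2 * np.pi * xm / 0.01)

d = A[:-1] + A[1:]                              # P1 stiffness matrix (interior nodes)
K = (np.diag(d) - np.diag(A[1:-1], 1) - np.diag(A[1:-1], -1)) / h
F = np.full(nf - 1, h)                          # (1, phi_i) = h for the P1 hats

Q = np.zeros((nc, nf - 1))                      # lumped integral means over coarse cells
for i in range(nf - 1):
    Q[int((i + 1) * h * nc), i] += h * nc

u = np.linalg.solve(K, F)                       # fine-scale reference solution

B = np.linalg.solve(K, Q.T)                     # columns: discrete A^{-1} q_j, cf. (3.7)
c = np.linalg.solve(B.T @ K @ B, B.T @ F)       # Galerkin solve in the multiscale space
u_H = B @ c

assert np.max(np.abs(Q @ u - Q @ u_H)) < 1e-7   # q_j(u) = q_j(\tilde u_H)
```

Algebraically, Q u_H = Q K^{-1} Qᵀ c = Q K^{-1} F = Q u, which is exactly the discrete counterpart of the conservation statement.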

Example 3.10 (global error estimates).
Within the setting of Example 3.1 and with F = ( f , •) L 2 (D) , f ∈ L 2 (D), we get the following global error estimates for the discrete problem (3.8).
(a) For the choice q_T = ⨏_T • dx we obtain, with the Poincaré inequality on convex simplices (see Payne and Weinberger 1960, Bebendorf 2003), for w ∈ W the bound

‖w‖_{L²(T)} ≤ (H/π) ‖∇w‖_{L²(T)} for all T ∈ T_H,

since ⨏_T w dx = 0, and hence (3.9) holds with C_W = 1/π. Note that for f ∈ H¹(D) we can squeeze out one additional factor H due to the Poincaré inequality, since

|(f, w)_{L²(T)}| = |(f − ⨏_T f dx, w)_{L²(T)}| ≤ (H/π) ‖∇f‖_{L²(T)} ‖w‖_{L²(T)}.

In total, this leads to the estimate

‖u − ũ_H‖_V ≲ H^{1+k} ‖f‖_{H^k(D)} for f ∈ H^k(D), k ∈ {0, 1}.

At this point, the notation ≲ hides all constants that do not depend on the mesh size H. Corresponding estimates also exist in weaker norms for the case of more general right-hand sides F. In the present case V = H¹₀(D), the estimate is also true for k = −1.
(b) The bound for C_W from case (a) remains valid. Including polynomials of order p ≥ 1 leads to the improved bound C_W ≲ 1/p. Further, replacing Π⁰_H with the corresponding higher-order projection operator Π^p_H, we obtain, for f ∈ H^k(D) with k ≤ p + 1, approximation estimates of order H^k with a constant C(p, k) depending on p and k; see Schwab (1998, Chapter 4.5) or Di Pietro and Ern (2012, Chapter 1.4). In total, this yields estimates of the form

‖u − ũ_H‖_V ≲ C(p, k) H^{1+k} ‖f‖_{H^k(D)}.

(c) For the vertex-based quantities of interest from Example 3.1(c), we obtain estimates as in (a) for f ∈ L²(D) and accordingly for the more general case with F ∈ V*. For the proof we employ a well-known quasi-interpolation operator. We refer to Målqvist and Peterseim (2020, Chapter 3) for details; see also Section 4.3. Note that additional smoothness in the form of f ∈ H^k(D) for k ∈ {1, 2} cannot be fully exploited unless f ∈ H¹₀(D); see Målqvist and Peterseim (2020, Lemma 8.1).
(d) Finally, for the quantities of interest from Example 3.1(d), we can achieve the same estimates as in (c) but based on a different projection operator; see Peterseim (2014, 2015).

Characterization of a coarse-scale basis
As mentioned earlier, the construction of a local basis of the multiscale spaces is crucial. Our construction is based on a discrete subspace U_H which has a local basis and satisfies (1 − C)U_H = Ũ_H. We will see later that the complementary correctors are quasi-local, so that a local basis of U_H motivates a local (approximate) basis of Ũ_H. Before discussing this in detail, we need to quantify the term 'locality'. To do this, we introduce a (possibly artificial) regular triangulation T_H of D into closed simplices with mesh size H := sup_{T ∈ T_H} diam(T). For the sake of simplicity, this partition is assumed to be regular and quasi-uniform. To characterize the shape regularity (non-degeneracy) as well as the quasi-uniformity of T_H, we collect the corresponding mesh constants in a global parameter ρ > 0. For a union of elements S ⊆ D, we define patch neighbourhoods by

N¹(S) := ⋃ {T ∈ T_H | T ∩ S ≠ ∅} and N^r(S) := N¹(N^{r−1}(S)) for r ≥ 2. (3.10)

Based on the triangulation T_H, we define subsets on which the quantities of interest act. For this, we set D_j as a minimal (with respect to the number of elements) element patch (i.e. a connected union of elements in T_H) such that q_j(v) = 0 for all v ∈ V that vanish on D_j. In what follows, we assume that there is some (small) uniform parameter r ∈ N such that for each j ∈ J we can find a function u_j ∈ V with

supp(u_j) ⊆ N^r(D_j), (3.11a)
q_k(u_j) = δ_jk for all k ∈ J. (3.11b)

This means, in particular, that we assume the functions u_j to be local in the sense that supp(u_j) is not much larger than D_j. Without loss of generality, we assume that D_j is an element patch. In some sense, the functions u_j are local V-conforming representatives of the quantities of interest q_j, which will be useful in many proofs below. They define the space

U_H := span{u_j | j ∈ J}. (3.12)

Moreover, the functions u_j canonically lead to the definition

ũ_j := (1 − C) u_j.

We will show below that the ũ_j indeed provide a basis of Ũ_H. As already mentioned, it cannot be expected that the ũ_j are local. Nevertheless, we will observe in the upcoming examples that these functions have favourable decay properties; see Figure 3.2 below.
At this point we would like to emphasize that the basis functions u_j of U_H will be exploited extensively when studying the decay properties of the operators C and C*. However, they are not of practical interest, in particular in the implementation of the method. More precisely, the function (1 − C)u_j does not depend on the particular choice of the function u_j as long as the properties (3.11) are satisfied. To see this, we give an alternative characterization of these functions in the form of a saddle point problem: for k ∈ J seek ũ_k ∈ V and λ ∈ C^N such that

a(ũ_k, v) + Σ_{j ∈ J} λ_j q_j(v) = 0 for all v ∈ V,
q_j(ũ_k) = δ_jk for all j ∈ J.

This problem is well-posed due to condition (3.11b), the finite number of quantities of interest, and the inf-sup stability of a on W (the kernel of the constraints); see Braess (2007, Chapter III.4). Obviously the unique solution (ũ_k, λ) is characterized independently of u_k. Nevertheless, we show that ũ_k = (1 − C)u_k by proving that ((1 − C)u_k, λ) solves the saddle point problem for some λ ∈ C^N. It holds that

q_j((1 − C)u_k) = q_j(u_k) = δ_jk,

since Cu_k ∈ W, and a((1 − C)u_k, w) = 0 for all w ∈ W by the definition of C. Finally, we characterize the Lagrange multiplier λ with the help of the remaining test functions v ∈ V, which are complementary to W, for example the basis functions u_j of U_H: testing with v = u_j yields λ_j = −a((1 − C)u_k, u_j), which defines λ componentwise.
To illustrate some relevant examples for u j , we revisit the choices of q j from Example 3.1.
Example 3.11 (local bases u_j). Consider the setting of Example 3.1 and let P_k(T_H) denote the space of piecewise polynomials of degree k with respect to T_H. Let T ∈ T_H denote an element with vertices z_1, ..., z_{d+1} and corresponding nodal basis functions Λ_{z_j} (hat functions). We define the element bubble function

b_T := (∏_{j=1}^{d+1} Λ_{z_j}) / ⨏_T ∏_{j=1}^{d+1} Λ_{z_j} dx,

normalized such that ⨏_T b_T dx = 1. Such bubble functions are well known from the theory of mixed methods for the Stokes problem (Brezzi and Fortin 1991) as well as a posteriori finite element error analysis (Verfürth 1996). They satisfy the estimates

‖b_T‖_{L²(T)} ≈ |T|^{1/2} and ‖∇b_T‖_{L²(T)} ≈ H_T^{-1} |T|^{1/2},

where the notation ≈ means that we have estimates in both directions with constants independent of the mesh size H. With these, we may consider the following choices.
(a) For q_T = ⨏_T • dx a seemingly canonical choice for U_H is the space of piecewise constants spanned by the characteristic functions 1_T of the elements T ∈ T_H. These functions, however, are not part of V. Instead we need to construct V-conforming replacements. This is done in a two-step procedure. By definition we have that q_T(b_K) = δ_TK for T, K ∈ T_H. Although the b_T are a possible choice for the u_T, they are not optimal because they fail to approximate constants properly. We consider the following alternative. Let E_H denote the nodal averaging operator that maps a T_H-piecewise polynomial v to the continuous piecewise polynomial with nodal values (E_H v)(z) := |T(z)|^{-1} Σ_{T ∈ T(z)} (v|_T)(z) in the interior vertices z ∈ N_H. For boundary nodes we set (E_H v)(z) := 0. This operator is well known from the theory of domain decomposition (Oswald 1993, Brenner 1994). Further recall that Π⁰_H : V → P_0(T_H) denotes the L²-orthogonal projection onto the space of T_H-piecewise constants. The concatenation of the two operators defines a stable quasi-interpolation operator

I_H := E_H ∘ Π⁰_H.

However, due to the 'spread' of information by E_H, the quantities of interest are not preserved under I_H. To prevent this, we define a modified version, using the above bubble functions, namely

P_H := I_H + Σ_{T ∈ T_H} q_T((1 − I_H) •) b_T. (3.13)

Obviously we have q_T ∘ P_H = q_T. We therefore define

u_T := P_H 1_T. (3.14)

These functions satisfy condition (3.11b) as well as (3.11a) with r = 1; see the illustration in Figure 3.1.
(b) Analogous constructions are possible for the higher-order quantities of interest; the underlying averaging operators are analysed in detail in Ern and Guermond (2017).
(c) For the vertex-based quantities of interest from Example 3.1(c), we simply define u_z := Λ_z, which yields the standard P1 finite element space for U_H. Thus the subdomains D_z equal nodal patches of the triangulation.
(d) For the quantities of interest q_z from Example 3.1(d) with nodal basis functions Λ_z, we define the scaled mass matrix M ∈ C^{N×N} given by M_zy := q_z(Λ_y). With this, we set

u_z := Σ_{y ∈ N_H} (M^{-1})_{yz} Λ_y,

leading to the standard P1 finite element space for U_H as well. However, in contrast to case (c), the basis functions u_z are not local but only decay exponentially fast away from z (due to the uniformly bounded condition number of M). A local choice would be possible with some space U_H different from P_1(T_H) ∩ V. In this sense, the functionals of case (c) seem simpler and more natural. That is why we will omit case (d) from now on, although it represents the canonical choice in the original paper by Målqvist and Peterseim (2014).

Lemma 3.12 (characterization of coarse spaces).
Let the correctors C and C* be well-defined. Given (3.11), the functions u_j, j ∈ J, are linearly independent such that the dimension of U_H is equal to N. Further, we have

(1 − C) U_H = Ũ_H and (1 − C*) U_H = Ṽ_H.

Proof. For the linear independence assume that Σ_{j ∈ J} c_j u_j = 0 for coefficients c_j ∈ C. Due to (3.11b) the application of q_k yields

0 = q_k(Σ_{j ∈ J} c_j u_j) = Σ_{j ∈ J} c_j q_k(u_j) = c_k

for any k ∈ J. Thus all coefficients vanish and the dimension of U_H equals N. Since q_k((1 − C)u_j) = q_k(u_j) = δ_jk, the same argument applies to the functions (1 − C)u_j, so that (1 − C)U_H is an N-dimensional subspace of Ũ_H and hence equals Ũ_H; analogously for (1 − C*)U_H.

The previous lemma guarantees that the functions ũ_j := (1 − C)u_j indeed provide a basis of Ũ_H. An illustration of ũ_j for case (a) of Example 3.11 is given in Figure 3.2. Similarly, the functions ṽ_j := (1 − C*)u_j define a basis of Ṽ_H.

Remark 3.13 (variants of the method).
Due to (1 − C)U_H = Ũ_H and the 'orthogonality' relation from Theorem 3.5, we can always replace Ũ_H with U_H in the ideal discrete problem (3.8). This means that we may equivalently search for u_H ∈ U_H such that

a((1 − C) u_H, ṽ_H) = F(ṽ_H)

for all ṽ_H ∈ Ṽ_H. Obviously ũ_H is then given by (1 − C)u_H. Alternatively one may replace the test space Ṽ_H with U_H, which may facilitate the implementation of the method; see Elfverson, Ginting and Henning (2015) and Engwer, Henning, Målqvist and Peterseim (2019). Note that in these approaches the space U_H becomes relevant in practice. For case (c) in the previous examples, where U_H is a standard finite element space, this is very natural and may even lead to computationally superior schemes; see Section 4.4.
Lemma 3.12 motivates the definition of a projection operator P_H : V → U_H. With the (linearly independent) functions u_j, we set

P_H v := Σ_{j ∈ J} q_j(v) u_j. (3.15)

We would like to emphasize that this is exactly the inverse operator of (1 − C) and (1 − C*) if restricted to U_H; that is, for u_H ∈ U_H we have

P_H (1 − C) u_H = u_H = P_H (1 − C*) u_H.

From Lemma 3.12 we can conclude that ker P_H = W and thus (1 − P_H)v ∈ W for any v ∈ V. Since we also know that C is a projection onto W, we have

C (1 − P_H) = 1 − P_H.

Clearly the same equality holds for the operator C*. In the following analysis, we will assume that the projection operator P_H is stable in V. By this we mean that the continuity constant C_{P_H}, which satisfies

‖P_H v‖_V ≤ C_{P_H} ‖v‖_V for all v ∈ V,

is independent of the mesh parameter H. Note that this illustrates another requirement for the choice of the functions u_j. The resulting projections for the quantities of interest introduced in Example 3.1 are discussed in the following.
Example 3.14 (V-stable projection operator). In the setting of Example 3.1, we obtain the following projection operators.
(a) The V-stability of the operator P_H corresponding to the choice (3.14) can be extracted from Di Pietro and Ern (2012, Chapter 5.5.2), which analyses the averaging operator E_H, and a careful application of trace theorems; see e.g. Di Pietro and Ern (2012, Chapter 1.1). Due to the finite overlap of the mesh elements, which can be bounded in terms of ρ, we conclude local V-stability of the form

‖P_H u‖_{V,T} ≲ ‖u‖_{V, N¹(T)}.

Finally, we would like to emphasize that the choice u_T := b_T discussed in Example 3.11 would not lead to a V-stable projection operator, since the estimate would include a negative power of H.
(b) In a similar manner, one can show the (local) V-stability of P H in the higherorder case. To do this, the operator I H applied in (a) needs to be adapted accordingly, which is fully analysed in Ern and Guermond (2017).

Exponential decay of correctors
This subsection is devoted to the observed decay properties of the corrector C; see Figure 3.3. Based on geometrical parameters of the quantities of interest and local stability assumptions, we prove an analytical decay result, in which we apply the particular proof technique established in Gallistl and Peterseim (2015) and Peterseim (2017). This then motivates the definition of local correctors in Section 3.6 and a localized numerical homogenization method in Section 3.7. To keep the derivations as short and clear as possible, we will make extensive use of the notation ≲, by which we hide generic constants that do not depend on the mesh size parameter H.
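The decay can be observed directly in a discrete 1D experiment (our sketch; all concrete choices are illustrative, and the Euclidean norm of the nodal values serves as a simple proxy for the energy of the tail). We compute a corrected basis function for the central coarse cell and measure how much of it lives outside patches of r coarse layers.

```python
import numpy as np

# Corrector decay in 1D: take a local representative u_j of the (lumped)
# integral mean on the central coarse cell, compute \tilde u_j = (1 - C) u_j
# and record the relative tail outside r coarse layers around the cell.
nf, nc = 512, 16
h = 1.0 / nf
xm = (np.arange(nf) + 0.5) * h
A = 1.0 + 0.9 * np.sin(2 * np.pi * xm / 0.01)     # rough coefficient

d = A[:-1] + A[1:]                                 # P1 stiffness matrix
K = (np.diag(d) - np.diag(A[1:-1], 1) - np.diag(A[1:-1], -1)) / h

Q = np.zeros((nc, nf - 1))                         # lumped element means
for i in range(nf - 1):
    Q[int((i + 1) * h * nc), i] += h * nc

Wb = np.linalg.svd(Q)[2][nc:].T                    # basis of W = ker Q

j = nc // 2                                        # central coarse cell
u = np.zeros(nf - 1)
u[(2 * j + 1) * nf // (2 * nc) - 1] = 1.0          # fine hat at the centre of cell j
u_t = u - Wb @ np.linalg.solve(Wb.T @ K @ Wb, Wb.T @ K @ u)   # (1 - C) u_j

nodes = np.arange(1, nf) * h
tails = []
for r in (1, 2, 3, 4):
    outside = np.abs(nodes - (j + 0.5) / nc) > r / nc
    tails.append(np.linalg.norm(u_t[outside]) / np.linalg.norm(u_t))
print(["%.2e" % t for t in tails])                 # rapidly decreasing with r
```

The printed tails shrink quickly as the patch radius r grows, which is the behaviour the subsequent analysis quantifies as exponential decay.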
We make the following standing assumption: problem (3.1) is well-posed, the quantities of interest q_j are linearly independent, there exist functions u_j satisfying (3.11), and the correctors C, C* are well-defined. Further, in order to avoid certain technical discussions, we assume that ηv ∈ V for all v ∈ V and η ∈ W^{1,∞}(D). Note that this is trivially satisfied for V = H¹₀(D). For some applications such as the Neumann Laplacian with possibly oscillatory diffusion coefficients, where a global mean value constraint needs to be satisfied, minor modifications of the upcoming proofs are necessary; see Henning and Målqvist (2014).
Let $r$ denote the maximal relative diameter of the (local) supports of the functions $u_j$, measured in layers of the mesh $\mathcal{T}_H$. We can think of $r$ as the maximum number of grid layers on which a $u_j$ is supported. This parameter bounds the spread of the projection operator $P_H$ in the following sense. Consider a function $v \in V$ with $\operatorname{supp}(v) \subseteq S$, where $S$ denotes some patch in $\mathcal{T}_H$. Then the application of $P_H$ can only enlarge the support by at most $r$ layers, i.e. $\operatorname{supp}(P_H v) \subseteq N^r(S)$ with the neighbourhood defined in (3.10). Note that the definition of $r$ provides a simplified yet pessimistic bound on the spread. This choice aims to facilitate the understanding of the subsequent arguments rather than to optimize the results.
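The growth of supports by element layers can be illustrated by a small sketch (our own construction on a uniform quadrilateral grid; the function `patch` and the grid layout are hypothetical and only serve to visualize how the neighbourhoods $N^\ell(S)$ expand):

```python
# Hypothetical sketch: element patches N^ell(S) on a uniform nx-by-ny grid of
# elements, where one application of N(.) adds all elements sharing a vertex.
import itertools

def neighbors(cell, nx, ny):
    """All grid cells sharing at least a vertex with `cell` (including itself)."""
    i, j = cell
    return {(i + di, j + dj)
            for di, dj in itertools.product((-1, 0, 1), repeat=2)
            if 0 <= i + di < nx and 0 <= j + dj < ny}

def patch(S, ell, nx, ny):
    """Compute N^ell(S): ell layers of vertex-neighbouring elements around S."""
    current = set(S)
    for _ in range(ell):
        current = set().union(*(neighbors(c, nx, ny) for c in current))
    return current

S0 = {(4, 4)}                     # one element in the middle of an 8x8 grid
print(len(patch(S0, 1, 8, 8)))    # → 9   (a 3x3 block of elements)
print(len(patch(S0, 2, 8, 8)))    # → 25  (a 5x5 block)
```

Near the boundary the patches are truncated, which is exactly the situation of the neighbourhoods $N^\ell$ used in the localization arguments below.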
As already mentioned, we need local versions of the previous continuity, stability and approximation assumptions. Considering the derivations in Example 3.10 in detail, one can observe that all estimates have a local character. Based on this, we assume the existence of a constant $C_W^{\mathrm{loc}}$ such that the local approximation property (3.17) holds for all $w \in W$ and any finite element patch $S$. Similarly, Example 3.14 has shown the local character of the $V$-stability of the projection operator $P_H$. Accordingly, we assume a corresponding local stability estimate for $v \in V$ and a finite element patch $S$. For an abstract decay result, we further assume that the sesquilinear form $a$ is locally bounded as well as locally inf-sup stable on $W$. The first property means that there exists a constant $C_a^{\mathrm{loc}} > 0$ such that a local boundedness estimate holds for $v_1, v_2 \in V$ with $\operatorname{supp}(v_1) \subseteq S_1$ and $\operatorname{supp}(v_2) \subseteq S_2$. Note that such a condition is typically satisfied for the weak formulation of linear second-order PDEs, where $a$ is given by integrals of products of two functions or their derivatives. The second property considers local versions of $W$ on patches, namely $W(S) := \{w \in W : \operatorname{supp}(w) \subseteq S\}$. Here we assume that there exists a constant $\alpha_W^{\mathrm{loc}} > 0$ (uniform for all patches) such that the local inf-sup condition (3.20) holds. Recall that such a condition is automatically satisfied with $\alpha_W^{\mathrm{loc}} \ge \alpha_W \ge \alpha_V$ in the case that $a$ defines a coercive sesquilinear form. With these local versions of the continuity and stability of the sesquilinear form $a$, one can derive a local version of the error estimates in Theorem 3.9.
We would like to focus on the exponential decay property of the operator $C$ that was pioneered in Målqvist and Peterseim (2014).

Theorem 3.15 (exponential decay of the corrector). There exists a rate $c > 0$, depending only on $r$, the mesh regularity and the local stability and continuity constants, such that for all $v \in V$ and all $\ell \in \mathbb{N}$ we have $\|C v\|_{V,\, D \setminus N^{\ell}(D_v)} \lesssim \exp(-c\ell)\, \|v\|_V$.

Proof. We follow the proof given in Maier (2020, Chapter 2.4) which, in turn, follows Gallistl and Peterseim (2015) and Peterseim (2016). Let $\ell \in \mathbb{N}$ with $\ell \ge 7r$ be fixed. Define a finite element cut-off function $\eta \in W^{1,\infty}(D; [0,1]) \cap P^1(\mathcal{T}_H)$ which vanishes on $N^{\ell-1}(D_v)$ and equals one outside a slightly larger neighbourhood. Due to the shape regularity of the underlying mesh, this directly implies that $H \|\nabla \eta\|_{L^\infty(D)} \le \rho$. The idea is to apply $\eta$ in order to cut off the part of $C v$ which lies in $N^{\ell-1}(D_v)$. Since the product $\eta\, C v$ is no longer an element of $W$, we need to project it back to $W$ with the help of $P_H$; this enlarges the support by $r$ additional element layers. By the local inf-sup stability (3.20), there exists a suitable test function $w$. Considering the supports of $w$ and $v$, we conclude with $\ell \ge r + 1$ that $a(v, w) = 0$. For the second term, we apply once more the local stability; it is then sufficient to estimate $(1 - P_H)\bigl((1 - \eta)\, C v\bigr)$ within the ring-shaped domain $R$. First, we apply the (local) $V$-continuity of $P_H$ from (3.18). Second, we consider a single simplex $T \in \mathcal{T}_H$ on which we apply the triangle inequality. Using $\bar\eta := 1 - \eta$ and $z := C v$, we obtain with (3.17) an elementwise bound whose constant depends only on $\rho$. Summing this estimate over all $T$ in the current patch, we conclude in total the existence of a constant $C > 0$ such that the energy of $C v$ outside $N^{\ell}(D_v)$ is bounded by $C^2$ times its energy in the ring $R$.

Repeated application of this estimate finally gives, after $k$ steps of $6r+1$ layers each, $\|C v\|_{V,\, D\setminus N^{k(6r+1)}(D_v)} \le \gamma^k\, \|C v\|_{V}$ with the rate $\gamma := C/\sqrt{1 + C^2} < 1$. Thus the optimal choice is $k = \lfloor \ell/(6r+1) \rfloor$, which allows a reformulation of the bound in terms of $\ell$. With $\ell \ge 7r$ this yields the assertion. Finally, we need to discuss the result for $\ell < 7r$. Since we trivially have $\|C v\|_{V,\, D\setminus N^{\ell}(D_v)} \le \|C v\|_{V}$, the assertion remains valid if the (hidden) multiplicative constant is chosen as $\exp(\frac{1}{7} |\log \gamma|)$.

Remark 3.16. The exponential decay of Theorem 3.15 applies to the corrector $C^*$ in the same way, following an almost identical proof.
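The contraction step behind the rate $\gamma$ in the proof above can be sketched as follows (with $m$ denoting the layer increment per step, and $C$ the constant from the ring estimate; this is a generic absorption argument, not the paper's verbatim derivation):

```latex
\|C v\|_{V,\,D\setminus N^{\ell}(D_v)}^2
  \le C^2 \bigl( \|C v\|_{V,\,D\setminus N^{\ell-m}(D_v)}^2
               - \|C v\|_{V,\,D\setminus N^{\ell}(D_v)}^2 \bigr)
\;\Longrightarrow\;
\|C v\|_{V,\,D\setminus N^{\ell}(D_v)}^2
  \le \frac{C^2}{1+C^2}\, \|C v\|_{V,\,D\setminus N^{\ell-m}(D_v)}^2
  = \gamma^2\, \|C v\|_{V,\,D\setminus N^{\ell-m}(D_v)}^2 .
```

Iterating this estimate $k$ times over disjoint layer increments produces the factor $\gamma^k$.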

Localized correctors
The exponential decay result of the previous subsection motivates the definition of local versions of the correctors $C$ and $C^*$. More precisely, we aim to construct localized operators $C_\ell \colon V \to W$ and $C^*_\ell \colon V \to W$ whose images spread only over a bounded number of element layers around the support of the input. This means that we search for approximations of $C$ and $C^*$ which only spread information locally. Let us discuss a possible definition of such localized correctors.
To this end, we define the fine-scale functions $\varphi_j := C u_j$ and $\psi_j := C^* u_j$. Recall that, due to the orthonormality (3.11b) (see also (3.15)), we have $u_H = \sum_{j \in J} q_j(u_H)\, u_j$ for any function $u_H \in U_H$. Consequently, we can characterize the application of $C$ to a function in $U_H$ by $C u_H = \sum_{j \in J} q_j(u_H)\, \varphi_j$, and analogously for $C^*$ in terms of the $\psi_j$. Hence a canonical choice of $C_\ell$ replaces the $\varphi_j$ with local functions $\varphi_j^\ell \in W(N^\ell(D_j))$, which solve
$a(\varphi_j^\ell, w) = a(u_j, w)$ for all $w \in W(N^\ell(D_j))$. (3.22)
Note that this problem is well-posed due to the local stability condition (3.20). In the same way, the functions $\psi_j^\ell \in W(N^\ell(D_j))$ are defined by the variational problem $a(w, \psi_j^\ell) = a(w, u_j)$ for all $w \in W(N^\ell(D_j))$. This then leads to the local versions $C_\ell, C^*_\ell \colon U_H \to W$, namely
$C_\ell u_H := \sum_{j \in J} q_j(u_H)\, \varphi_j^\ell$ and $C^*_\ell u_H := \sum_{j \in J} q_j(u_H)\, \psi_j^\ell$. (3.23)
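To make the construction concrete, the following self-contained sketch (our own 1D illustration, not code from the paper) computes the ideal corrector and its localized approximations for $a(u,v) = \int_0^1 A\, u' v'\,\mathrm{d}x$ with element averages as quantities of interest; all mesh sizes, the coefficient and the choice of input function are arbitrary:

```python
# 1D sketch: ideal vs. localized correctors with element-average constraints.
import numpy as np

n_f, n_c = 256, 8                      # fine / coarse elements on (0, 1)
h, H, r = 1.0 / n_f, 1.0 / n_c, 256 // 8
A = np.random.default_rng(0).uniform(0.1, 1.0, n_f)   # rough coefficient

# fine P1 stiffness matrix on interior nodes (homogeneous Dirichlet)
K = np.zeros((n_f - 1, n_f - 1))
for e in range(n_f):
    for i in (e - 1, e):
        for j in (e - 1, e):
            if 0 <= i < n_f - 1 and 0 <= j < n_f - 1:
                K[i, j] += A[e] / h * (1.0 if i == j else -1.0)

# constraint matrix: row T = integral mean over coarse element T
Q = np.zeros((n_c, n_f - 1))
for T in range(n_c):
    for e in range(T * r, (T + 1) * r):
        for i in (e - 1, e):
            if 0 <= i < n_f - 1:
                Q[T, i] += h / (2.0 * H)

def corrector(v, dofs):
    """phi supported on `dofs` with vanishing averages and a(phi,w)=a(v,w)."""
    Kd, Qd = K[np.ix_(dofs, dofs)], Q[:, dofs]
    Qd = Qd[np.abs(Qd).sum(axis=1) > 1e-12]   # drop constraints outside patch
    m = Qd.shape[0]
    S = np.block([[Kd, Qd.T], [Qd, np.zeros((m, m))]])
    rhs = np.concatenate([K[dofs] @ v, np.zeros(m)])
    phi = np.zeros(n_f - 1)
    phi[dofs] = np.linalg.solve(S, rhs)[: len(dofs)]
    return phi

def patch_dofs(ell):
    """Interior fine dofs of N^ell(T4), T4 being the element containing supp(v)."""
    lo_e, hi_e = max(0, 4 - ell), min(n_c, 5 + ell)
    lo = lo_e * r + 1 if lo_e > 0 else 1
    hi = hi_e * r - 1 if hi_e < n_c else n_f - 1
    return np.arange(lo - 1, hi)

v = np.zeros(n_f - 1)
v[4 * r + r // 2 - 1] = 1.0               # fine hat inside coarse element T4
phi = corrector(v, np.arange(n_f - 1))    # ideal corrector C v
err = lambda p: np.sqrt((phi - p) @ K @ (phi - p))
errs = [err(corrector(v, patch_dofs(ell))) for ell in (1, 2, 3)]
print(errs)   # energy-norm localization errors, decaying in ell
```

Since the patches are nested, each localized corrector is the $a$-orthogonal projection of the ideal corrector onto the patch space, so the energy-norm localization error is monotone and, in practice, decays rapidly in $\ell$.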

Remark 3.17 (alternative localization of correctors).
The above definition of a localized corrector is natural but not always optimal with regard to numerical stability properties. For example, for the case of hat functions as in case (c) of Example 3.1, there are better choices; see Henning and Peterseim (2013) and Henning and Målqvist (2014) for a detailed discussion. It is also possible to compute localized approximations of the ideal basis functions (1 − C)u j by preconditioned iterative solvers based on domain decomposition. An approach in the spirit of additive Schwarz methods is presented in Kornhuber and Yserentant (2016) and Kornhuber et al. (2018).
For the following estimates of the error caused by the localization, i.e. of the difference $C - C_\ell$, we need one additional bound. For this, consider an arbitrary $u_H \in U_H$ and its unique representation $u_H = \sum_{j \in J} c_j u_j$. We assume that there exists an exponent $s \ge 0$ such that the stability bound (3.24) holds. As before, we inspect the assumed bound for the finite element-based examples.

Example 3.18 (stability of basis representation).
In the setting of Example 3.1, the exponent s in (3.24) can be characterized as follows.
(a) Since the functions $u_T \in U_H$ (as defined in (3.14)) are similar to the usual hat functions, it is no surprise that such a decomposition is stable in $L^2(D)$ but not in $V$. This means that we expect an exponent $s > 0$; indeed, we show that $s = 1$. First note that $u_T$ is a piecewise polynomial of order $d + 1$, so that it satisfies an inverse inequality of the form $\|u_T\|_{V,T} \lesssim H^{-1} \|u_T\|_{L^2(T)}$. With the bubble function $b_T$ introduced in Example 3.11 and the property $u_T = P_H b_T$ (with the $L^2$-stable projector $P_H$), the corresponding estimate follows. (c) With $u_z = \Lambda_z$ we have, again similarly to (a), $s = 1$. More precisely, we obtain the bound for $u_H = \sum_{z \in N_H} c_z \Lambda_z$ due to an inverse inequality (Graham, Hackbusch and Sauter 2005) and the moderate overlap of the hat functions.

Based on the previous results and assumption (3.24), we now show that the error resulting from $C - C_\ell$ decays exponentially fast in the parameter $\ell$.

Theorem 3.19 (error estimate of localized corrector). The localized operator $C_\ell \colon U_H \to W$ satisfies, for all $u_H \in U_H$, the error estimate
$\|(C - C_\ell)\, u_H\|_V \lesssim H^{-s} \sqrt{N}\, \exp(-c\ell)\, \|u_H\|_V.$
The rate $c$ depends on $r$, $\gamma$ (see Theorem 3.15), the mesh regularity, and all local stability and continuity constants. For the localized operator $C^*_\ell \colon U_H \to W$ a similar estimate holds.
Proof. We only show the result for $C_\ell$ and first consider $\ell > 4r$. For a fixed index $j \in J$, recall the definitions of $\varphi_j = C u_j \in W$ and $\varphi_j^\ell \in W(N^\ell(D_j))$; see (3.22). This means, in particular, that $\varphi_j^\ell$ is a Galerkin approximation of $\varphi_j$ in the subspace $W(N^\ell(D_j)) \subseteq W$. Hence we can apply the quasi-optimality of the method, which yields an error estimate in terms of $\|\varphi_j - w_\ell\|_V$ for any $w_\ell \in W(N^\ell(D_j))$. We set $w_\ell := (1 - P_H)(\eta\, \varphi_j)$ with a finite element cut-off function $\eta \in W^{1,\infty}(D; [0,1]) \cap P^1(\mathcal{T}_H)$ satisfying appropriate support conditions. Since the application of $P_H$ can only enlarge the support by $r$ layers, this particular choice ensures that $\operatorname{supp}(w_\ell) \subseteq N^\ell(D_j)$ and thus $w_\ell \in W(N^\ell(D_j))$. An estimate in the spirit of (3.21) then bounds the error by the norm of $\varphi_j$ outside a neighbourhood of $D_j$. Since $\varphi_j = C u_j$, we can apply Theorem 3.15. Thus there exists a rate $c$ (depending on $r$ and $\gamma$ from Theorem 3.15) such that $\|\varphi_j - \varphi_j^\ell\|_V \lesssim \exp(-c\ell)\, \|u_j\|_V$. The combination of these estimates for all $j \in J$ then yields the claimed bound for $u_H \in U_H$, where we have used the continuity of $C$ as well as (3.24). Finally, the result for $\ell \le 4r$ follows again by an adjustment of the (hidden) constant.
Remark 3.20. The appearance of the multiplicative constant $H^{-s}\sqrt{N}$ in the estimate of Theorem 3.19 is caused by the very general setting. This constant is not critical and can easily be compensated by the exponential term. The factor $\sqrt{N}$ is the result of the simplified proof presented here; using a more careful analysis that exploits the decay properties of the corrector, it can be avoided. The factor $H^{-s}$ is usually necessary, given the simple construction of the localized correction operators in (3.23). However, using more sophisticated constructions, this factor may be reduced to $\ell^{(d-1)/2}$; see Henning and Peterseim (2013) and Peterseim (2017). This applies at least to case (c) of Example 3.1, where an estimate without any negative orders of $H$ can be achieved; see Section 4.3. In an ideal situation, the negative powers of $H$ may even be replaced by a factor of order one; see Kornhuber et al. (2018). This, however, is based on a slightly more involved localized correction operator built on preconditioned iterative solvers.
With the definition of localized correctors, we are now interested in approximating the discrete problem (3.8) by a localized version. This is the subject of the following subsection.

Localized numerical homogenization
As a direct consequence of the local correctors $C_\ell$ and $C^*_\ell$ introduced in the previous subsection, we define the spaces $\tilde U_H^\ell := (1 - C_\ell)\, U_H$ and $\tilde V_H^\ell := (1 - C^*_\ell)\, U_H$. We would like to emphasize that the functions $\tilde u_k := (1 - C_\ell)\, u_k$, $k \in J$, can again be characterized in the form of a saddle point problem which does not depend on $u_k$. To be precise, there exists a multiplier $\tilde\lambda$ such that the pair $(\tilde u_k, \tilde\lambda) \in V(N^\ell(D_k)) \times \mathbb{C}^N$ solves a constrained local problem, where $V(N^\ell(D_k))$ denotes the subspace of functions $v \in V$ with $\operatorname{supp}(v) \subseteq N^\ell(D_k)$. Note, however, that the multiplier $\tilde\lambda$ is in general not unique, since the quantities of interest may be linearly dependent if restricted to the subdomains $N^\ell(D_k)$. To get a unique solution, one needs to restrict the number of constraints to a suitable subset of $J$ depending on $\ell$.
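A plausible form of this localized saddle point problem reads as follows (a sketch under the convention that the constraints are imposed through the quantities of interest; normalization details may differ from the precise system in the text):

```latex
\text{Find } (\tilde u_k, \tilde\lambda) \in V(N^\ell(D_k)) \times \mathbb{C}^N
\text{ such that}
\begin{aligned}
a(\tilde u_k, v) + \sum_{j \in J} \tilde\lambda_j\, \overline{q_j(v)} &= 0
  && \text{for all } v \in V(N^\ell(D_k)), \\
q_i(\tilde u_k) &= q_i(u_k) = \delta_{ik} && \text{for all } i \in J.
\end{aligned}
```

The second line uses the orthonormality (3.11b), which turns the right-hand side of the constraints into Kronecker deltas.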
Remark 3.21. Similarly to Lemma 3.12, one can show that the spaces $\tilde U_H^\ell$ and $\tilde V_H^\ell$ have dimension $N$. In particular, this means that $(1 - C_\ell)$ is a bijection mapping from $U_H$ to $\tilde U_H^\ell$. The inverse is again given by the projection operator $P_H$. Analogously, $(1 - C^*_\ell) \colon U_H \to \tilde V_H^\ell$ also defines a bijection.
In this subsection we are concerned with a localized version of the Petrov-Galerkin scheme (3.8), which reads as follows: seek $\tilde u_H^\ell \in \tilde U_H^\ell$ such that
$a(\tilde u_H^\ell, \tilde v_H^\ell) = F(\tilde v_H^\ell)$ for all $\tilde v_H^\ell \in \tilde V_H^\ell$. (3.27)
For the well-posedness, we need once more the inf-sup stability of $a$, now restricted to the localized discrete spaces.

Lemma 3.22 (inf-sup stability).
Assume the localization parameter $\ell$ to be sufficiently large in the sense that $\ell \gtrsim \log(H^{-s}\sqrt{N})$. Then the sesquilinear form $a$ satisfies an inf-sup condition for the spaces $\tilde U_H^\ell$ and $\tilde V_H^\ell$.
Proof. Let $\tilde u_H^\ell \in \tilde U_H^\ell$ and define $\tilde u_H := (1 - C) P_H \tilde u_H^\ell \in \tilde U_H$. Then, by the discrete inf-sup stability shown in Section 3.3, there exists $\tilde v_H \in \tilde V_H$ realizing the inf-sup bound for $\tilde u_H$. Let $\tilde v_H^\ell := (1 - C^*_\ell) P_H \tilde v_H \in \tilde V_H^\ell$. Due to $(1 - C) P_H \tilde u_H = \tilde u_H$ (and analogously for $\tilde v_H$), the difference between the localized and the ideal bilinear forms can be split into two terms. Due to $P_H \tilde v_H^\ell = P_H (1 - C^*_\ell) P_H \tilde v_H = P_H \tilde v_H$, we know that $\tilde v_H^\ell - \tilde v_H \in W$, so that the second term vanishes. For the first term, we estimate with the localization error of Theorem 3.19. Thus, for sufficiently large $\ell$ we conclude the inf-sup stability of $a$ for the spaces $\tilde U_H^\ell$ and $\tilde V_H^\ell$.
Since we now know that the discrete problem (3.27) is uniquely solvable, we can discuss the resulting error compared to the ideal discrete solution $\tilde u_H$.

Theorem 3.23 (error in quantities of interest).
Let $u \in V$ denote the exact solution, $\tilde u_H \in \tilde U_H$ the solution of the ideal discrete problem (3.8), and $\tilde u_H^\ell \in \tilde U_H^\ell$ the solution of (3.27). Then the error $\tilde u_H - \tilde u_H^\ell$, measured in the quantities of interest, satisfies a bound that decays as $\exp(-c\ell)$, where $c$ denotes the rate introduced in Theorem 3.19.
Proof. Let $e := \tilde u_H - \tilde u_H^\ell$. Since $P_H (1 - C)$ equals the identity for functions in $U_H$, the discrete inf-sup stability for $\tilde U_H$, $\tilde V_H$ discussed at the beginning of Section 3.3 provides a first bound on $e$. All together, this yields an estimate in terms of the localization error of Theorem 3.19. Together with $\|u - \tilde u_H\|_V \lesssim \|u\|_V$, this gives the first assertion. Finally, since $(1 - P_H)$ maps into $W$, which is the kernel of the quantities of interest, the quantities of interest of $e$ coincide with those of $P_H e$, and we can apply the previous estimate.
The previous result shows that the error $\tilde u_H - \tilde u_H^\ell$ decays exponentially in $\ell$ if measured in the quantities of interest. Considering the full error, we obtain the following result.
Theorem 3.24 (full error of localized numerical homogenization). Let $u \in V$ denote the exact solution and $\tilde u_H^\ell \in \tilde U_H^\ell$ the solution of (3.27). Then the difference $u - \tilde u_H^\ell$ satisfies an error estimate combining the ideal error of Theorem 3.9 with an exponentially decaying localization contribution. Proof. The quasi-optimality of the Petrov-Galerkin method (3.27) yields, for $\tilde u_H \in \tilde U_H$ solving (3.8) and arbitrary $\tilde v_H^\ell \in \tilde U_H^\ell$, a bound of the error in terms of $\|u - \tilde v_H^\ell\|_V$. The first term on the right-hand side can be bounded with the help of Theorem 3.9. For the particular choice $\tilde v_H^\ell := (1 - C_\ell) P_H \tilde u_H$, we can show as in the proof of Theorem 3.23 that the remaining term decays exponentially in $\ell$. Together with the a priori estimate $\|u\|_V \lesssim \|F\|_{V^*}$, this yields the claim.
We end this section by looking once more at the finite element-based examples and the resulting error estimates.
(b) In the higher-order case with $s = 1$, we obtain a corresponding estimate for $f \in H^k(D)$. (c) We obtain the same error bound as in (a) for $k \in \{-1, 0\}$, and for $k = 1$ if $F = (f, \cdot)_{L^2(D)}$ with $f \in H^1_0(D)$. Note, however, that when switching to a computationally efficient Petrov-Galerkin method that combines the multiscale space with the standard space $U_H$ (see Remark 3.13), any higher-order estimates for $f \in H^k(D)$ with $k > 0$ are lost. This holds for (a), (b) and (c) equally.
After this detailed discussion of the abstract variational framework, which yields a general approach for numerical homogenization, we now turn to more specific applications. We start with the prototypical example of the Poisson problem with a rapidly oscillating diffusion coefficient.

Application to the elliptic model problem
In this section we apply the abstract framework introduced in Section 3 to a prototypical second-order model problem. More precisely, we consider the elliptic diffusion problem (1.2) with a multiscale coefficient.
Let $D \subseteq \mathbb{R}^d$ be a bounded Lipschitz domain with $d \in \{1, 2, 3\}$, $f \in L^2(D)$ a given source term, and $V := H^1_0(D)$. Further, let $C_F > 0$ denote the Friedrichs constant, i.e. the constant such that $\|v\|_{L^2(D)} \le C_F \|\nabla v\|_{L^2(D)}$ for all $v \in V$. We consider the Poisson problem with homogeneous Dirichlet boundary condition in the variational formulation, that is, we seek $u \in V$ such that
$a(u, v) := \int_D A \nabla u \cdot \nabla v \,\mathrm{d}x = \int_D f\, v \,\mathrm{d}x$
for all test functions $v \in V$. Therein, $A \in L^\infty(D, \mathbb{R}^{d \times d})$ denotes a symmetric multiscale coefficient that is elliptic in the sense that there are real constants $0 < \alpha \le \beta$ such that $\alpha |\xi|^2 \le A(x)\xi \cdot \xi$ and $|A(x)\xi| \le \beta |\xi|$ for all $\xi \in \mathbb{R}^d$ and almost every $x \in D$. This directly implies the ellipticity of $a$ and hence the existence of all the assumed inf-sup constants from the previous section. Note, however, that the ellipticity constant may improve on subspaces such as $W$, leading to larger values of $\alpha_W$. Moreover, $a$ is (locally) continuous with $C_a = C_a^{\mathrm{loc}} = \beta$. The following subsections analyse the quantities of interest which were already introduced in Example 3.1. These approaches generate well-known multiscale methods, and owing to the very general structure introduced in Section 3, some of the abstract estimates can be improved in the particular settings. It is interesting to mention that, following the theory of operator-adapted wavelets (Owhadi and Scovel 2019), the variational framework can be generalized to arbitrary bounded elliptic operators that satisfy a suitable notion of locality. In this sense, numerical homogenization is universal to all elliptic operators and, at the same time, enables fast algorithms for their compression.

P 0 -based numerical homogenization
This subsection is devoted to the multiscale method which arises from Example 3.1(a). Recall that the quantities of interest are given by integral means over the simplices of a given triangulation $\mathcal{T}_H$. Hence the fine-scale space $W$ consists of functions in $V$ whose integral means vanish on all simplices. As a result, the local approximation property (3.17) is satisfied due to the Poincaré inequality; see Payne and Weinberger (1960) and Bebendorf (2003). For the theory, we have chosen the $u_T$ to be certain $P^1(\mathcal{T}_H)$-functions enriched by bubble functions; see the construction in Example 3.11 and the illustration in Figure 3.1.
In total, the abstract variational framework yields an error estimate of order $k + 1$ for right-hand sides $f \in H^k(D)$, $k \in \{0, 1\}$. Although this case is not treated by Maier (2020), we believe that the arguments therein can be used to avoid the factor $H^{-d/2}$ even in the $P^0$-case. In any case we obtain a numerical homogenization method which is of order $k + 1$ as long as the localization parameter satisfies $\ell \gtrsim |\log H|$. Figure 4.1 illustrates the method applied to the elliptic model problem (4.1) with the multiscale coefficient presented in Example 1.1. The considered mesh size $H = 1/8$ leads to a very poor approximation by the standard $P^1$-FEM. In contrast, the numerical homogenization approach yields an indistinguishable approximation of the exact solution $u$ for all $\ell \ge 1$. Moreover, the integral means of $u$ over elements $T \in \mathcal{T}_H$ are preserved to high (exponential-in-$\ell$) accuracy by $\tilde u_H^\ell$, and even exactly by the ideal approximation $\tilde u_H$ by construction. Note that the $u_T$ are not needed for practical computations, and the multiscale basis functions $\tilde u_T$ can be computed directly using (3.25) with the index set $J$ restricted to the labels of the elements that form the patch $N^\ell(T)$.
The above attempt to approximate second-order PDEs with piecewise constants is certainly one of the simplest possible in the given framework. It is interesting to notice that, for a constant diffusion coefficient in one dimension, it recovers the Galerkin method with quadratic B-splines. The approach was first considered by Owhadi (2017) within the general framework of operator-adapted wavelets (gamblets). For surprising reinterpretations in a game-theoretical framework and optimal recovery splines, we refer to the corresponding monograph by Owhadi and Scovel (2019). Therein, the methodology is generalized in a multi-level context where the use of piecewise constants is particularly meaningful. The space $P^0(\mathcal{T}_H)$ that gives rise to the macroscopic quantities of interest is equipped with the $L^2(D)$-orthogonal Haar basis with respect to a hierarchy of nested meshes $\mathcal{T}_0, \mathcal{T}_1, \ldots, \mathcal{T}_L = \mathcal{T}_H$. Correspondingly, the quantities $q_T$ are replaced by $L^2(D)$-inner products with Haar wavelets on the different levels. The advantage is that such an approach of operator-adapted wavelets not only achieves numerical homogenization as above but simultaneously also preconditions the resulting coarse system, so that it can be easily inverted with standard iterative solvers such as the conjugate gradient method (without any further preconditioning). In a language that is close to the one used here, this is also covered in Feischl and Peterseim (2020), where the multilevel wavelet structure is actually a crucial tool to preserve sparsity in the context of stochastic homogenization problems. We shall also mention that the analysis of operator-adapted wavelets for the uniformly elliptic and symmetric setting of this section typically employs the iterative variant of the localized correction operator, as proposed by Kornhuber and Yserentant (2016) and Kornhuber et al. (2018).
On the theoretical level, the iterative approach is compared with the variant of this paper in Målqvist and Peterseim (2020, Chapter 4.3). While the iterative version certainly leads to the neatest and most elegant error analysis, the variant of this paper is well understood for a much larger class of applications, including non-Hermitian ones; see Section 5.

Higher-order numerical homogenization
If we assume (element-wise) higher regularity of the right-hand side $f$, then it may be worthwhile to apply higher-order methods. Based on local polynomials of degree $p$, we have discussed suitable quantities of interest in Example 3.1(b). The resulting error estimate from Example 3.25 indicates that the numerical homogenization method yields an approximation $\tilde u_H^\ell$ of correspondingly higher order for $f \in H^k(D)$, $k \le p + 1$, if the localization parameter is chosen such that $\ell \gtrsim |\log H|$. The method was first discussed for the case $p = 1$ in Elfverson, Georgoulis and Målqvist (2013a) and Elfverson, Georgoulis, Målqvist and Peterseim (2013b). The higher-order generalization is included in the aforementioned abstract theory of operator-adapted wavelets (Owhadi 2017, Owhadi and Scovel 2019). A complete and detailed error analysis for the elliptic model problem (4.1) (on rectangular meshes) has been derived in Maier (2020, 2021). A very interesting question is the dependence of $\ell$ on the polynomial degree $p$, which is made explicit in Maier (2021). In theory, $\ell$ needs to grow with $p$, but numerical experiments indicate that this may not be necessary. On the other hand, the negative power of $H$ in front of the exponentially decaying term is not the result of an inaccurate analysis but actually occurs numerically. This has been discussed extensively in Maier (2020, Chapter 3.4).

Classical LOD
Finally, we revisit case (c) of the examples presented in Section 3. Recall that we consider a regular triangulation $\mathcal{T}_H$ and that the quantities of interest correspond to interior nodes (denoted by $N_H$) in order to achieve conformity in $V = H^1_0(D)$. As discussed in Example 3.11, this leads to the local functions $u_z = \Lambda_z$ and $U_H$ being equal to the standard $P^1$-finite element space. The resulting projection operator coincides with the projective quasi-interpolation operator from Målqvist and Peterseim (2020, Example 3.1). This projector first applies local $L^2(T)$-orthogonal projections onto $P^1(T)$ and then averages the function values at the nodes; values at boundary nodes are set to zero. The numerical homogenization method which results from this approach is the most popular example of the LOD; see Målqvist and Peterseim (2014). As discussed in Example 3.25, $\tilde u_H^\ell$ provides a good approximation of the exact solution $u$ if the localization parameter satisfies $\ell \gtrsim |\log H|$. In this case we get an $H^1$-error of order $H$. Accordingly, one can show with a standard Aubin-Nitsche duality argument that the $L^2$-error scales with one additional order of the mesh size, i.e. $\|u - \tilde u_H^\ell\|_{L^2(D)} \lesssim H^2 \|f\|_{L^2(D)}$. For more details we refer to Målqvist and Peterseim (2020, Chapter 5).
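A one-dimensional sketch of such a projector (our own illustration; the operator in the text acts on simplicial meshes in arbitrary dimension) first computes the elementwise $L^2$-projection onto $P^1$ and then averages the nodal values, setting boundary values to zero:

```python
# Hypothetical 1D sketch of the projective quasi-interpolation operator:
# elementwise L2-projection onto P1, then nodal averaging (boundary -> 0).
import numpy as np

def quasi_interpolate(v, nodes):
    """Return nodal values of P_H v for a callable v on the mesh `nodes`."""
    n_el = len(nodes) - 1
    local = np.zeros((n_el, 2))               # projected nodal values per element
    g = np.array([-1.0, 1.0]) / np.sqrt(3.0)  # 2-point Gauss rule on (-1, 1)
    for e in range(n_el):
        a, b = nodes[e], nodes[e + 1]
        x = 0.5 * (a + b) + 0.5 * (b - a) * g
        lam = np.stack([(b - x) / (b - a), (x - a) / (b - a)])  # P1 basis at x
        M = (b - a) / 6.0 * np.array([[2.0, 1.0], [1.0, 2.0]])  # element mass
        rhs = 0.5 * (b - a) * lam @ v(x)      # int_T v * lambda_i (Gauss-2)
        local[e] = np.linalg.solve(M, rhs)    # nodal values of the projection
    vals = np.zeros(len(nodes))
    vals[1:-1] = 0.5 * (local[:-1, 1] + local[1:, 0])  # average interior nodes
    return vals                               # boundary values stay zero

nodes = np.linspace(0.0, 1.0, 9)
hat = lambda x: np.maximum(0.0, 1.0 - np.abs(x - 0.5) / 0.125)  # hat at x = 0.5
print(quasi_interpolate(hat, nodes))   # reproduces the hat: 1 at the centre node
```

By construction the sketch reproduces every continuous piecewise affine function that vanishes at the boundary, which is the projection property underlying the stability discussed in Example 3.14.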
In order to solve the numerical homogenization problem (3.27), one may compute the problem-specific basis functions $C_\ell \Lambda_z$. In practice, however, it has been observed that the localization properties improve significantly if one considers the correctors of $\Lambda_z|_T$, $T \in \mathcal{T}_H$, separately. In particular, the error estimate of the localized corrector in Theorem 3.19 can be improved such that no negative powers of $H$ appear in the estimate. Moreover, for the special case of the elliptic model problem and quasi-uniform rectangular meshes, the decay rate $|\log \gamma|/49r$ in Theorem 3.15 can be improved to $|\log \gamma|/8$; see Maier (2020, Chapter 2.4).
At the end of this subsection we would like to emphasize that the so-called contrast β/α occurs as a constant in all estimates. Thus none of the approaches above are provably robust in terms of high contrast, and we cannot guarantee fast decay of the correctors or small approximation errors in such a scenario. As a way out, one can adjust the projection operator P H (and thus the fine-scale space W) such that the correctors still decay exponentially, independently of the contrast.

Implementation details
We end this section with some comments on the implementation of the numerical homogenization framework presented. Here we focus on the method resulting from case (c) of the examples presented in Section 3, since this offers the best developed techniques. For more details we also refer to Målqvist and Peterseim (2020, Chapter 7) and Engwer et al. (2019). The method resulting from (c) is typically referred to as the classical LOD. The range of different applications is wide, and depending on the scenario and equation considered, the practical implementation of the method may change accordingly. For the prototypical elliptic model problem that we consider in this section, the computational issues are caused by the diffusion coefficient $A$, which oscillates on such a fine scale that it is very difficult (or even impossible) to resolve these variations with a global fine mesh and a corresponding standard FEM. In such scenarios, the LOD can be used to make the problem computationally tractable by splitting it into small subproblems on patches. The subproblems are cheap to approximate by standard FEMs on a finer mesh. In particular, the subproblems are independent of each other, which allows for a tremendous speed-up in a parallel implementation. Furthermore, since memory is a limiting factor, the method is typically realized in a Petrov-Galerkin version in the spirit of Remark 3.13. This avoids storage of computed correctors, since it is not necessary to compute their interaction with other correctors but only with regular coarse basis functions, which have a simple shape and fully local support. In practice this means that we only have to calculate the influence of a given corrector on the global LOD stiffness matrix, after which it can be deleted again. Hence the LOD matrices arising in such a setting can be assembled in a single loop over the coarse grid elements.
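The single-loop assembly can be sketched structurally as follows (our own schematic, not code from the paper; `local_block` is a hypothetical stand-in for the patch-local corrector computation, and the toy callback merely makes the loop runnable):

```python
# Structural sketch of the memory-lean assembly loop: each corrector
# contribution is computed, scattered into the global matrix, then discarded,
# so memory stays proportional to a single patch problem.
import numpy as np

def assemble_pg_lod(n_elements, ell, local_block):
    """local_block(T, patch) -> dense matrix coupling the coarse basis
    functions on `patch` through the corrector of element T (hypothetical)."""
    n_nodes = n_elements + 1
    S = np.zeros((n_nodes, n_nodes))
    for T in range(n_elements):
        patch = list(range(max(0, T - ell), min(n_nodes, T + ell + 2)))
        B = local_block(T, patch)        # small dense problem on N^ell(T)
        S[np.ix_(patch, patch)] += B     # scatter; B can now be freed
    return S

# toy callback: plain P1 stiffness of element T (no correction), unit mesh size,
# just to make the loop executable
def toy_block(T, patch):
    B = np.zeros((len(patch), len(patch)))
    a, b = patch.index(T), patch.index(T + 1)
    B[a, a] += 1.0; B[b, b] += 1.0; B[a, b] -= 1.0; B[b, a] -= 1.0
    return B

S = assemble_pg_lod(8, 2, toy_block)
```

In a real implementation, `local_block` would solve the corrector problems on $N^\ell(T)$ with a fine-scale FEM and return the coupling of the corrected trial functions with the plain coarse test functions; the point is only that each block is scattered and immediately discarded.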

Application to wave propagation
In this section we discuss some direct and indirect applications of the abstract framework of Section 3 to PDEs that describe wave phenomena.

Helmholtz equation
As an initial application of the abstract framework, we consider high-frequency time-harmonic acoustic wave propagation modelled by the Helmholtz equation.
For simplicity, we assume that the computational domain $D \subseteq \mathbb{R}^d$, $d \in \{2, 3\}$, is a bounded and convex Lipschitz domain. By $\kappa \in \mathbb{R}_{>0}$ we denote the wave number, where we have in mind large values of $\kappa$. For a source term $f \in L^2(D) := L^2(D, \mathbb{C})$ we seek $u \colon D \to \mathbb{C}$ which solves the Helmholtz equation
$-\Delta u - \kappa^2 u = f$ in $D$ (5.1)
with the Robin boundary condition $\nabla u \cdot n - \mathrm{i}\kappa\, u = 0$ on $\partial D$.
Here, $n$ denotes the outward unit normal to the boundary $\partial D$. Let $V := H^1(D, \mathbb{C})$ be equipped with the $\kappa$-weighted $H^1$-norm defined by $\|v\|_V^2 := \|\nabla v\|^2_{L^2(D)} + \kappa^2 \|v\|^2_{L^2(D)}$. Clearly this norm is equivalent to the standard $H^1$-norm, which was considered in Section 3. To formulate problem (5.1) in weak form, we introduce the sesquilinear form $a \colon V \times V \to \mathbb{C}$,
$a(u, v) := (\nabla u, \nabla v)_{L^2(D)} - \kappa^2 (u, v)_{L^2(D)} - \mathrm{i}\kappa\, (u, v)_{L^2(\partial D)}.$
We then seek $u \in V$ such that $a(u, v) = (f, v)_{L^2(D)}$ for all $v \in V$. (5.2) Under the given assumptions, it is known that the sesquilinear form $a$ is inf-sup stable (see Melenk 1995 and in particular Esterhazy and Melenk 2012, Theorem 2.5) with an inf-sup constant that degenerates at most linearly in $\kappa$, i.e. $\alpha_V \gtrsim \kappa^{-1}$. Hence problem (5.2) is well-posed, admits the stability estimate $\|u\|_V \lesssim \|f\|_{L^2(D)}$, and fits into the abstract framework of Section 3. Here we note that we do not hide $\kappa$-dependences in the '$\lesssim$'-notation, since the wave number determines the multiscale character of the problem and is hence crucial.
In order to understand in what sense the problem can be numerically homogenized, we consider the standard $P^1$-finite element space $U_H \subseteq V$ based on a regular triangulation $\mathcal{T}_H$ of $D$. A standard Galerkin discretization in $U_H$ yields the expected error estimate of order $H$, but only under the resolution condition $H \lesssim \kappa^{-2}$, which may impose a severe computational constraint. Also known as the pollution effect (Babuška and Sauter 1997, Melenk and Sauter 2011), this condition is analytically sharp and numerically clearly visible. The phenomenon is surprising in the sense that if the mesh size resolves the wavelength, i.e. $H \lesssim \kappa^{-1}$, then the space $U_H$ is rich enough to contain good $H^1$-approximations of $u$. These good approximations, however, are not found by a standard Galerkin method unless $H \lesssim \kappa^{-2}$. This is visualized in Figure 5.1, where we observe that, despite the fact that the wave is well resolved with mesh size $H = (\pi/2)\kappa^{-1}$, the standard finite element solution suffers from a severe phase shift.
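The pollution effect can be reproduced with a simplified one-dimensional analogue (our own construction: Dirichlet condition $u(0) = 1$ at the left end, impedance condition $u'(1) = \mathrm{i}\kappa u(1)$ at the right end, exact solution $u(x) = e^{\mathrm{i}\kappa x}$). Keeping $\kappa H$ fixed, the discrete error still grows with the wave number:

```python
# 1D illustration of pollution: P1-FEM for -u'' - kappa^2 u = 0 on (0, 1)
# with u(0) = 1 and u'(1) = i*kappa*u(1); exact solution exp(i*kappa*x).
import numpy as np

def helmholtz_fem_error(kappa, n):
    h = 1.0 / n
    # stiffness and consistent mass for unknowns at nodes 1..n
    K = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
         - np.diag(np.ones(n - 1), -1)) / h
    K[-1, -1] = 1.0 / h                      # half support at the right end
    M = (np.diag(4.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
         + np.diag(np.ones(n - 1), -1)) * h / 6.0
    M[-1, -1] = 2.0 * h / 6.0
    A = (K - kappa**2 * M).astype(complex)
    A[-1, -1] -= 1j * kappa                  # impedance boundary term
    rhs = np.zeros(n, dtype=complex)
    rhs[0] = 1.0 / h + kappa**2 * h / 6.0    # lifting of u(0) = 1
    u = np.linalg.solve(A, rhs)
    x = np.linspace(h, 1.0, n)
    return np.sqrt(h) * np.linalg.norm(u - np.exp(1j * kappa * x))

# fixed resolution kappa*H = 0.5: the error still grows with the wave number
errs = [helmholtz_fem_error(k, 2 * k) for k in (8, 16, 32, 64)]
print(errs)
```

The growth reflects the accumulated phase error of the $P^1$ discretization, consistent with the resolution condition $H \lesssim \kappa^{-2}$ quoted above.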
In order to demonstrate how the pollution effect can be eliminated within the abstract variational framework, we can select the quantities of interest $q_z$ as the coefficients of an $H^1(D)$-stable projection operator that is defined via local projections. To work with a precise construction, we let $N_H$ denote the set of vertices of the triangulation $\mathcal{T}_H$ and define $q_z$ according to Example 3.1(c). This results in the Ern-Guermond projection operator $P_H \colon V \to U_H$ with $P_H(v) := \sum_{z \in N_H} q_z(v) \Lambda_z$ for the nodal basis functions $\Lambda_z$. Following the general construction in Section 3.2, we first need to ensure the inf-sup stability of $a$ on the fine-scale space $W := \ker P_H$; see property (3.5). In fact, on $W$ we even have coercivity of $a$ if $H \lesssim \kappa^{-1}$. To see this, we first recall from Ern and Guermond (2017) the approximation property $\|v - P_H v\|_{L^2(D)} \lesssim H \|\nabla v\|_{L^2(D)}$. Hence we conclude that $\|w\|_{L^2(D)} \lesssim H \|\nabla w\|_{L^2(D)}$ for any $w \in W$, and consequently
$\mathrm{Re}\, a(w, w) = \|\nabla w\|^2_{L^2(D)} - \kappa^2 \|w\|^2_{L^2(D)} \gtrsim \|\nabla w\|^2_{L^2(D)} \gtrsim \|w\|_V^2,$
provided that $H \le c\kappa^{-1}$ with a sufficiently small constant $c > 0$ that does not depend on $\kappa$ or $H$. Hence we have verified (3.5) with a constant $\alpha_W = O(1)$.
In the next step we define the multiscale spaces $\tilde U_H$ and $\tilde V_H$ based on the corrector $C$, as in Lemma 3.12, and seek the numerically homogenized approximation $\tilde u_H \in \tilde U_H$ as in (3.8). Applying Theorem 3.9, we conclude $\kappa$-explicit error estimates that guarantee linear convergence for the $H^1$-error and quadratic convergence for the $L^2$-error, without a prefactor that depends on the wave number. Note that they are valid in the natural resolution regime $H \lesssim \kappa^{-1}$, which guarantees inf-sup stability of the corrector problems with a constant of order one. For comparison, recall that the classical finite element estimate (5.3) only holds in the regime $H \lesssim \kappa^{-2}$. Thus the numerical homogenization strategy not only removes the potentially large factor $\kappa$ from the estimate but also relaxes the admissible resolution regime considerably. This clearly reflects the stabilizing nature of the methodology, which is well known for variational multiscale methods, the most popular methods for the stabilization of convection-dominated flow simulations. For example, Figure 5.1 shows that $\tilde u_H$ no longer suffers from the phase shift of the standard FEM in the regime $H = (\pi/2)\kappa^{-1}$.
In order to localize the construction of the ideal multiscale method, we can proceed analogously to the abstract arguments presented in Sections 3.5 and 3.6. For the details, and for more general considerations including non-convex domains, general coefficients in heterogeneous media and other boundary conditions, we refer to the series of papers by Gallistl and Peterseim (2015), Peterseim (2016, 2017), Brown, Gallistl and Peterseim (2017) and Chaumont-Frelet, Gallistl, Nicaise and Tomezyk (2018). The elastic version of the Helmholtz problem is treated in Brown and Gallistl (2016). Applications to high-contrast variable wave numbers related to negative refraction phenomena, as well as non-linear Helmholtz equations, have also been considered.

Maxwell's equations
Even though the abstract framework of Section 3 only covers closed subspaces of H 1 (D), the arguments can be transferred to other Hilbert spaces, such as H(curl). This subsection illustrates some of the required modifications.
As a particular example, we discuss the time-harmonic Maxwell's equations in a bounded domain $D \subseteq \mathbb{R}^3$. Moreover, we assume $D$ to be a contractible polyhedral Lipschitz domain. We will not focus on the role of a large wave number $\kappa$, as this leads to results similar to those of the Helmholtz case. Instead, we investigate the situation of electromagnetic waves that propagate through a multiscale medium, for example a photonic crystal or some other metamaterial. In this context we seek the electric field $u \colon D \to \mathbb{C}^3$ such that
$\operatorname{curl}(\mu \operatorname{curl} u) + \sigma u = f$ in $D$,
together with the boundary condition $u \times n = 0$ on $\partial D$, where $n$ is again the outward unit normal to $\partial D$. The source term $f$ is related to current densities, and the coefficients $\mu$ and $\sigma$ are complex-valued functions that characterize the properties of the propagation medium, namely the electric permittivity and the conductivity. In particular, these functions are rapidly oscillating multiscale coefficients.
To state the weak formulation of the problem, we define the natural solution space as H_0(curl) := {v ∈ L^2(D, C^3) | curl v ∈ L^2(D, C^3), v × n = 0 on ∂D}. This is a Hilbert space when equipped with the inner product (v, w)_{H(curl)} := (v, w)_{L^2(D)} + (curl v, curl w)_{L^2(D)}. For the well-posedness, we assume bounded coefficients µ ∈ L^∞(D, R^{3×3}), σ ∈ L^∞(D, C^{3×3}) and that the source term has a weak divergence, that is, f ∈ H(div). Moreover, we assume that the sesquilinear form a : H_0(curl) × H_0(curl) → C, a(v, w) := (µ curl v, curl w)_{L^2(D)} + (σ v, w)_{L^2(D)}, is elliptic on H_0(curl), that is, there exists α > 0 such that Re a(v, v) ≥ α ‖v‖²_{H(curl)} for all v ∈ H_0(curl). (5.5) The latter assumption is fulfilled in many practical scenarios; see Fernandes and Raffetto (2005) and Gallistl, Henning and Verfürth (2018). With (5.5), there exists a unique weak solution u ∈ H_0(curl) to a(u, v) = (f, v)_{L^2(D)} for all v ∈ H_0(curl). To start with a standard discretization of this problem, let T_H denote a regular, quasi-uniform, simplicial partition of D into (closed) tetrahedra, and let U_H be the lowest-order H(curl)-conforming Nédélec finite element space (see Monk 2003, Section 5.5). The corresponding best-approximation estimate (5.7) involves the norms ‖u‖_{H^1(D)} and ‖curl u‖_{H^1(D)} (if they exist), and both blow up with the speed of the fine-scale variations, that is, they scale with ‖µ‖_{W^{1,∞}(D)} and ‖σ‖_{W^{1,∞}(D)}. This effect is identical to the blow-up observed for standard Galerkin FEMs in Example 1.1. However, there is one crucial difference: here the estimate (5.7) does not improve even if we consider only the best approximation in the L^2-norm. In other words, U_H does not even contain good L^2-approximations of u. This makes the task of finding suitable quantities of interest rather delicate.
The key observation for the construction of a suitable approach is that U_H contains accurate approximations in the norm of the dual space H(div)*. To see this, we consider two projection operators. The first projection, denoted by P_H : H_0(curl) → U_H, is H(curl)-stable and fulfils an L^2-stability estimate. For the second projection, let Ū_H ⊆ H^1_0(D, C) denote the usual P1-finite element space and let P̄_H : H^1_0(D, C) → Ū_H denote a corresponding stable projection. If both projection operators are connected through the commuting property P_H ∇ = ∇ P̄_H on H^1_0(D, C), then projection errors can be decomposed. To see this, first recall that for any v ∈ H_0(curl) there is a regular decomposition (see Hiptmair 2002, Hiptmair and Xu 2007) of the form v = z + ∇θ, where z ∈ H^1_0(D, C^3) and θ ∈ H^1_0(D, C) with ‖z‖_{H^1(D)} + ‖θ‖_{H^1(D)} ≲ ‖v‖_{H(curl)}. Applying this decomposition to the exact solution u ∈ H_0(curl) and using the commuting property, we obtain a bound for the projection error u − P_H u in the H(div)*-norm that is independent of the speed of the multiscale variations. Consequently, meaningful quantities of interest could be the coefficients of the projection operator P_H. A suitable projection operator that not only has the crucial commuting property but is also computable was proposed by Falk and Winther (2014, 2015) and exploited by Gallistl, Henning and Verfürth (2018) and Henning and Persson (2020). To date, it is the only known choice that leads to meaningful and computable quantities of interest in the sense of Section 3.2. We stress once more that (local) L^2-projections cannot be used to define reasonable quantities of interest, since there exists no good L^2-approximation of u in U_H. Using the aforementioned Falk–Winther projection P_H, we can define the fine-scale space as W := ker P_H ⊆ H_0(curl). From this point on, we can proceed analogously to Section 3.3 and obtain a numerically homogenized approximation that is accurate in the H(curl)-norm.
In particular, we obtain a first-order error estimate in terms of the coarse mesh size H. The proof is based on the same arguments as before and exploits f ∈ H(div). For further details, localization proofs and an algorithmic realization of P_H, we refer to Gallistl, Henning and Verfürth (2018) and Henning and Persson (2020).

Wave equation
As the last application to wave-type phenomena, we shall look at the acoustic wave equation with a multiscale coefficient. In comparison to the previous applications, we now have an explicit time dependence. We will consider the case when the solution is rapidly oscillating in space but only slowly varying in time. This assumption allows us to construct a fixed multiscale space that can be re-used over time.
To make the setting precise, let D ⊆ R^d, d ∈ {1, 2, 3}, denote a polygonal Lipschitz domain, t_end > 0 a maximum time, f ∈ L^2(0, t_end; L^2(D)) a given source term, and A ∈ L^∞(D, R^{d×d}) a symmetric and elliptic multiscale coefficient. In this setting, with V := H^1_0(D), we seek u : [0, t_end] × D → R such that ∂_tt u − ∇ · (A ∇u) = f in (0, t_end) × D and u = 0 on (0, t_end) × ∂D. Here the initial states are given by u(0) = u_0 ∈ V and ∂_t u(0) = u_1 ∈ L^2(D). For the variational form, we define the bilinear form a : V × V → R as usual by a(v, w) := (A ∇v, ∇w)_{L^2(D)}, for which there exists a unique weak solution u ∈ L^2(0, t_end; V) with ∂_tt u ∈ L^2(0, t_end; V*) such that, for all v ∈ V and almost every t > 0, we have ⟨∂_tt u(t), v⟩ + a(u(t), v) = (f(t), v)_{L^2(D)}, together with the initial conditions. In fact the initial values play a crucial role for the multiscale wave equation, since they determine whether the exact solution u develops fast oscillations in time or the temporal behaviour remains slow. For a detailed discussion of this issue, we refer to Abdulle and Henning (2017). In order to avoid technical assumptions on the initial values at this point, and to be able to focus on the crucial arguments for how the variational framework applies, let us consider the simplified setting with vanishing starting values and a source term that is H^2-regular in time, that is, u_0 = u_1 = 0 and f ∈ H^2(0, t_end; L^2(D)) ∩ L^∞(0, t_end; H^1(D)). In this case it is well known that u ∈ W^{2,∞}(0, t_end; V) and ∂_ttt u ∈ L^∞(0, t_end; L^2(D)); see Evans (2010, Chapter 7.2). Furthermore, we have the estimate ‖∂_ttt u‖_{L^∞(0,t_end;L^2(D))} + ‖u‖_{W^{2,∞}(0,t_end;V)} ≤ C_{f,t_end} := C(t_end) (‖f‖_{L^∞(0,t_end;H^1(D))} + ‖f‖_{H^2(0,t_end;L^2(D))}), where C(t_end) only depends on the maximum time t_end, the domain D and the spectral bounds of A. Hence it is independent of the regularity of A and the speed of the fine-scale variations.
In the next step we will see how these stability bounds can be directly exploited in the variational framework to construct a numerically homogenized solution. To this end, let U_H ⊆ V denote a finite-dimensional space and let u_H(t) ∈ U_H be the corresponding semi-discrete Galerkin approximation. If the initial values for u_H and ∂_t u_H are appropriately chosen in U_H, then we have the classical error estimate (see Baker 1976), which is driven by the error of the Ritz projection π_H : V → U_H. It becomes evident that if U_H is selected as the standard P1-finite element space, then ‖u − π_H(u)‖_{L^∞(0,t_end;L^2(D))} behaves like H² ‖A‖²_{W^{1,∞}(D)}, which raises the usual issue of multiscale problems. On the other hand, if we construct a multiscale space U_H as in Section 3, with Ritz projection π_H : V → U_H, then we can exploit the fact that u(t) ∈ V solves an elliptic problem with the right-hand side f(t) − ∂_tt u(t) ∈ L^2(D). This leads to error estimates with (at least) second-order convergence in H for the semi-discrete Galerkin approximation. This shows that, under appropriate assumptions on the initial values, it is possible to construct a multiscale space that approximates u(t) well for all times t ≥ 0. However, there are also initial values that can trigger fast oscillations in time. In this case it is not clear if the space U_H needs to be updated over time. The above discussion only refers to a semi-discrete approximation and neglects the practically important influence of time discretizations. It should be noted that the time discretization can tighten the regularity assumptions required on the initial values and the right-hand side in order to still guarantee a rate of O(H²) for the L^∞(0, t_end; L^2(D))-error; see Abdulle and Henning (2017).
However, the computational advantages of a fully discrete method are still striking. For example, implicit time-stepping schemes for the multiscale wave equation are considered in Owhadi and Zhang (2017). Here a gamblet-based multiresolution decomposition (adapted to the multiscale equation and the time discretization) is used in the spirit of a multigrid algorithm in order to efficiently execute the time steps.
Furthermore, in connection with explicit time-stepping schemes, the analysis of fully discrete methods reveals very favourable stabilization effects of the space U_H. For example, a popular choice for the wave equation is the leapfrog scheme, since it conserves the discrete energy. The classical CFL condition that ensures stability of the method for standard P1-finite element spaces is of the form τ ≲ h, where τ denotes the time step size and h is the minimum (fine-scale) mesh size. In the context of multiscale problems, h needs to be small enough to resolve the microstructure, which imposes tremendous restrictions on the size of the time steps and hence introduces yet another challenge. However, if the leapfrog scheme is combined with the multiscale space U_H, then the CFL condition can be relaxed to τ ≲ H; see Peterseim and Schedensack (2017). Since H is a coarse mesh size, this allows for much larger time steps and consequently reduces the computational complexity considerably.
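The scaling of the CFL condition with the mesh size can be made concrete with a minimal 1D computation. This is a generic sketch under the assumption of a lumped-mass P1 discretization of the constant-coefficient wave equation; all identifiers are ours, not from this article:

```python
import numpy as np

def leapfrog_cfl(n):
    """Largest stable leapfrog step for the 1D lumped-mass P1 discretization
    of u_tt = u_xx on (0,1): tau_max = 2 / sqrt(lambda_max(M^{-1} K))."""
    h = 1.0 / (n + 1)
    # interior stiffness matrix (1/h) tridiag(-1, 2, -1), lumped mass M = h I
    K = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
         - np.diag(np.ones(n - 1), -1)) / h
    lam_max = np.linalg.eigvalsh(K / h).max()   # spectrum of M^{-1} K
    return 2.0 / np.sqrt(lam_max), h

for n in (49, 99, 199):
    tau, h = leapfrog_cfl(n)
    print(n, tau / h)   # stays close to 1: the admissible step scales with h
```

Replacing h by the coarse mesh size H in such a bound is exactly the relaxation τ ≲ H discussed above.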
This consideration is not only useful for the multiscale wave equation, but it can also be applied to conventional (coarse-scale) wave equations. Here we can think of a situation where we have features in the solution u that are resolved by an adaptive finite element mesh T_h, where the refinement is such that the minimum mesh size h_min is much smaller than the maximum mesh size h_max, i.e. h_min ≪ h_max. In this case the CFL condition needs to account for the elements with the smallest diameter, leading to the stability constraint τ ≲ h_min. This makes adaptive finite elements rather unattractive in combination with explicit time integrators. However, if we define V := U_h as the corresponding adaptive finite element space, then we can select a coarse subspace U_H and construct a 'multiscale space' in the usual manner. Again, the CFL condition is relaxed to τ ≲ H, and the adaptive refinements on the fine mesh are no longer a restricting factor; see Peterseim and Schedensack (2017).

Further applications and generalizations
In this subsection we conclude with a list of further applications of the abstract framework, which fall in particular into the category of LOD-based methods. Successful applications include semi-linear elliptic problems (Henning, Målqvist and Peterseim 2014a), elliptic problems in porous/perforated media (Brown and Peterseim 2016) and problems with high contrast (Peterseim and Scheichl 2016, Hellman and Målqvist 2017).
The framework is particularly suited to solving eigenvalue problems, since the eigenfunctions can be interpreted as source terms in H^1_0(D) (or even smoother spaces), so that superconvergence is achieved with respect to the coarse mesh size H. For linear eigenvalue problems we refer to Målqvist and Peterseim (2015); results for non-linear eigenvalue problems with the non-linearity in the eigenvalue can be found in Målqvist and Peterseim (2017), and non-linear eigenvalue problems with the non-linearity in the eigenfunction are treated in Henning, Målqvist and Peterseim (2014b). For Schrödinger-type eigenvalue problems with strong disorder potentials, it is possible to observe Anderson-type localization phenomena. These phenomena refer to an exponential decay of the eigenfunctions in the lower part of the spectrum. With abstract variational techniques it is possible to analytically predict this type of localization, as demonstrated in Altmann, Henning and Peterseim (2020c) for linear and in Altmann, Henning and Peterseim (2020b) for non-linear Schrödinger eigenvalue problems. A computational strategy for the efficient computation of such localized eigenstates has also been developed in this context.
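The localization phenomenon itself is easy to observe in a minimal toy computation. The following is a standard 1D discrete Schrödinger model, our own illustration rather than the algorithm of the cited papers; all parameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# 1D discrete Schrodinger operator: H = -Laplacian + disorder potential
lap = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
V = 30.0 * rng.random(n)                        # strong random potential
_, vecs_dis = np.linalg.eigh(lap + np.diag(V))  # eigenvalues ascending
_, vecs_free = np.linalg.eigh(lap)

def participation_ratio(psi):
    """~ number of sites over which |psi|^2 is spread."""
    psi = psi / np.linalg.norm(psi)
    return 1.0 / np.sum(psi ** 4)

pr_dis = participation_ratio(vecs_dis[:, 0])    # localized ground state
pr_free = participation_ratio(vecs_free[:, 0])  # extended sine mode, ~ 2n/3
print(pr_dis, pr_free)  # pr_dis is much smaller: Anderson-type localization
```

The much smaller participation ratio of the disordered ground state reflects the exponential decay of eigenfunctions in the lower part of the spectrum mentioned above.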
In many applications the deterministic multiscale problems are realizations of problems with random coefficients at short correlation length. Thanks to its universal reliability and efficiency with respect to the possible sample coefficients, the variational framework is well suited to stochastic homogenization problems; see Peterseim (2019), Fischer, Gallistl and Peterseim (2021) and Feischl and Peterseim (2020).
Another surprising generalization of the framework is the numerical homogenization as well as the compression and fast inversion of general bounded elliptic operators, including higher-order differential operators that are covered by the multilevel operator-adapted wavelets of Owhadi and Scovel (2019). In this context, numerical homogenization can be extended to approximate Cholesky factorization of the differential operator and of its inverse (Schäfer, Katzfuss and Owhadi 2020, Schäfer, Sullivan and Owhadi 2021). In addition to the previously identified connection to the theory of preconditioned iterative solvers (Kornhuber and Yserentant 2016), this manifests a deep relation between numerical homogenization and numerical linear algebra.

Connections to other methodologies
In this section we show how the abstract framework of Section 3 covers and relates to earlier variational concepts. We also introduce the basic ideas of variational multiscale frameworks from a historical perspective. Starting from the local Green's functions approach (Hughes 1995), we then turn to the general VMS formulated in Hughes et al. (1998) and show how the Green's function approach can be seen as a specialization. After that, we discuss residual-free bubbles (Brezzi et al. 1997) and their relation to VMS and MsFEM (Hou and Wu 1997). This illustrates that variational approaches can be presented and interpreted in various ways. Ultimately they can also be connected to mathematical homogenization. For simplicity, we restrict our considerations to the elliptic model problem (4.1) introduced in Section 4 with V := H 1 0 (D). In what follows, we denote the operator representation of a by A : V → V * , that is, we set Au := a(u, •). The derivation and results in this section do not require A ∈ L ∞ (D, R d×d ) to be symmetric. We will not mention this again, and work with dual operators whenever it becomes relevant.

Local Green's functions approach
In this subsection we will make initial preparations for a later motivation of the general VMS. We work in a very simplified setting where we assume that the multiscale features of problem (4.1) are isolated in coarse elements. The subsequent method was proposed by Hughes (1995) and is known as the local Green's functions approach. Hughes was driven by the question of how one could stabilize Galerkin methods in the context of multiscale problems, so that they would yield accurate L 2 -approximations even in the under-resolved regime where the (coarse) mesh size is much larger than the characteristic scale of fine-scale variations. In order to achieve this, it was necessary to eliminate the dependence of the macroscopic equation on the unresolved subgrid scales.
The main purpose of this subsection is to demonstrate how to formulate a stabilized method for multiscale problems, and to clarify the idea of local Green's functions in a simple setting. To this end, we assume that the macroscopic scale is partitioned by a coarse mesh T H of feasible size. For simplicity, suppose that T H is a regular quasi-uniform triangulation of D. A generalization to other types of partitions is straightforward. By U H := P 1 (T H ) ∩ V we denote the corresponding P 1 -finite element space with functions vanishing on ∂D. In order to illustrate the crucial idea behind the construction of stabilized numerical methods for multiscale problems, we shall consider the following setting.
Setting 6.1. The exact solution u ∈ V to (4.1) can be written as u = u_H + u_f with u_f = Σ_{T∈T_H} u_{f,T}, where u_H ∈ U_H is the macroscopic part and where u_{f,T} ∈ H^1_0(T) describes the microscopic part of u in the element T, which only yields a negligible contribution to the L^2-norm, i.e. ‖u_f‖_{L^2(D)} ≪ 1.
The assumption that u can be decomposed into a macroscopic (resolved) and a microscopic (unresolved) part is a natural assumption of scale separation. However, the assumption that the fine-scale part can be written as the sum of compactly supported functions in T ∈ T H is very strong (and to some extent unrealistic). Nevertheless, this particular case plays an essential role in relating the concepts of the classical VMS and the residual-free bubbles that we address later. More broadly speaking, it will help us to better understand the origins of the abstract variational framework that we formulated in Section 3.
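For orientation, the following elementary observation (a standard fact, stated here in our own words as a sanity check) shows that in one space dimension the structural part of Setting 6.1 is automatically satisfied:

```latex
% d = 1: let u_H := I_H u \in U_H be the nodal interpolant of u, i.e.
% u_H(z) = u(z) for all coarse nodes z. Then
\[
  u_f := u - u_H \ \text{vanishes at all coarse nodes}
  \;\Longrightarrow\; u_f|_T \in H^1_0(T) \ \text{for every } T \in \mathcal{T}_H,
\]
\[
  u = u_H + \sum_{T \in \mathcal{T}_H} u_f|_T ,
\]
% which is exactly the assumed decomposition. For d >= 2 the interpolation
% error does not vanish along element edges and faces, which is why the
% assumption is genuinely restrictive there.
```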
Starting from the original problem in variational form and exploiting the assumed decomposition u = u_H + u_f, we can define the fine-scale space as W := ⊕_{T∈T_H} H^1_0(T) and the solution space as V = U_H ⊕ W. Since W is a closed subspace of V and since U_H ∩ W = {0}, we can characterize u = u_H + u_f (and its decomposition) uniquely as the solution to (4.1). Let us decompose a test function v ∈ V accordingly as v = v_H + w, where v_H ∈ U_H and w ∈ W. This yields a(u_H + u_f, v_H + w) = (f, v_H + w)_{L^2(D)}. (6.1) The question is: how can we describe the variational problem (6.1) as a problem purely formulated in U_H? The first step is to consider two coupled subproblems, one for w = 0 and one for v_H = 0, which gives us a(u_H, v_H) + a(u_f, v_H) = (f, v_H)_{L^2(D)} for all v_H ∈ U_H (6.2a) and a(u_H, w) + a(u_f, w) = (f, w)_{L^2(D)} for all w ∈ W. (6.2b) Using the dual pairing notation ⟨•, •⟩, where we drop the spaces for brevity, we can write (6.2b) in terms of the operator A as ⟨A u_f, w⟩ = ⟨f − A u_H, w⟩ for all w ∈ W. From this equation we immediately conclude, with w = Σ_{T∈T_H} w_T and w_T ∈ H^1_0(T), that for all T ∈ T_H the function u_{f,T} ∈ H^1_0(T) is the solution to a(u_{f,T}, v) = (f, v)_{L^2(T)} − a(u_H, v) for all v ∈ H^1_0(T). The corresponding local corrector Green's function G_T(•, y) ∈ W^{1,1}_0(T) for a point y ∈ T is characterized as the solution to −∇ · (A ∇ G_T(•, y)) = δ(• − y) in T, (6.3) where δ denotes the Dirac delta function, which satisfies ⟨δ, v⟩ = v(0). Hence the local Green's function fulfils a(G_T(•, y), v) = v(y) for all v ∈ C^∞_0(T). For bounded domains and for general (not necessarily symmetric) coefficients A ∈ L^∞(D, R^{d×d}), the existence of a corresponding (unique) Green's function was proved in Grüter and Widman (1982, Theorem 1.1) for the case d ≥ 3 and in Dong and Kim (2009, Theorem 2.12) for the case d = 2. Note that the case d = 1 is trivially covered by the Lax–Milgram theorem, as the Dirac delta function is then an element of V*.
To formalize the next step, let us now assume additional regularity for A. More precisely, we assume that A|_T ∈ W^{1,∞}(T) for all T ∈ T_H, (6.4) and, recalling that u_H is a linear function on each T, we define the element-wise differential operator A|_{T_H} by (A|_{T_H} v_H)|_T := −∇ · (A ∇ v_H|_T) for all T ∈ T_H. (6.5) We can write u_{f,T} in terms of the local (corrector) Green's function as u_{f,T}(x) = ∫_T G_T(x, y) (f − A|_{T_H} u_H)(y) dy. (6.6) Hence we have u_f(x) = ∫_D G_f(x, y) (f − A|_{T_H} u_H)(y) dy, (6.7) where the global corrector Green's function G_f is given by G_f(x, y) := G_T(x, y) if x, y ∈ T for some T ∈ T_H, and G_f(x, y) := 0 otherwise. With the bounded linear (integral) operator G_f : L^2(D) → V given by (G_f g)(x) := ∫_D G_f(x, y) g(y) dy, (6.8) we can write u_f in terms of f and the coarse part u_H as u_f = G_f(f − A|_{T_H} u_H). This formula for u_f can be plugged into (6.2a) to obtain a macroscopic equation for u_H that is independent of the subgrid scales.
Theorem 6.2. Consider Setting 6.1 and let (6.4) be fulfilled. Then, with the integral operator G_f defined in (6.8) and A|_{T_H} defined in (6.5), we have that u_H ∈ U_H is the unique solution to the purely macroscopic equation a(u_H, v_H) + a(G_f(f − A|_{T_H} u_H), v_H) = (f, v_H)_{L^2(D)} for all v_H ∈ U_H, (6.9) where the term C(u_H, v_H) := a(G_f(f − A|_{T_H} u_H), v_H) describes precisely the effect of the unresolved scales on the resolved scales. Furthermore, u_H is a good L^2-approximation to u in the sense that ‖u − u_H‖_{L^2(D)} = ‖u_f‖_{L^2(D)} ≪ 1.

The variational formulation (6.9) shows that we can stabilize the conventional Galerkin formulation in such a way that we obtain an accurate L^2-approximation in the coarse finite element space U_H. This is typically not obtained with the standard formulation; see Example 1.1 or Figure 4.1. In this sense we have constructed a stabilized scheme for computing approximations of elliptic multiscale problems. The macroscopic problem (6.9) is independent of the unresolved subgrid scales, as it is purely formulated in U_H. This is possible since the information about the subgrid scales is directly incorporated into the macroscopic problem via the local Green's functions G_T.

Remark 6.3. Even though the effect of the unresolved scales on the resolved scales is non-local (as directly seen from (6.7)), it is still confined within each element T, since the Green's function G_f can be decomposed into T-local contributions; see (6.6). Note that this strongly depends on the assumptions made in Setting 6.1 and is not true in general. Equation (6.9) can also be formulated explicitly in terms of the local Green's functions G_T defined in (6.3). To see this, note that the stabilization term decomposes into element contributions involving local corrector operators C_T. Hence, in order to assemble the stabilization term in the Galerkin method, we need to compute the effect of the local Green's functions for each individual grid element T ∈ T_H. More precisely, for every coarse basis function Λ_z ∈ U_H that has support on the element T ∈ T_H, we need to compute the quantity C_T Λ_z. This requires us to compute the element Green's functions G_T in the first place.
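In discrete terms, the macroscopic equation (6.9) can be read as a Schur complement that eliminates the element-interior degrees of freedom. The following 1D sketch is our own illustration (the helper `fem_1d` and all parameters are hypothetical); the fine mesh plays the role of the subgrid scales, and the final assertion checks that the condensed coarse system reproduces the fine-scale solution exactly at the coarse nodes:

```python
import numpy as np

def fem_1d(a_vals, h):
    """Stiffness matrix and load vector for -(a u')' = 1 on (0,1), P1 FEM
    with homogeneous Dirichlet data; a_vals holds a on the fine elements."""
    n = len(a_vals) - 1                    # number of interior fine nodes
    K = np.zeros((n, n))
    F = np.full(n, h)                      # (1, phi_i) = h for hat functions
    for e, a in enumerate(a_vals):         # element e = [x_e, x_{e+1}]
        for i in (e - 1, e):               # interior node indices touched by e
            for j in (e - 1, e):
                if 0 <= i < n and 0 <= j < n:
                    K[i, j] += a / h * (1 if i == j else -1)
    return K, F

m, NH = 16, 8                              # fine nodes per coarse element / coarse elements
h = 1.0 / (m * NH)
x_mid = (np.arange(m * NH) + 0.5) * h
a_vals = 1.0 + 0.9 * np.sin(2 * np.pi * x_mid / 0.05)   # multiscale coefficient
K, F = fem_1d(a_vals, h)

n = m * NH - 1
coarse = np.array([m * j - 1 for j in range(1, NH)])     # interior coarse nodes
interior = np.setdiff1d(np.arange(n), coarse)            # element-interior nodes

# Schur complement onto the coarse nodes = stabilized macroscopic system
Kii, Kic = K[np.ix_(interior, interior)], K[np.ix_(interior, coarse)]
Kci, Kcc = K[np.ix_(coarse, interior)], K[np.ix_(coarse, coarse)]
S = Kcc - Kci @ np.linalg.solve(Kii, Kic)
g = F[coarse] - Kci @ np.linalg.solve(Kii, F[interior])
u_coarse = np.linalg.solve(S, g)

u_fine = np.linalg.solve(K, F)             # reference fine-scale solution
assert np.allclose(u_coarse, u_fine[coarse])   # exact at the coarse nodes
```

The local solves hidden in the Schur complement are block-diagonal over the coarse elements, which mirrors the element-by-element computation of the quantities C_T Λ_z.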
The results in this subsection were obtained under quite restrictive assumptions. In the following subsection we explore how these assumptions can be relaxed and how a generalized methodology can be constructed. This then leads to the basic version of the VMS. Afterwards, we show its connection to the abstract variational framework introduced in Section 3.

Classical VMS
In this subsection we drop the assumption that the fine-scale behaviour only exists isolated within the elements of a coarse partition. Following the arguments in Section 6.1 closely, we see that they are still valid if we work with a global Green's function instead of a set of local Green's functions. This only appears to be an issue from the computational point of view. For particular realizations of the framework, however, there are efficient localization strategies that make the method feasible. This will become clear when we relate it to the abstract framework of Section 3. In the following, we leave the localization issues aside and introduce an idealized variational multiscale scheme, as suggested in Hughes et al. (1998).
Analogously to the local Green's functions approach in Section 6.1, we start from a low-dimensional coarse space U_H ⊆ V, for example the space of piecewise linear elements with mesh size H. At this point we only assume that the coarse space U_H is rich enough to contain good L^2-approximations of the exact solution u ∈ V to problem (4.1). In addition to U_H, we consider a closed fine-scale space W ⊆ V in which the scales that are unresolved in U_H are fully represented. As in the previous subsection, we assume that the coarse space and the fine space only have the trivial intersection, i.e. U_H ∩ W = {0}. Recall that we constructed several such spaces in an abstract way in Section 3.4: starting from a set of quantities of interest q_j, j ∈ J, we define W as the corresponding kernel space (see (3.3)) and a basis of U_H by a set of local functions u_j with the property q_j(u_k) = δ_kj (hence dim U_H = |J|). However, this abstract construction is not important here; we only need that U_H ∩ W = {0}, together with the additional assumption that u can be (uniquely) represented as u = u_H + u_f with u_H ∈ U_H and u_f ∈ W. As in Section 6.1, the original problem is then equivalent to finding the unique solution (u_H, u_f) ∈ U_H × W with a(u_H + u_f, v_H + w) = (f, v_H + w)_{L^2(D)} for all v_H ∈ U_H and w ∈ W. Again, we decompose the problem into two coupled subproblems; see (6.2). With I f := (f, •)_{L^2(D)}, i.e. the trivial embedding of L^2(D) into V*, we define the residual R : U_H → V* in a distributional sense by R u_H := I f − A u_H, that is, R u_H = f + ∇ · (A ∇u_H) for sufficiently smooth u_H. We observe that the fine-scale problem (6.2b) can be reformulated as a(u_f, w) = ⟨R u_H, w⟩ for all w ∈ W, (6.11) where we see that the fine-scale features u_f are expressed in terms of the coarse-scale residual. In the previous subsection we turned to a corresponding Green's function formulation of problem (6.11).
However, identifying the global corrector Green's function (or fine-scale Green's function) associated with problem (6.11) is no longer as simple as in Section 6.1, where it was just given as a combination of classical local Green's functions. For that reason, we present a formulation of the variational multiscale method that is purely based on (Green's) operators and that does not use Green's functions.
Let us specify what we mean by the Green's operator associated with problem (6.11). Let u ∈ V be the solution to the multiscale problem Au = I f as given by (4.1). We consider the setting of Section 6.2, where the fine-scale space W ⊆ V is a closed subspace. The operator G_f : V* → W is called the corresponding corrector Green's operator if, for F ∈ V*, the image G_f F ∈ W is defined as the unique solution to a(G_f F, w) = F(w) for all w ∈ W. (6.12) Since W is closed and a is elliptic, the operator is well-defined. In the literature, the corrector Green's operator G_f defined above is also called the fine-scale Green's operator, or simply the corrector operator. Using this, we can now write u_f as the corrector Green's operator applied to the coarse-scale residual; that is, with (6.11) we obtain u_f = (G_f • R) u_H. This can be plugged into the coarse-scale equation (6.2a) to obtain that u_H ∈ U_H is the unique solution to the discrete problem a(u_H, v_H) + a((G_f • R) u_H, v_H) = (f, v_H)_{L^2(D)} for all v_H ∈ U_H. (6.13) Without structural assumptions, we have derived an ideal stabilized Galerkin method that is fully formulated in the coarse space U_H. Furthermore, due to the initially assumed decomposition u = u_H + u_f, we obtain the equality u = u_H + (G_f • R) u_H. (6.14) This equality shows that we can use the valuable information provided by the Green's operator G_f to recapture even the full exact solution u from u_H by adding the correction term (G_f • R) u_H. This motivates why G_f is called the corrector operator, as it is used to correct u_H to obtain a good H^1-approximation from an approximation that is only accurate in L^2(D). Since G_f is a linear operator, we conclude from the definition of the residual that (G_f • R) u_H = (G_f • I) f − (G_f • A) u_H. (6.15) Consequently we can write (6.13) as a(u_H, v_H) + a((G_f • I) f − (G_f • A) u_H, v_H) = (f, v_H)_{L^2(D)} for all v_H ∈ U_H. Based on the findings in the previous paragraph, we can now formulate the framework of the VMS. Before we do that, note that none of the previous considerations require the actual solution space for problem (4.1) to be V = H^1_0(D).
Hence it could be any closed subspace of H 1 0 (D), as long as the decomposition V = U H ⊕ W is adapted accordingly.
Considering this setting, let G_f : V* → W denote the corrector Green's operator introduced in (6.12) and define the full correction operator C : U_H → W by C := G_f • A. This means that for v_H ∈ U_H the image C v_H ∈ W is the unique solution to a(C v_H, w) = a(v_H, w) for all w ∈ W. (6.16) With this notation, (6.13) reads a((1 − C) u_H, v_H) + a((G_f • I) f, v_H) = (f, v_H)_{L^2(D)} for all v_H ∈ U_H. Furthermore, using (6.14) and (6.15), the solution u ∈ V of (4.1) can be decomposed into u = (1 − C) u_H + (G_f • I) f. (6.17) Since (G_f • I) f ∈ W, this gives an analogous characterization of the error, as noted earlier in the first part of Lemma 3.8. Clearly, the quality of this decomposition depends on the choice of the coarse and the fine-scale space; see Hughes and Sangalli (2007) and Peterseim (2016). In Section 3 we saw that if W is, for example, selected as the kernel of the H^1(D)-stable projection P_H onto the P1-finite element space U_H, then we typically find that ‖u − (1 − C) u_H‖_{H^1(D)} = ‖(G_f • I) f‖_{H^1(D)} ≲ H ‖f‖_{L^2(D)}, that is, (1 − C) u_H is a good H^1-approximation of u.
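The exact decomposition u = (1 − C) u_H + (G_f • I) f can be verified in a fully discrete 1D setting in which W is chosen as the (discrete) kernel of nodal interpolation onto the coarse mesh. This is our own illustrative construction, not code from the literature; all names and parameters are ours:

```python
import numpy as np

# fine P1 discretization of -(a u')' = f on (0,1), Dirichlet BCs, f = 1
M = 128                                      # fine elements
h = 1.0 / M
a = 1.0 + 0.5 * np.cos(2 * np.pi * np.arange(M) / 4)   # rough coefficient
n = M - 1
K = np.zeros((n, n)); F = np.full(n, h)
for e in range(M):
    for i in (e - 1, e):
        for j in (e - 1, e):
            if 0 <= i < n and 0 <= j < n:
                K[i, j] += a[e] / h * (1 if i == j else -1)

m = 16                                       # coarse mesh: every m-th fine node
C_idx = np.arange(m - 1, n, m)               # interior coarse nodes
I_idx = np.setdiff1d(np.arange(n), C_idx)    # dofs of W (vanish at coarse nodes)

u = np.linalg.solve(K, F)                    # exact (fine) solution

# coarse part u_H: piecewise linear interpolant of u on the coarse mesh
xs = (np.arange(n) + 1) * h
xc = np.concatenate(([0.0], (C_idx + 1) * h, [1.0]))
uc = np.concatenate(([0.0], u[C_idx], [0.0]))
u_H = np.interp(xs, xc, uc)

# corrector C u_H and detail (G_f I) f: both solve problems posed in W only
z = np.zeros(n); z[I_idx] = np.linalg.solve(K[np.ix_(I_idx, I_idx)], (K @ u_H)[I_idx])
g = np.zeros(n); g[I_idx] = np.linalg.solve(K[np.ix_(I_idx, I_idx)], F[I_idx])

# u = (1 - C) u_H + (G_f I) f holds exactly in this discrete setting
assert np.allclose(u, u_H - z + g)
```

Both auxiliary solves act only on the fine-scale degrees of freedom, which is precisely the role of the corrector Green's operator G_f : V* → W.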

Remark 6.4 (link to local Green's functions).
We immediately see that the local Green's functions approach in Section 6.1 is a particular realization of the VMS. Therein, U_H is a coarse finite element space subject to a simplicial triangulation T_H and W = ⊕_{T∈T_H} H^1_0(T). The full space is defined as V = U_H ⊕ W ⊆ H^1_0(D). For g ∈ L^2(D), the local Green's operator G_f is given by (G_f g)(x) = Σ_{T∈T_H} ∫_T G_T(x, y) g(y) dy for x ∈ D, where G_T denotes the local (element) Green's function defined in (6.3). In general, however, we cannot say whether the reference solution in V (which we then approximate) is a good H^1-approximation.
In order to remove the contribution (G f • I) f from (6.17), which we do not want to compute in practice, we can introduce the dual corrector C * : U H → W via a(w, C * v H ) = a(w, v H ) for all w ∈ W.
Using this and the definition of (G_f • I) f ∈ W, we have a((G_f • I) f, v_H) = a((G_f • I) f, C* v_H) = (f, C* v_H)_{L^2(D)}. Furthermore, it is straightforward to see that a(C u_H, v_H) = a(C u_H, C* v_H). Plugging the last two equalities into the coarse-scale equation, we observe that the variational formulation is equivalent to finding u_H ∈ U_H such that a((1 − C) u_H, (1 − C*) v_H) = (f, (1 − C*) v_H)_{L^2(D)} for all v_H ∈ U_H. (6.19) Again, recalling Remark 3.4 and Theorem 3.5, we recover in (6.19) the abstract variational formulation (3.8). Analogously to these earlier findings, we can obtain a (more compact) generalized finite element formulation by introducing the primal and dual multiscale spaces Ũ_H := (1 − C) U_H and Ṽ_H := (1 − C*) U_H. We observe that (6.19) is equivalently described by a Petrov–Galerkin formulation that we fix in the following theorem.
Theorem 6.5 (ideal variational multiscale scheme). Consider the setting of this subsection. Then there is a unique function ũ_H = (1 − C) u_H ∈ Ũ_H with a(ũ_H, ṽ_H) = (f, ṽ_H)_{L^2(D)} for all ṽ_H ∈ Ṽ_H. (6.20) Furthermore, we can characterize the error between ũ_H and the exact solution u of the multiscale problem (4.1) as u − ũ_H = (G_f • I) f. Hence, to obtain error estimates, it is sufficient to study the source-dependent detail function (G_f • I) f.

This is a general variational approach to numerical homogenization. It leaves open how to pick the crucial spaces U_H and W and how to localize the construction of the corrector operators C and C*. Nevertheless, we can draw connections to the abstract framework from Section 3, which contains the above ideal VMS as a global and conforming realization and which thus gives a comprehensive answer concerning what properties U_H and W need to fulfil, and how to localize the methodology without sacrificing the order of accuracy.
To be precise, for conforming subspaces U H ⊆ V it makes sense to consider an H 1 (D)-stable projection P H : V → U H in the spirit of (3.15) and to define W as the kernel of this projection. The abstract error estimates in Section 3.3 guarantee accurate numerical results for the ideal method, provided that U H is rich enough to contain good L 2 -approximations of u. In order to localize the approach, we essentially require that the kernel of the projection P H is quasi-local in the sense of property (3.21). If this is the case, the idealized VMS (6.20) can be well approximated by a localized problem. The resulting properties and the precise numerical homogenization strategy were discussed in Section 3.7.

Residual-free bubbles and classical MsFEM
It was shown in Brezzi et al. (1997) that Hughes' initial local Green's function approach from Section 6.1 can be reinterpreted in the context of residual-free bubbles (Franca and Russo 1996).
As already indicated before, bubble functions, i.e. smooth functions that are only supported in the interior of a single grid element, were originally used to stabilize Galerkin methods for problems such as advection-dominated flows or the Navier–Stokes equations (Brezzi et al. 1992, Brezzi and Russo 1994). This motivated the consideration of similar stabilization strategies for multiscale problems. However, starting from a coarse finite element space U_H, classical polynomial bubbles would not improve the situation, since they are still insensitive to fine-scale features. Consequently, there was a need for new, specifically designed bubble functions, which are referred to as residual-free bubbles. We restrict our considerations again to the standard P1-finite element space.

Setting 6.6. We consider the elliptic multiscale problem (4.1). For the discretization, let U_H denote the P1-finite element space arising from a regular simplicial triangulation T_H. The set of interior nodes is denoted by N_H, and Λ_z ∈ U_H denotes the nodal basis function associated with z ∈ N_H.
The aim is to construct multiscale element bubbles which solve local multiscale problems. These functions are added on top of the conventional finite element basis functions in order to enrich them with multiscale information. This heuristic idea leads to what we call the method of residual-free bubbles.
We consider Setting 6.6. Let T ∈ T_H denote an arbitrary coarse grid element and let z ∈ N_H ∩ T. We define the corresponding element bubble b_{T,z} ∈ H^1_0(T) to be the solution to a(b_{T,z}, v) = −a(Λ_z, v) for all v ∈ H^1_0(T). (6.21) Furthermore, we define the source bubble b_{T,f} ∈ H^1_0(T) to be the solution to a(b_{T,f}, v) = (f, v)_{L^2(T)} for all v ∈ H^1_0(T). (6.22) The corresponding residual-free bubble space B_T ⊆ H^1_0(T) on T is then defined by B_T := span{b_{T,f}} ⊕ span{b_{T,z} | z ∈ N_H ∩ T}, (6.23) and a multiscale space is defined by U^B_H := U_H ⊕ B with B := ⊕_{T∈T_H} B_T; (6.24) we denote by u^B_H ∈ U^B_H the corresponding Galerkin approximation. On each element T we can define a local corrector C_T : U_H → H^1_0(T) by C_T v_H := −Σ_{z∈N_H∩T} v_H(z) b_{T,z}, so that a(C_T v_H, v) = a(v_H, v) for all v ∈ H^1_0(T). (6.25) Combining the definition of b_{T,f} in (6.22) with (6.25), we obtain a(v_H + b_{T,f} − C_T v_H, v) = (f, v)_{L^2(T)} for all v ∈ H^1_0(T). (6.26) Let v ∈ ⊕_{T∈T_H} H^1_0(T) be written as v = Σ_{T∈T_H} v_T. By summing up over all T ∈ T_H, we conclude from (6.26) that, for any u_H ∈ U_H, a(u_H + Σ_{T∈T_H} (b_{T,f} − C_T u_H), v) = (f, v)_{L^2(D)} for all v ∈ ⊕_{T∈T_H} H^1_0(T). (6.27) Note that (6.27) specifically holds for any v ∈ B ⊆ ⊕_{T∈T_H} H^1_0(T). Now that we have derived a fine-scale coupling condition in the bubble space, we need to determine the coarse part.
Step 2. We introduce the mapping S : U_H → B by S v_H := Σ_{T∈T_H} (b_{T,f} − C_T v_H). Let v_H ∈ U_H. Since ∇v_H is constant on T, we have (∇v_H, ∇v)_{L^2(T)} = 0 for all v ∈ H^1_0(T). This implies (∇v_H, ∇v_H)_{L^2(T)} = (∇v_H, ∇v_H + ∇v)_{L^2(T)} for all v ∈ H^1_0(T) and hence, with v = −C_T v_H, that the resulting coarse-scale problem is well-posed. In conclusion, we have shown that there exists a unique u_H ∈ U_H such that a(u_H + S u_H, v_H) = (f, v_H)_{L^2(D)} for all v_H ∈ U_H. Together with (6.27) we conclude that u_H + S u_H ∈ U^B_H fulfils a(u_H + S u_H, v) = (f, v)_{L^2(D)} for all v ∈ U^B_H. As u^B_H is unique, we have u^B_H = u_H + S u_H.

As a direct consequence of Proposition 6.8, we observe that the method of residual-free bubbles coincides with the local Green's function approach that we presented in Section 6.1 (i.e. the simplest realization of a variational multiscale scheme). We summarize this observation in the following conclusion.