Optimal approximants and orthogonal polynomials in several variables

We discuss the notion of optimal polynomial approximants in multivariable reproducing kernel Hilbert spaces. In particular, we analyze difficulties that arise in the multivariable case which are not present in one variable, for example, a more complicated relationship between optimal approximants and orthogonal polynomials in weighted spaces. Weakly inner functions, whose optimal approximants are all constant, provide extreme cases where nontrivial orthogonal polynomials cannot be recovered from the optimal approximants. Concrete examples are presented to illustrate the general theory and are used to disprove certain natural conjectures regarding zeros of optimal approximants in several variables.


Introduction
There are situations when understanding a space $H$ consisting of analytic functions on some open subset of $\mathbb{C}^d$ requires the analysis of functions of the form $1/f$, where $f \in H$. In general, of course, $1/f \notin H$ and it becomes natural to look for substitutes $p^* \in H$ that approximate $1/f$ in some appropriate sense. One example of this type of investigation is the problem of determining cyclic vectors for the shift operator, or shift operators, in a Hilbert function space. In this context, $f \in H$ is cyclic if the polynomial multiples of $f$ form a dense subset of $H$. If the constant function $1$ is assumed to be cyclic, then it is frequently the case that $f \in H$ is cyclic if a constant function can be approximated in the norm of $H$ by polynomial multiples of $f$. This in turn amounts, at least intuitively, to being able to approximate $1/f$ by polynomials.
The notion of an optimal approximant to $1/f$ appeared some time ago, both in the mathematical literature and previously in the engineering literature under the name least squares polynomial inverse (typically in the setting of the Hardy space $H^2$). Chui [14] attributes the notion to E.A. Robinson, who apparently considered such approximation problems in the context of stationary stochastic processes [33]. In the 1980s, Chui and others [25] obtained several important results for one-variable $H^2$-approximants, in particular examining the location of their zeros. Least squares polynomial inverses were also studied systematically in the several complex variables setting by Delsarte, Genin, and Kamp in the late 1970s. They were led to examine least squares polynomial inverses to functions in $H^2$ of the bidisk by problems in filtering theory [37].
In a series of recent papers by several authors, cyclic vectors in Dirichlet-type spaces have been studied via polynomial substitutes to $1/f$, appearing there under the name optimal approximants; at the time, the authors were not aware of the earlier works mentioned above. Optimal approximants were initially considered, and in some cases computed explicitly, in the one-variable setting [3], and were then used in [4] to exhibit non-cyclic polynomials in two-variable Dirichlet spaces. Subsequently, optimal approximants themselves have been studied in several papers, with a particular emphasis on the location of their zeros [8,7] and their boundary behavior and universality [6,10]. See [35] for a survey of optimal approximants.
The present paper on optimal approximants has two complementary goals. On the one hand, we would like to draw the attention of the function theory and operator theory communities to some results and problems discussed in the engineering literature that, in our opinion, have not received enough attention. In some cases, we are also able to give simplified arguments and examples. On the other hand, we contribute to the theory in several ways. First, we explain how to extend the notion of optimal approximants to a more general several-variables setting: in principle, this part is straightforward, but there are some technical points and choices that we need to pay particular attention to. We then show that many of the nice finer properties exhibited by one-variable optimal approximants and related functions are lost in higher dimensions. Despite this, in some cases, particularly when examining orthogonal polynomials, we find a structure connected to the one-variable case. Finally, natural conjectures for several-variable optimal approximants are disproved by examining specific examples.
Our paper is structured as follows. We begin, in Section 2, by setting down notation and giving a brief overview of the function spaces we are interested in. We then define optimal approximants in reproducing kernel Hilbert spaces defined in domains in $\mathbb{C}^d$ and discuss ways of computing such approximants for a given target function. We also mention applications to the analysis of cyclic vectors and two-dimensional filters. In Section 3, we discuss weakly inner functions, which are singled out by their property of having constant optimal approximants, and their connections with classical inner functions. An idea from earlier papers in the one-variable setting is adapted to give an explicit construction of weakly inner functions. In Section 4 we examine how optimal approximants relate to orthogonal polynomials in weighted spaces, and investigate under what circumstances orthogonal polynomials can be recovered from optimal approximants. We also show that for a certain class of examples, orthogonal polynomials in two variables can be found from the known one-variable case. Section 5 is devoted to zero sets of optimal approximants and, in particular, to what is known in the engineering literature as the Shanks conjecture on regions where optimal approximants are zero-free. We review some of the existing results and then present several counterexamples to possible Shanks-type statements. Finally, Section 6 features explicit computation and discussion of optimal approximants and orthogonal polynomials for functions of the form $f = 1 - a(z_1 + z_2)$. Throughout the paper, we attempt to give references to relevant previous work: we hope these sources will inspire further work even though it is likely we have overlooked some important contributions.

Optimal
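We consider Hilbert spaces $H = H(\Omega)$ of holomorphic functions on a domain $\Omega \subset \mathbb{C}^d$ in which, for each $\lambda \in \Omega$, the point evaluation $f \mapsto f(\lambda)$ furnishes a bounded linear functional. By standard Hilbert space theory, there exists an element $K_\lambda \in H$ with the reproducing property
$$f(\lambda) = \langle f, K_\lambda \rangle_H, \qquad f \in H,$$
where $\langle \cdot, \cdot \rangle_H$ denotes the inner product in $H(\Omega)$. We call $K_\lambda$ the reproducing kernel at $\lambda$. For any orthonormal basis $\{\varphi_j\}_{j=0}^{\infty}$ for $H$, the reproducing kernel admits the series representation
$$K_\lambda(z) = \sum_{j=0}^{\infty} \overline{\varphi_j(\lambda)}\, \varphi_j(z).$$
See [1] for a general introduction to Hilbert function spaces.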
In this paper, we shall typically take $\Omega$ to be the unit disk, the unit bidisk, or the unit ball in $\mathbb{C}^d$. We shall also impose the standing assumptions that $\mathbb{C}[z_1, \ldots, z_d]$, the ring of polynomials in $d$ complex variables, forms a dense subspace of $H(\Omega)$ and that the operators of multiplication by the coordinate functions, $S_j \colon f \mapsto z_j f$ for $j = 1, \ldots, d$, act boundedly on $H$.
Throughout, we will consider the following spaces of holomorphic functions to illustrate the general theory.
Dirichlet-type spaces in the disk and the bidisk. Let $\alpha \in (-\infty, \infty)$ be fixed. The Dirichlet-type space $D_\alpha$ consists of holomorphic functions $f = \sum_{k=0}^{\infty} a_k z^k$ on the unit disk $\mathbb{D} = \{z \in \mathbb{C} : |z| < 1\}$ satisfying the norm boundedness condition
$$\|f\|_\alpha^2 = \sum_{k=0}^{\infty} (k+1)^{\alpha} |a_k|^2 < \infty.$$
When $\alpha = 0$, we recover the standard Hardy space $H^2$. The choice $\alpha = -1$ corresponds to the Bergman space $A^2$ in the unit disk, while $D = D_1$ can be identified with the classical Dirichlet space consisting of functions having $\int_{\mathbb{D}} |f'(z)|^2 \, dA(z) < \infty$, where $dA$ is normalized area measure on the disk. The literature on these spaces is vast but basic introductions can be found in [18,24,19].
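For $\alpha \in \{-1, 0, 1\}$, explicit expressions for the reproducing kernels are known. In $H^2$ and $A^2$, we have the usual Szegő and Bergman kernels
$$K_\lambda(z) = \frac{1}{1 - \bar{\lambda} z} \quad \text{and} \quad K_\lambda(z) = \frac{1}{(1 - \bar{\lambda} z)^2},$$
respectively. For non-integer values of $\alpha$, closed form expressions for the reproducing kernels $K_\lambda$ in terms of rational functions are in general not available.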
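We can define Dirichlet-type spaces $D_{\alpha_1,\alpha_2}$ on the bidisk $\mathbb{D}^2 = \{(z_1, z_2) \in \mathbb{C}^2 : |z_1| < 1, |z_2| < 1\}$ as tensor products of one-variable Dirichlet-type spaces; that is, we can take $D_{\alpha_1,\alpha_2} = D_{\alpha_1} \otimes D_{\alpha_2}$. See [1] for more on this perspective. In concrete terms, $D_{\alpha_1,\alpha_2}$ consists of holomorphic functions $f = \sum_{k,l} a_{k,l} z_1^k z_2^l$ on the bidisk whose Taylor coefficients satisfy
$$\|f\|_{\alpha_1,\alpha_2}^2 = \sum_{k=0}^{\infty} \sum_{l=0}^{\infty} (k+1)^{\alpha_1} (l+1)^{\alpha_2} |a_{k,l}|^2 < \infty.$$
We write $D_\alpha$ when $\alpha = \alpha_1 = \alpha_2$, and the norm in this case will be denoted by $\|f\|_\alpha$. By the general theory of reproducing kernel spaces [1], the kernel of $D_{\alpha_1,\alpha_2}$ at $\lambda = (\lambda_1, \lambda_2) \in \mathbb{D}^2$ is a product of one-variable kernels,
$$K_\lambda(z_1, z_2) = K^{\alpha_1}_{\lambda_1}(z_1)\, K^{\alpha_2}_{\lambda_2}(z_2),$$
where $K^{\alpha_j}_{\lambda_j}$ denotes the reproducing kernel of $D_{\alpha_j}$ at $\lambda_j$.
The Drury-Arveson space. Let $\mathbb{B}_d = \{z \in \mathbb{C}^d : \|z\|_2 < 1\}$ denote the unit ball in $\mathbb{C}^d$, let $\mathbb{S}_d = \partial \mathbb{B}_d$ be its boundary, the unit sphere, and let $\langle z, w \rangle = z_1 \bar{w}_1 + \cdots + z_d \bar{w}_d$ denote the standard Euclidean inner product on $\mathbb{C}^d$. The Drury-Arveson space $H^2_d$ on $\mathbb{B}_d$ is the reproducing kernel Hilbert space of holomorphic functions on the ball determined by the kernel
$$K_\lambda(z) = \frac{1}{1 - \langle z, \lambda \rangle}.$$
Basic structural properties of the Drury-Arveson space are discussed in, for instance, [36,32]; for example, the Drury-Arveson norm is invariant under unitaries. For our purposes, it will be useful to note that the norm in $H^2_d$ can be expressed in terms of the coefficients of $f = \sum_k a_k z^k$ using standard multi-index notation:
$$\|f\|_{H^2_d}^2 = \sum_{k} \frac{k!}{|k|!} |a_k|^2.$$
In particular, in two variables we have
$$\|f\|_{H^2_2}^2 = \sum_{k=0}^{\infty} \sum_{l=0}^{\infty} \frac{k!\, l!}{(k+l)!} |a_{k,l}|^2.$$
2.2. Optimal polynomial approximants. Set $\chi_0 = 1$ and let $\{\chi_j\}_{j \geq 1}$ be an ordering of complex monomials $z^k = z_1^{k_1} \cdots z_d^{k_d}$ according to some chosen order. In several variables there are several natural ways to index the monomials. With an ordering of monomials in place, we set
(2.5) $P_n = \mathrm{span}\{\chi_j : j = 0, \ldots, n\}, \quad n = 0, 1, 2, \ldots.$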
Since $\chi_0 = 1$ we have $P_0 = \mathrm{span}\{1\}$, the constant polynomials. Note that $P_0 \subset P_1 \subset \cdots \subset P_n \subset \cdots$ is an exhaustion of $\mathbb{C}[z_1, \ldots, z_d]$ (viewed as a vector space) by finite-dimensional subspaces, that is, $\bigcup_n P_n = \mathbb{C}[z_1, \ldots, z_d]$. If $d = 1$, we typically order monomials by degree, in which case $P_n$ consists of the polynomials of degree at most $n$.
Definition 1. Let $f \in H(\Omega)$ be given. The $n$th-order optimal polynomial approximant to $1/f$ with respect to $P_n$ is defined as
$$p_n^* = \frac{\mathrm{Proj}_{f \cdot P_n}[1]}{f},$$
where $\mathrm{Proj}_{f \cdot P_n} \colon H \to f \cdot P_n$ denotes the orthogonal projection onto the subspace $f \cdot P_n$.
In other words, $p_n^*$ is the unique polynomial that minimizes $\|p \cdot f - 1\|_H$ among all $p \in P_n$.
The existence and uniqueness of $p_n^*$, relative to a particular choice of $\{\chi_j\}$, follows immediately from Hilbert space theory: our assumption that multiplication by each variable acts boundedly on $H$ implies that $f \cdot P_n$ is a finite-dimensional, and hence closed, subspace of $H$ for each $n$. Note that we obtain different sequences of optimal approximants depending on the contents of the $P_n$.
The following notion of distance will also feature.
Definition 2. For a given $f \in H$ and $P_n$ as above, the $n$th-order optimal norm is defined as
$$\nu_n(f) = \|p_n^* f - 1\|_H.$$
Note that, since the subspaces $P_n$ are nested, $\nu_n(f)$ is non-increasing as a function of $n$.

2.3. Optimal approximants via Grammians. The following is a straightforward adaptation of previous methods of computing optimal approximants [3,20] to our present setting.
Proposition 1. Let $f \in H \setminus \{0\}$. Then the coefficients of the $n$th-order optimal approximant $p_n^* = \sum_{j=0}^{n} c_j^* \chi_j$ are given by the solution to the linear system
$$M c^* = b,$$
where $c^* = (c_0^*, \ldots, c_n^*)^T$, $M$ is the $(n+1) \times (n+1)$ Grammian matrix with entries $M_{i,j} = \langle \chi_j f, \chi_i f \rangle_H$, and $b$ has entries $b_i = \langle 1, \chi_i f \rangle_H$.
The proof is analogous to the one-variable case, see [20]; we sketch it for the reader's convenience.
Proof. By the definition of $p_n^*$, we have $(p_n^* f - 1) \perp f \cdot P_n$. Thus $\langle p_n^* f, f \chi_i \rangle_H = \langle 1, f \chi_i \rangle_H$ for $i = 0, \ldots, n$. This in turn can be rewritten as
$$\sum_{j=0}^{n} c_j^* \langle \chi_j f, \chi_i f \rangle_H = \langle 1, \chi_i f \rangle_H, \qquad i = 0, \ldots, n,$$
which is the desired linear system.
If $\langle 1, \chi_i f \rangle_H = 0$ for all $i \geq 1$, as is the case for most spaces we are interested in, then $b_i = \langle 1, f \rangle_H\, \delta_{i,0}$ in the above proposition; when $\|1\|_H = 1$, this is $b_i = \overline{f(0)}\, \delta_{i,0}$.
It is typically not straightforward to find $\{p_n^*\}$ in closed form for a given $f$ using the representation in Proposition 1. More sophisticated approaches to computation and fine analysis of optimal approximants are discussed in [15,10], for example, but are not needed for what we want to achieve in this paper.
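As an illustration, the linear system of Proposition 1 can be set up numerically for a polynomial target in the Dirichlet-type spaces $D_{\alpha_1,\alpha_2}$ of the bidisk. The following is only a minimal sketch under stated assumptions: monomials are taken in degree lexicographic order ($1, z_1, z_2, z_1^2, z_1 z_2, z_2^2, \ldots$), polynomials are stored as dictionaries mapping exponent pairs to coefficients, and the helper names (`deglex_monomials`, `inner`, `shift`, `optimal_approximant`) are hypothetical and not taken from the references.

```python
import numpy as np

def deglex_monomials(count):
    """First `count` exponent pairs (k1, k2) in degree lexicographic order."""
    out, deg = [], 0
    while len(out) < count:
        out.extend((deg - j, j) for j in range(deg + 1))
        deg += 1
    return out[:count]

def inner(p, q, a1, a2):
    """<p, q> in D_{a1,a2}; p and q are dicts {(k, l): coefficient}."""
    return sum(c * np.conj(q[kl]) * (kl[0] + 1) ** a1 * (kl[1] + 1) ** a2
               for kl, c in p.items() if kl in q)

def shift(f, mono):
    """Multiply the polynomial f by the monomial z1^mono[0] * z2^mono[1]."""
    return {(k + mono[0], l + mono[1]): c for (k, l), c in f.items()}

def optimal_approximant(f, n, a1=0.0, a2=0.0):
    """Coefficients of p_n^* as a dict {monomial: coefficient}."""
    chis = deglex_monomials(n + 1)
    fchi = [shift(f, m) for m in chis]
    # Grammian M[i, j] = <chi_j f, chi_i f> and right-hand side b[i] = <1, chi_i f>.
    M = np.array([[inner(fchi[j], fchi[i], a1, a2) for j in range(n + 1)]
                  for i in range(n + 1)], dtype=complex)
    b = np.array([inner({(0, 0): 1.0}, fchi[i], a1, a2) for i in range(n + 1)],
                 dtype=complex)
    return dict(zip(chis, np.linalg.solve(M, b)))

# Target f = 1 - z1*z2 in the Hardy space of the bidisk (a1 = a2 = 0).
p4 = optimal_approximant({(0, 0): 1.0, (1, 1): -1.0}, 4)
```

For this target and $n = 4$ the only nonzero coefficients are those of $1$ and $z_1 z_2$, in line with the observation that the approximants to $1/(1 - z_1 z_2)$ are polynomials in $z_1 z_2$ only.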
Building on one-variable work in [3,20], we can obtain optimal approximants for some simple polynomial targets.
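Example 2. Let $\omega = \{\omega(k)\}_{k=0}^{\infty}$ be a sequence of strictly positive weights satisfying $\lim_{k \to \infty} \omega(k+1)/\omega(k) = 1$ and let $H_\omega$ be the Hilbert function space consisting of analytic $f \colon \mathbb{D} \to \mathbb{C}$ whose power series $f = \sum_{k=0}^{\infty} a_k z^k$ satisfy
$$\|f\|_\omega^2 = \sum_{k=0}^{\infty} |a_k|^2 \omega(k) < \infty.$$
Let us further assume that $\{z^k / \|z^k\|_\omega\}$ is an orthonormal basis for $H_\omega$. In this setting, Fricain, Mashreghi, and Seco [20, Theorem 3.9] have found an explicit expression for the $H_\omega$-optimal approximants to $1/f$ for the function $f = 1 - z$. (See [3] for the case of Dirichlet-type spaces in the unit disk.) Indeed, we have
$$p_n^*(z) = \sum_{k=0}^{n} \left( 1 - \frac{\sum_{j=0}^{k} 1/\omega(j)}{\sum_{j=0}^{n+1} 1/\omega(j)} \right) z^k.$$
In our discussion of higher-dimensional analogs of Example 2, we find it convenient to consider diagonal subspaces $J_n = \mathrm{span}\{(z_1 \cdots z_d)^k : k = 0, 1, \ldots, n\}$ and $J = \mathrm{span}\{(z_1 \cdots z_d)^k : k \in \mathbb{N}\}$.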
Also, define $P_n$ using degree lexicographic order and let $m(n)$ denote the lowest index $m$ for which the monomial $(z_1 z_2)^n$ belongs to $P_m$. (Explicitly, $m(1) = 4$, $m(2) = 12$, and so on, and note that $J_n \subsetneq P_{m(n)}$.)
Example 3. We first consider optimal approximants to $1/(1 - z_1 z_2)$ in the Dirichlet-type spaces $D_{\alpha_1,\alpha_2}$ in the bidisk. In [4,27], it was observed that there is an isometric isomorphism between $J_n$, viewed as a closed subspace of $D_{\alpha_1,\alpha_2}$, and the set $\widetilde{J}_n = \mathrm{span}\{z^k : k = 0, 1, \ldots, n\}$ viewed as a closed subspace of $D_{\alpha_1+\alpha_2}$, a Dirichlet-type space in the unit disk. Under this isomorphism, $f = 1 - z_1 z_2$ is mapped to $F = 1 - z$. Next, we note that $D_{\alpha_1+\alpha_2}$ can be viewed as $H_\omega$ with weight sequence $\omega(k) = (k+1)^{\alpha_1+\alpha_2}$. Finally, by orthogonality, the $n$th-order optimal approximants $p_n^* = \mathrm{Proj}_{f \cdot P_n}[1]/f$ are polynomials in $z_1 z_2$ only. Thus the optimal approximants change only when passing from $J_n$ to $J_{n+1}$, and stay the same for all $P_k$ containing $J_n$ and strictly contained in $P_{m(n+1)}$. Now, using Example 2, we find that
$$p_k^*(z_1, z_2) = \sum_{j=0}^{n} \left( 1 - \frac{\sum_{i=0}^{j} (i+1)^{-(\alpha_1+\alpha_2)}}{\sum_{i=0}^{n+1} (i+1)^{-(\alpha_1+\alpha_2)}} \right) (z_1 z_2)^j$$
are the optimal approximants to $1/(1 - z_1 z_2)$ for $m(n) \leq k < m(n+1)$.
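Example 4. A similar analysis applies in the case of the $d$-dimensional Drury-Arveson space (or Dirichlet-type spaces in the unit ball, cf. [40]). By the arithmetic-geometric means inequality, the mapping
$$(z_1, \ldots, z_d) \mapsto d^{d/2} z_1 \cdots z_d$$
sends the unit ball $\mathbb{B}_d$ to the unit disk $\mathbb{D}$. Next, we note that
$$\|(z_1 \cdots z_d)^k\|_{H^2_d}^2 = \frac{(k!)^d}{(dk)!}.$$
Together, these observations establish an isometric isomorphism between $J_n$, viewed as a closed subspace of $H^2_d$, and the set $\widetilde{J}_n = \mathrm{span}\{z^k : k = 0, 1, \ldots, n\}$ sitting inside the space $H_\omega$ of functions on the disk associated with the weight sequence
$$\omega(k) = \frac{d^{dk} (k!)^d}{(dk)!}.$$
Using this choice of weight sequence in the formula in Example 2, we obtain the polynomials
$$p^*(z_1, \ldots, z_d) = \sum_{k=0}^{n} \left( 1 - \frac{\sum_{j=0}^{k} 1/\omega(j)}{\sum_{j=0}^{n+1} 1/\omega(j)} \right) d^{dk/2} (z_1 \cdots z_d)^k,$$
and these are the optimal approximants to $1/(1 - d^{d/2} z_1 \cdots z_d)$; this target is the $d$-dimensional analog of $1 - 2 z_1 z_2$ in two variables.
For instance, in the two-variable case, $\omega(0) = 1$, $\omega(1) = 2$, $\omega(2) = 8/3$, so the first two distinct optimal approximants to $1/(1 - 2 z_1 z_2)$ are $\tfrac{1}{3}$ and $\tfrac{7}{15} + \tfrac{2}{5} z_1 z_2$, and so on.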
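The closed-form coefficients for $f = 1 - z$ in a weighted space $H_\omega$ with $\|f\|_\omega^2 = \sum_k |a_k|^2 \omega(k)$, namely $c_k = 1 - \big(\sum_{j \le k} 1/\omega(j)\big) / \big(\sum_{j \le n+1} 1/\omega(j)\big)$, are easy to evaluate exactly. The sketch below does this with rational arithmetic for the Hardy weights and for the two-variable Drury-Arveson weights $\omega(k) = 4^k (k!)^2/(2k)!$; it also records the index $2n(n+1)$ of $(z_1 z_2)^n$ in degree lexicographic order, which is our own elementary count and matches the values 4 and 12 quoted in the text. The function names are hypothetical.

```python
from fractions import Fraction
from math import factorial

def approximant_coeffs(omega, n):
    """Coefficients c_0, ..., c_n of the n-th optimal approximant to 1/(1 - z) in H_omega."""
    inv = [Fraction(1) / Fraction(omega(j)) for j in range(n + 2)]
    total = sum(inv)
    partial, coeffs = Fraction(0), []
    for k in range(n + 1):
        partial += inv[k]
        coeffs.append(1 - partial / total)
    return coeffs

def hardy_weight(k):
    """Weights of H^2: every monomial has norm one."""
    return 1

def drury_arveson_weight(k, d=2):
    """omega(k) = d^(dk) (k!)^d / (dk)!, the squared norm of (d^(d/2) z_1...z_d)^k."""
    return Fraction(d ** (d * k) * factorial(k) ** d, factorial(d * k))

def diagonal_index(n):
    """Index of (z1*z2)^n in degree lexicographic order: n(2n+1) lower-degree monomials, plus n."""
    return 2 * n * (n + 1)

print(approximant_coeffs(hardy_weight, 1))          # coefficients of 1 and z
print(approximant_coeffs(drury_arveson_weight, 1))  # coefficients of 1 and 2*z1*z2
```

In the Drury-Arveson case the coefficients multiply powers of $2 z_1 z_2$, so a coefficient $c_1$ corresponds to the term $2 c_1 z_1 z_2$.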

2.4. Applications of optimal approximants: cyclic vectors.
Recall that a vector $f \in H$ is said to be cyclic for the shift operators $S_1, \ldots, S_d$ if the invariant subspace
$$[f] = \overline{\mathrm{span}}\{p f : p \in \mathbb{C}[z_1, \ldots, z_d]\}$$
coincides with $H$. Since the polynomials were assumed dense in all the Hilbert spaces we are considering, the function $f = 1$ is a cyclic vector. As is explained in [12,3], cyclicity of $f$ is equivalent to having $\nu_n(f) \to 0$ as $n \to \infty$.
One of the original applications of optimal approximants in [3,4] was to use the rate at which ν n (f ) decays to zero to not only distinguish between cyclic and non-cyclic vectors, but also to give finer distinctions between "how cyclic" different cyclic functions are.
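Example 5. Arguing as in Example 2, we can now use (2.8) to extract information about cyclicity of $f = 1 - z_1 z_2$ in the Dirichlet spaces $D_{\alpha_1,\alpha_2}$ in the bidisk, and obtain
$$\nu_{m(n)}^2(f) = \left( \sum_{k=0}^{n+1} (k+1)^{-(\alpha_1+\alpha_2)} \right)^{-1},$$
where $m(n)$ is the index of $(z_1 z_2)^n$ in degree lexicographic order. The sum on the right-hand side diverges as $n$ tends to infinity precisely when $\alpha_1 + \alpha_2 \leq 1$. Thus, as was shown in [27], $f = 1 - z_1 z_2$ is cyclic in $D_{\alpha_1,\alpha_2}$ if and only if $\alpha_1 + \alpha_2 \leq 1$.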
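In particular, for the Dirichlet space of the bidisk ($\alpha_1 = \alpha_2 = 1$), the sum converges to $\pi^2/6$, so that $\nu_n^2(f) \to 6/\pi^2$ as $n \to \infty$.
Cyclic polynomials for $D_{\alpha_1,\alpha_2}$ have been completely characterized, see [29,9,27], and the cyclicity/non-cyclicity part of Example 5 follows immediately from that characterization. What optimal approximants allow us to do is to measure how far from cyclic $1 - z_1 z_2$ is for different pairs $(\alpha_1, \alpha_2)$.
Example 6. We turn to the Drury-Arveson space $H^2_d$ and the functions $f = 1 - d^{d/2} z_1 \cdots z_d$. A short computation involving Stirling's formula shows that the weights $\omega(k) = d^{dk} (k!)^d / (dk)!$ of Example 4 satisfy $\omega(k) \asymp k^{(d-1)/2}$. In particular, $\sum_k 1/\omega(k)$ diverges, and $f$ is cyclic in $H^2_d$, precisely when $d \leq 3$. This recovers an earlier result of Richter and Sundberg [31], who used the same embedding argument as above, which also features in Arveson's work [2].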
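The way the optimal norm quantifies cyclicity can be explored numerically. The sketch below assumes the identity $\nu_n^2(1 - z) = \big(\sum_{j=0}^{n+1} 1/\omega(j)\big)^{-1}$ in a weighted space $H_\omega$, which follows from $\nu_n^2 = \langle 1 - p_n^* f, 1 \rangle_\omega$ together with the closed-form approximants for $1 - z$; the function name `optimal_norm_squared` and the sample weights are illustrative choices only.

```python
import math

def optimal_norm_squared(omega, n):
    """nu_n^2 for f = 1 - z in H_omega, assuming nu_n^2 = 1 / sum_{j <= n+1} 1/omega(j)."""
    return 1.0 / sum(1.0 / omega(j) for j in range(n + 2))

def dirichlet_weight(alpha_sum):
    """Weights (k+1)^(alpha_1 + alpha_2) attached to 1 - z1*z2 in D_{alpha_1, alpha_2}."""
    return lambda k: (k + 1.0) ** alpha_sum

for alpha_sum in (0.0, 1.0, 2.0):
    values = [optimal_norm_squared(dirichlet_weight(alpha_sum), n) for n in (10, 1000, 100000)]
    print(alpha_sum, values)
```

With these weights the printed values decay like $1/n$ for $\alpha_1 + \alpha_2 = 0$, like $1/\log n$ for $\alpha_1 + \alpha_2 = 1$, and stabilize near $6/\pi^2$ for $\alpha_1 + \alpha_2 = 2$, reflecting the dichotomy at $\alpha_1 + \alpha_2 = 1$.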

2.5. Applications of optimal approximants: two-dimensional recursive filters. Another, older, application of optimal approximants relates to two-dimensional recursive filtering theory and was discussed by Shanks, Treitel, and Justice [37]. We give a brief description of their work here, and note that their work in turn was motivated by engineering applications including the study of seismic records and photographic data [37].
Given a data array $D = (d_{j,k})_{j,k=1}^{n}$, we form a two-variable polynomial $D(z_1, z_2) = \sum_{j,k} d_{j,k} z_1^j z_2^k$. Then, in the notation of [37], a recursive filter algorithm is obtained as follows. We set
(2.9) $R(z_1, z_2) = F(z_1, z_2)\, D(z_1, z_2),$
where $F = A/B$, with $A = \sum_{j,k} a_{j,k} z_1^j z_2^k$ and $B = \sum_{j,k} b_{j,k} z_1^j z_2^k$, is a rational function of two variables. After clearing fractions, (2.9) translates into $B R = A D$. Writing $R = \sum_{m,n} r_{m,n} z_1^m z_2^n$, comparing coefficients, assuming that $b_{0,0}$ is non-zero and dividing through, we obtain
$$r_{m,n} = \frac{1}{b_{0,0}} \left( \sum_{j,k} a_{j,k}\, d_{m-j,n-k} - \sum_{(j,k) \neq (0,0)} b_{j,k}\, r_{m-j,n-k} \right),$$
expressing the output coefficient $r_{m,n}$ in terms of the data and of output coefficients which are either assumed to have been previously computed or are set to zero.
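The recursion obtained from $B R = A D$ by comparing coefficients, $b_{0,0} r_{m,n} = \sum_{j,k} a_{j,k} d_{m-j,n-k} - \sum_{(j,k) \neq (0,0)} b_{j,k} r_{m-j,n-k}$, translates directly into a double loop. The sketch below is an assumption-laden illustration rather than the algorithm of [37]: arrays are indexed from zero, coefficients with a negative index are treated as zero, and `recursive_filter_2d` is a hypothetical name.

```python
import numpy as np

def recursive_filter_2d(a, b, d):
    """Output array r with B*R = A*D (as power series), computed row by row."""
    a, b, d = (np.asarray(x, dtype=float) for x in (a, b, d))
    if b[0, 0] == 0:
        raise ValueError("b[0, 0] must be non-zero")
    r = np.zeros_like(d)
    for m in range(d.shape[0]):
        for n in range(d.shape[1]):
            acc = 0.0
            # Feed-forward part: coefficients of A applied to the data.
            for j in range(min(a.shape[0], m + 1)):
                for k in range(min(a.shape[1], n + 1)):
                    acc += a[j, k] * d[m - j, n - k]
            # Feedback part: previously computed outputs, weighted by B.
            for j in range(min(b.shape[0], m + 1)):
                for k in range(min(b.shape[1], n + 1)):
                    if (j, k) != (0, 0):
                        acc -= b[j, k] * r[m - j, n - k]
            r[m, n] = acc / b[0, 0]
    return r

# Impulse response of F = 1 / (1 - z2/2): a geometric sequence along the first row.
impulse = np.zeros((3, 4))
impulse[0, 0] = 1.0
response = recursive_filter_2d([[1.0]], [[1.0, -0.5]], impulse)
```

Here $B = 1 - z_2/2$ has no zeros in the closed bidisk and the impulse response decays geometrically; replacing $-0.5$ by $-2$ gives a response that grows, which is the instability the zero-set discussion is concerned with.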
In order for this scheme to be of practical use, it is desirable that the filter is stable, that is, that bounded inputs $D$ are transformed into bounded outputs $R$. In light of (2.9), one expects that this would require that $B(z_1, z_2) \neq 0$ for some subset of values $(z_1, z_2)$. Indeed, Justice and Shanks proved [26] that stability holds if and only if $B(z_1, z_2) \neq 0$ on the closed bidisk $\overline{\mathbb{D}}^2$. Unfortunately, this need not hold for all potentially useful filters $F = A/B$.
To get around this difficulty, Shanks, Treitel, and Justice proposed replacing the two-variable function $B$ by its $H^2$-optimal approximants $p_n^*$. They argued that, intuitively speaking, $p_n^*$ should retain "many" of the features of $B$. Moreover, in light of the one-variable case and numerical evidence in two variables, they conjectured [37] that two-variable optimal approximants should be non-vanishing in the closed bidisk. Thus $1/p_n^*$ would be a stabilizing filtering substitute for $1/B$. Unfortunately, the Shanks-Treitel-Justice approach to stabilization does not work without additional assumptions on the target function $B$, since there are polynomials $B$ whose optimal approximants $p_n^*$ vanish inside the bidisk, making the filter $1/p_n^*$ unstable as well. We discuss zero set problems for optimal approximants in Section 5.

Optimal approximants and weakly inner functions
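3.1. Weakly inner functions. Certain functions in $H(\Omega)$ have the distinguishing property that their optimal approximants do not change as we increase $P_n$. Following [16,5], we make the following definition.
Definition 3. A function $g \in H(\Omega)$ is said to be weakly inner if $\langle g, \chi_j g \rangle_H = 0$ for all $j \geq 1$.
Corollary 8. If $g \in H$ is weakly inner then $\nu_n(g) = \nu_0(g)$ for all $n = 1, 2, \ldots$.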
One obvious class of weakly inner functions in the Hardy space is the class of classical inner functions: recall that a bounded holomorphic function $\theta \colon \mathbb{D}^d \to \mathbb{C}$ is said to be inner if $|\theta(\zeta)| = 1$ for almost every $\zeta \in \mathbb{T}^d$.
When $d = 1$, inner functions and weakly $H^2$-inner functions coincide. Weakly inner functions in the Bergman space of the unit disk are precisely the Bergman-inner functions [24, Chapter 3]. In higher dimensions, however, a new phenomenon manifests itself and the class of classically inner functions forms a proper subclass of all weakly $H^2$-inner functions. This was originally observed by Delsarte, Genin, and Kamp, see [16, Section 8], who gave a power series example. In the next subsection, we give simpler examples.
3.2. Shapiro-Shields functions. By adapting a construction in [5], which in turn is based on an older idea of H.S. Shapiro and A.L. Shields [38], we can build weakly inner functions in any reproducing kernel Hilbert space with a finite prescribed zero set. See [13] and [28] for further generalizations.
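Definition 4. Let $\Lambda = \{\lambda_1, \ldots, \lambda_n\} \subset \Omega \setminus \{0\}$ be a given set of distinct points and let $K_\lambda = K^H_\lambda$ be the reproducing kernel of $H$ at a point $\lambda$. Define $K_\Lambda$ to be the $n \times n$ matrix whose entries are given by $(K_\Lambda)_{i,j} = \langle K_{\lambda_i}, K_{\lambda_j} \rangle_H$, let $k_\Lambda(z) = (K_{\lambda_1}(z), \ldots, K_{\lambda_n}(z))^T$, and let $\mathbf{1} = (1, 1, \ldots, 1) \in \mathbb{C}^n$. The Shapiro-Shields function for $H$ associated with $\Lambda$ is defined as
$$s_\Lambda(z) = \det \begin{pmatrix} 1 & \mathbf{1} \\ k_\Lambda(z) & K_\Lambda \end{pmatrix};$$
normalization is not essential for our purposes.
The function $s_\Lambda$ is non-trivial, vanishes at each point of $\Lambda$, and is weakly inner.
Proof. The proof is a straightforward adaptation of the one-variable proof in [5] and is sketched for the reader's convenience. First, the fact that $s_\Lambda$ vanishes at each $\lambda_j$ follows from the fact that, at $z = \lambda_j$, the first column in the determinant defining $s_\Lambda$ is equal to the $(j+1)$st column. To see that $s_\Lambda$ is non-trivial, it suffices to note that the kernels $K_{\lambda_1}, \ldots, K_{\lambda_n}$ are linearly independent.
To establish that $s_\Lambda$ is weakly inner, we perform a cofactor expansion of the second argument of $\langle \chi_j s_\Lambda, s_\Lambda \rangle_H$ along the first column,
$$\langle \chi_j s_\Lambda, s_\Lambda \rangle_H = \bar{c}_0 \langle \chi_j s_\Lambda, 1 \rangle_H + \sum_{m=1}^{n} \bar{c}_m \langle \chi_j s_\Lambda, K_{\lambda_m} \rangle_H,$$
where $c_0, \ldots, c_n$ denote the (constant) cofactors. Finally, $\langle \chi_j s_\Lambda, 1 \rangle_H = 0$ for $j \geq 1$ by orthogonality of monomials, while each $\langle \chi_j s_\Lambda, K_{\lambda_m} \rangle_H = \chi_j(\lambda_m) s_\Lambda(\lambda_m)$ is zero since $s_\Lambda(\lambda_m) = 0$ for $m = 1, \ldots, n$.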
For Hardy and Bergman spaces in the unit disk, normalized Shapiro-Shields functions recover well-known inner functions, see [5]. Here, we examine such functions in the bidisk and the ball.
Example 11. The Shapiro-Shields function for $H^2(\mathbb{D}^2)$ associated with a point $\lambda = (\lambda_1, \lambda_2) \in \mathbb{D}^2$ is
$$s_\lambda(z_1, z_2) = \frac{1}{(1 - |\lambda_1|^2)(1 - |\lambda_2|^2)} - \frac{1}{(1 - \bar{\lambda}_1 z_1)(1 - \bar{\lambda}_2 z_2)}.$$
Several remarks are in order. As in one variable, the rational function $s_\lambda$ extends holomorphically to a bigger polydisk, whose radius depends on $\lambda$. Next, since $s_\lambda$ above is a holomorphic function of two variables, the function vanishes at points of the bidisk other than $\lambda$. If $\lambda_1 = 0$ or $\lambda_2 = 0$, we recover a multiple of a one-variable Blaschke factor, but in general $s_\lambda$ is not of product type. Finally, since $s_\lambda$ violates the Rudin-Stout description of rational inner functions in polydisks [34, Chapter 5], $s_\lambda$ is not inner in the classical sense.
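Example 12. In the Bergman space $A^2(\mathbb{D}^2)$, the Shapiro-Shields function associated with $(\lambda_1, \lambda_2) \in \mathbb{D}^2$ is
$$s_\lambda(z_1, z_2) = \frac{1}{(1 - |\lambda_1|^2)^2 (1 - |\lambda_2|^2)^2} - \frac{1}{(1 - \bar{\lambda}_1 z_1)^2 (1 - \bar{\lambda}_2 z_2)^2}.$$
Example 13. For $d \geq 1$, let $\lambda \in \mathbb{B}_d$ be a point in the unit ball. The Shapiro-Shields function for $H^2_d$ associated with $\lambda$ is
$$s_\lambda(z) = \frac{1}{1 - \|\lambda\|^2} - \frac{1}{1 - \langle z, \lambda \rangle}.$$
It would be interesting to conduct a systematic study of weakly inner functions in general reproducing kernel Hilbert spaces.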
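The weakly inner property of a one-point Shapiro-Shields function can be checked numerically from Taylor coefficients. The sketch below works in $H^2(\mathbb{D}^2)$ and assumes the one-point form $s_\lambda = K_\lambda(\lambda) - K_\lambda$ with $K_\lambda(z) = 1/((1 - \bar{\lambda}_1 z_1)(1 - \bar{\lambda}_2 z_2))$, obtained by expanding the defining $2 \times 2$ determinant; the series is truncated at a finite size, and the function names are hypothetical.

```python
import numpy as np

def shapiro_shields_coeffs(lam1, lam2, size):
    """Truncated Taylor coefficients c[k, l] of s_lambda = K_lambda(lambda) - K_lambda."""
    k = np.arange(size)
    c = -np.outer(np.conj(complex(lam1)) ** k, np.conj(complex(lam2)) ** k)
    c[0, 0] += 1.0 / ((1 - abs(lam1) ** 2) * (1 - abs(lam2) ** 2))
    return c

def shifted_inner_product(c, a, b):
    """<z1^a z2^b s, s> in H^2(D^2), computed from the coefficient array c of s."""
    n1, n2 = c.shape
    return np.sum(c[: n1 - a, : n2 - b] * np.conj(c[a:, b:]))

lam = (0.5 + 0.2j, -0.3j)
c = shapiro_shields_coeffs(*lam, size=80)
for shift in [(1, 0), (0, 1), (1, 1), (2, 3)]:
    print(shift, abs(shifted_inner_product(c, *shift)))   # all at round-off level
```

The truncation error is of the order $|\lambda_j|^{2 \cdot \text{size}}$, so for points well inside the bidisk the computed inner products vanish to machine precision, while $\langle s_\lambda, s_\lambda \rangle$ itself stays positive.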

Orthogonal polynomials
4.1. Optimal approximants and orthogonal polynomials. Another interesting aspect of optimal approximants is their connection to orthogonal polynomials of certain weighted spaces. This is discussed in one variable in [8, Section 3], where the authors write the optimal approximants in terms of orthogonal polynomials and exploit properties of orthogonal polynomials to show that, in the case of the Hardy space, optimal approximants are zero-free in the unit disk. These connections were also observed by engineers in, e.g., [22], who also showed that they extend to the two-variable Hardy space case and form the basis for the Shanks conjecture about the location of zeros of optimal approximants (Section 5.1).
This relationship can be generalized to a reproducing kernel Hilbert space H(Ω), with properties discussed in Section 2.1 and inner product · , · H .
Recall that for $f \in H(\Omega)$, the $n$th-order optimal polynomial approximant to $1/f$ with respect to $P_n$ is defined by $p_n^* f = \mathrm{Proj}_{f \cdot P_n}[1]$. If we let $\{f \varphi_j\}_{j=0}^{n}$ be an orthonormal basis for $f \cdot P_n$, chosen so that $\mathrm{span}\{\varphi_0, \ldots, \varphi_m\} = P_m$ for each $m \leq n$, then we can consider the $\varphi_j$ to be orthonormal polynomials in a weighted space $H_f$ with inner product
(4.1) $\langle g, h \rangle_{H_f} := \langle g f, h f \rangle_H.$
To avoid trivialities, we assume that $f$ is not identically zero, and does not vanish at the origin. Using the orthonormal basis $\{f \varphi_j\}$, $f p_n^*$ can be expanded as
$$f p_n^* = \sum_{j=0}^{n} \langle 1, f \varphi_j \rangle_H\, f \varphi_j,$$
and we can cancel to get
$$p_n^* = \sum_{j=0}^{n} \langle 1, f \varphi_j \rangle_H\, \varphi_j.$$
This in turn implies that
(4.4) $p_n^* - p_{n-1}^* = \langle 1, f \varphi_n \rangle_H\, \varphi_n.$
In certain favorable circumstances, the relation (4.4) allows us to recover orthogonal polynomials from optimal approximants. When $f$ is weakly inner, however, the optimal approximants $p_n^*$ are all equal to the same constant. Therefore, the orthogonal polynomials cannot be extracted from the formula (4.4). In fact, we have the following.
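The main example considered in the engineering applications is the weighted Hardy space of the bidisk with inner product given by
$$\langle g, h \rangle_f = \int_{\mathbb{T}} \int_{\mathbb{T}} g(e^{i\theta_1}, e^{i\theta_2})\, \overline{h(e^{i\theta_1}, e^{i\theta_2})}\, |f(e^{i\theta_1}, e^{i\theta_2})|^2 \, d\theta_1 \, d\theta_2,$$
where $d\theta_1$ and $d\theta_2$ are normalized Lebesgue measure on the circle. Similarly, for the Bergman space in the bidisk we have
$$\langle g, h \rangle_f = \int_{\mathbb{D}} \int_{\mathbb{D}} g(z_1, z_2)\, \overline{h(z_1, z_2)}\, |f(z_1, z_2)|^2 \, dA(z_1) \, dA(z_2),$$
but for general pairs $(\alpha_1, \alpha_2)$, the inner product $\langle \cdot, \cdot \rangle_f$ is not expressible as a weighted integral of $g$ and $h$ over the bidisk. In all the $D_\alpha$ spaces, however,
$$\langle 1, f \varphi_n \rangle_\alpha = \overline{f(0)\, \varphi_n(0)},$$
and we obtain the following immediate consequence of Lemma 14.
Lemma 15. Suppose that for some $f \in D_\alpha \setminus \{0\}$ and some $n \in \mathbb{N}$, we have $p_n^* = p_{n-1}^*$. Then $\varphi_n(0) = 0$. In particular, if $f$ is weakly inner in $D_\alpha$, then all orthogonal polynomials $\varphi_n$ in $D_{\alpha_1,\alpha_2,f}$ vanish at the origin for $n \geq 1$.
Example 16. If $f$ is a classical inner function in $H^2(\mathbb{D}^d)$, then the weighted norm induced by $\langle \cdot, \cdot \rangle_f$ coincides with the usual $H^2$ norm, and the set of monomials $\{z_1^k z_2^l\}_{k,l \in \mathbb{N}}$ yields orthogonal polynomials in the weighted space, all vanishing at the origin whenever $(k, l) \neq (0, 0)$.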
For weakly inner but not classically inner functions, one expects orthogonal polynomials to exhibit a more complicated structure.
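The relation between optimal approximants and weighted orthonormal polynomials, $p_n^* = \sum_{j \le n} \langle 1, f\varphi_j \rangle \varphi_j$, so that $p_n^* - p_{n-1}^* = \langle 1, f\varphi_n \rangle \varphi_n$, can be tested numerically in $H^2(\mathbb{D}^2)$. The sketch below rests on several assumptions of ours: monomials are in degree lexicographic order, the orthonormal polynomials are obtained from a Cholesky factorization of the Grammian (equivalent to Gram-Schmidt), the target has a nonzero constant term stored under the key `(0, 0)`, and all function names are hypothetical.

```python
import numpy as np

def deglex(count):
    """First `count` exponent pairs in degree lexicographic order."""
    out, deg = [], 0
    while len(out) < count:
        out.extend((deg - j, j) for j in range(deg + 1))
        deg += 1
    return out[:count]

def shifted_vectors(f, chis):
    """Rows are Taylor coefficient vectors of f*chi_i; in H^2 the norm is Euclidean."""
    index, rows = {}, []
    for a, b in chis:
        rows.append({index.setdefault((k + a, l + b), len(index)): c
                     for (k, l), c in f.items()})
    V = np.zeros((len(chis), len(index)), dtype=complex)
    for i, row in enumerate(rows):
        for col, c in row.items():
            V[i, col] = c
    return V, index

def orthonormal_and_approximants(f, n):
    chis = deglex(n + 1)
    V, index = shifted_vectors(f, chis)
    G = V @ V.conj().T                         # G[i, j] = <f chi_i, f chi_j>
    b = V[:, index[(0, 0)]].conj()             # b[i] = <1, f chi_i>
    T = np.linalg.inv(np.linalg.cholesky(G))   # row m: coefficients of phi_m
    w = T.conj() @ b                           # w[m] = <1, f phi_m>
    from_phis = np.cumsum(w[:, None] * T, axis=0)   # row m: coefficients of p_m^*
    from_gram = np.zeros_like(from_phis)
    for m in range(n + 1):
        from_gram[m, : m + 1] = np.linalg.solve(G[: m + 1, : m + 1].T, b[: m + 1])
    return chis, T, from_phis, from_gram

chis, T, from_phis, from_gram = orthonormal_and_approximants({(0, 0): 1.0, (1, 1): -1.0}, 5)
print(np.allclose(from_phis, from_gram))
```

For $f = 1 - z_1 z_2$ the approximants $p_1^*, p_2^*, p_3^*$ all equal $p_0^*$, and correspondingly the computed $\varphi_1, \varphi_2, \varphi_3$ have vanishing constant term, as in Lemma 15.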

4.2. A class of weighted orthogonal polynomials.
Our simple example $f(z_1, z_2) = 1 - a z_1 z_2$ ($a = 1$ in the bidisk, $a = 2$ in the 2-ball) exhibits optimal approximants and orthogonal polynomials with interesting behavior. As discussed in Examples 3 and 4, the optimal approximants to $1/f$ contain only monomials of the form $(z_1 z_2)^n$, that is, monomials on the main diagonal in Figure 1. Because of this, not all of the orthogonal polynomials for the weight $f$ can be reconstructed from the optimal approximants for $1/f$. This is similar to the case of a weakly inner function, but, in contrast, the differences of the optimal approximants do give non-constant polynomials that are orthogonal, just not all the polynomials needed to span the $P_n$. For instance, the polynomial $\chi_1 = z_1$ cannot be expressed as a linear combination of polynomials in $z_1 z_2$.
Here, we assume the reproducing kernel Hilbert space $H(\Omega)$ discussed in Section 2.1 has the additional property that the monomials are pairwise orthogonal. (The Drury-Arveson space and each Dirichlet-type space have this property.) Exploiting the diagonal structure in the monomial ordering allows us to express the full collection of orthogonal polynomials for a reproducing kernel Hilbert space weighted by a general polynomial in $z_1 z_2$ (as in (4.1)) in terms of one-variable polynomials. We begin with a lemma about the inner products of monomials in the weighted space $H_f$.
Lemma 17. Let $f(z_1, z_2) = 1 + a_1 z_1 z_2 + a_2 (z_1 z_2)^2 + \cdots + a_N (z_1 z_2)^N$ be a polynomial, set $a_0 = 1$, and let $H$ be a reproducing kernel Hilbert space in which the monomials are orthogonal. Consider $H_f$, the space weighted by $f$ with inner product $\langle g, h \rangle_{H_f} := \langle g f, h f \rangle_H$. For nonnegative integers $\ell_1 \leq k_1$, $\ell_2 \leq k_2$, and for an integer $J$ such that $0 \leq J \leq \min(k_1, k_2, N)$,
$$\langle z_1^{k_1} z_2^{k_2}, z_1^{\ell_1} z_2^{\ell_2} \rangle_{H_f} = \sum_{n=0}^{N-J} a_n \overline{a_{n+J}}\, \| z_1^{k_1+n} z_2^{k_2+n} \|_H^2$$
when $\ell_1 = k_1 - J$ and $\ell_2 = k_2 - J$; if no such $J$ exists, the inner product is zero.
Proof. Expanding the inner product gives
(4.5) $\langle z_1^{k_1} z_2^{k_2}, z_1^{\ell_1} z_2^{\ell_2} \rangle_{H_f} = \sum_{n=0}^{N} \sum_{m=0}^{N} a_n \overline{a_m}\, \langle z_1^{k_1+n} z_2^{k_2+n}, z_1^{\ell_1+m} z_2^{\ell_2+m} \rangle_H.$
Since $0 \leq \ell_1 \leq k_1$ and $0 \leq \ell_2 \leq k_2$, there are integers $J_1, J_2$ such that $0 \leq J_1 \leq k_1$ and $0 \leq J_2 \leq k_2$ with $\ell_1 = k_1 - J_1$ and $\ell_2 = k_2 - J_2$. By orthogonality of the monomials in $H$, a term of the sum is nonzero only if $\ell_1 + m = k_1 + n$ and $\ell_2 + m = k_2 + n$, that is, only if $m - n = J_1$ and $m - n = J_2$. Hence nonzero terms require $J_1 = J_2$, so let $J = J_1 = J_2$. Then the conditions for the inner product to be nonzero become $m = n + J$, and because $0 \leq m, n \leq N$, we have $|m - n| \leq N$, so $J \leq N$. Finally, we can rewrite (4.5) as
$$\sum_{n=0}^{N} \sum_{m=0}^{N} a_n \overline{a_m}\, \langle z_1^{k_1+n} z_2^{k_2+n}, z_1^{\ell_1+m} z_2^{\ell_2+m} \rangle_H = \sum_{n=0}^{N-J} a_n \overline{a_{n+J}}\, \| z_1^{k_1+n} z_2^{k_2+n} \|_H^2$$
when $\ell_1 = k_1 - J$ and $\ell_2 = k_2 - J$. If these conditions do not hold, every term in the sum (4.5) will be zero.
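We now give a structural description of the full family of orthogonal polynomials for weights of the form $f = \sum_{k=0}^{n} a_k (z_1 z_2)^k$. Here, we shall consider the monomials in degree lexicographic order, $\chi_0 = 1$, $\chi_1 = z_1$, $\chi_2 = z_2$, $\chi_3 = z_1^2$, $\chi_4 = z_1 z_2$, $\chi_5 = z_2^2, \ldots$, and polynomial subspaces $P_n = \mathrm{span}\{\chi_0, \ldots, \chi_n\}$. We let $\deg_{z_j} p$ denote the $z_j$-degree of a multi-variable polynomial. We consider orthogonal polynomials $\{\varphi_k\}$ for $H_f$, ordered so that $\mathrm{span}\{\varphi_0, \ldots, \varphi_n\} = P_n$, and we assume that $\deg_{z_1} \varphi_k = \deg_{z_1} \chi_k$ and $\deg_{z_2} \varphi_k = \deg_{z_2} \chi_k$, and that each $\varphi_k$ is monic.
There exists a unique $r_N \in \mathbb{C}[x]$ such that $\varphi_N = z_1^{M} r_N(z_1 z_2)$ or $\varphi_N = z_2^{M} r_N(z_1 z_2)$ for some integer $M \geq 0$. The bidegree of each $r_N$ is implicit from degree lexicographic ordering.
We proceed by induction on $N$: assume that the statement holds for $k < N$, so that for each such $k$, we have $\varphi_k = z_1^{M_k} r_k(z_1 z_2)$ or $\varphi_k = z_2^{M_k} r_k(z_1 z_2)$. Write $\chi_N = z_1^{A} z_2^{B}$ and assume, without loss of generality, that $A \geq B$, with $M = A - B$. By the Gram-Schmidt process,
(4.6) $\varphi_N = \chi_N - \sum_{k < N} \frac{\langle \chi_N, \varphi_k \rangle_{H_f}}{\|\varphi_k\|_{H_f}^2}\, \varphi_k.$
By Lemma 17, the inner products in (4.6) are zero except when $\varphi_k$ contains a monomial of the form $z_1^{A-J} z_2^{B-J}$ for some $J \leq \min\{A, B\}$. This can be rewritten as
(4.7) $z_1^{A-J} z_2^{B-J} = z_1^{M} (z_1 z_2)^{B-J}.$
Any $\varphi_k$ that contains a term of the form (4.7) contains only monomials that can be written as $z_1^{M} (z_1 z_2)^j$ (by the inductive hypothesis). Therefore, every term of $\varphi_N$ can be written as $z_1^{M} (z_1 z_2)^j$, so $\varphi_N = z_1^{M} r(z_1 z_2)$.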
Thus, determining two-variable orthogonal polynomials reduces to finding one-variable polynomials, one family for each row in Figure 1. It is also apparent that all off-diagonal orthogonal polynomials vanish at the origin, confirming what we had already seen from forming successive differences of the corresponding optimal approximants.
In the particular case $H = H^2$ with weight $f(z_1, z_2) = 1 - z_1 z_2$, we obtain orthogonal polynomials of a particularly attractive form: here, the $r_N(x)$ can be shown to be the orthogonal polynomials in the one-variable weight $1 - x$, as in [39, p.86]. In this case, the polynomials $z_1^{M} r_m(z_1 z_2)$ and $z_2^{N} r_n(z_1 z_2)$, with $M, m, N, n \in \mathbb{N}_0$, form an orthogonal basis for $H^2_{1 - z_1 z_2}(\mathbb{D}^2)$.
Proof. It suffices to note that multiplication by $z_1$ and by $z_2$ is an isometry on $H^2(\mathbb{D}^2)$, meaning that the orthogonality conditions along each row of Figure 1 reduce to a condition for the main diagonal, where orthogonal polynomials can be recovered from the optimal approximants to $1/(1 - z_1 z_2)$.

5.1. Zero sets and the Shanks conjecture. We turn to a discussion of zero sets of optimal approximants in several variables. It is natural to ask whether optimal approximants in $H(\Omega)$ are zero-free in $\Omega$. A variation of this question is whether the assumption that $f(z) \neq 0$ for $z \in \Omega$ implies that the optimal approximants to $1/f$ inherit the zero-free property.
The classical theory of orthogonal polynomials for $L^2$ can be used to show that optimal approximants in $H^2(\mathbb{D})$ are zero-free on the closed unit disk $\overline{\mathbb{D}}$ for an arbitrary target function $f$: this problem was addressed by Chui in [14]. In [8], an analogous result was established for Dirichlet-type spaces $D_\alpha$ for $\alpha \geq 0$: if $f \in D_\alpha$, $f(0) \neq 0$, then $p_n^*(z) \neq 0$ for all $z \in \mathbb{D}$. By contrast, when $\alpha < 0$, there are functions $f \in D_\alpha$ whose optimal approximants vanish inside $\mathbb{D}$; in fact, this can happen even for cyclic $f$, which in particular means that $f(z) \neq 0$ in $\mathbb{D}$. However, the zero sets $Z(p_n^*)$ always omit a disk $D(0, r(\alpha))$ whose radius is strictly smaller than 1: it was shown in [8] that this statement holds with $r(\alpha) = 2^{\alpha/2}$. This was sharpened in the subsequent paper [7], and a sharp estimate on $r(\alpha)$ was given for the Hardy space $H^2(\mathbb{D})$ and the Bergman space $A^2(\mathbb{D})$.
As was explained in Section 2.5, non-vanishing of optimal approximants has ramifications for filter design, and zero set problems for optimal approximants in H 2 (D 2 ) have been investigated since the early 70s. In their 1972 paper [37], Shanks, Treitel, and Justice conjectured that optimal approximants to 1/f for any polynomial f would be zero-free in the bidisk: in subsequent papers in the engineering community, this became known as the Shanks conjecture.
A few years later, this strong version of the Shanks conjecture was disproved. In [21], Genin and Kamp exhibited a counterexample, and in [22], a method to construct polynomials yielding optimal approximants with zeros in the bidisk was presented. For completeness, we present a simplified version of their counterexample.
After the full Shanks conjecture had been disproved, efforts were made to prove a weaker versions of the Shanks conjecture where non-vanishing of optimal approximants in the bidisk is supposed to follow from additional assumptions on f . For instance, Delsarte, Genin, and Kamp state a "weakest form of Shanks' conjecture" in [16] where non-vanishing of the target polynomial f on the closed bidisk D 2 would guarantee that the optimal approximants to 1/f are zero-free in D 2 . An intermediate version might be to ask that the polynomial f be cyclic in H 2 (D 2 ) in order to ensure that the optimal approximants p * n have no zeros in D 2 ; this, as shown in [29], is equivalent to asking that f itself have no zeros in the open bidisk.
The paper [30] claimed to establish the weak Shanks conjecture of [16], but in [17], Delsarte, Genin, and Kamp show that this purported proof fails. As far as the authors are aware, the weak Shanks conjecture remains open for the Hardy space of the bidisk:
Conjecture 21 (Weakest form of the Shanks conjecture). Suppose $f \in \mathbb{C}[z_1, z_2]$ satisfies $f(z) \neq 0$ for $z \in \overline{\mathbb{D}^2}$. Then the $H^2(\mathbb{D}^2)$-optimal approximants to $1/f$ are zero-free in $\mathbb{D}^2$.
We have not been able to settle the Shanks conjecture in its weakest form in H 2 . However, we now demonstrate that it fails in other function spaces of the bidisk, including the Bergman space A 2 (D 2 ).
Example 22 (Counterexample to the Shanks conjecture for the Bergman space). Consider the irreducible polynomial $b$ [...] This polynomial is the denominator of a rational inner function in the bidisk constructed in [11], and hence $b$ has no zeros in the bidisk and, in particular, is a cyclic vector in the Bergman space $A^2(\mathbb{D}^2)$; see [9]. However, $b$ does have a single boundary zero at $(1, 1) \in \mathbb{T}^2$.
The second non-constant optimal approximant to $1/b$ can be computed, and has zeros inside the bidisk. We now dilate $b$ to $\tilde{b}$ [...] Note that (as can be seen in Figure 3) the zeros of $\tilde{b}$ are now strictly outside the closed bidisk.
Remarks. The same function $b$ produces optimal approximants which vanish in the bidisk for $\alpha_1 = \alpha_2 = -0.85$. Similarly, choosing $\alpha_1 = 0$, $\alpha_2 = -3$, $b$ also yields zeros in the bidisk for $p_2^*$.
5.2. Reproducing kernel methods. One faces several difficulties when seeking to extend the results of [14] and [8,7] on the location of zero sets of optimal approximants to function spaces in several variables. Zeros of a polynomial, or indeed of any holomorphic function of several complex variables, are never isolated, and we no longer have access to the fundamental theorem of algebra. We briefly revisit the reproducing kernel arguments in [8] in the multivariable setting to see how these facts block a straightforward extension of the proof.
Let $p_n^*$ be an optimal approximant to $1/f$ in $D_{\alpha_1,\alpha_2}$. Then, as is explained in Section 2.1, we have $K_n(z, 0) = p_n^*(z) f(z)$, where $K_n(\cdot, 0)$ is the reproducing kernel at $0$ for $f \cdot \mathcal{P}_n$. Suppose for a moment that $p_n^*$ is of the form $p_n^*(z_1, z_2) = (P(z_1, z_2) - w_0) Q(z_1, z_2)$ for some $w_0 \in \mathbb{C}$, some $P \in \mathbb{C}[z_1, z_2]$ vanishing at the origin, and some $Q \in \mathbb{C}[z_1, z_2]$. We seek to determine some set $K \subset \Omega$ such that $w_0 - P(z) \neq 0$ for $z \in K$. As in [8, Section 4], we can write $w_0 Q(z) f(z) = P(z) Q(z) f(z) - K_n(z, 0)$, and since $PQf$ vanishes at the origin and is an element of $f \cdot \mathcal{P}_n$, we get $PQf \perp K_n(\cdot, 0)$ by appealing to the reproducing property of $K_n(\cdot, 0)$. This in turn implies that $|w_0|^2 \|Qf\|_H^2 = \|PQf\|_H^2 + \|K_n(\cdot, 0)\|_H^2$. Since $\|K_n(\cdot, 0)\| \geq 0$, it follows that
(5.5) $|w_0|^2 \|Qf\|^2 - \|PQf\|^2 \geq 0$.
Up to this point, the argument is identical to that in [8]. Now, in one variable, the assumption that $w_0 \in \mathbb{C}$ is a zero of $p_n^*$ allows us to take $P(z) = z$. In many function spaces of interest, such as the Dirichlet spaces, one has $\|zf\| \geq C(H) \|f\|$ for some easily computable constant $C(H)$, and this allows us to conclude from (5.5) that $|w_0|^2 - C(H)^2 \geq 0$, thus obtaining a lower bound on the modulus of any zero of the one-variable polynomial $p_n^*(z)$. In several variables, there is no distinguished form of $P$, and even if we restrict ourselves to some prescribed factor $P$, we are left with the task of estimating the ratio $\|PQf\| / \|Qf\|$ from below, which does not seem to be an easy task. Finally, assuming a lower bound on $|w_0|$ is obtained in this way, we would in addition need to analyze whether this lower bound places $w_0$ outside the range of $P(z)$ on some subset of $\mathbb{D}^2$.
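To make the one-variable step concrete, here is a worked version of the estimate for the Dirichlet-type spaces $D_\alpha$. It is a sketch under the assumption that the norm is the standard weighted coefficient norm stated in the first comment line; under that assumption the constant obtained agrees with the radius $r(\alpha) = 2^{\alpha/2}$ quoted from [8].

```latex
% Assumption: for f(z) = \sum_{k \ge 0} a_k z^k, the norm on D_\alpha is
%   \|f\|_\alpha^2 = \sum_{k \ge 0} (k+1)^\alpha |a_k|^2 .
\[
  \|zf\|_\alpha^2
  = \sum_{k \ge 0} (k+2)^\alpha |a_k|^2
  = \sum_{k \ge 0} \Big( \frac{k+2}{k+1} \Big)^{\alpha} (k+1)^\alpha |a_k|^2 .
\]
% Since 1 < (k+2)/(k+1) \le 2 for all k \ge 0, each factor is at least 1
% when \alpha \ge 0 and at least 2^\alpha when \alpha < 0. Hence
\[
  \|zf\|_\alpha \ge C(D_\alpha) \, \|f\|_\alpha ,
  \qquad
  C(D_\alpha) = \min\{ 1, \, 2^{\alpha/2} \} .
\]
% Inserting P(z) = z into (5.5) and applying this bound to Qf \ne 0 gives
\[
  |w_0|^2 \|Qf\|_\alpha^2 \ge \|zQf\|_\alpha^2 \ge C(D_\alpha)^2 \|Qf\|_\alpha^2 ,
  \qquad \text{so} \qquad
  |w_0| \ge C(D_\alpha) .
\]
```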
We do obtain the following, again by leveraging one-variable arguments.
One can imagine variations of the above argument for other special factors such as $P(z_1, z_2) = z_1$, but it would clearly be desirable to find general methods for analyzing zero sets of optimal approximants in several variables.
Question 24. Let $\{p_n^*\}$ be optimal approximants to $1/f$ for $f \in D_\alpha \setminus \{0\}$. Is there a compact set $K \subset \mathbb{D}^2$ such that $p_n^*(z) \neq 0$ for $z \in K$ and all $n$? Similarly, if $\{p_n^*\}$ are optimal approximants to $1/f$ for $f \in H_d^2$, is there a compact set $K \subset \mathbb{B}^d$ such that $p_n^*(z) \neq 0$ for $z \in K$ and all $n$?

6. Explicit computations for $f = 1 - a(z_1 + z_2)$
In this section, we record some observations concerning optimal approximants and orthogonal polynomials associated with a polynomial that vanishes at a single boundary point. More precisely, we consider $f = 1 - a(z_1 + z_2)$, which can be viewed as a natural analog of the classical one-variable weight $1 - z$. In the case of the Drury-Arveson space $H_2^2$ in $\mathbb{B}^2$, we take $a = \frac{1}{\sqrt{2}}$, and are able to exhibit closed formulas for some of the optimal approximants. Then, we turn to the bidisk, set $a = 1$, compute some low-degree optimal approximants and orthogonal polynomials for Dirichlet-type spaces, and note that the situation is more complicated. This gives an example where the ball and bidisk theories are different. In Section 2, we were able to use a diagonal embedding to handle both $\mathbb{B}^2$ and $\mathbb{D}^2$, but here we exploit the fact that the ball, unlike the bidisk, is invariant under unitary transformations.
Throughout, we use degree lexicographical ordering, as in Section 2.2.
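Low-degree computations of this kind can be reproduced numerically. The sketch below sets up the normal equations for the optimal approximant of total degree at most $n$ to $1/f$ in two variables. It assumes the Drury-Arveson monomial norms $\|z_1^j z_2^k\|^2 = j!\,k!/(j+k)!$ (which follow from the kernel $1/(1 - \langle z, w \rangle)$), and it assumes that within each degree the monomials are listed as $z_1^d, z_1^{d-1} z_2, \ldots, z_2^d$; the helper names are hypothetical, and the ordering convention of Section 2.2 may differ.

```python
from math import factorial, sqrt
import numpy as np

def degree_lex(n):
    """Exponents (j, k) with j + k <= n, by total degree, then powers of z1 first."""
    return [(d - k, k) for d in range(n + 1) for k in range(d + 1)]

def da_weight(j, k):
    """ASSUMED Drury-Arveson norm: ||z1^j z2^k||^2 = j! k! / (j + k)!."""
    return factorial(j) * factorial(k) / factorial(j + k)

def inner(p, q, weight):
    """<p, q> for polynomials stored as {(j, k): coefficient} dictionaries."""
    return sum(a * np.conj(q[m]) * weight(*m) for m, a in p.items() if m in q)

def shift(p, m):
    """Multiply the polynomial p by the monomial z1^m[0] z2^m[1]."""
    return {(j + m[0], k + m[1]): a for (j, k), a in p.items()}

def optimal_approximant(f, n, weight):
    """Return (coefficients of p_n^*, squared distance ||1 - p_n^* f||^2)."""
    basis = degree_lex(n)
    shifted = [shift(f, m) for m in basis]       # the functions z^m f
    one = {(0, 0): 1.0}
    # Normal equations: G[i, l] = <z^{m_l} f, z^{m_i} f>, b[i] = <1, z^{m_i} f>.
    G = np.array([[inner(fl, fi, weight) for fl in shifted] for fi in shifted])
    b = np.array([inner(one, fi, weight) for fi in shifted])
    c = np.linalg.solve(G, b)
    dist2 = inner(one, one, weight) - np.vdot(b, c)
    return dict(zip(basis, c)), float(np.real(dist2))

# f = 1 - (z1 + z2) / sqrt(2) in the Drury-Arveson space of the ball.
a = 1 / sqrt(2)
f = {(0, 0): 1.0, (1, 0): -a, (0, 1): -a}
p2, d2 = optimal_approximant(f, 2, da_weight)
print(p2, d2)
```

Passing a different `weight` function, for instance the product weights of a Dirichlet-type space on the bidisk, reuses the same solver. For the Drury-Arveson weights, unitary invariance suggests that the squared distance should coincide with the one-variable Hardy space value $1/(n+2)$ for $1 - z$, which offers a simple consistency check.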
6.1. Optimal approximants and orthogonal polynomials for $H_2^2$. Consider $f(z_1, z_2) = 1 - \frac{1}{\sqrt{2}}(z_1 + z_2)$, which vanishes at $(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}})$ in the unit sphere $\mathbb{S}^2$. Using the Grammian method described in Section 2.3, we compute the first optimal approximants for $1/f$: [...] the one-variable Hardy norm, the norm on the right is bounded below by r N F −1 H 2 . In the above argument, we have equality throughout provided Q 2 = 0, and the result now follows.
As a corollary, we get from the one-variable results in [3,20] that $f = 1 - \frac{1}{\sqrt{2}}(z_1 + z_2)$ is cyclic in the Drury-Arveson space, with distance estimate [...] By contrast, in Example 6, we noted that [...]