On the number of Hadamard matrices via anti-concentration

Many problems in combinatorial linear algebra require upper bounds on the number of solutions to an underdetermined system of linear equations $Ax = b$, where the coordinates of the vector $x$ are restricted to take values in some small subset (e.g. $\{\pm 1\}$) of the underlying field. The classical ways of bounding this quantity are to use either a rank bound observation due to Odlyzko or a vector anti-concentration inequality due to Hal\'asz. The former gives a stronger conclusion except when the number of equations is significantly smaller than the number of variables; even in such situations, the hypotheses of Hal\'asz's inequality are quite hard to verify in practice. In this paper, using a novel approach to the anti-concentration problem for vector sums, we obtain new Hal\'asz-type inequalities which beat the Odlyzko bound even in settings where the number of equations is comparable to the number of variables. In addition to being stronger, our inequalities have hypotheses which are considerably easier to verify. We present two applications of our inequalities to combinatorial (random) matrix theory: (i) we obtain the first non-trivial upper bound on the number of $n\times n$ Hadamard matrices, and (ii) we improve a recent bound of Deneanu and Vu on the probability of normality of a random $\{\pm 1\}$ matrix.


The number of Hadamard matrices
A square matrix H of order n with entries in {±1} is called a Hadamard matrix of order n if its rows are pairwise orthogonal, i.e. if $HH^T = nI_n$. They are named after Jacques Hadamard, who studied them in connection with his maximal determinant problem. Specifically, Hadamard asked for the maximum value of the determinant of any n × n square matrix all of whose entries are bounded in absolute value by 1. He proved [8] that the determinant of such a matrix cannot exceed $n^{n/2}$. Moreover, he showed that Hadamard matrices are the only ones that can attain this bound. Since their introduction, Hadamard matrices have been the focus of considerable attention from many different communities: coding theory, design theory, statistical inference, and signal processing, to name a few. We refer the reader to the surveys [10,19] and the books [1,11] for a comprehensive account of Hadamard matrices and their numerous applications.
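The definition and the determinant bound can be checked on small examples. The sketch below uses Sylvester's doubling construction, a standard way to build Hadamard matrices of order $2^t$ that is not discussed in the text and is brought in here only for illustration; the helper names `sylvester`, `is_hadamard` and `det` are hypothetical.

```python
def sylvester(order):
    """Hadamard matrix of the given order (a power of two) via Sylvester doubling."""
    if order < 1 or order & (order - 1):
        raise ValueError("order must be a power of two")
    H = [[1]]
    while len(H) < order:
        # Doubling step: H -> [[H, H], [H, -H]]
        H = [row + row for row in H] + [row + [-x for x in row] for row in H]
    return H


def is_hadamard(H):
    """Check that H is square, has entries in {+1, -1}, and satisfies H H^T = n I."""
    n = len(H)
    if any(len(row) != n or any(x not in (-1, 1) for x in row) for row in H):
        return False
    return all(
        sum(a * b for a, b in zip(H[i], H[j])) == (n if i == j else 0)
        for i in range(n)
        for j in range(n)
    )


def det(M):
    """Exact integer determinant by Laplace expansion (only sensible for small matrices)."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


H4 = sylvester(4)
print(is_hadamard(H4), det(H4) ** 2 == 4 ** 4)
```

Comparing squared determinants avoids fractional powers: $\det(H)^2 = n^n$ is the same statement as $|\det(H)| = n^{n/2}$.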
Hadamard matrices of order 1 and 2 are trivial to construct, and it is quite easy to see, by considering the first few rows, that every other Hadamard matrix (if it exists) must be of order 4m for some m ∈ N. Whereas Hadamard matrices of infinitely many orders have been constructed, the question of whether one of order 4m exists for every m ∈ N is the most important open question on this topic, and remains wide open.
Conjecture 1.1 (The Hadamard conjecture, [15]). There exists a Hadamard matrix of order 4m for every m ∈ N.
In this paper, we study the question of how many Hadamard matrices of order n = 4m could possibly exist for a given m ∈ N. Let us denote this number by H(n). Note that if a single Hadamard matrix of order n exists, then we immediately get at least $(n!)^2$ distinct Hadamard matrices by permuting all the rows and columns. Thus, if the Hadamard conjecture is true, then $H(n) = 2^{\Omega(n \log n)}$ for every n = 4m, m ∈ N. On the other hand, the bound $H(n) \le 2^{\binom{n+1}{2}}$ is quite easy to obtain, as we will discuss in the next subsection.
This bound also appeared in the work of de Launey and Levin [3] on the enumeration of partial Hadamard matrices (i.e. k × 4m matrices whose rows are pairwise orthogonal, in the limit as m → ∞) using Fourier analytic techniques; notably, while they were able to get a very precise answer to this problem (up to an overall (1 + o(1)) multiplicative factor), their techniques did not yield anything better than the essentially trivial bound for the case of square Hadamard matrices. As our first main result, we give the only known non-trivial upper bound on the number of square Hadamard matrices.
Theorem 1.2. There exists an absolute constant $c_H > 0$ such that $H(n) \le 2^{(1 - c_H)n^2/2}$ for all sufficiently large n that is a multiple of 4.
Remark 1.3. In our proof of the above theorem, we have focused on the simplicity and clarity of presentation and have made no attempt to optimize the constant $c_H$, since our proof cannot give a value of $c_H$ larger than (say) 1/2, whereas we believe that the correct value of $c_H$ should be close (as a function of n) to 1. We believe that proving a bound of the form $H(n) = 2^{o(n^2)}$ would already be very interesting, and will likely require new ideas.

The approach
We now discuss the proof of the trivial upper bound $H(n) \le 2^{\binom{n+1}{2}}$. The starting point is the following classical (and almost trivial to prove) observation due to Odlyzko.
Lemma 1.5 (Odlyzko, [14]). Let W be a d-dimensional subspace of $\mathbb{R}^n$. Then, $|W \cap \{\pm 1\}^n| \le 2^d$.
Sketch. As W is a d-dimensional space, there is a set of d coordinates which determines every vector in W. Therefore, W contains at most $2^d$ vectors with entries from {±1}.
The bound $H(n) \le 2^{\binom{n+1}{2}}$ is now immediate. Indeed, we construct the matrices row by row, and note that by the orthogonality of the rows, the first k rows span a subspace of dimension k to which the remaining rows are orthogonal. In particular, once the first k rows have been selected, the (k + 1)st row lies in a specified subspace of dimension n − k (the orthogonal complement of the vector space spanned by the first k (linearly independent) rows), and hence, by Lemma 1.5, is one of at most $2^{n-k}$ vectors. It follows that $H(n) \le \prod_{i=0}^{n-1} 2^{n-i} = 2^{\binom{n+1}{2}}$.
The weak point in the above proof is the following: while Odlyzko's bound is tight in general, we should expect it to be far from the truth in the average case. Indeed, working with vectors in $\{0,1\}^n$ for the moment, note that a subspace of dimension k spanned by vectors in $\{0,1\}^n$ has exactly $2^{n-k}$ vectors in $\{0,1\}^n$ orthogonal to it when viewed as elements of $\mathbb{F}_2^n$. However, typically, the inner products will take on many values in 2Z \ {0}, so that many of these vectors will not be orthogonal when viewed as elements of $\mathbb{R}^n$. The study of the difference between the Odlyzko bound and how many $\{\pm 1\}^n$ vectors a subspace actually contains has been very fruitful in discrete random matrix theory, particularly for the outstanding problem of determining the probability of singularity of random {±1} matrices. Following Kahn, Komlós and Szemerédi [13], Tao and Vu [20] isolated the following notion.
Definition 1.6 (Combinatorial dimension). The combinatorial dimension of a subspace W in $\mathbb{R}^n$, denoted by $d_\pm(W)$, is defined to be the smallest real number such that $|W \cap \{\pm 1\}^n| \le 2^{d_\pm(W)}$.
Thus, Odlyzko's lemma says that for any subspace W, its combinatorial dimension is no more than its dimension. However, improving on another result of Odlyzko [14], Kahn, Komlós and Szemerédi showed that this bound is very loose for typical subspaces spanned by $\{\pm 1\}^n$ vectors:
Theorem 1.7 (Kahn, Komlós and Szemerédi, [13]). There exists a constant C > 0 such that if r ≤ n − C, and if $v_1, \dots, v_r$ are chosen independently and uniformly from $\{\pm 1\}^n$, then
In other words, they showed that a typical r-dimensional subspace spanned by r vectors in $\{\pm 1\}^n$ contains the minimum possible number of $\{\pm 1\}^n$ vectors, i.e. only the 2r vectors consisting of the vectors spanning the subspace and their negatives.
Compared to the setting of Kahn, Komlós and Szemerédi, our setting has two major differences:
(i) We are interested not in the combinatorial dimension of subspaces spanned by $\{\pm 1\}^n$ vectors but of their orthogonal complements.
(ii) The $\{\pm 1\}^n$ vectors spanning a subspace in our case are highly dependent due to the mutual orthogonality constraint. Indeed, as the proof of the trivial upper bound at the start of the subsection shows, the probability that the rows of a random k × n {±1} matrix are mutually orthogonal is $2^{-\Omega(k^2)}$; this rules out the strategy of conditioning on the rows being orthogonal when $k = \Omega(\sqrt{n})$, even if one were to prove a variant of the result of Kahn, Komlós and Szemerédi to deal with orthogonal complements.
Briefly, our approach to dealing with these obstacles is the following. For k < n, let $H_{k,n}$ denote a k × n matrix with all its entries in {±1} and all of whose rows are orthogonal. We will show that there exist absolute constants $0 < c_1 < c_2 < 1$ such that if $k \in [c_1 n, c_2 n]$ and if n is sufficiently large, then $H_{k,n}$ must have a certain desirable linear algebraic property; this is the only way in which we use the orthogonality of the rows of $H_{k,n}$, and takes care of (ii). Next, to deal with (i), we will show that for any k × n matrix A which has this linear algebraic property, the number of solutions x in $\{\pm 1\}^n$ to Ax = 0 is at most $2^{n-(1+C)k}$, where C > 0 is a constant depending only on $c_1$ and $c_2$. Using this improved bound for $k \in [c_1 n, c_2 n]$ (and Lemma 1.5 for the remaining values of k) with the same row-by-row strategy as for the trivial proof, we see that for n sufficiently large, $H(n) \le 2^{\binom{n+1}{2} - C\sum_{c_1 n \le k \le c_2 n} k} = 2^{\binom{n+1}{2} - \Omega(n^2)}$, which gives the desired improvement. We discuss this in more detail in the next subsection.
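The row-by-row counting strategy behind the trivial bound can be run literally for very small n. The following sketch (the function name `count_hadamard` is hypothetical) extends partial matrices one row at a time, keeping only rows orthogonal to all previous ones, and compares the resulting count for n = 4 with the trivial upper bound $2^{\binom{n+1}{2}}$ and the lower bound $(n!)^2$ mentioned in the introduction. It is a brute-force illustration only and is hopeless beyond tiny orders.

```python
from itertools import product
from math import comb, factorial


def count_hadamard(n):
    """Count n x n Hadamard matrices by extending orthogonal {+1, -1} rows one at a time."""
    vectors = list(product((-1, 1), repeat=n))

    def extend(rows):
        if len(rows) == n:
            return 1
        total = 0
        for v in vectors:
            # The next row must be orthogonal to every row chosen so far.
            if all(sum(a * b for a, b in zip(v, r)) == 0 for r in rows):
                total += extend(rows + [v])
        return total

    return extend([])


n = 4
H_n = count_hadamard(n)
print(factorial(n) ** 2, H_n, 2 ** comb(n + 1, 2))  # lower bound, exact count, trivial bound
```

For n = 4 the successive rows admit 16, 6, 4 and 2 choices, against the Odlyzko estimates 16, 8, 4 and 2, so the trivial bound is already lossy at the second row.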

Improved Halász-type inequalities
As mentioned above, our goal is to study the number of $\{\pm 1\}^n$ solutions to an underdetermined system of linear equations Ax = 0 possessing some additional structure. This question was studied by Halász, who proved the following:
Theorem 1.8 (Halász, [9]). Let $a_1, \dots, a_n$ be a collection of vectors in $\mathbb{R}^d$. Suppose there exists a constant δ > 0 such that for any unit vector $e \in \mathbb{R}^d$, one can select at least δn vectors $a_k$ with $|\langle a_k, e \rangle| \ge 1$. Then, where $\epsilon_1, \dots, \epsilon_n$ are independent Rademacher random variables, i.e. they take the values ±1 with probability 1/2 each.
The constant c(δ, d), which is crucial for our applications, was left implicit by Halász. However, explicit estimates on this constant may be obtained, as was done by Howard and Oskolkov [12].
Theorem 1.9 (Howard and Oskolkov, [12]). Let $a_1, \dots, a_n$ be a collection of vectors in $\mathbb{R}^d$. Suppose that there exists some m ∈ N such that for every unit vector $e \in \mathbb{R}^d$, one can select at least m vectors $a_{i_1}, \dots, a_{i_m}$ with $|\langle a_{i_j}, e \rangle| \ge 1/(2\sqrt{d})$. Then,
Remark 1.10. When $a_1, \dots, a_n$ and u belong to $\mathbb{Z}^d$, as will be the case in our applications, the event '$\|\sum_{i=1}^n \epsilon_i a_i - u\|_\infty < 1/2$' is equivalent to the event '$\sum_{i=1}^n \epsilon_i a_i = u$'. In this case, it was noted by Tao and Vu (Exercise 7.2.3 in [21]) that the condition $|\langle a_{i_j}, e \rangle| \ge 1/(2\sqrt{d})$ may be relaxed to $|\langle a_{i_j}, e \rangle| > 0$. However, as stated, their proof still gives a constant $C(d) = \Theta(d)^d$ due to a 'duplication' step, which we will show is unnecessary.
There are two drawbacks to using the results mentioned above for the kinds of applications we have in mind. Firstly, a constant of the form $C(d) = \Theta(d)^d$ does not give any non-trivial information when d = Ω(n), whereas, as discussed in the proof outline, we require an improvement over the Odlyzko bound for d = Θ(n). Secondly, the hypotheses of these theorems, which involve two quantifiers ('for all' followed by 'there exists'), are quite stringent and not easy to verify; in fact, we were unable to find any direct applications of Theorem 1.8 in the literature.
Our key structural observation is that a 'pseudorandom' rectangular matrix contains many disjoint submatrices of large rank. This motivates replacing the double quantifier hypothesis by a (weaker) hypothesis involving just one existential quantifier which, as we will see, is readily verified to hold in pseudorandom situations. Moreover, while our hypothesis is weaker, we are able to obtain conclusions with asymptotically much better constants, since our structural setting allows us to efficiently leverage the existing rich literature on anti-concentration of sums of independent random variables and anti-concentration of linear images of high dimensional distributions. In particular, we are able to give short and transparent proofs of our inequalities for very general classes of distributions; in contrast, Theorems 1.8 and 1.9 hold only for (vector)-weighted sums of independent Rademacher variables, and their proofs involve explicit trigonometric manipulations. We discuss this in more detail in Sections 2.2 and 3.1.
Our first inequality is a strengthening of Theorem 1.9 in the setting of Remark 1.10, both in terms of the hypothesis and the conclusion. A more general statement appears in Theorem 3.1.
Theorem 1.11. Let $a_1, \dots, a_n$ be a collection of vectors in $\mathbb{R}^d$ which can be partitioned as $A_1, \dots, A_\ell$ with ℓ even such that $\dim_{\mathbb{R}^d}(\mathrm{span}\{a : a \in A_i\}) =: r_i$. Then, $\sup_{u \in \mathbb{R}^d} \Pr\left[\sum_{i=1}^n \epsilon_i a_i = u\right] \le \left(2^{-\ell}\binom{\ell}{\ell/2}\right)^{(r_1 + \dots + r_\ell)/\ell}$.
Remark 1.12. This inequality is tight, as can be easily seen by taking (assuming n is divisible by d) $a_i$ to be $e_{i \bmod d}$, where $e_1, \dots, e_d$ denotes the standard basis of $\mathbb{R}^d$, in which case we can take ℓ = n/d and $r_1 = \dots = r_\ell = d$.
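The example in Remark 1.12 can be checked exactly for small parameters. With $a_i = e_{i \bmod d}$, the d coordinates of $\sum_i \epsilon_i a_i$ are independent sums of ℓ = n/d Rademacher variables, so the largest atom should equal $(2^{-\ell}\binom{\ell}{\ell/2})^d$. The sketch below enumerates all sign patterns and compares the two quantities; the function names are hypothetical, and the parameters n = 8, d = 2 are chosen only to keep the enumeration small.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import comb


def largest_atom(vectors):
    """Exact value of sup_u Pr[sum_i eps_i a_i = u] for uniform signs eps in {+1, -1}^n."""
    n, d = len(vectors), len(vectors[0])
    counts = Counter()
    for signs in product((-1, 1), repeat=n):
        total = tuple(sum(s * v[j] for s, v in zip(signs, vectors)) for j in range(d))
        counts[total] += 1
    return Fraction(max(counts.values()), 2 ** n)


def standard_basis_example(n, d):
    """The vectors a_i = e_{i mod d}: each standard basis vector of R^d appears n/d times."""
    return [tuple(1 if j == i % d else 0 for j in range(d)) for i in range(n)]


n, d = 8, 2
ell = n // d
atom = largest_atom(standard_basis_example(n, d))
closed_form = Fraction(comb(ell, ell // 2), 2 ** ell) ** d
print(atom, closed_form)
```

Exact rational arithmetic is used so that the comparison is an equality check rather than a floating-point approximation.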
To see how Theorem 1.11 strengthens Theorem 1.9, note that the assumptions of Theorem 1.9 guarantee that there exist ℓ := ⌊m/d⌋ disjoint subsets $A_1, \dots, A_\ell$ such that $r_1 = \dots = r_\ell = d$. Such a collection of disjoint subsets can be obtained greedily by repeating the following construction ℓ times: let $v_1 \in \{a_1, \dots, a_n\}$ be any nonzero vector that has not already been chosen in a previous iteration. Having chosen $v_1, \dots, v_s$ for s < d, let $u_s$ be a unit vector in $(\mathrm{span}\{v_1, \dots, v_s\})^\perp$, and let $v_{s+1}$ be any vector satisfying $|\langle v_{s+1}, u_s \rangle| > 0$ which has not already been chosen in a previous iteration; such a vector is guaranteed to exist since there are at least m choices of $v_{s+1}$ by assumption, of which at most (ℓ − 1)d < m could have been chosen in a previous iteration. It follows that under the assumptions of Theorem 1.9, when $a_1, \dots, a_n \in \mathbb{Z}^d$, we have:
Our second inequality is a 'small-ball probability' version of Theorem 1.11. In order to state it, we need the following definition.
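The stable rank of a non-zero matrix A is defined as $r_s(A) := \|A\|_{HS}^2 / \|A\|^2$, where $\|A\|_{HS}$ denotes the Hilbert-Schmidt norm of A, and $\|A\|$ denotes the operator norm of A.
Recall that $\|A\|_{HS}^2 = \sum_i s_i(A)^2$ and $\|A\| = s_1(A)$, where $s_1(A), s_2(A), \dots$ denote the singular values of A arranged in non-increasing order. Hence, in particular, for any non-zero matrix A, $1 \le r_s(A) \le \mathrm{rank}(A)$, with the right inequality being an equality if and only if A is an orthogonal projection up to an isometry. We can now state our inequality. A more general version appears in Theorem 3.2.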
Theorem 1.15. Let $a_1, \dots, a_n$ be a collection of vectors in $\mathbb{R}^d$. For some ℓ ∈ 2N, let $A_1, \dots, A_\ell$ be a partition of the set $\{a_1, \dots, a_n\}$, and for each i ∈ [ℓ], let $A_i$ also denote the d × $|A_i|$ matrix whose columns are given by the elements of $A_i$. Then, for every M ≥ 1 and ε ∈ (0, 1), where $r_s(A_i)$ denotes the stable rank of $A_i$ and C is an absolute constant.
For illustration, consider a situation like the one above, where the set of vectors $a_1, \dots, a_n$ can be partitioned into m/d subsets, each of rank d. Assume further that each $a_i$ has norm at least one, so that each of the m/d matrices has Hilbert-Schmidt norm at least $\sqrt{d}$. Then, if the stable rank of each of these matrices is at least δd for some δ > 0, it follows that which is a big improvement over the bound coming from Theorem 1.2 provided that δ is not too small and d is large.

Counting {±1}-valued normal matrices
Conjecture 1.16 (Deneanu-Vu, [4]). There are $2^{(0.5+o(1))n^2}$ n × n {±1}-valued normal matrices.
As a first non-trivial step towards this conjecture, they showed the following.
Theorem 1.17 (Deneanu-Vu, [4]). The number of n × n {±1}-valued normal matrices is at most $2^{(c_{DV}+o(1))n^2}$ for some constant $c_{DV} < 0.698$.
The problem of counting normal matrices also boils down to the problem of counting the number of solutions to some underdetermined system of linear equations, and using our framework, it is very easy to obtain an upper bound on the number of such matrices of the form $2^{(1-\alpha)n^2}$, for some α > 0. Unfortunately, it does not seem that one can get $1 - \alpha < c_{DV}$ using this simple method. However, the proof of Theorem 1.17 in [4] itself uses the Odlyzko bound at a certain stage; therefore, by using their strategy as a black-box, with the application of the Odlyzko bound at this stage replaced by our better bound, we obtain:
Theorem 1.18. There exists some δ > 0 such that the number of n × n {±1}-valued normal matrices is at most $2^{(c_{DV}-\delta+o(1))n^2}$, where $c_{DV}$ denotes the constant in [4].
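For very small n, the quantity being bounded can be computed by exhaustive search. The sketch below counts {±1}-valued matrices M with $MM^T = M^TM$ and compares the count with $2^{\binom{n+1}{2}}$, the number of symmetric {±1} matrices (every symmetric matrix is normal, which gives the elementary lower bound matching the exponent 0.5 in the conjecture). The function names are hypothetical, and the search is only feasible for n ≤ 4 or so.

```python
from itertools import product
from math import comb


def is_normal(M):
    """Check M M^T == M^T M entrywise for a square integer matrix given as a tuple of rows."""
    n = len(M)
    for i in range(n):
        for j in range(n):
            row_gram = sum(M[i][t] * M[j][t] for t in range(n))  # (M M^T)[i][j]
            col_gram = sum(M[t][i] * M[t][j] for t in range(n))  # (M^T M)[i][j]
            if row_gram != col_gram:
                return False
    return True


def count_normal(n):
    """Number of n x n {+1, -1}-valued normal matrices, by exhaustive enumeration."""
    count = 0
    for entries in product((-1, 1), repeat=n * n):
        M = tuple(entries[i * n:(i + 1) * n] for i in range(n))
        if is_normal(M):
            count += 1
    return count


for n in (1, 2, 3):
    print(n, 2 ** comb(n + 1, 2), count_normal(n), 2 ** (n * n))
```

For n = 2, normality reduces to (a − d)(c − b) = 0 for M = [[a, b], [c, d]], which accounts for 12 of the 16 matrices.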

The Fourier transform
where x, ξ := x 1 ξ 1 + · · · + x d ξ d denotes the standard inner product on R d . For the reader's convenience, as well as to establish notation, we summarize the following basic properties of the Fourier transform which may be found in any standard textbook on analysis (see, e.g., [18]).
Then, the Fourier transforms f , g are also in .
Then, for all ξ ∈ $\mathbb{R}^d$,
The notion of Fourier transform extends more generally to finite Borel measures on $\mathbb{R}^d$. For such a measure µ, the Fourier transform is a function from $\mathbb{R}^d$ to C given by:
To see the connection with the Fourier transform for functions in $L^1(\mathbb{R}^d)$, note that if the measure µ is absolutely continuous with respect to the Lebesgue measure λ, then the density (more precisely, the Radon-Nikodym derivative)
The only finite Borel measures we will deal with are those which arise as distributions of random vectors valued in $\mathbb{R}^d$. For a d-dimensional random vector X, let $\mu_X$ denote its distribution. Then, we have (see, e.g., [5]):
• (Fourier transform of independent random variables) Let $X_1, \dots, X_\ell$ be independent d-dimensional random vectors, and let $S_\ell := X_1 + \dots + X_\ell$ denote their sum. Then, for all ξ ∈ $\mathbb{R}^d$, $\widehat{\mu_{S_\ell}}(\xi) = \prod_{i=1}^{\ell} \widehat{\mu_{X_i}}(\xi)$.
• (Inversion at atoms) Let X be a d-dimensional random vector. For any x ∈ $\mathbb{R}^d$,
• (Fourier transform of origin-symmetric random vectors) Let X be a d-dimensional, origin-symmetric random vector, i.e. $\mu_X(x) = \mu_X(-x)$ for all x ∈ $\mathbb{R}^d$. Then, $\widehat{\mu_X}$ is a real-valued function.

Anti-concentration
Definition 2.1. For a random vector X valued in $\mathbb{R}^d$, its (Euclidean) Lévy concentration function L(X, ·) is a function from $\mathbb{R}_{\ge 0}$ to R defined by: $L(X, \delta) := \sup_{u \in \mathbb{R}^d} \Pr(\|X - u\|_2 \le \delta)$.
Anti-concentration inequalities seek to upper bound the Lévy concentration function for various values of δ. In the discrete setting, a particularly important case is δ = 0, which corresponds to the size of the largest atom in the distribution of the random variable X. The proofs of our Halász-type inequalities will exploit two very general anti-concentration phenomena.
The first principle states that sums of independent random variables do not concentrate much more than sums of suitable independent Gaussians. In particular, for the weighted sum of independent Rademacher variables, Erdős gave a beautiful combinatorial proof to show (improving on a previous bound of Littlewood and Offord) the following.
Theorem 2.2 (Erdős, [6]). Let $a = (a_1, \dots, a_n)$ be a vector in $\mathbb{R}^n$ all of whose entries are nonzero. Let $S_a$ denote the random sum $\epsilon_1 a_1 + \dots + \epsilon_n a_n$, where the $\epsilon_i$'s are independent Rademacher random variables. Then, $\sup_{u \in \mathbb{R}} \Pr[S_a = u] \le \binom{n}{\lfloor n/2 \rfloor} / 2^n$.
Up to a constant, this was subsequently generalized by Rogozin to handle the Lévy concentration function of sums of general independent random variables.
Theorem 2.3 (Rogozin, [16]). There exists a universal constant C > 0 such that for any independent random variables $X_1, \dots, X_n$, and any r > 0, we have $L(S_n, r) \le C / \sqrt{\sum_{i=1}^n (1 - L(X_i, r))}$, where $S_n := X_1 + \dots + X_n$.
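The Erdős-Littlewood-Offord bound $\binom{n}{\lfloor n/2 \rfloor}/2^n$ on the largest atom of $S_a$ is easy to test by enumeration. The sketch below (helper names hypothetical, integer weights chosen so that sums are exact) computes the largest atom for a few weight vectors: all-equal weights attain the bound, while weights with distinct subset sums give the smallest possible atom $2^{-n}$.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import comb


def largest_atom(weights):
    """Exact value of sup_u Pr[eps_1 a_1 + ... + eps_n a_n = u] for uniform random signs."""
    counts = Counter(
        sum(s * a for s, a in zip(signs, weights))
        for signs in product((-1, 1), repeat=len(weights))
    )
    return Fraction(max(counts.values()), 2 ** len(weights))


def erdos_bound(n):
    """The Erdős-Littlewood-Offord bound binom(n, floor(n/2)) / 2^n."""
    return Fraction(comb(n, n // 2), 2 ** n)


for weights in ([1] * 6, [1, 2, 3, 4, 5, 6], [1, 2, 4, 8, 16, 32]):
    print(weights, largest_atom(weights), erdos_bound(len(weights)))
```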
The second anti-concentration principle concerns random vectors of the form AX, where A is a fixed m × n matrix, and X = (X 1 , . . . , X n ) is a random vector with independent coordinates. It states roughly that if the X i 's are anti-concentrated on the line, and if A has large rank in a suitable sense, then the random vector AX is anti-concentrated in space [17].
As a first illustration of this principle, we present the following lemma, which may be viewed as a 'tensorization' of the Erdős-Littlewood-Offord inequality.
Lemma 2.4. Let A be an m × n matrix (where m ≤ n) of rank r, and let X be a random vector distributed uniformly on {±1} n . Then for any ℓ ∈ N, Proof. By relabeling the coordinates if needed, we may write A as a block matrix r }. Then for any u ∈ R m , we have: where the third line follows from the law of total probability; the fourth line follows from the explicit form of BA mentioned above; the fifth line follows from the independence of the coordinates of X (j) ; and the sixth line follows from the Erdős-Littlewood-Offord inequality (Theorem 2.2). Taking the supremum over u ∈ R m completes the proof.
Remark 2.5. By using Rogozin's inequality (Theorem 2.3) instead of the Erdős-Littlewood-Offord inequality, we may generalize the lemma to handle any random vector X = (X 1 , . . . , X n ) with independent coordinates X i , provided we replace the conclusion by where C is a universal constant. For the Lévy concentration function for general δ, a version of Lemma 2.4 was proved by Rudelson and Vershynin in [17]. Theorem 2.6 (Rudelson-Vershynin, [17]). Consider a random vector X = (X 1 , . . . , X d ) where X i are real-valued independent random variables. Let δ, ρ ≥ 0 be such that for all i ∈ [d], Then, for every m × n matrix A, every M ≥ 1 and every ε ∈ (0, 1), we have where C ε = C/ √ ε for some absolute constant C > 0.
More general statements of a similar nature may be found in [17].

The replication trick
In this section, we present the 'replication trick', which allows us to reduce considerations about anti-concentration of sums of independent random vectors to considerations about anti-concentration of sums of independent identically distributed random vectors. This will be useful since the 'correct' analog of Rogozin's inequality for general random vectors with independent coordinates is not available; to our knowledge, the best result in this direction is due to Esseen [7], who proved an inequality of this form for such random vectors satisfying additional symmetry conditions, which will not be available in our applications. The statement and proof of the 'atomic' version of the replication trick (Proposition 2.7) are similar in spirit to Corollaries 7.12 and 7.13 in [21], with an important difference: we have no need for the lossy 'domination' and 'duplication' steps in [21]; instead, we ensure the non-negativity of the Fourier transform at various places by using the previously stated simple fact that the Fourier transform of the distribution of an origin-symmetric random vector is real valued, and restricting ourselves to even powers thereof.
Proposition 2.7. Let $X_1, \dots, X_n$ be independent random vectors valued in $\mathbb{R}^d$. For each i ∈ [n], let $\tilde{X}_i := X_i - X_i'$, where $X_i'$ is an independent copy of $X_i$. Let $S_n := X_1 + \dots + X_n$, and for any i ∈ [n], m ∈ N, let $\tilde{S}_{i,m} := \tilde{X}_i^{(1)} + \dots + \tilde{X}_i^{(m)}$, where $\tilde{X}_i^{(1)}, \dots, \tilde{X}_i^{(m)}$ are independent copies of $\tilde{X}_i$. Then, for any v ∈ $\mathbb{R}^d$, $\Pr[S_n = v] \le \prod_{i=1}^n \Pr[\tilde{S}_{i,a_i/2} = 0]^{1/a_i}$ for any $a_1, \dots, a_n \in 4 \cdot \mathbb{N}$ such that $a_1^{-1} + \dots + a_n^{-1} = 1$. Here, 4 · N denotes the subset of natural numbers given by {4m : m ∈ N}.
Proof. As before, we let µ X denote the distribution of the d-dimensional random vector X. We have: where the first line follows from the Fourier inversion formula at atoms; the second line follows from the independence of X 1 , . . . , X n ; the third line follows from Hölder's inequality; the fourth line follows from the fact that µX i (t) = | µ X i (t)| 2 (since the distribution ofX i is the autocorrelation of the distribution of X i ); the fifth line follows from the independence ofX (1) i . . . ,X (a i /2) i ; and the last line follows again from the Fourier inversion formula at atoms.
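The chain of steps in this proof (inversion at atoms, independence, Hölder, symmetrization) leads to a bound of the form $\Pr[S_n = v] \le \prod_i \Pr[\tilde{S}_{i,a_i/2} = 0]^{1/a_i}$; that reading of the conclusion is an assumption here, reconstructed from the proof description. The sketch below checks it exactly for integer-valued random variables in the special case n = 4 and $a_1 = \dots = a_4 = 4$, raising both sides to the n-th power to avoid roots. The function names and the sample laws are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def convolve(p, q):
    """Law of X + Y for independent integer-valued X ~ p and Y ~ q (dicts value -> probability)."""
    out = defaultdict(Fraction)
    for x, px in p.items():
        for y, qy in q.items():
            out[x + y] += px * qy
    return dict(out)


def symmetrize(p):
    """Law of X - X', where X' is an independent copy of X ~ p."""
    return convolve(p, {-x: w for x, w in p.items()})


def replication_sides(laws):
    """Return (max_v Pr[S_n = v]^n, prod_i Pr[S~_{i, n/2} = 0]) for the choice a_i = n."""
    n = len(laws)
    if n % 4:
        raise ValueError("taking all a_i = n requires n to be a multiple of 4")
    total = {0: Fraction(1)}
    for p in laws:
        total = convolve(total, p)
    lhs = max(total.values()) ** n
    rhs = Fraction(1)
    for p in laws:
        sym = symmetrize(p)
        replicated = {0: Fraction(1)}
        for _ in range(n // 2):  # sum of a_i / 2 = n / 2 independent copies of X~_i
            replicated = convolve(replicated, sym)
        rhs *= replicated.get(0, Fraction(0))
    return lhs, rhs


laws = [
    {0: Fraction(1, 2), 1: Fraction(1, 2)},
    {-1: Fraction(1, 3), 2: Fraction(2, 3)},
    {0: Fraction(1, 4), 3: Fraction(3, 4)},
    {-1: Fraction(1, 2), 1: Fraction(1, 2)},
]
lhs, rhs = replication_sides(laws)
print(lhs <= rhs, float(lhs), float(rhs))
```

For four i.i.d. Rademacher variables both sides equal $(3/8)^4$, so the inequality can hold with equality.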
Remark 2.8. The same proof shows that when X 1 , . . . , X n are independent origin symmetric random vectors, then for any v ∈ R d for any a 1 , . . . , a n ∈ 2N such that a −1 1 + · · · + a −1 n = 1, where S i,a i denotes the sum of a i independent copies of X i . The next proposition is a version of Proposition 2.7 for the Lévy concentration function. Essentially the same proof can also be used to prove variants for norms other than the Euclidean norm.
Proposition 2.9. Let $X_1, \dots, X_n$ be independent random vectors valued in $\mathbb{R}^d$. For each i ∈ [n], let $\tilde{X}_i := X_i - X_i'$, where $X_i'$ is an independent copy of $X_i$. Let $S_n := X_1 + \dots + X_n$, and for any i ∈ [n], m ∈ N, let $\tilde{S}_{i,m} := \tilde{X}_i^{(1)} + \dots + \tilde{X}_i^{(m)}$, where $\tilde{X}_i^{(1)}, \dots, \tilde{X}_i^{(m)}$ are independent copies of $\tilde{X}_i$. Then for any δ > 0, for any $a_1, \dots, a_n \in 4\mathbb{N}$ such that $a_1^{-1} + \dots + a_n^{-1} = 1$.
Proof. Let 1 B δ (0) denote the indicator function of the ball of radius δ centered at the origin. We will make use of the readily verified elementary inequality By adding to each X i an independent random vector with distribution given by a 'bump function' with arbitrarily small support around the origin, we may assume that the distributions of all the random vectors under consideration are absolutely continuous with respect to the Lebesgue measure on R d , and thus have densities. For such a random vector Y , we will denote its density with respect to the d-dimensional Lebesgue measure by f Y . Then, for any v ∈ R d , we have: where the second line follows from (1); the third line follows from Parseval's formula; the fourth line follows from the convolution formula and the independence of X 1 , . . . , X n ; the sixth line follows from Hölder's inequality, along with the fact that 1 B(2δ) (ξ) is real valued for all ξ ∈ R d ; the seventh line follows from the fact that | f X i (ξ)| 2 = fX i (ξ) for all ξ ∈ R d ; the ninth line follows again from Parseval's formula; and the tenth line follows from (1). Taking the supremum over all v ∈ R d gives the desired conclusion. Remark 2.10. As in Remark 2.8, if X 1 , . . . , X n are origin-symmetric, then the same conclusion holds withS i,a i /2 replaced by S i,a i , for any a 1 , . . . , a n ∈ 2N with a −1 1 + · · · + a −1 n = 1.

Proofs of Halász-type inequalities
By combining the tools from Sections 2.2 and 2.3, we can now prove our Halász-type inequalities. All of them follow the same general outline. We begin by proving Theorem 1.11.
Proof of Theorem 1.11. Let A 1 , . . . , A ℓ be the partition of {a 1 , . . . , a n } as in the statement of the theorem. For each i ∈ [ℓ], let A i denote d × |A i | dimensional matrix whose columns are given by the elements of A i . With this notation, we can rewrite the random vector Since the random vectors X 1 := A 1 Y 1 , . . . , X n := A n Y n are origin-symmetric, and since ℓ ∈ 2N, it follows from Proposition 2.7 and Remark 2.8 that for any u ∈ R d , j are i.i.d. copies of X j . Further, since rank(A j ) = r j by assumption, it follows from Lemma 2.4 that Pr X (1) j + · · · + X (ℓ) Substituting this bound in the previous inequality completes the proof.
By using Remark 2.5 instead of Lemma 2.4, we can use the same proof to obtain the following more general statement. Let a 1 , . . . , a n be a collection vectors in R d which can be partitioned as A 1 , . . . , A ℓ such that dim R d (span{a : a ∈ A i }) =: r i . Let x 1 , . . . , x n be independent random variables, and for . We now state and prove the general small-ball version of our anti-concentration inequality.
Theorem 3.2. Let a 1 , . . . , a n be a collection vectors in R d . Let A 1 , . . . , A ℓ be a partition of the set {a 1 , . . . , a n }, and for each i ∈ [ℓ], let A i denote the d × |A i | dimensional matrix whose columns are given by the elements of A i . Let x 1 , . . . , x n be independent random variables, and for each i ∈ [n], let Then, for every M ≥ 1 and ε ∈ (0, 1), where r s (A i ) denotes the stable rank of A i , C is an absolute constant, and Proof. As before, we begin by rewriting the random vector n i=1 x i a i as ℓ i=1 A i Y i . From Proposition 2.9, it follows that for any (b 1 , . . . , b ℓ ) ∈ B, , it follows from Theorem 2.3 that where C is an absolute constant. In particular, all of the (independent) coordinates of the random vectorỸ have δ-Lévy concentration function bounded by C/ √ b i λ. Hence, it follows from Theorem 2.6 that where C is an absolute constant. Substituting this in the first inequality completes the proof.
When the x i 's are origin symmetric random variables, we may use Remark 2.10 instead of Proposition 2.9 to obtain a similar conclusion -with the infimum now over the larger set -under the assumption that min i∈ [n] (1 − L(x i , δ)) = λ. In particular, if ℓ is even, then taking b 1 = · · · = b ℓ = ℓ gives Theorem 1.15.

Proof of Theorem 1.2
As in Section 1.1.1, let H k,n denote a k × n matrix with all its entries in {±1} and all of whose rows are orthogonal. For convenience of notation, we isolate the following notion.
Definition 3.4. For any r, ℓ ∈ N, a matrix M is said to admit an (r, ℓ)-rank partition if there exists a decomposition of the columns of M into ℓ disjoint subsets, each of which corresponds to a submatrix of rank at least r.
Note that the existence of an (r, ℓ)-rank partition is a uniform version of the condition appearing in Theorem 1.11. The next proposition shows that any H k,n with k admits an (r, ℓ)-rank partition with r and ℓ sufficiently large.
Proof. The proof proceeds in two steps -first, we show that H k,n contains many non-zero k × k minors, and second, we apply a simple greedy procedure to these non-zero minors to produce an (r, ℓ)-rank partition for the desired values of r and ℓ.
The first step follows easily from the classical Cauchy-Binet formula (see, e.g., [2]), which asserts that: $\det(H_{k,n} H_{k,n}^T) = \sum_{A \in M_k} \det(A)^2$, where $M_k$ denotes the set of all k × k submatrices of $H_{k,n}$. In our case, $H_{k,n} H_{k,n}^T = n\,\mathrm{Id}_k$, so that $\det(H_{k,n} H_{k,n}^T) = n^k$. Moreover, since each $A \in M_k$ is a k × k {±1}-valued matrix, $\det(A)^2 \le k^k$ (with equality attained if and only if A is itself a Hadamard matrix). Hence, it follows from the Cauchy-Binet formula that $H_{k,n}$ has at least $(n/k)^k$ non-zero minors.
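The first step can be verified numerically on a small partial Hadamard matrix. The sketch below takes the first k = 3 rows of the order-8 Sylvester matrix (a convenient test case chosen here, not one used in the text), sums $\det(A)^2$ over all k × k column submatrices, and compares the result with $\det(HH^T) = n^k$; it also counts the non-zero minors against the lower bound $(n/k)^k$. All function names are hypothetical.

```python
from itertools import combinations


def sylvester(order):
    """Hadamard matrix of order 2^t via Sylvester doubling (used only as test data)."""
    H = [[1]]
    while len(H) < order:
        H = [row + row for row in H] + [row + [-x for x in row] for row in H]
    return H


def det(M):
    """Exact integer determinant by Laplace expansion (small matrices only)."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def minor_statistics(H):
    """Return (det(H H^T), sum of squared k x k minors, number of non-zero minors)."""
    k, n = len(H), len(H[0])
    gram = [[sum(a * b for a, b in zip(H[i], H[j])) for j in range(k)] for i in range(k)]
    squares = [
        det([[row[j] for j in cols] for row in H]) ** 2
        for cols in combinations(range(n), k)
    ]
    return det(gram), sum(squares), sum(1 for s in squares if s)


k, n = 3, 8
H = sylvester(n)[:k]
gram_det, sum_sq, nonzero = minor_statistics(H)
print(gram_det, sum_sq, nonzero, (n / k) ** k)
```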
Next, we use these non-zero minors to construct an (r, ℓ)-rank partition in ℓ steps as follows: In Step 1, choose r columns of an arbitrary non-zero minor; such a minor is guaranteed to exist by the discussion above. Let $C_k$ denote the union of the columns chosen by the end of Step k, for any 1 ≤ k ≤ ℓ − 1. In Step k + 1, we choose r linearly independent columns which are disjoint from $C_k$. Then, the ℓ collections of r columns chosen at different steps give an (r, ℓ)-rank partition of $H_{k,n}$. Therefore, to complete the proof, it only remains to show that for each 1 ≤ k ≤ ℓ − 1, there is a choice of r linearly independent columns which are disjoint from $C_k$. Since $|C_k| = rk$, this is in turn implied by the stronger statement that there is a choice of r linearly independent columns which are disjoint from any collection C of at most rℓ columns. In order to see this, we note that the number of k × k submatrices of $H_{k,n}$ which have at least k − r columns contained in C is at most: where the first inequality uses 2 ≤ ℓ and the final inequality follows by assumption. Since there are at least $(n/k)^k$ non-zero minors of $H_{k,n}$, it follows that there exists a k × k submatrix $A_{k+1}$ of $H_{k,n}$ of full rank which shares at most k − r columns with $C_k$. In particular, $A_{k+1}$ contains r linearly independent columns which are disjoint from $C_k$, as desired.
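For experimentation, an (r, ℓ)-rank partition can be searched for with a simplified greedy procedure. The sketch below is not the argument of the proof: instead of working through non-zero minors, it scans the unused columns in order and keeps any column that increases the rank, so it can fail even when a partition exists. The names `rank`, `greedy_rank_partition` and the Sylvester test matrix are hypothetical choices for illustration.

```python
from fractions import Fraction


def sylvester(order):
    """Hadamard matrix of order 2^t via Sylvester doubling (used only as test data)."""
    H = [[1]]
    while len(H) < order:
        H = [row + row for row in H] + [row + [-x for x in row] for row in H]
    return H


def rank(vectors):
    """Rank of a list of vectors, by exact Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    rk = 0
    width = len(rows[0]) if rows else 0
    for c in range(width):
        pivot = next((i for i in range(rk, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(len(rows)):
            if i != rk and rows[i][c] != 0:
                factor = rows[i][c] / rows[rk][c]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[rk])]
        rk += 1
        if rk == len(rows):
            break
    return rk


def greedy_rank_partition(M, r, ell):
    """Try to split the columns of M into ell parts of rank >= r; return None on failure."""
    n = len(M[0])
    column = lambda j: [row[j] for row in M]
    unused, parts = list(range(n)), []
    for _ in range(ell):
        chosen = []
        for j in unused:
            # Keep column j only if it is independent of the columns chosen in this step.
            if rank([column(i) for i in chosen + [j]]) == len(chosen) + 1:
                chosen.append(j)
                if len(chosen) == r:
                    break
        if len(chosen) < r:
            return None
        unused = [j for j in unused if j not in chosen]
        parts.append(chosen)
    parts[-1].extend(unused)  # leftover columns cannot lower the rank of a part
    return parts


M = sylvester(8)[:4]  # a 4 x 8 matrix with orthogonal {+1, -1} rows
parts = greedy_rank_partition(M, 4, 2)
print(parts)
```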
The previous proposition essentially completes the proof of Theorem 1.2. Indeed, recall from Section 1.1.1 that it suffices to show the following: there exist absolute constants $0 < c_1 < c_2 < 1$ and C > 0 such that for all $k \in [c_1 n, c_2 n]$, the number of solutions $x \in \{\pm 1\}^n$ to $H_{k,n} x = 0$ is at most $2^{n-(1+C)k}$. The previous proposition shows that $H_{k,n}$ admits an (r, ℓ)-rank partition with r = ⌊k/2⌋ and $\ell = \lfloor n/ke^4 \rfloor$. Hence, from Theorem 1.11, it follows that for k ∈ [1, n/15000], the number of solutions $x \in \{\pm 1\}^n$ to $H_{k,n} x = 0$ is at most $2^{n-(1+1/10)k}$, which completes the proof.
Remark 3.6. For our problem of providing an upper bound on the number of Hadamard matrices, we could have used the somewhat simpler Proposition 3.8 (instead of Proposition 3.5), which shows that there are very few $H_{k,n}$ which do not admit an (r, ℓ)-rank partition for sufficiently large r, ℓ. However, we used Proposition 3.5 to show that it is easy to find such a rank partition even for a given k × n system of linear equations A; indeed, the proof of Proposition 3.5 goes through as long as $\det(AA^T)$ is 'large' (which is indeed the case for random or 'pseudorandom' A), and all k × k minors of A are uniformly bounded (which is guaranteed in settings where A has restricted entries, as in our case).

Proof of Theorem 1.18
In this section, we show how to obtain a non-trivial upper bound on the number of {±1}-valued normal matrices using our general framework. As mentioned in the introduction, this bound by itself is not stronger than the one obtained by Deneanu and Vu [4]; however, it can be used in their proof in a modular fashion to obtain an improvement over their bound, thereby proving Theorem 1.18. As the proof of Deneanu and Vu is quite technical, we defer the details of this second step to Appendix A.
Following Deneanu and Vu, we consider the following generalization of the notion of normality: For any $n \times n$ matrix $N$, we let $\mathcal{N}(N)$ denote the set of all $n \times n$, $\{\pm 1\}$-valued matrices which are $N$-normal. In particular, $\mathcal{N}(0)$ is the set of all $n \times n$, $\{\pm 1\}$-valued normal matrices. The notion of $N$-normality is crucial to the proof of Deneanu and Vu, which is based on an inductive argument: they show that the quantity $2^{(c_{DV}+o(1))n^2}$ in Theorem 1.17 is actually a uniform upper bound on the size of the set $\mathcal{N}(N)$ for any $N$. While this general notion of normality is not required to obtain some non-trivial upper bound on the number of normal matrices, either using our framework or theirs, we will state and prove the results of this section for $N$-normality, since this greater generality will be essential in Appendix A.
We begin by introducing some notation, and discussing how to profitably recast the problem of counting $N$-normal matrices as a problem of counting the number of solutions to an underdetermined system of linear equations. Given any matrix $X$, we let $r_i(X)$ and $c_i(X)$ denote its $i$-th row and column respectively. With this notation, note that for a given matrix $M$, being $N$-normal is equivalent to satisfying the following equation for all $i, j \in [n]$: In particular, writing $M$ in block form as: where $A_k$ is a $k \times k$ matrix, we see that (2) amounts to the following equations: (iii) For all $i, j \in [n - k]$: We now rewrite this system of equations in a form that will be useful for our application. Following Deneanu and Vu, we will count the size of $\mathcal{N}(N)$ by constructing $N$-normal matrices in $n + 1$ steps, and bounding the number of choices available at each step. The steps are as follows: in Step 0, we select $n$ entries $d_1, \ldots, d_n$ to serve as diagonal entries of the matrix $M$; in Step $k$ for $1 \le k \le n$, we select $2(n - k)$ entries so as to completely determine the $k$-th row and the $k$-th column of $M$. Of course, these $2(n - k)$ entries cannot be chosen arbitrarily, and must satisfy some constraints coming from the choice of entries in Steps $0, \ldots, k - 1$.
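As a small numerical illustration of the row and column reformulation, the sketch below assumes the convention that $M$ is $N$-normal when $MM^T - M^TM = N$ (the definition does not appear in the text above, so this sign convention is an assumption). Under that assumption, the $(i, j)$ entry of $N$ equals the inner product of rows $i$ and $j$ of $M$ minus the inner product of columns $i$ and $j$. The example matrix and the helper names are hypothetical.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matmul(A, B):
    Bt = transpose(B)
    return [[dot(row, col) for col in Bt] for row in A]


def normality_defect(M):
    """Return M M^T - M^T M (assumed convention for the matrix N)."""
    Mt = transpose(M)
    left, right = matmul(M, Mt), matmul(Mt, M)
    return [[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(left, right)]


# A small {+1, -1}-valued matrix that is not normal (illustrative choice).
M = [[1, 1, 1],
     [1, 1, -1],
     [-1, 1, 1]]
N = normality_defect(M)
rows, cols = M, transpose(M)
n = len(M)

# Entrywise form: <r_i, r_j> - <c_i, c_j> = N_ij for all i, j.
entrywise_ok = all(
    dot(rows[i], rows[j]) - dot(cols[i], cols[j]) == N[i][j]
    for i in range(n)
    for j in range(n)
)
print(N, entrywise_ok)
```

For a $\{\pm 1\}$-valued matrix the diagonal of $N$ is always zero under this convention, since every row and every column has squared norm $n$.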
More precisely, let $M_k$ denote the structure obtained at the end of Step $k$. Then, where the $*$'s denote the parts of $D_k$ which have not been determined by the end of Step $k$. Observe that the matrix $A_k$, together with the first column of $B_k$, the first row of $C_k$, and the diagonal element $d_{k+1}$ forms the matrix $A_{k+1}$; in particular, the matrix $A_{k+1}$ is already determined at the end of Step $k$. Moreover, both $B_{k+1}$ and $C_{k+1}$ are determined at the end of Step $k$ up to their last row and last column respectively. In Step $k + 1$, we choose $r_{k+1}(B_{k+1})$ and $c_{k+1}(C_{k+1})$. In order to make this choice in a manner such that the resulting $M_{k+1}$ admits even a single extension to an $N$-normal matrix, it is necessary that for all $i \in [k]$: Since $A_{k+1}$ is completely determined by the end of Step $k$, and since $N$ is fixed, we can rewrite the above equation as: for all $i \in [k]$, for some $N'_{k+1,i}$ which is uniquely determined at the end of Step $k$. Let $N'_k$ be the $k$-dimensional column vector whose $i$-th entry is given by $N'_{k+1,i}$, let $T_k := [U \; V]$ be the $k \times 2(n - k - 1)$ matrix formed by taking $U$ to be the matrix consisting of the first $k$ rows of $B_{k+1}$ and $V^T$ to be the matrix consisting of the first $k$ columns of $C_{k+1}$, and let $x_k$ be the $2(n - k - 1)$-dimensional column vector given by . With this notation, (4) can be written as:
\[ T_k x_k = N'_k. \tag{5} \]
The next proposition is the analogue of Proposition 3.5 in the present setting.
Proposition 3.8. Let $0 < \gamma < 1$ be fixed, and let $M$ be a random $m \times n'$ $\{\pm 1\}$-valued matrix. Let $E_{\gamma,\ell}$ denote the event that $M$ does not admit a $(\gamma m, \ell)$-rank partition. Then,
The proof of this proposition is based on the following lemma, which follows easily from Odlyzko's lemma (Lemma 1.5).
By Lemma 3.9, we have for each $i \in [t]$ that Therefore, since the entries of the different $A_i$'s are independent, the probability of having more than $t - \ell$ indices $i \in [t]$ for which $\operatorname{rank}(A_i) \le \gamma m$ is at most: which completes the proof.
We need one final piece of notation. For $1 \le k \le n$, we define the set of $k$-partial matrices, denoted by $\mathcal{P}_k$, to be $\{\pm 1, *\}$-valued matrices of the form (3). For any $n \times n$ $\{\pm 1\}$-valued matrix $M$, let $M_k$ denote the $k$-partial matrix obtained by restricting $M$. For any $1 \le k \le n$ and any $n \times n$ matrix $N$, we define:
\[ S_k(N) := \{ M_k : M \in \mathcal{N}(N) \}. \]
In words, $S_k(N)$ denotes all the possible $k$-partial matrices arising as restrictions of $N$-normal matrices. The following proposition is the main result of this section.
Proposition 3.10. There exist absolute constants $\beta, \delta > 0$ such that for any $n \times n$ matrix $N$, $|S_{\beta n}(N)| \le 2^{(2\beta - \beta^2)n^2 - \delta n^2 + o(n^2)}$.
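The exponent $(2\beta - \beta^2)n^2$ in Proposition 3.10 can be read against a trivial count: a $\beta n$-partial matrix has roughly that many determined $\{\pm 1\}$ entries, so the proposition saves a factor of $2^{\delta n^2}$ over choosing them freely. The sketch below checks this count, assuming (since the form (3) is not displayed above) that a $k$-partial matrix fixes exactly the full diagonal together with the first $k$ rows and the first $k$ columns, as suggested by the description of Steps $0, \ldots, k$. The function names are hypothetical.

```python
from fractions import Fraction


def determined_entries(n, k):
    """Count positions fixed after Steps 0..k: the diagonal plus the
    first k rows and first k columns (assumed shape of a k-partial matrix)."""
    return sum(
        1 for i in range(n) for j in range(n) if i < k or j < k or i == j
    )


def closed_form(n, k):
    # n diagonal entries, plus 2(n - j) off-diagonal entries at Step j = 1..k.
    return n + sum(2 * (n - j) for j in range(1, k + 1))


n = 200
beta = Fraction(1, 5)
k = int(beta * n)
exact = determined_entries(n, k)
leading = (2 * beta - beta ** 2) * n ** 2  # the (2*beta - beta^2) n^2 term
print(exact, closed_form(n, k), leading, exact - leading)
```

The difference between the exact count and the leading term is $n - k$, which is absorbed by the $o(n^2)$ term in the exponent.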
Proof. For any $m$-partial matrix $P$ and for any $1 \le k \le m$, let $T_k(P)$ denote the $k \times 2(n - k - 1)$ matrix obtained from $P$ as in (5). We will estimate the size of $S_{\beta n}(N)$ by considering the following two cases.
First, we bound the number of partial matrices $P$ in $\mathcal{P}_{\beta n}$ such that for some $\beta n/2 \le k \le \beta n$, $T_k(P)$ does not admit a $(\gamma k, \ell_k)$-rank partition, where $\ell_k = n'/2k$, $n' = 2(n - k - 1)$, and $0 < \gamma < 1$ is some constant to be chosen later. For this, note that Proposition 3.8 shows that there are at most choices for such a $T_k(P)$, provided $k < n'/4$, which holds for (say) $\beta < 1/4$. Since the remaining unknown entries of $P$ which are not in $T_k(P)$ are $\{\pm 1\}$-valued, this shows that the number of $\beta n$-partial matrices satisfying this first case is bounded above by for all $\beta < 1/4$.
By definition, it is clear that for any $N$ and for any $M \sim M'$, $M$ is $N$-normal-equivalent if and only if $M'$ is $N$-normal-equivalent. On the other hand, as we will see below, one can find a permutation $\rho_M$ for any matrix $M$ such that for the matrix $M' := M_{\rho_M}$, the ranks of many of the matrices $T_k(M')$, $1 \le k \le n$, are large, where $T_k(M')$ denotes the matrix from (5). Therefore, by Odlyzko's lemma, we will be able to obtain good upper bounds on the probability of the random matrix $M' := M_{\rho_M}$ being $C$-normal, for any fixed $C$, which then translates to an upper bound on the probability of $N$-normality of $M$ as follows: for any fixed $N$, where $\mathcal{M}_{n \times n}$ denotes the set of all $n \times n$ matrices, and we have used the fact that $n! = 2^{o(n^2)}$. Hence, it suffices to provide a good uniform upper bound on the probability that the random matrix $M_{\rho_M}$ is $N$-normal for any fixed $N$.
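The fact $n! = 2^{o(n^2)}$ used above means that $\log_2(n!)/n^2 \to 0$; indeed $\log_2(n!) \le n \log_2 n$. A quick numerical check, offered only as a sanity sketch (the helper name is hypothetical), is:

```python
import math


def log2_factorial(n):
    """log2(n!) computed via the log-gamma function, avoiding huge integers."""
    return math.lgamma(n + 1) / math.log(2)


sizes = (10, 100, 1000, 10000)
ratios = [log2_factorial(n) / n ** 2 for n in sizes]
for n, ratio in zip(sizes, ratios):
    # The ratio log2(n!) / n^2 shrinks as n grows.
    print(n, ratio)
```

This is why a factor of $n!$ coming from the choice of permutation does not affect the constant in an exponent of order $n^2$.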
To make the special property of the matrix $M_{\rho_M}$ precise, we need the following functions, defined for all integers $1 \le s \le t \le n$: The next proposition is one of the key ideas in the proof of Deneanu and Vu.
Proposition A.3 (Permutation Lemma, Lemma 3.5 in [4]). Let $M$ be any (fixed) $n \times n$ matrix. Then, there exist $s, t \in \mathbb{N}$ and $\rho_M \in S_n$ such that $M_{\rho_M}$ satisfies: $\operatorname{rank}(T_i(M_{\rho_M})) = R_{s,t}(i)$ for all $1 \le i \le n$.
For a fixed matrix $N$, let $\mathcal{N}_{s,t}(N)$ denote the set of $\{\pm 1\}$-valued $n \times n$ matrices $M$ such that $M$ is $N$-normal, and $\operatorname{rank}(T_i(M)) = R_{s,t}(i)$ for all $i \in [n]$. Then, it follows from the previous proposition that $M_{\rho_M}$ is $N$-normal if and only if $M_{\rho_M} \in \bigcup_{1 \le s \le t \le n} \mathcal{N}_{s,t}(N)$. This, in turn, can happen only if $M$ itself is one of the at most $n! \sum_{s,t} |\mathcal{N}_{s,t}(N)|$ matrices obtained by permuting the rows and columns of the matrices in $\mathcal{N}_{s,t}(N)$. Hence, it suffices to provide a good upper bound on $|\mathcal{N}_{s,t}(N)|$ uniformly in $N$, $s$ and $t$.
Hence, we have shown that Case 6 can be replaced by the following two cases: Case 6.1, in which $f(\alpha, n, s, t) = g_1(n, s, t)$ and $s \le n/10$; and Case 6.2, in which $f(\alpha, n, s, t) = h(n, s, t)$ and $n/10 \le s \le n/2$, each of which places a restriction on $\beta$ which must be larger than the constant $c_{DV}$ obtained in [4]. This completes the proof of Theorem 1.18.