Integrating Data Across Misaligned Spatial Units

Theoretical units of interest often do not align with the spatial units at which data are available. This problem is pervasive in political science, particularly in subnational empirical research that requires integrating data across incompatible geographic units (e.g., administrative areas, electoral constituencies, and grid cells). Overcoming this challenge requires researchers not only to align the scale of empirical and theoretical units, but also to understand the consequences of this change of support for measurement error and statistical inference. We show how the accuracy of transformed values and the estimation of regression coefficients depend on the degree of nesting (i.e., whether units fall completely and neatly inside each other) and on the relative scale of source and destination units (i.e., aggregation, disaggregation, and hybrid). We introduce simple, nonparametric measures of relative nesting and scale, as ex ante indicators of spatial transformation complexity and error susceptibility. Using election data and Monte Carlo simulations, we show that these measures are strongly predictive of transformation quality across multiple change-of-support methods. We propose several validation procedures and provide open-source software to make transformation options more accessible, customizable, and intuitive.

Misaligned units of analysis present challenges for users of geospatial data. Researchers studying legislative elections in the United States might observe data for variables at the electoral district level (e.g., campaign strategies) and at the county level (e.g., crime). To understand how, for example, local crime influences campaign strategies, one must integrate two datasets, using measured values at the county level to estimate levels of crime in each legislative district. Statistically, this represents a change-of-support (CoS) problem: making inferences about a variable at one geographic support (destination units) using measurements from a different support (source units). Changes of support entail information loss, potentially leading to consequential measurement error and biased estimation. Substantively, this is a general problem of mismatch between needing data at theoretically relevant levels and the reality that data may be unavailable at those levels.
How prevalent are such problems in social science? They are quite common, routinely appearing in studies of subnational (within-country) variation, where data are accessible at disparate levels of analysis, reflecting varying geographic precision or different definitions of the same units across data sources. Units and scales (e.g., administrative areas, postal codes, and grids) do not always correspond to theoretical quantities of interest. Nor do they always relate in straightforward ways. While some units perfectly nest (e.g., U.S. counties and states), others overlap only partially (e.g., counties and legislative districts).¹ Testing the implications of theory may require data at one geographic support (e.g., electoral constituency), even though theoretically relevant data exist only at another support (e.g., administrative unit).

¹ While CoS problems can include data transformations involving points, lines, and continuous surfaces (rasters), we limit our scope to areal units (polygons).

Problem Setup
The geographic support of a variable is the area, shape, size, and orientation of its spatial measurement. CoS problems emerge when making statistical inferences about spatial variables at one support using data from another support. One general case occurs when no data for relevant variables are available for desired spatial units. For example, theory may be specified at the level of one unit (e.g., counties), but available data are at smaller levels (e.g., neighborhoods), at larger levels (e.g., states), or at otherwise incompatible units (e.g., grid cells, legislative districts, and police precincts). A second case arises when multiple data sources define the same units differently. Data on geographic areas vary in precision and placement of boundaries, with few universally accepted standards for assigning and classifying units (e.g., handling disputed territories). A third case involves variation in unit geometries over time: historical changes in the number of units (e.g., splits, consolidations, and reapportionment), their boundaries (e.g., annexation, partition, and redistricting), and their names can all occur. Changes of support involve potentially complex and interdependent choices, affecting data reliability and substantive inferences. Yet a CoS is often unavoidable, since the alternative (using data from theoretically inappropriate units) is itself feasible only if all other data are available for those units.
CoS problems are related to several others: ecological inference (EI), deducing lower-level variation from aggregate data (Robinson 1950); the modifiable areal unit problem (MAUP), whereby statistical inferences depend on the geographical regions at which data are observed (Openshaw and Taylor 1979); and Simpson's paradox, a more general version of MAUP, in which data can be grouped in alternative ways that affect inference. In each case, inferential problems arise primarily due to the segmenting of data into different units (in geographic terms, the "scale effect" in MAUP, or "aggregation bias" in EI), or due to differences in unit shape and the distribution of confounding variables (the "zoning effect" in MAUP, or "specification bias" in EI) (Morgenstern 1982).
Geostatisticians view EI and MAUP as special cases of CoS problems (Gotway and Young 2002). In political science, cross-level inference problems have bedeviled research into micro-level attitudes and behavior. Because information is inevitably lost in aggregation, using aggregate data to infer information about lower-level phenomena likely introduces error (see survey in Cho and Manski 2008). We focus on more general transformations from one aggregate unit to another, involving not only disaggregation (as in EI), but also possible combinations of disaggregation and aggregation across non-nested units. As with EI and MAUP, no general solution to CoS problems exists. But we can identify conditions under which these problems become more or less severe.

A General Framework for Changes of Support
Destination unit here refers to the desired spatial unit given one's theory, and source unit refers to the unit at which data are available. Consider two dimensions. The first, relative nesting, captures whether source units fall completely and neatly inside destination units. If perfectly nested, CoS problems become computationally simpler, and can sometimes be implemented without geospatial transformations (e.g., aggregating tables by a common ID, like postal code). If units are not nested, a CoS requires splitting polygons across multiple features, and reallocating or interpolating values. Nesting is a geometric concept, not a political one: even potentially nested units (e.g., counties within states) may appear non-nested when rendered as geospatial data features. Such discrepancies may be genuine (e.g., historical boundary changes), driven by measurement error (e.g., simplified vs. detailed boundaries), or due to differences across sources.
The second dimension, relative scale, adds useful information. It captures whether source units are generally smaller or larger than destination units. If smaller, transformed values will represent aggregation of measurements taken at source units. If larger, transformed values will entail disaggregation, a more difficult process posing nontrivial EI challenges (Anselin and Tam Cho 2002; King 1997). Many practical applications represent hybrid scenarios, where units are relatively smaller, larger, or of similar size, depending on location. For instance, U.S. congressional districts in large cities are smaller than counties, but in rural areas they are larger than counties.
To illustrate, consider relative nesting and scale among three sets of polygons in Figure 1: (a) electoral precincts in the U.S. state of Georgia in 2014, (b) Georgia's electoral constituencies (congressional districts) in 2014, and (c) regular hexagonal grid cells (half-degree in diameter) covering the state. Georgia is an interesting case study due to its diverse voting population and judicial history surrounding elections. Legal challenges and court rulings require that the state's data on electoral boundaries be accurate and publicly available at granular resolution (Bullock III 2018).
Precincts report to constituencies during the administration of elections, so units in Figure 1a should be fully nested within and smaller than those in Figure 1b. The intersection in Figure 1d confirms that every precinct falls inside a larger constituency, and-except for small border misalignments on the Atlantic coast-the transformation does not split precincts into multiple parts. If constituency IDs are available for precincts, a change of support from (a) to (b) could be reduced to calculating group sums, a straightforward procedure.
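When such IDs are available, the group-sum reduction really is trivial. A minimal sketch, using hypothetical precinct records and constituency IDs (the data values here are invented for illustration):

```python
from collections import defaultdict

# Hypothetical precinct-level records with known constituency IDs.
# When source units nest exactly within destination units, a change
# of support reduces to a group sum over the common ID.
precincts = [
    {"constituency": "GA-01", "valid_votes": 1200},
    {"constituency": "GA-01", "valid_votes": 800},
    {"constituency": "GA-02", "valid_votes": 1500},
]

totals = defaultdict(int)
for p in precincts:
    totals[p["constituency"]] += p["valid_votes"]

# totals: {"GA-01": 2000, "GA-02": 1500}
```

No geometry is touched at any point, which is why fully nested transformations are so much less error-prone.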
The hexagonal grid in (c) presents more difficulty. The grid cells were drawn independently of the other two layers, and are not nested. A transformation from (b) to (c) requires splitting constituencies across multiple grid cells, and vice versa. The overlay in Figure 1e suggests that cells are smaller than most, but not all, constituencies (e.g., Atlanta metro area). Changes of support to and from (c) therefore require complex spatial operations, accounting for differences in scale and misalignment of boundaries.
While visual assessments of nesting and scale can be informative, they introduce subjectivity and are often infeasible. Small geometric differences are difficult to detect visually, and partial degrees of nesting are hard to characterize consistently. Visual inspections are also slow and not scalable for batch processing, which requires automated subroutines.
We thus propose two nonparametric measures of relative nesting and scale. Let G_S be a set of source polygons, indexed i = 1, ..., N_S, and let G_D be a set of destination polygons, indexed j = 1, ..., N_D. Let G_{S∩D} be the intersection of these polygons, indexed i∩j = 1, ..., N_{S∩D}, where N_{S∩D} ≥ max(N_S, N_D). Let a_i be the area of source polygon i, let a_j be the area of destination polygon j, and let a_{i∩j} be the area of i∩j, where a_{i∩j} ≤ min(a_i, a_j).³ Let M_{S∩D} be an N_{S∩D} × 3 matrix of indices mapping each intersection i∩j to its parent polygons i and j. M_{i∩D} is a subset of this matrix, indexing the N_{i∩D} intersections of polygon i (see Section A2 of the Supplementary Material for examples). Let 1(·) be a Boolean operator, equal to 1 if "·" is true. Our first measure, relative nesting (R_N), captures how closely source and destination boundaries align, and whether one set fits neatly into the other:

  R_N = (1/N_S) Σ_{i=1}^{N_S} Σ_{i∩j ∈ M_{i∩D}} (a_{i∩j} / a_i)²,   (1)

which reflects the share of source units that are not split across destination units. Values of 1 indicate full nesting (no source units are split across multiple destination units), and a theoretical lower limit of 0 indicates no nesting (every source unit is split across many destination units). R_N has similarities with the Herfindahl-Hirschman Index (Hirschman 1945) and the Gibbs-Martin index of diversification (Gibbs and Martin 1962), which the electoral redistricting literature has used to assess whether "communities of interest" remain intact under alternative district maps (Chen 2010). A value of R_N = 1, for example, corresponds to redistricting plans in which every source unit is assigned to exactly one destination unit (e.g., "Constraint 1" in Cho and Liu 2016).
The second measure, relative scale (R_S), captures whether a CoS task is generally one of aggregation or disaggregation:

  R_S = (1/N_{S∩D}) Σ_{i∩j=1}^{N_{S∩D}} 1(a_i < a_j),   (2)

which is the share of intersections in which source units are smaller than destination units. Its range is 0 to 1, where 1 indicates pure aggregation (all source units are smaller than intersecting destination units) and 0 indicates no aggregation (all source units are at least as large as destination units). Values between 0 and 1 indicate a hybrid (i.e., some source units are smaller, others larger, than destination units). Table 1 reports pairwise R_N and R_S measures for the polygons in Figure 1, with source units in rows and destination units in columns. The table confirms that precincts are always smaller than (R_S = 1) and almost fully nested within constituencies (R_N = 0.98), with the small difference likely due to measurement error. Precincts are also smaller than (R_S = 1) and mostly nested within grid cells (R_N = 0.92). At the opposite extreme, obtaining precinct-level estimates from constituencies or grid cells would entail disaggregation (R_S = 0) into non-nested units (0.01 ≤ R_N ≤ 0.05). Other pairings show intermediate values: hybrids of aggregation and disaggregation, where changes of support require splitting many polygons.
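Both measures require only polygon areas and pairwise intersections. The sketch below is an illustrative toy, not the paper's implementation: it computes R_N (as a sum of squared area shares) and R_S for axis-aligned rectangles, where intersections can be derived analytically; real polygons would need a GIS library.

```python
def area(r):
    # r = (xmin, ymin, xmax, ymax)
    return (r[2] - r[0]) * (r[3] - r[1])

def intersect(r1, r2):
    # rectangle intersection, or None if the overlap is empty
    xmin, ymin = max(r1[0], r2[0]), max(r1[1], r2[1])
    xmax, ymax = min(r1[2], r2[2]), min(r1[3], r2[3])
    return (xmin, ymin, xmax, ymax) if xmin < xmax and ymin < ymax else None

def relative_nesting(src, dst):
    # R_N: mean over source units of the sum of squared area shares of
    # each unit's intersections -- equal to 1 when nothing is split
    total = 0.0
    for s in src:
        a_i = area(s)
        total += sum((area(x) / a_i) ** 2
                     for d in dst if (x := intersect(s, d)))
    return total / len(src)

def relative_scale(src, dst):
    # R_S: share of intersections where the source unit is smaller
    flags = [1.0 if area(s) < area(d) else 0.0
             for s in src for d in dst if intersect(s, d)]
    return sum(flags) / len(flags)

src = [(0, 0, 1, 1), (1, 0, 2, 1), (0, 1, 1, 2), (1, 1, 2, 2)]  # four unit squares
dst_nested = [(0, 0, 2, 2)]       # contains every source square: R_N = R_S = 1
dst_shifted = [(0.5, 0, 2.5, 2)]  # splits two squares: R_N falls below 1
```

With the nested destination, `relative_nesting` and `relative_scale` both return 1.0; shifting the destination grid leaves R_S at 1 but pulls R_N down to 0.625, mirroring the paper's point that R_N is the more sensitive of the two.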
Before proceeding, we briefly consider how R_N and R_S relate to each other, with more details in the Supplementary Material. Note, first, that the two measures are not symmetric (i.e., R_S(S,D) ≠ 1 − R_S(D,S) and R_N(S,D) ≠ 1 − R_N(D,S)), except in special cases where the outer boundaries of G_S and G_D perfectly align, with no "underlapping" areas as in Figure 1e. Section A2 of the Supplementary Material considers extensions of these measures, including symmetrical versions of R_S and R_N, conditional metrics defined for subsets of units, measures of spatial overlap, and metrics that require no area calculations at all. Second, as Table 1 suggests (and Section A2 of the Supplementary Material shows), the two measures are positively correlated. Relatively smaller units mostly nest within larger units. Yet because R_N is more sensitive than R_S to small differences in shape, area, and orientation, it rarely reaches its limits of 1 or 0. Even pure aggregation (e.g., precinct-to-grid, R_S = 1) may involve integrating units that are technically not fully nested (R_N = 0.92). One barrier to reaching R_N = 1 is that polygon intersections often produce small-area "slivers" due to minor border misalignments, but these can easily be excluded from calculation.⁴ R_N and R_S can diverge, within a limited range, as in the precinct-to-grid example in Table 1. As Section A3 of the Supplementary Material shows using randomly generated maps, such divergence reflects the fact that the distributions of R_N and R_S have different shapes: R_S is bimodal, with peaks around R_S = 0 and R_S = 1, while R_N is more normally distributed, with a mode around R_N = 0.5. The relationship between the two measures resembles a logistic curve, where numerical differences are largest in the tails and smallest in the middle. Differences between the measures tend to be numerically small. We observe no cases, for instance, where R_S > 0.5 and R_N < 0.5 (or vice versa) for the same destination and source units.

How Nesting and Scale Affect Transformation Quality
How do the accuracy and bias of transformed values vary with relative nesting and scale? We evaluated the performance of several CoS algorithms in two applications: (1) transformations of electoral data across the polygons in Figure 1, and (2) a Monte Carlo study of CoS operations across randomly generated synthetic polygons. A comprehensive review of CoS methods, their assumptions and comparative advantages, is beyond the scope of this paper (see summary in Section A4 of the Supplementary Material). Instead, we focus on how relative scale and nesting affect the reliability of spatial transformations in general, holding one's choice of CoS algorithm constant. Specifically, we compare transformed values in destination units to their "true" values across multiple CoS operations.
Let K be a set of CoS algorithms. Each algorithm, indexed k ∈ {1, ..., K}, specifies a transformation f_k(·) between source units G_S and destination units G_D. These transformations range from relatively simple operations that require no data beyond two sets of geometries, to more complex operations that incorporate information from covariates. Let x_{G_S} be an N_S × 1 vector of observed values in source units G_S, and let x_{G_D} be the N_D × 1 vector of "true" values in destination units G_D. Let x̂_{G_D}(k) = f_k(x_{G_S}) be a vector of estimated values for x_{G_D}, calculated using CoS algorithm k. These transformed values are typically point estimates, although some methods provide uncertainty measures. Consider the following CoS algorithms:

• Simple overlay. This method requires no re-weighting or geostatistical modeling, and is standard for the aggregation of event data. For each destination polygon, it identifies source features that overlap with it (polygons) or fall within it (points and polygon centroids), and computes statistics (e.g., sum and mean) for those features. If a source polygon overlaps with multiple destination polygons, it is assigned to the destination unit with the largest areal overlap. Advantages: speed, ease of implementation. Disadvantages: generates missing values, particularly if N_S ≪ N_D.

• Area-weighted interpolation. This is a default CoS method in many commercial and open-source GIS. It intersects source and destination polygons, calculates area weights for each intersection, and computes area-weighted statistics (i.e., weighted mean and sum). It can also handle point-to-polygon transformations through an intermediate tessellation step (Section A4 of the Supplementary Material). Advantages: no missing values, no ancillary data needed. Disadvantages: assumes uniform distribution within source polygons.
• Population-weighted interpolation. This method extends area weighting by utilizing ancillary data on population or any other covariate. It intersects the three layers (source, destination, and population), assigns weights to each intersection, and computes population-weighted statistics. Advantages: softens the uniformity assumption. Disadvantages: performance depends on quality and relevance of ancillary data.⁵

• TPRS-Forest. This method uses a nonparametric function of geographic coordinates (thin-plate regression spline [TPRS]) to estimate a spatial trend, capturing systematic variation or heterogeneity (Davidson 2022b). It uses a Random Forest to model spatial noise, reflecting random, non-systematic variation. For each destination unit, it calculates a linear combination of trend and noise. Advantages: needs no ancillary data, provides estimates of uncertainty. Disadvantages: computationally costly.

• TPRS-Area weights. This is a hybrid of TPRS-Forest and areal interpolation. It decomposes source values into a non-stationary geographic trend using TPRS, and performs areal weighting on the spatial residuals from the smooth functional output (Davidson 2022a). Advantages: provides estimates of uncertainty for area weighting; can optionally incorporate ancillary data. Disadvantages: computationally costly.

• Ordinary (block) kriging. This model-based approach is widely used as a solution to the CoS problem in the natural and environmental sciences (Gotway and Young 2007). It uses a variogram model to specify the degree to which nearby locations have similar values, and interpolates values of a random field at unobserved locations (or blocks representing destination polygons) by using data from observed locations. Advantages: provides estimates of uncertainty. Disadvantages: can generate overly smooth estimates, is sensitive to variogram model selection, and assumes stationarity.

• Universal (block) kriging. This extends ordinary kriging by using ancillary information. It interpolates values of a random field at unobserved locations (or blocks), using data on the outcome of interest from observed locations and covariates (e.g., population) at both sets of locations. Advantages: relaxes the stationarity assumption. Disadvantages: ancillary data may not adequately capture local spatial variation.

• Rasterization. This is a "naive" benchmark against which to compare methods. It converts source polygons to a raster (i.e., a two-dimensional array of pixels), and summarizes the values of pixels that fall within each destination polygon. Advantages: no modeling, re-weighting, or ancillary data. Disadvantages: assumes uniformity.
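As a concrete illustration of area-weighted interpolation, the sketch below implements the method on axis-aligned rectangles (a toy setting chosen so intersections are analytic; this is not the production implementation, and the values are invented). It handles both extensive variables (source totals split by area share) and intensive variables (area-weighted means):

```python
def area(r):
    # r = (xmin, ymin, xmax, ymax)
    return (r[2] - r[0]) * (r[3] - r[1])

def intersect(r1, r2):
    xmin, ymin = max(r1[0], r2[0]), max(r1[1], r2[1])
    xmax, ymax = min(r1[2], r2[2]), min(r1[3], r2[3])
    return (xmin, ymin, xmax, ymax) if xmin < xmax and ymin < ymax else None

def area_weighted(src_geoms, src_vals, dst_geoms, extensive=True):
    out = []
    for d in dst_geoms:
        num = den = 0.0
        for s, v in zip(src_geoms, src_vals):
            x = intersect(s, d)
            if x is None:
                continue
            if extensive:
                # allocate each source total in proportion to the share
                # of the source area falling inside the destination
                num += v * area(x) / area(s)
            else:
                # intensive: area-weighted mean over intersections
                num += v * area(x)
                den += area(x)
        out.append(num if extensive else (num / den if den else None))
    return out

src = [(0, 0, 1, 1), (1, 0, 2, 1), (0, 1, 1, 2), (1, 1, 2, 2)]
vals = [10.0, 20.0, 30.0, 40.0]
dst = [(0, 0, 2, 2)]
```

With the four unit squares valued 10 through 40 and a single 2×2 destination, the extensive transform returns [100.0] (the mass-preserving sum) and the intensive transform returns [25.0] (the weighted mean). The uniformity assumption enters through the `area(x) / area(s)` share: values are treated as evenly spread within each source polygon.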
We employ two variants of the first three methods, using (a) polygon source geometries, and (b) points corresponding to source polygon centroids. These approaches represent different use cases: a "data-rich" scenario where full source geometries are available, and a "data-poor" scenario with a single address or coordinate pair. We also implement two variants of TPRS-Forest: (a) spatial trend only, and (b) spatial trend plus residuals.
We use three diagnostic measures: (1) root mean squared error (RMSE) between transformed and true values; (2) Spearman's rank correlation between transformed and true values; and (3) bias in OLS estimates of a regression coefficient whose true value is known.
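The first two diagnostics are standard; minimal pure-Python versions are sketched below (the rank-correlation helper assumes distinct values, so ties are not handled):

```python
def rmse(est, true):
    # root mean squared error between estimated and true values
    return (sum((e - t) ** 2 for e, t in zip(est, true)) / len(true)) ** 0.5

def _ranks(v):
    # ranks of each element (assumes distinct values; no tie handling)
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    for rank, i in enumerate(order):
        r[i] = float(rank)
    return r

def _pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)

def spearman(a, b):
    # Spearman's rho = Pearson correlation of the ranks
    return _pearson(_ranks(a), _ranks(b))
```

A perfect transformation gives `rmse` of 0 and `spearman` of 1; a rank-reversing one gives `spearman` of -1. The third diagnostic, OLS bias, is simply the difference between the slope estimated from transformed values and the known true slope.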

Illustration: Changing the Geographic Support of Electoral Data
Our first illustration transforms electoral data across the polygons in Figure 1. We demonstrate generalizability through a parallel analysis of Swedish electoral data (Section A5 of the Supplementary Material).⁶

⁵ A more general disadvantage of CoS methods reliant on ancillary data is that using a covariate z to impute values of x will bias (predetermine), by construction, any later estimate of the association of x with z.

⁶ We used data on Georgia's 2014 elections to the U.S. House of Representatives, and Sweden's 2010 elections to the Riksdag, due to the availability of high-precision boundary information and vote tallies.
The variable we transformed was Top-2 Competitiveness, scaled from 0 (least competitive) to 1 (most competitive):

  Top-2 Competitiveness = 1 − winning party's vote share margin   (3)
                        = [valid votes − (votes for winner − votes for runner-up)] / valid votes.   (4)
We obtained "true" values of competitiveness for precincts ( Figure 1a) and constituencies (Figure 1b) from official election results, measuring party vote counts and votes received by all parties on the ballot (Kollman et al. 2022). For grid cells (Figure 1c), we constructed aggregates of valid votes and their party breakdown from precinct-level results.
Our analysis did not seek to transform Top-2 Competitiveness directly from source to destination units. Rather, we transformed the three constitutive variables in Equation (4) (valid votes, and votes for each of the top-2 finishers) and reconstructed the variable after the CoS. In Section A6 of the Supplementary Material, we compare our results against those from direct transformation.
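Once the three constitutive counts are available in destination units, reconstructing the variable is a one-line computation. A small sketch with hypothetical counts:

```python
def top2_competitiveness(valid, winner, runner_up):
    # 1 - winning margin: Top-2 Competitiveness from three vote counts
    return (valid - (winner - runner_up)) / valid

# a dead heat is maximally competitive; a sweep is minimally so
# top2_competitiveness(1000, 500, 500) -> 1.0
# top2_competitiveness(1000, 1000, 0)  -> 0.0
```

Because the measure is a nonlinear function (a ratio) of its components, transforming the components first and reconstructing afterward need not give the same answer as transforming the ratio directly, which is why the two approaches are compared separately.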
Because the purpose of much applied research is not univariate spatial transformation, but multivariate analysis (e.g., the effect of x on y), we created a synthetic variable, y_i = α + βx_i + ε_i, where x_i is Top-2 Competitiveness in unit i, α = 1, β = 2.5, and ε_i ~ N(0, 1). We assess how changes of support affect the estimation of regression coefficients, in situations where the "true" value of that coefficient (β = 2.5) is known.

Figure 2 shows transformed values of Top-2 Competitiveness (x̂_{G_D}(k)) alongside true values in destination units (x_{G_D}), where darker areas are more competitive. Figure 2a reports the results of precinct-to-constituency transformations (corresponding to "a ∩ b" in Figure 1d). Figure 2b reports constituency-to-grid transformations ("b ∩ c" in Figure 1e). Of these two, the first set of transformations (where R_N = 0.98, R_S = 1) more closely resembles the true values than the second set (R_N = 0.29, R_S = 0.12), with fewer missing values and fewer implausibly smooth or uniform predictions.

Figure 3 reports fit diagnostics for the full set of CoS transformations across spatial units in Georgia (vertical axes), as a function of the transformations' relative nesting and scale (horizontal axes). Each point corresponds to the quality of fit for a separate CoS algorithm. The curves represent fitted values from a linear regression of each diagnostic on source-to-destination R_N and R_S coefficients. Gray regions are 95% confidence intervals.
The results confirm that the accuracy of CoS transformations increases in R_N and R_S. RMSE is lower, correlation is higher, and OLS estimation bias is closer to 0 where source units are relatively smaller (Figure 3a) and more fully nested (Figure 3b). OLS bias is negative (i.e., attenuation) when R_N and R_S are small, and shrinks toward 0 as they increase.
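The attenuation pattern has a familiar measurement-error interpretation: classical noise in a transformed regressor shrinks the OLS slope toward zero. The toy simulation below (all parameter values assumed for illustration; it is not the paper's CoS pipeline) reproduces the effect with a pure-Python OLS slope:

```python
import random

random.seed(42)
N, alpha, beta = 5000, 1.0, 2.5
x = [random.random() for _ in range(N)]
y = [alpha + beta * xi + random.gauss(0, 1) for xi in x]

def ols_slope(x, y):
    # bivariate OLS slope: cov(x, y) / var(x)
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

b_clean = ols_slope(x, y)  # close to the true beta = 2.5

# adding independent noise to x (mimicking transformation error)
# attenuates the estimated slope toward 0
x_noisy = [xi + random.gauss(0, 0.5) for xi in x]
b_noisy = ols_slope(x_noisy, y)
```

With noise variance 0.25 against a regressor variance of 1/12, the reliability ratio is about 0.25, so the noisy slope lands far below 2.5, the same direction of bias the Georgia results display at low R_N and R_S.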
Our analysis also reveals substantial differences in the relative performance of CoS algorithms. Overall, simpler methods like overlays and areal interpolation produce more reliable results. For example, median RMSE (across all levels of R_N and R_S) was 0.22 or lower for both types of simple overlays, compared to 0.43 for universal kriging. Median correlation for simple overlays was 0.85 or higher, compared to 0.12 for universal kriging. Simple overlays also returned the smallest OLS bias, with a median of −0.15, compared to quite severe underestimation of OLS coefficients for universal kriging (−2.4). Similar patterns emerge in our analysis of Swedish electoral data (Section A5 of the Supplementary Material).
Because the Top-2 Competitiveness variable is a function of other variables, we considered how transformation quality changes when we transform this variable directly versus reconstructing it from transformed components. The comparative advantages of these approaches depend on the relative nesting and scale of source and destination units: indirect transformations perform better when R_N and R_S are closer to 1, while direct transformations are preferable as R_N and R_S approach 0 (Section A6 of the Supplementary Material).

Illustration: Monte Carlo Study with Synthetic Polygons
To generalize, we performed Monte Carlo simulations with artificial boundaries and variables on a rectangular surface. This analysis compares fit diagnostics from the same CoS algorithms, over a broader set of transformations covering the full range of R_N and R_S.
We consider two use cases. First, we change the geographic support of an extensive variable, like population size or number of crimes. Extensive variables depend on the area and scale of spatial measurement: if areas are split or combined, their values must be split or combined accordingly, such that the sum of the values in destination units equals the total in source units (i.e., satisfying the pycnophylactic, or mass-preserving, property). Second, we change the support of an intensive variable, like temperature or elevation. Intensive variables do not depend on the size of spatial units; quantities of interest in destination units are typically weighted means. Two or more extensive variables can combine to create a new intensive variable, like population density or electoral competitiveness. In Section A6 of the Supplementary Material, we consider the merits of (re-)constructing these variables before versus after a CoS.
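A quick way to see the distinction, sketched under toy assumptions (one square source region, invented values): allocating an extensive count by area share preserves the source total exactly (the pycnophylactic property), while an intensive value is carried over as a level, not a share.

```python
def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def intersect(r1, r2):
    xmin, ymin = max(r1[0], r2[0]), max(r1[1], r2[1])
    xmax, ymax = min(r1[2], r2[2]), min(r1[3], r2[3])
    return (xmin, ymin, xmax, ymax) if xmin < xmax and ymin < ymax else None

# one source square holding 100 events, split across two destination halves
src = (0.0, 0.0, 2.0, 2.0)
count = 100.0  # extensive (e.g., number of crimes)
temp = 15.0    # intensive (e.g., temperature)

dst = [(0.0, 0.0, 1.0, 2.0), (1.0, 0.0, 2.0, 2.0)]

# extensive: split in proportion to area -> parts sum back to the total
parts = [count * area(intersect(src, d)) / area(src) for d in dst]

# intensive: each destination inherits the (assumed uniform) level
levels = [temp for _ in dst]
```

Here `parts` is [50.0, 50.0], summing exactly to the source total of 100, whereas `levels` stays at 15.0 in both halves; summing an intensive variable across destinations would be meaningless.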
While real-world data rarely conform to a known distribution, we designed the simulated geographic patterns to mimic the types of clustering and heterogeneity that are common in data on political violence and elections (see examples in Section A7 of the Supplementary Material). We simulated values of extensive and intensive variables with the sequential simulation algorithm (Goovaerts 1997);⁷ Figure 4b,c illustrates examples of each. In both cases, while we sought to imitate the spatially autocorrelated distribution of real-world social data, we also simulated spatially random values as benchmarks (see Section A7 of the Supplementary Material).⁸ As before, we also created a synthetic variable Y = α + Xβ + ε, with α = 1, β = 2.5, ε ~ N(0, 1). We then changed the geographic support of X from G_S to G_D, using the CoS algorithms listed above, compared transformed values x̂_{G_D} to the assigned "true" values x_{G_D}, and calculated the same summary diagnostics as before, adding a normalized RMSE. We ran this simulation for randomly generated polygons with N_S ∈ [10, ..., 200] and N_D ∈ [10, ..., 200], covering cases from aggregation (N_S = 200, N_D = 10) to disaggregation (N_S = 10, N_D = 200). We repeated the process 10 times, with different random seeds.
To facilitate inferences about the relationship between nesting and the diagnostic measures, we estimated semi-parametric regressions of the form:

  M_{k_m} = f(R_N^{k_m}) + Method_k + ε_{k_m},

where k indexes CoS algorithms, and m indexes simulations. M_{k_m} is a diagnostic measure for CoS operation k_m (i.e., [N]RMSE, correlation, OLS bias), f(R_N^{k_m}) is a cubic spline of R_N (or R_S), Method_k is a fixed effect for each CoS algorithm, and ε_{k_m} is an i.i.d. error term. This specification restricts inferences to the effects of nesting and scale within groups of similar operations, adjusting for baseline differences in performance across algorithms.

Figure 5 reports predicted values of [N]RMSE, Spearman's correlation, and OLS bias across all methods at different levels of R_N, for both (a) extensive and (b) intensive variables. Section A8 of the Supplementary Material reports results for the R_S coefficient, which generally align with these. As source units become more nested and relatively smaller, [N]RMSE and OLS bias trend toward 0, while correlation approaches 1. The primary difference between extensive and intensive variables is in the estimation of OLS coefficients. For extensive variables, we see attenuation bias, which becomes less severe as R_N approaches 1. For intensive variables, we see attenuation bias as R_N approaches 1, but inflation bias as R_N approaches 0.

Figure 6 shows the Monte Carlo results by CoS algorithm, for (a) extensive and (b) intensive variables. In each matrix, the first, left-most column reports average statistics for each algorithm, pooled over all values of R_N. The remaining columns report average statistics in the bottom decile of R_N (0%-10%), the middle decile (45%-55%), and the top decile (90%-100%). The bottom row presents median statistics across algorithms. Darker colors represent better-fitting transformations (i.e., closer to 0 for [N]RMSE and bias; closer to 1 for correlation).
First, the relative performance of all CoS algorithms depends strongly on R_N (and R_S; see Section A8 of the Supplementary Material). In most cases, the largest improvements in performance occur between the lowest and middling ranges. Where values of R_N are low (bottom 10%), most algorithms fare poorly. Performance improves as R_N increases, especially in the first half of the range. For example, median RMSE (intensive variables) is 1.02 for CoS operations with low R_N, 0.63 for intermediate values, and 0.35 for the top decile. Most algorithms will perform better even at middling levels of R_N (0.4 and up) than at lower levels.
Second, while no CoS algorithm clearly stands out, some perform consistently worse than others. For example, population weighting offers no discernible advantages in transforming intensive variables (but seemingly plenty of disadvantages) relative to simple area weighting. Ancillary data from covariates, these results suggest, do not always improve transformation quality. Centroid-based simple overlays also fare poorly throughout.
Third, some CoS algorithms are more sensitive to variation in R_N than others. For example, simple overlays produce credible results for extensive variables when R_N is high, while areal and population weighting results are more stable.
Are R_N and R_S redundant? After we condition on R_N, for example, does R_S add any explanatory value in characterizing the quality of CoS operations (i.e., measurement error and bias of transformed values)? As we show in Section A3 of the Supplementary Material, R_N is more strongly predictive of transformation quality than R_S when considering the two metrics separately. But R_S is not redundant; including information about both R_N and R_S accounts for more variation in transformation quality than does information about R_N alone.⁹ In Section A3 of the Supplementary Material, we consider how divergence between R_N and R_S affects the relative performance of CoS methods. Not much, we conclude. The absolute proximity of R_S and R_N to 0 or 1 is far more predictive of transformation quality than their divergence.
Our simulations confirm that the patterns from our analysis of election data (higher R_N and R_S are better) hold in more general sets of cases. These include changes of support between units of highly variable sizes and degrees of nesting, and transformations of variables with different properties and distributional assumptions.

What Is to Be Done?
Changes of support with medium to high relative nesting and scale tend to produce higher-quality transformations, in terms of lower error rates, higher rank correlation, and lower OLS estimation bias. These patterns persist across CoS algorithms, in applications involving both extensive and intensive variables. While some CoS methods do perform better than others in specific contexts, no method stands out as unconditionally dominating the rest. We were unable to find a "winner" in our comparison of a dozen algorithms, using data from elections and Monte Carlo simulations that mimic different geospatial patterns. What does this mean for analysts performing CoS operations? We recommend reporting R_N and R_S coefficients for all CoS operations as ex ante measures of transformation complexity. This requires no data beyond the geometries of source and destination polygons, and enables readers to assess risks of poor inference: the higher the numbers, the more reliable the potential results, and even midrange numbers typically warrant far more confidence than lower ones. Also, to check the face validity of transformed values, good practice is to map the new distribution and visually inspect it for strange discontinuities, missingness, "unnatural" smoothness or uniformity, and other obvious errors (as in Figure 2). We urge researchers to implement sensitivity analyses with alternative CoS methods, to show that their results do not rest on the assumptions of a particular algorithm.

9 Because the purpose of our analysis is not to maximize model fit, but to illuminate how each measure relates to transformation quality, we used the simpler specifications here.
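To illustrate how such a diagnostic can be computed from polygon geometries alone, here is one plausible stand-in for relative nesting. The operationalization below is our assumption for illustration, not the formula defined in this article: the average, across source units, of the largest share of each unit's area falling within a single destination polygon, so that fully nested layers score 1.

```python
def relative_nesting(src_areas, overlaps):
    """Hypothetical stand-in for R_N: the mean, over source units, of
    the largest fraction of each unit's area contained in any single
    destination polygon (1 = every source unit is fully nested)."""
    best = {}
    for (s, d), a in overlaps.items():
        best[s] = max(best.get(s, 0.0), a / src_areas[s])
    return sum(best.values()) / len(best)

src_areas = {"A": 2.0, "B": 2.0}
nested = {("A", "X"): 2.0, ("B", "Y"): 2.0}       # fully nested layers
split = {("A", "X"): 1.0, ("A", "Y"): 1.0,        # each source unit
         ("B", "X"): 1.0, ("B", "Y"): 1.0}        # straddles the boundary
print(relative_nesting(src_areas, nested))  # 1.0
print(relative_nesting(src_areas, split))   # 0.5
```

The key point is that a diagnostic of this kind requires only the two sets of polygons, so it can be reported before any attribute data are transformed.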
Beyond this, advice depends on the availability of two types of "ground truth" data relevant to changes of support: atomic-level information on the distribution of variables being transformed (e.g., precinct-level vote tallies and locations of individual events), and information on the proper assignment of units (e.g., cross-unit IDs).
If researchers have access to irreducibly lowest-level (ILL) data, in addition to aggregate values in source units, we recommend using the ILL data to validate spatial transformations directly. Examples of ILL data include precinct-level votes, point coordinates of events, ultimate sampling units, individual-level (micro) data, and other information that cannot be disaggregated further. With such data, one can implement CoS methods as in the above analyses, and select the algorithm that yields the smallest errors and the highest correlation with aggregates of true values in destination units. Alternatively, researchers may simply use the ILL data as their source features, since these are likely to have high R_N and R_S scores with destination units.
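The selection step can be sketched as follows, with made-up numbers; the method names and values are hypothetical, and ties in ranks are ignored for simplicity:

```python
import math

def rmse(est, truth):
    """Root mean squared error of estimates against ground truth."""
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(est, truth)) / len(truth))

def spearman(est, truth):
    """Spearman rank correlation (no tie handling; illustration only)."""
    def ranks(xs):
        order = sorted(range(len(xs)), key=lambda i: xs[i])
        r = [0] * len(xs)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    ra, rb = ranks(est), ranks(truth)
    n = len(ra)
    d2 = sum((a - b) ** 2 for a, b in zip(ra, rb))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# "truth": aggregates of ILL data (e.g., precinct votes summed to districts)
truth = [10.0, 20.0, 30.0, 40.0]
candidates = {
    "areal_weighting": [11.0, 19.0, 31.0, 41.0],
    "simple_overlay": [40.0, 10.0, 20.0, 30.0],
}
best = min(candidates, key=lambda m: rmse(candidates[m], truth))
print(best)                               # 'areal_weighting'
print(spearman(candidates[best], truth))  # 1.0
```

In practice, one would rank candidate algorithms on both error and rank correlation, and prefer methods that do well on both criteria.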
If IDs for destination polygons are available for source units, and R_N and R_S are sufficiently high, then spatial transformations via CoS algorithms may be unnecessary. For each destination polygon, one needs only to identify the source features that share the common identifier (e.g., county name and state abbreviation) and compute group statistics for those features. The ID variable must exist, however, and must provide a one-to-one or many-to-one mapping. A source feature assigned to more than one destination unit requires additional assumptions about how source values are (re-)distributed.
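A minimal sketch of this ID-based shortcut, assuming a hypothetical (source_id, dest_id, value) schema; the check enforces the one-to-one or many-to-one requirement described above:

```python
def aggregate_by_id(records, stat=sum):
    """Group source values by a shared destination ID.
    records: iterable of (source_id, dest_id, value) tuples.
    Raises ValueError if any source unit maps to multiple destinations,
    since that case requires a CoS algorithm rather than a lookup."""
    seen = {}
    for src, dst, _ in records:
        if seen.setdefault(src, dst) != dst:
            raise ValueError(f"source unit {src!r} maps to multiple "
                             "destination units; a CoS algorithm is needed")
    groups = {}
    for _, dst, val in records:
        groups.setdefault(dst, []).append(val)
    return {dst: stat(vals) for dst, vals in groups.items()}

# Counties (source) keyed to districts (destination) by a shared ID
records = [("county1", "district1", 5.0),
           ("county2", "district1", 7.0),
           ("county3", "district2", 3.0)]
print(aggregate_by_id(records))  # {'district1': 12.0, 'district2': 3.0}
```

The `stat` argument can be swapped for other group statistics (e.g., a mean for intensive variables) as appropriate.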
What if there are no lower-level data for validation, and no common IDs for unit assignment? Our general advice (reporting R_N and R_S, visually inspecting the results, and performing sensitivity analyses) still holds. Yet the third of these steps has pitfalls. Since we cannot know which set of estimates is closest to the true values, rerunning one's analysis with alternative CoS algorithms can create temptations either to cherry-pick the numbers one likes best, or to give equal weight to all algorithms, including some that could be wildly off the mark.
While cross-validation without ground truth data is a difficult topic that lies beyond the scope of this article, we briefly illustrate one potential path in Section A9 of the Supplementary Material. Specifically, one can report the results of multiple CoS methods, along with a measure of how divergent each set of results is from the others, using outlier detection tests. As an analogy, this is like using multiple, imperfect instruments to detect the amount of oil underground. We may never know the true amount. But learning (for instance) that only one of the instruments has detected the presence of hydrocarbons is useful, both in the search for oil (that instrument might get it right) and in evaluating the outlier instrument for future efforts. We caution that no set of results should be included in or excluded from an analysis solely on the basis of an outlier detection test. It is possible for an outlier to be more accurate than the average; conversely, an algorithm giving results close to the average may be quite inaccurate. Yet if a CoS method frequently gives output that differs systematically from other CoS methods, further investigation of the deviant algorithm may be warranted. At the very least, this would help contextualize one's results.
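One simple heuristic in this spirit, offered as an illustration only (this is not the specific outlier detection test used in the Supplementary Material, and the method names and numbers are invented): score each method by how far its estimates typically sit from the unit-by-unit cross-method median, and flag methods whose score is a large multiple of the rest.

```python
import statistics

def flag_outlier_methods(estimates, factor=3.0):
    """estimates: method name -> list of estimates, one per destination
    unit. Flags methods whose mean absolute deviation from the
    cross-method median exceeds `factor` times the median such
    deviation across methods."""
    units = list(zip(*estimates.values()))
    unit_medians = [statistics.median(u) for u in units]
    dev = {m: statistics.fmean(abs(e, ) if False else abs(e - c)
                               for e, c in zip(vals, unit_medians))
           for m, vals in estimates.items()}
    cutoff = factor * statistics.median(dev.values())
    return [m for m, d in dev.items() if d > cutoff]

estimates = {
    "areal_weighting": [10.0, 20.0, 30.0],
    "population_weighting": [11.0, 19.0, 31.0],
    "tessellation": [9.0, 21.0, 29.0],
    "simple_overlay": [40.0, 2.0, 70.0],   # diverges from the rest
}
print(flag_outlier_methods(estimates))  # ['simple_overlay']
```

As the surrounding text cautions, a flag from such a test is a prompt for further investigation, not grounds for automatically discarding a method's results.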
To provide researchers with routines, documentation, and source code to implement these and other procedures with their own data, we developed an open-source software package (SUNGEO), available through the Comprehensive R Archive Network and GitHub. It includes functions to calculate R_N, R_S, and related metrics, as well as functions to execute and compare most of the CoS methods discussed in this article. These tools should enable researchers to explore options, elucidate the consequences of choices, and design CoS strategies that meet their needs.

Conclusion
When integrating data across spatial units, seemingly benign measurement decisions can have nontrivial consequences. The accuracy of spatial transformations depends on the relative nesting and scale of source and destination units. We introduced two simple, nonparametric measures that assess the extent to which units are nested and where a transformation falls on the range from aggregation to disaggregation. We have shown that the two measures are predictive of the quality of spatial transformations, with higher values of R_N and R_S associated with lower error rates, higher correlation between estimated and true values, and less severe OLS estimation bias. These measures can serve as ex ante indicators of spatial transformation complexity and error-proneness, even in the absence of "ground truth" data for validation. We also provide open-source software to help researchers implement these procedures.
Because changes of support entail information loss, the consequences of these problems will depend in part on whether one uses spatially transformed estimates for description or for inference. Researchers have leeway when using transformed measures for mapping and visualization, so long as the transformed estimates correlate with the (unobserved) ground truth. The situation becomes more precarious when using interpolated measures for inference. Both Type II and Type I errors are possible. In the case of extensive variables, transformations with lower R_N and R_S scores generally result in the underestimation of OLS coefficients, increasing the chances of false negatives. Yet there are also cases where the estimation bias runs in the opposite direction (e.g., intensive variables with low R_N and R_S), increasing the chances of false positives. More research is needed on these situations.
We reiterate that ignoring CoS problems imperils accurate inference, and that there is no silver bullet. Researchers should document their measurement choices: reporting relative nesting and scale, checking the face validity of the output, and avoiding reliance on a single CoS algorithm. We encourage future research to explore new methods for integrating spatially misaligned data.