On the development and analysis of coupled surface–subsurface models of catchments. Part 1. Analysis of dimensions and parameters for UK catchments

Abstract The objective of this three-part work is to formulate and rigorously analyse a number of reduced mathematical models that are nevertheless capable of describing the hydrology at the scale of a river basin (i.e. catchment). Coupled surface and subsurface flows are considered. In this first part, we identify and analyse the key physical parameters that appear in the governing formulations used within hydrodynamic rainfall–runoff models. Such parameters include those related to catchment dimensions, topography, soil and rock properties, rainfall intensities, Manning's coefficients and river channel dimensions. Despite the abundance of research that has produced data sets describing properties of specific river basins, there have been few studies that have investigated the ensemble of typical scaling of key physical properties; these estimates are needed to perform a proper dimensional analysis of rainfall–runoff models. Therefore, in this work, we perform an extensive analysis of the parameters; our results form a benchmark and provide guidance to practitioners on the typical parameter sizes and interdependencies. Crucially, the analysis is presented in a fashion that can be reproduced and extended by other researchers and, wherever possible, uses publicly available data sets for catchments in the UK.


Introduction
A key problem in hydrology concerns the prediction of water flow into the river.Here, the basic geometry considered is a drainage basin or a catchment site, defined by the area of land from which flow induced by precipitation will reach a specified outlet.As one would expect, the complete physics of this fluid transport problem involves a host of complex multiscale effects related to the precise geological, ecological, urban, and climate-related features of the environment.
The objective of this work, divided into three parts, is the formulation and rigorous analysis of a number of relatively simple mathematical models that, following the basic assumptions of coupled surface-subsurface models, are nevertheless capable of describing the hydrology at the scale of a typical catchment site.In particular, we wish to study the following questions: What is a relatively simple mathematical model that describes catchment-scale dynamics of subsurface and surface flow?What are the key non-dimensional quantities that govern the physics of such phenomena?What scaling laws can be predicted based on asymptotic analysis of these models?(ii) Are these reduced models justifiable given available data of catchments in the United Kingdom?What are the typical parameter values to use in such models?† Email address for correspondence: piotr.morawiecki@bath.edu‡ Email address for correspondence: p.trinh@bath.ac.uk The above forms a set of general questions of interest.We may pose much more specific ones of fluid mechanical origin.For instance: Given a typical catchment geometry in the United Kingdom with typical length scales, typical terrain slope, typical soil conductivity, and so forth, what are the predicted scaling laws characterising the flow rates into the river during a rainfall of typical intensity and duration?
One challenge is to define the notion of what constitutes a "typical" parameter; this concerns the second question (ii) we have asked above.
We wish to study the behaviour of mathematical models and to do so, we require some notion of reasonable parameter values.However, in the context of hydrology, determination of typical parameters is not always possible: real-world catchments are characterised by a wide range of shapes and topography, with hydraulic properties that are highly heterogeneous and may significantly vary across different scales (see discussion by Clauser 1992).With such high natural variability in catchment properties, it is not possible to formulate one benchmark scenario that reflects the properties of all existing catchments.In this three-part work, our overarching aim to formulate a mathematical model of a catchment that oversimplifies the complexity of real-world catchments, but is characterised by plausible/typical physical parameters.
The task of the present manuscript is to study this issue of parameter values.We wish to obtain the parameter choices in a transparent way via the statistical analysis of real-world data or via references to other works.In general, we shall estimate the median and interquartile range of each parameter considered for a wide range of UK catchments, and then study their correlations and dependencies.Whenever possible, we use publicly available spatial data.However, some parameters (e.g.soil parameters, such as hydraulic conductivity) are hard to measure (or average) at the catchment scale; in these cases, we based our estimates on existing models or individual experimental case studies.

On mathematical and computational models
We have observed that in standard practices of industrial hydrological research on coupled surface-subsurface flows at the catchment-scale, the emphasis is often on obtaining the exact prediction of flow quantities given available data at specific catchments [see e.g. the review by Furman (2008) and references therein].While this approach allows site-specific predictions, there seems to have been less work on the study of the general properties of coupled surface-subsurface flow models.This is a challenging task because of the complexity of real-world catchments, which are characterised by multiple temporal and spatial scales.
We are interested in the development of minimal mathematical catchment models that are simple enough to allow the development of universal scaling laws, but at the same time, are complex enough to represent the key physical processes characterising real-world systems.The study of such fundamental properties for simple benchmark models may help in understanding the limitations of such models when applied to the real world catchments-beyond what can be learned from standard numerical approaches.
We shall present a more thorough literature review of mathematical and computational models in Part 2, but here we review some relevant strands of investigation.First, there are classic references concerning both subsurface flow [by e.g.Bear (1972) and Anderson et al. (2015)] and surface flow [by e.g.Chow (1959) and Chow and Ben-Zvi (1973)].These texts introduce equations of e.g.Boussinesq, Richards, Darcy, Saint Venant, and so forth.However, the theories in many of these classical references are not so easily adapted to direct comparison with statistical catchment data in a given location (such as the United Kingdom).Their representation is typically one-dimensional, and in constrained in simplified domains, and without explicit specification of parameters; the analysis is more so qualitative and presents a generalised theory of physics-based modelling.The result, however, is that it is not-at-all obvious how the required scaling laws, raised above in the introduction, can be derived from these isolated theories.
Generally, modern implementations of the fundamental theory of surface and subsurface flow do not treat the governing equations in isolation-that is to say, as applied to a simplified mathematical model of a catchment site.Instead, they often take the form of extensive three-dimensional computational models and software (see e.g. the introduction by Beven (2011) and reviews by Shaw et al. (2010) and Blöschl (2006)).In the computational era, this approach has led to the development of codes such as TOP Model (Beven and Kirkby 1977), MIKE SHE (Abbott et al. 1986), HydroGeoSphere (Brunner and Simmons 2012), ParFlow (Kollet and Maxwell 2006), OpenGeoSys (Kolditz et al. 2012), and many others.One of our core questions is whether the typical output of a large-scale physics-based model can be explained by a simplified fluid mechanical model.
In addition to physics-based models, many modern references have tended towards statistical or phenomenological modelling.These approaches include predictions of flow rates based on statistical methods, such as multidimensional linear regression (Calver et al. 2009), as well as so-called conceptual rainfall-runoff models (Sitterson et al. 2018).A detailed comparative analysis between these three classes of models (statistical, conceptual, and physics-based) is an interesting topic we have highlighted for future work-in some sense, we anticipate that this challenge of inter-model comparison must first begin by agreeing on the minimal mathematical model to consider.
There are challenges to estimating the typical parameters as it is required for further mathematical modelling.In Part 2 of our work, it will be argued that under certain conditions, catchment dynamics can be modelled in terms of simplified geometries where the subsurface and surface flow travels towards the river channel in a transverse direction to the channel flow.Such reduced-dimensional flows will be governed by non-dimensional parameters that involve, for instance, a typical catchment width, say L x , measured in a specific direction.However, given the complex network of streams, rivers, and land topography in any location, it is not clear how L x should be estimated.Moreover, what is the proper definition of L x that provides consistency with the underlying assumptions of the model?These and similar questions do not seem to have yet been addressed by existing research.

On the development of a reproducible framework for parameter estimation
During the course of this work, we have discovered that it is an entirely non-trivial task to seek such typical parameters required for mathematical modelling.In many cases, the parameters used by modern software are determined through a black-box calibration of a complex computational model; the details of these procedures are not often published, or their reproduction may be impossible without access to the original codes (see in addition Hutton et al. 2016).Consequently, it is important to develop a reproducible framework so that scientific researchers without access to specialised datasets can reproduce our methodology.To this end, we have focused, as much as possible, on the use of publicly available datasets.Furthermore, all numerical algorithms used in this paper are available in a readily applicable form.
For UK catchments, notable examples of datasets include the publicly available National River Flow Archive (NRFA) (Fry and Swain 2010), the 3D Soil Hydraulic Database of Europe created by the European Soil Data Centre (Tóth et al. 2017), the Bedrock Geology Model by the British Geological Society (Waters et al. 2016) and detailed spatial datasets shared by Ordnance Survey (Lilley 2011).Boundaries of gauged catchments, for which flow at the outlet is regularly monitored, are defined in the aforementioned NRFA.
In this report, we identify key physical parameters in section 2, and in section 3 we use the above datasets to extract typical values of these parameters for all gauged catchments in the United Kingdom.Mean values of parameters, as well as their correlations and spatial distribution, are investigated in section 4, followed by discussion in section 5.The goal is to build a foundation for formulation of benchmark scenarios and further dimensional analysis, continued in further parts of our work.

Fundamentals of catchment modelling
Three flows are associated with a general catchment.First, subsurface flow occurs beneath the ground; second, overland flow occurs on what we refer to as the hillslope; third, channel flow occurs within a system of rivers and streams.In this section, we shall review some of the accepted governing equations for these three flows.A more detailed formulation, nondimensionalization, and analysis of simplified catchment models will be presented in Part 2 of our work.Here, our objective is to extract those key dimensional parameters that are expected to be relevant.
Below, we shall consider a three-dimensional system with a general position vector x = (x, y, z).The equations presented correspond to general geometries, but for consistency with later mathematical modelling, it will be convenient to associate the y direction as generally oriented along the channel direction; the x direction as generally oriented along the steepest gradient of the typical hillslope; and the z direction as oriented in the vertical direction.Hence, we shall annotate the catchment dimensions with the typical channel length L y , the typical hillslope width L x , and the typical aquifer depth of L z .This geometry is shown in fig. 1.The subsurface flow that governs the pressure head, h g (x, t), is commonly modelled using a three-dimensional Richards equation (Dogan and Motz 2005;Weill et al. 2009):

Subsurface flow
(2.1) Here, K s > 0 is the saturated soil conductivity and C(h) = dθ(h) dh is the so-called specific moisture capacity.The pressure head is equal to zero (h g = 0) at the surface of the groundwater table, which separates the fully and partially saturated region of the soil.We assume that the volumetric water content, θ(h), and relative hydraulic conductivity, K r (h), are given by the Mualem-van Genuchten (MvG) model (Van Genuchten 1980): In essence, the MvG model describes the key hydraulic properties of the soil, such as hydraulic conductivity and saturation, as nonlinear functions of the pressure head, h, as well as other parameters θ r , θ s , α MvG , n, and m = 1 − 1 n .The residual water content, θ r , and saturated water content, θ s , represent the lowest and the highest water content, respectively.The parameter α MvG m −1 is a scaling factor for pressure head h [m].Finally, the coefficient, n, characterises the distribution of pore sizes.Since the MvG model parameters describe the soil/rock properties at a given location, in general, they are functions of the spatial coordinates (x, y, z).

Overland flow
If rainfall exceeds the infiltration capacity, water can accumulate on the surface and form an overland flow.Typically, following e.g.Chow and Ben-Zvi (1973), Tayfur and Kavvas (1994) and Liu et al. (2004), this flow is described by the two-dimensional Saint Venant equations, which govern the overland water height, z = h s (x, y, t).The first equation is the continuity equation for the overland flow, written as where the source term includes the precipitation rate R = R(x, y, t), the infiltration rate I = I(x, y, t), and the evapotranspiration rate E = E(x, y, t).The flow q can be expressed as q = h s u s , where u s = u s (x, y, t) is the mean flow speed.Therefore, (2.4) is a continuity equation with two unknowns, h s and u s (or q s ).The second equation required to close this system is provided by momentum conservation.In the general form of the Saint Venant equations, the following equation is used: where g is the gravitational acceleration, S 0 is the elevation gradient (bed slope), and S f is the friction slope, i.e. the rate at which energy is lost along the x-and y-directions.
One can model S f by specifying the shear stress between the overland flow and the surface.However, in the computational hydrological models, rather than solving eqn (2.5) directly, the flux, q = hu is often obtained using an empirical relationship known as Manning's equation [see Shaw et al. (2010, chap. 14.3) for more details].This equation, originally formulated to describe a turbulent channel flow, is also commonly applied to the two-dimensional turbulent overland flow over a rough terrain.In the vector form, this equation can be written as where n s is an empirically determined value known as Manning's coefficient and describes the land surface roughness.Following (2.5), the friction slope S f is expressed as Often in practice, the last two terms of (2.7) can be neglected; this forms the socalled diffusion approximation.An additional common simplification is the kinematic approximation, in which the second term in (2.7) is also neglected, giving: Note that in general, the gradient varies spatially, therefore S 0 is a function of spatial coordinates (x, y).Similarly, n s varies not only spatially, but also depends on time through seasonal variations in vegetation (Song et al. 2017).
There is an ongoing discussion whether using the above two-dimensional model (2.4)-(2.7) is an appropriate model for the overland flow, since the actual overland flow reaches the channel through a series of channels (natural or artificial ones), rather than being a uniform sheet of surface water h s (x, y) (see an overview by Leibowitz et al. 2018).Nevertheless, we use this formulation as the current standard used in computational physical catchment models.

Channel flow
Finally, consider the channel flow illustrated in fig. 1.We ignore the flow dynamics in the transverse directions of the channel and consider a channel located along (x(s), y(s)), for a parameterisation parameter, s, defined as the distance from the catchment outlet measured along the channel.Then, the channel water height is described by the Saint Venant equation applied to the height z = h c (s, t).There are three main differences between the channel flow equations and the Saint Venant equations used in the twodimensional formulation of overland flow in section 2.2: (i) the direction of flow is given by the direction of the channel; (ii) precipitation adds a negligible contribution to the channel flow -instead, the channel flow is primarily affected by both the surface flow passing through the channel perimeter and the overland flow passing over the river banks; and (iii) the roughness is introduced over the entire perimeter of the channel (e.g., in the case of a rectangular channel, its bottom and the submerged part of its walls).
Mathematically, these assumptions lead to the following governing equation (Vieira 1983;Chaudhry 2007): where A = A(s, h c ) is the channel cross-section, w = w(h c , s) = dA dhc is the channel width (constant in the case of a rectangular channel), and q = q(s, t) is the mean flow in the channel.The source term, q in , is given by considering the total surface and subsurface inflow into the river; hence q in is linked to the boundary values of the solutions of Richards (2.1) and 2D Saint Venant equations (2.4).As for the overland equations, the flux, q = area×velocity = Av, rather than being solved using the full momentum equation, is again assumed to be given by the empirical Manning's law, which in the case of a channel takes the form: , (2.10) where P = P (s, h c ) is the channel wetted perimeter, and n c is Manning's coefficient dependent on the banks and channel bed roughness.The quantity S f is the friction slope, as defined by (2.7) (or its simplified forms), with S 0 representing the elevation gradient along the river, denoted here as S y .In the case of a rectangular channel, A = wh c and P = w + 2h c .Additionally, we denote the surface water height (h c ) at the outlet as d.As before, the kinematic approximation (2.8) is often assumed.We use the above Manning's law since it is a standard approach in computational hydrological models.However, there may be some scenarios in which this assumption may not be a valid approach.In supercritical flows (such as in the case of flash floods), the flow may rapidly vary along the y direction, and other approaches should be considered instead (see e.g.Mujumdar 2001).
This completes our formulation of the system of three coupled PDEs used to describe the key water flow components at the catchment scale, namely Richards equation (2.1), the 2D Saint Venant equations for overland flow (2.4), and the 1D Saint Venant equation for channel flow (2.4).

Boundary and initial conditions
The solution of the governing partial differential equations (2.1), (2.4), (2.9) for the respective subsurface h g , surface h s , and channel h c flows must be accompanied by appropriate boundary and initial conditions.For instance, boundary conditions are required to specify the mass exchange between flow components, and these conditions may introduce additional catchment-dependent parameters such as the channel depth and the particular geometry at the river outlet.Computational models also require the specification of initial conditions, which are generally unknown.Typically, the simulations can be run for a burn-in period to allow the results to be independent of the initial conditions.Example formulations of boundary and initial conditions will be studied in Part 2 of our work, where we focus on the mathematical and numerical analysis of model catchments.

Typical values of model parameters reported in the literature
All parameters appearing in the equations are hence summarised in table 1.Firstly, let us provide a brief review of the typical values of these parameters, known from the literature: The range of saturated soil conductivity K s can vary from 10 0 to 10 −3 ms −1 for very productive aquifers (for well-sorted sand and gravel, and highly fractured rocks) to below 10 −9 ms −1 for impervious rocks (Bear 1972).(ii) According to Chow (1959), Manning's roughness coefficient for channels, n c , can vary from 0.01 sm −1/3 for artificial (e.g.cement) channels to over 0.1 sm −1/3 for channels with dense vegetation.Its value for flood plains, n s , can vary from 0.03 sm −1/3 for pastures and cultivated areas without crops to 0.1 sm −1/3 in densely forested areas.In addition, the value of n c for a specific area can also vary with time due to seasonal variation in vegetation.(iii) The evapotranspiration rate, E, in computational physics-based catchment models is usually estimated using models such as the evapotranspiration model by Kristensen and Jensen (1975) and the Two-Layer UZ/ET model by Yan and Smith (1994), both of which are for example used in the MIKE SHE integrated water balance terms (section 3.4) Table 1: List of parameters appearing in the formulation of the integrated catchment model, and where they are discussed in this work.As discussed in the text, these parameters vary spatially, and in some cases also temporally.
model.However, the formulation of these models is beyond the scope of this report.Cole et al. (1991) reports the mean monthly precipitation and evaporation values for different regions of the United Kingdom.Precipitation R highly depends on the UK region, varying from 645.5 mm/year in Central and East England to 1601.9 mm/year in Northwest and North Scotland, with the highest precipitation levels observed between October and January.According to Faulkner and Prudhomme (1998), the highest daily precipitation measured throughout the year varies from 25 mm (during a single day) in the East of England to over 80 mm in some sites in mountainous regions of Wales and Scotland.The evapotranspiration rate E is similar for all regions, but highly varies in time, from 6 − 12 mm/month in January to 63 − 78 mm/month in July.(iv) Apart from specifying the precipitation, one also needs to specify catchment geometry -its size, terrain topography, and aquifer depth.The publications on integrated catchment models mostly focus on one or a few real-world catchments.
There are also works aiming to understand the characteristic properties of catchments and channel networks.Many catchment characteristics were described by Horton (1932), including drainage density, D D , defined as the ratio of total stream length, L, and catchment area, A (the authors found typical values for investigated catchments to be A = 0.64-1.367km −1 ), average distance between streams (0.73-1.56 km), average overland flow distance (0.38-0.80 km), slope along the streams (0.014-0.038), and slope along the land (0.072-0.177).The cited values were estimated manually based on topographic maps.Together with the development of computing power and the collection of Digital Terrain Models (DTM), the methods for estimation of the above quantities were automated (Tarboton and Ames 2001).(v) Grieve et al. (2016) compared three different estimation methods for the hillslope width, L x , by estimating its distribution for a number of catchments in the USA.
The first estimate was obtained by dividing the catchment area, A, by twice the total stream length, 2L, which is equivalent to the drainage density (2D D ) −1 .Note that the factor of 2 accounts for each stream having a hillslope on either side.The second method used a DTM model to find streamlines following the direction of the steepest descent.The lengths of the streamlines are interpreted as a hillslope width.The last method applied by the authors used a slope-area plot to separate areas dominated by channels from hillslopes.The hillslope width, L x , defined using the streamline (flow routing) method is higher than estimates using the drainage density method, however both have a similar order of magnitude ranging from 30 to 130 m. (vi) A systematic study of the typical parameter ranges and their effect on model predictions was done by Doummar et al. (2012).The authors used the MIKE SHE software for a karst system (the Gallusquelle spring in the Southwest Germany).
The model parameters were calibrated to minimise the root-mean-square error of the daily observed discharge, and the relative importance of each parameter was numerically investigated using sensitivity analysis.The most significant parameters include the hydraulic conductivity, the moisture content of the unsaturated rock matrix and van Genuchten parameters.

Data sources and processing methods
As we have noted in section 1, there have been a lack of studies that have attempted to collate and analyse the collection of dimensional parameters listed in table 1 as a whole, particularly with the intention of further mathematical modelling.Our focus in this section is to describe the data processing techniques that we have used, in order to extract typical values of the physical parameters characterising UK catchments.Crucially, we have made these tools available publicly in a GitHub repository via Morawiecki (2022).
The data sources are primarily from openly-sourced data on catchments in the United Kingdom, but we expect that the methodology can be applied similarly to data from other locations.
Capturing catchment dimensions and topography is challenging, as real-world river networks have a fractal-like structure (Rodriguez-Iturbe and Rinaldo 2001).An ambiguous quantity concerns the estimation of the river length characterising the catchment length, L y , since there are multiple ways in which it can be defined, and furthermore, its value depends on the precision of the dataset used for estimation.Similarly, the estimate of the catchment/hillslope length, L x , is challenging on account of the high spatial variation and ambiguous definition.The aquifer thickness, L z , which depends on the properties of the soil, will be discussed in section 3.3.
In our work, we use three Ordnance Survey datasets, providing different data formats and levels of detail: OS Open Rivers only consists of major rivers represented as spatial lines.(ii) OS VectorMap District includes all surface water bodies -wide rivers and standing bodies of water (e.g.lakes and ponds) are represented using spatial polygons, while narrow streams, artificial channels, and ditches are represented using spatial lines.(iii) OS Water Network includes all rivers/streams forming the drainage network, but data can be downloaded only for small user-specified regions, not for the entire UK like in the case of the other two datasets.3.1.1.Estimating the catchment length, L y In order to estimate the characteristic scale of the catchment length (L y ), we propose the following four measures, graphically represented in fig. 3. Note that depending on their size, the OS Open Rivers dataset represent rivers in the form of polygons (we refer to them as major rivers) or lines (minor rivers).
(i) Length of main rivers (L main y ) -the total length of rivers as defined in the OS Open Rivers dataset; we can use this measure to estimate the total flow into the drainage network.(ii) Length of all rivers (L all y ) -the total length of major rivers and minor streams as defined in the OS VectorMap District dataset; it serves the same function as the previous measure, but at a higher spatial resolution; we also use this measure later as one way of measuring the catchment/hillslope width.(iii) Length of the longest river (L long y ) -the length of the longest river measured from the spring to the catchment's outlet, extracted from the OS Open Rivers dataset (this measure is approximately the same regardless of whether the minor rivers are included or not); it may be used to investigate the characteristic length of the channel when studying the channel flow, given by (2.9).(iv) Distance between river tributaries (L trib y ) -the average distance between river tributaries estimated from the OS Open Rivers dataset, including the first-order streams (see fig. 3d); it is the only intensive quantity in this list (i.e. it does not scale with the catchment size) and therefore can be used as the characteristic length of a river in which hillslope flow is not disturbed by the presence of tributaries.Calculating each of the above measures poses some challenges.Firstly, the river banks and other natural boundaries have a fractal structure, which means that their length can be very sensitive to the chosen spatial resolution.Here, we have used the data in the original resolution to maintain high accuracy in the case of estimates (i)-(iii).In the case of estimate (iv), we are interested in the typical lengths of hillslopes rather than river streams.In this case, the length of the former should not be affected by small-scale local meanders.Therefore, in case (iv), before measuring the river length, we simplify the geometry using the Douglas-Peucker algorithm (Douglas and Peucker 1973) with tolerances of 1000 m.This algorithm effectively smooths river meanders shorter than the chosen tolerance and follows a similar methodology as in Stuetzle et al. (2009).Secondly, when using the VectorMap District dataset for calculating the total river/stream length, data corresponding to standing bodies of water (lakes and ponds) should not be taken into account.This cannot be easily implemented since such standing bodies are represented in the same way as for wide rivers within the dataset.Nevertheless, since lakes have much shorter total boundary length than rivers for the majority of UK regions, including such data does not significantly impact the estimates.Therefore, we include all surface water bodies when estimating (ii), which we obtain by adding the sum of the lengths of all spatial lines and the sum of the perimeters of all spatial polygons divided by two (since each of the two banks is counted separately).
For each UK catchment specified in the National River Flow Archive, we calculated the total length of all major rivers (L main y ), and all major and minor rivers (L all y ), as well as the length of the longest stream (L long y ), and the mean distance between major tributaries (L trib y ).Note that from their definition for each catchment we have, L all y ⩽ L main

Estimating the catchment width, L x
Estimating the characteristic width of the catchment/hillslope, L x , is also challenging due to the high spatial variation and the ambiguous definition of this distance.We use the following two alternative measures for L x : (i) the ratio of total stream length to catchment area (L area x ); and (ii) the length of overland streamlines reaching a given river or river network (L stream x ).
In (i), the first estimate of L area x is suggested by e.g.Rodriguez-Iturbe and Rinaldo (2001).Here, we express the catchment area as A = 2L area x L all y , where the factor of two represents the hillslope on the left-and right-hand bank.This provides an estimate of L area x given the area, A, and L all y (see section 3.1.1).We use L all y since it approximates all streams in the catchment.This estimate is easy to compute, but is sensitive to river boundary roughness.For example, in the case of meandering streams, L area x computed in this fashion can be very large compared to a catchment of identical proportions, but with smooth river banks.
In (ii), the second estimate of L stream x in (ii) is obtained by estimating the average stream-flow distance between points in the catchment to the nearest stream.Here, we use a sampling method in which a large number of spatial points are distributed over a given area.For each point, we find a streamline from the given point to the stream, following the steepest descent path (which approximates the direction of the overland flow).We use the Digital Terrain Model (DTM) from the OS Terrain 50 dataset is used in order to find the steepest descent directions, while the OS VectorMap dataset is used to determine when such a streamline reaches the river or another surface water body.
The implementation details of (ii) are described in appendix A.2. Since even in catchments with a constant hillslope width L stream x , the streamline lengths are distributed uniformly between 0 and L stream x , the mean (and median) distance ⟨dist⟩ is equal to ⟨dist⟩ = L stream x /2.Therefore, we estimate the catchment width as L stream x = 2⟨dist⟩.This approach, unlike the previous one, is not sensitive to the roughness of the river boundary; however, it is much more computationally demanding.Using this approach, we found distances to the nearest stream for over 57,000,000 points uniformly distributed over the entire United Kingdom (excluding Northern Ireland).By finding the median value of this distance for each investigated catchment, we estimated the catchment width.
Summarising the above measures for the UK: using estimate (i), we estimate a median value of L area x ≈ 683 m when using the ratio of catchment area to the total stream length.Interestingly, this average is close to the median value of estimate (ii), L stream x ≈ 616 m, which uses the stream-flow definition.Further discussion appears in appendix B where we remark that L x significantly varies across different regions of the UK.Denser drainage networks can be found in areas with higher precipitation rates and significant overland flow.

Catchment topography (S x , S y )
The gradient of the terrain is important in order to estimate the size of the surface flow, as given by Manning's law (2.6).In order to estimate the typical values of the elevation gradient perpendicular to the river (S x ) and along the river (S y ), we use the Digital Terrain Model (DTM) from the OS Terrain 50 dataset.

Estimating the gradient perpendicular to the river, S x
One way to estimate S x is by plotting the valley elevation cross-sections perpendicular to the channel at random locations in the region of interest.However, river shapes and hillslope topography are highly irregular, and therefore instead of taking straight crosssections, we will investigate the topography profile along the line of the steepest descent.For this purpose, we use the streamlines previously generated to estimate catchment width (L stream x in section 3.1).We estimate the mean gradient for each catchment by dividing the total elevation difference for all streamlines by their total length; this is equivalent to calculating the average of a slope over all streamlines weighted by their length.Note that this method is not heavily affected by very high local gradients (e.g. if cliffs are present in a given catchment), which would be the case if an arithmetic mean of the gradient for each streamline was calculated.We obtained values of S x ranging from as low as 0.01 in the lowlands up to 0.3 to 0.45 in some highlands and mountainous regions.

Estimating the gradient along the river, S y
The gradient S y can be estimated similarly to S x , but the points are taken along the river streams instead of streamlines.We thus estimate S y by dividing the total elevation difference across all channels in the OS Open Rivers dataset by their total length; this method is equivalent to taking a weighted average of the slope for each individual channel, weighted by their length.The typical values of S y range from as low as 0.0005 in the lowlands up to as high as 0.1 in highlands and mountainous regions.

Soil and rock properties (L
In this section, we focus on determining the hydraulic properties of soil and rock, which are important to estimate the saturated and relative hydraulic conductivities, K s and K r , appearing in (2.1).
The geological structure and hydrological properties of soils differ significantly across the UK.In some areas, such as highly productive chalk aquifers in South-East England, the soil has a very high conductivity, and almost all the rainwater reaches the groundwater table.
On the other end of the spectrum, there are aquifers with essentially no groundwater, where the entire rainfall reaches the river either as subsurface flow through the soil or forms an overland flow.Based on the 625K digital hydrogeological map of the UK developed by the British Geological Society (BGS), 15% of the UK area is classified as a highly productive aquifer, 26% as moderately productive, 47% as low productive, and the remaining 12% as rock with essentially no groundwater.The geographical distribution of aquifers of different productivity levels is presented in fig.4a.The typical depth of the conductive layer of soil can be estimated based on the UK3D dataset, which was constructed by the British Geological Survey based on measurements from 372 deep boreholes (Waters et al. 2016).The authors of this dataset interpolated vertical profiles, which were then interpolated to form a national network, or 'fence diagram model', of bedrock geology cross-sections.Each segment of a cross-section is assigned to one of the following qualitative aquifer designations (see EA 2017): (i) Principal Aquifers: layers with high intergranular and/or fracture permeability that can support river base flow on a strategic scale.

(ii)
Secondary Aquifers A: permeable layers capable of forming an important source of base flow at a local rather than strategic scale.(iii) Secondary Aquifers B: predominantly lower permeability layers.(iv) Secondary Undifferentiated: assigned in cases where it has not been possible to attribute either category A or B to a rock type.(v) Unproductive Strata: layers with low permeability and negligible significance for river base flow.
We estimate the typical depth of each layer by calculating the total area of each type of rock layer and dividing it by the total length of the cross-sections, which are available in the dataset (approximately 20, 000 km).More details about the data format and our processing methods are presented in appendix A.4.Our analysis yields the following mean thicknesses, rounded to the nearest metre, for the different types of aquifers presented in the classification (i)-(v) above: Thus, the mean principal aquifer thickness is L z = 405 m.However, in some catchments in the UK, this layer may be much smaller-even practically reaching zero thickness, when the groundwater flow is insignificant (notice the classification of Aquifer with no groundwater sites in fig.4b).Therefore, such catchments need to be studied separately.
In these low-productive aquifers, the subsurface flow takes place dominantly through the thin layer of soil confined from the bottom by impenetrable bedrock.Detailed quantitative data on soil thickness is not available; however, according to the maps shared by the UK Soil Observatory (Lawley 2009), over a half of the UK consists of deep soils with thickness exceeding 1 metre, and nearly half of the soils are less than 1 metre thick (see fig. 5).Therefore, in the limiting case of catchments with no groundwater, we propose L z = 1 m as a reasonable choice for the typical soil thickness in regions with shallow impenetrable bedrock.

Estimating the Maulem-van Genutchen (MvG) parameters
The necessary quantitative data on rock hydraulic properties (hydraulic conductivity and MvG parameters) would ideally be estimated directly through detailed experiments.However, they can also be calculated indirectly based on known soil composition.The European Soil Data Centre (ESDAC) shares the 3D Soil Hydraulic Database Europe, which uses European pedotransfer functions to estimate all MvG parameters (Tóth et al. 2017).The data is available at 1 km and 250 m resolutions for seven different depths (0, 5, 15, 30, 60, 100, and 200 cm).However, these estimates are not very precise since each property in the original database takes only one of a few discrete values, corresponding to specific soil types (see fig. 6).In order to provide a single estimate of these parameters, we use data corresponding to 30 cm and extract their mean value for individual catchments.
soil depth shallow Soil and subsoil can be dug to depths of only 0.5m, sometimes less.

intermediate-shallow
Soil and subsoil can be dug to depths of greater than 0.5m but is less than 1m.
intermediate Soil and subsoil can be easily dug to a depth of 1 metre, sometimes more in places.

deep-intermediate
Soil and subsoil can be easily dug to a depth of 1m, sometimes more in places.
deep Soil and subsoil can be easily dug to a depth of more than 1m.
data not available

Estimating hydraulic conductivity K s
The values of the conductivity from the 3D Soil Hydraulic Database, which range from around 10 −8 to 10 −6 m/s (or approx.0.001 − 0.1 m/day).However, as we shall discuss below, the direct measurements conducted in several highly-productive chalk aquifers in England give higher estimates, likely because of the flow through macropores and fractures, which dominates over the intergranular flow in many areas in the UK (see fig. 4b).Many studies, therefore, focus not only on analysing the hydraulic conductivity of the rock matrix, but also the bulk hydraulic conductivity, which includes the effect of both micropores forming the matrix and naturally occurring macropores/fractures.In table 2, we cite several prior studies that have shown that the ratio of field-to-lab hydraulic conductivity values can be even greater than 1:1000.For instance, measurements conducted by Robins and Buckley (1988) on several boreholes in the Permian and Triassic aquifers in south-west Scotland showed that hydraulic conductivity measured in the field varies from 0.1 m/day (Solway basin) to 20 m/day (Dumfries basin).In the study of chalk aquifers in the South Down, Jones and Robins (1999) measured hydraulic conductivity values ranging from 0.15 to 0.67 m/s.Gardner et al. (1990) observed that the hydraulic conductivity of English chalk aquifers increases from 1 − 6 mm/day for low hydraulic potential, up to 100 − 1000 mm/day, as the potential is increased above 5kPa.This large    increase in conductivity is caused by the fractures becoming saturated after increasing the potential.Therefore, for highly conductive aquifers, we take typical values of conductivity ranging from K s = 10 −6 m/s (0.09 m/day) to K s = 10 −4 m/s (9 m/day), but these values can be significantly lower in low-productive aquifers.In the limiting scenario of catchments with no groundwater flow, the typical conductivity is given by the conductivity of the top layer of soil.
According to Kirkham (2014), the hydraulic conductivity of natural soils can vary from 0.05 m/day for a clay to 30 m/day for a silty clay loam.The variation is higher for disturbed soil materials, ranging from 0.02 m/day for silt and clay to 600 m/day for gravel.Therefore, for the soil, we may assume the same range.Even though the variation is higher than in the aforementioned studies for rock bulk conductivity, the typical values remain similar -between 10 −6 and 10 −4 ms −1 .

Water balance terms (R, Q, E)
The source term of the Saint Venant equation for the overland flow (2.4) can be found by estimating the mean value of terms included in the water balance.It is given by: where R is precipitation over a given catchment, Q is river flow (runoff) at the outlet, E is total evapotranspiration, and ∆S is the change in the water volume stored within the catchment (e.g. in groundwater and reservoirs).Here, we not only estimate the values of R, Q, and E for the UK, but also their annual peak values, which can provide a good parametrisation to model extreme precipitation events (and are used in many flood estimation models, e.g. by Kjeldsen et al. 2008).We define these quantities in terms of the mean flow per unit area of the catchment (measured in ms −1 ).
The precipitation and river flow data for all UK catchments are available in the National River Flow Archive (NRFA).We use the Standardised Annual Average Rainfall (1961-90) to estimate P , and the mean of Gauged Daily Flow to estimate Q (divided by the catchment's area, also provided by NRFA).By averaging all terms over a long period of time (several years), we should expect ∆S to tend to 0, which allows us to estimate the mean actual evapotranspiration as The relationship between Q and R is presented in fig.7a.The mean runoff is smaller than the mean rainfall, as expected (especially for catchments with low mean rainfall).However, there are some outliers, especially among dry catchments.They correspond to situations where the mean gauged river flow exceeds the mean rainfall rate in a given catchment.Several possible reasons for this observation include the inflow of water from neighbouring catchments, a long-term trend of decreasing groundwater storage (∆S), or the mismatch between the time for which the precipitation and runoff data were available (in some catchments, the runoff data is available only for a limited period).
Based on this graph, we can deduce that the evapotranspiration is approximately independent of the other parameters R and Q. Rainfall R values vary from 10 −8 m/s to 10 −7 m/s, runoff Q from 10 −9 to 10 −7 , while mean evapotranspiration E is usually around 1 − 1.5 • 10 −8 m/s.
Finally, we estimate annual peak values of precipitation, as a measure of how intensive rainfall should be considered in rainfall-runoff models.Following Faulkner and Prudhomme (1998), the standard quantity we can use is the median of annual maximum precipitation time series, RMED.As shown in fig.7b, it is closely correlated with mean precipitation R, and is approximately 8 − 10 times higher.

Manning's coefficients (n s , n c )
Manning's roughness coefficient, which appears in the overland flow equation of (2.6), is not measured directly at the catchment scale, but is either obtained by physical model calibration or in controlled small-scale experiments.Therefore, we take typical values from engineering tables.The Manning's coefficient of the surface, n s , depends on the type of the surface Chow (1959); Brunner (2016).The NRFA includes information about the area of each catchment covered by arable lands and grasslands (n s ≈ 0.035 sm −1/3 ), mountains (n s ≈ 0 0.2 0.4 0.6 0.8 1 show the relationship between the median of the annual maximum rainfall, RMED, and the mean rainfall, R. The thick solid lines in both plots correspond to the line of best fit; in (b) the fitted line confirms that RMED scales approximately linearly with R.
0.025 sm −1/3 ), urban areas (n s ≈ 0.015 sm −1/3 ), and woodlands (n s ≈ 0.16 sm −1/3 ).Following these estimates, for each investigated catchment, we calculated a mean value of n s weighted by the area covered by each of the above terrain types.
Similarly, Manning's coefficient for the channel, n c , depends on the type of the channel.Since there is no UK-wide database on channels types, we took the typical range of n c values from Chow (1959).They vary from n c = 0.01 sm −1/3 (for smooth cement channels) to n c = 0.1 sm −1/3 (for natural channels with very weedy reaches), with typical values around n c = 0.035 sm −1/3 .

Channel dimensions (w, d)
Interestingly, estimating the typical channel dimensions (width w and depth d), which appear in Manning's law for the channel flow (2.10), turns out to be quite challenging.Firstly, there is no database that provides a complete set of data on channel dimensions for UK rivers.Secondly, channel dimensions vary greatly depending on the river discharge, which can itself vary greatly depending on the catchment (though mainly on the catchment area and average rainfall).Instead, we perform the analysis via an indirect route.
Our central reference is the work by Nixon (1959), who measured the width and depth of channels of several major rivers at 29 gauging stations in England and Wales.Values ranged from a width of w = 35 ft (11 m) and a depth of d = 3.1 ft (0.94 m) for the River Blackwater at the Shallowfield station to a width of w = 257 ft (78 m) for the Thames at the Kingston Day's Weir station, and a depth of d = 16.37 ft (4.99 m) for the Wye River at the Cadora station.The data, together with similar measurements done for American rivers, were used to fit power laws describing the dependence between the channel dimensions and the full-bank discharge Q.From Nixon (1959), the derived power laws are: where Q is the full-bank discharge, while C 1 and C 2 are fitted constants.These scaling laws can also be justified by analysis of mechanical equilibrium of the sediment bed, see e.g.Henderson (1963) and Devauchelle et al. (2011).
Because data on channel dimensions is not available for most UK catchments, we use the above power laws to estimate the channel dimensions at the outlet of each studied UK catchment.We note that this is a reasonable approach: Nixon (1959) derived such laws based on empirical relations for a representative sample of UK catchments.Equally, Wharton (1992) used similar scaling laws to estimate the peak annual flows in 70 UK catchments based on the channel dimensions.

A summary of results on characteristic values
Table 3 summarises the typical values of catchment model parameters for all gauged catchments in the UK, with boundaries defined as in the National River Flow Archive.A sample of raw data and their spatial distributions is included in appendix B. In cases where no spatial data was available, an estimate from the literature was obtained (these entries are marked with †).Where available estimates obtained with more than one method are compared.The median value of the parameters presented in table 3 will be used in Part 2 to formulate a simple benchmark scenario characterised by similar physical properties to the UK catchments.
Based on table 3, we can make the following notes: (i) The total stream length in the catchment is approximately twice as high when including all surface water streams and bodies from the OS VectorMap (L all y ) than when looking only at the main rivers, which are included in OS Open Rivers dataset (L main y ), cf.fig. 2. The longest stream (L long y ) typically contributes to from 20% to 50% of the total length of the rivers in OS Open River, with a higher percentage corresponding to shorter rivers (with fewer and shorter tributaries).The estimated lengths greatly vary as a result of the wide range of gauged catchments areas used in the study.The exception is the distance between tributaries (L trib t ) ranging from 725 to 1212 m.As an intensive quantity, this quantity does not scale with the catchment area.(ii) Hillslope width estimates in the majority of catchments are consistent between the two proposed methods.A difference can be observed in catchments with the lowest density of streams, for which streamlines turn out to be much shorter than the width between streams.The reason is that the data includes numerous streamlines   that start far from the river, but terminate at a local elevation minimum, and hence do not reconnect with the river.In our study, such streamlines cover over 18% of the UK's area.We do not account for such streamlines when estimating the hillslope width.(iii) Table 3 provides an indication that the slope along the hillslope S x is typically much larger than the slope along the channel S y , which is a general feature of the topography of river valleys.By the two-dimensional Manning's equation (2.6), the flow follows the direction of the steepest descent.Therefore, we would then expect that the rainfall water is transferred towards the main river through streams mainly directed along the hillslopes (with S x slope).Only after it reaches the channel, it is transferred down the river with S y slope.The consequences of this observation will be explored in detail in Part 2. (iv) The highest disagreement can be observed between the soil conductivity as reported in the ESDAC dataset and the literature.The latter measured the bulk soil conductivity of the soil (and bedrock), which is a better macroscopic estimate than the one provided in the ESDAC dataset.(v) The last two columns of table 3 include example values from the literature.The first column includes parameter ranges suggested by Doummar et al. (2012) to model a real-world Karst catchment in Southwest Germany.The catchment width is higher than typical values for the UK.Notable differences between the suggested parameter ranges and estimations for UK catchments are in soil conductivity, where higher values come from measurements of observed flow velocities using artificial tracers in highly conductive areas of the investigated system.These velocities can be very high in well-fractured Karst systems, as is also observed in some measurements in the UK [cf. the chalk aquifer of Yorkshire studied by Ward et al. (1997)].(vi) The last column includes parameter values suggested in two simple benchmark scenarios by Maxwell et al. (2014) for integrated catchment model intercomparison.
The first scenario includes a single hillslope tilted in the x-direction, ending in a flat river of constant depth.Only subsurface flow in a thin 5-metre-deep soil layer is modelled.The second scenario includes a two-dimensional tilted V-shaped catchment with a different elevation gradient along and perpendicular to the river.However, this latter one is only used to model surface flow-subsurface flow is not involved.
One should note a few differences between parameter values used in these scenarios.
The rainfall values are higher, since they correspond to typical intensive rainfall, not to an annual mean estimated for UK catchments.The next difference is a much lower aquifer thickness, since in benchmark scenarios, it represents only the soil layer (L * z = 5 m), ignoring the subsurface water percolation to the usually much deeper permeable bedrock layer (on average L z = 684 m based on the UK3D dataset).Another important difference is the low soil conductivity in the scenarios: the saturated hydraulic conductivity, K s , between 1.16 • 10 −7 to 1.16 • 10 −5 m/s is lower than the reported typical values ranging from 10 −4 to 10 −6 m/s (estimated based on large-scale studies in the UK, see section 3.3.3).Note that the combination of the two previous factors (thin aquifer with low conductivity) typically leads to model results, in simple benchmarks, where the majority of land is covered with surface water even for small rainfalls (cf.modelling and numerical simulations in Part 2).

Further investigation of distribution and correlation
In the previous section, we presented the results of an extensive analysis of the typical scaling of those key parameters expected to play a role in the dynamics of hydrology at the level of a catchment site.Naturally, an important question relates to the issue of which parameters are expected to be the most important in the description of physical observations or in numerical simulations.Such questions will form the basis of our investigation in Parts 2 and 3, where we develop scaling laws involving non-dimensional parameters.In this section, we focus on a statistical interpretation of such issues, and study the distribution and correlation of parameters, as well as the subsequent catchment classification.Our analysis is divided into the application of three statistical methods: analysis of cross-correlation of parameters (section 4.1), grouping of catchments using kmeans clustering (section 4.2), and Principal Component Analysis for catchment properties (section 4.3).

Regional variability and correlation of parameters
Note that the estimates presented in table 3 refer to mean values across the UK.However, regional averages can significantly differ, as is represented by the interquartile range.As an example, consider the distribution of certain parameter mean values for UK catchments, as listed in the National River Flow Archives and presented in fig.16.One can observe that highland areas, characterised by a higher elevation gradient (both along rivers and hillslopes), are also accompanied by higher rainfalls and lower soil conductivity (and usually low aquifer thickness).These observations are confirmed by the correlation matrix presented in the fig.8.The higher precipitation in mountainous regions was already observed by Duckstein et al. (1973).Its impact on the aquifer properties (such as soil capacity and infiltration rate) was used by Bell and Moore (1998) to formulate the Grid Model and its successor, the Grid-to-Grid model (Bell et al. 2007).The high correlation of catchment area and river length comes from the fractal structure of drainage networks (Rodriguez-Iturbe and Rinaldo 2001), which develop to uniformly cover available space.Finally, there is a strong correlation between the α MvG , n and θ S parameters from the Mualem-Van Genuchten model parameters, since all were estimated by Tóth et al. (2017) based on the same soil composition dataset.

Classification of catchments using cluster analysis
In order to better understand the similarities between different types of catchments, characterised by different combinations of physical parameters, we perform a cluster analysis.The goal is to construct a specified number of typical parameter combinations that reflect the spatial variation of UK catchments.Here, we use k-means clustering on the parameters summarised in table 3 to find a set of three clusters [see e.g.Kaufman and Rousseeuw (2009)].Parameters not directly associated or extracted with specific individual catchments in mind are removed from consideration (w, d, n c , L * z ).Also ignored are quantities that scale with the catchment areas, since these parameters are dependent on the location of the catchment outlet, rather than the physical properties characterising a given region (L all y , L main y and L long y ).Finally, the residual water content, θ r , was removed from consideration since it is almost always zero.This leaves us with the remaining 15 variables.Only catchments for which all these parameters are available are used for clustering.
The distribution of parameter values for each cluster is presented in fig.9.The figure highlights some key features of each cluster: From fig. 9(f) and (g), we note that the first cluster is characterised by the highest 1.0 -0.1 -0.1 0.0 -0.0 0.7 0.7 0.0 -0.0 1.0 -0.1 0.9 -0.0 0.0 0.1 -0.0 -0.0 -0.1 -0.0 1.0 0.9 -0.1 -0.1 0.2 0.3 -0.3 0.9 -0.1 0.9 0.0 -0.1 -0.6 -0.Finally, in Fig. 9(b) and (c), we note that the first cluster, representing the highlands, has significantly higher mean precipitation, R, and runoff, Q, values, which is related to the fact that precipitation is higher in eastern parts of the UK, coinciding with many of the highland and mountainous areas.

Catchment characterisation using PCA
In order to better understand the separation between the class properties, one can use principal component analysis (PCA), which allows us to find the linear combination of the original parameters with the highest variation in the dataset.Table 5 summarises the linear coefficients (eigenvectors) representing the first four principal components.
Based on the parameters with the highest eigenvalues, our intuitive interpretation of the components is as follows.One can interpret the first principal component as strongly related to the topography of the terrain.The highland terrain with higher slopes and higher rainfall/runoff values is characterised by high values of the first principal components, while the lowlands correspond to low values.The second principal component is high for catchments with high aquifer thickness, high soil saturation, low α MvG and/or high Manning's n roughness coefficient.Typically, such a configuration of parameters represents catchments, in which groundwater flow has a relatively high importance.On the contrary, catchments with a low value of the second component will be characterised by a higher contribution of the overland flow component.Fig. 10 presents the clusters in a two-dimensional space spanned by the first two principal components.Here the separation between classes is apparent-as observed before, they vary in terrain topography.Additionally, the second cluster represents the most productive aquifers with the highest importance of groundwater.In the other two classes, the lower importance of groundwater is balanced by higher overland flows.In the case of cluster 1, the groundwater flow is limited by low hydraulic conductivity of the soil  11a, the measure seems to be highly spatially autocorrelated.This is due to the UK's distinct lowland, midland and highland regions, characterised by low, medium, and high values of the first principal component, respectively.The spatial autocorrelation of the second principal component can also be observed in fig.11b; however, this component varies much more locally-we expect it to be dependent on the local geological structure of the aquifer and the local soil composition.

Discussion
In order to produce minimal mathematical models of surface-subsurface flows in catchments, a crucial step is to establish the typical size and distribution of the key parameters, which has been the main goal of this work.As we have noted, a comprehensive treatment of this topic does not seem to have been previously undertaken in the style we have presented here.
In this paper, we compiled a set of data processing methods that allow the extraction of typical values of physical parameters used in PDE-based catchment models.The methods were applied to extract properties of UK catchments based on publicly available datasets from the National River Flow Archive, European Soil Data Centre, British Geological Society and Ordnance Survey.The parameter values for each catchment and the source  The presented results can be used for verification of parameter values calibrated by numerical catchment models and for preparing realistic benchmark scenarios.Ideal catchment model(s) should thus provide quantitative or qualitative agreement with catchments belonging to any of the identified clusters.We expect that this is also an important criterion when creating and validating simplified (conceptual and statistical) catchment models, which often have a tendency to become overfitted to the training data (Beven 2018(Beven , 2019)).Identifying scenarios where a given model may fail can lead to further model improvement.
We shall continue our work in Parts 2 and 3, where the parameter estimates are crucial in allowing for the development and analysis of physical models of surface and subsurface flows.the area from which the streamlines reach a given outlet (typically following the direction of steepest descent).The resultant catchment has an intersection with three tiles shown in (c).When analysing each of these subtiles, all streams within a given subtile are overlapped with the catchment boundary, as illustrated in (c).Adding the lengths of these streams L 1 , L 2 and L 3 , allows us to calculate the total stream length for the given catchment.
for each catchment, we aggregate the values assigned to all raster tiles located inside it and ending in rivers (terrain types 2 or 3).The streamlines ending in the tidal boundary, local elevation minima, or leaving through the subtile boundary, are not taken into account.
To find the catchment width, we calculate the mean streamflow path and multiply it by 2, as discussed in section 3.1.2.To find the elevation gradient along the hillslope, we divide the total elevation difference along all paths by the total path length, as discussed in section 3.2.1.

A.3. Extracting gradient along a river (S y )
To estimate the typical value of the gradient along the rivers, we use the OS Terrain 50 and OS Open Rivers datasets.For each subtile represented in the OS Terrain dataset, we extract rivers that overlap with the given subtile.Then, for each node of the river, we find its elevation.The mean gradient S y is estimated as the total elevation difference along the river segments divided by their total length, as discussed in section 3.2.2.As in the case of the S x estimation method, this measure is not highly sensitive to local terrain drops along the river.
A.4. Extracting aquifer thickness (L z ) Extracting aquifer thickness at the catchment scale is a challenging task, since the Bedrock Geology Model by the British Geological Society represents it in the form of threedimensional polygons located sparsely across the UK, and the data is not appropriately aligned in the xy-plane.A sample of this dataset is presented in fig.14.The idea of extracting the mean aquifer thickness is to calculate the total area of polygons within a given catchment extent, and then divide it by the estimated length of these cross-sections projected onto the xy-plane.To find the total aquifer area, we divide each polygon into triangular parts.Then, each triangle is further divided into two triangular parts with a common edge directed along the z-axis, as illustrated in fig.15.We refer to the point on the xy-plane that corresponds to the common edge as the 'midpoint', while the other two points are referred to as 'endpoints'.For each catchment, we find all triangular polygons, with at least one of these three points located inside the given catchment.
We need to find the area of each polygon within the catchment boundary.If both endpoints are inside the catchment, we take the total polygon area A. If any endpoint is outside the catchment, we calculate the area of the part of this polygon that is located inside the catchment.This area, denoted as A * , is expressed either as A if the endpoint is inside the catchment.

Figure 1 :
Figure1: On the left, parameters describing catchment shape: hillslope width L x , aquifer depth L z , elevation gradient along the hillslope S x and along the river S y .On the right, three types of flow are presented; subsurface flow includes both flow through the unsaturated zone and groundwater flow through the saturated zone.

Fig. 2 Figure 2 :
Fig. 2 demonstrates a comparison of the typical information provided by the three types of datasets.
averaged over all river segments

Figure 3 :
Figure 3: Illustration of the different length scales characterising the drainage network, presented for the river Frome catchment, with the outlet located at Bishop's Frome.Each L y estimate is equal to the total length of all highlighted streams.Note this total length gradually decreases from (a) to (d), therefore L all y ⩽ L main y y , (see fig. 3).Their median values taken over all considered catchments are: med L main y = 71.6 km, med L all y = 130 km, med L long y = 22.8 km, and med L trib y = 945 m.

Figure 4 :
Figure 4: The hydrodynamic properties of aquifers in the UK as provided by the 625K digital hydrogeological map by the British Geological Survey (BGS).Map (a) presents the productivity of the aquifer, while map (b) presents the dominating mechanisms for the groundwater flow.

Figure 5 :
Figure 5: Thickness of soil according to the BGS Soil Parent Material Model, which provides a wide range of physical and chemical characteristics of the top layer of soil over the UK.

Figure 6 :
Figure 6: Mualem-Van Genuchten model parameters (from left K S , θ R , θ S , α MvG , and n) at a depth of 15 cm (top row) and 1 m (bottom row) according to the 3D Soil Hydraulic Database Europe.As the legend indicates, each parameter in this dataset can only have one of a few discrete values.

Figure 7 :
Figure 7: Dependencies between water balance terms obtained from the National River Flow Archive.The top plot (a) shows the relationship between the mean rainfall, R, and the mean runoff, Q.The difference, R − Q, which is a result of evapotranspiration, is approximately constant, E ≈ 1.4 • 10 −8 ms −1 for the UK catchments.The bottom plot (b)show the relationship between the median of the annual maximum rainfall, RMED, and the mean rainfall, R. The thick solid lines in both plots correspond to the line of best fit; in (b) the fitted line confirms that RMED scales approximately linearly with R.
(a) Hydraulic conductivity of both bedrock and soil estimated based on various studies, (b) Hydraulic conductivity of the soil as given in Hydraulic Database of Europe by ESDAC † Numbers represent a range of values obtained from the literature; These quantities are not estimated individually on the catchment level.

Figure 9 :
Figure 9: Number of catchments belonging to each cluster (top left) and box plots showing the distribution of parameters among the catchments belonging to each graph.

Figure 10 :
Figure 10: The illustration of the first two principal components.The graph on the left shows the values of principal components for catchments belonging to each of the three clusters.The map on the right shows the geographic distribution of these clusters.

Figure 11 :
Figure 11: Spatial distribution of values of the first (a) and second (b) principal components of the catchment parameters.Note that visually the value of the first component seems to coincide with the main lowland, midland, and highland regions in the UK.

Figure 12 :
Figure12: Illustration showing the division of the UK into tiles.The left-most map shows 9 National Grid tiles, each divided into 25 subtiles.The dark gray area in (b) represents the area from which the streamlines reach a given outlet (typically following the direction of steepest descent).The resultant catchment has an intersection with three tiles shown in (c).When analysing each of these subtiles, all streams within a given subtile are overlapped with the catchment boundary, as illustrated in (c).Adding the lengths of these streams L 1 , L 2 and L 3 , allows us to calculate the total stream length for the given catchment.

Figure 13 :
Figure 13: Streamline analysis steps.Fig.(a) presents the input datasets: digital terrain model (DTM) in raster form with surface water bodies (black lines and polygons).These datasets are used to construct the terrain-type raster presented in fig.(b).Then, from each land tile a streamline is generated until reaching another terrain type, which is presented in fig.(c).For each starting point, the length of each streamline is presented in fig.(d).

Figure 15 :
Figure 15: On the left: Method of dividing triangular polygons in order to calculate their area within the catchment.On the right: Method of estimating the cross-section length L inside the catchment.

Table 2 :
(Robins and Ball 2006)ry (matrix) and field (fractures and matrix) hydraulic conductivity [m/day] for three different types of aquifers discussed by(Robins and Ball 2006).Field and lab values can greatly differ.

Table 3 :
Summary of catchment model parameters values for UK catchments.

Table 4 :
A sample of dataset consisting of estimates of physical parameters for all UK catchments included in NRFA.The first column represents the names of the parameters used in our dataset.The top row represents the ID numbers assigned to the given catchments in the NRFA database.The detailed description of all catchments can be found in the NRFA (2022) dataset.wi,kx i where x i is the i th physical parameter, and w i,k is the weight of the i th parameter and k th principal component.The weights for the first five principal components are presented in table 5. Additionally, the spatial distribution of the values of the selected catchment parameters from table 4 is presented in fig.16.
i = 1 n