The Spanish railway network, 1848–2023

Abstract
GIS data on the evolution of railway networks facilitate the study of the role played by the expansion of transport infrastructure since the industrial revolution. The arrival of the railway transformed economic and social activity and the distribution of population within the territory. Given their importance, we have reconstructed and digitised the layout of the railway lines and the location of the stations and halts that existed from the opening of Spain's first railway line, in 1848, until 2023. We have also added indicators of the quality of the network, more specifically, the dates of its electrification and when the track was doubled to allow two-way traffic. The potential of this database lies in its capacity to analyse the interrelationship between the railway infrastructure and a wide range of elements located in the territory, amongst which it is necessary to highlight other modes of transport, urban expansion and socio-economic development.


Introduction
Historical data on transport infrastructure are raising increasing interest due to their tremendous power to stimulate economic and social change (Banerjee et al., 2020). This type of data is especially relevant when used to track long-term changes at a high level of geographic detail. When this is the case, such data allow us to understand the impact of this type of investment from a long-term perspective and at different territorial scales: local, national and even continental.
Interest in the reconstruction of data on historical road networks is currently booming and includes, among many others, initiatives such as the digitisation of the Roman road network (De Soto and Carreras Monfort, 2009; Åhlfeldt, 2013), the French (Perret et al., 2015) and Spanish (Pablo-Martí et al., 2021) road networks of the 18th century, and the streets and main road networks of several cities (Casali and Heinimann, 2019; Wang et al., 2019). However, the mode of transport that has attracted the greatest interest is the railway. There is, in fact, a long tradition in the reconstruction and digitisation of this type of infrastructure, which has manifested itself both in the increase in the number of territories studied and in improvements in the approaches used in the field of Geographical Information Systems (GIS).
With regard to the territories studied, the first attempts were made by Knowles and Healey (2005), who digitised the Pennsylvania railway network in GIS format. This made it possible to link this means of transport to the development of the steel and mining industry in this US state. The potential of this type of analysis was soon apparent, and other reconstruction work, digitisation and geo-historical analyses were carried out for the US Midwest (Atack et al., 2008), Britain (Marti-Henneberg et al., 2018) and Europe (Martí-Henneberg, 2013). It is also relevant to highlight the digitisation of the railway networks of colonial India (Donaldson, 2018) and Canada (Cartography Office, 2020), and the inclusion of the stations and improvements in the detail of the digitisation of the 19th century European network in the cases of France (Thévenin et al., 2013), Italy (Groote, 2020), Galicia and Austrian Silesia (Kaim et al., 2020) and Switzerland (Büchel and Kyburz, 2020).
Secondly, we must refer to the methodologies used, which have depended to a great degree on the tools available. In a first stage, the authors manually digitised historical maps, using the newly created GIS (Martí-Henneberg, 2013). Later, the arrival of PostGIS and the popularisation of OpenStreetMap (OSM) facilitated the creation and use of collaborative maps. Their production was often coordinated by various researchers and government entities, although on many occasions they were produced by a large number of different people (Perret et al., 2015). Finally, and only very recently, there has been a tendency towards the digitisation and use of historical maps through the application of artificial intelligence and various automated processes, although this method is still in its early stages (Turner et al., 2023).
The use of these data in different fields has brought about a methodological revolution, as it has helped to uncover the multiple effects of this infrastructure. The phenomena investigated have included its influence on urbanisation processes (Hornung, 2015; Berger and Enflo, 2017), on industrialisation in the 19th century (Berger, 2019; Bogart et al., 2021), and its ability to influence the creation of knowledge (Andersson et al., 2023) or rural depopulation (Esteban-Oliver, 2023).
In addition, in the fields of urban studies and geography, this type of data has also been used to analyse the impact of tracks and railway stations on the morphology of cities and on urban planning. This is a particularly relevant field, as the introduction of the railway coincided with a period of major urban expansion. The consolidation of the liberal state, in the middle of the 19th century, and its drive for the construction of infrastructure, including railway stations as symbols of progress, coincided with the destruction of city walls that allowed the expansion of the urban space. The location of these stations and of the tracks and roads that interconnected them therefore became conditioning factors for urban development. The confluence of these two themes, the development of railways and of cities, is a field that offers great analytical potential for the use of GIS, but which, for the moment, has only really been explored either in very general terms (Ganges and Martí-Henneberg, 2023), or with reference to very specific locations, as in the cases of Portugal (Pinho and Oliveira, 2009) and Romania (Purcar, 2010). Bearing this in mind, and in order to consolidate the study of this type of subject, a new methodology has recently been developed which quantifies how urban expansion was affected by the construction of a new railway station (Alvarez-Palau et al., 2019). This line of work is still, however, in its initial stages.
This article aims to contribute to the study of the influence of railways at any territorial scale and for any period. To this end, we present the first railway database that includes the digitisation of the railway lines and the location of the stations and halts that existed in Spain from the opening of the first railway, in 1848, until 2023. The database shows the annual evolution (dates of opening and closure) of the different sections of track and of railway stations and halts at the highest possible level of spatial detail. It also includes attributes that identify the dates of electrification and when tracks were doubled (to allow two-way traffic) for the different sections of the network. In addition, we have distinguished between Iberian gauge (1,668 mm), narrow gauge (1,435 mm, 1,062 mm, 1,000 mm, 915 mm and 750 mm), and also high-speed lines (from 1992 onwards), which have a standard gauge of 1,435 mm. This is because, in Spain, as well as standard gauge, which is mainly used in the high-speed network, two other track gauges (Iberian and narrow) are still in use. Iberian gauge lines form the core of the network, whereas the narrow-gauge lines are mainly used for metropolitan connections, although in the 19th century narrow-gauge lines were also used to channel primary materials, such as coal and iron, from extraction points to either the broad-gauge network or to ports, from where they were subsequently redistributed.

Materials and methods
The reconstruction and digitisation of the Spanish railway network has been carried out in GIS using several primary and secondary sources. The following subsections describe the data collection and the steps followed to inventory the lines and stations, digitise them, and assign their dates of opening, closure, electrification and doubling.

Inventory and geolocation of the stations in the network
First of all, we carried out an inventory and geolocated all the stations that came into service from the opening of Spain's first railway, in 1848, until 2023. It should be noted that this database only includes railway stations and halts that met the following conditions: (1) they were open to the public; and (2) they served commercial purposes, transporting both goods and passengers. As a result, all "Goods and freight stations" were excluded. Examples of excluded stations include those whose names contained terms such as "Factory of…", "Industry of…", "Society of…", etc. One example of a point of access to the network that does not form part of our database is the "Portland Cement Factories" station, as its use was exclusively related to that company's activity. We took this decision because the amount of available information regarding the opening and closing dates of these types of stations and halts in the early 19th century was very limited and, in many cases, contradictory.
Of all the different sources used to reconstruct the Spanish network of railway stations and halts, the most relevant input was provided by ADIF (Administrator of Railway Infrastructure in Spain), in 2007. This source included the name of each station, its location (X/Y coordinates), and the province in which it was located. However, while that database contained more than 1,765 network access points, it only included those that were in service in 2007.
To complement this information, other fundamental sources were historical maps of Spanish territory. These were produced throughout the 19th and 20th centuries by the Instituto Geográfico Nacional (IGN-National Geographic Institute) and cover the whole of Spain. The most relevant to our case was the Mapa Topográfico Nacional (MTN50-National Topographic Map), which was originally drawn at a scale of 1:25,000 and published at that of 1:50,000. However, the MTN50 presented some problems. First of all, the topographical task began in 1857 and the first sheets were published in 1875. Given that the Spanish railway network was still under construction at that time, there are territories for which the publication of the map preceded the creation of the infrastructure. In addition, the MTN50 did not include all railway halts within the network, which made it difficult to geolocate them. We therefore had to turn to other sources to complete the process.
The most significant sources were found in the archives of the Fundación de los Ferrocarriles Españoles (Spanish Railways Foundation). Among the documents consulted were the "Mapas de Ferrocarriles en Explotación y Construcción en España y Portugal" (Maps of Railways in Service and Under Construction in Spain and Portugal), which were prepared by the Instituto Geográfico y Catastral (Geographic and Land Registry Institute) in 1948 and 1956 under the supervision of Forcano Catalán, and the Cuadro de Estaciones y Distancias de los Ferrocarriles Españoles y Portugueses (Table of Railway Stations and Distances of the Spanish and Portuguese Railways), which was published in 1928 by the Compañía de los Ferrocarriles Andaluces. In addition, we also used case studies based on different lines. Most of these were obtained from the El Ferrocarril en España (Railways in Spain) database, which is hosted on the spanishrailways.com website, administered by Juan Perís Torner.
Another very relevant source was the "Estaciones y Líneas del Ferrocarril de España" (Railway Stations and Lines in Spain) database compiled by Antonio Sierra in 2010, which shows the name and geolocation of the vast majority of the stations in the Spanish network. This source was particularly useful for cross-checking and completing our database. Finally, the process of creating an inventory and geolocating the stations that were in operation in the provincial capitals and in the metropolitan rings around large cities was carried out on a case-by-case basis within the framework of the Ferrocarril y Ciudad (Railway and City) project (Martí-Henneberg, 2017). The result of this reconstruction work was a total of 2,245 Iberian-gauge, 1,523 narrow-gauge, and 38 high-speed stations and halts.

Assigning the dates of opening and closure, and the sections of line and lines on which the stations within the network were located
In this section, we establish the dates of the opening and closure of the stations, and the sections of line and lines on which the stations were located.
First, we manually assigned each station to the line to which it belonged, using the GIS database of Morillas-Torné (2012), case studies of several lines (Morilla-Critz, 1984; Muñoz Rubio et al., 1999; Mohedas García and Miguel Cámara, 2009), and the spanishrailway.com website (Perís Torner, 2007). We then used Cronología Básica del Ferrocarril de Vía Ancha (Basic Chronology of the Broad-Gauge Railways) by García Raya (2006) and Ferrocarril de Vía Estrecha (Narrow Gauge Railways) by Olaizola Elordi (2005) to divide these lines into sections and linked this information to the stations. Note that stations serving more than one line were assigned to the one that opened first.
The year in which a section of line opened was assigned as the opening date of the stations located on it. We made this assumption because our database excluded goods and freight stations. This implies that the vast majority of the points of access to the network that we included are those that formed the initial layout of the lines and, therefore, came into service at the same time as their corresponding sections of line were opened. However, a different methodological criterion was applied in the case of stations located in provincial capitals and within the metropolitan rings of large cities. For them, information on their dates of opening and closure was collected on a case-by-case basis within the framework of the Ferrocarril y Ciudad project (Martí-Henneberg, 2017). Another option would have been to assign these dates individually for all the stations and halts in the network. However, this presented two important problems: (1) the lack of data, as many stations lacked individualised, reliable, and accessible information regarding their dates of opening; and (2) the fact that the dates we were able to compile using this approach were not always directly comparable. This was due to the different ways in which it is possible to define when a station entered service. While some sources consider the opening of the station to be the moment when the station building became operational, others take as their reference its official date of inauguration, or when rail services actually began at the location in question.
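The assignment rule described above can be illustrated with a minimal Python sketch. It is not the procedure actually used to build the database, which was carried out manually in GIS; the function name, the section labels and the years are hypothetical. It assumes that each station carries a list of candidate sections, that the earliest-opened one is chosen, and that a year collected case by case takes precedence when it is available.

```python
from __future__ import annotations


def assign_station(
    candidate_sections: list[str],
    section_opening: dict[str, int],
    case_by_case_year: int | None = None,
) -> tuple[str, int]:
    """Return (assigned section, opening year) for one station.

    The station is assigned to the candidate section that opened first.
    Its opening year is inherited from that section unless a year was
    collected individually (hypothetical `case_by_case_year` argument).
    """
    section = min(candidate_sections, key=lambda name: section_opening[name])
    if case_by_case_year is not None:
        return section, case_by_case_year
    return section, section_opening[section]


# Illustrative values only; they are not taken from the database.
section_opening = {"Section A": 1855, "Section B": 1860}
print(assign_station(["Section B", "Section A"], section_opening))
print(assign_station(["Section B"], section_opening, case_by_case_year=1862))
```

Keeping the individually collected year as an optional override mirrors the two criteria in the text: inheritance from the section for most stations, and case-by-case dates for provincial capitals and metropolitan rings.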
In the case of dates of closure, we manually incorporated this data section by section, using several different sources. The most relevant sources were the Relación de Líneas Clausuradas de la Península Ibérica by Marinas (2021) and the Cronología de cierres de vía Estrecha by Fernández López (MIMEO). We also complemented and cross-checked this information using other sources, such as the spanishrailways.com website and the Cronología General del Ferrocarril prepared by the Federación Castellano Manchega de Amigos del Ferrocarril in 2023.

Digitisation, categorisation and assigning the dates of opening and closure of sections of track and lines within the network
In this section, we describe how we digitised the various lines of the Spanish network and assigned their opening and closing dates.
The layout of the Iberian gauge and narrow-gauge lines was taken from Sierra (2010), while that of the high-speed network was obtained from OSM. Sierra's database provides the digitisation of all the lines in the network (including those that have never been inaugurated) at a high level of detail. However, this database contains all means of transport that use rails, and does not divide the lines into sections, which is essential for subsequently assigning the dates of opening, closure, electrification, and doubling. We therefore started by excluding those forms of transport that did not really form part of the railway network: underground trains, funicular railways, and cable cars, although we retained rack and pinion and tram lines that had originally operated as railways. We then discarded lines that were never inaugurated. Finally, we adapted the digitisation to allow it to be modelled in GIS. To do this, we eliminated marshalling yards, parallel tracks and service lines, and carried out several topological validations. These corrections were made to ensure that the points and lines within the railway network (which in this case refer to railway stations and lines) were interconnected. If this were not the case, it would not be possible to use the various GIS tools that, for example, make it possible to calculate the routes between stations using the lines in the network.
Once this process had been carried out, we obtained the digitisation of all the lines in the network. However, the line attributes, such as the dates of opening and closure of the different sections of track, had yet to be assigned. To do this, we used a GIS tool, Spatial Join (point to line), which made it possible to copy the attributes of the stations to the lines with which they were in contact. In other words, this tool exported the section of line, line, opening and closure attributes from one layer (stations) to another (lines) based on their spatial relationships.
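The logic of a point-to-line spatial join can be sketched in plain Python. This is only an illustration of the idea under stated assumptions, not the GIS tool used for the database: the function names, the dictionary keys, the `tolerance` parameter and the planar coordinates are all hypothetical, and a line that is touched by several stations simply takes the attributes of the first one found.

```python
import math


def point_segment_distance(p, a, b):
    """Planar distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its ends.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def point_line_distance(p, coords):
    """Distance from point p to a polyline given as a list of vertices."""
    return min(point_segment_distance(p, a, b) for a, b in zip(coords, coords[1:]))


def spatial_join(lines, stations, tolerance=1.0):
    """Copy section, line, opening and closure from stations to touching lines."""
    joined = []
    for line in lines:
        attrs = dict(line)
        for station in stations:
            if point_line_distance(station["xy"], line["coords"]) <= tolerance:
                for key in ("section", "line", "opening", "closure"):
                    attrs[key] = station[key]
                break  # first touching station wins in this sketch
        joined.append(attrs)
    return joined


# Illustrative geometry and attributes only.
lines = [
    {"id": 1, "coords": [(0, 0), (10, 0)]},
    {"id": 2, "coords": [(0, 5), (10, 5)]},
]
stations = [
    {"xy": (5, 0.2), "section": "Section A", "line": "Line 1",
     "opening": 1850, "closure": None},
]
result = spatial_join(lines, stations)
```

In this toy case only the first line receives the station's attributes, because the station lies within the tolerance of that line and not of the second one.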
The result was a total of 2,548 Iberian-gauge, 1,681 narrow-gauge and 50 high-speed sections of track, which were then grouped into 114, 91 and 13 lines respectively.

Data on electrification and the doubling of lines
In this section we describe how we obtained and digitised data referring to the electrification and doubling of railway lines in Spain.
The information about the dates of electrification of the different sections of track for the broad-gauge network was obtained from the chronology of electrified sections of broad-gauge tracks created by Cuéllar (MIMEO). We also cross-checked and complemented this information with case studies presented at spanishrailway.com. The dates of electrification of the narrow-gauge sections were obtained from Olaizola Elordi (2005) and from the chronology Electrificación de las líneas de vía estrecha en España (Electrification of narrow-gauge lines in Spain), which was drawn up by the Federación Castellano Manchega de Amigos del Ferrocarril (2023).
The data on the doubling of tracks were obtained from several different sources. We used case studies from spanishrailway.com, the monographic work La doble vía en España y el sentido de la circulación por ella by García Álvarez (2007), and, especially, information obtained through "web scraping". Press articles and other documents, such as the Revista de Obras Públicas published by El Colegio de Ingenieros de Caminos, Canales y Puertos de España, that indicated the dates of the inauguration of track sections in Spain were particularly useful. Finally, we manually assigned both the dates of electrification and doubling to the different sections of line and stations that formed part of the database. We present the summary of this work in Figure 1, where it is possible to observe the evolution of kilometres of track and the percentage of the electrified and doubled Iberian and narrow-gauge network.
Note on the topological validations: [...]ments. Any multipart line is a topological error (multipart line features are those that contain two or more paths). Must not overlap/Must not self-overlap: a railway line cannot overlap with itself or with any other line on the same layer; when a line overlaps with itself or another line (doubling), this is an error. Must not intersect/Must not self-intersect: a railway line cannot intersect with itself or with any other line on the same layer; any point of intersection is a mistake. Must not have dangles: a railway line must contact another line at both of its ends; this rule has the exception of the beginnings and ends of a line and is used to detect disconnections in its trajectory. Finally, we connected the tracks and the stations using Point must be covered by line: all the stations have to be on a railway line, and any station offset from a railway line is a topological error. Note that in most cases, the displacement was only a few metres.
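Two of the topological rules listed above, the dangle check and the requirement that every station be covered by a line, can be sketched as follows. This is a simplified illustration rather than the GIS topology engine used for the database: the function names and the `tolerance` and `allowed_termini` parameters are hypothetical, endpoints are matched on exact coordinates, and an endpoint that touches the interior of another line is not recognised as connected.

```python
import math
from collections import Counter


def find_dangles(lines, allowed_termini=frozenset()):
    """Return endpoints shared with no other line, excluding declared termini."""
    counts = Counter()
    for coords in lines:
        counts[coords[0]] += 1
        counts[coords[-1]] += 1
    return sorted(pt for pt, n in counts.items() if n == 1 and pt not in allowed_termini)


def _distance_to_segment(p, a, b):
    """Planar distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def stations_off_line(stations, lines, tolerance=0.01):
    """Return names of stations farther than `tolerance` from every line."""
    offenders = []
    for name, xy in stations:
        nearest = min(
            _distance_to_segment(xy, a, b)
            for coords in lines
            for a, b in zip(coords, coords[1:])
        )
        if nearest > tolerance:
            offenders.append(name)
    return offenders


# Illustrative geometry only: three connected sections forming one line.
lines = [[(0, 0), (1, 0)], [(1, 0), (2, 0)], [(2, 0), (3, 1)]]
stations = [("Station A", (0.5, 0.0)), ("Station B", (1.5, 2.0))]

dangles = find_dangles(lines)
clean = find_dangles(lines, allowed_termini={(0, 0), (3, 1)})
offenders = stations_off_line(stations, lines)
```

The `allowed_termini` argument reflects the exception stated in the rule: the genuine beginnings and ends of a line are expected to be unconnected, so only the remaining dangles signal a break in the trajectory.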

Data records
This database consists of six files in Shapefile format, arranged in two groups: three files for the lines (Iberian, narrow and high-speed) and another three for railway stations and halts. It can be accessed using the following link: https://doi.org/10.34810/data917. Tables 1 and 2 respectively present and describe the variables included in the station and railway line databases.

Technical validation and robustness checks
There were three main requirements for this database. The first was that the geolocation of the lines and stations should be as geographically accurate as possible. We therefore used the MTN50 Map and OSM to manually revise the digitisation of most of the lines, especially those that were in urban areas. With regard to this, we would like to emphasise that we used historical city maps (Martí-Henneberg, 2017) of the provincial capitals and most relevant urban areas in Spain in order to manually cross-check and correct the layout and opening dates of these lines and stations and halts, whenever this was required. In Figure 2, we present the case of Barcelona. In the map of 1855 we can see the tracks on the outskirts of the city, as urban expansion came after the construction of the main railway lines. For this reason, when Ildefonso Cerdá's plan for the Ensanche (Extension) was executed, the route taken by the existing railway lines had to be adjusted (as shown in the 1930 map).
The second main requirement was that the opening and closure and network quality attributes had to be historically accurate and consistent for all the lines and stations throughout the whole period and territory. To achieve this, we ensured that the database complied with a series of consistency validations:
- All the lines and stations had to have a value no earlier than 1848 and no later than 2023 in their opening field.
- No line or station could have a year of closure value that pre-dated its year of opening value.
- No line or station could reopen unless it had previously been closed.
- No line or station could have a value (year) in its reopening field that pre-dated that in its closure field.
- No station could be open on a line that was closed. This validation process followed the logic that the year in which a station opened could not pre-date the year in which the line was inaugurated, nor could the year of station closure be later than the closure of the line.
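These consistency validations can be expressed as simple rule checks. The sketch below is illustrative only and is not the validation routine used for the database: the field names `opening`, `closure` and `reopening` and the function names are hypothetical, the year bounds are treated as inclusive (an assumption, so that a record opened in 1848 passes), and the station-against-line check ignores lines that later reopened.

```python
FIRST_YEAR, LAST_YEAR = 1848, 2023  # bounds assumed to be inclusive


def check_record(rec):
    """Return the list of rule violations for one line or station record."""
    errors = []
    opening = rec.get("opening")
    closure = rec.get("closure")
    reopening = rec.get("reopening")
    if opening is None or not (FIRST_YEAR <= opening <= LAST_YEAR):
        errors.append("opening outside 1848-2023")
    if opening is not None and closure is not None and closure < opening:
        errors.append("closure pre-dates opening")
    if reopening is not None and closure is None:
        errors.append("reopening without closure")
    if reopening is not None and closure is not None and reopening < closure:
        errors.append("reopening pre-dates closure")
    return errors


def check_station_against_line(station, line):
    """Return violations of the rule that no station is open on a closed line."""
    errors = []
    if station["opening"] < line["opening"]:
        errors.append("station opens before line")
    line_closure = line.get("closure")
    # Simplification: lines that reopened are skipped by this check.
    if line_closure is not None and line.get("reopening") is None:
        station_closure = station.get("closure")
        if station_closure is None or station_closure > line_closure:
            errors.append("station open after line closure")
    return errors


# Illustrative records only.
print(check_record({"opening": 1900, "closure": 1890}))
print(check_station_against_line({"opening": 1859}, {"opening": 1860}))
```

Returning a list of messages, rather than stopping at the first failure, makes it easy to report every inconsistency of a record in a single pass.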
To confirm the historical accuracy of the data, we then carried out a series of robustness checks. First of all, we manually cross-checked our data and corrected any possible errors relating to dates of opening and closure, electrification, and track doubling, using different official documents. More specifically, we checked and corrected, when necessary, information in our database relating to sections of track that were doubled and electrified by reference to the Mapa de Ferrocarriles en Explotación y Construcción en España y Portugal (Forcano Catalán, 1948) for the year 1948, El Mapa de Ferrocarriles de España y Portugal (Imedio, 1958) for 1958, El Mapa de los Ferrocarriles Españoles del año 1986 (RHEA Consultores, 1986) for 1986, and the Mapa de Infraestructura Ferroviaria de España (Díaz Pardo, 2023) for 2023. As an example, in Figures 3 and 4, we present the original map produced by the Instituto de Transporte directed by Imedio and the GIS map for 1958, which was produced using our database. It is possible to observe that after making these checks and corrections, there are no discrepancies between the two maps in terms of either the layout of the network or the attributes of network quality (electrification and doubling).
Secondly, we checked whether our annual data on the length of the network in km were consistent with those reported by other authors. To do this, in Figure 5 we extracted the annual length of the railway network from our GIS database and compared it with that reported by Álvarez (1978, pp. 485 and 486) for Iberian-gauge in the 1848-1935 period, and by Morillas-Torné (2014, p. 20) for Iberian-gauge in the 1941-2013 period and for narrow-gauge in the 1855-2013 period. Results showed very minor differences for the majority of the period. However, it can be observed that these differences are somewhat higher in the 1900-1960 period. This is because Morillas-Torné (2014) does not include some narrow-gauge lines that we do include. The fact that most of these lines were inaugurated in the early 20th century and closed in the 1960s explains the majority of the differences between both series.
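An annual series of network length such as the one used in this comparison can be derived from the opening and closure attributes of each section. The following is a minimal sketch under stated assumptions, not the extraction actually performed: the field names `km`, `opening`, `closure` and `reopening` are hypothetical, a section is counted from its opening year and dropped from its closure year onwards, and double track is not counted twice.

```python
def is_open(section, year):
    """True if the section is in service in `year` (assumed convention)."""
    if year < section["opening"]:
        return False
    closure = section.get("closure")
    if closure is None or year < closure:
        return True
    reopening = section.get("reopening")
    return reopening is not None and year >= reopening


def network_km_by_year(sections, first_year, last_year):
    """Return {year: total km of sections in service}, single-track length."""
    return {
        year: sum(s["km"] for s in sections if is_open(s, year))
        for year in range(first_year, last_year + 1)
    }


# Illustrative sections only; lengths and years are not taken from the database.
sections = [
    {"km": 10.0, "opening": 1850, "closure": 1900},
    {"km": 5.0, "opening": 1860},
]
totals = network_km_by_year(sections, 1848, 1905)
```

A series built in this way can then be set against published figures year by year, with any gap pointing to sections present in one source and absent from the other.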
Our last requirement was that this database could be included in, and/or be used to create, transportation network models. These are constructed with GIS, mathematical algorithms, and computational methods, and are used to simulate and analyse the movement of people, goods and vehicles within a defined geographic area. One very useful feature of these models is that they allow us to estimate how changes in highly relevant parameters, such as speed and transport costs, affect the optimal routes between two points within a given network. To achieve this, and as already explained in Section 2, we carried out a number of validations. This database has already been used to model and analyse the functioning of the network during the 19th century (Esteban-Oliver et al., MIMEO), so its reliability for use in this field has already been demonstrated.
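The way in which a change in speed can alter the optimal route between two points can be illustrated with a shortest-path computation. The sketch below uses Dijkstra's algorithm, a standard choice that is not stated in the text; the function name, the node labels, the distances and the speeds per gauge are all hypothetical and serve only to show the mechanism.

```python
import heapq


def fastest_route(edges, origin, destination, speed_kmh):
    """Dijkstra's algorithm on travel time; returns (path, hours).

    `edges` holds (node_a, node_b, km, gauge) tuples and `speed_kmh`
    maps each gauge to an assumed average speed.
    """
    graph = {}
    for a, b, km, gauge in edges:
        hours = km / speed_kmh[gauge]
        graph.setdefault(a, []).append((b, hours))
        graph.setdefault(b, []).append((a, hours))

    best = {origin: 0.0}
    previous = {}
    queue = [(0.0, origin)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == destination:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbour, hours in graph.get(node, []):
            new_cost = cost + hours
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                previous[neighbour] = node
                heapq.heappush(queue, (new_cost, neighbour))

    if destination not in best:
        return None, float("inf")
    path = [destination]
    while path[-1] != origin:
        path.append(previous[path[-1]])
    return path[::-1], best[destination]


# Illustrative network: two Iberian-gauge sections and one narrow-gauge shortcut.
edges = [
    ("A", "B", 100, "iberian"),
    ("B", "C", 100, "iberian"),
    ("A", "C", 120, "narrow"),
]
slow_narrow = fastest_route(edges, "A", "C", {"iberian": 50, "narrow": 20})
fast_narrow = fastest_route(edges, "A", "C", {"iberian": 50, "narrow": 40})
```

With the slower narrow-gauge speed the route runs through B, whereas doubling that speed makes the direct narrow-gauge section the faster option, which is the kind of parameter sensitivity that such models are designed to capture.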

Potential applications of the database
The potential applications for this database are manifold. First of all, it should be stressed that this is the first GIS database of a railway system to include the dates of line openings, closures, and electrification, and also to register when certain sections of track were doubled to allow two-way traffic, and/or when different stations opened and/or closed. The inclusion of these attributes can contribute to a better understanding of the extent to which the impact of the railway was determined by the quality of the network. This is an issue which has hardly been explored in the literature, but which we believe, as in the case of roads (Bogart, 2009; García-López et al., 2023), could have had a very significant influence on the territorial impact of this infrastructure.
Revista de Historia Economica / Journal of Iberian and Latin American Economic History 163
Secondly, and given that all the data included here are geolocated at a high level of detail, this information will also help researchers to understand the historical evolution of Spain's railway infrastructure over almost two centuries, at whatever level of territorial detail they may require. In this sense, one important application of this database is that it can be used to create very accurate GIS maps for any year and level of territorial disaggregation required. As an example, in Figure 6, we present a map that includes the Spanish railway network and its main characteristics in four key cut-off years. Firstly, we can observe the state of the network in 1877, when the trunk lines had been completed and before the construction of the majority of transversal and narrow-gauge lines. On the right, the map for 1910 presents the network once almost all of the Iberian-gauge lines had been constructed. Thirdly, as already shown in Figure 4, we present the network in 1958, which was its moment of maximum expansion, after a number of relevant improvements. Finally, the map on the lower right shows the situation of the Spanish railway system at the end of 2023, following the closure of many Iberian-gauge lines in the final decades of the 20th century, but after the opening of several kilometres of high-speed rail track. Although analysing this information is not the main objective of this article, a cursory examination allows us to observe the uneven territorial coverage of the Spanish network and its quality throughout the period. It is evident that certain provinces, such as Teruel, Almería or Cáceres, received significantly lower levels of railway investment than the national average, while others, such as Madrid, Barcelona and Valladolid, were particularly favoured. In summary, this database can help us to understand the causes and consequences of these regional disparities from a very long-term perspective and at the level of territorial disaggregation that we require.
Thirdly, these data can also be used as explanatory or control variables in most spatial econometric models. It is possible to create dichotomous variables, as well as continuous measures such as distances to railway lines or stations, to reveal the multiple effects that this type of infrastructure has had on Spain's economy and society. In other contexts this kind of data has, for example, made it possible to analyse the effect of the railway on such relevant phenomena as regional disparities, migration, the rural-urban transition, and the tourism sector, and even its impact on people's access to essential services, such as education and health. In fact, this database has already been used by Esteban-Oliver (2023) to analyse the influence of the railway on Spain's population dynamics during the 1860-1930 period. The conclusions drawn from the study suggest that the railways probably stimulated factor mobility and economies of agglomeration, but also reinforced existing hierarchies, thus exacerbating an unequal distribution of population within space.
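As an illustration of how such variables might be built, the sketch below computes, for a given place and year, the distance to the nearest station in service and a dichotomous access indicator. It rests on several assumptions that are not taken from the database: the function and field names are hypothetical, coordinates are assumed to be latitude and longitude in degrees, distances are great-circle distances on a sphere of radius 6,371 km, and the 5 km threshold is arbitrary.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def railway_variables(lat, lon, stations, year, threshold_km=5.0):
    """Return a distance and a 0/1 access indicator for one place and year."""
    in_service = [
        s for s in stations
        if s["opening"] <= year and (s.get("closure") is None or year < s["closure"])
    ]
    if not in_service:
        return {"distance_km": None, "has_station": 0}
    distance = min(haversine_km(lat, lon, s["lat"], s["lon"]) for s in in_service)
    return {"distance_km": distance, "has_station": int(distance <= threshold_km)}


# Illustrative station only; coordinates and years are not taken from the database.
stations = [{"name": "Station X", "lat": 0.0, "lon": 1.0, "opening": 1860, "closure": 1970}]
in_1900 = railway_variables(0.0, 0.0, stations, 1900)
in_1850 = railway_variables(0.0, 0.0, stations, 1850)
```

Because the station filter depends on the year, the same function yields a panel of time-varying regressors when it is applied to each place and each year of interest.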
Fourthly, the data were compiled so as to make it possible to integrate them with other transport infrastructure databases (roads, ports, etc.) in order to generate multimodal transport models. In other words, the aim is to create a comprehensive and detailed representation of a transportation system that permits the analysis, modelling, and optimisation of the movement of goods and/or people, using various modes of transportation. Along these lines, it is worth underlining that the historical analysis of the effects of improvements in transport using a multimodal approach is a field that is currently experiencing a significant boom (Fernihough and Lyons, 2022). The reason for this is that it provides a more accurate estimate of the socio-economic impact of transport than analyses of the individual effects of each mode of transport.
Fifthly, we believe that this database can also help us to understand the processes involved in urban transformations. This is a subject with a lot of potential, as it encompasses urban planning and urban transformation from the mid-19th century through to the present day (Ganges and Martí-Henneberg, 2023). More specifically, our database can be used in conjunction with cadastral (land registry) data, such as the evolution of the spatial distribution of housing, population and economic activities; land and housing prices; and/or local income, to better understand how the layout of tracks and the location of stations affect urban and port morphology.
Lastly, we would also like to point out that this database can be combined with other attributes of the railway network to analyse a number of other phenomena.For example, in Esteban-Oliver and Martí-Henneberg (2022) this database is combined with novel information concerning the ownership of lines in order to show that during the period between 1848 and 1941, the expansion of the railway network was greatly influenced by criteria related to economic gains and to the changing business objectives of the railway companies.

Figure 2. Railway lines and stations in Barcelona, 1855 and 1930. Notes: The map shows the Iberian and narrow (dotted) railway lines and stations in Barcelona. Sources: Own research and Martí-Henneberg (2017).

Figure 5. Cumulative number of km of Iberian and narrow-gauge lines in Spain, by year (1848-2013). Notes: The left axis shows total km of track. The data relating to the annual evolution in terms of kilometres of line refer to the total extension of the network without considering the additional mileage implicit in double-track. Source: Own research; Álvarez, 1978, pp. 485 and 486; and Morillas-Torné, 2014, Figure 1.

Table 1. Description of variables relating to railway stations and halts

Table 2. Variables used to describe sections of track