From counting stations to city-wide estimates: data-driven bicycle volume extrapolation

Published online by Cambridge University Press: 18 February 2025

Silke K. Kaiser*
Affiliation:
Data Science Lab, Hertie School, Berlin, Germany; Center for Sustainability, Hertie School, Berlin, Germany
Nadja Klein
Affiliation:
Scientific Computing Center, Karlsruhe Institute of Technology, Karlsruhe, Germany
Lynn H. Kaack
Affiliation:
Data Science Lab, Hertie School, Berlin, Germany; Center for Sustainability, Hertie School, Berlin, Germany
*Corresponding author: Silke K. Kaiser; Email: s.kaiser@phd.hertie-school.org

Abstract

Shifting to cycling in urban areas reduces greenhouse gas emissions and improves public health. Access to street-level data on bicycle traffic would assist cities in planning targeted infrastructure improvements to encourage cycling and provide civil society with evidence to advocate for cyclists’ needs. Yet, the data currently available to cities and citizens often only comes from sparsely located counting stations. This paper extrapolates bicycle volume beyond these few locations to estimate street-level bicycle counts for the entire city of Berlin. We predict daily and average annual daily street-level bicycle volumes using machine-learning techniques and various data sources. These include app-based crowdsourced data, infrastructure, bike-sharing, motorized traffic, socioeconomic indicators, weather, holiday data, and centrality measures. Our analysis reveals that crowdsourced cycling flow data from Strava in the area around the point of interest are most important for the prediction. To provide guidance for future data collection, we analyze how including short-term counts at predicted locations enhances model performance. By incorporating just 10 days of sample counts for each predicted location, we are able to almost halve the error and greatly reduce the variability in performance among predicted locations.

Information

Type
Application Paper
Creative Commons
Creative Commons License: CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Open Practices
Open data
Copyright
© The Author(s), 2025. Published by Cambridge University Press

Table 1. Overview of data types used in this paper to predict bicycle volume and their use in other publications: including crowdsourced (Strava) [Crowds.], infrastructure [Infr.], weather [Weath.], socioeconomic [Socio.], bike-sharing [B.-S.], public and school holidays [Hol.], centrality measures [Centr.], and motorized traffic [Moto.].

Table 2. Overview of the features per data source used in this study

Table 3. Errors for the various machine learning models at the daily and average annual daily bicycle volume (AADB) scales. The gray background indicates the columns used as the criterion for model selection in the subsequent analysis.

Figure 1. Performance of the XGBoost model at the daily level and for average annual daily bicycle volume (AADB) estimations across the individual counting stations. The models in subfigures b) and d) were trained on 10 days’ worth of sample data and on the additional long-term counting stations (full-city model specification). Highlighted in all graphs are the counting stations whose error lies more than one standard deviation above or below the mean. The color coding and the ordering of the counting stations are the same across all subplots to ensure comparability. The counting station ‘SEN’ is left out of subplots b) and d) due to the small number of observations available.

Figure 2. Feature importance and proof of concept based on an XGBoost model trained on data of all available long-term counting stations.

Figure 3. The effect of collecting additional sample data at a new location to predict the daily volume of bicycles using XGBoost. In the left diagram, the models are trained on the full-city available data, both long-term data from other sites and sample data from the location in question; in the right diagram, the models are trained on location-specific sample data only. Best-performing specifications are depicted in gray in the other plot to allow for comparison. The error is the average over the 19 counting stations used, with 95% confidence intervals calculated from 10 repeated samples.

Table 4. Grouping of the features for the feature importance evaluation in Section 2.3: This table presents the features selected for the XGBoost model for 0-24 h (all day), grouped into their respective categories

Figure 4. Workflow for estimating errors using sample counts in a LOGO evaluation. Each counting station is held out once, with the figure providing an example for one such counter. A sampling strategy – 1-day, 3-day, or 7-day – is selected, according to which up to 28 days from the held-out counting station are sampled. These days are the location-specific training data (depicted in dark orange). The test data consists of unsampled observations from the same station (depicted in white). In the case of the location-specific model, we train the model on these data alone. For the city-wide model, the training data includes both the sampled observations of the held-out station and data from other stations (depicted in light orange). For space considerations, the figure illustrates this process for the 3-day strategy with seven sampled days only.
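
The workflow in this caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `sample_blocks` and `logo_splits` are hypothetical, observations are reduced to day indices per station, and a "k-day strategy" is assumed to mean sampling blocks of k consecutive days until the requested number of days is reached (the last block may be cut short, as in the 3-day strategy with seven sampled days).

```python
import random


def sample_blocks(days, block_len, n_days, rng):
    """Sample up to n_days from `days` in blocks of block_len consecutive entries."""
    days = sorted(days)
    # Block starts lie on a fixed grid so that sampled blocks never overlap.
    starts = list(range(0, len(days) - block_len + 1, block_len))
    rng.shuffle(starts)
    sampled = []
    for start in starts:
        if len(sampled) >= n_days:
            break
        sampled.extend(days[start:start + block_len])
    # The final block is truncated if it would exceed n_days.
    return set(sampled[:n_days])


def logo_splits(obs, block_len, n_days, seed=0):
    """Yield one train/test split per held-out station.

    obs maps a station id to the list of day indices observed at that station.
    """
    rng = random.Random(seed)
    for held_out, days in obs.items():
        sampled = sample_blocks(days, block_len, n_days, rng)
        # Sampled days of the held-out station (dark orange in the figure).
        local_train = [(held_out, d) for d in sorted(sampled)]
        # Unsampled days of the same station form the test data (white).
        test = [(held_out, d) for d in days if d not in sampled]
        # All observations from the other stations (light orange).
        others = [(s, d) for s, ds in obs.items() if s != held_out for d in ds]
        yield {
            "held_out": held_out,
            "location_specific_train": local_train,
            "city_wide_train": local_train + others,
            "test": test,
        }


if __name__ == "__main__":
    stations = {name: list(range(30)) for name in ("A", "B", "C")}
    for split in logo_splits(stations, block_len=3, n_days=7):
        print(split["held_out"], len(split["city_wide_train"]), len(split["test"]))
```

Each yielded split would then be passed to a model of choice; the location-specific variant trains on `location_specific_train` only, while the city-wide variant trains on `city_wide_train`.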

Table 5. Computation time and energy consumption for various tasks using XGBoost. The table reflects the performance of a 128-CPU server, illustrating the efficiency of the model in terms of both time and energy. Training and testing times are based on the task of predicting the short-term counting stations’ data (Table 3, column (3)). The energy consumption was computed with CodeCarbon (Courty et al., 2024)

Table 6. Overview of all considered features

Figure 5. Location of the 12 short-term and 20 long-term counting stations within Berlin.

Figure 6. Descriptive statistics of the daily counter measurements (number of bicycles per day).

Table 7. The long-term and short-term counting stations

Figure 7. Frequency histograms of selected infrastructure features, illustrating the distribution of these features for both counting station locations and all street segments across Berlin. A street segment is a section of a street between two intersections, or between the end of a street and an intersection. Note that because paths in forests and parks are included in the analysis, there is a higher-than-expected proportion of segments with a speed limit of 0 km/h.

Figure 8. The bike-sharing data was feature-engineered based on a radius: for a given day, all bike-sharing trips passing through, starting, or ending within a certain radius around the counting station in question were counted. The same counts were computed for the entire city. This visualization depicts two bike-sharing trips. Assuming both trips started and ended on the given day, the city-wide counts would be two passing, two newly rented, and two returned bike trips; within the radius, the counts would be two passing trips, one ending trip, and zero originating trips.
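
The radius-based counting described in this caption can be illustrated with a short Python sketch. It assumes that each trip is available as an ordered list of (latitude, longitude) points, which the source does not specify; the function names, coordinates, and the 500 m radius are hypothetical and chosen only to reproduce the two-trip example from the caption.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (p[0], p[1], q[0], q[1]))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def radius_counts(station, trips, radius_m):
    """Count trips passing through, starting in, or ending in the radius."""
    counts = {"passing": 0, "starting": 0, "ending": 0}
    for trip in trips:
        if not trip:
            continue  # skip trips without any recorded points
        # Only recorded points are checked; a trip that crosses the circle
        # between two recorded points would be missed by this simplification.
        inside = [haversine_m(station, point) <= radius_m for point in trip]
        counts["passing"] += any(inside)
        counts["starting"] += inside[0]
        counts["ending"] += inside[-1]
    return counts


if __name__ == "__main__":
    station = (52.5200, 13.4050)  # hypothetical counting station
    trips = [
        # Starts outside, passes close to the station, ends outside.
        [(52.5300, 13.4050), (52.5210, 13.4050), (52.5100, 13.4050)],
        # Starts outside and ends inside the radius.
        [(52.5300, 13.4200), (52.5220, 13.4060)],
    ]
    print(radius_counts(station, trips, radius_m=500))
```

As in the caption, a trip that ends inside the radius is also counted as passing, so the two example trips give two passing trips, one ending trip, and no originating trip.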

Figure 9. The Strava data, both the hexagon and the street segment data, was feature-engineered. We computed the average across features for both data types, considering observations within a certain proximity. For the street segment data, we considered all segments within a certain radius. For the hexagon data, we took the average of the features across the six neighboring hexagons. Additionally, we included the features for the hexagon in which the relevant counting station is located.

Table 8. Descriptive statistics of the Strava and bike-sharing data. All values are percentages of the total trips recorded. The numbers indicate that bike-sharing trips are more evenly distributed throughout the day. Also, bike-sharing riders are much slower on average than Strava users, which seems reasonable given the difference in quality between shared and private bikes.

Figure 10. Histograms displaying the frequency distribution of selected Strava features, comparing counting station locations with all street segments across Berlin, exemplified for September 2022. A street segment is a section of a street between two intersections, or between the end of a street and an intersection. Some features include high outliers. To enhance readability, we capped values at the 99th percentile.

Figure 11. Location of counting stations measuring motorized traffic.

Figure 12. QQ Plot of the target feature (measurements of the counting stations), before and after the log-transformation.

Table 9. Models and their tuned hyperparameters

Table 10. Models, the applied feature selection method, and the number of selected features for both the all day and the 7-19 h specification

Author comment: From counting stations to city-wide estimates: data-driven bicycle volume extrapolation — R0/PR1

Comments

Silke Kaiser

Hertie School Berlin

Friedrichstr. 180

10117 Berlin

s.kaiser@phd.hertie-school.org

+49 176 67 01 85 14

Berlin, March 13, 2024

Dear Editors of Environmental Data Science,

Please find enclosed our manuscript entitled “From Counting Stations to City-Wide Estimates: Data-Driven Bicycle Volume Extrapolation” (joint with Nadja Klein and Lynn Kaack) for possible publication in Environmental Data Science as an Application Paper.

Transportation is a critical sector in the global fight against climate change. The Intergovernmental Panel on Climate Change (IPCC) estimates that it accounts for approximately 23% of global energy-related CO2 emissions. The shift from motorized transport to bicycles is a central option to reduce emissions for cities worldwide, and new cycling infrastructure can initiate this change.

Given the constrained financial resources of city administrations and challenges such as the lack of space in urban areas, an evidence-based policy process may improve infrastructure planning. With our work, we seek to aid such sustainable decision-making.

In our study, we pursue a data-driven approach, combining machine learning methods and a wide array of different data sources to extrapolate bicycle counts to new locations across a city. We show that new data sources can help provide more granular information than is currently available from counting stations. With our approach, we generate bicycle volume estimates for every street in the city of Berlin and show how such estimates can be improved by additional data collection. City officials and civil society can directly implement and use our approach to make evidence-based decisions regarding where to prioritize new cycling infrastructure. This demonstrates the usefulness of our research in practice.

We confirm that this manuscript has not been published elsewhere and that it is not under consideration by another journal. We have no conflicts of interest to disclose.

Given its relevance in advancing data-driven strategies for urban sustainability and environmental impact, along with its potential to bolster sustainable policymaking, we believe our work will be of significant interest to the readership of Environmental Data Science.

Kind regards

Silke Kaiser (corresponding author, on behalf of all authors)

Review: From counting stations to city-wide estimates: data-driven bicycle volume extrapolation — R0/PR2

Conflict of interest statement

Reviewer declares none.

Comments

Thank you for your work on bicycle infrastructure. It is a research field less developed than car infrastructure, indeed, and relevant for multiple SDGs. Here are some comments:

Please rewrite the first paragraph of the introduction, as it is quite confusing to read. Structure the text along a storyline rather than along the references.

Good literature review.

37% is different to 42%, which is significant in a mode share context, so it should not say “align” without caveat.

Great data collection, well described in section 4. Table 1 illustrates the research gap.

Good description of methods and their application. Why is random forest printed in bold in Table 3 a) 4) when XGBoost has a lower score?

Convincing validation and useful plots. The discussion covers benefits and weaknesses of the methods in sufficient detail.

Figure 5 omits the measure. Is it cyclists per day?

All in all, I support publication of this article, as it is well structured, well written, and developed with effort and care. It does not include a discussion of the steps that follow the identification of cyclist volumes, which is interesting in many respects but goes beyond the scope of this article and the journal.

Review: From counting stations to city-wide estimates: data-driven bicycle volume extrapolation — R0/PR3

Conflict of interest statement

Reviewer declares none.

Comments

The manuscript presents a data-driven methodology using machine learning for estimating bicycle volume (traffic flows) for an entire city, based on the few bicycle counting stations available and a very large number of other data features describing the city from a variety of data sources. The aim of the study is very clear and of great relevance for municipalities and for transport planning and policy towards active travel, and cycling in particular, fitting the Application Paper type of the submission. I personally find the study very interesting, with a potential scientific contribution to the field of sustainable urban mobility studies.

The manuscript itself is well structured and clearly written, fitting the requirements of the journal.

Despite these strengths, many aspects need revision to build on the positives and improve the less accomplished parts.

Overall, the study has three main methodological parts that can provide contributions (as presented in the discussion): the data sources selection, the ML algorithm selection, and the application to city-wide estimation complemented by additional sampling. Of the three I think the last one is the most accomplished and relevant for the type of paper, from which one can draw interesting recommendations for municipalities in terms of deploying additional counting stations or bicycle counting campaigns. This is to me the main contribution and novelty of the paper.

The data source and ML algorithm selection raise many questions; in my opinion, they are still in a preliminary or less mature state and do not yet support any recommendations. Given the applied type of the paper, this is not necessarily a major issue, but the authors need to give more careful consideration to the results and possibly take additional steps in terms of data analysis if they wish to present more solid conclusions. Here I elaborate on the questions raised by each part.

Regarding the data source engineering and selection:

- The data features are grouped by source in the study, but they represent different categories of information: actual flows, travel demand, origin/destinations, type of demand, weather, land use, cycling route infrastructure, etc. When presenting the data this is fine, but when it comes to analysing the relevance of features in the model, grouping them by source does not make sense. Each feature category contributes to different dimensions of flow estimation, and it’s important to understand the importance of categories or features, not data sources.

- Crowdsourced data (Table 1) should not be a category of information/features relevant for bicycle flow modelling. What do these features actually represent in relation to cycling flows? In the data description in section 4.1, no literature is cited to support the relevance of the Strava and bike-sharing data (especially considering the high number of features). How do those features contribute to the model? When describing the other data sources, the authors refer to existing literature that explains, for example, that weather and infrastructure influence cycling flows. This is very clear and should be applied to Strava and bike-sharing as well.

- Obviously Strava, which covers not only many features but also many categories, seems like a more relevant data source (section 2.3). But what should a municipality do if it does not have access to Strava data? What do the many features under Strava represent, and which categories do they belong to? And to what extent are they all relevant to the model? Recommending Strava as the main data source is problematic and not necessarily accurate.

- Given the large bias in Strava data, acknowledged by the authors, which features in the Strava data source are most useful and which ones are less problematic?

- The number of features is very large, and considering that the authors use radii of varying distances their number is much larger than suggested in Table 2. To me this collection of features is an initial exploratory data analysis stage, and not a set of features that can be recommended for application in a model.

- The authors need to justify not only the inclusion of features (beyond their being available in a given source) but also the choice of radii. Different features use different sets of radii without clear explanation (section 4.1).

- For example, in the Strava data with 91 different features, the same radii seem to be applied to every feature, irrespective of whether they represent flows or demand or something else.

- A step of feature selection and reduction seems to be missing in the methodology. The correlation test carried out does not seem to be appropriate. For one, a 99% correlation threshold is too strict, and anything above 90% or even 80% could be considered duplicate features. It is also hard to believe that, given the many radii, there isn’t a very strong correlation between features at, for example, 500 m and 1000 m aggregation, requiring a large selection/reduction.

- Furthermore, some features contribute directly to route choice and thus have a local influence (for example, infrastructure and traffic flows), while others contribute only to demand/mode choice (for example, weather and socio-economic information) and thus have an area- or city-wide influence. A reflection relating the radii to feature categories, or even not aggregating the data and assigning it to street segments, is lacking. The fact that large radii are used in the model leads to the lack of accuracy in the model results displayed in Figure 2b.

- Different features have a different spatial and temporal resolution and influence, and this must be acknowledged when engineering the data and combining it in a model.
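
The point about correlation thresholds in the list above can be made concrete with a minimal sketch. It assumes a simple greedy filter on pairwise Pearson correlation, which is not necessarily the test used in the manuscript; the feature names and values are hypothetical and chosen so that two radius variants of the same feature survive a 0.99 threshold but not a 0.90 threshold.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: correlation is undefined, treated as 0 here
    return cov / sqrt(var_x * var_y)


def drop_correlated(columns, threshold):
    """Greedily keep a feature only if |r| <= threshold with every kept feature."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


if __name__ == "__main__":
    # Hypothetical features: the same trip count aggregated at two radii,
    # plus an unrelated temperature feature.
    features = {
        "trips_r500": [10, 20, 30, 40, 50],
        "trips_r1000": [15, 18, 33, 38, 52],
        "temperature": [5, 1, 4, 2, 3],
    }
    print(drop_correlated(features, threshold=0.99))
    print(drop_correlated(features, threshold=0.90))
```

In this toy data the two radius variants correlate at roughly 0.98, so the stricter 0.99 cut-off keeps both while 0.90 removes the second. The greedy order matters: the first feature encountered is the one retained.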

Regarding the ML algorithm selection:

- The authors quickly decide on using XGBoost to model bicycle flows based on some comparative tests. For the purpose of demonstrating the application this is fine, but to make a recommendation of XGBoost as the best ML method one would need a much more comprehensive definition and tests of model performance, including robustness/reliability of the result, applicability to practice (less data input), computing resource requirements, time cost, etc.

- Looking at the results in Table 3, random forest and XGBoost perform similarly, with either performing better in different tests and the scores in any case being very close. The choice of XGBoost is not obvious.

- To choose a ML algorithm one should also consider other aspects as mentioned above. If this is outside the scope of this paper, then the authors need to moderate the recommendation for XGBoost since its advantages have not been properly demonstrated.

- I wonder to what extent mean-based tests (MAE and SMAPE) are adequate, given that the phenomenon being modelled has a non-normal distribution across street segments, i.e. traffic flows follow an 80-20 rule where most traffic concentrates on a few roads, while the vast majority has little to no traffic. Maybe the counting stations are located in streets with a similar profile and a mean works, but as soon as we include residential streets the mean fails.

- Have all the features and radii been included in all the models and tests of section 2.2?
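
The concern about mean-based error measures raised in the list above can be illustrated with a small sketch. It uses one common definition of SMAPE (absolute error divided by the mean of the absolute actual and predicted values, averaged and expressed in percent); the manuscript's exact definition may differ, and the counts below are made up for illustration.

```python
def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def smape(actual, predicted):
    """Symmetric mean absolute percentage error in percent (range 0 to 200)."""
    terms = []
    for a, p in zip(actual, predicted):
        denom = (abs(a) + abs(p)) / 2
        # Convention assumed here: a 0/0 term counts as zero error.
        terms.append(0.0 if denom == 0 else abs(a - p) / denom)
    return 100 * sum(terms) / len(terms)


if __name__ == "__main__":
    # Hypothetical daily counts: a busy arterial road and a quiet residential street.
    busy_actual, busy_pred = [5000, 6000], [5200, 5800]
    quiet_actual, quiet_pred = [50, 60], [250, 260]

    # Both have the same MAE of 200 bicycles per day ...
    print(mae(busy_actual, busy_pred), mae(quiet_actual, quiet_pred))
    # ... but very different relative errors.
    print(smape(busy_actual, busy_pred), smape(quiet_actual, quiet_pred))
```

An identical absolute error is negligible on a high-volume street and dominant on a low-volume one, which is why an average over stations with similar profiles may not carry over to residential streets.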

Additional minor issues with the manuscript deserving the authors' attention:

- The description of the tests in section 2.5 page 8-47 to 9-16 is hard to follow, and maybe a flowchart or diagram explaining the different options and combinations of tests carried out could help.

- Given the poor results (Figure 2b), how do the authors define reasonable error in the discussion page 10-29?

- In the discussion, the application to municipal planning is highlighted. Given the huge number of data sources and features and the complex data engineering and modelling, how can this be applied by non-experts?

- For the discussion, what is the expected time it takes to produce one estimation following the entire method? Municipal planning often involves testing planning alternatives and estimating the impacts of different choices, where a simple model that is fast to implement is desirable.

- I would like to see a more elaborate reflection on the sampling bias of the counting station locations based on location type, which is usually high-volume roads, and on how additional sampling locations can be selected to counteract this effect and improve estimations in a broader range of locations.

- Because the purpose is to model bicycle flows, I wonder if the authors have considered including network centrality measures as infrastructure features, such as betweenness centrality, which are known to be good indicators of pedestrian, cyclist and vehicular flows.

- I would not describe the counting stations with high volumes as outliers, as these are the most important streets and they might only be outliers in the small sample of 20 counting stations used. Robust methods must be considered to handle non-normal distributions.

Recommendation: From counting stations to city-wide estimates: data-driven bicycle volume extrapolation — R0/PR4

Comments

The paper presents an ML solution to city-wide bicycle volume estimation. It is an extremely important but challenging task. The authors do a good job at motivating the task, explaining the data sources, and so on. However, several concerns were raised regarding the choice of ML algorithms and the data collection. More thorough analysis should be conducted to ensure a strong publication. The reviews provide very detailed suggestions on both major and minor points for improvement. The paper will need significant improvement, but I believe the revised version addressing the concerns will be a very strong paper.

Decision: From counting stations to city-wide estimates: data-driven bicycle volume extrapolation — R0/PR5

Comments

No accompanying comment.

Author comment: From counting stations to city-wide estimates: data-driven bicycle volume extrapolation — R1/PR6

Comments

We included the Cover Letter in the field “Response to Decision Letter”.

We have noted a typo in the author names (on this submission platform, not in the manuscript). The name of the corresponding author is Silke Kirstin Kaiser, not Silke Kirstin Kirstin Kaiser.

Also, the affiliation of Nadja Klein has changed. She is no longer at Technische Universität Dortmund, but now at Scientific Computing Center, Karlsruhe Institute of Technology, Karlsruhe, 76131, Germany. This has been changed in the manuscript. However, we were not able to change this on this submission platform.

Review: From counting stations to city-wide estimates: data-driven bicycle volume extrapolation — R1/PR7

Conflict of interest statement

Reviewer declares none.

Comments

The revision of the original draft is thorough and the improvements go beyond the comments in my previous review. I encourage publication without further comments, though the rebuttal of Reviewer #2’s comments seems more relevant to the remaining process.

Review: From counting stations to city-wide estimates: data-driven bicycle volume extrapolation — R1/PR8

Conflict of interest statement

Reviewer declares none.

Comments

The authors have carefully considered the questions and comments from the reviewers and introduced important changes to the manuscript. I appreciate the effort put into redoing analyses to consider new features and the skewed data distribution, and the increased detail in describing the data, features and methodology.

While the manuscript is now much clearer overall, a couple of questions remain that would require attention.

The effect of large radii on the results is not really addressed in the revision by saying that more street-level features are required. My previous comment was on the negative effect of including large-radius or whole-city features, which have a very similar or constant value across all segments. This creates a smooth interpolation of values and large errors in the model, which is then not sensitive to smaller local geographic characteristics. Essentially, you might not need more features, but a stricter selection of the radii included in the model.

In relation to this I suggest:

- A reflection in the discussion on the impact that large radii have on the results.

- Some features are naturally area features (density, weather, number of trips originating or ending, holiday) that support a large-radius approach, while others are segment features (traffic flows, centrality) that have a much more local network effect and might not provide a meaningful context at a large radius. Would it make sense to be more selective from the start? A reflection on being parsimonious would help.

- A clearer explanation of the feature selection results, to understand how the collinearity of the radii of the same feature is handled. It’s not clear how many of the 257 features are included in each model, and which are excluded. A.15 gives us what method was used for each model, and the main text gives the results of the model. This doesn’t have to be detailed, but it would help to have some indication of the extent to which the collinearity has been solved, of whether certain radii emerge as the “right” radii for certain features, or, if all are included, of what their significance is.

To me Figure 4 is most welcome, but not very clear in the way the 3 strategies (1, 3 and 7 day) are presented. From what I understand, each station has a certain number of daily observations, and the test is to see if sampling 1, 3 or 7 consecutive observation days improves the prediction as the number of sampled days increases. Because “days” appears twice as a variable in the text and the figure, this can be difficult for the reader to understand. At least it took me some time to get it. Showing more than 5 days in the figure might make it clearer.

Recommendation: From counting stations to city-wide estimates: data-driven bicycle volume extrapolation — R1/PR9

Comments

Thank you for submitting a revised draft. It has successfully addressed most major concerns from the previous round of reviews. Reviewer #2 provided some valuable feedback which would make the work much stronger. The paper could benefit from one more round of minor revision before the paper is published.

Decision: From counting stations to city-wide estimates: data-driven bicycle volume extrapolation — R1/PR10

Comments

No accompanying comment.

Author comment: From counting stations to city-wide estimates: data-driven bicycle volume extrapolation — R2/PR11

Comments

Our response letter is included in “Your Response” to the Decision Letter. We would like to use this space to note that one of the authors, Nadja Klein, has changed institutions to

Scientific Computing Center, Karlsruhe Institute of Technology, Karlsruhe, 76131, Germany.

Also, we noticed that the middle initial of Lynn Kaack is not included. She would like to be listed as Lynn H. Kaack on the publication.

We have changed this in our submission accordingly and also noted these points in our previous (major revisions) submission. However, the list of authors is locked, so we were not able to change it within this submission portal.

Review: From counting stations to city-wide estimates: data-driven bicycle volume extrapolation — R2/PR12

Conflict of interest statement

Reviewer declares none.

Comments

Thank you for these final revisions. The article is now clear and I do not have any further comments.

Review: From counting stations to city-wide estimates: data-driven bicycle volume extrapolation — R2/PR13

Conflict of interest statement

Reviewer declares none.

Comments

As mentioned in the last review, I have no further comments.

Recommendation: From counting stations to city-wide estimates: data-driven bicycle volume extrapolation — R2/PR14

Comments

The revised version has addressed all concerns raised in the reviews. It is ready for publication.

Decision: From counting stations to city-wide estimates: data-driven bicycle volume extrapolation — R2/PR15

Comments

No accompanying comment.