The hidden potential of call detail records in The Gambia

Published online by Cambridge University Press:  25 June 2021

Ayumi Arai
Affiliation:
Center for Spatial Information Science The University of Tokyo, Tokyo, Japan
Erwin Knippenberg*
Affiliation:
The World Bank, Washington, District of Columbia, USA
Moritz Meyer
Affiliation:
The World Bank, Washington, District of Columbia, USA
Apichon Witayangkurn
Affiliation:
Center for Spatial Information Science The University of Tokyo, Tokyo, Japan
*Corresponding author. E-mail: eknippenberg@worldbank.org

Abstract

Aggregated data from mobile network operators (MNOs) can provide snapshots of population mobility patterns in real time, generating valuable insights when other more traditional data sources are unavailable or out-of-date. The COVID-19 pandemic has highlighted the value of remotely-collected, high-frequency, localized data in inferring the economic impact of shocks to inform decision-making. However, proper protocols must be put in place to ensure end-to-end user-confidentiality and compliance with international best practice. We demonstrate how to build such a data pipeline, channeling data from MNOs through the national regulator to the analytical users, who in turn produce policy-relevant insights. The aggregated indicators analyzed offer a detailed snapshot of the decrease in mobility and increased out-migration from urban to rural areas during the COVID-19 lockdown. Recommendations based on lessons learned from this process can inform engagements with other regulators in creating data pipelines to inform policy-making.

Information

Type
Translational Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The World Bank, 2021. Published by Cambridge University Press

Figure 1. Administrative boundaries of The Gambia. Source: Authors. Note: Names on the map indicate eight local government areas (LGAs). Boundaries present 48 Districts. LGA boundaries are highlighted in bold.

Figure 2. A Hadoop Cluster as a hardware solution to process CDR data. Source: Authors.

Table 1. Summary of key indicators for the analysis

Figure 3. Correspondence between Log (population density) and Log (unique subscriber density), using two different known measures of population density. (a) Log (WorldPop density). (b) Log (Census density). Source: Authors. Points represent districts, clustered by LGAs.

Table 2. Correspondence between known population data and call detail records (CDR) data

Figure 4. The number of active subscribers in The Gambia—ratio to the baseline. Source: Authors’ calculations.

Table 3. Descriptive statistics of the number of active subscribers for four periods in between the interventions/event (presented as the ratio to the baseline)

Table 4. Classification of 45 districts

Figure 5. Numbers of residents at the district level in urban and rural areas—ratio to the baseline. Source: Authors’ calculations.

Figure 6. Distance traveled at the district level in urban and rural areas—ratio to the baseline. Source: Authors’ calculations.

Figure 7. Population inflows to urban and rural areas at the district level—ratio to the baseline. Source: Authors’ calculations.

Figure 8. Weekly averages of distances traveled at the district level—ratio to the baseline. Source: Authors’ calculations.

Supplementary material: PDF

Arai et al. supplementary material

Response to Reviewers

Author comment: The hidden potential of call detail records in The Gambia — R0/PR1

Comments

Dear Madam and Sir,

With reference to your special call for submissions for the journal "Data and Policy," we hereby share our draft paper on "The Hidden Potential of Call Detail Records in The Gambia". This paper was prepared jointly by Ayumi Arai, Erwin Knippenberg, Moritz Meyer, and Apichon Witayangkurn, and summarizes key findings from the analysis of CDR data in The Gambia to show patterns and trends of human mobility during COVID-19. This project was implemented in close collaboration with the national regulator for telecommunication services (PURA) and the national statistics office (GBoS). In addition to providing policy-relevant information, this study served as a platform to strengthen technical and statistical capacity in a fragile and low-income country in Africa. We look forward to hearing from you, and stand ready to incorporate comments and suggestions.

Best regards,

Ayumi Arai

Review: The hidden potential of call detail records in The Gambia — R0/PR2

Conflict of interest statement

No Conflicts of Interest.

Comments

Comments to Author: Summary of the significance of the article:

The manuscript “The Hidden Potential of Call Detail Records in The Gambia” gives a descriptive account of a collaborative project involving the Public Utilities Regulatory Authority (PURA), The Gambia Bureau of Statistics (GBoS), the World Bank, and the authors. It describes a data pipeline that makes available aggregated indicators derived from Call Detail Records (CDR) from two mobile operators in The Gambia. Specifically, the paper states:

“This paper showcases the use of CDR data in The Gambia, a low-income and fragile country in West Africa.”

“Our findings demonstrate how the analysis of CDR data provides important insights into the impact of COVID-19 and social distancing measures on human mobility, …”

“Our contribution demonstrates how a system-building approach can make timely, disaggregated analysis based on CDR data available for quick decision making.”

The usefulness of CDR has been demonstrated in much literature over the past decade, some of which the manuscript references. Hence, the value of the present discussion lies in highlighting a project from The Gambia.

Quality of the paper and its suitability for publication:

The manuscript showcases and demonstrates findings that can be of interest for a broader audience. However, it comes across as weak when summarizing the political implications of the collaborative project. In the reviewer’s opinion, the most important questions remain unanswered: What important insights originating from the CDR pipeline did the health authorities use? How were policies informed and how did insights shape health authorities’ actions during the COVID-19 pandemic?

Suggestions for improvement:

The following suggestions will strengthen the contribution, improve the scientific quality of the manuscript, and clarify ambiguities and inaccuracies:

1. Section 3 B. contains copies of paragraphs contained in Section 3 A. Remove the redundant text.

2. It is not clear from the descriptions whether only charging data (CDRs) are used, or whether more detailed location data from network probes are the basis for the pipeline. The significance of this is that CDRs will only be generated whenever a customer initiates a service, whereas data from the network will continuously measure customers’ location. The ambiguity stems from the following statement in Section 3 C. Technical: “CDR data are massive datasets, which are huge in size and generated with high speed.” This is not true for CDR data in general. However, it is certainly true for network data.

3. “Ensuring privacy, the identifiable data field will be anonymized using a hashing algorithm, which is irreversible to original data.” Which identifiable data fields are hashed? According to European legislation (GDPR) hashing in itself does not guarantee anonymity, so please clarify the definition of when anonymity has been obtained.

4. “The mobile penetration rate of The Gambia was 94.2% in 2013 and rose to 140% in 2018.” This indicates that multi-SIMing is very frequent in The Gambia. In the solution, when counting the number of travelers between locations, how is multi-SIMing accounted for in the counts to make sure the counts are not inflated due to multi-SIMing behavior? It is important to get the counts right, since these are proxies for population travel patterns.

5. Page 7; line 4-5: “These identifiers are encrypted using a one-way function by the MNOs so the data provided to the regulator do not include any personally identifiable information.” Hashing only de-identifies the data records and does not fully anonymize them. Please note that de-identification through hashing is different from anonymization. According to GDPR, the de-identified data is still potentially sensitive, and should be considered as personally identifiable information. The reason is that an adversary may possess another dataset that, together with the de-identified dataset, renders it identifiable.

6. Page 7; line 12: Suggest rewriting “… but the above-mentioned aggregation process lowers the risk of being reverse engineered substantially” to “… but the above-mentioned aggregation process lowers the risk of reverse engineering.”

7. What is the significance of including this sentence, when it is stated that two weeks of data was used? “It could ideally be computed for a period of four weeks before the initial COVID-19 cases were announced, which was 17 March, if the data before March were available.”

8. Table 1: Indicator 3 Use-case column states “Proxy for population and population movement”. This is only a proxy for population count, I believe.

9. Page 8; Section C Application in The Gambia – First bullet point: “In our data, we observe no significant fluctuations in total transaction volumes over the data period.” What is the significance of this?

10. Page 8; Section C Application in The Gambia – Fourth bullet point: “we do not use this indicator for generating Origin-Destination (OD) matrices as we were not able to examine how the OD matrix is impacted by missing links between the origin and final destination regions”. OD matrices are the most important empirical tool for mapping and understanding the travel patterns in a country, and very important in epidemiological modelling to forecast disease spread. Hence, the reviewer is very puzzled by this statement, and believes that an elaboration is needed to give more detail on why the OD matrices have not been used.

11. Page 8; Section C Application in The Gambia – Last paragraph: A suggestion for general improvement is to highlight and extend the findings and the indicators’ specific relevance to COVID-19.

12. Page 8; Section 5 A – First sentence: I believe “population movement” should be “population distribution”.

13. Page 8; Section 5 A: Are IMEIs being used? This will count the number of unique handsets, whereas IMSI will count the number of subscribers. Clarification needed.

14. Figure 4: Reference is made to a baseline without defining what this baseline is. Please explain.

15. Section 6 B Technical constraints: The sentence “In addition, we had to employ complex techniques and multiple steps to complete a simple task, which means a single step easily run by an available code was divided into multiple steps with intermediate results. This is because such a simple task requires a lot of time for computation once it starts running, which could be easily interrupted due to an unstable network environment.” is hard to comprehend. Please consider rewriting.

16. Section 6 C Policy dialog – towards the end of the section:

o Population movement patterns … could inform targeted testing initiatives, …

o … this can also inform where constraints on mobility should be enforced ….

My question is: How was any of this information used by the Government or health authorities in The Gambia? Having this information is not the same as acting upon it. Did the right stakeholder within the Government have access to the information? Were the findings and insights shared with the decision makers putting in place testing policies? How was mobility information used when deciding on the social-distancing policy implemented on March 18th, 2020 in The Gambia?
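
Comments 3 and 5 above turn on the difference between de-identification and anonymization. The following is a minimal sketch of the linkage risk described in comment 5, not a description of the pipeline in the manuscript. The hashed IDs, tower names, auxiliary dataset, and the functions `top_towers` and `link` are all hypothetical. It assumes an adversary who already knows two frequently visited locations (for example home and work) for some named individuals.

```python
from collections import Counter, defaultdict


def top_towers(records, k=2):
    """Return {hashed_id: frozenset of its k most frequently used towers}."""
    counts = defaultdict(Counter)
    for hashed_id, tower in records:
        counts[hashed_id][tower] += 1
    return {
        hashed_id: frozenset(tower for tower, _ in counter.most_common(k))
        for hashed_id, counter in counts.items()
    }


def link(records, auxiliary, k=2):
    """Match names to hashed IDs whose top-k towers equal the known places.

    Only unambiguous matches are returned, i.e. signatures that belong
    to exactly one hashed ID in the de-identified records.
    """
    by_signature = defaultdict(list)
    for hashed_id, signature in top_towers(records, k).items():
        by_signature[signature].append(hashed_id)

    matches = {}
    for name, places in auxiliary.items():
        candidates = by_signature.get(frozenset(places), [])
        if len(candidates) == 1:
            matches[name] = candidates[0]
    return matches


# Hypothetical de-identified records: (hashed subscriber ID, cell tower).
records = [
    ("a1f3", "T01"), ("a1f3", "T01"), ("a1f3", "T07"), ("a1f3", "T07"), ("a1f3", "T03"),
    ("9c2e", "T02"), ("9c2e", "T02"), ("9c2e", "T07"), ("9c2e", "T07"), ("9c2e", "T05"),
]

# Hypothetical auxiliary data held by an adversary: name -> known locations.
auxiliary = {
    "person_A": {"T01", "T07"},
    "person_B": {"T04", "T09"},
}

linked = link(records, auxiliary)
print(linked)
```

In this toy data, `person_A` is linked to a hashed ID without the hash ever being reversed, which is why aggregation before release matters in addition to hashing.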

Review: The hidden potential of call detail records in The Gambia — R0/PR3

Conflict of interest statement

I know Erwin, one of the authors.

Comments

Comments to Author: Since the analysis of CDR data did not take into account biases from multiple ownership of SIM cards, the authors should at least reference literature where this has been done and fully articulate the implications of this bias, with emphasis and additional detail, so that the results of the analysis are considered with this in mind.

A very solid and well-detailed paper. The literature review is strong, and the outline of the engagement processes and the technical methodology of the analysis are well narrated.
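
To make the size of this bias concrete, here is a rough sketch under a strong, loudly stated assumption: that multi-SIM ownership is uniform across districts and over time, so SIM counts can be scaled by the national penetration rate (140% in 2018, as quoted in the first review). This is an illustration only and not the authors' method; the function `adjust_for_multi_sim` and the district counts are hypothetical.

```python
def adjust_for_multi_sim(sim_counts, penetration_rate):
    """Scale SIM counts by 1 / penetration_rate (assumes a uniform multi-SIM rate)."""
    if penetration_rate <= 0:
        raise ValueError("penetration_rate must be positive")
    return {district: count / penetration_rate for district, count in sim_counts.items()}


# Hypothetical active-SIM counts for one district in two periods.
baseline = {"district_A": 1400}
lockdown = {"district_A": 1050}

PENETRATION = 1.40  # 140% penetration, i.e. 1.4 SIMs per inhabitant on average

adj_baseline = adjust_for_multi_sim(baseline, PENETRATION)
adj_lockdown = adjust_for_multi_sim(lockdown, PENETRATION)

# Ratios to the baseline, before and after the adjustment.
raw_ratio = lockdown["district_A"] / baseline["district_A"]
adj_ratio = adj_lockdown["district_A"] / adj_baseline["district_A"]
print(adj_baseline["district_A"], raw_ratio, adj_ratio)
```

Under the uniform assumption, absolute counts shrink by the penetration factor while ratios to the baseline are unchanged. The bias therefore matters most for absolute counts, and for ratios only where the multi-SIM rate differs between areas or periods.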

Recommendation: The hidden potential of call detail records in The Gambia — R0/PR4

Comments

Comments to Author: Please take into account the detailed comments of the reviewers, where possible. In addition, also try to answer the questions as they form the essence of the special issue:

- What important insights originating from the CDR pipeline did the health authorities use?

- How were policies informed and how did insights shape health authorities’ actions during the COVID-19 pandemic?

Decision: The hidden potential of call detail records in The Gambia — R0/PR5

Comments

No accompanying comment.

Author comment: The hidden potential of call detail records in The Gambia — R1/PR6

Comments

Dear Editor, Data & Policy Journal

We would like to thank you for the letter dated 1 March 2021, and the opportunity to resubmit a revised copy of this manuscript. We would also like to take this opportunity to express our appreciation to the reviewers for the positive feedback and helpful comments for correction and modification.

We believe it has resulted in an improved manuscript, which you will find uploaded alongside this document. The manuscript has been revised to address the reviewer comments, which are appended alongside our responses to this letter.

We very much hope the revised manuscript is accepted for publication in the journal.

Best regards,

Ayumi Arai (on behalf of the team)

Review: The hidden potential of call detail records in The Gambia — R1/PR7

Conflict of interest statement

No Conflicts of Interest.

Comments

Comments to Author: Dear Authors,

I have now reviewed the revisions to the manuscript "The Hidden Potential of Call Detail Records in The Gambia" (DAP-2020-0040.R1).

I am delighted to inform you that I accept all your revisions, and I register great improvements to all comments raised in the first review. I have no further comments or suggestions.

My recommendation is that this revised version of the manuscript can be published in Data & Policy.

Best regards,

Kenth Engø-Monsen

Recommendation: The hidden potential of call detail records in The Gambia — R1/PR8

Comments

No accompanying comment.

Decision: The hidden potential of call detail records in The Gambia — R1/PR9

Comments

No accompanying comment.