
A dual-stage system for real-time license plate detection and recognition on mobile security robots

Published online by Cambridge University Press:  02 January 2025

Amir Ismail*
Affiliation:
Ecole Nationale d’Ingénieurs de Sousse, LATIS-Laboratory of Advanced Technology and Intelligent Systems, Université de Sousse, Sousse, Tunisia; Enova Robotics S.A., Novation City Technopole de Sousse, Sousse, Tunisia
Maroua Mehri
Affiliation:
Ecole Nationale d’Ingénieurs de Sousse, LATIS-Laboratory of Advanced Technology and Intelligent Systems, Université de Sousse, Sousse, Tunisia
Anis Sahbani
Affiliation:
Enova Robotics S.A., Novation City Technopole de Sousse, Sousse, Tunisia; Institute for Intelligent Systems and Robotics (ISIR), CNRS, Sorbonne Université, Paris, France
Najoua Essoukri Ben Amara
Affiliation:
Ecole Nationale d’Ingénieurs de Sousse, LATIS-Laboratory of Advanced Technology and Intelligent Systems, Université de Sousse, Sousse, Tunisia
*
Corresponding author: Amir Ismail; Email: amir.ismail@eniso.u-sousse.tn

Abstract

Automatic license plate recognition (ALPR) systems are increasingly used to address surveillance and security tasks. However, these systems typically assume constrained recognition scenarios, which restricts their practical use. In this article, we therefore address the challenge of recognizing vehicle license plates (LPs) from the video feeds of a mobile security robot by proposing an efficient two-stage ALPR system. Our ALPR system combines the off-the-shelf YOLOv7x model with a novel LP recognition model, called vision transformer-based LP recognizer (ViTLPR). ViTLPR relies on the self-attention mechanism to read character sequences on LPs. To ease the deployment of our ALPR system on mobile security robots and improve its inference speed, we also propose an optimization strategy. As an additional contribution, we provide an ALPR dataset, named PGTLP-v2, collected from surveillance robots patrolling several plants. The PGTLP-v2 dataset offers diverse characteristics that chiefly cover the in-the-wild scenario. To evaluate the effectiveness of our ALPR system, experiments are carried out on the PGTLP-v2 dataset and five benchmark ALPR datasets collected from different countries. Extensive experiments demonstrate that our proposed ALPR system outperforms state-of-the-art baselines.
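The two-stage design described in the abstract (detection first, then recognition on each cropped plate) can be sketched as follows. This is a minimal illustration of the control flow only: `detect_plates` and `recognize_plate` are hypothetical stand-ins for the trained YOLOv7x detector and the ViTLPR recognizer, not the authors' actual inference code.

```python
import numpy as np

def detect_plates(frame):
    # Hypothetical stand-in for the YOLOv7x detection stage: returns
    # license plate bounding boxes as (x, y, w, h) tuples in pixels.
    h, w = frame.shape[:2]
    return [(int(0.4 * w), int(0.6 * h), int(0.2 * w), int(0.1 * h))]

def recognize_plate(crop):
    # Hypothetical stand-in for the ViTLPR recognition stage: maps a
    # cropped plate image to its character sequence.
    return "ABC1234"

def alpr_pipeline(frame):
    """Two-stage ALPR: detect plates in the frame, then read each crop."""
    results = []
    for (x, y, w, h) in detect_plates(frame):
        crop = frame[y:y + h, x:x + w]          # stage 1 output feeds stage 2
        results.append(((x, y, w, h), recognize_plate(crop)))
    return results

frame = np.zeros((1080, 1920, 3), dtype=np.uint8)  # e.g., one 1920x1080 video frame
print(alpr_pipeline(frame))
```

Keeping the two stages behind separate functions mirrors the paper's modularity: either model can be swapped or optimized for on-robot deployment without touching the other.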

Information

Type
Research Article
Creative Commons
Creative Commons License - CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press

Figure 1. PGuard robot scope. PGuard is equipped with advanced features that enable it to effectively patrol and secure specific plants, either autonomously or through remote control. It streams real-time video and audio for monitoring and video analytics.


Figure 2. Pipeline of the proposed automatic license plate recognition system deployed on PGuard.


Figure 3. Vision transformer-based LP recognizer architecture. Raw license plate images are partitioned into square patches and transformed into a sequence of vectors. After adding positional information, the vectors are passed through a stack of $L$ vanilla transformer encoders. Finally, the feature sequence is fed to the prediction head for character recognition.
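The front end described in the Figure 3 caption (square patches, flattening, and positional information) can be sketched in NumPy. This is a generic vanilla-ViT patch embedding under assumed shapes, not the authors' implementation; the projection and positional matrices are randomly initialized here purely for illustration, where real ViTLPR weights would be learned.

```python
import numpy as np

def patchify(img, p):
    """Split an H x W x C image into non-overlapping p x p patches,
    each flattened to a vector of length p*p*C."""
    H, W, C = img.shape
    assert H % p == 0 and W % p == 0, "image dims must be divisible by patch size"
    patches = img.reshape(H // p, p, W // p, p, C).transpose(0, 2, 1, 3, 4)
    return patches.reshape(-1, p * p * C)          # (num_patches, p*p*C)

def embed(img, p, d, rng):
    """Linear patch projection plus positional embeddings, producing the
    vector sequence that the stacked transformer encoders consume."""
    x = patchify(img, p)                           # (N, p*p*C)
    W_proj = rng.standard_normal((x.shape[1], d)) * 0.02  # stand-in for learned weights
    pos = rng.standard_normal((x.shape[0], d)) * 0.02     # stand-in for positional info
    return x @ W_proj + pos                        # (N, d) sequence of patch tokens

rng = np.random.default_rng(0)
img = rng.random((32, 96, 3))        # a small plate-shaped image (H=32, W=96)
seq = embed(img, p=16, d=192, rng=rng)
print(seq.shape)                     # (12, 192): 2 x 6 patches of 16 x 16
```

A 32 x 96 crop with 16-pixel patches yields a 12-token sequence, which the encoder stack then contextualizes before the prediction head decodes characters.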


Table I. Specifications of the six benchmarks used in our experiments to evaluate the performance of the proposed automatic license plate recognition (ALPR) system.


Figure 4. Image samples in PGTLP-v2. From top to bottom, the first row presents images ($1,920 \times 1,080$) captured at an entrance checkpoint. The second row contains $180^{\circ}$ panoramic-view images ($2,560 \times 1,024$) captured at a restricted-access plant.


Figure 5. Image samples from the six datasets used in our experiments and their respective license plates with respect to the ground-truth annotations.


Table II. An overview of the number of images used for training, testing, and validation in each dataset.


Table III. Architectural parameters of vision transformer-based LP recognizer.


Table IV. Selected hyperparameters for training the two modules of the proposed automatic license plate recognition system (YOLOv7x and vision transformer-based LP recognizer (ViTLPR)).


Table V. Detection and recognition results on PGTLP-v2.


Figure 6. Qualitative results of vision transformer-based LP recognizer on image samples from benchmarks used in our experiments (PGTLP-v2, UFPR-ALPR, CCPD, AOLP-RP, LSV-LP, and RodoSol-ALPR). Best viewed in color and zoomed in.


Table VI. Detection and recognition results on LSV-LP.


Table VII. Detection and recognition results on RodoSol-ALPR.


Table VIII. Detection and recognition results on UFPR-ALPR.


Table IX. Detection and recognition results on CCPD.


Table X. Detection and recognition results on AOLP.


Table XI. Performance of the proposed automatic license plate recognition system on the PGTLP-v2 test set.


Table XII. Recall and license plate recognition rates following the Try-One-Dataset-Out (*) validation protocol.


Table XIII. License plate recognition rates (LP-RR) and inference times achieved without a deblurring step (w/o) and with a deblurring module applied using LaKDNet [60] (ViTLPR w/ LaKDNet) and NAFNet [61] (ViTLPR w/ NAFNet).


Figure 7. Qualitative results of the detected license plates without a deblurring step (w/o) and with a deblurring module applied using LaKDNet [60] (ViTLPR w/ LaKDNet) and NAFNet [61] (ViTLPR w/ NAFNet). Best viewed zoomed in.