
The inconvenient truth of ground truth errors in automotive datasets and DNN-based detection

Published online by Cambridge University Press:  25 November 2024

Pak Hung Chan*
Affiliation:
WMG, University of Warwick, Coventry, UK
Boda Li
Affiliation:
WMG, University of Warwick, Coventry, UK
Gabriele Baris
Affiliation:
WMG, University of Warwick, Coventry, UK
Qasim Sadiq
Affiliation:
WMG, University of Warwick, Coventry, UK
Valentina Donzella
Affiliation:
WMG, University of Warwick, Coventry, UK
Corresponding author: Pak Hung Chan; Email: Pak.Chan.1@warwick.ac.uk

Abstract

Assisted and automated driving functions will rely on machine learning algorithms, given their ability to cope with real-world variations, e.g. vehicles of different shapes, positions, colors, and so forth. Supervised learning needs annotated datasets, and several automotive datasets are available. However, these datasets are very large in volume, and labeling accuracy and quality can vary across different datasets and within dataset frames. Accurate and appropriate ground truth is especially important for automotive applications, as "incomplete" or "incorrect" learning can negatively impact vehicle safety when these neural networks are deployed. This work investigates the ground truth quality of widely adopted automotive datasets, including a detailed analysis of KITTI MoSeg. Based on the errors identified and classified in the annotations of different automotive datasets, this article provides three different collections of criteria for producing improved annotations. These criteria are enforceable and applicable to a wide variety of datasets. The three annotation sets are created to (i) remove dubious cases; (ii) annotate to the best of the human visual system; and (iii) remove clearly erroneous bounding boxes (BBs). KITTI MoSeg has been reannotated three times according to the specified criteria, and three state-of-the-art deep neural network object detectors are used to evaluate them. The results clearly show that network performance is affected by ground truth variations, and removing clear errors is beneficial for predicting real-world objects only for some networks. The relabeled datasets still present some cases with "arbitrary"/"controversial" annotations, and therefore this work concludes with some guidelines related to dataset annotation, metadata/sublabels, and specific automotive use cases.
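The evaluation described in the abstract hinges on how predicted boxes are matched to ground-truth BBs: under the mAP50 metric, a prediction counts as a true positive only if its Intersection-over-Union (IoU) with a ground-truth box reaches 0.5, so a misplaced or missing annotation directly alters the measured score. The following minimal Python sketch (not code from the article; box coordinates are illustrative) shows this matching criterion:

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# At the mAP50 threshold, a prediction is a true positive only if
# it overlaps some ground-truth BB with IoU >= 0.5. A shifted or
# erroneous ground-truth annotation can therefore flip a correct
# detection into a counted error.
pred = (10, 10, 50, 50)
gt_good = (12, 12, 52, 52)   # accurate annotation
gt_bad = (40, 40, 80, 80)    # misplaced annotation
print(iou(pred, gt_good) >= 0.5)  # True
print(iou(pred, gt_bad) >= 0.5)   # False
```

This is why reannotating the same frames under different criteria, as done here for KITTI MoSeg, changes the reported mAP50 even when the detector's outputs are unchanged.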

Information

Type
Research Article
Creative Commons
Creative Commons License - CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Open Practices
Open materials
Copyright
© The Author(s), 2024. Published by Cambridge University Press

Figure 1. Examples of errors (highlighted by dotted rectangles) in the BBs (green rectangles) of KITTI MoSeg (left) and nuScenes (right): upper frames show missing BBs, and lower frames show BBs not belonging to any object.


Table 1. The three sets of criteria for the reannotation of the KITTI MoSeg dataset for object detection


Table 2. Number of ground truth BBs for each set of annotations, with C# denoting the number of the set of criteria (1–3), and MoSeg denoting the original labels. The total number of frames (and their split into training and testing parts) remains the same throughout all the experiments


Table 3. Examples of ambiguous situations when applying the proposed criteria; related visual examples are given in Figure 2.


Figure 2. Examples of cases that are ambiguous or open to interpretation depending on the criterion applied.


Figure 3. Calculated mAP50 for a) Faster R-CNN, b) YOLOv3, c) DETR; the x-axis shows the annotations used for the testing datasets, and the different colors indicate the different labels of the training and validation sets.


Table 4. mAP50 of the three network models trained on the different criteria (rows) and tested on the different criteria (columns).
