
Crowd-assessing quality in uncertain data linking datasets

Published online by Cambridge University Press: 02 July 2020

Daniel Faria
Affiliation:
Instituto Gulbenkian de Ciência, Oeiras, Portugal, e-mail: dfaria@igc.gulbenkian.pt; INESC-ID, Lisboa, Portugal
Alfio Ferrara
Affiliation:
Department of Computer Science, Università degli Studi di Milano, Milan, Italy, e-mails: alfio.ferrara@unimi.it, stefano.montanelli@unimi.it; Data Science Research Center, Università degli Studi di Milano, Milan, Italy
Ernesto Jiménez-Ruiz
Affiliation:
City, University of London, London, UK, e-mail: ernesto.jimenez-ruiz@city.ac.uk; Department of Informatics, University of Oslo, Oslo, Norway, e-mail: ernestoj@ifi.uio.no
Stefano Montanelli
Affiliation:
Department of Computer Science, Università degli Studi di Milano, Milan, Italy, e-mails: alfio.ferrara@unimi.it, stefano.montanelli@unimi.it; Data Science Research Center, Università degli Studi di Milano, Milan, Italy
Catia Pesquita
Affiliation:
LASIGE, Faculdade de Ciências, Universidade de Lisboa, Lisbon, Portugal, e-mail: clpesquita@fc.ul.pt

Abstract

The quality of a dataset used for evaluating data linking methods, techniques, and tools depends on the availability of a set of mappings, called the reference alignment, that is known to be correct. In particular, it is crucial that each mapping represents a relation between a pair of entities that are similar because they denote the same object. Since the reliability of the mappings is decisive for a fair evaluation of automatic linking methods and tools, we call this property mapping fairness. In this article, we propose a crowd-based approach, called Crowd Quality (CQ), for assessing the quality of data linking datasets by measuring the fairness of the mappings in the reference alignment. Moreover, we present a real experiment in which we evaluate two state-of-the-art data linking tools before and after refining the reference alignment with the CQ approach, in order to show the benefits of crowd-based assessment of mapping fairness.
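To make the before-and-after comparison concrete, the following Python sketch scores a tool's output against a reference alignment, then again after dropping reference mappings that a crowd judged incorrect. It is only an illustration under stated assumptions: the abstract does not describe how CQ aggregates crowd answers, so the majority-vote rule, the `min_agreement` threshold, the function names, and the toy entity identifiers are all hypothetical and are not the authors' method or data.

```python
from typing import Dict, List, Set, Tuple

# A mapping links a source entity id to a target entity id.
Mapping = Tuple[str, str]


def precision_recall_f1(
    predicted: Set[Mapping], reference: Set[Mapping]
) -> Tuple[float, float, float]:
    """Standard precision, recall, and F-measure of a tool's mappings."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def refine_reference(
    reference: Set[Mapping],
    votes: Dict[Mapping, List[bool]],
    min_agreement: float = 0.5,
) -> Set[Mapping]:
    """Keep a mapping only if more than `min_agreement` of workers accept it.

    ASSUMPTION: simple majority voting stands in for the CQ fairness
    measure, which the abstract does not specify.
    """
    kept = set()
    for mapping in reference:
        judgements = votes.get(mapping)
        if not judgements:
            # No crowd evidence for this mapping: leave it untouched.
            kept.add(mapping)
            continue
        share_correct = sum(judgements) / len(judgements)
        if share_correct > min_agreement:
            kept.add(mapping)
    return kept


# Toy data (hypothetical): the reference contains one doubtful mapping.
reference = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b9")}
votes = {
    ("a1", "b1"): [True, True, True],
    ("a4", "b9"): [False, False, True],  # most workers reject this mapping
}
tool_output = {("a1", "b1"), ("a2", "b2"), ("a3", "b3")}

before = precision_recall_f1(tool_output, reference)
refined = refine_reference(reference, votes)
after = precision_recall_f1(tool_output, refined)

print("before refinement:", before)
print("after refinement: ", after)
```

In this toy case the tool is penalized in recall for missing a mapping that the crowd rejects; once that mapping is removed from the reference, the penalty disappears. This is the kind of effect an evaluation on a refined reference alignment is meant to expose.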

Information

Type
Research Article
Copyright
© The Author(s), 2020. Published by Cambridge University Press
