Data pre-processing to improve the mining of large feed databases

Published online by Cambridge University Press:  08 March 2013

F. Maroto-Molina*
Affiliation:
Servicio de Información sobre Alimentos, Universidad de Córdoba, Ctra. Nacional IV km. 396, 14014, Córdoba, Spain
A. Gómez-Cabrera
Affiliation:
Departamento de Producción Animal, ETS Ingeniería Agronómica y de Montes, Universidad de Córdoba, Ctra. Nacional IV km. 396, 14014, Córdoba, Spain
J. E. Guerrero-Ginel
Affiliation:
Departamento de Producción Animal, ETS Ingeniería Agronómica y de Montes, Universidad de Córdoba, Ctra. Nacional IV km. 396, 14014, Córdoba, Spain
A. Garrido-Varo
Affiliation:
Departamento de Producción Animal, ETS Ingeniería Agronómica y de Montes, Universidad de Córdoba, Ctra. Nacional IV km. 396, 14014, Córdoba, Spain
D. Sauvant
Affiliation:
UMR 791 Physiologie de la nutrition et de l'alimentation, AgroParisTech, 16 rue Claude Bernard, 75231, Paris, Cedex 05, France
G. Tran
Affiliation:
Association Française de Zootechnie, AgroParisTech, 16 rue Claude Bernard, 75231, Paris, Cedex 05, France
V. Heuzé
Affiliation:
Association Française de Zootechnie, AgroParisTech, 16 rue Claude Bernard, 75231, Paris, Cedex 05, France
D. C. Pérez-Marín
Affiliation:
Departamento de Producción Animal, ETS Ingeniería Agronómica y de Montes, Universidad de Córdoba, Ctra. Nacional IV km. 396, 14014, Córdoba, Spain
*Corresponding author. E-mail: g02mamof@uco.es

Abstract

The information stored in animal feed databases is highly variable, in terms of both provenance and quality; therefore, data pre-processing is essential to ensure reliable results. Yet, pre-processing at best tends to be unsystematic; at worst, it may even be wholly ignored. This paper sought to develop a systematic approach to the various stages involved in pre-processing to improve feed database outputs. The database used contained analytical and nutritional data on roughly 20 000 alfalfa samples. A range of techniques were examined for integrating data from different sources, for detecting duplicates and, particularly, for detecting outliers. Special attention was paid to the comparison of univariate and multivariate solutions. Major issues relating to the heterogeneous nature of data contained in this database were explored, the observed outliers were characterized and ad hoc routines were designed for error control. Finally, a heuristic diagram was designed to systematize the various aspects involved in the detection and management of outliers and errors.
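
The abstract does not give the exact univariate and multivariate procedures the authors compared. The sketch below only illustrates why the comparison matters, under stated assumptions: a classical |z| > 3 rule applied to each variable separately versus a squared Mahalanobis distance with an approximate chi-square cutoff. The function names and the two synthetic, strongly correlated variables are hypothetical and are not data from the alfalfa database. A sample can look normal on every axis and still break the correlation structure, so only the multivariate rule catches it. Classical means and covariances are themselves pulled towards outliers (masking), which is why robust estimators such as the minimum covariance determinant are common alternatives.

```python
import numpy as np


def univariate_outliers(X, z_max=3.0):
    """Hypothetical helper: flag rows where ANY single variable lies more than
    z_max sample standard deviations from that variable's mean."""
    X = np.asarray(X, dtype=float)
    z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    return np.any(np.abs(z) > z_max, axis=1)


def mahalanobis_sq(X):
    """Hypothetical helper: squared Mahalanobis distance of each row to the
    sample mean, using the classical (non-robust) covariance estimate."""
    X = np.asarray(X, dtype=float)
    diff = X - X.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(X, rowvar=False))
    # Row-wise quadratic form: diff_i^T * inv_cov * diff_i
    return np.einsum("ij,jk,ik->i", diff, inv_cov, diff)


def multivariate_outliers(X, d2_max):
    """Flag rows whose squared Mahalanobis distance exceeds d2_max."""
    return mahalanobis_sq(X) > d2_max


# Synthetic illustration only: two standardised, strongly negatively
# correlated variables (not real feed analyses).
rng = np.random.default_rng(0)
n = 200
a = rng.normal(0.0, 1.0, n)
b = -a + rng.normal(0.0, 0.3, n)
X = np.column_stack([a, b])

# Plant one sample that is unremarkable on each axis but breaks the correlation.
X = np.vstack([X, [2.0, 2.0]])
planted = len(X) - 1

# For 2 variables the chi-square(2 df) quantile is exactly -2*ln(1 - p);
# here p = 0.999. Under normality this cutoff is only approximate once the
# mean and covariance are estimated from the same data.
d2_cutoff = -2.0 * np.log(0.001)

uni = univariate_outliers(X)
multi = multivariate_outliers(X, d2_cutoff)
print("univariate rule flags planted sample:", uni[planted])     # expected: False
print("multivariate rule flags planted sample:", multi[planted])  # expected: True
```

With more than two variables the cutoff would instead be taken from the chi-square distribution with as many degrees of freedom as variables (for example via a statistics library), and the threshold choice remains a judgement call that affects how many samples are sent for review.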

Type: Nutrition
Copyright © The Animal Consortium 2013

Supplementary material: Maroto Molina Supplementary Material (Appendix; image, 664 KB)