
The Dutch Draw: constructing a universal baseline for binary classification problems

Published online by Cambridge University Press: 19 September 2024

Etienne van de Bijl*
Affiliation:
Centrum Wiskunde & Informatica
Jan Klein*
Affiliation:
Centrum Wiskunde & Informatica
Joris Pries*
Affiliation:
Centrum Wiskunde & Informatica
Sandjai Bhulai*
Affiliation:
Vrije Universiteit Amsterdam
Mark Hoogendoorn*
Affiliation:
Vrije Universiteit Amsterdam
Rob van der Mei*
Affiliation:
Vrije Universiteit Amsterdam and Centrum Wiskunde & Informatica
*Postal address: Centrum Wiskunde & Informatica, Department of Stochastics, Science Park 123, 1098 XG, Amsterdam, Netherlands.
*Postal address: Department of Mathematics, Vrije Universiteit Amsterdam, De Boelelaan 1111, 1081 HV, Amsterdam, Netherlands.
*Postal address: Department of Computer Science, De Boelelaan 1111, 1081 HV, Amsterdam, Netherlands. Email: m.hoogendoorn@vu.nl

Abstract

Novel prediction methods should always be compared to a baseline to determine their performance. Without this frame of reference, the performance score of a model is basically meaningless. What does it mean when a model achieves an $F_1$ of 0.8 on a test set? A proper baseline is, therefore, required to evaluate the ‘goodness’ of a performance score. Comparing results with the latest state-of-the-art model is usually insightful. However, being state-of-the-art is dynamic, as newer models are continuously developed. Contrary to an advanced model, it is also possible to use a simple dummy classifier. However, the latter model could be beaten too easily, making the comparison less valuable. Furthermore, most existing baselines are stochastic and need to be computed repeatedly to get a reliable expected performance, which could be computationally expensive. We present a universal baseline method for all binary classification models, named the Dutch Draw (DD). This approach weighs simple classifiers and determines the best classifier to use as a baseline. Theoretically, we derive the DD baseline for many commonly used evaluation measures and show that in most situations it reduces to (almost) always predicting either zero or one. Summarizing, the DD baseline is general, as it is applicable to any binary classification problem; simple, as it can be quickly determined without training or parameter tuning; and informative, as insightful conclusions can be drawn from the results. The DD baseline serves two purposes. First, it is a robust and universal baseline that enables comparisons across research papers. Second, it provides a sanity check during the prediction model’s development process. When a model does not outperform the DD baseline, it is a major warning sign.
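The idea of weighing simple classifiers and keeping the best one can be illustrated with a minimal sketch. It assumes, as an illustration rather than the authors' exact construction, that each simple classifier labels exactly k of the M instances as positive, chosen uniformly at random, so that the expected number of true positives is kP/M, where P is the number of actual positives. The function names `expected_scores` and `best_baseline` are hypothetical, and only accuracy and $F_1$ are covered.

```python
from fractions import Fraction


def expected_scores(k, M, P):
    """Expected accuracy and F1 when k of M instances are labeled positive
    uniformly at random and P of the M instances are actually positive.

    Assumption (not taken from the abstract): this "label k at random"
    family stands in for the simple classifiers being compared.
    """
    N = M - P                        # actual negatives
    exp_tp = Fraction(k * P, M)      # mean of the hypergeometric TP count
    exp_tn = N - (k - exp_tp)        # TN = N - FP, with FP = k - TP
    accuracy = (exp_tp + exp_tn) / M
    # F1 = 2TP / (P + k); the denominator is fixed for a given k,
    # so the expectation is exact by linearity.
    f1 = 2 * exp_tp / (P + k)
    return {"accuracy": accuracy, "f1": f1}


def best_baseline(M, P, measure):
    """Return (k, score) maximizing the expected measure over k = 0..M."""
    if not 0 < P < M:
        raise ValueError("need at least one positive and one negative")
    scores = {k: expected_scores(k, M, P)[measure] for k in range(M + 1)}
    best_k = max(scores, key=scores.get)
    return best_k, scores[best_k]


if __name__ == "__main__":
    # Hypothetical data set: 100 instances, 10 of them positive.
    print(best_baseline(100, 10, "accuracy"))  # best k and expected accuracy
    print(best_baseline(100, 10, "f1"))        # best k and expected F1
```

On this hypothetical imbalanced data set, the best choice under accuracy is to predict no positives at all, while under $F_1$ it is to predict every instance positive, which matches the abstract's remark that the baseline usually reduces to (almost) always predicting zero or one. Because the expectations are available in closed form, no repeated random draws are needed.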

Information

Type
Original Article
Copyright
© The Author(s), 2024. Published by Cambridge University Press on behalf of Applied Probability Trust


Supplementary material

van de Bijl et al. supplementary material 1 (File, 502.4 KB)
van de Bijl et al. supplementary material 2 (File, 345.6 KB)
van de Bijl et al. supplementary material 3 (File, 18.1 KB)