The impact of modeling decisions in statistical profiling

Published online by Cambridge University Press: 02 October 2023

Ruben L. Bach*
Affiliation:
Mannheim Centre for European Social Research (MZES), University of Mannheim, Mannheim, Germany
Christoph Kern
Affiliation:
Department of Statistics, LMU Munich, Munich, Germany
Hannah Mautner
Affiliation:
dmTECH, Karlsruhe, Germany
Frauke Kreuter
Affiliation:
Department of Statistics, LMU Munich, Munich, Germany; Joint Program in Survey Methodology, University of Maryland, College Park, MD, USA
*Corresponding author: Ruben L. Bach; Email: r.bach@uni-mannheim.de

Abstract

Statistical profiling of job seekers is an attractive option to guide the activities of public employment services. Many hope that algorithms will improve both efficiency and effectiveness of employment services’ activities that are so far often based on human judgment. Against this backdrop, we evaluate regression and machine-learning models for predicting job-seekers’ risk of becoming long-term unemployed using German administrative labor market data. While our models achieve competitive predictive performance, we show that training an accurate prediction model is just one element in a series of design and modeling decisions, each having notable effects that span beyond predictive accuracy. We observe considerable variation in the cases flagged as high risk across models, highlighting the need for systematic evaluation and transparency of the full prediction pipeline if statistical profiling techniques are to be implemented by employment agencies.
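The variation in flagged cases can be quantified with the Jaccard similarity between class predictions, the measure named in the caption of Table 4. What follows is only a minimal sketch of that measure, not the authors' code: the function name `jaccard_similarity`, the 0/1 encoding of predictions, and the example vectors are all assumptions made for illustration, and the numbers are not taken from the article.

```python
def jaccard_similarity(pred_a, pred_b):
    """Jaccard similarity of the sets of cases two models flag as high risk.

    pred_a and pred_b are equal-length sequences of 0/1 class predictions
    for the same cases (1 = flagged as high risk). This encoding is an
    assumption for illustration.
    """
    if len(pred_a) != len(pred_b):
        raise ValueError("prediction vectors must cover the same cases")
    flagged_a = {i for i, flag in enumerate(pred_a) if flag == 1}
    flagged_b = {i for i, flag in enumerate(pred_b) if flag == 1}
    union = flagged_a | flagged_b
    if not union:
        # Convention chosen here: if neither model flags anyone, treat
        # the two (empty) sets as identical.
        return 1.0
    return len(flagged_a & flagged_b) / len(union)


# Hypothetical predictions from two models for six cases.
model_a = [1, 0, 1, 1, 0, 0]
model_b = [1, 1, 0, 1, 0, 0]

# Both models flag cases 0 and 3; together they flag cases 0, 1, 2, 3.
print(jaccard_similarity(model_a, model_b))  # 2 / 4 = 0.5
```

A value of 1 means two models flag exactly the same cases and 0 means they flag disjoint sets, so two models with similar overall accuracy can still score well below 1.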

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2023. Published by Cambridge University Press

Table 1. LTU episodes and affected individuals, by year

Table 2. Groups of predictors, with examples

Table 3. Prediction performance of selected prediction models (in 2016), trained with 2010–2015 data

Table 4. Jaccard similarities between class predictions (in 2016)

Figure 1. Top 10 feature importance for selected prediction models (in 2016). Panel (a) shows feature importance for the logistic regression approach, panel (b) shows feature importance for the penalized logistic regression approach, panel (c) shows feature importance for the random forest approach, and panel (d) shows feature importance for the gradient boosting machines approach. See Table A.1 in the Supplementary Material for a detailed description of the variable labels.

Table 5. Arithmetic means of selected features for predicted LTU episodes (in 2016)

Supplementary material: File

Bach et al. supplementary material