Abstract
Intrusion detection research has proliferated rapidly in recent years, driven in particular by applications of deep learning models. While advances may have been made, they are impossible to evaluate or even understand because the research is almost entirely incomparable and irreproducible. As this position paper argues, the issues also include data provenance problems. A review of a few datasets openly released on the Kaggle platform readily demonstrates various severe problems. The paper also presents a few ideas on what the long-term consequences may be and how the problems could be at least partially remedied.


