For human-like models, train on human-like tasks

Published online by Cambridge University Press: 06 December 2023

Katherine Hermann
Affiliation:
Google DeepMind, Mountain View, CA, USA hermannk@google.com
Aran Nayebi
Affiliation:
McGovern Institute, Massachusetts Institute of Technology, Cambridge, MA, USA aran.nayebi@gmail.com https://anayebi.github.io/
Sjoerd van Steenkiste
Affiliation:
Google Research, Mountain View, CA, USA sjoerdvansteenkiste@gmail.com https://www.sjoerdvansteenkiste.com/
Matt Jones
Affiliation:
Google Research, Mountain View, CA, USA; Department of Psychology and Neuroscience, University of Colorado, Boulder, CO, USA mcj@colorado.edu http://matt.colorado.edu

Abstract

Bowers et al. express skepticism about deep neural networks (DNNs) as models of human vision because DNNs fail to account for results from psychological research. We argue that to assess DNNs fairly, we must first train them on more human-like tasks, which we hypothesize will induce more human-like behaviors and representations.

Type
Open Peer Commentary
Copyright
Copyright © The Author(s), 2023. Published by Cambridge University Press
