
Fixing the problems of deep neural networks will require better training data and learning algorithms

Published online by Cambridge University Press: 06 December 2023

Drew Linsley
Affiliation:
Department of Cognitive, Linguistic & Psychological Sciences, Carney Institute for Brain Science, Brown University, Providence, RI, USA drew_linsley@brown.edu https://sites.brown.edu/drewlinsley
Thomas Serre
Affiliation:
Department of Cognitive, Linguistic & Psychological Sciences, Carney Institute for Brain Science, Brown University, Providence, RI, USA thomas_serre@brown.edu https://serre-lab.clps.brown.edu

Abstract

Bowers et al. argue that deep neural networks (DNNs) are poor models of biological vision because they often learn to rival human accuracy by relying on strategies that differ markedly from those of humans. We show that this problem is worsening as DNNs become larger-scale and more accurate, and we prescribe methods for building DNNs that can reliably model biological vision.

Type
Open Peer Commentary
Copyright
Copyright © The Author(s), 2023. Published by Cambridge University Press

