
How many pixels make an image?

  • DOI:
  • Published online: 01 January 2009

The human visual system is remarkably tolerant to degradation in image resolution: human performance in scene categorization remains high whether low-resolution or multimegapixel images are used. This observation raises the question of how many pixels are required to form a meaningful representation of an image and identify the objects it contains. In this article, we show that very small thumbnail images at the spatial resolution of 32 × 32 color pixels provide enough information to identify the semantic category of real-world scenes. Most strikingly, this low resolution permits observers to report, with 80% accuracy, four to five of the objects that the scene contains, despite the fact that some of these objects are unrecognizable in isolation. The robustness of the information available at very low resolution for describing the semantic content of natural images could be an important asset in explaining the speed and efficiency with which the human brain comprehends the gist of visual scenes.
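The 32 × 32 thumbnails described in the abstract can be approximated with a few lines of code. The sketch below downsamples a synthetic high-resolution RGB image by block averaging; this is a stand-in assumption, not necessarily the resampling filter the authors used, and the gradient "scene" is invented purely for illustration.

```python
import numpy as np

def make_thumbnail(image: np.ndarray, size: int = 32) -> np.ndarray:
    """Downsample an H x W x 3 image to size x size by averaging pixel blocks.

    Assumes H and W are multiples of `size`; a real pipeline would use an
    antialiased resize from an image library instead.
    """
    h, w, c = image.shape
    bh, bw = h // size, w // size
    # Group pixels into (size x size) blocks and average within each block.
    blocks = image[: bh * size, : bw * size].reshape(size, bh, size, bw, c)
    return blocks.mean(axis=(1, 3))

# Synthetic 256 x 256 RGB "scene": two smooth gradients plus a constant channel.
x, y = np.meshgrid(np.linspace(0, 1, 256), np.linspace(0, 1, 256))
hi_res = np.stack([x, y, np.full((256, 256), 0.5)], axis=-1)

thumb = make_thumbnail(hi_res)
print(thumb.shape)  # (32, 32, 3)
```

Each output pixel summarizes an 8 × 8 block of the original, so fine object detail is lost while the coarse spatial layout that supports scene categorization survives.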

Corresponding author
*Address correspondence and reprint requests to: Antonio Torralba, Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 32-D432, 32 Vassar Street, Cambridge, MA 02139. E-mail:

Visual Neuroscience
  • ISSN: 0952-5238
  • EISSN: 1469-8714
  • URL: /core/journals/visual-neuroscience