
How many pixels make an image?


The human visual system is remarkably tolerant of degradation in image resolution: human performance in scene categorization remains high whether low-resolution images or multimegapixel images are used. This observation raises the question of how many pixels are required to form a meaningful representation of an image and identify the objects it contains. In this article, we show that very small thumbnail images at the spatial resolution of 32 × 32 color pixels provide enough information to identify the semantic category of real-world scenes. Most strikingly, this low resolution permits observers to report, with 80% accuracy, four to five of the objects that the scene contains, despite the fact that some of these objects are unrecognizable in isolation. The robustness of the information available at very low resolution for describing the semantic content of natural images could be an important asset in explaining the speed and efficiency with which the human brain comprehends the gist of visual scenes.
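To give a concrete sense of how little data a 32 × 32 color thumbnail holds, the sketch below reduces a square RGB image to 32 × 32 by averaging non-overlapping blocks of pixels. This is only an illustration: the abstract does not say how the thumbnails were produced, so block averaging is an assumption, and the helper names `downsample` and `value_count` are hypothetical.

```python
def downsample(image, out_size=32):
    """Block-average a square RGB image down to out_size x out_size.

    `image` is a list of rows, each row a list of (r, g, b) tuples.
    Assumption for this sketch: the side length is a multiple of out_size.
    """
    side = len(image)
    if side % out_size != 0 or any(len(row) != side for row in image):
        raise ValueError("expected a square image whose side is a multiple of out_size")

    block = side // out_size      # side length of each averaged block
    area = block * block          # number of source pixels per output pixel
    result = []
    for by in range(out_size):
        row_out = []
        for bx in range(out_size):
            sums = [0, 0, 0]
            # Accumulate each color channel over the block.
            for y in range(by * block, (by + 1) * block):
                for x in range(bx * block, (bx + 1) * block):
                    pixel = image[y][x]
                    for c in range(3):
                        sums[c] += pixel[c]
            row_out.append(tuple(s / area for s in sums))
        result.append(row_out)
    return result


def value_count(width, height, channels=3):
    """Number of scalar values needed to store a width x height image."""
    return width * height * channels
```

Under these assumptions, `value_count(32, 32)` is 3072, while a 1024 × 1024 color image needs 3,145,728 values, a factor of 1024 more.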

Corresponding author
*Address correspondence and reprint requests to: Antonio Torralba, Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 32-D432, 32 Vassar Street, Cambridge, MA 02139. E-mail:
Visual Neuroscience
  • ISSN: 0952-5238
  • EISSN: 1469-8714
  • URL: /core/journals/visual-neuroscience