Words and Pictures: Categories, Modifiers, Depiction, and Iconography

doi:10.1017/CBO9780511635465.010

9 - Words and Pictures: Categories, Modifiers, Depiction, and Iconography

Published online by Cambridge University Press: 20 May 2010

D.A. Forsyth, Tamara Berg, Cecilia Ovesdotter Alm, Nicolas Loeff and

Edited by

Sven J. Dickinson, University of Toronto
Aleš Leonardis, University of Ljubljana
Bernt Schiele, Technische Universität, Darmstadt, Germany
Michael J. Tarr, Carnegie Mellon University, Pennsylvania

Summary

Introduction

Collections of digital pictures are now very common. Collections can range from a small set of family pictures to the entire contents of a picture site like Flickr. Such collections differ from what one might see if one simply attached a camera to a robot and recorded everything, because the pictures have been selected by people. They are not necessarily “good” pictures (say, by standards of photographic aesthetics), but, because they have been chosen, they display quite strong trends. It is common for such pictures to have associated text, which might be keywords or tags but is often in the form of sentences or brief paragraphs. Text could be a caption (a set of remarks explicitly bound to the picture, and often typeset in a way that emphasizes this), region labels (terms associated with image regions, perhaps identifying what is in that region), annotations (terms associated with the whole picture, often identifying objects in the picture), or just nearby text. We review a series of ideas about how to exploit associated text to help interpret pictures.

Word Frequencies, Objects, and Scenes

Most pictures in electronic form seem to have related words nearby (or sound or metadata, and so on; we focus on words), so it is easy to collect word and picture datasets, and there are many examples. Such multimode collections should probably be seen as the usual case, because one usually has to deliberately ignore information to collect only images.

Information

Type: Chapter
Information: Object Categorization: Computer and Human Vision Perspectives, pp. 167-181

DOI: https://doi.org/10.1017/CBO9780511635465.010

Publisher: Cambridge University Press

Print publication year: 2009
