
Understanding visual scenes

Published online by Cambridge University Press: 28 March 2018

CARINA SILBERER
Affiliation: DTCL, Universitat Pompeu Fabra, Roc Boronat 138, 08018 Barcelona, Spain
E-mail: CarinaSilberer@gmail.com

JASPER UIJLINGS
Affiliation: School of Informatics, University of Edinburgh, 10 Crichton Street, Edinburgh, EH8 9AB, UK
E-mail: jrr.uijlings@gmail.com

MIRELLA LAPATA
Affiliation: ILCC, School of Informatics, University of Edinburgh, 10 Crichton Street, Edinburgh, EH8 9AB, UK
E-mail: mlap@inf.ed.ac.uk

Abstract

A growing body of recent work focuses on the challenging problem of scene understanding using a variety of cross-modal methods which fuse techniques from image and text processing. In this paper, we develop representations for the semantics of scenes by explicitly encoding the objects detected in them and their spatial relations. We represent image content via two well-known types of tree representations, namely constituents and dependencies. Our representations are created deterministically, can be applied to any image dataset irrespective of the task at hand, and are amenable to standard NLP tools developed for tree-based structures. We show that we can apply syntax-based SMT and tree kernel methods in order to build models for image description generation and image-based retrieval. Experimental results on real-world images demonstrate the effectiveness of the framework.

Information

Type
Articles
Copyright
Copyright © Cambridge University Press 2018 
