Analyzing Linguistic Data: A Practical Introduction to Statistics using R

R. H. Baayen

doi:10.1017/CBO9780511801686

Chapter 5: Clustering and classification

pp. 118-164

R. H. Baayen, University of Alberta

Summary

The previous chapter introduced various techniques for analyzing data with one or two vectors. The remaining chapters of this book discuss various ways of dealing with data sets with more than two vectors. Data sets with many vectors are typically brought together in matrices. These matrices list the observations on the rows, with the vectors (column variables) specifying the different properties of the observations. Data sets like this are referred to as multivariate data.

There are two approaches for discovering the structure in multivariate data sets that we discuss in this chapter. In one approach, we seek to find structure in the data in terms of groupings of observations. These techniques are unsupervised in the sense that we do not prescribe what groupings should be there. We discuss these techniques under the heading of clustering. In the other approach, we know what groups there are in theory, and the question is whether the data support these groups. This second group of techniques can be described as supervised, because the techniques work with a grouping that is imposed by the analyst on the data. We will refer to these techniques as methods for classification.
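The contrast between the two approaches can be made concrete with a small sketch. The book itself works in R; the example below uses plain Python purely for illustration. The toy data matrix, the group labels, and the helper names are all hypothetical. The two methods (k-means for clustering, a nearest-centroid rule with leave-one-out checking for classification) were chosen for brevity and are not necessarily the techniques the chapter presents.

```python
from math import dist
from statistics import mean

# Hypothetical toy data matrix: one observation per row, one variable per column.
data = [
    [1.0, 1.2],
    [0.8, 1.0],
    [1.1, 0.9],
    [4.0, 4.2],
    [4.1, 3.9],
    [3.8, 4.0],
]
# Hypothetical grouping imposed by the analyst (used only in the supervised part).
labels = ["a", "a", "a", "b", "b", "b"]


def centroid(rows):
    """Column-wise mean of a non-empty list of observations."""
    return [mean(column) for column in zip(*rows)]


def nearest(point, centers):
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: dist(point, centers[i]))


def kmeans(rows, centers, iterations=10):
    """Unsupervised: assign rows to clusters without looking at any labels.

    The starting centers are supplied by the caller (an assumption made
    here to keep the sketch deterministic).
    """
    assignment = []
    for _ in range(iterations):
        assignment = [nearest(row, centers) for row in rows]
        updated = []
        for i, old_center in enumerate(centers):
            members = [r for r, a in zip(rows, assignment) if a == i]
            # Keep the old center if a cluster happens to be empty.
            updated.append(centroid(members) if members else old_center)
        centers = updated
    return assignment


def leave_one_out_accuracy(rows, groups):
    """Supervised: do the data support the grouping imposed by the analyst?

    Each observation is held out in turn and assigned to the group whose
    centroid, computed from the remaining observations, is nearest.
    """
    hits = 0
    for i, (row, group) in enumerate(zip(rows, groups)):
        train = [(r, g) for j, (r, g) in enumerate(zip(rows, groups)) if j != i]
        names = sorted({g for _, g in train})
        centers = [centroid([r for r, g in train if g == name]) for name in names]
        hits += names[nearest(row, centers)] == group
    return hits / len(rows)


clusters = kmeans(data, [data[0], data[-1]])
accuracy = leave_one_out_accuracy(data, labels)
print(clusters)  # [0, 0, 0, 1, 1, 1]
print(accuracy)  # 1.0
```

In the first call no grouping is supplied and the two clusters emerge from the data. In the second, the grouping is given in advance and the question is only how well the observations conform to it.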

Clustering

Tables with measurements: principal components analysis

Words such as goodness and sharpness can be analyzed as consisting of a stem (good, sharp) and an affix, the suffix -ness. Some affixes are used in many words; -ness is an example.

About the book

Chapter DOI https://doi.org/10.1017/CBO9780511801686.006
Book DOI https://doi.org/10.1017/CBO9780511801686
Subjects Grammar and Syntax, Language and Linguistics
Format: Paperback
- Publication date: 17 March 2008
- ISBN: 9780521709187
Format: Digital
- Publication date: 05 June 2012
- ISBN: 9780511801686