What Is This Book About?
This book is a practical introduction to statistical procedures in corpus linguistics, a discipline that uses computers to analyse language. It is organized according to linguistic topics, which range from vocabulary and grammar to sociolinguistics, discourse analysis and historical investigations of language. The book offers an overview of state-of-the-art methodologies of language analysis using corpora and introduces new techniques that have not previously been used in the field. No prior knowledge of statistics is assumed; instead, all necessary concepts and methods are explained in non-technical language. In addition, all procedures described in the book can be easily carried out using Lancaster Stats Tools online (see ‘How Should You Use This Book?’ below). Throughout the book, many examples (case studies) of the application of corpus statistics are provided and standard reporting of statistics is shown. The book’s emphasis on the practical aspects of statistical analysis of language is also reflected in its focus on research design and the implications of different ‘shapes’ of data for statistical analysis. For this reason, the companion website offers the complete datasets used in this book for easy replication of the analyses. Corpus linguistics is an extremely versatile methodology of language analysis, applicable in a wide range of contexts in linguistics, social science, digital humanities and elsewhere. The book thus aims to facilitate meaningful use of corpora for as wide a range of users as possible.
Who Is This Book For?
The book is intended for anyone interested in corpus linguistics and the quantitative analysis of language. This includes students and researchers in fields such as linguistics, sociology, history, psychology and education. The main goal of the book is to help readers understand the key principles of statistical thinking so that they can make informed decisions about the application of particular statistical techniques. To facilitate this, in addition to the expository parts, the book includes discussion questions (‘Think about…’) and exercises, which readers can use to engage more closely with the material and to check their comprehension of the subject matter; answers to the exercises are provided on the companion website (http://corpora.lancs.ac.uk/stats/materials.php).
How Should You Use This Book?
The book addresses the needs of students and researchers who are looking for a practical guidebook on corpus statistics that is grounded in the current literature in the field and reflects best practice. The book can be used as a course book or for independent study. After reviewing general statistical principles in Chapter 1, readers can follow their own path through the book according to the linguistic topics that interest them. Statistical techniques introduced in the book are cross-referenced and included in the Index at the end of the book.
The book comes with a companion website, Lancaster Stats Tools online (http://corpora.lancs.ac.uk/stats), which not only provides additional examples, datasets, video tutorials and PowerPoint slides but also, more importantly, includes easy-to-use tools for calculating the statistics and producing the graphs discussed in the book. In fact, all procedures described in the book can be performed by the reader using Lancaster Stats Tools online. Readers thus don’t need to rely on commercial statistical packages such as IBM SPSS, which are not easily affordable for users without institutional subscriptions. Neither will readers be required to learn the complex syntax of free statistical packages such as R. Instead, Lancaster Stats Tools online offers access to powerful statistical tools through a simple user interface, into which data can be copied and pasted directly from a spreadsheet (e.g. Excel or Calc). I believe that statistics shouldn’t be a hurdle in our research: computers can and should do all the hard work of number crunching for us. Statistics can instead be used as a very effective analytical tool; all that is needed is to understand the basic principles of statistical thinking and their application to language analysis. Let’s explore them together!