Book contents
- Frontmatter
- Contents
- Preface
- 1 Introduction
- 2 Binary Regression: The Logit Model
- 3 Generalized Linear Models
- 4 Modeling of Binary Data
- 5 Alternative Binary Regression Models
- 6 Regularization and Variable Selection for Parametric Models
- 7 Regression Analysis of Count Data
- 8 Multinomial Response Models
- 9 Ordinal Response Models
- 10 Semi- and Non-Parametric Generalized Regression
- 11 Tree-Based Methods
- 12 The Analysis of Contingency Tables: Log-Linear and Graphical Models
- 13 Multivariate Response Models
- 14 Random Effects Models and Finite Mixtures
- 15 Prediction and Classification
- A Distributions
- B Some Basic Tools
- C Constrained Estimation
- D Kullback-Leibler Distance and Information-Based Criteria of Model Fit
- E Numerical Integration and Tools for Random Effects Modeling
- List of Examples
- Bibliography
- Author Index
- Subject Index
11 - Tree-Based Methods
Published online by Cambridge University Press: 05 June 2012
Summary
Tree-based models provide an alternative to additive and smooth models for regression problems. The method has its roots in automatic interaction detection (AID), proposed by Morgan and Sonquist (1963). The most popular modern version is due to Breiman et al. (1984) and is known as classification and regression trees, often abbreviated as CART, which is also the name of a program package. The method is conceptually simple: binary recursive partitioning divides the feature space into a set of rectangles, and on each rectangle a simple model (for example, a constant) is fitted.
The approach differs from fitting parametric models such as the logit model, where linear combinations of predictors are at the core of the method. Rather than parameter estimates, one obtains a binary tree that visualizes the partitioning of the feature space. If only one predictor is used, the one-dimensional feature space is partitioned, and the resulting estimate is a step function that may be considered a non-parametric (but rough) estimate of the regression function.
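The idea of recursive binary partitioning with a constant fitted on each rectangle can be sketched for the one-predictor case. The following is a minimal illustration, not the book's implementation: all function names (`best_split`, `grow`, `predict`) and the depth/leaf-size parameters are assumptions chosen for this sketch. Each split minimizes the residual sum of squares when both sides are fitted with their means, and the resulting predictor is the step function described above.

```python
def best_split(x, y):
    """Find the split threshold minimizing the sum of squared errors
    when each side of the split is fitted with its mean."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    xs = [x[i] for i in order]
    ys = [y[i] for i in order]
    best = None
    for k in range(1, len(xs)):
        left, right = ys[:k], ys[k:]
        ml = sum(left) / len(left)
        mr = sum(right) / len(right)
        sse = (sum((v - ml) ** 2 for v in left)
               + sum((v - mr) ** 2 for v in right))
        if best is None or sse < best[0]:
            # Place the threshold halfway between adjacent observations.
            best = (sse, (xs[k - 1] + xs[k]) / 2)
    return best[1]

def grow(x, y, depth=2, min_leaf=2):
    """Recursively partition the predictor axis; each leaf stores the
    local mean, so the fitted function is a step function."""
    if depth == 0 or len(x) < 2 * min_leaf:
        return sum(y) / len(y)  # leaf: fit a constant
    t = best_split(x, y)
    li = [i for i in range(len(x)) if x[i] <= t]
    ri = [i for i in range(len(x)) if x[i] > t]
    if len(li) < min_leaf or len(ri) < min_leaf:
        return sum(y) / len(y)
    return (t,
            grow([x[i] for i in li], [y[i] for i in li], depth - 1, min_leaf),
            grow([x[i] for i in ri], [y[i] for i in ri], depth - 1, min_leaf))

def predict(tree, xv):
    """Follow the splits down to a leaf and return its constant."""
    while isinstance(tree, tuple):
        t, left, right = tree
        tree = left if xv <= t else right
    return tree

# A step-shaped signal is recovered exactly by a single split.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0.0, 0.0, 0.0, 0.0, 5.0, 5.0, 5.0, 5.0]
tree = grow(x, y, depth=1)
```

With these data the optimal threshold falls between 4 and 5, so predictions left of the split are 0.0 and right of it 5.0, illustrating the rough step-function estimate of the regression function.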
Regression and Classification Trees
In the following, the basic concepts of classification and regression trees (CARTs) are given. The term "regression" in CARTs refers to metrically scaled outcomes, while "classification" refers to the prediction of underlying classes. By treating the underlying classes as the outcomes of a categorical response variable, classification can be handled within the general framework of regression.
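For the classification case, the same splitting idea applies with a categorical response: a common choice (used in CART, though the book's exact formulation may differ) is to pick the threshold minimizing a node impurity measure such as the Gini index, and to predict the majority class in each leaf. The function names below are illustrative assumptions for this sketch.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a set of class labels: 1 - sum of squared
    class proportions (0 for a pure node)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_class_split(x, labels):
    """Choose the threshold minimizing the weighted Gini impurity
    of the two resulting child nodes."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    xs = [x[i] for i in order]
    ls = [labels[i] for i in order]
    n = len(xs)
    best = None
    for k in range(1, n):
        imp = (k * gini(ls[:k]) + (n - k) * gini(ls[k:])) / n
        if best is None or imp < best[0]:
            best = (imp, (xs[k - 1] + xs[k]) / 2)
    return best[1]

def majority(labels):
    """Leaf prediction: the most frequent class."""
    return Counter(labels).most_common(1)[0][0]

# One split on a single predictor separates the two classes cleanly.
x = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
cls = ["a", "a", "a", "b", "b", "b"]
t = best_class_split(x, cls)
left = majority([c for c, v in zip(cls, x) if v <= t])
right = majority([c for c, v in zip(cls, x) if v > t])
```

This mirrors the regression case: the categorical response is handled within the same partitioning framework, with the impurity criterion and the leaf predictor (majority class instead of mean) being the only changes.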
- Type: Chapter
- Book: Regression for Categorical Data, pp. 317-330
- Publisher: Cambridge University Press
- Print publication year: 2011