In the models discussed here, there is a hierarchy of variation that corresponds to groupings within the data. For example, students may be sampled from different classes that are, in turn, sampled from different schools. Or, rather than being nested, groups may be crossed. Important notions are those of fixed and random effects, and of variance components. Analysis of data from designs with the balance needed to allow an analysis of variance breakdown is a special case. Further types of mixed models are generalized linear mixed models and repeated measures models. Repeated measures models are multilevel models where the measurements consist of multiple profiles in time or space, resulting in time or spatial dependence. Relative to the length of time series required for a realistic analysis, each individual repeated measures profile can, and often will, have values for a few time points only.
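As a minimal sketch of such a model, not drawn from the text, the following fits random intercepts for a students-within-classes-within-schools structure using the lme4 package; the data frame marks and its columns score, class, and school are hypothetical:

```r
## Hypothetical data: one row per student, with columns
## score, class, and school.
library(lme4)

## Random intercepts for schools, and for classes nested within schools.
fit <- lmer(score ~ 1 + (1 | school/class), data = marks)

## Variance components: between schools, between classes within
## schools, and residual (between students).
VarCorr(fit)
```

The term (1 | school/class) expands to (1 | school) + (1 | school:class), giving one variance component for each level of the hierarchy.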
This chapter explores ways to set up a model matrix so that linear combinations of the columns can fit curves and multidimensional surfaces. These extend to methods, within a generalized additive model framework, that use a penalization approach to constrain over-fitting. A further extension is to the fitting of quantiles of the data. The methodologies are important both for direct use in modeling data, and for checking for pattern in residuals from models in a more classical parametric style. The methodology is extended, in later chapters, to include smoothing terms in generalized linear models and in models that allow for time series errors.
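By way of illustration, and assuming the mgcv package, the following sketch fits a penalized smooth to simulated data; the penalty on the spline coefficients is what constrains over-fitting:

```r
library(mgcv)

## Simulated data: a smooth signal plus noise.
set.seed(17)
x <- seq(0, 1, length.out = 200)
y <- sin(2 * pi * x) + rnorm(200, sd = 0.3)

## s(x) sets up a penalized regression spline; the smoothing
## parameter is chosen automatically.
fit <- gam(y ~ s(x))
plot(fit, residuals = TRUE)
```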
The notes in this appendix provide a brief and limited overview of R syntax, semantics, and the R package system, as background for working with the R code included in the text. They are intended for use alongside the R help pages and the wealth of tutorial material that is available online.
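A few lines of the kind of basic usage that the appendix covers, purely by way of illustration:

```r
x <- c(2, 3, 5, 7)               # c() creates a vector
mean(x)                          # functions operate on whole vectors
x[x > 3]                         # logical subscripting
d <- data.frame(x = x, y = x^2)  # a data frame holds columns of equal length
## Packages extend base R: install once, attach in each session.
## install.packages("MASS"); library(MASS)
```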
Common time series models allow for a correlation between observations that is likely to be largest for points that are close together in time. Adjustments can also be made for seasonal effects. Variation in a single spatial dimension may have characteristics akin to those of time series, and comparable models find application there. Autoregressive models, which make good intuitive sense and are simple to describe, are the starting point for discussion; the account then moves on to autoregressive moving average models, with possible differencing. The "forecast" package for R has mechanisms that allow automatic selection of model parameters. Exponential smoothing state space (exponential time series, or ETS) models are an important alternative that has often proved effective in forecasting applications. ARCH and GARCH heteroskedasticity models are further classes that have been developed to handle the special characteristics of financial time series.
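As a sketch of the automatic selection mechanisms, assuming the forecast package and using the built-in AirPassengers series:

```r
library(forecast)

## Automatic ARIMA order selection, differencing included:
fit.arima <- auto.arima(AirPassengers)

## An exponential smoothing state space (ETS) alternative:
fit.ets <- ets(AirPassengers)

## Forecast two years ahead from each model:
plot(forecast(fit.arima, h = 24))
plot(forecast(fit.ets, h = 24))
```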
Inferences are never assumption free. Data summaries that do not account for all relevant effects readily mislead. Distributions for the Pearson correlation and for counts are noted, along with extensions that handle extra-binomial and extra-Poisson variation. Notions of statistical power are introduced. Resampling methods, notably the bootstrap and permutation tests, extend the available inferential approaches. Regression with a single explanatory variable is used as a context in which to introduce residual plots, outliers, influence, robust regression, and standard errors of predicted values. There are two regression lines: that of y on x and that of x on y. Power transformations, with the logarithmic transformation as a special case, are often effective in giving a linear relationship. The training/test approach, and the closely allied cross-validation approach, can be important for avoiding over-fitting. Other topics include one-way and two-way comparisons, adjustments when there are multiple comparisons, and the estimation of false discovery rates when there is severe multiplicity. Discussion of theories of inference, including likelihood, and of Bayes factor and other Bayesian perspectives, ends the chapter.
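As one small illustration of the resampling idea, not taken from the text, a percentile bootstrap interval for a Pearson correlation can be computed with the boot package; the data here are simulated:

```r
library(boot)

## Simulated bivariate data.
set.seed(29)
d <- data.frame(x = rnorm(40))
d$y <- 0.6 * d$x + rnorm(40, sd = 0.8)

## The statistic must accept the data and a vector of resample indices.
corfun <- function(data, i) cor(data$x[i], data$y[i])
b <- boot(d, corfun, R = 2000)
boot.ci(b, type = "perc")   # percentile bootstrap confidence interval
```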
The strengths of this book include the directness of its encounter with research data, its advice on practical data analysis issues, careful critiques of analysis results, its use of modern data analysis tools and approaches, its use of simulation and other computer-intensive methods where these provide insight or give results that are not otherwise available, its attention to graphical and other presentation issues, its use of examples drawn from across the range of statistical applications, the links that it makes into the debate over reproducibility in science, and the inclusion of code that reproduces analyses. The methods that we cover have wide application. The datasets, many of which have featured in published papers, are drawn from many different fields. They reflect a journey in learning and understanding, alike for the authors and for those with whom they have worked, that has ranged widely over many different research areas. The R system has brought into a common framework a huge range of abilities for data analysis, data manipulation, and graphics. The aim of our text is to help readers take full advantage of those abilities.
Generalized linear models extend classical linear models in two ways. They allow the fitting of a linear model to a dependent variable whose expected values have been transformed using a "link" function. They allow for a range of error families other than the normal. They are widely used to fit models to count data and to binomial-type data, including models with errors that may exhibit extra-binomial or extra-Poisson variation. The discussion extends to models in the generalized additive model framework, and to ordinal regression models. Survival analysis, also referred to as time-to-event analysis, is principally concerned with the time duration of a given condition, often but not necessarily sickness or death. In nonmedical contexts, it may be referred to as failure time or reliability analysis. Applications include the failure times of industrial machine components, electronic equipment, kitchen toasters, light bulbs, businesses, loan defaults, and more. There is an elegant methodology for dealing with "censoring": where all that can be said is that the event of interest occurred before or after a certain time, or in a specified interval.
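Two brief sketches, with simulated counts for the GLM and, for the survival fit, the aml dataset that ships with the survival package:

```r
## Poisson regression with the (default) log link.
set.seed(7)
dose <- runif(100)
counts <- rpois(100, lambda = exp(1 + 2 * dose))
fit <- glm(counts ~ dose, family = poisson)
summary(fit)
## family = quasipoisson would allow for extra-Poisson variation.

## Right-censored survival data: Surv() pairs times with event
## indicators; survfit() gives Kaplan-Meier estimates by group.
library(survival)
fit.km <- survfit(Surv(time, status) ~ x, data = aml)
plot(fit.km, lty = 1:2)
```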
Tree-based methods use methodologies that are radically different from those discussed in previous chapters. They are relatively easy to use and can be applied to a wide class of problems. As with many of the newer machine learning methods, construction of a tree, or, in the random forest approach, of many trees, follows an algorithmic process. Single-tree methods occupy the first part of this chapter. An important aspect of the methodology is the determination of error estimates. By building a large number of trees and using a voting process to make predictions, the random forests methodology that occupies the latter part of this chapter can often greatly improve on what can be achieved with a single tree. The methodology operates more as a black box, but with implementation details that are simpler to describe than for single-tree methods. In large-sample classification problems, the methodology has often proved superior to other contenders.
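A sketch using rpart for a single tree and randomForest for the ensemble, with the built-in iris data standing in for a classification problem:

```r
library(rpart)
library(randomForest)

## A single classification tree; printcp() reports
## cross-validated error estimates.
tree.fit <- rpart(Species ~ ., data = iris, method = "class")
printcp(tree.fit)

## A random forest: many trees vote, and the out-of-bag (OOB)
## error provides a built-in estimate of prediction error.
set.seed(31)
rf.fit <- randomForest(Species ~ ., data = iris)
print(rf.fit)
```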
Using diverse real-world examples, this text examines what models used for data analysis mean in a specific research context. What assumptions underlie analyses, and how can you check them? Building on the successful 'Data Analysis and Graphics Using R' (3rd edition, Cambridge, 2010), it expands upon topics including cluster analysis, exponential time series, matching, seasonality, and resampling approaches. An extended look at p-values leads to an exploration of replicability issues and of contexts where numerous p-values exist, including gene expression. Developing practical intuition, this book assists scientists in the analysis of their own data, and familiarizes students in statistical theory with practical data analysis. The worked examples and accompanying commentary teach readers to recognize when a method works and, more importantly, when it doesn't. Each chapter contains copious exercises. Selected solutions, notes, slides, and R code are available online, with extensive references pointing to detailed guides to R.