Chapter 4 provides detailed coverage of methods for the evaluation of predictive models: methods applicable to regression models implementing estimation biomarkers, as well as methods for evaluating binary and multiclass classification models. The discussion of resampling techniques highlights the danger of information leakage and emphasizes the paramount importance of avoiding internal validation. The discussion of metrics for the evaluation of classification biomarkers includes the proper and improper interpretation of sensitivity and specificity, illustrated by the example of a screening biomarker targeting a population with low prevalence of the tested disease. For such biomarkers, the positive predictive value may be unacceptably low even when the biomarker has very high specificity and sensitivity. The chapter also discusses misclassification costs and their incorporation into cost-sensitive classification.
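The low-prevalence issue can be made concrete with a small calculation. The sketch below is illustrative only (the prevalence, sensitivity, and specificity values are hypothetical, not taken from the chapter) and applies Bayes' theorem to show how the positive predictive value of a screening biomarker collapses when prevalence is low.

```python
# Illustrative sketch (not from the book): why a screening biomarker with
# high sensitivity and specificity can still have a low positive predictive
# value when disease prevalence is low. The numbers below are hypothetical.

def positive_predictive_value(prevalence, sensitivity, specificity):
    """PPV = P(disease | positive test) via Bayes' theorem."""
    true_positives = prevalence * sensitivity
    false_positives = (1.0 - prevalence) * (1.0 - specificity)
    return true_positives / (true_positives + false_positives)

if __name__ == "__main__":
    # Hypothetical screening setting: 1% prevalence, 95% sensitivity, 95% specificity.
    ppv = positive_predictive_value(prevalence=0.01, sensitivity=0.95, specificity=0.95)
    print(f"PPV = {ppv:.3f}")  # about 0.161: most positive calls are false positives
```

Even with 95% sensitivity and specificity, roughly five of every six positive calls are false positives at 1% prevalence.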
Given a sequence of independent random vectors taking values in ${\mathbb R}^d$ and having common continuous distribution function $F$, say that the $n$th observation sets a (Pareto) record if it is not dominated (in every coordinate) by any preceding observation. Let $p_n(F) \equiv p_{n, d}(F)$ denote the probability that the $n$th observation sets a record. There are many interesting questions to address concerning $p_n$ and multivariate records more generally, but this short paper focuses on how $p_n$ varies with $F$, particularly if, under $F$, the coordinates exhibit negative dependence or positive dependence (rather than independence, a more-studied case). We introduce new notions of negative and positive dependence ideally suited for such a study, called negative record-setting probability dependence (NRPD) and positive record-setting probability dependence (PRPD), relate these notions to existing notions of dependence, and for fixed $d \geq 2$ and $n \geq 1$ prove that the image of the mapping $p_n$ on the domain of NRPD (respectively, PRPD) distributions is $[p^*_n, 1]$ (respectively, $[n^{-1}, p^*_n]$), where $p^*_n$ is the record-setting probability for any continuous $F$ governing independent coordinates.
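As a quick numerical illustration of the record-setting probability $p_{n,d}$ (our sketch, not code from the paper), a Monte Carlo estimate for independent coordinates can be obtained directly from the definition: the $n$th observation sets a record exactly when no earlier observation dominates it in every coordinate.

```python
# Monte Carlo sketch (not from the paper): estimate the probability p_{n,d}
# that the n-th observation sets a Pareto record, here for independent
# uniform coordinates (any continuous F with independent coordinates gives
# the same value, since the event depends only on coordinatewise ranks).
import numpy as np

def estimate_record_probability(n, d, trials=50_000, seed=0):
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        x = rng.random((n, d))          # n observations, uniform on [0, 1]^d
        # The n-th observation sets a record if no earlier observation
        # dominates it in every coordinate.
        dominated = np.all(x[:-1] >= x[-1], axis=1).any()
        hits += not dominated
    return hits / trials

print(estimate_record_probability(n=10, d=1))  # ~0.1 = 1/n in one dimension
print(estimate_record_probability(n=10, d=2))  # noticeably larger for d = 2
```

For $d = 1$ the estimate is close to $1/n$, as expected, and it grows with the dimension $d$.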
Aging ships and offshore structures face harsh environmental and operational conditions in remote areas, leading to age-related damage such as corrosion wastage, fatigue cracking, and mechanical denting. These deteriorations, if left unattended, can escalate into catastrophic failures, causing casualties, property damage, and marine pollution. Hence, ensuring the safety and integrity of aging ships and offshore structures is paramount and achievable through innovative healthcare schemes. One such paradigm, digital healthcare engineering (DHE), initially introduced by the final coauthor, aims at providing lifetime healthcare for engineered structures, infrastructure, and individuals (e.g., seafarers) by harnessing advancements in digitalization and communication technologies. The DHE framework comprises five interconnected modules: on-site health parameter monitoring; data transmission to analytics centers; data analytics, simulation, and visualization via digital twins; artificial intelligence-driven diagnosis and remedial planning using machine and deep learning; and predictive health condition analysis for future maintenance. This article surveys recent technological advancements pertinent to each DHE module, with a focus on its application to aging ships and offshore structures. The primary objectives include identifying cost-effective and accurate techniques to establish a DHE system for lifetime healthcare of aging ships and offshore structures, a project currently in progress by the authors.
It is proved that for families of stochastic operators on a countable tensor product, depending smoothly on parameters, any spectral projection persists smoothly, where smoothness is defined using norms based on ideas of Dobrushin. A rigorous perturbation theory for families of stochastic operators with spectral gap is thereby created. It is illustrated by deriving an effective slow two-state dynamics for a three-state probabilistic cellular automaton.
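The notions of spectral gap and effective slow dynamics can be illustrated, in a drastically simplified finite-dimensional setting that is not the paper's operator-theoretic framework, by a single three-state stochastic matrix whose two slowest eigenvalues are separated from the rest of the spectrum; the matrix below is hypothetical.

```python
# Toy illustration (not the paper's setting): a single 3-state Markov chain
# with a spectral gap, where the two slowest eigenvalues span a subspace
# supporting an effective two-state slow dynamics. The transition matrix
# below is hypothetical.
import numpy as np

# Row-stochastic transition matrix: states 0 and 1 mix quickly with each
# other, state 2 exchanges with them only slowly.
P = np.array([
    [0.50, 0.49, 0.01],
    [0.49, 0.50, 0.01],
    [0.02, 0.02, 0.96],
])

eigvals = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
print("eigenvalue moduli:", np.round(eigvals, 3))
# The leading eigenvalue is 1; the next is close to 1 (slow mode) and the
# third is much smaller (fast mode), so there is a spectral gap below the
# two-dimensional slow subspace. Projecting onto that subspace gives an
# effective two-state description: roughly "in state 2" vs. "not".
```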
This retrospective study compared central line-associated bloodstream infection (CLABSI) rates per 1 000 central line days, and overall mortality, before and during the COVID-19 pandemic in adult, paediatric, and neonatal ICU patients at King Abdul-Aziz Medical City-Riyadh who had a central line and were diagnosed with CLABSI according to the National Healthcare Safety Network standard definition. The study covered January 2018 to December 2019 (pre-pandemic) and January 2020 to December 2021 (pandemic). SARS-CoV-2 infection was confirmed by positive RT-PCR testing. The study included 156 CLABSI events and 46 406 central line days: 52 events and 22 447 days pre-pandemic, and 104 events and 23 959 days during the pandemic. CLABSI rates increased by 2.02 per 1 000 central line days during the pandemic period (from 2.32 to 4.34, p < 0.001). Likewise, overall mortality rates increased by 0.86 per 1 000 patient days (from 0.93 to 1.79, p = 0.003). Both CLABSI rates (6.18 vs. 3.7, p = 0.006) and overall mortality (2.72 vs. 1.47, p = 0.014) were higher among COVID-19 patients than among non-COVID-19 patients. The pandemic was associated with a substantial increase in CLABSI-associated morbidity and mortality.
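For readers unfamiliar with the rate convention, the following sketch (illustrative arithmetic only, not the study's analysis code) reproduces the reported per-1 000-central-line-day rates from the event and line-day counts quoted above.

```python
# Illustrative arithmetic check (not from the study's analysis code):
# CLABSI rate = events / central line days * 1000.
def rate_per_1000(events, line_days):
    return events / line_days * 1000

pre_rate = rate_per_1000(52, 22_447)    # pre-pandemic
pan_rate = rate_per_1000(104, 23_959)   # pandemic

print(f"pre-pandemic: {pre_rate:.2f}")          # ~2.32
print(f"pandemic:     {pan_rate:.2f}")          # ~4.34
print(f"increase:     {pan_rate - pre_rate:.2f}")  # ~2.02
```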
We develop and demonstrate a computationally cheap framework to identify optimal experiments for Bayesian inference of physics-based models. We develop the metrics (i) to identify optimal experiments to infer the unknown parameters of a physics-based model, (ii) to identify optimal sensor placements for parameter inference, and (iii) to identify optimal experiments to perform Bayesian model selection. We demonstrate the framework on thermoacoustic instability, which is an industrially relevant problem in aerospace propulsion, where experiments can be prohibitively expensive. By using an existing densely sampled dataset, we identify the most informative experiments and use them to train the physics-based model. The remaining data are used for validation. We show that, although approximate, the proposed framework can significantly reduce the number of experiments required to perform the three inference tasks we have studied. For example, we show that for task (i), we can achieve an acceptable model fit using just 2.5% of the data that were originally collected.
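To give a flavour of what "identifying the most informative experiments" can mean in practice, the sketch below shows one standard criterion, greedy D-optimal selection for a linear-Gaussian model, rather than the metrics developed in the paper; the candidate matrix and budget are hypothetical.

```python
# Hedged sketch (not the paper's metrics): greedy D-optimal selection of
# experiments for a linear-Gaussian model y = X @ theta + noise, where the
# "most informative" experiments maximise log det of the Fisher information.
# The candidate set X and the budget are hypothetical.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))        # 200 candidate experiments, 3 parameters

def greedy_d_optimal(X, budget, ridge=1e-6):
    chosen = []
    info = ridge * np.eye(X.shape[1])       # regularised information matrix
    for _ in range(budget):
        gains = []
        for i in range(len(X)):
            if i in chosen:
                gains.append(-np.inf)
                continue
            xi = X[i:i + 1]
            _, logdet = np.linalg.slogdet(info + xi.T @ xi)
            gains.append(logdet)
        best = int(np.argmax(gains))
        chosen.append(best)
        info = info + X[best:best + 1].T @ X[best:best + 1]
    return chosen

print(greedy_d_optimal(X, budget=5))  # indices of the 5 most informative candidates
```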
Econometricians have usefully separated study of estimation into identification and statistical components. Identification analysis, which assumes knowledge of the probability distribution generating observable data, places an upper bound on what may be learned about population parameters of interest with finite-sample data. Yet Wald’s statistical decision theory studies decision-making with sample data without reference to identification, indeed without reference to estimation. This paper asks if identification analysis is useful to statistical decision theory. The answer is positive, as it can yield an informative and tractable upper bound on the achievable finite-sample performance of decision criteria. The reasoning is simple when the decision-relevant parameter (true state of nature) is point-identified. It is more delicate when the true state is partially identified and a decision must be made under ambiguity. Then the performance of some criteria, such as minimax regret, is enhanced by randomizing choice of an action in a controlled manner. I find it useful to recast choice of a statistical decision function as selection of choice probabilities for the elements of the choice set.
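The benefit of randomization under ambiguity can be seen in a stylized two-state, two-action example (our illustration, not one taken from the paper): each pure action has maximum regret 1, while a 50/50 randomization over the actions has maximum expected regret 0.5.

```python
# Stylized illustration (not from the paper): with two states and two actions
# whose regrets are R(a1) = (0, 1) and R(a2) = (1, 0) across the states,
# randomizing the action choice halves the maximum (worst-case) expected regret.
import numpy as np

regret = np.array([[0.0, 1.0],   # regret of action a1 in states s1, s2
                   [1.0, 0.0]])  # regret of action a2 in states s1, s2

def max_expected_regret(p_a1):
    mix = np.array([p_a1, 1.0 - p_a1])      # choice probabilities over actions
    return (mix @ regret).max()             # worst case over states

print(max_expected_regret(1.0))  # pure a1 -> 1.0
print(max_expected_regret(0.0))  # pure a2 -> 1.0
print(max_expected_regret(0.5))  # 50/50 mix -> 0.5 (minimax regret)
```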
This paper extends the spurious factor analysis of Onatski and Wang (2021, Spurious factor analysis. Econometrica, 89(2), 591–614) to high-dimensional data with heterogeneous local-to-unit roots. We find a spurious factor phenomenon similar to that observed in data with unit roots. Namely, the “factors” estimated by principal components analysis converge to principal eigenfunctions of a weighted average of the covariance kernels of demeaned Ornstein–Uhlenbeck processes with different decay rates. Thus, such “factors” reflect the structure of the strong temporal correlation of the data and do not correspond to any cross-sectional commonalities with which genuine factors are usually associated. Furthermore, the principal eigenvalues of the sample covariance matrix are very large relative to the other eigenvalues, creating an illusion of the “factors” capturing much of the data’s common variation. We conjecture that the spurious factor phenomenon holds, more generally, for data obtained from high-frequency sampling of heterogeneous continuous-time (or spatial) processes, and provide an illustration.
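A hedged illustration of the phenomenon (our sketch, not the paper's construction): independent autoregressive series with roots local to unity have no cross-sectional factor structure, yet the leading principal component of their sample covariance matrix absorbs a disproportionately large share of the variance.

```python
# Hedged illustration (not the paper's code): independent AR(1) series with
# roots local to unity have no cross-sectional factor structure, yet the top
# eigenvalue of the sample covariance matrix looks like a dominant "factor".
# All settings below are hypothetical.
import numpy as np

rng = np.random.default_rng(2)
T, N = 200, 100
rhos = 1.0 - rng.uniform(0.5, 5.0, size=N) / T    # heterogeneous local-to-unit roots

X = np.zeros((T, N))
for t in range(1, T):
    X[t] = rhos * X[t - 1] + rng.normal(size=N)   # independent across series

Xc = X - X.mean(axis=0)                           # demean each series
eigvals = np.linalg.eigvalsh(Xc.T @ Xc / T)[::-1]
share = eigvals[0] / eigvals.sum()
print("share of variance in the top eigenvalue:", round(float(share), 2))
# Much larger than 1/N, despite the complete absence of genuine common factors.
```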
Risk measurement and econometrics are the two pillars of actuarial science. Unlike econometrics, risk measurement allows decision-makers' risk aversion to be taken into account when analyzing risks. We propose a hybrid model that captures a regression-based approach to studying risks, focusing on explanatory variables while paying attention to risk severity. Our model accommodates different loss functions, provided by the risk manager or the actuary, that quantify the severity of losses. We present an explicit formula for the regression estimators in the proposed risk-based regression problem and study the resulting estimators. Finally, we provide a numerical study of the results using data from the insurance industry.
In this paper, we generalize the concept of functional dependence (FD) from time series (see Wu [2005, Proceedings of the National Academy of Sciences 102, 14150–14154]) and stationary random fields (see El Machkouri, Volný, and Wu [2013, Stochastic Processes and Their Applications 123, 1–14]) to nonstationary spatial processes. Within conventional settings in spatial econometrics, we define the concept of spatial FD measure and establish a moment inequality, an exponential inequality, a Nagaev-type inequality, a law of large numbers, and a central limit theorem. We show that the dependent variables generated by some common spatial econometric models, including spatial autoregressive (SAR) models, threshold SAR models, and spatial panel data models, are functionally dependent under regularity conditions. Furthermore, we investigate the properties of FD measures under various transformations, which are useful in applications. Moreover, we compare spatial FD with the spatial mixing and spatial near-epoch dependence proposed in Jenish and Prucha ([2009, Journal of Econometrics 150, 86–98], [2012, Journal of Econometrics 170, 178–190]), and we illustrate its advantages.
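For concreteness, the sketch below (ours, not from the paper) generates the dependent variable of a standard SAR model, $y = \rho W y + X\beta + \varepsilon$, on a hypothetical row-normalised weight matrix; it is this kind of spatially generated outcome that the FD framework is shown to cover.

```python
# Hedged sketch (not from the paper): generating data from a standard spatial
# autoregressive (SAR) model y = rho * W @ y + X @ beta + eps, i.e.
# y = (I - rho W)^{-1} (X beta + eps). W, rho, and beta are hypothetical.
import numpy as np

rng = np.random.default_rng(3)
n = 50

# Row-normalised "nearest neighbour" weight matrix on a circle of n units.
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 0.5
    W[i, (i + 1) % n] = 0.5

rho, beta = 0.4, np.array([1.0, -2.0])
X = rng.normal(size=(n, 2))
eps = rng.normal(size=n)

y = np.linalg.solve(np.eye(n) - rho * W, X @ beta + eps)
print(y[:5])  # each y_i depends on its neighbours' outcomes through W
```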
Multivariate biomarker discovery is increasingly important in the realm of biomedical research, and is poised to become a crucial facet of personalized medicine. This will prompt demand for a myriad of novel biomarkers representing distinct 'omic' biosignatures, allowing treatments to be selected and tailored to the individual characteristics of a particular patient. This concise and self-contained book covers all aspects of predictive modeling for biomarker discovery based on high-dimensional data, as well as modern data science methods for the identification of parsimonious and robust multivariate biomarkers for medical diagnosis, prognosis, and personalized medicine. It provides a detailed description of state-of-the-art methods for parallel multivariate feature selection and supervised learning algorithms for regression and classification, as well as methods for proper validation of multivariate biomarkers and the predictive models implementing them. This is an invaluable resource for scientists and students interested in bioinformatics, data science, and related areas.
Edited by
R. A. Bailey, University of St Andrews, Scotland; Peter J. Cameron, University of St Andrews, Scotland; Yaokun Wu, Shanghai Jiao Tong University, China
We begin by illustrating the interplay between questions of scientific interest and the use of data in seeking answers. Graphs provide a window through which meaning can often be extracted from data. Numeric summary statistics and probability distributions provide a form of quantitative scaffolding for models of random as well as nonrandom variation. Simple regression models foreshadow the issues that arise in the more complex models considered later in the book. Frequentist and Bayesian approaches to statistical inference are contrasted, the latter primarily using the Bayes Factor to complement the limited perspective that p-values offer. Akaike Information Criterion (AIC) and related "information" statistics provide a further perspective. Resampling methods, where the one available dataset is used to provide an empirical substitute for a theoretical distribution, are introduced. Remaining topics are of a more general nature. RStudio is one of several tools that can help in organizing and managing work. The checks provided by independent replication at another time and place are an indispensable complement to statistical analysis. Questions of data quality, of relevance to the questions asked, of the processes that generated the data, and of generalization, remain just as important for machine learning and other new analysis approaches as for more classical methods.
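As a minimal illustration of the resampling idea (ours, not an example from the book, and in Python rather than the R environment the book works in), the bootstrap below uses the one available sample as an empirical substitute for the theoretical sampling distribution of the mean.

```python
# Minimal bootstrap sketch (illustrative, not from the book, and in Python
# rather than the book's R): resample the one available dataset with
# replacement to approximate the sampling distribution of the mean.
import numpy as np

rng = np.random.default_rng(4)
data = rng.exponential(scale=2.0, size=40)        # hypothetical sample

boot_means = np.array([
    rng.choice(data, size=data.size, replace=True).mean()
    for _ in range(5_000)
])

# Percentile bootstrap interval, an empirical substitute for a theoretical
# sampling distribution of the mean.
low, high = np.percentile(boot_means, [2.5, 97.5])
print(f"sample mean {data.mean():.2f}, 95% bootstrap interval ({low:.2f}, {high:.2f})")
```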
Multiple linear regression generalizes straight-line regression to allow multiple explanatory (or predictor) variables, treated in this chapter under the assumption of normal errors. The focus may be on accurate prediction. Or it may, alternatively or additionally, be on the regression coefficients themselves. Simplistic interpretations of coefficients can be grossly misleading. Later chapters elaborate on the ideas and methods developed in this chapter, applying them in new contexts. The attaching of causal interpretations to model coefficients must be justified both by reference to subject-area knowledge and by careful checks to ensure that they are not artefacts of the correlation structure. Attention is given to regression diagnostics and to the assessment and comparison of models. Variable selection strategies can readily over-fit, hence the importance of training/test approaches and cross-validation. The potential is demonstrated for errors in x to seriously bias regression coefficients. Strong multicollinearity leads to large variance inflation factors.
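The bias from errors in x can be demonstrated in a few lines; the sketch below is illustrative only (hypothetical data, and Python rather than the book's R) and shows the classical attenuation of a fitted slope when the predictor is measured with error.

```python
# Illustrative sketch (not from the book): measurement error in x attenuates
# the fitted slope towards zero. The true slope is 2; the noisy-x fit recovers
# roughly 2 * var(x) / (var(x) + var(error)). All settings are hypothetical.
import numpy as np

rng = np.random.default_rng(5)
n = 5_000
x_true = rng.normal(size=n)                        # var = 1
y = 2.0 * x_true + rng.normal(scale=0.5, size=n)

x_noisy = x_true + rng.normal(scale=1.0, size=n)   # measurement error, var = 1

slope_true = np.polyfit(x_true, y, 1)[0]
slope_noisy = np.polyfit(x_noisy, y, 1)[0]
print(f"slope with true x:  {slope_true:.2f}")   # ~2.0
print(f"slope with noisy x: {slope_noisy:.2f}")  # ~1.0, attenuated by factor 1/(1+1)
```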
This chapter moves from regression to methods that focus on the pattern presented by multiple variables, albeit with applications in regression analysis. A strong focus is on finding patterns that invite further investigation, and/or on replacing many variables with a much smaller number that capture the important structure in the data. Methodologies discussed include principal components analysis and, more generally, multidimensional scaling; cluster analysis (the exploratory process that groups “alike” observations) and dendrogram construction; and discriminant analysis. Two sections discuss issues for the analysis of data, such as from high-throughput genomics, where the aim is to determine which of perhaps thousands or tens of thousands of variables are shifted in value between groups in the data. A treatment of the role of balance and matching in making inferences from observational data then follows. The chapter ends with a brief introduction to methods for multiple imputation, which use multivariate relationships to fill in missing values in observations that are incomplete, allowing them to have at least some role in a regression or other further analysis.
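The sketch below (illustrative only, in Python rather than the book's R, with hypothetical two-group data) runs two of the workhorse methods named above: principal components analysis via the singular value decomposition, and hierarchical clustering of the kind summarised by a dendrogram.

```python
# Illustrative sketch (not from the book): principal components analysis and
# hierarchical (dendrogram-style) clustering on hypothetical data with two groups.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(6)
X = np.vstack([rng.normal(loc=0.0, size=(20, 5)),
               rng.normal(loc=3.0, size=(20, 5))])

# PCA via the SVD of the centred data: U * s gives the component scores.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
print("share of variance on PC1:", round(float(s[0]**2 / (s**2).sum()), 2))

# Ward linkage builds the dendrogram; cutting it at two clusters recovers the groups.
Z = linkage(X, method="ward")
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)  # first 20 observations in one cluster, last 20 in the other
```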