People often probability match: they select choices based on the probability of outcomes. For example, when predicting 10 individual results of a spinner with 7 green and 3 purple sections, many people choose green mostly but not always, even though they would be better off always choosing it (i.e., maximizing). This behavior has perplexed cognitive scientists for decades. Why do people make such an obvious error? Here, we provide evidence that this difficulty may often arise from statistical naïveté: Even when shown the optimal strategy of maximizing, many people fail to recognize that it will produce better payouts than other strategies. In 3 preregistered experiments (N = 907 Americans tested online), participants made 10 choices in a spinner game and estimated the payout for each of 3 strategies: probability matching, maximizing, and 50/50 guessing. The key finding across experiments is that while most maximizers recognize that maximizing results in higher payouts than matching, probability matchers predict similar payouts for each.
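As a rough illustration of the payout arithmetic at stake, the expected number of correct predictions under each strategy can be computed directly; the sketch below assumes a 0.7/0.3 spinner, 10 trials and one point per correct prediction (illustrative values, not the experiments' actual payoffs).

```python
# Expected number of correct predictions over 10 spins of a
# 7-green / 3-purple spinner (illustrative payout arithmetic).
p_green, n_trials = 0.7, 10

# Maximizing: always predict green.
maximizing = n_trials * p_green                              # 7.0

# Probability matching: predict green 70% of the time, purple 30%.
matching = n_trials * (p_green**2 + (1 - p_green)**2)        # 5.8

# 50/50 guessing.
guessing = n_trials * 0.5                                    # 5.0

print(f"maximizing: {maximizing:.1f}, matching: {matching:.1f}, guessing: {guessing:.1f}")
```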
In this chapter we start by reviewing the different types of inference procedures: frequentist, Bayesian, parametric and non-parametric. We introduce notation by providing a list of the probability distributions that will be used later on, together with their first two moments. We review some results on conditional moments and work through several examples. We review definitions of stochastic processes, stationary processes and Markov processes, and finish by introducing the most common discrete-time stochastic processes that show dependence in time and space.
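As an illustrative example of the conditional-moment results such a review typically covers (ours, not necessarily the chapter's), the laws of total expectation and variance state that

```latex
\mathbb{E}[Y] = \mathbb{E}\big[\mathbb{E}[Y \mid \Lambda]\big],
\qquad
\operatorname{Var}(Y) = \mathbb{E}\big[\operatorname{Var}(Y \mid \Lambda)\big]
                      + \operatorname{Var}\big(\mathbb{E}[Y \mid \Lambda]\big).
```

For instance, if Y | Λ ~ Poisson(Λ) and Λ ~ Gamma(α, β) with rate β, these give E[Y] = α/β and Var(Y) = α/β + α/β², i.e. overdispersion relative to the Poisson.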
Confidence intervals are ubiquitous in the presentation of social science models, data, and effects. When several intervals are plotted together, one natural inclination is to ask whether the estimates represented by those intervals are significantly different from each other. Unfortunately, there is no general rule or procedure that would allow us to answer this question from the confidence intervals alone. It is well known that using the overlaps in 95% confidence intervals to perform significance tests at the 0.05 level does not work. Recent scholarship has developed and refined a set of tools for inferential confidence intervals that permit inference on confidence intervals with the appropriate type I error rate in many different bivariate contexts. These are all based on the same underlying idea of identifying the multiple of the standard error (i.e., a new confidence level) such that the overlap in confidence intervals matches the desired type I error rate. These procedures remain stymied by multiple simultaneous comparisons. We propose an entirely new procedure for developing inferential confidence intervals that decouples testing from visualization and can overcome many of these problems in any visual testing scenario. We provide software in R and Stata to accomplish this goal.
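One way to read the "multiple of the standard error" idea described above, for two independent estimates, is sketched below: the half-widths are rescaled so that the intervals just fail to overlap exactly when the usual two-sample z-test rejects at level alpha. The function name and the independence assumption are ours, for illustration only.

```python
import numpy as np
from scipy.stats import norm

def inferential_cis(b1, se1, b2, se2, alpha=0.05):
    """Rescale CI half-widths so that non-overlap corresponds to a
    level-alpha z-test of b1 == b2 (independent estimates assumed)."""
    z = norm.ppf(1 - alpha / 2)
    # Choose gamma so that gamma*(se1 + se2) == z*sqrt(se1**2 + se2**2).
    gamma = z * np.sqrt(se1**2 + se2**2) / (se1 + se2)
    return ((b1 - gamma * se1, b1 + gamma * se1),
            (b2 - gamma * se2, b2 + gamma * se2))

ci1, ci2 = inferential_cis(b1=0.50, se1=0.10, b2=0.20, se2=0.12)
print(ci1, ci2)  # these intervals overlap iff |b1 - b2| < z*sqrt(se1^2 + se2^2)
```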
When analyzing data, researchers make some choices that are either arbitrary, based on subjective beliefs about the data-generating process, or for which equally justifiable alternative choices could have been made. This wide range of data-analytic choices can be abused and has been one of the underlying causes of the replication crisis in several fields. The recent introduction of multiverse analysis provides researchers with a method to evaluate the stability of results across the reasonable choices that could be made when analyzing data. However, multiverse analysis is confined to a descriptive role and lacks a proper and comprehensive inferential procedure. Specification curve analysis adds an inferential procedure to multiverse analysis, but this approach is limited to simple cases related to the linear model, and it only allows researchers to infer whether at least one specification rejects the null hypothesis, not which specifications should be selected. In this paper, we present a Post-selection Inference approach to Multiverse Analysis (PIMA), a flexible and general inferential approach that accounts for all possible models, i.e., the multiverse of reasonable analyses. The approach allows for a wide range of data specifications (i.e., preprocessing choices) and any generalized linear model; it tests the null hypothesis that a given predictor is not associated with the outcome by combining information from all reasonable models of the multiverse analysis, and it provides strong control of the family-wise error rate, allowing researchers to claim that the null hypothesis can be rejected for any specification that shows a significant effect. The inferential proposal is based on a conditional resampling procedure. We formally prove that the Type I error rate is controlled, and we compute the statistical power of the test through a simulation study. Finally, we apply the PIMA procedure to the analysis of a real dataset on self-reported hesitancy for the COronaVIrus Disease 2019 (COVID-19) vaccine before and after the 2020 lockdown in Italy. We conclude with practical recommendations to be considered when implementing the proposed procedure.
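A much-simplified sketch of the family-wise error control idea (not the authors' exact conditional resampling procedure) is to compare each specification's test statistic with the permutation distribution of the maximum statistic across the whole multiverse; everything below (the set of outcome preprocessings, the permutation scheme) is illustrative.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

def multiverse_max_t(y_variants, x, n_perm=1000):
    """Max-statistic permutation test across a multiverse of preprocessed
    outcomes; a simplified stand-in for PIMA's conditional resampling."""
    def abs_t(x_vec):
        return np.array([abs(sm.OLS(y, sm.add_constant(x_vec)).fit().tvalues[1])
                         for y in y_variants])

    observed = abs_t(x)
    # Null distribution of the maximum |t| over specifications,
    # obtained by permuting the predictor.
    max_null = np.array([abs_t(rng.permutation(x)).max() for _ in range(n_perm)])
    # Adjusted p-values: each specification is compared with the max-|t| null.
    return [(1 + np.sum(max_null >= t)) / (1 + n_perm) for t in observed]
```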
Bringing together idiomatic Python programming, foundational numerical methods, and physics applications, this is an ideal standalone textbook for courses on computational physics. All the frequently used numerical methods in physics are explained, including foundational techniques and hidden gems on topics such as linear algebra, differential equations, root-finding, interpolation, and integration. The second edition of this introductory book features several new codes and 140 new problems (many on physics applications), as well as new sections on the singular-value decomposition, derivative-free optimization, Bayesian linear regression, neural networks, and partial differential equations. The last section in each chapter is an in-depth project, tackling physics problems that cannot be solved without the use of a computer. Written primarily for students studying computational physics, this textbook brings the non-specialist quickly up to speed with Python before looking in detail at the numerical methods often used in the subject.
A major concern in the social sciences is understanding and explaining the relationship between two variables. We showed in Chapter 5 how to address this issue using tabular presentations. In this chapter we show how to address the issue statistically via regression and correlation. We first cover the two concepts of regression and correlation. We then turn to the issue of statistical inference and ways of evaluating the statistical significance of our results. Since most social science research is undertaken using sample data, we need to determine whether the regression and correlation coefficients we calculate using the sample data are statistically significant in the larger population from which the sample data were drawn.
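For a concrete (hypothetical) example of estimating a regression slope and a correlation and assessing their significance from sample data:

```python
from scipy.stats import linregress

# Hypothetical sample: years of education (x) and income in $1,000s (y).
x = [10, 12, 12, 14, 16, 16, 18, 20]
y = [28, 34, 31, 40, 45, 48, 52, 60]

res = linregress(x, y)
print(f"slope = {res.slope:.2f}, r = {res.rvalue:.2f}, p = {res.pvalue:.4f}")
# A small p-value indicates the sample slope would be unlikely if the
# slope in the larger population were zero.
```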
Community detection is one of the most important methodological fields of network science, and one which has attracted a significant amount of attention over the past decades. This area deals with the automated division of a network into fundamental building blocks, with the objective of providing a summary of its large-scale structure. Despite its importance and widespread adoption, there is a noticeable gap between what is arguably the state-of-the-art and the methods which are actually used in practice in a variety of fields. The Elements attempts to address this discrepancy by dividing existing methods according to whether they have a 'descriptive' or an 'inferential' goal. While descriptive methods find patterns in networks based on context-dependent notions of community structure, inferential methods articulate a precise generative model, and attempt to fit it to data. In this way, they are able to provide insights into formation mechanisms and separate structure from noise. This title is also available as open access on Cambridge Core.
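As a rough illustration of the descriptive/inferential distinction (our example, not from the Element), the snippet below finds communities descriptively by modularity maximization; an inferential alternative would fit a generative model such as a stochastic block model, for which a dedicated library is needed.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

G = nx.karate_club_graph()

# Descriptive: maximize modularity, a context-dependent notion of community.
communities = greedy_modularity_communities(G)
print([sorted(c) for c in communities])

# Inferential: fit a generative model (e.g. a stochastic block model) and let
# model selection separate structure from noise. networkx has no SBM fitter;
# one option (assumed available) is graph-tool:
#   import graph_tool.all as gt
#   state = gt.minimize_blockmodel_dl(g)
```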
From observed data, statistical inference infers the properties of the underlying probability distribution. For hypothesis testing, the t-test and some non-parametric alternatives are covered. Ways to infer confidence intervals and estimate goodness of fit are followed by the F-test (for comparing variances) and the Mann-Kendall trend test. Bootstrap sampling and field significance are also covered.
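A minimal sketch of two of these tools on hypothetical data (a two-sample t-test and a percentile bootstrap confidence interval for the difference in means):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, 30)   # hypothetical sample A
b = rng.normal(0.5, 1.0, 30)   # hypothetical sample B

# Two-sample t-test for a difference in means.
t, p = stats.ttest_ind(a, b)

# Percentile bootstrap CI for the difference in means.
diffs = [rng.choice(a, a.size).mean() - rng.choice(b, b.size).mean()
         for _ in range(5000)]
ci = np.percentile(diffs, [2.5, 97.5])
print(f"t = {t:.2f}, p = {p:.3f}, 95% bootstrap CI = [{ci[0]:.2f}, {ci[1]:.2f}]")
```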
A link is made between epistemology – that is to say, the philosophy of knowledge – and statistics. Hume's criticism of induction is covered, as is Popper's. Various philosophies of statistics are described.
Katz, King, and Rosenblatt (2020, American Political Science Review 114, 164–178) introduces a theoretical framework for understanding redistricting and electoral systems, built on basic statistical and social science principles of inference. DeFord et al. (2021, Political Analysis, this issue) instead focuses solely on descriptive measures, which lead to the problems identified in our article. In this article, we illustrate the essential role of these basic principles and then offer statistical, mathematical, and substantive corrections required to apply DeFord et al.’s calculations to social science questions of interest, while also showing how to easily resolve all claimed paradoxes and problems. We are grateful to the authors for their interest in our work and for this opportunity to clarify these principles and our theoretical framework.
This chapter focuses on critical infrastructures in the power grid, which often rely on Industrial Control Systems (ICS) to operate and are exposed to vulnerabilities ranging from physical damage to the injection of information that appears to be consistent with industrial control protocols. In this way, infiltration of the firewalls protecting the perimeter of the control network becomes a significant threat. The goal of this chapter is to review identification and intrusion detection algorithms for protecting the power grid, based on knowledge of the expected behavior of the system.
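A generic sketch of behavior-based detection of the kind referred to above (ours, not one of the specific algorithms reviewed in the chapter): flag a measurement vector whose residuals against the expected system behavior exceed a chi-square threshold.

```python
import numpy as np
from scipy.stats import chi2

def flag_anomaly(measured, expected, sigma, alpha=0.01):
    """Flag measurements whose normalized squared residuals against the
    model-predicted behavior exceed a chi-square threshold (generic sketch)."""
    r = (np.asarray(measured) - np.asarray(expected)) / np.asarray(sigma)
    stat = float(np.sum(r ** 2))
    threshold = chi2.ppf(1 - alpha, df=len(r))
    return stat > threshold, stat, threshold

print(flag_anomaly(measured=[1.02, 0.97, 2.40],
                   expected=[1.00, 1.00, 1.00],
                   sigma=[0.05, 0.05, 0.05]))
```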
We develop a model that successfully learns social and organizational human network structure using ambient sensing data from distributed plug load energy sensors in commercial buildings. A key goal for the design and operation of commercial buildings is to support the success of organizations within them. In modern workspaces, a particularly important goal is collaboration, which relies on physical interactions among individuals. Learning the true socio-organizational relational ties among workers can therefore help managers of buildings and organizations make decisions that improve collaboration. In this paper, we introduce the Interaction Model, a method for inferring human network structure that leverages data from distributed plug load energy sensors. In a case study, we benchmark our method against network data obtained through a survey and compare its performance to other data-driven tools. We find that, unlike previous methods, our method infers a network that is correlated with the survey network to a statistically significant degree (graph correlation of 0.46, significant at the 0.01 level). We additionally find that our method requires only 10 weeks of sensing data, enabling dynamic network measurement. Learning human network structure through data-driven means can enable the design and operation of spaces that encourage, rather than inhibit, the success of organizations.
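The reported graph correlation can be computed in more than one way; one simple version (our illustration, not necessarily the paper's exact procedure) correlates the off-diagonal entries of the inferred and survey adjacency matrices and assesses significance with a QAP-style permutation of node labels.

```python
import numpy as np

rng = np.random.default_rng(2)

def graph_correlation(A, B):
    """Pearson correlation between off-diagonal adjacency entries."""
    mask = ~np.eye(A.shape[0], dtype=bool)
    return np.corrcoef(A[mask], B[mask])[0, 1]

def qap_test(A, B, n_perm=2000):
    """QAP-style test: permute node labels of B and recompute the correlation."""
    obs = graph_correlation(A, B)
    exceed = sum(graph_correlation(A, B[np.ix_(p, p)]) >= obs
                 for p in (rng.permutation(A.shape[0]) for _ in range(n_perm)))
    return obs, (1 + exceed) / (1 + n_perm)
```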
Quantitative comparative social scientists have long worried about the performance of multilevel models when the number of upper-level units is small. Adding to these concerns, an influential Monte Carlo study by Stegmueller (2013) suggests that standard maximum-likelihood (ML) methods yield biased point estimates and severely anti-conservative inference with few upper-level units. In this article, the authors seek to rectify this negative assessment. First, they show that ML estimators of coefficients are unbiased in linear multilevel models. The apparent bias in coefficient estimates found by Stegmueller can be attributed to Monte Carlo error and a flaw in the design of his simulation study. Second, they demonstrate how inferential problems can be overcome by using restricted ML estimators for variance parameters and a t-distribution with appropriate degrees of freedom for statistical inference. Thus, accurate multilevel analysis is possible within the framework that most practitioners are familiar with, even if there are only a few upper-level units.
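A minimal sketch of that recipe on synthetic two-level data, assuming statsmodels for the REML fit and a t reference distribution whose degrees of freedom equal the number of groups minus the number of level-2 predictors minus one; the data, column names, and the specific degrees-of-freedom rule as coded here are illustrative.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy.stats import t as t_dist

# Synthetic two-level data: 15 countries, 50 respondents each (illustrative).
rng = np.random.default_rng(3)
n_groups, n_per = 15, 50
x_country = rng.normal(size=n_groups)            # level-2 predictor
u = rng.normal(scale=0.5, size=n_groups)         # random intercepts
data = pd.DataFrame({
    "country": np.repeat(np.arange(n_groups), n_per),
    "x_country": np.repeat(x_country, n_per),
})
data["y"] = 0.3 * data["x_country"] + u[data["country"]] + rng.normal(size=len(data))

# Random-intercept model fit by REML for the variance parameters.
fit = smf.mixedlm("y ~ x_country", data=data, groups=data["country"]).fit(reml=True)
coef, se = fit.params["x_country"], fit.bse["x_country"]
dof = n_groups - 1 - 1                           # groups - level-2 predictors - 1
p = 2 * t_dist.sf(abs(coef / se), dof)
print(f"coef = {coef:.3f}, se = {se:.3f}, t-based p (df = {dof}) = {p:.3f}")
```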
Intensive week-long Summer Schools in Statistics for Astronomers were initiated at Penn State in 2005 and have been continued annually. Due to their popularity and high demand, additional full summer schools have been organized in India, in Brazil, and at the Space Telescope Science Institute.
The Summer Schools seek to give a broad exposure to fundamental concepts and a wide range of resulting methods across many fields of statistics. The Summer Schools in statistics and data analysis for young astronomers present concepts and methodologies, with hands-on tutorials using data from astronomical surveys.
In this paper, we use queuing theory to model the number of insured households in an insurance portfolio. The model is based on an idea from Boucher and Couture-Piché (2015), who use a queuing theory model to estimate the number of insured cars on an insurance contract. Similarly, the proposed model includes households already insured, but the modeling approach is modified to include new households that could be added to the portfolio. For each household, we also use the queuing theory model to estimate the number of insured cars. We analyze an insurance portfolio from a Canadian insurance company to support this discussion. Statistical inference techniques serve to estimate each parameter of the model, even in cases where some explanatory variables are included in each of these parameters. We show that the proposed model offers a reasonable approximation of what is observed, but we also highlight the situations where the model should be improved. By assuming that the insurance company makes a $1 profit for each one-year car exposure, the proposed approach allows us to determine a global value of the insurance portfolio of an insurer based on the customer equity concept.
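A toy birth-death simulation (ours) conveys the queuing idea: cars are added to a household at one rate and each insured car leaves at another, and the accumulated car-years of exposure translate into portfolio value under the $1-per-car-year assumption. All rates below are arbitrary placeholders, not estimates from the paper.

```python
import numpy as np

rng = np.random.default_rng(4)

def household_exposure(years=10.0, add_rate=0.3, drop_rate=0.2, start=1):
    """Simulate a birth-death process for the number of insured cars in one
    household and return the total car-years of exposure (toy parameters)."""
    t, n, exposure = 0.0, start, 0.0
    while t < years:
        total_rate = add_rate + n * drop_rate
        dt = min(rng.exponential(1.0 / total_rate), years - t)
        exposure += n * dt
        t += dt
        if t < years:                      # an event occurred before the horizon
            n += 1 if rng.random() < add_rate / total_rate else -1
    return exposure

# Portfolio value at $1 profit per car-year, over 10,000 simulated households.
value = sum(household_exposure() for _ in range(10_000))
print(f"estimated portfolio value ≈ ${value:,.0f}")
```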
Visual displays of data in the parasitology literature are often presented in a way that is not very informative about the distribution of the data. One example is the simple bar chart with half an error bar on top, used to display the distribution of parasitaemia and biomarkers of host immunity. Such displays obscure the shape of the data distribution by showing too few statistical measures to convey the spread of the data and by relying on measures that are influenced by skewness and outliers. We describe more informative, yet simple, visual representations of the data distribution commonly used in statistics and provide guidance on the display of estimates of population parameters (e.g. the population mean) and measures of precision (e.g. the 95% confidence interval) for statistical inference. In this article we focus on visual displays for numerical data and demonstrate such displays using an example dataset consisting of total IgG titres in response to three Plasmodium blood antigens measured in pregnant women, together with parasitaemia measurements from the same study. This tutorial aims to highlight the importance of displaying the data distribution appropriately and the role such displays have in selecting statistics to summarize the distribution and to perform statistical inference.
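A minimal matplotlib sketch of the kind of display recommended here, using simulated (not real) titre data: plot every observation, a boxplot for the distribution, and the mean with a 95% confidence interval for inference, rather than a bar with half an error bar.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(5)
# Simulated, skewed antibody-titre data for three antigens (illustrative only).
groups = {f"Antigen {i}": rng.lognormal(mean=0.3 * i, sigma=0.8, size=60)
          for i in (1, 2, 3)}

fig, ax = plt.subplots()
for k, vals in enumerate(groups.values(), start=1):
    # Raw observations, jittered horizontally so spread and shape are visible.
    ax.scatter(k + rng.uniform(-0.15, 0.15, vals.size), vals, s=10, alpha=0.4)
    # Sample mean with an approximate 95% confidence interval for inference.
    mean, half = vals.mean(), 1.96 * vals.std(ddof=1) / np.sqrt(vals.size)
    ax.errorbar(k + 0.3, mean, yerr=half, fmt="o", color="black", capsize=3)
ax.boxplot(list(groups.values()), positions=range(1, len(groups) + 1),
           widths=0.5, showfliers=False)
ax.set_xticks(range(1, len(groups) + 1))
ax.set_xticklabels(list(groups.keys()))
ax.set_ylabel("Total IgG titre (arbitrary units)")
plt.show()
```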
There is burgeoning interest in predicting road development because of the wide-ranging and important socioeconomic and environmental issues that roads present, including the close links between road development, deforestation and biodiversity loss. This is especially the case in developing nations rich in natural resources, where road development is rapid and often not centrally managed. Characterization of large-scale spatio-temporal patterns in road network development has been greatly overlooked to date. This paper examines the spatio-temporal dynamics of road density across the Brazilian Amazon and assesses the relative contributions of local versus neighbourhood effects to temporal changes in road density at regional scales. To achieve this, a combination of statistical analyses and model-data fusion techniques inspired by studies of the spatio-temporal dynamics of populations in ecology and epidemiology was used. The emergent development may be approximated by local growth that is logistic through time combined with directional dispersal. The current rates and dominant direction of development may be inferred by assuming that roads develop at a rate of 55 km per year. Large areas of the Amazon will be subject to extensive anthropogenic change should the observed patterns of road development continue.
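A schematic grid update (ours) shows the model structure being described: logistic local growth in road density plus dispersal biased in a dominant direction. Parameter values are arbitrary placeholders, not the paper's estimates.

```python
import numpy as np

def annual_step(density, r=0.1, K=1.0, d=0.05):
    """One annual update of road density on a grid: logistic local growth plus
    dispersal biased toward the east (schematic; parameters are placeholders)."""
    growth = r * density * (1 - density / K)
    from_west = np.roll(density, shift=1, axis=1)   # density arriving from the west
    dispersal = d * (from_west - density)
    return np.clip(density + growth + dispersal, 0.0, K)

density = np.zeros((50, 50))
density[25, 0] = 0.5            # seed of high road density on the western edge
for _ in range(30):
    density = annual_step(density)
print(density[25, :10].round(2))  # the front advances eastward along this row
```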