This chapter considers induction, deduction and abduction as methods of obtaining scientific knowledge. As in earlier chapters, the introductory section ends by highlighting that there is no single scientific method; it refers to claims that scientific reasoning relies on various heuristics or rules of thumb, shaped by the specific approach taken and the background information available, and that these can introduce errors of reasoning: by being aware of the potential for such errors, we are better able to guard against them. The bulk of the chapter then examines specific logical fallacies, using neuroscience examples to illustrate them. These include ad hoc reasoning; begging the question; confusing correlation with causation; confirmation and disconfirmation biases; false dichotomies; false metaphors; appeals to authority, tradition and emotion; the mereological fallacy; the naturalistic fallacy; and straw man arguments.
When overdispersion and correlation co-occur in longitudinal count data, as is often the case, an analysis method that can handle both phenomena simultaneously is needed. The correlated Poisson distribution (CPD) proposed by Drezner and Farnum (Communications in Statistics-Theory and Methods, 22(11), 3051–3063, 1994) is a generalization of the classical Poisson distribution that incorporates an additional parameter allowing dependence between successive observations of the phenomenon under study. This parameter both measures the correlation and reflects the degree of dispersion. The classical Poisson distribution is obtained as a special case when the correlation is zero. We present an in-depth review of the CPD and discuss some methods for estimating the distribution parameters. Regression components are incorporated by allowing one of the parameters to depend on available covariate information, in this case concerning automobile insurance policyholders. The proposed distribution can be viewed as an alternative to the Poisson, negative binomial, and Poisson-inverse Gaussian approaches. We then describe applications of the distribution, suggest it is appropriate for modeling the number of claims in an automobile insurance portfolio, and establish some new distribution properties.
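The sketch below does not reproduce the Drezner–Farnum CPD itself (its probability mass function is not given here); it only illustrates, with a hypothetical shared-frailty construction, the two phenomena the CPD is designed to capture at once, namely overdispersion and correlation between successive claim counts.

```python
import numpy as np

# Hypothetical illustration (not the Drezner-Farnum CPD): a shared gamma
# frailty per policyholder makes yearly claim counts both overdispersed
# (variance > mean) and correlated across years.
rng = np.random.default_rng(0)
n_policies, base_rate = 50_000, 0.3

frailty = rng.gamma(shape=2.0, scale=0.5, size=n_policies)          # mean 1
counts = rng.poisson(base_rate * frailty[:, None], size=(n_policies, 2))

print(f"mean {counts[:, 0].mean():.3f} vs variance {counts[:, 0].var():.3f}")
print(f"correlation between successive years: "
      f"{np.corrcoef(counts[:, 0], counts[:, 1])[0, 1]:.3f}")
```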
In Chapter 1, we estimated the correlations of linear approximations by finding a suitable linear trail and applying the piling-up lemma, but this approach relied on an unjustified independence assumption. This chapter puts the piling-up lemma and linear cryptanalysis in general on a more solid theoretical foundation. This is achieved by using the theory of correlation matrices. Daemen proposed these matrices in 1994 to simplify the description of linear cryptanalysis.
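As a concrete illustration of the piling-up lemma referred to above, the following sketch checks numerically that, for independent binary variables, the correlation of their XOR equals the product of their individual correlations; the probabilities used are arbitrary.

```python
import numpy as np

# Piling-up lemma check: for independent bits x1, x2 with correlations
# c_i = 2*Pr(x_i = 0) - 1, the correlation of x1 XOR x2 is c1 * c2.
rng = np.random.default_rng(1)
n = 1_000_000
p1, p2 = 0.7, 0.6                            # Pr(x_i = 0), arbitrary values
x1 = (rng.random(n) >= p1).astype(int)       # equals 0 with probability p1
x2 = (rng.random(n) >= p2).astype(int)

predicted = (2 * p1 - 1) * (2 * p2 - 1)
empirical = 2 * np.mean((x1 ^ x2) == 0) - 1
print(f"predicted {predicted:.4f}, empirical {empirical:.4f}")
```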
The aim of this study was to identify potential risk factors for acute and late genitourinary toxicities and to determine, using a logistic regression model, which of these factors are also significant and robust predictors of these toxicities.
Methods:
We conducted a retrospective study analysing patient records and treatment plans from 2013 to 2021. The cohort comprised 46 patients with clinically staged cT1c-T4N0-1M0 prostate adenocarcinoma treated with three-dimensional conformal radiotherapy (3D-CRT) at doses ranging from 66 to 80 Gy. Post-radiotherapy genitourinary (GU) toxicities were classified and graded according to the Common Terminology Criteria for Adverse Events (CTCAE v4.0).
Results:
Median follow-up was 57·5 months (range: 39–88 months). In univariable analysis, patient age (p = 0·040), prostate volume (p = 0·0423), the clinical prostate volume irradiated at the prescribed dose (p = 0·029) and the volume of the bladder receiving doses between 60 and 70 Gy were correlated with acute GU toxicities. Arterial hypertension (p = 0·022), some pre-existing urinary symptoms, a history of catheterisation (p = 0·044) and acute GU toxicity (p = 0·009) were linked to late GU toxicities. The logistic regression model found that prostate volume (p = 0·0423) and the clinical prostate volume irradiated at the prescribed dose (p = 0·029) were predictive of acute GU toxicity. Hypertension (p = 0·039) and acute toxicities were predictive of late GU toxicity.
Conclusion:
The results of our study showed that it is essential to identify patients at risk of toxicities from the start of radiotherapy and to offer more proactive monitoring and management.
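A minimal sketch of the kind of binary logistic model described in this abstract is given below. The data, variable names and coefficients are entirely hypothetical stand-ins (the study's records are not reproduced here); the sketch only shows the model form, with acute GU toxicity regressed on prostate volume and the volume irradiated at the prescribed dose.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical sketch: logistic regression of acute GU toxicity (0/1) on
# prostate volume and the volume irradiated at the prescribed dose.
# All numbers below are simulated, not the study's data.
rng = np.random.default_rng(2)
n = 46
prostate_vol = rng.normal(50, 15, n)        # cm^3, hypothetical
irradiated_vol = rng.normal(80, 20, n)      # cm^3, hypothetical
logit = -6 + 0.05 * prostate_vol + 0.03 * irradiated_vol
toxicity = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X = sm.add_constant(np.column_stack([prostate_vol, irradiated_vol]))
fit = sm.Logit(toxicity, X).fit(disp=False)
print(fit.params)    # order: intercept, prostate_vol, irradiated_vol
print(fit.pvalues)
```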
From this point on in the textbook, the student researcher has finished collecting data for the study and is performing the data analysis. In this chapter, students learn how to clean and screen their data and how to check the relationships between independent variables (IVs) and the dependent variable (DV). Basic statistical calculations (e.g., mean, standard deviation, normal distribution) are reviewed and applied. How to create survey factors (e.g., by calculating the total or mean of a subset of survey items) is reviewed. Instructions for calculating Pearson r among the hypotheses’ variables are provided, along with reasoning (and warnings) for using correlations to investigate relationships among the data. Step-by-step instructions are provided for both SPSS and R.
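The chapter's step-by-step instructions are for SPSS and R; as an analogous sketch only, the snippet below builds a survey factor as the mean of an item subset and computes a Pearson r in Python, with made-up item responses.

```python
import pandas as pd
from scipy import stats

# Analogous Python sketch (the chapter itself walks through SPSS and R):
# build a factor score as the mean of an item subset, then correlate it
# with the dependent variable. Values are made up for illustration.
df = pd.DataFrame({
    "item1": [4, 3, 5, 2, 4, 5], "item2": [5, 3, 4, 2, 5, 4],
    "item3": [2, 2, 4, 1, 3, 5], "dv":    [3.2, 2.5, 4.1, 1.8, 3.9, 4.6],
})
df["iv_factor"] = df[["item1", "item2", "item3"]].mean(axis=1)

r, p = stats.pearsonr(df["iv_factor"], df["dv"])
print(f"Pearson r = {r:.3f}, p = {p:.3f}")
print(df[["iv_factor", "dv"]].corr())        # correlation matrix view
```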
This chapter provides a discussion of multivariate random variables, which are collections of univariate random variables. The chapter discusses how the presence of multiple random variables gives rise to the concepts of covariance and correlation, which capture relationships between variables. The chapter also discusses the multivariate Gaussian model, which is widely used in applications.
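A short sketch of these ideas: sampling a bivariate Gaussian with a chosen covariance matrix and recovering the covariance and correlation empirically (the numbers are illustrative only).

```python
import numpy as np

# Sketch: a bivariate Gaussian with a chosen covariance matrix; the sample
# covariance and correlation matrices recover the specified relationships.
rng = np.random.default_rng(3)
mean = [0.0, 0.0]
cov = [[2.0, 1.2],
       [1.2, 1.0]]          # implied correlation 1.2 / sqrt(2.0 * 1.0) ~ 0.85
samples = rng.multivariate_normal(mean, cov, size=100_000)

print("sample covariance:\n", np.cov(samples.T))
print("sample correlation:\n", np.corrcoef(samples.T))
```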
This chapter focuses on correlation, a key metric in data science that quantifies the extent to which two quantities are linearly related. We begin by defining correlation between normalized and centered random variables. Then, we generalize the definition to all random variables and introduce the concept of covariance, which measures the average joint variation of two random variables. Next, we explain how to estimate correlation from data and analyze the correlation between the height of NBA players and different basketball stats. In addition, we study the connection between correlation and simple linear regression. We then discuss the differences between uncorrelatedness and independence. To gain better intuition about the properties of correlation, we provide a geometric interpretation in which the covariance is an inner product between random variables. Finally, we show that correlation does not imply causation, as illustrated by the spurious correlation between temperature and unemployment in Spain.
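One link mentioned above, between correlation and simple linear regression, can be stated compactly: the least-squares slope equals r·(s_y/s_x). The sketch below checks this on simulated data standing in for the height-versus-stats example.

```python
import numpy as np

# The least-squares slope equals r * (std(y) / std(x)); simulated data
# stand in for the height-versus-basketball-stats example.
rng = np.random.default_rng(4)
x = rng.normal(200, 8, 500)                  # e.g. player height in cm
y = 0.4 * x + rng.normal(0, 5, 500)          # some linearly related stat

r = np.corrcoef(x, y)[0, 1]
slope_from_r = r * y.std() / x.std()
slope_ols = np.polyfit(x, y, 1)[0]
print(f"r = {r:.3f}, r*sy/sx = {slope_from_r:.3f}, OLS slope = {slope_ols:.3f}")
```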
We rely heavily on cut-off points of brief measures of psychological distress in research and clinical practice to identify those at risk of mental health conditions; however, few studies have compared the performance of different scales.
Aim
To determine the extent to which the child- and parent-report Strengths and Difficulties Questionnaire (SDQ), Revised Children’s Anxiety and Depression Scale (RCADS), short Mood and Feelings Questionnaire (sMFQ) and child-report KIDSCREEN correlated and identified the same respondents above cut-off points and at risk of mental health conditions.
Method
A cross-sectional survey was conducted among 231 children aged 11–16 years and 289 parents, who completed all the above measures, including the SDQ, RCADS and sMFQ, administered via a mobile app, MyJournE.
Results
The psychopathology measures identified similar proportions of young people as above the cut-off point and at risk of depression (child report 14.7% RCADS, 19.9% sMFQ; parent report 8.7% RCADS, 12.1% sMFQ), anxiety (child report 24.7% RCADS, 26.0% SDQ-Emotional subscale; parent report 20.1% RCADS, 26.0% SDQ-Emotional subscale) and child-report internalising problems (26.8% RCADS, 29.9% SDQ). Despite strong correlations between measures (0.77–0.84 on child report and 0.70–0.80 on parent report between the SDQ, sMFQ and RCADS) and the expected directions of correlation with the KIDSCREEN and SDQ subscales, kappa values indicated only moderate to substantial agreement between measures. Measures did not consistently identify the same children: of those scoring above the cut-off point for the SDQ-Emotional subscale, RCADS total or sMFQ, only half (n = 36, 46%) on child report and a third (n = 30, 37%) on parent report scored above the cut-off point on all of them. Only half (n = 46, 54%) of the children scored above the cut-off point on both the child-report SDQ-Internalising and RCADS total scales.
Conclusion
This study highlights the risk of using a screening test to ‘rule out’ potential psychopathology. Screening tests should not be used diagnostically and are best used together with broad assessment.
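The agreement statistic reported above can be illustrated with a small sketch: Cohen's kappa between two measures' above-cut-off flags, computed on made-up binary indicators rather than the study data.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

# Sketch: chance-corrected agreement between two measures' above-cut-off
# flags (1 = above cut-off). The indicators below are simulated, not the
# study's SDQ/RCADS data.
rng = np.random.default_rng(5)
at_risk = rng.random(231) < 0.25
flag_a = np.where(at_risk, rng.random(231) < 0.8, rng.random(231) < 0.1)
flag_b = np.where(at_risk, rng.random(231) < 0.8, rng.random(231) < 0.1)

kappa = cohen_kappa_score(flag_a.astype(int), flag_b.astype(int))
print(f"above cut-off: {flag_a.mean():.2f} vs {flag_b.mean():.2f}")
print(f"Cohen's kappa = {kappa:.2f}")
```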
The impact of macroparasites on their hosts is proportional to the number of parasites per host, or parasite abundance. Abundance values are count data, i.e. integers ranging from 0 to some maximum number, depending on the host–parasite system. When using parasite abundance as a predictor in statistical analysis, a common approach is to bin values, i.e. group hosts into infection categories based on abundance, and test for differences in some response variable (e.g. a host trait) among these categories. There are well-documented pitfalls associated with this approach. Here, I use a literature review to show that binning abundance values for analysis has been used in one-third of studies published in parasitological journals over the past 15 years, and in half of the studies in ecological and behavioural journals, often without any justification. Binning abundance data into arbitrary categories has been much more common among studies using experimental infections than among those using naturally infected hosts. I then use simulated data to demonstrate that true and significant relationships between parasite abundance and host traits can be missed when abundance values are binned for analysis, and, conversely, that when there is no underlying relationship between abundance and host traits, analysis of binned data can create a spurious one. This holds regardless of the prevalence of infection or the level of parasite aggregation in a host sample. These findings argue strongly that the practice of binning abundance data as a predictor variable should be abandoned in favour of more appropriate analytical approaches.
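A condensed sketch of the simulation idea described above: a host trait that genuinely depends on parasite abundance is analysed once with abundance as a continuous predictor and once after binning abundance into arbitrary infection categories. The distribution, effect size and bin boundaries are illustrative choices, not those of the paper.

```python
import numpy as np
from scipy import stats

# Sketch: host trait depends weakly on parasite abundance; compare a
# regression on raw counts with an ANOVA on arbitrarily binned categories.
rng = np.random.default_rng(6)
n = 100
abundance = rng.negative_binomial(1, 0.2, n)       # aggregated counts, many zeros
trait = 10 - 0.15 * abundance + rng.normal(0, 1.5, n)

slope, _, r, p_reg, _ = stats.linregress(abundance, trait)
bins = np.digitize(abundance, [1, 5])              # categories: 0, 1-4, >=5
groups = [trait[bins == b] for b in np.unique(bins)]
_, p_anova = stats.f_oneway(*groups)

print(f"raw abundance as predictor: p = {p_reg:.4f}")
print(f"binned abundance (ANOVA):   p = {p_anova:.4f}")
```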
We introduce the expectation of a random variable on a probability space, which is the best guess of its value in the least-squares sense. The variance quantifies the dispersion of its distribution. We recall the expressions for the expectation and variance of a weighted sum of variables. The moment generating function is a characterization of a distribution that is powerful for finding the distribution of a linear combination of variables. These concepts are then extended to pairs of variables, for which covariance and correlation can be defined. The Central Limit Theorem gives another interpretation of the expectation, which is also the asymptotic value taken by the average of variables having the same distribution. Finally, we introduce a special class of random variables called Radon–Nikodym derivatives, which are nonnegative and have unit expectation. This family of variables can be used to build new probability measures starting from a reference probability space. Switching probability measures modifies the distribution of the random variables at hand. These concepts are illustrated with various examples including coins, dice, and stock price models.
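As a small illustration of the expectation and averaging material with the dice example mentioned above: the average of many rolls concentrates around the expectation 3.5, with a spread predicted by the variance 35/12.

```python
import numpy as np

# Dice illustration: averages of 100 rolls concentrate around the
# expectation 3.5, with standard deviation sqrt(35/12)/sqrt(100).
rng = np.random.default_rng(7)
rolls = rng.integers(1, 7, size=(10_000, 100))     # 10,000 samples of 100 rolls
averages = rolls.mean(axis=1)

print(f"expectation 3.5 vs mean of averages {averages.mean():.3f}")
print(f"predicted sd {np.sqrt(35 / 12 / 100):.3f} vs empirical {averages.std():.3f}")
```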
We run a laboratory experiment to test the concept of coarse correlated equilibrium (Moulin and Vial in Int J Game Theory 7:201–221, 1978), using a two-person game with a unique pure Nash equilibrium that is also the solution of iterated elimination of strictly dominated strategies. The subjects are asked to commit to a device that randomly picks one of three symmetric outcomes (including the Nash point) with a higher ex-ante expected payoff than the Nash equilibrium payoff. We find that the subjects do not accept this lottery (which is a coarse correlated equilibrium); instead, they choose to play the game and coordinate on the Nash equilibrium. However, when given an individual choice between a lottery with equal probabilities over the same outcomes and a sure payoff equal to the Nash point, the subjects choose the lottery. This result is robust to a few variations. We interpret our result as the selection of risk dominance over payoff dominance in equilibrium.
In the present technological age, where cyber-risk ranks alongside natural and man-made disasters and catastrophes in terms of global economic loss, businesses and insurers alike are grappling with fundamental risk management issues concerning the quantification of cyber-risk and the dilemma of how best to mitigate this risk. To this end, the present research deals with data, analysis, and models with the aim of quantifying and understanding cyber-risk, often described as “holy grail” territory in the realm of cyber-insurance and IT security. Nonparametric severity models associated with cyber-related loss data, identified from several competing sources, together with accompanying parametric large-loss components, are determined and examined. Ultimately, in the context of analogous cyber-coverage, cyber-risk is quantified through various types and levels of risk adjustment for (pure-risk) increased limit factors, based on applications of actuarially founded aggregate loss models in the presence of various forms of correlation. By doing so, insight is gained into the nature and distribution of volatile severity risk, correlated aggregate loss, and associated pure-risk limit factors.
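By way of orientation only, the sketch below shows one simple pure-risk increased limit factor calculation, E[min(X, limit)]/E[min(X, basic limit)], using a lognormal severity with arbitrary parameters; the paper's own nonparametric severity and large-loss components are not reproduced here.

```python
import numpy as np

# Illustrative only: pure-risk increased limit factors (ILF) for a
# heavy-tailed severity, ILF(L) = E[min(X, L)] / E[min(X, basic limit)].
# The lognormal parameters are arbitrary, not fitted to cyber-loss data.
rng = np.random.default_rng(8)
severities = rng.lognormal(mean=11.0, sigma=2.0, size=1_000_000)

def limited_expected_value(limit: float) -> float:
    return np.minimum(severities, limit).mean()

basic = 1e6
for limit in (1e6, 2e6, 5e6, 10e6):
    ilf = limited_expected_value(limit) / limited_expected_value(basic)
    print(f"limit {limit:>12,.0f}: ILF = {ilf:.2f}")
```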
Interval estimates of the Pearson, Kendall tau-a and Spearman correlations are reviewed and an improved standard error for the Spearman correlation is proposed. The sample size required to yield a confidence interval having the desired width is examined. A two-stage approximation to the sample size requirement is shown to give accurate results.
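For context, a standard Fisher-z interval for the Pearson correlation and a brute-force search for the sample size giving a desired interval width are sketched below; the paper's refinements, including the improved Spearman standard error, are not implemented.

```python
import numpy as np
from scipy import stats

# Fisher-z confidence interval for a Pearson correlation, and the smallest
# n whose interval width (at a planning value of r) meets a target.
def pearson_ci(r: float, n: int, conf: float = 0.95):
    z, se = np.arctanh(r), 1 / np.sqrt(n - 3)
    zcrit = stats.norm.ppf(0.5 + conf / 2)
    return np.tanh(z - zcrit * se), np.tanh(z + zcrit * se)

def n_for_width(r: float, width: float, conf: float = 0.95) -> int:
    n = 10
    while True:
        lo, hi = pearson_ci(r, n, conf)
        if hi - lo <= width:
            return n
        n += 1

print(pearson_ci(0.40, 100))      # interval for r = 0.40 observed with n = 100
print(n_for_width(0.40, 0.20))    # n needed for an interval of width 0.20
```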
Corrections of correlations for range restriction (i.e., selection) and unreliability are common in psychometric work. The current rule of thumb for determining the order in which to apply these corrections looks to the nature of the reliability estimate (i.e., restricted or unrestricted). While intuitive, this rule of thumb is untenable when the correction includes the variable upon which selection is made, as is generally the case. Using classical test theory, we show that it is the nature of the range restriction, not the nature of the available reliability coefficient, that determines the sequence for applying corrections for range restriction and unreliability.
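The two corrections at issue are the standard ones sketched below, with made-up numbers: the attenuation correction for unreliability and the Thorndike Case II correction for direct range restriction. The order in which to apply them is exactly the question the paper addresses, so the composition shown is illustrative, not a recommendation.

```python
import numpy as np

# Illustrative numbers only: correction for attenuation (unreliability)
# and the Thorndike Case II correction for direct range restriction.
def correct_attenuation(r_xy: float, rel_x: float, rel_y: float) -> float:
    return r_xy / np.sqrt(rel_x * rel_y)

def correct_range_restriction(r: float, u: float) -> float:
    # u = restricted SD / unrestricted SD of the selection variable
    return (r / u) / np.sqrt(1 - r**2 + (r / u) ** 2)

r_restricted, u = 0.30, 0.60
r_unrestricted = correct_range_restriction(r_restricted, u)
print(f"after range-restriction correction: {r_unrestricted:.3f}")
print(f"then corrected for unreliability:   "
      f"{correct_attenuation(r_unrestricted, 0.85, 0.80):.3f}")
```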
It is commonly held that even where questionnaire response is poor, correlational studies are affected only by loss of degrees of freedom or precision. We show that this supposition is not true. If the decision to respond is correlated with a substantive variable of interest, then regression or analysis of variance methods based upon the questionnaire results may be adversely affected by self-selection bias. Moreover, such bias may arise even where response is 100%. The problem in both cases arises where selection information is passed to the score indirectly via the disturbance or individual effects, rather than entirely via the observable explanatory variables. We suggest tests for self-selection bias and possible ways of handling the resulting problems of inference.
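The mechanism can be illustrated with a small simulation under assumed numbers: when the decision to respond depends on the outcome's disturbance, a regression fitted to respondents alone gives a distorted slope even though the response rate is high.

```python
import numpy as np

# Illustration with assumed numbers: response depends on the outcome
# (hence on its disturbance), so OLS on respondents only distorts the slope.
rng = np.random.default_rng(9)
n = 50_000
x = rng.normal(0, 1, n)
u = rng.normal(0, 1, n)
y = 1.0 + 0.5 * x + u

respond = (y + rng.normal(0, 0.5, n)) > 0          # response decision tied to y
slope_all = np.polyfit(x, y, 1)[0]
slope_respondents = np.polyfit(x[respond], y[respond], 1)[0]
print(f"response rate {respond.mean():.2f}")
print(f"true slope 0.5; all units {slope_all:.3f}; respondents {slope_respondents:.3f}")
```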
Using variance stabilizing transformations, this note describes approximate solutions to the ranking and selection problem of selecting the best binomial population or the bivariate normal population with the largest correlation coefficient.
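As background for the note's approach, the sketch below shows the variance-stabilising property it relies on in the binomial case: after the arcsine-square-root transform, the variance of a sample proportion is close to 1/(4n) regardless of the underlying p.

```python
import numpy as np

# Variance stabilisation for binomial proportions: var(arcsin(sqrt(p_hat)))
# is roughly 1/(4n) for any p, unlike var(p_hat) = p(1-p)/n.
rng = np.random.default_rng(10)
n = 50
for p in (0.1, 0.3, 0.5, 0.8):
    phat = rng.binomial(n, p, size=200_000) / n
    print(f"p={p}: var(p_hat)={phat.var():.5f}, "
          f"var(arcsin sqrt)={np.arcsin(np.sqrt(phat)).var():.5f}, "
          f"1/(4n)={1 / (4 * n):.5f}")
```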
The validity of a test is often estimated in a nonrandom sample of selected individuals. To accurately estimate the relation between the predictor and the criterion we correct this correlation for range restriction. Unfortunately, this corrected correlation cannot be transformed using Fisher's Z transformation, and asymptotic tests of hypotheses based on small or moderate samples are not accurate. We developed a Fisher r to Z transformation for the corrected correlation for each of two conditions: (a) the criterion data were missing due to selection on the predictor (the missing data were MAR); and (b) the criterion was missing at random, not due to selection (the missing data were MCAR). The two Z transformations were evaluated in a computer simulation. The transformations were accurate, and tests of hypotheses and confidence intervals based on the transformations were superior to those that were not based on the transformations.
The scoring of response vectors to give maximum test-retest correlation is investigated. Simple sufficiency arguments show that the form of the best scores is very restricted. A general method for finding the best scores is given; the best scores are derived for the normal factor model; and calculation of several particular cases shows that, for a standard model of binary responses, the best scores are easy to approximate.
We derive an analytic model of the inter-judge correlation as a function of five underlying parameters. Inter-cue correlation and the number of cues capture our assumptions about the environment, while differentiation between cues, the weights attached to the cues, and (un)reliability describe assumptions about the judges. We study the relative importance of, and the interrelations between, these five factors with respect to inter-judge correlation. The results highlight the centrality of the inter-cue correlation. We test the model’s predictions with empirical data and illustrate its relevance. For example, we show that, typically, additional judges increase efficacy at a greater rate than additional cues.
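A minimal simulation of the setup described, with assumed parameter values: each judge combines correlated cues with their own weights plus unreliability noise, and the resulting inter-judge correlation is computed for several inter-cue correlations.

```python
import numpy as np

# Sketch of the setup: judges form judgments as weighted sums of correlated
# cues plus noise; the inter-judge correlation rises with inter-cue correlation.
rng = np.random.default_rng(11)

def inter_judge_corr(cue_corr, n_cues=4, noise_sd=0.8, n_cases=100_000):
    cov = np.full((n_cues, n_cues), cue_corr) + (1 - cue_corr) * np.eye(n_cues)
    cues = rng.multivariate_normal(np.zeros(n_cues), cov, size=n_cases)
    w1 = np.ones(n_cues)                       # judge 1: equal cue weights
    w2 = np.linspace(1.5, 0.5, n_cues)         # judge 2: different cue weights
    j1 = cues @ w1 + rng.normal(0, noise_sd, n_cases)
    j2 = cues @ w2 + rng.normal(0, noise_sd, n_cases)
    return np.corrcoef(j1, j2)[0, 1]

for rho in (0.0, 0.3, 0.6):
    print(f"inter-cue correlation {rho}: inter-judge correlation {inter_judge_corr(rho):.2f}")
```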