One commonly used analytic technique for examining predictors of a binary outcome (disease/no disease, test positive/test negative, etc.) is logistic regression. As with other types of regression analysis, the final set of predictors entered into the regression equation must all be present for every case included in the final analysis pool. The most important point when building a model is not to enter variables haphazardly; there are specific steps to arriving at the final set of predictors.
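As a minimal sketch of such an analysis, the snippet below fits a logistic regression on a complete-case pool using statsmodels; the file name, column names, and predictor set are hypothetical assumptions, with the predictors imagined to have been chosen deliberately in earlier screening steps rather than entered haphazardly.

```python
# A minimal sketch of a logistic regression with complete-case
# filtering, using statsmodels. All names below are hypothetical.
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("study_data.csv")           # hypothetical data file

# Predictors chosen deliberately from prior screening,
# not entered haphazardly.
predictors = ["age", "bmi", "smoker"]        # hypothetical columns
outcome = "disease"                          # binary: 1 = disease, 0 = no disease

# Every case in the final analysis pool must have all predictors present.
complete = df[predictors + [outcome]].dropna()

X = sm.add_constant(complete[predictors])    # add intercept term
model = sm.Logit(complete[outcome], X).fit()
print(model.summary())                       # coefficients are log-odds
```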
Scientists are a curious breed. They seek knowledge to advance science, improve health, and innovate novel therapies and techniques. Most scientists develop hypotheses and gather evidence to support or refute their scientific suspicions. Therefore, analysis of the data should be intentional and directed at the research question at hand.
Again, parametric procedures are preferred over non-parametric ones because parametric analyses use the actual values of the distribution and are therefore more powerful. If the data cannot be “normalized” by transforming the distribution to approximate a normal one, such as taking log10 of all HIV viral load values, non-parametric tests should be applied. Let’s examine some non-parametric approaches to analyzing non-normally distributed data. In general, two tests fall into this analytic category: the Mann–Whitney U test and the Spearman rank correlation. In short, the Mann–Whitney U test is the non-parametric equivalent of the t-test, and the Spearman rank correlation is the non-parametric equivalent of the Pearson correlation.
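The snippet below is a short illustration of both tests using SciPy; the viral-load and CD4 values are invented purely for demonstration.

```python
# A short sketch of the two non-parametric tests named above,
# using SciPy on made-up illustrative data.
from scipy.stats import mannwhitneyu, spearmanr

# Hypothetical HIV viral loads in two groups (copies/mL),
# a classically skewed, non-normal measure.
group_a = [1200, 450, 98000, 3100, 760, 15000]
group_b = [52000, 410000, 8900, 120000, 67000, 230000]

# Mann-Whitney U: non-parametric analogue of the two-sample t-test.
u_stat, p_u = mannwhitneyu(group_a, group_b, alternative="two-sided")
print(f"Mann-Whitney U = {u_stat:.1f}, P = {p_u:.4f}")

# Spearman rank correlation: non-parametric analogue of Pearson's r.
cd4 = [850, 920, 310, 640, 880, 500]   # hypothetical paired CD4 counts
rho, p_rho = spearmanr(group_a, cd4)
print(f"Spearman rho = {rho:.2f}, P = {p_rho:.4f}")
```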
This chapter is dedicated mainly to laboratory professionals who need to design tests, starting with sample size considerations and power. It provides multiple examples of calculating sample size and power under varying disease prevalence, varying confidence intervals, varying levels of power, and for different test indices. Epidemiologists and clinicians may also use some of these techniques, especially when they must evaluate diagnostic tests before implementing testing in the field. Sensitivity, specificity, and positive and negative predictive values are indices of interest to both laboratory and clinical professionals. The chapter also explains the differences among commonly used terms such as precision vs. reproducibility, validity vs. reliability, and inter-lot vs. intra-lot variation. For those in the drug development industry, the concepts of pharmacodynamics and pharmacokinetics are also described, and calculations of the coefficient of variation are included.
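As a hedged illustration of the kind of calculation the chapter covers, the sketch below applies one commonly used formula (often attributed to Buderer) for the number of subjects needed to estimate a test's sensitivity to a given confidence-interval width at a given prevalence; all input values are assumptions for demonstration, not figures from the chapter.

```python
# A sketch of a diagnostic-accuracy sample-size calculation
# (Buderer-style formula); inputs are illustrative assumptions.
from math import ceil
from scipy.stats import norm

def n_for_sensitivity(se, prevalence, width, conf=0.95):
    """Total subjects needed to estimate sensitivity `se` to within
    +/- `width` at the given confidence, given disease prevalence."""
    z = norm.ppf(1 - (1 - conf) / 2)           # e.g. 1.96 for 95% CI
    n_diseased = (z**2 * se * (1 - se)) / width**2
    return ceil(n_diseased / prevalence)       # scale up for prevalence

# Assumed 90% sensitivity, 10% prevalence, +/-5% CI half-width:
print(n_for_sensitivity(se=0.90, prevalence=0.10, width=0.05))  # ~1383
```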
This chapter focuses on guidance in selecting the appropriate statistical test for the type of data being analyzed. Remember, data can be continuous, binary, ordinal, nominal, normally distributed, non-normally distributed, log-distributed, and so on (Chapter 2). Decisions must be based on a full understanding of the kind of data you have and your analytic objective. Conduct your preliminary analyses! Plot your data! Look at your data! Do you have outliers, skewness, errors?
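As a rough sketch of such preliminary checks, the snippet below screens a hypothetical measurement column for skewness, non-normality, and outliers using SciPy and pandas.

```python
# A quick sketch of the preliminary checks urged above,
# on a hypothetical measurement column.
import pandas as pd
from scipy.stats import skew, shapiro

values = pd.Series([4.1, 3.8, 5.0, 4.4, 3.9, 4.2, 19.7])  # note the outlier

print("skewness:", skew(values))                  # far from 0 => skewed
print("Shapiro-Wilk P:", shapiro(values).pvalue)  # small P => non-normal

# A simple outlier screen: flag points beyond 1.5 * IQR.
q1, q3 = values.quantile([0.25, 0.75])
iqr = q3 - q1
outliers = values[(values < q1 - 1.5 * iqr) | (values > q3 + 1.5 * iqr)]
print("possible outliers:", outliers.tolist())
```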
A contingency table shows the distribution of one variable within categories of another, such as gender vs. disease/no disease. These tables can be 2 × 2, 2 × 3, 2 × 4 (if you were to examine gender by race, for example), 2 × 6, etc. The second variable can have two values (such as yes or no) or three or more, like race (White, African American, Asian, etc.). When examining a 2 × 2 table such as disease by gender, one tests for statistical significance using chi-square (χ²) analysis. However, as with ANOVA (Section 4.3.2), when a variable has more than two categories, like race, you can still run the χ² statistic, but with so many resulting cells you won’t really know where the statistical differences lie, since you are examining so many categories at once.
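A minimal illustration of the 2 × 2 case with SciPy follows; the counts are invented for demonstration.

```python
# A minimal sketch of a chi-square test on a 2 x 2 contingency
# table; the counts below are hypothetical.
from scipy.stats import chi2_contingency

#              disease  no disease
table = [[30, 70],     # female  (hypothetical counts)
         [55, 45]]     # male

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, df = {dof}, P = {p:.4f}")
# With an r x c table (e.g. race with several categories), a
# significant overall chi-square does not tell you which cells
# differ; follow-up comparisons are needed.
```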
This chapter serves as a navigation tool, pointing the investigator to the appropriate statistical test when the data are continuous, non-continuous, log-distributed, or time-to-event, and explaining why each is the proper test. The reader may need to consult Chapter 1 again to review data types and why it is important to know how their data are distributed. Harnessing this knowledge as they become more proficient in their statistical skills will let them comfortably judge whether the statistics are proper or improper when evaluating a manuscript for publication, or simply to fully understand peer-reviewed methods. The chapter also provides numerous online tools, free of charge, for conducting your own statistical tests.
This chapter serves as the foundation for understanding the underlying concepts of statistics. It should be read over and over again, as it is key to understanding the remainder of the book. Important concepts such as sample vs. population, data management, the Central Limit Theorem, and when to use parametric vs. non-parametric procedures are introduced. It provides concise descriptions and examples of basic measures of central tendency (mean, median, mode, range, interquartile range, etc.), measures of dispersion around the averages, continuous vs. non-continuous measures, normal vs. non-normal distributions, log distributions, confidence intervals, and when to use one-sided vs. two-sided P-values. Many example calculations of sample size and power are also described for a number of different test situations.
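As a small taste of the descriptive measures listed above, the sketch below computes them on arbitrary toy data with Python's standard library and SciPy.

```python
# A brief sketch of the basic descriptive measures named above.
import statistics
from scipy import stats

x = [4, 7, 7, 8, 9, 10, 12, 15, 21]

print("mean:", statistics.mean(x))
print("median:", statistics.median(x))
print("mode:", statistics.mode(x))
print("range:", max(x) - min(x))

q = statistics.quantiles(x, n=4)     # [Q1, median, Q3]
print("IQR:", q[2] - q[0])

# 95% confidence interval for the mean (t distribution).
ci = stats.t.interval(0.95, len(x) - 1, loc=statistics.mean(x),
                      scale=stats.sem(x))
print("95% CI for the mean:", ci)
```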
This chapter gives examples of very basic graphs and charts and how to read them. Such graphs and charts are used by both clinicians and laboratory personnel. One plot that is often overlooked is the normal probability plot, which gives a visual snapshot of the distribution of the data. It offers a simple, non-statistical way to determine whether data are normally or non-normally distributed, bi- or tri-modal, or log-distributed.
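A brief sketch of producing such a plot with SciPy and matplotlib follows, using simulated, deliberately skewed data.

```python
# A sketch of a normal probability (Q-Q) plot; the data are
# simulated for illustration.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(0)
data = rng.lognormal(mean=3, sigma=1, size=200)   # deliberately skewed

stats.probplot(data, dist="norm", plot=plt)
plt.title("Normal probability plot (curvature suggests non-normality)")
plt.show()
# Points falling along the straight line suggest normality; systematic
# curvature, as here, suggests a skewed or log-distributed variable.
```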
This chapter briefly reviews basic measures of disease occurrence and methods used to stratify disease occurrence by any number of factors, such as age. The distinctions between disease rate and disease density are described. Investigators may refer back to Chapter 8 to understand how varying prevalence, sensitivity and specificity, and confidence intervals impact the tools used to measure disease burden.
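As a simple hedged illustration of the rate-vs-density distinction, the sketch below computes both measures from hypothetical counts and person-time.

```python
# A small sketch distinguishing a cumulative incidence ("rate")
# from an incidence density; all numbers are hypothetical.

new_cases = 12
population_at_risk = 1_000          # persons followed
person_years = 2_450                # total follow-up time contributed

cumulative_incidence = new_cases / population_at_risk   # per person at risk
incidence_density = new_cases / person_years            # per person-time

print(f"cumulative incidence: {cumulative_incidence:.3f} per person")
print(f"incidence density: {incidence_density * 1000:.1f} per 1,000 person-years")
```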