Delving into the foundational aspects of data management, this chapter explores the relationship between logical data formats and physical storage in computing systems. It discusses how the logical abstractions used by data management system software interact with the physical placement of data. The chapter emphasizes the importance of designing storage data formats to minimize unnecessary I/O traffic and network communication: by optimizing these formats, readers learn to use resources efficiently and improve the performance of data processing tasks. This lays a crucial foundation for the broader data management concepts developed throughout the book.
This guide illuminates the intricate relationship between data management, computer architecture, and system software. It traces the evolution of computing to today's data-centric focus and underscores the importance of hardware-software co-design in achieving efficient data processing systems with high throughput and low latency. The thorough coverage includes topics such as logical data formats, memory architecture, GPU programming, and the innovative use of ray tracing in computational tasks. Special emphasis is placed on minimizing data movement within memory hierarchies and optimizing data storage and retrieval. Tailored for professionals and students in computer science, this book combines theoretical foundations with practical applications, making it an indispensable resource for anyone wanting to master the synergies between data management and computing infrastructure.
In Chapter 3 we learned how to do basic probability calculations and even put them to use solving some fairly complicated probability problems. In this chapter and the next two, we generalize how we do probability calculations, where we will transition from working with sets and events to working with random variables.
To do statistics you must first be able to “speak probability.” In this chapter we are going to concentrate on the basic ideas of probability. In probability, the mechanism that generates outcomes is assumed known and the problems focus on calculating the chance of observing particular types or sets of outcomes. Classical problems include flipping “fair” coins (where fair means that on one flip of the coin the chance it comes up heads is equal to the chance it comes up tails) and “fair” dice (where fair now means the chance of landing on any side of the die is equal to that of landing on any other side).
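The "fair" mechanisms described above are easy to simulate. As a rough sketch (the flip count and variable names are our own choices, not the book's), Python's random module can be used to check that empirical frequencies match the assumed chances:

```python
import random

random.seed(1)  # make the run reproducible

# Simulate many flips of a fair coin: P(heads) = P(tails) = 1/2.
n_flips = 100_000
heads = sum(random.random() < 0.5 for _ in range(n_flips))
prop_heads = heads / n_flips

# Simulate a fair six-sided die: each face has probability 1/6.
rolls = [random.randint(1, 6) for _ in range(n_flips)]
prop_six = rolls.count(6) / n_flips

# prop_heads should be close to 0.5, prop_six close to 1/6.
```

With 100,000 repetitions the empirical proportions typically land within about a percentage point of the assumed probabilities.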
In Chapter 5 we learned about a number of discrete distributions. In this chapter we focus on continuous distributions, which are useful as models of various real-world processes and phenomena. By the end of this chapter you will know nine continuous and eight discrete distributions. There are many more continuous distributions, but these nine will suffice for our purposes.
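As one illustration of how a continuous distribution models real-world events, consider the exponential distribution, a standard model for waiting times (the chapter's own list of nine distributions may differ; the rate parameter below is an arbitrary choice). A quick simulation checks its mean and CDF:

```python
import math
import random

random.seed(2)

# The exponential distribution with rate lam has density
# f(x) = lam * exp(-lam * x) for x >= 0 and mean 1 / lam.
lam = 2.0
samples = [random.expovariate(lam) for _ in range(200_000)]
sample_mean = sum(samples) / len(samples)  # should be near 1/lam = 0.5

# Its CDF is F(x) = 1 - exp(-lam * x); compare the empirical
# proportion of samples at or below x = 0.5 with the formula.
x = 0.5
empirical_cdf = sum(s <= x for s in samples) / len(samples)
theoretical_cdf = 1 - math.exp(-lam * x)
```

The simulated mean and the empirical CDF both agree with the closed-form values to within simulation error.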
Sampling joke: “If you don’t believe in random sampling, the next time you have a blood test, tell the doctor to take it all.” At the beginning of Chapter 7 we introduced the ideas of population vs. sample and parameter vs. statistic. We build on this in the current chapter. The key concept in this chapter is that if we were to take different samples from a distribution and compute some statistic, such as the sample mean, then we would get different results.
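This sampling variability is easy to see by simulation. A minimal sketch (the population distribution, sample size, and repetition count are our own choices, purely for illustration):

```python
import random
import statistics

random.seed(3)

# Draw repeated samples of size n from the same population
# (normal with mean 10, standard deviation 2) and record each
# sample mean: a different sample gives a different result.
n, n_samples = 25, 5000
means = [
    statistics.fmean(random.gauss(10, 2) for _ in range(n))
    for _ in range(n_samples)
]

# The sample means cluster around the population mean 10, with
# spread close to sigma / sqrt(n) = 2 / 5 = 0.4.
center = statistics.fmean(means)
spread = statistics.stdev(means)
```

The histogram of those 5,000 sample means is exactly the "sampling distribution" this chapter studies.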
The last two chapters have covered the basic concepts of estimation. In Chapter 9 we studied the problem of giving a single number to estimate a parameter. In Chapter 10 we looked at ways to give an interval that we believe will include the true parameter. In many applications, we want to ask some very specific questions about the parameter(s).
We begin this chapter with a review of hypothesis testing from Chapter 12. A hypothesis is a statement about one or more parameters of a model. The null hypothesis is usually a specific statement that encapsulates “no effect.” For example, if we apply one of two treatments, A or B, to volunteers, we may be interested in testing whether the population mean outcomes are equal.
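One way to test such a null hypothesis of equal means is a permutation test: under the null, the treatment labels are exchangeable, so shuffling them shows how often a difference as large as the observed one arises by chance. The outcome data below are made up for illustration; this is a sketch of the general idea, not the book's specific procedure:

```python
import random
import statistics

random.seed(4)

# Hypothetical outcomes under treatments A and B (made-up numbers).
a = [5.1, 4.8, 6.0, 5.5, 4.9, 5.7, 5.3, 5.0]
b = [4.2, 4.6, 4.1, 4.9, 4.4, 4.0, 4.7, 4.3]

observed = statistics.fmean(a) - statistics.fmean(b)

# Shuffle the pooled labels many times and count how often the
# shuffled difference in means is at least as extreme as observed.
pooled = a + b
n_perm = 10_000
count = 0
for _ in range(n_perm):
    random.shuffle(pooled)
    diff = statistics.fmean(pooled[:len(a)]) - statistics.fmean(pooled[len(a):])
    if abs(diff) >= abs(observed):
        count += 1
p_value = count / n_perm
```

A small p-value indicates the observed difference would rarely occur if the two population means were truly equal.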
Up to this point we have been talking about what are often called frequentist methods, so named because a statistical method is judged by its long-run frequency properties. With this approach, the probability of an event is defined as the proportion of times the event occurs in the long run. Parameters, that is, the values that characterize a distribution, such as the mean and variance of a normal distribution, are considered fixed but unknown.
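The long-run relative frequency interpretation can be illustrated by tracking the running proportion of heads as simulated flips accumulate (a sketch; the checkpoint values are arbitrary):

```python
import random

random.seed(5)

# Frequentist probability: the long-run relative frequency of an
# event. Record the running proportion of heads at a few points.
heads = 0
running = {}
checkpoints = {100, 1_000, 10_000, 100_000}
for i in range(1, 100_001):
    heads += random.random() < 0.5
    if i in checkpoints:
        running[i] = heads / i

# Early proportions wander; by 100,000 flips the running
# proportion has settled very close to 0.5.
```

The stabilization of that running proportion is what "the proportion of times the event occurs in the long run" means operationally.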
We are often interested in how one or more predictor variables are associated with some outcome or response. We might postulate that the outcome depends on the predictors through some function.
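For a single predictor and a postulated linear function, the least-squares fit has a closed form. A sketch with made-up data (the formulas are the standard slope and intercept estimates, not a method specific to this book):

```python
import statistics

# Postulate that the outcome depends linearly on one predictor:
# y is approximately b0 + b1 * x. Data fabricated around y = 2 + 3x.
x = [0, 1, 2, 3, 4, 5]
y = [2.1, 4.9, 8.2, 11.0, 13.9, 17.1]

xbar, ybar = statistics.fmean(x), statistics.fmean(y)

# Least-squares estimates: slope = Sxy / Sxx, intercept from the means.
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = ybar - b1 * xbar
```

The fitted slope and intercept recover values close to the 3 and 2 used to generate the data.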
In statistics, we are often interested in some characteristics of a population. Maybe we are interested in the mean of some measurable characteristic, or maybe we are interested in the proportion of the population that have some property. In all but the simplest cases, the population is so large that it is impossible, or at least impractical, to take the measurement on every item in the population. We therefore have to settle on taking a sample and measuring those units selected for this sample.
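A minimal sketch of this idea: treat a list of numbers as the population, draw a simple random sample, and compare the sample mean with the population mean we would normally be unable to compute. The population here is synthetic, purely for illustration:

```python
import random
import statistics

random.seed(6)

# A finite "population" of 10,000 measurements (made-up example).
population = [random.gauss(50, 8) for _ in range(10_000)]
pop_mean = statistics.fmean(population)

# Measuring every unit is impractical, so draw a simple random
# sample without replacement and estimate the mean from it.
sample = random.sample(population, 100)
est = statistics.fmean(sample)

# est should land near pop_mean, off by roughly 8/sqrt(100) = 0.8.
```

With a sample of 100 units, the estimate is typically within a point or so of the population mean, at a tiny fraction of the measurement cost.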
Forecasting is an important problem that spans many fields, including business and industry, government, economics, environmental science, medicine, social science, politics, and finance. Forecasting problems are often classified as short term, medium term, or long term. Short-term forecasting problems involve predicting events only a few time periods (days, weeks, or months) into the future. Medium-term forecasts extend from 1 to 2 years into the future, and long-term forecasting problems can extend many years beyond that.