The goal of this chapter is to present complete examples of the design and implementation of machine learning methods in large-scale data analytics. In particular, we choose three distinct topics: semi-supervised learning, ensemble learning, and the deployment of deep learning models at scale. Each topic is introduced by motivating why parallelization is needed to deal with big data, identifying the main bottlenecks, designing and coding Spark-based solutions, and discussing further work required to improve the code. In semi-supervised learning, we focus on the simplest self-labeling approach, self-training, and a global solution for it. Likewise, in ensemble learning, we design a global approach for bagging and boosting. Lastly, we show an example with deep learning: rather than parallelizing the training of a model, which is typically easier on GPUs, we deploy the inference step for a case study in semantic image segmentation.
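To fix intuition for the self-training idea before the full chapter treatment, the following PySpark fragment is a minimal sketch: it repeatedly trains a model on the labeled set and promotes high-confidence predictions on the unlabeled set. The column names, the choice of classifier, the file paths, and the 0.95 threshold are illustrative assumptions, not the chapter's actual code.

```python
# Minimal self-training sketch in PySpark. Column names ('features', 'label'),
# the classifier, the input paths, and the threshold are assumptions.
from pyspark.sql import SparkSession, functions as F
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.functions import vector_to_array

spark = SparkSession.builder.appName("self-training-sketch").getOrCreate()
labeled = spark.read.parquet("labeled.parquet")      # 'features' and 'label'
unlabeled = spark.read.parquet("unlabeled.parquet")  # 'features' only

THRESHOLD = 0.95          # confidence required to self-label an example
for _ in range(5):        # a fixed number of self-labeling rounds
    model = LogisticRegression().fit(labeled)
    preds = model.transform(unlabeled)
    # Keep only predictions whose top class probability clears the threshold.
    confidence = F.array_max(vector_to_array("probability"))
    confident = (preds.filter(confidence >= THRESHOLD)
                      .select("features", F.col("prediction").alias("label")))
    if confident.count() == 0:
        break
    labeled = labeled.unionByName(confident)
    # A complete implementation would also remove the newly labeled rows
    # from 'unlabeled' before the next round.
```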
Hadoop is an open-source framework, written in Java, for big data processing and storage that is based on the MapReduce programming model. This chapter starts off with a brief introduction to Hadoop and how it has evolved into a solid base platform for most big data frameworks. We show how to implement the classical Word Count using Hadoop MapReduce, highlighting the difficulties in doing so. After that, we provide essential information about how Hadoop's resource negotiator, YARN, and its distributed file system, HDFS, work. We describe step by step how a MapReduce process is executed on YARN, introducing the concepts of resource and node managers, the application master, and containers, as well as the different execution modes (standalone, pseudo-distributed, and fully distributed). Likewise, we cover HDFS, including the basic design of this filesystem and what it implies in terms of functionality and efficiency. We also discuss recent advances such as erasure coding, HDFS federation, and high availability. Finally, we discuss the main limitations of Hadoop and how they have sparked the rise of many new big data frameworks, which now coexist within the Hadoop ecosystem.
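The canonical Word Count is written in Java against the MapReduce API; as a language-neutral illustration of the same map and reduce roles, here is a hedged sketch using Hadoop Streaming with two small Python scripts (the file names and submission command are assumptions that vary by distribution):

```python
#!/usr/bin/env python3
# mapper.py -- the map role: emit one (word, 1) pair per word on stdin.
import sys

for line in sys.stdin:
    for word in line.split():
        print(f"{word}\t1")
```

```python
#!/usr/bin/env python3
# reducer.py -- the reduce role: sum counts per word. Hadoop Streaming
# sorts by key, so all lines for a given word arrive consecutively.
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t", 1)
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            print(f"{current_word}\t{current_count}")
        current_word, current_count = word, int(count)
if current_word is not None:
    print(f"{current_word}\t{current_count}")
```

A typical submission looks like `hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar -files mapper.py,reducer.py -mapper mapper.py -reducer reducer.py -input in -output out` (the jar path varies by distribution); even this toy example hints at the boilerplate the chapter highlights.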
This chapter puts forward new guidelines for designing and implementing distributed machine learning algorithms for big data. First, we present two different alternatives, which we call local and global approaches. To show how these two strategies work, we focus on the classical decision tree algorithm, reviewing how it works and the details that need modification to deal with large datasets. We implement a local-based solution for decision trees, comparing its behavior and efficiency against a sequential model and the MLlib version. We also discuss the nitty-gritty of the MLlib implementation of decision trees as a great example of a global solution. That allows us to formally define these two concepts, discussing their key (expected) advantages and disadvantages. The second part is devoted to measuring the scalability of a big data solution. We present three classical metrics, speed-up, size-up, and scale-up, that help determine whether a distributed solution is scalable. Using these, we test our local-based approach and compare it against its global counterpart. This experiment allows us to give some tips for calculating these metrics correctly on a Spark cluster.
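As a back-of-the-envelope companion to the metrics discussion, the sketch below applies the standard textbook definitions of the three metrics; the chapter's exact formulations may differ, and the timings are hypothetical.

```python
# Hedged sketch of the three classical scalability metrics.

def speed_up(t_one_node, t_m_nodes):
    # Speed-up: how much faster m nodes solve the SAME problem.
    # Ideal value is m (linear speed-up).
    return t_one_node / t_m_nodes

def size_up(t_base_data, t_k_times_data):
    # Size-up: how running time grows when the data grows k-fold
    # on a FIXED cluster. Ideal value is k (linear size-up).
    return t_k_times_data / t_base_data

def scale_up(t_base, t_scaled):
    # Scale-up: data and nodes grow by the same factor k.
    # Ideal value is 1 (running time stays constant).
    return t_base / t_scaled

# Hypothetical timings: 1 node on 1x data takes 600 s; 4 nodes on 1x data
# take 180 s; 4 nodes on 4x data take 700 s.
print(speed_up(600, 180))   # ~3.33, below the ideal of 4
print(scale_up(600, 700))   # ~0.86, below the ideal of 1
```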
This chapter deals with the topic of inequality in the healthcare sector. The chapter begins by looking at some summary data and the inequality in various measures of health that is obvious from simple differences. Then the chapter discusses some basic social philosophy on how to judge fairness in an abstract society. More detailed data on health outcomes is then explored, highlighting inequality along different demographic lines. There follows a discussion of theories that fit the facts presented: why these inequalities exist, and what we learn by unpacking them, in terms of both social and clinical implications. Finally, inequalities in the labor market for healthcare workers are discussed.
This chapter covers the medical malpractice system: how it works, what its goals are, and how it influences provider behavior. The chapter begins by defining key terminology in tort law and explaining the process by which a medical malpractice case is brought and resolved, as well as the goals that this system is trying to achieve. Then discussion turns to how this system creates incentives for actions that run counter to its goals, the problems that are likely to arise, and some empirical evidence of the existence of said problems.
This chapter introduces the basics of the economic approach to understanding decision-making, using examples drawn from consumer decision-making in the context of healthcare. Topics include how to think about preferences, different types of costs, optimization, and the importance of perceptions. The end-of-chapter supplement discusses how to use price indexes.
Chapter 14 provides a brief overview of major health insurance provided by the government in the United States. The majority of the chapter goes through the major programmatic details of the Medicare and Medicaid programs: who is covered, what services are covered and how generously, and how providers are paid by the programs. The discussion of Medicaid also explains the nature of the federal-state partnership and explores the ways in which states use their flexibility to expand upon the program. The remainder of the chapter briefly describes other public insurance (or insurance-related) programs: the VA and CHAMPVA, TRICARE, CHIP, COBRA, and the IHS.
This chapter introduces Spark, a data processing engine that mitigates some of the limitations of Hadoop MapReduce to perform data analytics efficiently. We begin with the motivation for Spark, introducing RDDs as an in-memory distributed data structure that allows for faster processing while maintaining the attractive properties of Hadoop, such as fault tolerance. We then cover, hands-on, how to create and operate on RDDs, distinguishing between transformations and actions. Furthermore, we discuss how to work with key–value RDDs (which more closely resemble the MapReduce model), how to use caching to perform iterative queries/operations, and how RDD lineage works to ensure fault tolerance. We provide a wide range of examples with transformations such as map vs. flatMap and groupByKey vs. reduceByKey, discussing their behavior, their adequacy (depending on what we want to achieve), and their performance. More advanced concepts, such as shared variables (broadcast variables and accumulators) and working with partitions, are presented towards the end. Finally, we talk about the anatomy of a Spark application, as well as the different types of dependencies (narrow vs. wide) and the limitations they impose on optimizing processing.
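For readers who want a taste before diving in, the fragment below sketches a few of the RDD operations mentioned above in PySpark (illustrative data, not the chapter's exact examples):

```python
# Minimal sketch of map vs. flatMap and groupByKey vs. reduceByKey.
from pyspark import SparkContext

sc = SparkContext(appName="rdd-basics-sketch")
lines = sc.parallelize(["to be or", "not to be"])

# map produces exactly one output per input (here, a list per line);
# flatMap flattens those lists into a single RDD of words.
print(lines.map(lambda l: l.split()).collect())  # [['to','be','or'], ['not','to','be']]
words = lines.flatMap(lambda l: l.split())

pairs = words.map(lambda w: (w, 1))              # a key-value RDD

# groupByKey ships every value across the network before counting...
grouped = pairs.groupByKey().mapValues(lambda vs: sum(vs))
# ...while reduceByKey combines values locally on each partition first
# (a map-side combine), which usually makes it the better choice here.
reduced = pairs.reduceByKey(lambda a, b: a + b)
print(sorted(reduced.collect()))                 # [('be', 2), ('not', 1), ('or', 1), ('to', 2)]
```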
Chapter 5 moves from theory into evidence and discusses how empirical economists think about causality. First, the chapter covers common issues that make it difficult to have confidence in causal claims based on associational evidence alone. Then, experimental evidence is discussed: how to run an experiment, common pitfalls that can undermine confidence in experimental evidence, and what can be done to avoid them. Next, major experimental studies on the impact of health insurance are described. Finally, the chapter discusses the concept of quasi-experimental evidence and how it fits into economics. The end-of-chapter supplement discusses ethics in research with human subjects and the role of institutional review boards.
Chapter 3 develops the fundamentals of demand analysis through the lens of demand for medical care. The chapter discusses how to read and use demand curves and why demand for health is not commonly used (there is no direct market for health). Then various demand elasticities are covered: price elasticity of demand, income elasticity of demand, and cross-price elasticity of demand. Each elasticity is developed along with examples specific to the market for medical care and health decision-making. The chapter also develops important tools for using demand curves: how to think through demand shifters, Engel curves, how to calculate consumer surplus, and how to aggregate from individual to market demand. The end-of-chapter supplement walks through how to calculate elasticities.
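As a small taste of the supplement's mechanics, the midpoint (arc) formula for the price elasticity of demand can be computed in a few lines; the numbers below are hypothetical.

```python
# Hedged sketch of the midpoint (arc) elasticity calculation.

def arc_elasticity(q1, q2, p1, p2):
    # Percentage change in quantity divided by percentage change in price,
    # each computed relative to the midpoint of the two observations.
    pct_q = (q2 - q1) / ((q1 + q2) / 2)
    pct_p = (p2 - p1) / ((p1 + p2) / 2)
    return pct_q / pct_p

# A price increase from $10 to $12 lowers visits from 100 to 95:
print(arc_elasticity(100, 95, 10, 12))  # ~ -0.28, i.e., inelastic demand
```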