This chapter puts forward new guidelines for designing and implementing distributed machine learning algorithms for big data. First, we present two alternatives, which we call the local and global approaches. To show how these two strategies work, we focus on the classical decision tree algorithm, revising how it works and the details that need modification to cope with large datasets. We implement a local-based solution for decision trees, comparing its behavior and efficiency against a sequential model and the MLlib version. We also discuss the nitty-gritty of the decision tree implementation in MLlib as a great example of a global solution. This allows us to formally define the two concepts, discussing their key (expected) advantages and disadvantages. The second part focuses on measuring the scalability of a big data solution. We present three classical metrics, speed-up, size-up, and scale-up, which help determine whether a distributed solution is scalable. Using these, we test our local-based approach and compare it against its global counterpart. This experiment allows us to give some tips for calculating these metrics correctly on a Spark cluster.
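To make the node-splitting step concrete, here is a minimal plain-Python sketch of selecting a split by information gain, the per-node computation that must be redesigned once the data no longer fits on one machine. The function names and toy data are our own illustration, not taken from the chapter.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    if not labels:
        return 0.0
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(labels, feature, threshold):
    """Entropy reduction from splitting on `feature <= threshold`."""
    left = [lab for f, lab in zip(feature, labels) if f <= threshold]
    right = [lab for f, lab in zip(feature, labels) if f > threshold]
    n = len(labels)
    return (entropy(labels)
            - (len(left) / n) * entropy(left)
            - (len(right) / n) * entropy(right))

# A perfect split separates the classes completely,
# so the gain equals the full entropy of the labels (1 bit here).
gain = info_gain([0, 0, 1, 1], [1.0, 2.0, 3.0, 4.0], threshold=2.0)
```

A sequential tree simply scans candidate thresholds per feature like this; distributed versions must instead aggregate, across partitions, the per-class counts behind these entropy terms.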
This chapter deals with the topic of inequality in the healthcare sector. It begins by looking at some summary data and the inequality in various measures of health that is obvious from simple differences. The chapter then discusses some basic social philosophy on how to judge fairness in an abstract society. More detailed data on health outcomes is then explored, highlighting inequality along different demographic lines. This is followed by a discussion of theories that fit the facts presented: why do these inequalities exist, and what do we learn, both socially and clinically, by unpacking them? Finally, inequalities in the labor market for healthcare workers are discussed.
This chapter covers the medical malpractice system: how it works, what its goals are, and how it influences provider behavior. The chapter begins by defining key terminology in tort law and explaining the process by which a medical malpractice case is brought and resolved, as well as the goals that this system is trying to achieve. Discussion then turns to how this system creates incentives for actions that run counter to its goals and the problems that are likely to arise, along with some empirical evidence that said problems exist.
This chapter introduces the basics of the economic approach to understanding decision-making. This is done using examples drawn from consumer decision-making in the context of healthcare. Topics include how to think about preferences, different types of costs, optimization, and the importance of perceptions. The end-of-chapter supplement discusses how to use price indexes.
Chapter 14 provides a brief overview of the major health insurance programs provided by the government in the United States. The majority of the chapter goes through the major programmatic details of Medicare and Medicaid: who is covered, which services are covered and how generously, and how providers are paid by the programs. The discussion of Medicaid also explains the nature of the federal-state partnership and explores ways in which states have used their flexibility to expand upon the program. The remainder of the chapter briefly describes other public insurance (or insurance-related) programs: the VA and CHAMPVA, TRICARE, CHIP, COBRA, and the IHS.
This chapter introduces Spark, a data processing engine that mitigates some of the limitations of Hadoop MapReduce to perform data analytics efficiently. We begin with the motivation for Spark, introducing Spark RDDs as an in-memory distributed data structure that allows for faster processing while maintaining the attractive properties of Hadoop, such as fault tolerance. We then cover, hands-on, how to create and operate on RDDs, distinguishing between transformations and actions. Furthermore, we discuss how to work with key–value RDDs (closer in spirit to MapReduce), how to use caching to speed up iterative queries and operations, and how RDD lineage ensures fault tolerance. We provide a wide range of examples with transformations such as map vs. flatMap and groupByKey vs. reduceByKey, discussing their behavior, their adequacy (depending on what we want to achieve), and their performance. More advanced concepts, such as shared variables (broadcast variables and accumulators) and per-partition operations, are presented towards the end. Finally, we describe the anatomy of a Spark application, as well as the different types of dependencies (narrow vs. wide) and how they limit opportunities for optimizing the processing.
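To make these distinctions concrete without a cluster, the following plain-Python sketch imitates the semantics of the four transformations named above on a tiny word-count example. The data and variable names are illustrative, not taken from the book.

```python
lines = ["to be", "or not to be"]

# map: exactly one output element per input element
# (here, each line becomes one list of words).
mapped = [line.split() for line in lines]

# flatMap: same per-element function, but the results
# are flattened into a single sequence of words.
words = [w for line in lines for w in line.split()]

pairs = [(w, 1) for w in words]

# groupByKey would materialise every value for a key before reducing
# (shuffling all of them); reduceByKey combines values per key as it
# goes, like a map-side combiner. Both yield the same counts:
counts = {}
for key, value in pairs:
    counts[key] = counts.get(key, 0) + value
```

The performance difference the chapter discusses comes from the shuffle: reduceByKey moves only one partial sum per key and partition across the network, while groupByKey moves every individual value.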
Chapter 5 moves from theory into evidence and discusses how empirical economists think about causality. First, the chapter covers common issues that make it difficult to have confidence in causal claims based on associational evidence alone. Then, experimental evidence is discussed: how to run an experiment, common pitfalls that can undermine confidence in experimental evidence, and what can be done to avoid them. Next, major experimental studies on the impact of health insurance are described. Finally, the chapter discusses the concept of quasi-experimental evidence and how it fits into economics. The end-of-chapter supplement discusses ethics in research with human subjects and the role of institutional review boards.
Chapter 3 develops the fundamentals of demand analysis through the lens of demand for medical care. The chapter discusses how to read and use demand curves and why demand for health is not commonly used (there is no direct market for health). Then various demand elasticities are covered: price elasticity of demand, income elasticity of demand, and cross-price elasticity of demand. Each elasticity is developed along with examples specific to the market for medical care and health decision-making. The chapter also develops important tools for using demand curves: how to think through demand shifters, Engel curves, how to calculate consumer surplus, and how to aggregate from individual to market demand. The end-of-chapter supplement walks through how to calculate elasticities.
This chapter focuses on how individuals invest in or harm their health, using the concept of the health production function. Various inputs to the function, such as medical care, illness and injury, lifestyle choices, age, genetics, and environmental factors, are discussed along with their interactions with each other. Key health economics concepts such as the flat of the curve, the tradeoff between health and utility, and the role of technological innovation in the effectiveness of medical care are discussed, as well as the key economic concept of the margin. The end-of-chapter supplement introduces the concept of standardized units for measuring the effectiveness of care.
In this chapter we cover clustering and regression, looking at two traditional machine learning methods: k-means and linear regression. We first briefly discuss how to implement these methods in a non-distributed manner, and then carefully analyze their bottlenecks when manipulating big data. This enables us to design global-based solutions built on Spark's DataFrame API. The key focus is on the principles for designing solutions effectively; nevertheless, some of the challenges in this chapter involve investigating Spark tools to speed up the processing even further. k-means serves as an example of an iterative algorithm and of how to exploit caching in Spark, and we analyze its implementation with both the RDD and DataFrame APIs. For linear regression, we first implement the closed-form solution, which reduces to matrix multiplications and outer products that simplify the processing of big data. Then, we look at gradient descent. These examples give us the opportunity to expand on the principles of designing a global solution, and also show how knowing the underlying platform well (Spark, in this case) is essential to truly maximize performance.
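As a hedged illustration of why the closed form suits a distributed setting, this plain-Python sketch (toy data and naming of our own, not the book's code) accumulates the two sufficient statistics of the normal equations, X^T X as a sum of per-row outer products and X^T y as a sum of scaled rows. These are exactly the partial sums each partition could contribute in Spark, after which the small system is solved on the driver.

```python
# Rows (x_i, y_i), with x_i carrying a leading 1.0 as the bias term.
# The data satisfies y = 1 + 2x exactly.
data = [((1.0, 1.0), 3.0), ((1.0, 2.0), 5.0), ((1.0, 3.0), 7.0)]

# Accumulate X^T X = sum of outer products x_i x_i^T and
# X^T y = sum of x_i * y_i, as a distributed aggregate would.
xtx = [[0.0, 0.0], [0.0, 0.0]]
xty = [0.0, 0.0]
for x, y in data:
    for i in range(2):
        xty[i] += x[i] * y
        for j in range(2):
            xtx[i][j] += x[i] * x[j]

# Solve the 2x2 system beta = (X^T X)^{-1} X^T y on the "driver".
det = xtx[0][0] * xtx[1][1] - xtx[0][1] * xtx[1][0]
beta = [
    (xtx[1][1] * xty[0] - xtx[0][1] * xty[1]) / det,
    (xtx[0][0] * xty[1] - xtx[1][0] * xty[0]) / det,
]
```

Only the two small aggregates cross the network, regardless of how many rows the dataset has; the size of what is shuffled depends on the number of features, not the number of examples.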