This chapter introduces what we mean by big data, why it matters, and the key principles for handling it efficiently. The term “big data” is frequently used to refer to exploiting large amounts of data for some benefit, but there is no standard definition; the field is also known as large-scale data processing or data-intensive computing. We discuss the key components of big data and show that it is not only about volume: other aspects, such as velocity and variety, must also be considered. The world of big data has many faces, including databases, infrastructure, and security, but here we focus on data analytics. We then cover how to deal with big data, explaining why we cannot simply scale up a single computer but must scale out across multiple machines. We argue that traditional high-performance computing clusters are ill-suited to data-intensive applications, whose traffic patterns would saturate the cluster network. Finally, we introduce the key features of a big data cluster: it is not built for pure computation, it does not require high-end computers, it needs fault tolerance mechanisms, and it respects the principle of data locality.
Chapter 16 establishes the concept of an externality, a situation where a decision carries additional costs or benefits to a party not involved in the decision-making process. The chapter classifies several types of externalities and explores how and why they lead to undesirable outcomes. Then potential solutions to externalities as well as their strengths and weaknesses are discussed: subsidies, taxes, regulation, and the assignment of property rights. The chapter ends with a discussion of cost–benefit analysis and its role in evaluating interventions aimed at combatting externality problems.
This chapter begins by applying the theory of the firm to the context of a medical practice. The concepts of profit maximization, inputs to production, and outputs from production are all applied to this example. Then the chapter moves on to discuss sources of efficiency in production and specific applications to medical care production. The chapter then develops the basics of supply curves, and how they are interpreted and used, and then brings supply together with demand to describe the basics of equilibrium in an efficient market. The last part of the chapter discusses threats to efficiency in markets, giving a brief overview of the largest sources of difficulty in applying efficient markets to the healthcare context. This sets up the rest of the book: the healthcare sector is filled with problems that make efficient markets unlikely, meaning that understanding the underlying economics is vital.
Chapter 9 explores the hospital as an economic entity. The chapter discusses the differences between for-profit and not-for-profit hospitals, the organization of hospital workforces, and how the for-profit/not-for-profit distinction interacts with the organization. Then the chapter explores how different parties involved with the hospital seek to exert influence over how hospital resources are used: what they want and how they get it. Finally, the chapter covers topics in how hospitals operate within the sector: hospital growth, hospital competition, and operation of hospital systems.
Chapter 11 focuses on the adverse selection problem in the health insurance market, which is sometimes referred to as the “death spiral.” First, the concept of “uninsurable” individuals is developed and the rationale for eliminating differential pricing based on pre-existing conditions (experience rating) is explored. Then the chapter discusses the consequences of this choice: the adverse selection problem. Finally, alternative solutions to adverse selection are explored – both single-payer systems and market-based solutions such as the Affordable Care Act. The end-of-chapter supplement explores the historical reasons for differences between the health insurance system in the United States and the system of a close ally, the United Kingdom.
This chapter introduces the machine learning side of this book. Although we assume some prior experience with machine learning, we start with a full recap of the basic concepts and key terminology. This includes a discussion of learning paradigms, such as supervised and unsupervised learning, and of the machine learning life cycle, which articulates the steps from data collection to model deployment. We cover topics such as data preparation and preprocessing, model evaluation and selection, and machine learning pipelines, showing how every stage of this cycle is challenged by large-scale data analytics. The rest of the chapter is devoted to Spark’s machine learning library, MLlib. Basic concepts such as Transformers, Estimators, and Pipelines are presented through a linear regression example. The example forces us to build a pipeline of methods to get the data ready for training, which lets us introduce some of Spark’s data preparation components (e.g., VectorAssembler and StandardScaler). Finally, we explore evaluation components (e.g., RegressionEvaluator) and how to perform hyperparameter tuning.
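The Estimator/Transformer pattern at the heart of MLlib can be mimicked in plain Python (a conceptual sketch only, not the MLlib API; the class names here are illustrative): an Estimator’s fit() learns statistics from data and returns a Transformer, whose transform() applies them.

```python
class StandardScalerEstimator:
    """Estimator: fit() learns the statistics and returns a Transformer."""
    def fit(self, data):
        n = len(data)
        mean = sum(data) / n
        var = sum((x - mean) ** 2 for x in data) / n
        return StandardScalerModel(mean, var ** 0.5)

class StandardScalerModel:
    """Transformer: transform() applies the learned scaling."""
    def __init__(self, mean, std):
        self.mean, self.std = mean, std
    def transform(self, data):
        return [(x - self.mean) / self.std for x in data]

data = [2.0, 4.0, 6.0]
model = StandardScalerEstimator().fit(data)   # fitting yields a Transformer
scaled = model.transform(data)                # zero mean, unit variance
```

A Pipeline generalizes this idea: fitting a chain of such stages yields a fitted PipelineModel that applies each learned transformation in order.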
Chapter 12 deals with how prices are set within the healthcare sector. Prices do not come from direct interaction between buyers and sellers but are instead arrived at via a complex negotiation process between multiple parties. This chapter covers the lifecycle of a medical bill, the role of chargemasters (master price lists) in billing, how providers negotiate with insurers, and how the uninsured are treated in billing. The chapter then discusses price competition and quality competition between providers, as well as the role of price transparency.
Chapter 19 explores differences in spending on healthcare and health outcomes across countries. The main result of interest from these comparisons is that the United States spends vastly more than other OECD countries but gets only middle-of-the-road health outcomes. The second part of the chapter explores leading theories as to why this is the case. The final part of the chapter compares different strategies for setting up a healthcare and health insurance sector: the Bismarck model, the Beveridge model, national health insurance, and hybrid models.
Chapter 8 explores how incentives influence the behavior of providers. The first part of the chapter is an application of personnel economics, discussing how different pay schemes create incentives for different types of effort. The chapter goes on to discuss the latent variable problem when looking at pay schemes and behavior and covers literature that helps to unpack the causal story. The second half of the chapter discusses demand inducement and its variants (fee splitting and self-referral) from both a theoretical and empirical perspective. The end-of-chapter supplement gives a brief introduction to behavioral economics and how cognitive biases can also be used to influence provider behavior.
MapReduce is a parallel programming model that follows a simple divide-and-conquer strategy to tackle big datasets in distributed computing. This chapter begins with a discussion of the key features that distinguish MapReduce from similar distributed computing tools such as the Message Passing Interface (MPI). Then, we introduce its two main functions, map and reduce, which are rooted in functional programming. After that, we illustrate how MapReduce works using the classical WordCount example, the “Hello World” of big data, discussing different ways to parallelize it and their main advantages and disadvantages. Next, we define MapReduce more formally, expressing its functions in terms of key–value pairs and stating the key properties of the map, shuffle, and reduce operations. At the end of the chapter we cover some important details: how fault tolerance is achieved, how MapReduce preserves data locality, how combiners reduce data transfer across computers, and additional aspects of its internal workings.
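The map–shuffle–reduce flow over key–value pairs described above can be sketched for WordCount in plain Python (a minimal single-machine sketch; the function names are illustrative, not part of any MapReduce API):

```python
from collections import defaultdict

def map_phase(document):
    # map: emit a (word, 1) key-value pair for every word
    for word in document.lower().split():
        yield (word, 1)

def shuffle_phase(pairs):
    # shuffle: group all emitted values by key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # reduce: sum the counts collected for each word
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data big ideas", "data locality matters"]
pairs = [pair for doc in docs for pair in map_phase(doc)]
counts = reduce_phase(shuffle_phase(pairs))
# counts["big"] == 2, counts["data"] == 2, counts["locality"] == 1
```

In a real cluster each mapper and reducer runs on a different machine, and a combiner would apply the same summation locally on each mapper’s output before the shuffle, shrinking the data sent over the network.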
The goal of this chapter is to present complete examples of the design and implementation of machine learning methods for large-scale data analytics. In particular, we choose three distinct topics: semi-supervised learning, ensemble learning, and deploying deep learning models at scale. Each is introduced by motivating why parallelization is needed to deal with big data, identifying the main bottlenecks, designing and coding Spark-based solutions, and discussing further work required to improve the code. In semi-supervised learning, we focus on the simplest self-labeling approach, self-training, and a global solution for it. Likewise, in ensemble learning, we design a global approach for bagging and boosting. Lastly, we show an example with deep learning: rather than parallelizing the training of a model, which is typically easier on GPUs, we deploy the inference step for a case study in semantic image segmentation.
Hadoop is an open-source framework, written in Java, for big data processing and storage that is based on the MapReduce programming model. This chapter starts with a brief introduction to Hadoop and how it has evolved into a solid base platform for most big data frameworks. We show how to implement the classical WordCount using Hadoop MapReduce, highlighting the difficulties in doing so. After that, we provide essential information about how its resource negotiator, YARN, and its distributed file system, HDFS, work. We describe step by step how a MapReduce process is executed on YARN, introducing the concepts of resource and node managers, the application master, and containers, as well as the different execution modes (standalone, pseudo-distributed, and fully distributed). Likewise, we cover HDFS, describing the basic design of this filesystem and what it means in terms of functionality and efficiency. We also discuss recent advances such as erasure coding, HDFS federation, and high availability. Finally, we examine the main limitations of Hadoop and how they have sparked the rise of many new big data frameworks, which now coexist within the Hadoop ecosystem.
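The storage trade-off behind erasure coding, one of the HDFS advances mentioned above, comes down to quick arithmetic: 3-way replication stores three full copies of every block, while a Reed–Solomon (6,3) scheme stores six data cells plus three parity cells (a sketch; RS(6,3) is a commonly cited HDFS erasure coding policy):

```python
def overhead_replication(replicas):
    # extra storage beyond the data itself, as a fraction of the data size
    return replicas - 1

def overhead_erasure(data_cells, parity_cells):
    # with erasure coding, only the parity cells are extra storage
    return parity_cells / data_cells

rep = overhead_replication(3)     # 3-way replication -> 2.0 (200% overhead)
ec = overhead_erasure(6, 3)       # RS(6,3)           -> 0.5 (50% overhead)
```

So erasure coding cuts the storage overhead from 200% to 50% while still tolerating the loss of any three of the nine cells, at the cost of extra computation and network traffic to reconstruct missing data.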