This chapter introduces what we mean by big data, why it matters, and the key principles for handling it efficiently. The term “big data” is frequently used to refer to exploiting large amounts of data for some benefit, but there is no standard definition; the field is also known as large-scale data processing or data-intensive computing. We discuss the key components of big data and show that it is not only about volume: other aspects, such as velocity and variety, must also be considered. The world of big data has many faces, including databases, infrastructure, and security, but here we focus on data analytics. We then cover how to deal with big data, explaining why we cannot simply scale up a single computer but must scale out across multiple machines. We argue that traditional high-performance computing clusters are ill-suited to data-intensive applications, whose traffic patterns would saturate the cluster network. Finally, we introduce the key features of a big data cluster: it is not built for pure computation, it does not require high-end computers, it needs fault tolerance mechanisms, and it respects the principle of data locality.
Chapter 16 establishes the concept of an externality, a situation where a decision carries additional costs or benefits to a party not involved in the decision-making process. The chapter classifies several types of externalities and explores how and why they lead to undesirable outcomes. Then potential solutions to externalities as well as their strengths and weaknesses are discussed: subsidies, taxes, regulation, and the assignment of property rights. The chapter ends with a discussion of cost–benefit analysis and its role in evaluating interventions aimed at combatting externality problems.
This chapter begins by applying the theory of the firm to the context of a medical practice. The concepts of profit maximization, inputs to production, and outputs from production are all applied to this example. Then the chapter moves on to discuss sources of efficiency in production and specific applications to medical care production. The chapter then develops the basics of supply curves, and how they are interpreted and used, and then brings supply together with demand to describe the basics of equilibrium in an efficient market. The last part of the chapter discusses threats to efficiency in markets, giving a brief overview of the largest sources of difficulty in applying efficient markets to the healthcare context. This sets up the rest of the book: the healthcare sector is filled with problems that make efficient markets unlikely, meaning that understanding the underlying economics is vital.
Chapter 9 explores the hospital as an economic entity. The chapter discusses the differences between for-profit and not-for-profit hospitals, the organization of hospital workforces, and how the for-profit/not-for-profit distinction interacts with the organization. Then the chapter explores how different parties involved with the hospital seek to exert influence over how hospital resources are used: what they want and how they get it. Finally, the chapter covers topics in how hospitals operate within the sector: hospital growth, hospital competition, and operation of hospital systems.
Chapter 11 focuses on the adverse selection problem in the health insurance market, which is sometimes referred to as the “death spiral.” First, the concept of “uninsurable” individuals is developed and the rationale for eliminating differential pricing based on pre-existing conditions (experience rating) is explored. Then the chapter discusses the consequences of this choice: the adverse selection problem. Finally, alternative solutions to adverse selection are explored – both single-payer systems and market-based solutions such as the Affordable Care Act. The end-of-chapter supplement explores the historical reasons for differences between the health insurance system in the United States and the system of a close ally, the United Kingdom.
This chapter introduces the machine learning side of this book. Although we assume some prior experience with machine learning, we start with a full recap of the basic concepts and key terminology. This includes a discussion of learning paradigms, such as supervised and unsupervised learning, and of the machine learning life cycle, which articulates the steps from data collection to model deployment. We cover topics such as data preparation and preprocessing, model evaluation and selection, and machine learning pipelines, showing how every stage of this cycle is challenged by large-scale data analytics. The rest of the chapter is devoted to Spark’s machine learning library, MLlib. Basic concepts such as Transformers, Estimators, and Pipelines are presented through a linear regression example. The example forces us to build a pipeline of methods to get the data ready for training, which lets us introduce some of Spark’s data preparation components (e.g., VectorAssembler and StandardScaler). Finally, we explore evaluation components (e.g., RegressionEvaluator) and how to perform hyperparameter tuning.
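The Estimator/Transformer pattern at the heart of MLlib can be mimicked in plain Python (a conceptual sketch only, not the MLlib API; the class names here are illustrative): an Estimator’s fit() learns statistics from data and returns a Transformer, whose transform() applies them.

```python
class StandardScalerEstimator:
    """Estimator: fit() learns the statistics and returns a Transformer."""
    def fit(self, data):
        n = len(data)
        mean = sum(data) / n
        var = sum((x - mean) ** 2 for x in data) / n
        return StandardScalerModel(mean, var ** 0.5)

class StandardScalerModel:
    """Transformer: transform() applies the learned scaling."""
    def __init__(self, mean, std):
        self.mean, self.std = mean, std
    def transform(self, data):
        return [(x - self.mean) / self.std for x in data]

data = [2.0, 4.0, 6.0]
model = StandardScalerEstimator().fit(data)   # fitting yields a Transformer
scaled = model.transform(data)                # zero mean, unit variance
```

A Pipeline generalizes this idea: fitting a chain of such stages yields a fitted PipelineModel that applies each learned transformation in order.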
Chapter 12 deals with how prices are set within the healthcare sector. Prices do not come from direct interaction between buyers and sellers but are instead arrived at via a complex negotiation process between multiple parties. This chapter covers the lifecycle of a medical bill, the role of chargemasters (master price lists) in billing, how providers negotiate with insurers, and how the uninsured are treated in billing. The chapter then discusses price competition and quality competition between providers, as well as the role of price transparency.
Chapter 19 explores differences in spending on healthcare and health outcomes across countries. The main result of interest from these comparisons is that the United States spends vastly more than other OECD countries but gets only middle-of-the-road health outcomes. The second part of the chapter explores leading theories as to why this is the case. The final part of the chapter compares different strategies for setting up a healthcare and health insurance sector: the Bismarck model, the Beveridge model, national health insurance, and hybrid models.
Chapter 8 explores how incentives influence the behavior of providers. The first part of the chapter is an application of personnel economics, discussing how different pay schemes create incentives for different types of effort. The chapter goes on to discuss the latent variable problem when looking at pay schemes and behavior and covers literature that helps to unpack the causal story. The second half of the chapter discusses demand inducement and its variants (fee splitting and self-referral) from both a theoretical and empirical perspective. The end-of-chapter supplement gives a brief introduction to behavioral economics and how cognitive biases can also be used to influence provider behavior.
MapReduce is a parallel programming model that follows a simple divide-and-conquer strategy to tackle big datasets in distributed computing. This chapter begins with a discussion of the key features that distinguish MapReduce from similar distributed computing tools such as the Message Passing Interface (MPI). Then, we introduce its two main functions, map and reduce, which are rooted in functional programming. After that, we illustrate how MapReduce works using the classical WordCount example, the “Hello World” of big data, discussing different ways to parallelize it and their main advantages and disadvantages. Next, we define MapReduce more formally, expressing its functions in terms of key–value pairs and stating the key properties of the map, shuffle, and reduce operations. At the end of the chapter we cover some important details: how fault tolerance is achieved, how MapReduce preserves data locality, how combiners reduce data transfer across computers, and additional aspects of its internal workings.
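The map–shuffle–reduce flow over key–value pairs described above can be sketched for WordCount in plain Python (a minimal single-machine sketch; the function names are illustrative, not part of any MapReduce API):

```python
from collections import defaultdict

def map_phase(document):
    # map: emit a (word, 1) key-value pair for every word
    for word in document.lower().split():
        yield (word, 1)

def shuffle_phase(pairs):
    # shuffle: group all emitted values by key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # reduce: sum the counts collected for each word
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data big ideas", "data locality matters"]
pairs = [pair for doc in docs for pair in map_phase(doc)]
counts = reduce_phase(shuffle_phase(pairs))
# counts["big"] == 2, counts["data"] == 2, counts["locality"] == 1
```

In a real cluster each mapper and reducer runs on a different machine, and a combiner would apply the same summation locally on each mapper’s output before the shuffle, shrinking the data sent over the network.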
The goal of this chapter is to present complete examples of the design and implementation of machine learning methods for large-scale data analytics. In particular, we choose three distinct topics: semi-supervised learning, ensemble learning, and deploying deep learning models at scale. Each is introduced by motivating why parallelization is needed to deal with big data, identifying the main bottlenecks, designing and coding Spark-based solutions, and discussing further work required to improve the code. In semi-supervised learning, we focus on the simplest self-labeling approach, self-training, and a global solution for it. Likewise, in ensemble learning, we design a global approach for bagging and boosting. Lastly, we show an example with deep learning: rather than parallelizing the training of a model, which is typically easier on GPUs, we deploy the inference step for a case study in semantic image segmentation.
Hadoop is an open-source framework, written in Java, for big data processing and storage that is based on the MapReduce programming model. This chapter starts with a brief introduction to Hadoop and how it has evolved into a solid base platform for most big data frameworks. We show how to implement the classical WordCount using Hadoop MapReduce, highlighting the difficulties in doing so. After that, we provide essential information about how its resource negotiator, YARN, and its distributed file system, HDFS, work. We describe step by step how a MapReduce process is executed on YARN, introducing the concepts of resource and node managers, the application master, and containers, as well as the different execution modes (standalone, pseudo-distributed, and fully distributed). Likewise, we cover HDFS, describing the basic design of this filesystem and what it means in terms of functionality and efficiency. We also discuss recent advances such as erasure coding, HDFS federation, and high availability. Finally, we examine the main limitations of Hadoop and how they have sparked the rise of many new big data frameworks, which now coexist within the Hadoop ecosystem.
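The storage trade-off behind erasure coding, one of the HDFS advances mentioned above, comes down to quick arithmetic: 3-way replication stores three full copies of every block, while a Reed–Solomon (6,3) scheme stores six data cells plus three parity cells (a sketch; RS(6,3) is a commonly cited HDFS erasure coding policy):

```python
def overhead_replication(replicas):
    # extra storage beyond the data itself, as a fraction of the data size
    return replicas - 1

def overhead_erasure(data_cells, parity_cells):
    # with erasure coding, only the parity cells are extra storage
    return parity_cells / data_cells

rep = overhead_replication(3)     # 3-way replication -> 2.0 (200% overhead)
ec = overhead_erasure(6, 3)       # RS(6,3)           -> 0.5 (50% overhead)
```

So erasure coding cuts the storage overhead from 200% to 50% while still tolerating the loss of any three of the nine cells, at the cost of extra computation and network traffic to reconstruct missing data.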