The goal of this chapter is to present complete examples of the design and implementation of machine learning methods in large-scale data analytics. In particular, we choose three distinct topics: semi-supervised learning, ensemble learning, and how to deploy deep learning models at scale. Each topic is introduced by motivating why parallelization is needed to deal with big data, identifying the main bottlenecks, designing and coding Spark-based solutions, and discussing further work required to improve the code. In semi-supervised learning, we focus on the simplest self-labeling approach, self-training, and present a global solution for it. Likewise, in ensemble learning, we design global approaches for bagging and boosting. Lastly, we show an example with deep learning. Rather than parallelizing the training of a model, which is typically better suited to GPUs, we deploy the inference step at scale for a case study in semantic image segmentation.
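To make the self-training idea concrete before the detailed treatment in the chapter, the following is a minimal, illustrative PySpark sketch of one self-labeling loop. The file paths, column names, confidence threshold, number of rounds, and choice of logistic regression as the base classifier are assumptions made for illustration only, not the chapter's actual design.

```python
# Minimal self-training sketch with PySpark ML (illustrative assumptions throughout).
from pyspark.sql import SparkSession, functions as F
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.functions import vector_to_array

spark = SparkSession.builder.appName("self-training-sketch").getOrCreate()

# Hypothetical inputs: `labeled` has "features" (vector) and "label" columns;
# `unlabeled` has an "id" column and a "features" column.
labeled = spark.read.parquet("labeled.parquet")
unlabeled = spark.read.parquet("unlabeled.parquet")

threshold = 0.9          # minimum confidence required to self-label an instance
for _ in range(5):       # fixed number of self-labeling rounds (an assumption)
    model = LogisticRegression(featuresCol="features", labelCol="label").fit(labeled)

    # Score the unlabeled pool and keep only confident predictions.
    scored = model.transform(unlabeled)  # adds "probability" and "prediction"
    confident = (
        scored.withColumn("confidence", F.array_max(vector_to_array("probability")))
              .filter(F.col("confidence") >= threshold)
              .cache()
    )
    if confident.count() == 0:
        break  # nothing confident enough to add; stop early

    # Move confident instances from the unlabeled pool into the labeled set.
    labeled = labeled.unionByName(
        confident.select("features", F.col("prediction").alias("label"))
    )
    unlabeled = unlabeled.join(confident.select("id"), on="id", how="left_anti")
```

This "global" flavor of self-training keeps both the labeled and unlabeled sets as distributed DataFrames, so each round is a full Spark job; the chapter discusses the bottlenecks of such a design and how to refine it.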