Chapter 8: Implementing Classical Methods: k-Means and Linear Regression

pp. 254-304

Authors

University of Nottingham; Public University of Navarre

Extract

In this chapter we cover clustering and regression through two traditional machine learning methods: k-means and linear regression. We first briefly discuss how to implement these methods in a non-distributed manner, and then carefully analyze their bottlenecks when handling big data. This analysis enables us to design global solutions based on Spark's DataFrame API. The key focus is on the principles for designing such solutions effectively; nevertheless, some of the challenges in this chapter involve investigating Spark tools that speed up the processing even further. k-means serves as an example of an iterative algorithm and of how to exploit caching in Spark, and we analyze its implementation with both the RDD and DataFrame APIs. For linear regression, we first implement the closed-form solution, which involves numerous matrix multiplications and outer products, to simplify the processing of big data. We then turn to gradient descent. These examples give us the opportunity to expand on the principles of designing a global solution, and also show that knowing the underlying platform, Spark in this case, is essential to truly maximize performance.
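
To make the extract concrete, here is a minimal PySpark sketch (illustrative only, not taken from the chapter) of the two ideas mentioned above: caching the input RDD so that every k-means iteration reuses the same in-memory data, and assembling the closed-form linear-regression solution w = (X^T X)^-1 X^T y by summing per-row outer products x x^T and products x*y across the cluster. The toy data, the value of k, the iteration count, and all variable names are assumptions made for this example.

# Illustrative sketch (not the chapter's code): k-means with caching and
# closed-form linear regression in PySpark.
import numpy as np
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("chapter8-sketch").getOrCreate()
sc = spark.sparkContext

# k-means: cache the points so each iteration reuses the same in-memory RDD
points = sc.parallelize(list(np.random.rand(1000, 2))).cache()
centroids = points.takeSample(False, 3)        # k = 3, random initialization

def closest(p, cents):
    # index of the nearest centroid (squared Euclidean distance)
    return int(np.argmin([np.sum((p - c) ** 2) for c in cents]))

for _ in range(10):                            # fixed number of iterations
    sums = (points
            .map(lambda p: (closest(p, centroids), (p, 1)))
            .reduceByKey(lambda a, b: (a[0] + b[0], a[1] + b[1]))
            .collectAsMap())
    centroids = [s / n for (s, n) in sums.values()]

# Linear regression, closed form: w = (X^T X)^-1 X^T y, where X^T X is the
# sum of the outer products x x^T and X^T y is the sum of the products x * y.
rows = sc.parallelize([(np.random.rand(3), float(np.random.rand()))
                       for _ in range(1000)])
xtx, xty = (rows
            .map(lambda r: (np.outer(r[0], r[0]), r[0] * r[1]))
            .reduce(lambda a, b: (a[0] + b[0], a[1] + b[1])))
w = np.linalg.solve(xtx, xty)                  # weight vector

Caching matters here because, without it, Spark would recompute (and possibly re-read) the points on every iteration of the loop; the outer-product formulation matters because it reduces the distributed work to a single pass producing small d x d and d x 1 aggregates, which can then be solved cheaply on the driver.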

Keywords

  • k-means
  • linear regression
  • distributed implementation
  • closed-form solution
  • gradient descent
  • matrix multiplication
  • outer product
  • iterative algorithms
