• To implement the k-means clustering algorithm in Python.
• To determine the ideal number of clusters by implementing the corresponding code.
• To understand how to visualize clusters using plots.
• To create the dendrogram and find the optimal number of clusters for agglomerative hierarchical clustering.
• To compare results of k-means clustering with agglomerative hierarchical clustering.
• To implement clustering through various case studies.
13.1 Implementation of k-means Clustering and Hierarchical Clustering
In the previous chapter, we discussed various clustering algorithms. We learned that clustering algorithms are broadly classified into partitioning methods, hierarchical methods, and density-based methods. The k-means clustering algorithm follows the partitioning method; agglomerative and divisive algorithms follow the hierarchical method, while DBSCAN follows the density-based method.
In this chapter, we will implement each of these algorithms through various case studies, following a step-by-step approach. You are advised to perform all these steps on your own on the datasets mentioned in this chapter.
The k-means algorithm is a partitioning method and an unsupervised machine learning (ML) algorithm used to identify clusters of data items in a dataset. It is one of the most prominent ML algorithms, and its implementation in Python is quite straightforward. This chapter will consider three case studies: the dataset of customers shopping in a mall, the U.S. arrests dataset, and the popular Iris dataset. Through these case studies, we will understand the significance of k-means clustering and learn to implement it in Python. Along with clustering the data items, we will also discuss how to find the optimal number of clusters. To compare against the results of the k-means algorithm, we will also implement hierarchical clustering for these problems.
We will kick-start the implementation of the k-means algorithm in the Spyder IDE using the following steps.
Step 1: Importing the libraries and the dataset. The dataset for the respective case study is downloaded, and the required libraries are imported.
Step 2: Finding the optimal number of clusters. We find the optimal number of clusters for the given dataset using the elbow method.
Step 3: Fitting k-means to the dataset. A k-means model is prepared by training it on the acquired dataset.
Step 4: Visualizing the clusters. The clusters formed by the k-means model are then visualized as scatter plots.
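The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the chapter's implementation: the data points are made up, the farthest-first initialization is a deterministic simplification chosen for this sketch, and the scatter plot of Step 4 is omitted. The names `init_centroids`, `kmeans`, and `wcss_by_k` are hypothetical.

```python
# Step 1 (stand-in): a tiny made-up dataset with three visible groups.
points = [
    (1, 1), (1, 2), (2, 1), (2, 2),   # group A
    (8, 8), (8, 9), (9, 8), (9, 9),   # group B
    (1, 8), (1, 9), (2, 8), (2, 9),   # group C
]


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def init_centroids(data, k):
    """Farthest-first initialization (an assumption of this sketch)."""
    centroids = [data[0]]
    while len(centroids) < k:
        # Pick the point whose nearest chosen centroid is farthest away.
        centroids.append(
            max(data, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(data, k, max_iter=100):
    """Return (centroids, labels, wcss) for k clusters."""
    centroids = init_centroids(data, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in data
        ]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:
                updated.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                updated.append(centroids[c])  # keep an empty cluster in place
        if updated == centroids:
            break  # converged: centroids no longer move
        centroids = updated
    # Within-cluster sum of squares, the quantity plotted in the elbow method.
    wcss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(data, labels))
    return centroids, labels, wcss


# Step 2: compute WCSS for several values of k and look for the "elbow".
wcss_by_k = {k: kmeans(points, k)[2] for k in range(1, 7)}
for k, value in wcss_by_k.items():
    print(f"k={k}: WCSS={value:.2f}")

# Step 3: fit the model with the chosen number of clusters.
centroids, labels, _ = kmeans(points, 3)
print(centroids)
```

For this toy data, the WCSS drops sharply up to k = 3 and only slightly afterwards, which is the pattern the elbow method looks for. On a real dataset, a library implementation with randomized initialization and several restarts would usually be preferred.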
Although international legal scholars have never captured or paid attention to the epistemology of the secret at work in international legal thought and practice, the idea of the secret has not been totally absent from international legal thought. For instance, international legal scholars have occasionally mobilized the idea of the hermeneutics of suspicion to describe the way in which certain scholars dismiss opponents’ arguments as ideologically or politically motivated postures rather than scientifically valid positions. Likewise, many scholarly works have focused on the secretive and undisclosed practices which are supposedly at work in various international legal processes. This chapter reviews these contemporary engagements by international lawyers with the idea of the secret in international law.
This chapter focuses on the domain of the vegetative soul that represents some of the simplest activities that distinguish the organic from the inorganic. It examines the central vegetative system consisting of the liver, the veins and their supporting organs, as well as the vegetative capacities present in all the tissues that are subservient to this system. The chapter not only discusses the relationship between the central parts and capacities in all of the body, but also examines the ways in which these capacities manifest themselves, arguing they represent Galen’s attempts to grapple with the notion of basic vitality. On some occasions, Galen also calls them ‘demiurgic’, implying a creative capacity. A discussion of how he engages with the pre-existing philosophical tradition and the notion of a biological demiurge helps to delineate the scope of these capacities.
The final chapter entitled Conclusions contains a summary of the findings of the study, explaining the key motivations and claims behind the Galenic understanding of bodily unity.
After careful study of this chapter, students should be able to do the following:
LO1: Describe stresses and displacements for a rotating disk.
LO2: Compare the stress distribution in a flat disk with and without a central hole.
LO3: Illustrate the stress distribution in a disk of variable thickness.
LO4: Design the rotating disk of uniform stress.
7.1 INTRODUCTION [LO1]
The problems of stresses and deformations in disks rotating at high speeds are important in the design of both gas and steam turbines, generators and many such rotating machinery in industry. As discussed in earlier chapters, this is another example of axisymmetric problems in polar coordinates. Although the theoretical treatment of a flat disk is simpler, in many industrial applications, disks are tapered. They are usually thicker near the hub, and their theoretical analysis is slightly more involved. We shall first take up the analysis for flat disks.
In the case of rotating disks with centrifugal force as the body force, the equation of equilibrium reduces to the form given in equation (6.1.3).
Combining this with displacement equations, we have, as in equation (6.1.5), a general equation for determining the stress distribution in axisymmetric problems. This is given as
This is a nonhomogeneous differential equation. The associated homogeneous equation (complementary equation) is
The solution of this equation is Lamé's equation, as discussed in Chapter 6, equation (6.2.3). Taking the particular solution into consideration, the solution to equation (7.1.2) turns out to be
We may also determine the radial displacement from equation (6.2.11), and this is given as
We may therefore write the stresses and displacement for the rotating disk under one bracket as
With these introductory basic equations, we shall now set out to discuss the stress distribution and displacement in rotating disks.
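For orientation, the standard textbook results for a thin flat disk rotating at constant angular velocity can be sketched as follows, assuming plane stress. The symbols ρ (density), ω (angular velocity), ν (Poisson's ratio), E (Young's modulus), and the integration constants A and B are assumptions of this sketch, and the notation may differ from that of the numbered equations in this chapter.

```latex
\begin{align}
\frac{d\sigma_r}{dr} + \frac{\sigma_r - \sigma_\theta}{r} + \rho\,\omega^2 r &= 0
  && \text{(equilibrium with centrifugal body force)} \\
\sigma_r &= A - \frac{B}{r^2} - \frac{3+\nu}{8}\,\rho\,\omega^2 r^2 \\
\sigma_\theta &= A + \frac{B}{r^2} - \frac{1+3\nu}{8}\,\rho\,\omega^2 r^2 \\
u &= \frac{r}{E}\left(\sigma_\theta - \nu\,\sigma_r\right)
\end{align}
```

The first two terms of each stress expression are the Lamé (homogeneous) part, and the last term is the particular solution due to rotation. Substituting the two stress expressions into the equilibrium equation makes the A and B terms cancel and leaves $-\rho\,\omega^2 r + \rho\,\omega^2 r = 0$, which is a quick consistency check.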
This chapter looks at the ways in which Galen posits theoretical unity among the discrete physiological systems, especially with reference to tripartition. Unlike Platonic tripartition, which is motivated by psychological conflict, Galen’s tripartition of physiological domains shows the three domains to be highly cooperative and co-dependent. The respective material fluxes they control are together necessary for continued functioning. The chapter looks at Galen’s adoption of the popular philosophical idea that identity persists because of the form, and at his analysis of different causes, strongly influenced by the ideas of his contemporary Middle Platonists. While more popular analyses of causes explain that the body is unified in its design, it is the notion of cohesive cause, the chapter argues, that accounts for unified physiological functioning.
• To comprehend the concept of association mining and its applications.
• To understand the role of support, confidence, and lift.
• To understand the naive algorithm for finding association mining rules, its limits, and improvements.
• To learn about different ways to store a transaction database.
• To understand and apply the Apriori algorithm to identify the association mining rules.
14.1 Introduction to Association Rule Mining
Association rule mining is a rule-based technique to discover relations between the attributes of a dataset. It is often called “market basket” analysis, as shown in Figure 14.1. Here, the market analyst examines the items that consumers often purchase together to find the relation between the sale of item X and the sale of item Y.
In other words, when customers visit a store, they may buy certain types of items together during a shopping trip. For example, Figure 14.1 shows a database of customers’ transactions (e.g., shopping baskets), where each transaction consists of a set of items (e.g., products) purchased during a visit. Machine learning (ML) engineers can use association mining to find groups of items that are frequently purchased together. This is also referred to as an analysis of customer purchasing behavior. For example, “IF one buys bread, THEN there is a high probability of buying butter with it”, as people who buy bread often buy butter as well. The store manager can use this information to arrange the items accordingly and increase sales and the overall efficiency of the store.
Let us consider a situation where the store manager feels that the store is always crowded and customers keep complaining about slow service. He is exploring different ways to improve the efficiency of his store. He performs an association analysis and prepares a list of associated items, such as bread and butter. He may decide to put these associated items on the same shelf or near each other so that customers can find them quickly, reducing their shopping time. This will also improve the overall efficiency of the store and the sale of the products. To further improve the shopping experience of his customers, he can create different combos and offer discounts on them.
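The bread-and-butter rule can be quantified with support, confidence, and lift. The sketch below uses a made-up list of five transactions (not the data of Figure 14.1), and the function names are hypothetical; it only illustrates the usual definitions of the three measures.

```python
# Made-up shopping baskets, one set of items per transaction.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"milk", "jam"},
    {"bread", "milk"},
]


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)


def confidence(antecedent, consequent):
    """Estimated probability of the consequent given the antecedent."""
    return support(antecedent | consequent) / support(antecedent)


def lift(antecedent, consequent):
    """Confidence relative to the baseline popularity of the consequent."""
    return confidence(antecedent, consequent) / support(consequent)


rule = ({"bread"}, {"butter"})
print(f"support    = {support(rule[0] | rule[1]):.2f}")
print(f"confidence = {confidence(*rule):.2f}")
print(f"lift       = {lift(*rule):.2f}")
```

A lift above 1 suggests that the two items are bought together more often than would be expected if their purchases were independent, which is the kind of evidence the store manager would look for before placing them on the same shelf.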
• To know the inspiration behind the genetic algorithm.
• To understand the concept of natural selection, recombination, and mutation.
• To understand the correlation between nature and genetic algorithm.
• To formulate the mathematical representation of genes and fitness theory.
• To implement natural selection through roulette wheel.
• To implement recombination or crossover.
• To implement the process of mutation.
• To understand elitism and its implementation.
• To discuss the advantages and disadvantages of genetic algorithms.
22.1 Intuition of Genetic Algorithm
The genetic algorithm (GA) is inspired by nature, and it plays a vital role in the field of machine learning (ML). It selects the best solution from all available candidate solutions. Just as nature selects the best possible candidates through evolution, the GA selects the best possible solution from the available solutions.
One of the applications of GAs in ML is to select the global minimum from all possible (local) minima by using natural selection. In earlier chapters, we learned that during the training of an artificial neural network, the main goal is to obtain the weights with the minimum cost function value. The gradient descent algorithm is commonly used to find a local minimum of the cost function. However, we must find the global minimum to reach the optimal weights. A GA can be used to find the global minimum out of all available local minima or possible solutions. In this case, the set of possible local minima becomes the population containing possible candidates.
In this chapter, we will discuss the inspiration from nature that is the main driving concept behind the working of GAs and their implementation. To get a good idea of the GA, we will discuss the basics of natural selection by revisiting the theory of evolution in the next section.
22.2 The Inspiration behind Genetic Algorithm
The concepts discussed in this chapter are also available in the form of the free online Udemy course, Genetic Algorithm for Machine Learning by Parteek Bhatia.
The GA is one of the first and most well-regarded evolutionary algorithms in the computer science literature. John Holland, a researcher at the University of Michigan, introduced this algorithm in the 1970s, but it became popular in the 1990s.
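The ingredients named in the objectives of this chapter (roulette wheel selection, crossover, mutation, and elitism) can be combined into a short loop. The sketch below is an assumption-laden toy, not the chapter's implementation: chromosomes are bit lists, the fitness is simply the number of 1 bits, and all constants and function names are hypothetical.

```python
import random

GENES = 20            # chromosome length (toy setting)
POP_SIZE = 30         # individuals per generation
MUTATION_RATE = 0.01  # probability of flipping each gene
GENERATIONS = 40


def fitness(chromosome):
    """Toy fitness: the number of 1 bits in the chromosome."""
    return sum(chromosome)


def roulette_select(population, rng):
    """Pick one parent with probability proportional to its fitness."""
    # A tiny offset keeps the weights valid even if every fitness is zero.
    weights = [fitness(c) + 1e-9 for c in population]
    return rng.choices(population, weights=weights, k=1)[0]


def crossover(parent_a, parent_b, rng):
    """Single-point crossover: head of one parent, tail of the other."""
    point = rng.randint(1, GENES - 1)
    return parent_a[:point] + parent_b[point:]


def mutate(chromosome, rng):
    """Flip each gene independently with probability MUTATION_RATE."""
    return [1 - g if rng.random() < MUTATION_RATE else g for g in chromosome]


def evolve(seed=0):
    """Run the GA and return the best chromosome and the best fitness per generation."""
    rng = random.Random(seed)
    population = [
        [rng.randint(0, 1) for _ in range(GENES)] for _ in range(POP_SIZE)
    ]
    history = []
    for _ in range(GENERATIONS):
        elite = max(population, key=fitness)
        history.append(fitness(elite))
        children = [elite]  # elitism: the best individual survives unchanged
        while len(children) < POP_SIZE:
            child = crossover(
                roulette_select(population, rng),
                roulette_select(population, rng),
                rng,
            )
            children.append(mutate(child, rng))
        population = children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


best, history = evolve()
print("best fitness per generation:", history)
```

Because the elite individual is copied into every new generation, the best fitness recorded in `history` can never decrease, which is the practical benefit of elitism.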
• To learn about different phases of data pre-processing like data cleaning, data integration, data transformation, and data reduction.
• To understand the need for feature scaling.
• To comprehend normalization and standardization techniques for feature scaling.
• To understand principal component analysis for feature extraction.
• To pre-process the categorical data for building machine learning models.
3.1 Need for Data Pre-processing
We live in an age where data is considered the new oil because we need data to train machine learning (ML) algorithms. The most important job of a data analyst is to collect, clean, and analyze the data and build ML models on the cleaned dataset. But often, the raw data that we obtain is noisy. It contains many discrepancies, inconsistencies, and often missing values. To understand this situation, let us consider an example.
Suppose we have to predict the house price, and for this, we have collected data from a few previous transactions, as shown in Figure 3.1.
In a perfect situation, the captured data would be in the format shown in Figure 3.1. Here, we have the size of the house and the number of bedrooms as input features, while the price is the output attribute. We can predict the price of an unknown instance through regression.
But practically, in most situations, the captured data is not of good quality, and we usually have a dataset like the one shown in Figure 3.2.
You can see that this data is messy. There are many unknown or missing values, and if we trained the model on this data, its predictions would be very poor. You can also identify noise and incorrect labels; for example, the price in the second record is incorrect and will result in poor model training.
We can also consider some more examples, such as someone entering –1 in the “salary credited” column of an employee dataset. This does not make any sense and will be considered noise. Sometimes, we may have an unrealistic or impossible combination of data; for example, consider a record where we have Gender–Male and Pregnant–Yes.
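Checks of this kind are easy to automate before any model is trained. The sketch below is only illustrative: the records, the field names, and the function `find_issues` are hypothetical, and the three rules simply mirror the problems described above (a missing value, a negative salary, and an impossible combination).

```python
# Made-up employee records; None marks a missing value.
records = [
    {"id": 1, "gender": "Female", "pregnant": "Yes", "salary_credited": 52000},
    {"id": 2, "gender": "Male", "pregnant": "Yes", "salary_credited": 48000},
    {"id": 3, "gender": "Male", "pregnant": "No", "salary_credited": -1},
    {"id": 4, "gender": "Female", "pregnant": "No", "salary_credited": None},
]


def find_issues(record):
    """Return a list of data-quality problems found in one record."""
    issues = []
    salary = record.get("salary_credited")
    if salary is None:
        issues.append("missing salary")
    elif salary < 0:
        issues.append("negative salary")
    if record.get("gender") == "Male" and record.get("pregnant") == "Yes":
        issues.append("impossible combination: Male and Pregnant")
    return issues


# Map each record id to its problems, then keep only the clean records.
report = {record["id"]: find_issues(record) for record in records}
clean = [record for record in records if not report[record["id"]]]

for record_id, issues in report.items():
    print(record_id, issues or "ok")
```

Whether a flagged record should be dropped, corrected, or imputed depends on the dataset; this sketch only separates the clean records from the suspicious ones.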
After careful study of this chapter, students should be able to do the following:
LO1: Describe stress equations in thick cylinders.
LO2: Explain stress distribution in pressurized cylinders.
LO3: Analyze compound cylinders.
LO4: Analyze autofrettage.
LO5: Analyze failure theories for thick cylinders.
6.1 INTRODUCTION [LO1]
In earlier chapters, we have discussed axisymmetric problems in two-dimensional (2D) polar coordinate systems. Thick cylinders fall into this class of problems. Cylindrical pressure vessels, hydraulic cylinders, gun-barrels, and pipes carrying fluids at high pressure develop radial and tangential (circumferential) stresses. Longitudinal stresses can also be developed if the ends are closed. Therefore, ideally, this is a triaxial stress system as shown in Figure 6.1.
(a) Circumferential or hoop stress (σθ)
(b) Longitudinal stress (σz)
(c) Radial stress (σr)
If the wall thickness of a hollow cylinder is less than about 10% of its radius, it may be treated as a thin cylinder. Cylinders with higher wall thickness are considered to be thick cylinders. Before analyzing the stress in a thick cylinder, we should briefly consider the stress state in thin cylinders, where the radial stress is small compared to the other stresses and can be neglected. Stress variation across the thin wall is also negligible. Analysis of thin-walled pressure vessels may therefore be carried out on the basis of a biaxial stress system. Since the presence of shear stress at the cut section would lead to incompatible distortion, the longitudinal and circumferential stresses in this case are both principal stresses. We now take a further section of the cut portion, as shown in Figure 6.2 (a), and consider its equilibrium, as shown in Figure 6.2 (b).
The section is acted upon by internal pressure p, and the circumferential stress developed at the cut section is σθ. The force on an infinitesimally small area subtended by angle dθ at an inclination θ from the horizontal axis is p r_i dθ.
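The equilibrium of the half section then leads to the familiar thin-cylinder hoop stress. The sketch below assumes a unit length of cylinder, a cut along the horizontal diameter, an internal radius $r_i$, and a wall thickness $t$; the symbol $t$ and these assumptions are introduced here for illustration and are not taken from Figure 6.2.

```latex
\begin{align}
\int_0^{\pi} p\,r_i \sin\theta \; d\theta &= 2\,p\,r_i
  && \text{(resultant of the pressure normal to the cut)} \\
2\,\sigma_\theta\,t &= 2\,p\,r_i
  && \text{(balanced by the hoop stress on the two cut walls)} \\
\sigma_\theta &= \frac{p\,r_i}{t}
\end{align}
```

Only the component of each elemental force normal to the cut plane contributes; the components parallel to the cut cancel by symmetry.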
• To know about various integrated development environments of Python.
• To implement basic programming constructs using Python.
• To understand the usage of various data types like numbers, list, tuple, strings, set, and dictionary.
• To compare various data types like list, tuple, dictionary, and set.
• To use if and looping statements in Python.
• To define user-defined functions.
Today, Python is known to be one of the most in-demand programming languages. According to statistics from GitHub (a provider of Internet hosting for software development), Python is the second most popular programming language after JavaScript, as shown in Figure 2.1, and it may soon top the chart. Python surpassed Java, PHP, and other prominent languages in 2019.
Python is easy to learn and versatile, so it is acclaimed as a major programming language for many new-age technologies like machine learning (ML), artificial intelligence, data science, and natural language processing. Guido van Rossum, who created Python and released it in 1991, described it as a high-level programming language whose core design philosophy emphasizes code readability and a syntax that allows programmers to express concepts in a few lines of code. Interestingly, the name Python is inspired by Guido's favorite television show, Monty Python's Flying Circus.
In this chapter, we will discuss various programming constructs of Python so that you can easily implement ML algorithms by using it. Before writing the actual code in Python, let us focus on the features of Python that make it so popular and unique.
2.1 Features of Python
The features offered by Python are visualized in Figure 2.2. In more detail, the main features of Python are as follows:
• Beginner's Language: Python is not only easy to code and learn but also fast to grasp, and hence it is a suitable choice for any novice user who wants to learn to program. This is why this language is nowadays introduced to students in schools.
• Interpreted: Unlike programming languages such as C or C++, Python does not require you to compile programs before executing them. It is an interpreted language, i.e., the code written in Python is processed line by line at run time.
• Interactive: The interactive feature of Python enables real-time feedback, allowing programmers to experiment, debug, and make adjustments on the go.
• To understand the need for simple linear regression.
• To comprehend the concept of hypothesis and parameters of simple linear regression.
• To understand mathematical modeling of cost function and its minimization.
• To understand the importance and different steps of the gradient descent algorithm.
• To comprehend the mathematical modeling of the gradient descent algorithm.
• To understand the role of learning rate α.
5.1 Introduction to Simple Linear Regression
As discussed in earlier chapters, regression predicts a continuous value or real-valued output. This chapter will discuss how regression works (from a mathematical aspect) to predict the continuous value for the given dataset. Our first learning algorithm is simple linear regression. In this section, we will discuss the fundamental concepts and mathematical modeling of simple linear regression.
We usually have a dependent variable with a continuous value that we wish to predict based on one or more independent variables. If we have only one independent or input variable, this situation is known as simple linear regression (also called univariate regression). If we have multiple independent or input variables, it is known as multiple linear regression or multivariate regression.
Linear regression could be used for studying patterns in different real-life scenarios. Consider a research lab where a researcher wants to understand how the stipend is affected by the years of experience, or, in simple words, we wish to predict the stipend based on the years of experience of the researcher. Machine learning (ML) is about learning from past experiences or data. Thus, to predict the researcher's stipend, we have to collect some data about past researchers, specifically their stipend and experience.
Supervised learning models need a dataset called a training set. We will use the dataset given in Table 5.1, which lists researchers’ stipends with their corresponding years of experience, to train the model. Our job will be to build an ML model that learns from this data and hence predicts the stipend of a researcher based on their experience. Here, the stipend is the dependent or output variable because it depends on the researcher's years of experience, and years of experience is the independent or input variable. Since there is only one input variable, we will use simple linear regression to build the ML model.
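To make the setup concrete, the sketch below fits a straight line to a small stipend dataset using the hypothesis, cost function, and gradient descent named in the objectives of this chapter. The numbers are invented for illustration and are not the values of Table 5.1; the learning rate, the iteration count, and all function names are likewise assumptions of this sketch.

```python
# Made-up training set: years of experience and stipend (in thousands).
experience = [1, 2, 3, 4, 5]
stipend = [12, 14, 16, 18, 20]


def predict(theta0, theta1, x):
    """Hypothesis of simple linear regression: a straight line."""
    return theta0 + theta1 * x


def cost(theta0, theta1, xs, ys):
    """Half the mean squared error between predictions and targets."""
    m = len(xs)
    return sum((predict(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def gradient_descent(xs, ys, alpha=0.05, n_iter=5000):
    """Minimize the cost by repeatedly stepping against its gradient."""
    theta0, theta1 = 0.0, 0.0
    m = len(xs)
    for _ in range(n_iter):
        errors = [predict(theta0, theta1, x) - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update of both parameters with learning rate alpha.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1


theta0, theta1 = gradient_descent(experience, stipend)
print(f"theta0 = {theta0:.3f}, theta1 = {theta1:.3f}")
print(f"cost = {cost(theta0, theta1, experience, stipend):.6f}")
print(f"predicted stipend for 6 years: {predict(theta0, theta1, 6):.2f}")
```

The invented data lie exactly on the line stipend = 10 + 2 × experience, so the fitted parameters should approach 10 and 2. With real data the cost would not reach zero, and a learning rate that is too large would make the updates diverge rather than converge.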