Chapter Objectives
• To import the necessary libraries and loading the dataset.
• To split the dataset into training and testing datasets.
• To build the simple linear model and make predictions.
• To visualize the training set and testing set results.
• To calculate mean absolute error, mean squared error, and root mean squared error.
In this chapter, we are going to implement the simple linear regression in Python. To implement this concept, we will analyze how the stipend of a researcher is related to their years of research experience. Our aim is to predict the stipend of the researcher based on his/her research experience.
Problem Statement and Dataset
To perform this task, we will consider a dataset consisting of two attributes: ResearchExperience and Stipend. There are 30 observations in this dataset to draw the correlation between the research experience and their corresponding stipend. A research institute aims to find this correlation between research experience and the stipend. This will assist the management in providing an appropriate stipend to new research scholars based on their years of research experience, rather than deciding randomly. The obvious thing is that the stipend is directly proportional to the research experience. The higher the experience, the more will be the stipend. We will use a simple linear regression model to solve this problem.
Let us quickly refresh our concepts of a simple linear regression model. We know that simple linear regression can best fit the straight line to generate a relationship between the research experience and the stipend. Though the dataset is quite simple, it has a great business value to it, as the model created will help the institute predict the stipend of the researcher based on their experience. Therefore, using this model, the stipend of the new researchers can be easily predicted, and this would also acknowledge the transparency in the management. Here, ResearchExperience (independent variable) will be our X and act as horizontal axis. In contrast, the Stipend (variable to be predicted is the dependent variable) will be Y and act as vertical axis, as shown in Figure 6.1.
Review the options below to login to check your access.
Log in with your Cambridge Aspire website account to check access.
If you believe you should have access to this content, please contact your institutional librarian or consult our FAQ page for further information about accessing our content.