Calculate accuracy of linear regression in Python

In linear regression, you are attempting to build a model that allows you to predict the value of new data, given the training data used to train your model. This will become clear as we work through this post.

y = B0 + B1X + ε

Courtesy of Department of Statistics, ITS Surabaya

Above, we can see the simple linear regression equation. The y-variable is our response or dependent variable. This is what we intend to predict; Sales, for example, is a popular choice.

B0 is the y-intercept, i.e. the value of y where X = 0 and the line meets the y-axis. B1X is the product of B1 (the amount of impact X has on y) and X, our feature/independent variable. Unlike the y-variable, multiple X’s can be used, each with a corresponding beta (coefficient). This allows us to create a model with many feature (X) variables to predict values in y. The random error component ε is irreducible error.
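With several features, the same equation extends to one beta per X. Written out, assuming n features labelled X_1 to X_n (the subscripts are notation added here, and β stands for the B's used above):

```latex
y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n + \varepsilon
```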

First Step: Visualization

Using visualisation, you should be able to judge which variables have a linear relationship with y. Start by using Seaborn’s pairplot.

In this case, we have used “Sales” as our response/y. Replace the list of feature variables with your own anticipated feature list.

Seaborn Pairplot

Additional parameters to use:

height= : Allows you to manipulate the size of the rendered pairplot (this parameter was named size= in seaborn versions before 0.9)

kind= ‘reg’ : Adds a line of best fit, chosen to minimize the sum of squared errors, along with a 95% confidence band.
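A minimal sketch of the pairplot call described above. The DataFrame here is synthetic and made up purely so the snippet runs on its own; the column names TV, Radio, Social, and Sales follow this post, and you would swap in your own data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display

import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the advertising data (values are invented for illustration)
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "TV": rng.uniform(0, 300, 200),
    "Radio": rng.uniform(0, 50, 200),
    "Social": rng.uniform(0, 100, 200),
})
data["Sales"] = 3.0 + 0.05 * data["TV"] + 0.2 * data["Radio"] + rng.normal(0, 1.0, 200)

# One scatter panel per feature against Sales, each with a fitted regression line
grid = sns.pairplot(
    data,
    x_vars=["TV", "Radio", "Social"],
    y_vars=["Sales"],
    height=4,    # called size= in seaborn versions before 0.9
    kind="reg",
)
# grid.savefig("pairplot.png")  # uncomment to write the figure to disk
```

In the synthetic data, Social has no effect on Sales, so its panel should show a roughly flat line.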

Second Step: SK Learn — Setting variables

scikit-learn expects X to be a feature matrix (for example, a pandas DataFrame) and y to be a response vector (for example, a pandas Series). Let’s begin by separating our variables as below.

Handling your features (X):

In this example, we are using the columns TV, Radio, and Social as predictor variables.

Handling your response (y):

The response is the Sales column, selected on its own.

If you are wondering why a capital X is used for features, and lowercase y for response, it is mainly due to convention.
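The feature and response selection above can be sketched as follows. The data is synthetic and only there to make the snippet self-contained; feature_cols is the name this post uses later for the feature list.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the advertising data (values are invented for illustration)
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "TV": rng.uniform(0, 300, 200),
    "Radio": rng.uniform(0, 50, 200),
    "Social": rng.uniform(0, 100, 200),
})
data["Sales"] = 3.0 + 0.05 * data["TV"] + 0.2 * data["Radio"] + rng.normal(0, 1.0, 200)

# Features: a list of column names selects a DataFrame (the feature matrix)
feature_cols = ["TV", "Radio", "Social"]
X = data[feature_cols]

# Response: a single column name selects a Series (the response vector)
y = data["Sales"]

print(type(X).__name__, X.shape)
print(type(y).__name__, y.shape)
```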

Third Step: SK Learn — Splitting our data

Splitting X & y into training and testing sets:

Passing our X and y variables to the train_test_split function returns four objects: the training and testing portions of X, followed by the training and testing portions of y. We capture the splits by assigning the result to four variables.
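A short sketch of the split, again on synthetic data. The random_state value is an arbitrary choice made here so the split is repeatable, and the default test share of 25% is left unchanged.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the advertising data (values are invented for illustration)
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "TV": rng.uniform(0, 300, 200),
    "Radio": rng.uniform(0, 50, 200),
    "Social": rng.uniform(0, 100, 200),
})
data["Sales"] = 3.0 + 0.05 * data["TV"] + 0.2 * data["Radio"] + rng.normal(0, 1.0, 200)

X = data[["TV", "Radio", "Social"]]
y = data["Sales"]

# Four results: train/test portions of X, then train/test portions of y
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
```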

Fourth Step: SK Learn — Training our model

First, import LinearRegression from sklearn.linear_model. Then instantiate it and fit the model to the training data.

Instantiating and fitting the model to the training data
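The import, instantiate, and fit sequence might look like the sketch below. The data is synthetic; linreg is the variable name used throughout this post.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the advertising data (values are invented for illustration)
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "TV": rng.uniform(0, 300, 200),
    "Radio": rng.uniform(0, 50, 200),
    "Social": rng.uniform(0, 100, 200),
})
data["Sales"] = 3.0 + 0.05 * data["TV"] + 0.2 * data["Radio"] + rng.normal(0, 1.0, 200)

X = data[["TV", "Radio", "Social"]]
y = data["Sales"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Instantiate the estimator, then learn the intercept and coefficients
linreg = LinearRegression()
linreg.fit(X_train, y_train)
```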

Fifth Step: Interpreting Coefficients

The coefficients allow us to fill in our equation with values for the betas. The intercept and coefficients can be read from the linreg variable (a fitted LinearRegression object) through its intercept_ and coef_ attributes.

Extracting data from model

The intercept will be your B0 value; and each coefficient will be the corresponding Beta for the X’s passed (in their respective order).
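As an illustration, the sketch below fits a model on synthetic data and pairs each coefficient with its feature name. The true values used to generate the data (3.0, 0.05, 0.2, and 0 for Social) are made up, so the printed estimates should land close to them.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the advertising data (values are invented for illustration)
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "TV": rng.uniform(0, 300, 200),
    "Radio": rng.uniform(0, 50, 200),
    "Social": rng.uniform(0, 100, 200),
})
data["Sales"] = 3.0 + 0.05 * data["TV"] + 0.2 * data["Radio"] + rng.normal(0, 1.0, 200)

feature_cols = ["TV", "Radio", "Social"]
X = data[feature_cols]
y = data["Sales"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

linreg = LinearRegression()
linreg.fit(X_train, y_train)

# B0: the intercept
print("intercept:", linreg.intercept_)

# One beta per feature, in the same order as feature_cols
coefficients = dict(zip(feature_cols, linreg.coef_))
for name, beta in coefficients.items():
    print(name, beta)
```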

Sixth Step: Making predictions based on your model

Making predictions based on your model is as simple as passing your test data to the predict method. This will return predicted values of y given the new test X data.

Returns results of y predictions given X data in X_test
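A sketch of the prediction step on synthetic data. y_pred is the name this post uses for the predictions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the advertising data (values are invented for illustration)
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "TV": rng.uniform(0, 300, 200),
    "Radio": rng.uniform(0, 50, 200),
    "Social": rng.uniform(0, 100, 200),
})
data["Sales"] = 3.0 + 0.05 * data["TV"] + 0.2 * data["Radio"] + rng.normal(0, 1.0, 200)

X = data[["TV", "Radio", "Social"]]
y = data["Sales"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

linreg = LinearRegression()
linreg.fit(X_train, y_train)

# One predicted Sales value per row of X_test
y_pred = linreg.predict(X_test)
print(y_pred[:5])
```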

Seventh Step: Model Evaluation

There are three primary metrics used to evaluate linear models: Mean absolute error (MAE), Mean squared error (MSE), and Root mean squared error (RMSE).

MAE: The easiest to understand. It is the average absolute difference between the predicted and true values.

MSE: Similar to MAE, but the errors are squared before averaging, so larger errors are “punished” more heavily. It is harder to interpret than MAE because it is in squared units rather than base units; however, it is generally more popular than MAE.

RMSE: The most popular metric. It is the square root of MSE, which makes it more interpretable because the result is back in base units. It is recommended that RMSE be used as the primary metric to interpret your model.

Each metric is calculated from two lists: one holding the true values and the other holding your predicted values.
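Here is a small worked example of the three metrics using sklearn.metrics. The two lists are made-up numbers chosen so the arithmetic is easy to check by hand: the errors are 10, 0, 20, and 10.

```python
import numpy as np
from sklearn import metrics

# Invented values for illustration only
true = [100, 50, 30, 20]
pred = [90, 50, 50, 30]

# MAE: (10 + 0 + 20 + 10) / 4
mae = metrics.mean_absolute_error(true, pred)

# MSE: (100 + 0 + 400 + 100) / 4
mse = metrics.mean_squared_error(true, pred)

# RMSE: square root of the MSE, back in the units of y
rmse = np.sqrt(mse)

print("MAE:", mae)    # 10.0
print("MSE:", mse)    # 150.0
print("RMSE:", rmse)  # about 12.25
```

Taking the square root of the MSE with NumPy is one way to get RMSE that works across scikit-learn versions.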

Eighth Step: Feature Selection

Once you have obtained your error metric/s, take note of which X’s have minimal impacts on y. Removing some of these features may result in an increased accuracy of your model.

So we begin a process of trial and error, starting the process over until a satisfactory model is produced. The steps below may be useful for this part.

  1. Replace feature_cols & X
  2. Train_test_split your data
  3. Fit the model to linreg again using linreg.fit
  4. Make predictions using (y_pred = linreg.predict(X_test))
  5. Compute RMSE
  6. Repeat until the RMSE is satisfactory
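The loop above can be sketched as a small helper that runs steps 1 to 5 for a candidate feature list. The helper name rmse_for is hypothetical, and the data is synthetic, generated so that Social has no real effect on Sales.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the advertising data (values are invented for illustration)
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "TV": rng.uniform(0, 300, 200),
    "Radio": rng.uniform(0, 50, 200),
    "Social": rng.uniform(0, 100, 200),
})
data["Sales"] = 3.0 + 0.05 * data["TV"] + 0.2 * data["Radio"] + rng.normal(0, 1.0, 200)


def rmse_for(feature_cols):
    """Run steps 1 to 5 for one candidate feature list and return the test RMSE."""
    X = data[feature_cols]                       # step 1
    y = data["Sales"]
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)  # step 2
    linreg = LinearRegression()
    linreg.fit(X_train, y_train)                 # step 3
    y_pred = linreg.predict(X_test)              # step 4
    return np.sqrt(mean_squared_error(y_test, y_pred))  # step 5


# Step 6: compare candidates and keep the feature list with the lowest RMSE
rmse_all = rmse_for(["TV", "Radio", "Social"])
rmse_reduced = rmse_for(["TV", "Radio"])
print("all features:", rmse_all)
print("without Social:", rmse_reduced)
```

Keeping random_state fixed means every candidate is scored on the same test rows, so the RMSE values are directly comparable.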

How do you find the accuracy score in a linear regression in Python?

For regression, one of the metrics used to get a score (ambiguously termed accuracy) is R-squared (R2). You can get the R2 score of your predictions using the score(X, y, sample_weight=None) method of LinearRegression, which returns the coefficient of determination.
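A minimal sketch of the score method on synthetic data. It also checks the result against r2_score from sklearn.metrics, which computes the same quantity from the true and predicted values.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the advertising data (values are invented for illustration)
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "TV": rng.uniform(0, 300, 200),
    "Radio": rng.uniform(0, 50, 200),
    "Social": rng.uniform(0, 100, 200),
})
data["Sales"] = 3.0 + 0.05 * data["TV"] + 0.2 * data["Radio"] + rng.normal(0, 1.0, 200)

X = data[["TV", "Radio", "Social"]]
y = data["Sales"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

linreg = LinearRegression()
linreg.fit(X_train, y_train)

# R2 on held-out data via the estimator's own score method
r2 = linreg.score(X_test, y_test)

# The same value computed from predictions
r2_check = r2_score(y_test, linreg.predict(X_test))

print("R2:", r2)
```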

How do you calculate linear regression accuracy?

Mathematically, the RMSE is the square root of the mean squared error (MSE), which is the average squared difference between the observed outcome values and the values predicted by the model. So, MSE = mean((observed - predicted)^2) and RMSE = sqrt(MSE). The lower the RMSE, the better the model.
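The two formulas translate directly into NumPy. The arrays below are made-up numbers with errors of 1, 0, and -2, so the MSE is (1 + 0 + 4) / 3.

```python
import numpy as np

# Invented values for illustration only
observed = np.array([3.0, 5.0, 7.0])
predicted = np.array([2.0, 5.0, 9.0])

# MSE = mean((observed - predicted)^2)
mse = np.mean((observed - predicted) ** 2)

# RMSE = sqrt(MSE)
rmse = np.sqrt(mse)

print("MSE:", mse)    # about 1.667
print("RMSE:", rmse)  # about 1.291
```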

How does Python calculate accuracy?

Accuracy is a classification metric. One variant, balanced accuracy, can be calculated in Python using sklearn:
Balanced accuracy = (Sensitivity + Specificity) / 2
Balanced accuracy = (0.75 + 0.9868) / 2
Balanced accuracy = 0.8684
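The arithmetic above can be reproduced with balanced_accuracy_score from sklearn.metrics. The class counts below are hypothetical, chosen only because they give a sensitivity of 0.75 (15 of 20 positives) and a specificity of about 0.9868 (375 of 380 negatives).

```python
from sklearn.metrics import balanced_accuracy_score

# Hypothetical labels: 20 positives followed by 380 negatives
y_true = [1] * 20 + [0] * 380

# 15 positives caught, 5 missed; 375 negatives correct, 5 false alarms
y_pred = [1] * 15 + [0] * 5 + [0] * 375 + [1] * 5

sensitivity = 15 / 20
specificity = 375 / 380

# Average of the recall on each class
balanced = balanced_accuracy_score(y_true, y_pred)

print("by hand:", (sensitivity + specificity) / 2)
print("sklearn:", balanced)  # about 0.8684
```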

Does linear regression have accuracy?

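Accuracy in the classification sense does not apply directly to linear regression, because the predictions are continuous values rather than class labels; R-squared, MAE, MSE, and RMSE are used instead. A perfect score (an R-squared of 1.0) is possible when y is an exact linear function of the features, but on real data it more often points to a problem such as data leakage, so try the model on other datasets as well.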