A guide to minimizing a loss function in Python

In this exercise you'll implement linear regression "from scratch" using scipy.optimize.minimize.

We'll train a model on the Boston housing price data set, which is already loaded into the variables X and y. For simplicity, we won't include an intercept in our regression model.
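A minimal sketch of that setup might look like this, assuming X is a NumPy array of predictors and y the vector of prices (both already loaded, as described), with the loss defined as the sum of squared errors:

import numpy as np
from scipy.optimize import minimize

# X (predictors) and y (prices) are assumed to be preloaded by the exercise.
def sum_of_squared_errors(beta, X, y):
    residuals = y - X @ beta        # no intercept term, per the exercise
    return np.sum(residuals ** 2)

beta_init = np.zeros(X.shape[1])    # one coefficient per predictor
result = minimize(sum_of_squared_errors, beta_init, args=(X, y))
print(result.x)                     # fitted coefficients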

I'm trying to minimize the function below in order to estimate the parameters of a normal distribution.

[Image of the function to be minimized]

My code looks like this:

import numpy as np
from scipy import stats
from scipy.optimize import minimize
x = [1,2,3,4,5]
def oro(theta, x):
    norma = 0 
    b = 1
    u = theta[0]
    o = theta[1]
    x = np.array(x)
    x0 = 0
    f0 = -(((1/(o*(2*3.14)**(0.5)))*(2.718)**-(((x0-u)**2)/(2*(o**2))))**b)**-1
    for i in range(x.size):
        f = (1/(o*(2*3.14)**(0.5)))*(2.718)**-(((x[i]-u)**2)/(2*(o**2)))**b
        norma += f0*f
    return norma
theta_init = [0, 1]
res = minimize(oro, theta_init, args=x)
res

But in the end I get this:

<ipython-input-81-ee81472a023a>:8: RuntimeWarning: divide by zero encountered in double_scalars
  f0 = -(((1/(o*(2*3.14)**(0.5)))*(2.718)**-(((x0-u)**2)/(2*(o**2))))**b)**-1
<ipython-input-81-ee81472a023a>:11: RuntimeWarning: invalid value encountered in double_scalars
  norma += f0*f
<ipython-input-81-ee81472a023a>:8: RuntimeWarning: divide by zero encountered in double_scalars
  f0 = -(((1/(o*(2*3.14)**(0.5)))*(2.718)**-(((x0-u)**2)/(2*(o**2))))**b)**-1
<ipython-input-81-ee81472a023a>:11: RuntimeWarning: invalid value encountered in double_scalars
  norma += f0*f
<ipython-input-81-ee81472a023a>:8: RuntimeWarning: divide by zero encountered in double_scalars
  f0 = -(((1/(o*(2*3.14)**(0.5)))*(2.718)**-(((x0-u)**2)/(2*(o**2))))**b)**-1
<ipython-input-81-ee81472a023a>:11: RuntimeWarning: invalid value encountered in double_scalars
  norma += f0*f
      fun: nan
 hess_inv: array([[9.57096191e+02, 2.41349815e+01],
       [2.41349815e+01, 8.33412317e-01]])
      jac: array([nan, nan])
  message: 'Desired error not necessarily achieved due to precision loss.'
     nfev: 357
      nit: 4
     njev: 119
   status: 2
  success: False
        x: array([165623.69347712,   1751.95100725])

Can you please tell me what I'm doing wrong?

Update after one answer (I added bounds): I get fewer errors, but the optimization is still unsuccessful:

<ipython-input-271-b51d0c455468>:8: RuntimeWarning: divide by zero encountered in double_scalars
  f0 = -(((1/(std*(2*np.pi)**(0.5)))*(np.exp(1))**-(((x0-mean)**2)/(2*(std**2))))**b)**-1
<ipython-input-271-b51d0c455468>:11: RuntimeWarning: invalid value encountered in double_scalars
  norma += f0*f
      fun: nan
 hess_inv: <2x2 LbfgsInvHessProduct with dtype=float64>
      jac: array([-0.00012861,  0.00018581])
  message: 'ABNORMAL_TERMINATION_IN_LNSRCH'
     nfev: 75
      nit: 2
     njev: 25
   status: 2
  success: False
        x: array([250.13040562, 343.06899721])
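For comparison, here is a minimal sketch of the standard way to fit a normal distribution with scipy, by minimizing the negative log-likelihood. This assumes the goal is simply to estimate the mean and standard deviation of x; it is not the exact function from the image above, and the bound on sigma is an illustrative choice:

import numpy as np
from scipy.optimize import minimize

x = np.array([1, 2, 3, 4, 5], dtype=float)

def neg_log_likelihood(theta, data):
    mu, sigma = theta
    # Sum of the negative log of the normal pdf over the sample
    return 0.5 * np.sum(np.log(2 * np.pi * sigma**2) + (data - mu)**2 / sigma**2)

theta_init = [0.0, 1.0]
res = minimize(neg_log_likelihood, theta_init, args=(x,),
               bounds=[(None, None), (1e-6, None)])  # keep sigma strictly positive
print(res.x)  # roughly [3.0, 1.41] for this sample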

In the previous lesson, we learned how to calculate the loss of a regression line, and we used a manual method for minimizing the loss function, in order to find the optimal slope for our regression line.

Our goal in this lesson is to learn how to use automated methods to minimize the loss function. These methods will help us automatically find the regression slope associated with the lowest sum of squared errors. We'll explore two separate methods.

First we'll use a manual approach for minimizing the loss function. We'll do this by making multiple guesses for the slope and comparing the sum of squared errors. This is an unsophisticated approach, which we'd never use in a real-world case, but working through the problem this way will make all the following approaches much easier to understand.

Second we'll learn about the normal equation. This is a formula that calculates the optimal regression coefficients using matrix multiplication from linear algebra.

We'll start step one by drawing a scatterplot, which we'll use to track the sum of squared errors for our guesses. We'll place the sum of squared errors on the y-axis and the slope on the x-axis. We'll plot our most recent guess, a regression slope of two, which gave us a sum of squared errors of 31.5. Let's say we're interested in comparing this to a regression slope of 2.5. We'll skip past the sum of squared errors calculation to get the result, 99. That tells us we're moving in the wrong direction, so let's try something lower, like 1.5.

That gives us a result of nine.

This looks very low, but let's try a slope of one to see if we get an even lower sum of squared errors. Now we're back up to 31.5.

And 0.5 brings us back to 99. If we were to pick a slope of zero or three, we'd get an even higher sum of squared errors of 211.5. With these guesses, we can see a clear pattern emerging. When plotted out like this, our guesses have a noticeable curve that resembles a valley. The value at the bottom of the valley, 1.5, appears to be the optimal value for our slope.

This isn't surprising, as we knew this beforehand, but now we've proven it. However, as you can see, this has been a haphazard and tedious process, and we were working with a very small data set that happened to have very simple numbers for our coefficients. If our slope were something like 1.4572, it would take quite a while to find that value manually, and keep in mind that we made it easy on ourselves by already having the correct intercept. When we try to find both a slope and an intercept, we need a 3D chart that compares our guess for the regression slope, our guess for the intercept, and the sum of squared errors for each combination of those guesses. When we do that, the valley from the 2D chart becomes a bowl-shaped hollow, like an inverted hill.
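As a rough illustration of that guess-and-check process, here is a short sketch using a small hypothetical data set (the lesson's actual umbrella-sales numbers aren't reproduced here, so the error values will differ), with the intercept assumed to be zero:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical predictor values
y = np.array([1.5, 3.0, 4.5, 6.0, 7.5])   # hypothetical outcomes; the true slope here is 1.5

def sum_of_squared_errors(slope):
    predictions = slope * x                # intercept assumed to be zero
    return np.sum((y - predictions) ** 2)

for guess in [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]:
    print(guess, sum_of_squared_errors(guess))   # the smallest value appears at 1.5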

We'll now move on to step two, where we'll learn about an alternative approach called the normal equation. This is a closed-form solution, which means we simply plug in the values and make the calculation. Understanding precisely why this formula outputs our regression coefficients requires a good understanding of calculus, which is beyond the scope of this course, so we'll jump straight to the equation and its components. The equation will likely not make much sense without some knowledge of linear algebra, so if you're struggling to follow, feel free to skip this section and jump ahead to the next lesson; you won't miss anything important. When deploying a linear regression algorithm in Python, all of these calculations are handled in the background.

Fortunately, we can make sense of this equation if we recall our knowledge of multi-dimensional arrays in Python. The Xs in the equation refer to a multi-dimensional array of values, just like a multi-dimensional NumPy array. In linear algebra terms, this is called a matrix.

The first column of this matrix will contain all ones, and the second column will contain our predictor variable values. We can also add further columns to this matrix for any additional predictor variables.

Some of the X matrices have a T next to them, meaning they are transposed: a matrix with ten rows and two columns becomes a matrix with two rows and ten columns, but otherwise the values stay the same. The power of negative one means that we invert the matrix. This is a little complex, but we'll cover it in more detail in the lesson summary if you're interested in learning more. The dots represent matrix multiplication, which is a method for multiplying two matrices by each other. Finally, the y refers to an array containing the values for umbrella sales.
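In standard notation, the equation being described here is the normal equation:

\hat{\beta} = (X^{\top} X)^{-1} X^{\top} y

where X is the matrix of ones and predictor values, the superscript T denotes the transpose, the superscript -1 denotes the matrix inverse, and y is the vector of umbrella sales.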

If we open a Jupyter notebook, we can manually recreate this formula using NumPy.

We'll start by writing the first cell, which creates our Y variable based on umbrella sales.

Next, we have our X variable.

Note how this is an array made up of pairs: the number one alongside each value of the predictor variable rainfall.

Below that, we have the XT variable.

This simply transposes the X matrix.

At the bottom, we have our formula.

This starts by using the inv function, which is short for inverse.

This is from the linalg or linear algebra portion of the NumPy library.

This function is applied to the dot product of XT and X.

We then dot multiply the result by the transposed X matrix again.

And finally, we dot multiply by the Y variable.

When we run the code, our output is an array containing the optimal values for the intercept and slope.
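Pulling those cells together, the calculation looks roughly like this. The rainfall and umbrella-sales numbers below are hypothetical stand-ins, since the lesson's data isn't reproduced here:

import numpy as np

rainfall = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical predictor values
y = np.array([4.0, 5.5, 7.0, 8.5, 10.0])         # hypothetical umbrella sales

# Design matrix: a column of ones (for the intercept) paired with the rainfall values
X = np.column_stack([np.ones_like(rainfall), rainfall])
XT = X.T

# Normal equation: inv(X^T X) . X^T . y
coefficients = np.linalg.inv(XT.dot(X)).dot(XT).dot(y)
print(coefficients)   # [intercept, slope]; roughly [2.5, 1.5] for these made-up numbers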

We'll stop the lesson here and summarize what we've learned. First, we used a manual approach for minimizing the loss function.

We did this by making multiple guesses for the slope and comparing the sum of squared errors.

Second, we learned about the normal equation.

This is a formula that calculates the optimal regression coefficients using matrix multiplication from linear algebra.

Although this formula is a little complex, we can see that the closed-form solution of a single formula has its benefits.

That's why this is the approach used by the Python function we'll cover in a later lesson.

The one drawback is that it's inefficient when dealing with data sets that have a very large number of predictor variables.

In the next lesson, we'll explore the use case for a regression model, deploy it in a Jupyter notebook, and make some modifications to our data.