FAB Futures - Data Science

Research > Numpy-Fitting Function

Tutorial > Using Numpy Polyfit Function

Numpy Tutorial: Curve Fitting using Numpy's Polyfit Function (https://www.youtube.com/watch?v=Dggl0fJJ81k)

  • In this tutorial, we fit a polynomial curve to a set of points whose y-values start low (at the lowest x-value), rise to a peak in the middle and drop back down to a low value at the end (max x-value)
  • np.polyfit requires 3 parameters: the x-values, the y-values and the polynomial degree
In [5]:
# Polynomial Curve Fitting

import numpy as np

x = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]
y = [1,2,3,4,5,6,7,8,9,10,10,9,8,7,6,5,4,3,2,1]

curve = np.polyfit(x,y,2)
print(curve) #prints out an array that represents a polynomial function for the specific dataset model
[-0.09398496  1.97368421 -1.73684211]

The resulting array values are the polynomial coefficients, ordered from the highest power to the lowest: the multipliers for x^2, x^1 and x^0
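
To make the coefficient ordering concrete, here is a small sketch that evaluates the quadratic by hand at one x-value and compares it with numpy's own np.polyval. The names a2, a1, a0, x0 and manual are only illustrative and are not part of the original tutorial.

```python
import numpy as np

# Same dataset as the tutorial: rises from 1 to 10, then falls back to 1
x = list(range(1, 21))
y = list(range(1, 11)) + list(range(10, 0, -1))

# polyfit returns the coefficients from the highest power to the lowest
a2, a1, a0 = np.polyfit(x, y, 2)

x0 = 10                                  # illustrative x-value
manual = a2 * x0**2 + a1 * x0 + a0       # a2*x^2 + a1*x^1 + a0*x^0
library = np.polyval([a2, a1, a0], x0)   # numpy's evaluation of the same polynomial

print(manual, library)                   # the two values should agree
```

If the two printed numbers agree, the array really is ordered as x^2, x^1, x^0.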

Here the code is modified to show the equation...

In [7]:
# Polynomial Curve Fitting

import numpy as np

x = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]
y = [1,2,3,4,5,6,7,8,9,10,10,9,8,7,6,5,4,3,2,1]

curve = np.polyfit(x,y,2)
poly = np.poly1d(curve)
print(poly) #prints out the polynomial equation
          2
-0.09398 x + 1.974 x - 1.737
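
As a side note, NumPy also offers a newer polynomial interface in the numpy.polynomial package. The sketch below is not part of the video tutorial; it only checks that Polynomial.fit arrives at the same quadratic, keeping in mind that the newer class stores its coefficients from the lowest power to the highest.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Same dataset as the tutorial
x = np.arange(1, 21)
y = np.concatenate([np.arange(1, 11), np.arange(10, 0, -1)])

legacy = np.polyfit(x, y, 2)                # highest power first
newer = Polynomial.fit(x, y, 2).convert()   # convert() maps the fit back to plain x

print(legacy)
print(newer.coef[::-1])                     # reversed so the order matches polyfit
```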

Now, we can print out the Fitting Function curve value at specific points along the polynomial curve. For example, what is the y-value of the Fitting Line at the polynomial x-value '10'?

In [8]:
# Polynomial Curve Fitting

import numpy as np

x = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]
y = [1,2,3,4,5,6,7,8,9,10,10,9,8,7,6,5,4,3,2,1]

curve = np.polyfit(x,y,2)
poly = np.poly1d(curve)
print(poly(10)) #prints out the fitted y-value at x = 10
8.6015037593985

...at polynomial x-value '10', the Fitting Line y-value is 8.6...

Let's see what the Fitting Line y-value is at polynomial x-value '1'.

In [9]:
# Polynomial Curve Fitting

import numpy as np

x = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]
y = [1,2,3,4,5,6,7,8,9,10,10,9,8,7,6,5,4,3,2,1]

curve = np.polyfit(x,y,2)
poly = np.poly1d(curve)
print(poly(1)) #prints out the fitted y-value at x = 1
0.14285714285715123

...at polynomial x-value '1', the Fitting Line y-value is 0.1428...

Let's see what the Fitting Line y-value is at polynomial x-value '19.5'.

In [10]:
# Polynomial Curve Fitting

import numpy as np

x = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]
y = [1,2,3,4,5,6,7,8,9,10,10,9,8,7,6,5,4,3,2,1]

curve = np.polyfit(x,y,2)
poly = np.poly1d(curve)
print(poly(19.5)) #prints out the fitted y-value at x = 19.5
1.0122180451127814

...at polynomial x-value '19.5', the Fitting Line y-value is 1.012...

Final Observation
The resulting fitting line approximates the shape of the data distribution we created

  • ...with a low y-value (0.14285714285715123) at a low x-value (1)
  • ...rising to a high y-value (8.6015037593985) at the middle x-value in the range (10)
  • ...and heading back down to a low y-value (1.0122180451127814) at a high x-value (19.5).
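
The rise-and-fall shape can also be checked directly from the coefficients. As a rough sketch (the names x_peak and y_peak are illustrative), the peak of a quadratic a2*x^2 + a1*x + a0 sits at x = -a1 / (2*a2). Because the dataset is symmetric around x = 10.5, the fitted peak should land there too.

```python
import numpy as np

# Same dataset as the tutorial
x = list(range(1, 21))
y = list(range(1, 11)) + list(range(10, 0, -1))

a2, a1, a0 = np.polyfit(x, y, 2)

# Vertex of the parabola: where the fitted curve stops rising and starts falling
x_peak = -a1 / (2 * a2)
y_peak = np.polyval([a2, a1, a0], x_peak)

print(x_peak, y_peak)
```

Note that the fitted peak is lower than the data's maximum of 10, since a single parabola cannot follow the sharp tip of the triangle-shaped data.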

Study > Dissecting Neil's Numpy Polynomial Fit Function Example

Neil provided this Polynomial Fit Function example...

In [11]:
import numpy as np
import matplotlib.pyplot as plt
xmin = 0
xmax = 2
noise = 0.05
npts = 100
a = 0.5
b = 1
c = -.3
np.random.seed(0)
x = xmin+(xmax-xmin)*np.random.rand(npts) # generate random x
y = a+b*x+c*x*x+np.random.normal(0,noise,npts) # evaluate polynomial at x and add noise
coeff1 = np.polyfit(x,y,1) # fit first-order polynomial
coeff2 = np.polyfit(x,y,2) # fit second-order polynomial
xfit = np.arange(xmin,xmax,(xmax-xmin)/npts)
pfit1 = np.poly1d(coeff1)
yfit1 = pfit1(xfit) # evaluate first-order fit
print(f"first-order fit coefficients: {coeff1}")
pfit2 = np.poly1d(coeff2)
yfit2 = pfit2(xfit) # evaluate second-order fit
print(f"second-order fit coefficients: {coeff2}")
plt.plot(x,y,'o')
plt.plot(xfit,yfit1,'g-',label='linear')
plt.plot(xfit,yfit2,'r-',label='quadratic')
plt.legend()
plt.show()
first-order fit coefficients: [0.41918275 0.69084816]
second-order fit coefficients: [-0.3225953   1.04205042  0.49756991]
[Plot: data points with the linear fit line (green) and the quadratic fit line (red)]

Let's try to understand it line by line.

The first section of the code imports the libraries, defines the variables to be used in the program and seeds the random number generator...

In [12]:
import numpy as np #imports numpy library and assigns it a shortened variable name 'np'
import matplotlib.pyplot as plt #imports matplotlib library and assigns it a shortened variable name 'plt'
xmin = 0 #creates an 'xmin' (x range minimum) variable and assigns it the value zero
xmax = 2 #creates an 'xmax' (x range maximum) variable and assigns it the value 2
noise = 0.05 #creates a 'noise' variable and assigns it the value 0.05
npts = 100 #creates an 'npts' (number of points) variable and assigns it the value 100
a = 0.5 #creates an 'a' variable and assigns it the value 0.5
b = 1 #creates a 'b' variable and assigns it the value 1
c = -0.3 #creates a 'c' variable and assigns it the value -0.3
np.random.seed(0) #seeds numpy's random number generator with zero so the random values are repeatable

The second section of the code defines 2 more variables as the result of 2 equations.

  • 'x' is defined as 'xmin' (0) plus the x-range 'xmax-xmin' (2) times an array of 'npts' (100) random numbers, each drawn uniformly between 0 and 1, so every x-value falls between 0 and 2.
  • 'y' is defined as a polynomial equation (a + b*x + c*x*x) utilizing the defined a, b, c and x values...plus a numpy random normal (normally distributed random numbers) command with...
    • zero as the mean (loc) of the normal distribution of the noise
    • noise value as the standard deviation (scale) of the normal distribution of the noise
    • npts value as the shape (size, number of samples) of the noise array
In [14]:
x = xmin+(xmax-xmin)*np.random.rand(npts) # generate random x
y = a+b*x+c*x*x+np.random.normal(0,noise,npts) # evaluate polynomial at x and add noise
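
To see what these two random commands actually produce, here is a quick check that reuses the same values as Neil's example. The names u and eps are illustrative only. It draws the uniform numbers and the normal noise separately so that their ranges and spread can be printed.

```python
import numpy as np

xmin, xmax, noise, npts = 0, 2, 0.05, 100
np.random.seed(0)

u = np.random.rand(npts)                 # npts uniform random numbers in [0, 1)
x = xmin + (xmax - xmin) * u             # stretched to the range [xmin, xmax)
eps = np.random.normal(0, noise, npts)   # normal noise: mean 0, std 'noise'

print(x.shape, x.min(), x.max())         # 100 values, all between 0 and 2
print(eps.mean(), eps.std())             # close to 0 and close to 0.05
```

So the x-values are spread uniformly over the range, and only the noise added to y is normally distributed.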

The third section of the code...

Runs the polynomial fit function for the first and second order polynomial, and assigns the result to variables coeff1 and coeff2, respectively.

In [15]:
coeff1 = np.polyfit(x,y,1) # fit first-order polynomial
coeff2 = np.polyfit(x,y,2) # fit second-order polynomial
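
Since the data was generated from known a, b and c values, one way to sanity-check the second-order fit is to compare its coefficients with them. This is a sketch only, and the names fitted_a, fitted_b and fitted_c are illustrative. The thing to remember is that polyfit lists the highest power first, so the order is c, b, a.

```python
import numpy as np

xmin, xmax, noise, npts = 0, 2, 0.05, 100
a, b, c = 0.5, 1, -0.3
np.random.seed(0)
x = xmin + (xmax - xmin) * np.random.rand(npts)
y = a + b * x + c * x * x + np.random.normal(0, noise, npts)

coeff2 = np.polyfit(x, y, 2)
fitted_c, fitted_b, fitted_a = coeff2    # highest power first: x^2, x^1, x^0

print(f"a: true {a}, fitted {fitted_a:.3f}")
print(f"b: true {b}, fitted {fitted_b:.3f}")
print(f"c: true {c}, fitted {fitted_c:.3f}")
```

The fitted values will not match exactly because of the added noise, but they should be close.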

The fourth section of the code...

Defines 2 new variables...

  • xfit which is defined by the numpy arange command that generates an array of evenly spaced values within a range. Three parameters are 'xmin' for the start value, 'xmax' for the stop value (which is not included in the result) and '(xmax-xmin)/npts' as the step value
  • pfit1 which is defined by the numpy poly1d command which creates a polynomial object that behaves like a function.

And then evaluates the 1st Order Fit at every xfit value by calling the polynomial object like a function...yfit1 = pfit1(xfit)

And lastly prints out the statement "first-order fit coefficients: " and the value assigned to the variable 'coeff1'.

In [16]:
xfit = np.arange(xmin,xmax,(xmax-xmin)/npts)
pfit1 = np.poly1d(coeff1)
yfit1 = pfit1(xfit) # evaluate first-order fit
print(f"first-order fit coefficients: {coeff1}")
first-order fit coefficients: [0.38760327 0.69691175]
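
Two details of this section can be checked with a short sketch. The coefficients 0.4 and 0.7 below are made-up illustrative values, not the fitted ones, and np.linspace is shown only for comparison since it is not used in Neil's code. The sketch shows that np.arange leaves out the stop value, and that calling a poly1d object is the same as evaluating the polynomial at each x-value.

```python
import numpy as np

xmin, xmax, npts = 0, 2, 100

xfit = np.arange(xmin, xmax, (xmax - xmin) / npts)   # stop value (2) is excluded
xlin = np.linspace(xmin, xmax, npts)                 # stop value (2) is included

pfit = np.poly1d([0.4, 0.7])      # illustrative polynomial: 0.4*x + 0.7
yfit = pfit(xfit)                 # calling the object evaluates it at every x

print(xfit[-1], xlin[-1])
print(np.allclose(yfit, 0.4 * xfit + 0.7))
```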

The fifth section of the code...

A process similar to that of the previous section is written, but this time for the evaluation of the second-order fit.

  • Using the same 'xfit' variable defined above, a polynomial object is created from the second-order coefficients (variable 'coeff2') and assigned to the variable 'pfit2'.
  • Then evaluates the 2nd Order Fit at every xfit value...yfit2 = pfit2(xfit)
  • And prints out the statement "second-order fit coefficients: " and the value assigned to the variable 'coeff2'.
In [17]:
pfit2 = np.poly1d(coeff2)
yfit2 = pfit2(xfit) # evaluate second-order fit
print(f"second-order fit coefficients: {coeff2}")
second-order fit coefficients: [-0.31057752  1.01020344  0.49833173]

The sixth section of the code...

Generates visualizations of the results above. First the noisy data points (uniformly random x-values, with normally distributed noise added to the y-values)...

In [21]:
plt.plot(x,y,'o')
Out[21]:
[<matplotlib.lines.Line2D at 0xf6e1bbb716d0>]
[Plot: the noisy data points]

Then the Linear Fit Function Line over the noisy data points...

In [24]:
plt.plot(x,y,'o')
plt.plot(xfit,yfit1,'g-',label='linear')
plt.legend()
plt.show()
[Plot: the noisy data points with the linear fit line (green)]

And finally the Quadratic Fit Function Line over the same noisy data points.

In [26]:
plt.plot(x,y,'o')
plt.plot(xfit,yfit1,'g-',label='linear')
plt.plot(xfit,yfit2,'r-',label='quadratic')
plt.legend()
plt.show()
[Plot: the noisy data points with the linear fit line (green) and the quadratic fit line (red)]

Observation

When the 2 types of Fit Functions are applied to the same noisy data points, the curved Quadratic line (second order fit) clearly fits better (visually) than the straight Linear line (first order fit). This is expected, because the data was generated from a quadratic equation (c = -0.3).
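
The visual impression can be backed up with a number. One possible measure, which is not part of Neil's example, is the root-mean-square (RMS) distance between the data and each fit line. The names rms1 and rms2 are illustrative.

```python
import numpy as np

xmin, xmax, noise, npts = 0, 2, 0.05, 100
a, b, c = 0.5, 1, -0.3
np.random.seed(0)
x = xmin + (xmax - xmin) * np.random.rand(npts)
y = a + b * x + c * x * x + np.random.normal(0, noise, npts)

pfit1 = np.poly1d(np.polyfit(x, y, 1))   # linear fit
pfit2 = np.poly1d(np.polyfit(x, y, 2))   # quadratic fit

# RMS of the residuals (data minus fit), evaluated at the data's own x-values
rms1 = np.sqrt(np.mean((y - pfit1(x)) ** 2))
rms2 = np.sqrt(np.mean((y - pfit2(x)) ** 2))

print(f"linear fit RMS error:    {rms1:.4f}")
print(f"quadratic fit RMS error: {rms2:.4f}")
```

A smaller RMS error means the line sits closer to the points. The quadratic error should come out near the noise value (0.05), which is about as good as any fit can do on this data.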