You can get the CDF easily from the sorted samples, so you can obtain the PDF by differentiating the CDF.
import numpy as np
import matplotlib.pyplot as plt
import scipy.interpolate
import scipy.stats

def setGridLine(ax):
    #//jonathansoma.com/lede/data-studio/matplotlib/adding-grid-lines-to-a-matplotlib-chart/
    ax.set_axisbelow(True)
    ax.minorticks_on()
    ax.grid(which='major', linestyle='-', linewidth=0.5, color='grey')
    ax.grid(which='minor', linestyle=':', linewidth=0.5, color='#a6a6a6')
    ax.tick_params(which='both', # Options for both major and minor ticks
                   top=False, # turn off top ticks
                   left=False, # turn off left ticks
                   right=False, # turn off right ticks
                   bottom=False) # turn off bottom ticks

data1 = np.random.normal(0,1,1000000)
x=np.sort(data1)
y=np.arange(x.shape[0])/(x.shape[0]+1)

f2 = scipy.interpolate.interp1d(x, y,kind='linear')
x2 = np.linspace(x[0],x[-1],1001)
y2 = f2(x2)
y2b = np.diff(y2)/np.diff(x2)
x2b=(x2[1:]+x2[:-1])/2.

f3 = scipy.interpolate.interp1d(x, y,kind='cubic')
x3 = np.linspace(x[0],x[-1],1001)
y3 = f3(x3)
y3b = np.diff(y3)/np.diff(x3)
x3b=(x3[1:]+x3[:-1])/2.

bins=np.arange(-4,4,0.1)
bins_centers=0.5*(bins[1:]+bins[:-1])
cdf = scipy.stats.norm.cdf(bins_centers)
pdf = scipy.stats.norm.pdf(bins_centers)

plt.rcParams["font.size"] = 18
fig, ax = plt.subplots(3,1,figsize=(10,16))
ax[0].set_title("cdf")
ax[0].plot(x,y,label="data")
ax[0].plot(x2,y2,label="linear")
ax[0].plot(x3,y3,label="cubic")
ax[0].plot(bins_centers,cdf,label="ans")
ax[1].set_title("pdf:linear")
ax[1].plot(x2b,y2b,label="linear")
ax[1].plot(bins_centers,pdf,label="ans")
ax[2].set_title("pdf:cubic")
ax[2].plot(x3b,y3b,label="cubic")
ax[2].plot(bins_centers,pdf,label="ans")
for idx in range(3):
    ax[idx].legend()
    setGridLine(ax[idx])
plt.show()
plt.clf()
plt.close()
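For a quick numerical check without any plotting, the same CDF-then-differentiate idea can be sketched more compactly. This is only a sketch under assumptions: the helper name pdf_from_samples, the fixed seed, the sample count, and the evaluation grid are all illustrative choices, and np.interp stands in for the linear interp1d call.

```python
import numpy as np
from scipy.stats import norm


def pdf_from_samples(samples, grid):
    """Estimate a PDF on the midpoints of `grid` by differencing the empirical CDF.

    Hypothetical helper written for illustration only.
    """
    x = np.sort(samples)
    # Empirical CDF value at each sorted sample (same convention as the code above).
    cdf = np.arange(x.shape[0]) / (x.shape[0] + 1)
    # Linear interpolation of the CDF onto the evaluation grid.
    cdf_on_grid = np.interp(grid, x, cdf)
    # The finite-difference slope of the CDF approximates the PDF.
    density = np.diff(cdf_on_grid) / np.diff(grid)
    midpoints = 0.5 * (grid[1:] + grid[:-1])
    return midpoints, density


rng = np.random.default_rng(0)            # fixed seed, an arbitrary choice
samples = rng.normal(0.0, 1.0, 200_000)   # standard normal test data
grid = np.linspace(-3.0, 3.0, 41)         # should lie inside the sample range
mid, est = pdf_from_samples(samples, grid)

# Compare against the exact standard normal density at the same points.
max_err = np.max(np.abs(est - norm.pdf(mid)))
print(f"largest deviation from norm.pdf: {max_err:.4f}")
```

A coarser grid gives a smoother but more biased estimate, while a finer grid follows the sampling noise more closely, so the grid spacing acts much like a histogram bin width.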
Prerequisites:
- Matplotlib
- Numpy
- Scipy
- Statistics
The normal distribution is a continuous probability distribution that describes how data values are spread around their mean. It is one of the most important distributions in statistics because many real-world quantities are approximately normally distributed, for example, the height of a population, shoe size, and IQ level. The outcome of a single die roll is not one of them, since it is uniformly distributed, but the sum of many die rolls is approximately normal.
The probability density function of the normal (Gaussian) distribution is given by:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
where x is the variable, μ (mu) is the mean, and σ (sigma) is the standard deviation.
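To make the roles of the mean and the standard deviation concrete, the normal density can be evaluated directly and compared with scipy.stats.norm.pdf. This is a minimal sketch: the helper name normal_pdf and the sample values of x, mu, and sigma are illustrative assumptions, not part of the original article.

```python
import numpy as np
from scipy.stats import norm


def normal_pdf(x, mu, sigma):
    """Evaluate the normal density 1/(sigma*sqrt(2*pi)) * exp(-(x-mu)^2 / (2*sigma^2))."""
    coeff = 1.0 / (sigma * np.sqrt(2.0 * np.pi))
    return coeff * np.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


x = np.array([-1.0, 0.0, 2.5])   # illustrative evaluation points
mu, sigma = 1.0, 2.0             # illustrative mean and standard deviation

by_hand = normal_pdf(x, mu, sigma)
by_scipy = norm.pdf(x, loc=mu, scale=sigma)

print(by_hand)
print(by_scipy)   # should agree with the hand-written formula
```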
Modules Needed
- Matplotlib is Python's widely used data visualization library.
- Numpy is a general-purpose array-processing package. It provides a high-performance multidimensional array object and tools for working with these arrays. It is the fundamental package for scientific computing with Python.
- Scipy is a Python library for scientific computing that includes, among other things, statistical distributions, interpolation, and numerical routines.
- The statistics module provides functions for calculating mathematical statistics of numeric data.
Functions used
- To calculate the mean of the data
Syntax:
mean(data)
- To calculate the standard deviation of the data
Syntax:
stdev(data)
- To calculate the normal probability density of the data, norm.pdf is used. norm is the normal distribution object in scipy.stats, and its pdf method evaluates the above probability density function.
Syntax:
norm.pdf(Data, loc, scale)
Here, the loc parameter is the mean and the scale parameter is the standard deviation of the distribution.
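The three functions can be tried on a small data set before plotting anything. The sketch below uses made-up sample values, and the variable names are illustrative. Note that statistics.stdev returns the sample standard deviation (n - 1 in the denominator), whereas statistics.pstdev would give the population value.

```python
import statistics
from scipy.stats import norm

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up sample values

mean = statistics.mean(data)    # arithmetic mean of the data
sd = statistics.stdev(data)     # sample standard deviation (n - 1 denominator)

# Density at the mean, which is the peak of the bell curve.
density_at_mean = norm.pdf(mean, loc=mean, scale=sd)

# Density one standard deviation away from the mean is lower than the peak.
density_one_sd = norm.pdf(mean + sd, loc=mean, scale=sd)

print(mean, sd)
print(density_at_mean, density_one_sd)
```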
Approach
- Import the modules
- Create the data
- Calculate the mean and standard deviation
- Calculate the normal probability density
- Plot using the values calculated above
- Display plot
Below is the implementation.
Python3
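# Import the required modules
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
import statistics

# Create the data: x values from -20 to 20 in steps of 0.01
x_axis = np.arange(-20, 20, 0.01)

# Calculate the mean and standard deviation of the data
mean = statistics.mean(x_axis)
sd = statistics.stdev(x_axis)

# Plot the normal probability density evaluated at each x value
plt.plot(x_axis, norm.pdf(x_axis, mean, sd))
plt.show()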
Output:
[Figure: the output of the above code]
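Because the mean and standard deviation are computed from the x values themselves, the plotted range spans only a limited number of standard deviations, so the tails of the bell curve may be cut off. The sketch below estimates how much probability mass falls inside the plotted range and compares it with a range of the mean plus or minus four standard deviations. The names covered, wide_axis, and wide_covered are illustrative, and np.std with ddof=1 is used here as a stand-in for statistics.stdev.

```python
import numpy as np
from scipy.stats import norm

# Same x values as in the implementation above.
x_axis = np.arange(-20, 20, 0.01)
mean = np.mean(x_axis)
sd = np.std(x_axis, ddof=1)   # ddof=1 gives the sample standard deviation

# Probability mass of the fitted normal that lies inside the plotted range.
covered = norm.cdf(x_axis[-1], mean, sd) - norm.cdf(x_axis[0], mean, sd)

# A range of mean +/- 4 standard deviations shows nearly the whole bell curve.
wide_axis = np.linspace(mean - 4 * sd, mean + 4 * sd, 1001)
wide_covered = norm.cdf(wide_axis[-1], mean, sd) - norm.cdf(wide_axis[0], mean, sd)

print(f"mass inside the original range: {covered:.3f}")
print(f"mass inside mean +/- 4 sd:      {wide_covered:.5f}")
```

Passing wide_axis to norm.pdf and plt.plot instead of x_axis would show the full curve, including both tails.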