Definitions¶
Frequentist¶
$ \begin{eqnarray} p(x_i) &=& \cfrac{n_i}{N} \\ \mathrm{probability}(\mathrm{state}~i) &=& \cfrac{\mathrm{number~of~observations~of}~i}{\mathrm{total~number~of~observations}} \end{eqnarray} $
- this is the intuitive definition of probability from observations
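- a quick sketch of this definition, counting observations of a hypothetical fair six-sided die:
import numpy as np
np.random.seed(10)
N = 10000
rolls = np.random.randint(1,7,N) # simulated observations of states 1..6
for i in range(1,7):
    n_i = np.sum(rolls == i) # number of observations of state i
    print(f"p({i}) = {n_i/N:.3f}") # approaches 1/6 = 0.167 as N grows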
Bayesian¶
$ \begin{eqnarray} p(\mathrm{parameters}|\mathrm{data,model}) &=& \cfrac{p(\mathrm{data}|\mathrm{parameters,model})~p(\mathrm{parameters}|\mathrm{model})}{p(\mathrm{data}|\mathrm{model})} \\ &=& \cfrac{\mathrm{likelihood} \times \mathrm{prior}}{\mathrm{evidence}} \end{eqnarray} $
- priors are almost always present, even if not specified
- they're essential for generalization from observations
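- a sketch of Bayes' rule evaluated on a parameter grid, with hypothetical coin-flip data and a uniform prior:
import numpy as np
theta = np.linspace(0,1,101) # grid over the parameter (coin bias)
heads,flips = 7,10 # hypothetical data
likelihood = theta**heads*(1-theta)**(flips-heads) # p(data|parameters,model)
prior = np.ones_like(theta)/len(theta) # p(parameters|model)
evidence = np.sum(likelihood*prior) # p(data|model)
posterior = likelihood*prior/evidence # p(parameters|data,model)
print(f"posterior peak at bias = {theta[np.argmax(posterior)]:.2f}") # 0.70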
log likelihood¶
$ \log p(\mathrm{parameters}|\mathrm{data,model}) = \log \mathrm{likelihood} + \log \mathrm{prior} - \log \mathrm{evidence} $
- maximizing the log likelihood with a Gaussian distribution results in least squares (a sketch follows this list)
- the prior serves as a penalty term
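- a sketch, with assumed synthetic data, showing that the slope maximizing the Gaussian log likelihood is the slope minimizing the squared error:
import numpy as np
np.random.seed(10)
x = np.linspace(0,1,50)
y = 2*x+np.random.normal(0,0.1,50) # hypothetical data, true slope 2, noise sigma 0.1
slopes = np.linspace(1.5,2.5,1001)
loglike = [-np.sum((y-a*x)**2)/(2*0.1**2) for a in slopes] # Gaussian log likelihood (constants dropped)
sse = [np.sum((y-a*x)**2) for a in slopes] # least-squares cost
print(f"maximum likelihood slope: {slopes[np.argmax(loglike)]:.3f}")
print(f"least-squares slope: {slopes[np.argmin(sse)]:.3f}") # the same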
Statistics¶
expectation¶
$\langle f(x) \rangle = \lim_{N\rightarrow\infty} \frac{1}{N} \sum_{i=1}^N f(x_i) = \int_{-\infty}^\infty f(x)p(x)dx$
- sum over samples
- the equivalence with the integral over the distribution is a frequentist definition of $p(x)$
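- a quick check, assuming a standard Gaussian for $p(x)$, that the sample average of $x^2$ matches the integral:
import numpy as np
np.random.seed(10)
x = np.random.normal(0,1,100000) # samples from p(x)
print(f"sample average of x^2: {np.mean(x**2):.3f}")
xi = np.linspace(-10,10,100001) # numerical integral of x^2 p(x)
p = np.exp(-xi**2/2)/np.sqrt(2*np.pi)
print(f"integral of x^2 p(x) dx: {np.sum(xi**2*p)*(xi[1]-xi[0]):.3f}")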
mean¶
$\mu_x = \langle x \rangle$
- the middle of a distribution
variance¶
$\sigma_x^2 = \langle (x-\mu_x)^2 \rangle$
- measures the spread of a distribution
standard deviation¶
$\sigma_x = \sqrt{\sigma_x^2}$
- corresponds to error bars on a distribution
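- a sketch checking these definitions on samples against NumPy's built-ins (assumed $\mu=1$, $\sigma=2$):
import numpy as np
np.random.seed(10)
x = np.random.normal(1,2,100000)
mu = np.mean(x) # <x>
var = np.mean((x-mu)**2) # <(x-mu)^2>
print(f"mean {mu:.2f}, variance {var:.2f}, standard deviation {np.sqrt(var):.2f}")
print(f"numpy: mean {np.mean(x):.2f}, variance {np.var(x):.2f}, std {np.std(x):.2f}")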
Distributions¶
long tail¶
- a small number of events occur with high probability, a large number of events occur with low probability
- streaming tracks, video games, market sellers, inventions, ...
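- a sketch sampling a Zipf (power-law) distribution as a stand-in for these examples:
import numpy as np
np.random.seed(10)
x = np.random.zipf(2.0,100000) # power-law samples
values,counts = np.unique(x,return_counts=True)
top = np.argsort(counts)[::-1][:5] # the few most common values
print(f"top 5 values hold {np.sum(counts[top])/len(x):.0%} of the samples")
print(f"number of distinct values: {len(values)}") # many rare events in the tail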
multi-modal¶
- a distribution with more than one peak
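- a sketch mixing samples from two Gaussians (assumed means $-2$ and $2$):
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(10)
x = np.concatenate([np.random.normal(-2,0.5,5000),np.random.normal(2,0.5,5000)])
plt.hist(x,bins=50,density=True) # two peaks
plt.show()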
Gaussian (normal)¶
- Central Limit Theorem: the distribution of the mean of many samples taken from any distribution (with finite variance) converges to a Gaussian (a sketch follows the density formula below)
- 1D Gaussian with mean $\mu$ and variance $\sigma^2$
$p(x) = \cfrac{1}{\sqrt{2\pi\sigma^2}}e^{-(x-\mu)^2/(2\sigma^2)}$
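- a sketch of the Central Limit Theorem: means of samples from a (non-Gaussian) uniform distribution follow the Gaussian with $\mu=0.5$ and $\sigma=\sqrt{1/12/N}$:
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(10)
means = np.mean(np.random.uniform(0,1,(10000,50)),axis=1) # mean of 50 uniform samples, 10000 trials
plt.hist(means,bins=50,density=True)
xi = np.linspace(0.35,0.65,100)
sigma = np.sqrt(1/12/50) # standard deviation of the mean: uniform variance 1/12, divided by N
plt.plot(xi,np.exp(-(xi-0.5)**2/(2*sigma**2))/np.sqrt(2*np.pi*sigma**2),'r')
plt.show()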
Modeling¶
histogram¶
- count points in bins
- Benford's Law: leading digits are not uniformly distributed (sketched below)
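- a sketch of Benford's Law using the leading digits of powers of 2 (a standard example), compared to the $\log_{10}(1+1/d)$ prediction:
import numpy as np
digits = [int(str(2**n)[0]) for n in range(1,1000)] # leading digits of powers of 2
for d in range(1,10):
    print(f"{d}: observed {digits.count(d)/len(digits):.3f}, Benford {np.log10(1+1/d):.3f}")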
density estimation¶
- fit a function to the distribution
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(10)
npts = 1000
mean = 1
stddev = 2
#
# generate Gaussian samples
#
x = np.random.normal(mean,stddev,npts)
#
# plot histogram and points
#
plt.hist(x,bins=npts//50,density=True)
plt.plot(x,0*x,'|',ms=npts/20)
#
# plot Gaussian
#
xi = np.linspace(mean-3*stddev,mean+3*stddev,100)
yi = np.exp(-(xi-mean)**2/(2*stddev**2))/np.sqrt(2*np.pi*stddev**2)
plt.plot(xi,yi,'r')
plt.show()
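- the example above overlays the known Gaussian; a sketch of actually fitting the density from the samples, parametrically and with a kernel density estimate (scipy assumed available):
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde
np.random.seed(10)
x = np.random.normal(1,2,1000)
xi = np.linspace(-5,7,100)
m,s = np.mean(x),np.std(x) # parametric fit: estimate mean and standard deviation
plt.hist(x,bins=20,density=True)
plt.plot(xi,np.exp(-(xi-m)**2/(2*s**2))/np.sqrt(2*np.pi*s**2),'r',label='fit Gaussian')
plt.plot(xi,gaussian_kde(x)(xi),'g',label='kernel density estimate')
plt.legend()
plt.show()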
Averaging¶
- averaging Gaussian samples reduces errors by $\sqrt{N}$
import numpy as np
import matplotlib.pyplot as plt
mean = 1
width = 2
trials = 100
points = np.arange(10,500,25)
means = np.zeros((trials,len(points)))
#
# loop over number of points
#
for point in range(len(points)):
    #
    # loop over trials
    #
    for trial in range(trials):
        means[trial,point] = np.mean(np.random.normal(mean,width,points[point]))
#
# plot calculated width/sqrt(N)
#
plt.plot(points,mean+width/np.sqrt(points),'r',label='calculated')
plt.plot(points,mean-width/np.sqrt(points),'r')
#
# plot estimated means and stddevs
#
mean = np.mean(means,0)
stddev = np.std(means,0)
plt.errorbar(points,mean,yerr=stddev,fmt='k-o',capsize=7,label='estimated')
#
# plot points
#
for point in range(len(points)):
    plt.plot(np.full(trials,points[point]),means[:,point],'o',markersize=2)
plt.xlabel('number of samples averaged')
plt.ylabel('mean estimates')
plt.legend()
plt.show()
Multidimensional distributions¶
covariance matrix¶
$C_{ij} = \langle (x_i-\mu_i)(x_j-\mu_j)\rangle$
- diagonal is variance
- off-diagonal measures linear dependence between variables
- the determinant $|C|$ measures the volume of the distribution
- the eigenvectors of $C$ point in the directions of greatest variance
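- a sketch checking the covariance definition against np.cov and finding the direction of greatest variance (assumed example values):
import numpy as np
np.random.seed(10)
x = np.random.multivariate_normal([0,0],[[2.5,-2.1],[-2.1,2.5]],100000)
mu = np.mean(x,axis=0)
C = (x-mu).T@(x-mu)/len(x) # C_ij = <(x_i-mu_i)(x_j-mu_j)>
print(f"covariance from the definition:\n{C}")
print(f"np.cov:\n{np.cov(x,rowvar=False)}")
evalu,evect = np.linalg.eigh(C) # eigenvalues in ascending order
print(f"direction of greatest variance: {evect[:,-1]}, variance {evalu[-1]:.2f}")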
correlation != causation¶
- tooth brushing and sleeping
D-dimensional Gaussian¶
- with variances
- distribution is aligned with coordinate axes
$ p({\vec x}) = \prod_{d=1}^D \frac{1}{\sqrt{2\pi\sigma_d^2}} e^{\displaystyle-(x_d-\mu_d)^2/2\sigma_d^2} $
- with covariances
- distribution is rotated from coordinate axes
$ p({\vec x}) = \cfrac{|{\bf C}^{-1}|^{1/2}}{(2\pi)^{D/2}} e^{\displaystyle-({\vec x}-\vec\mu)^T \cdot {\bf C}^{-1} \cdot ({\vec x}-\vec\mu)/2} ~~~~. $
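- a sketch evaluating this density directly and checking it against scipy's multivariate normal (scipy assumed available, example values chosen to match the cell below):
import numpy as np
from scipy.stats import multivariate_normal
mu = np.array([7.,5.])
C = np.array([[2.5,-2.1],[-2.1,2.5]])
x = np.array([6.,6.])
D = len(mu)
p = np.sqrt(np.linalg.det(np.linalg.inv(C)))/(2*np.pi)**(D/2)*np.exp(-(x-mu)@np.linalg.inv(C)@(x-mu)/2)
print(p)
print(multivariate_normal(mu,C).pdf(x)) # matches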
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(10)
np.set_printoptions(precision=2)
np.set_printoptions(suppress=True)
npts = 2000
#
# generate variance samples
#
mean = [1,1]
stddev = [0.75,2]
varsamples = np.random.normal(mean,stddev,(npts,2))
#
# find mean and variance
#
varmean = np.mean(varsamples,axis=0)
varstd = np.sqrt(np.var(varsamples,axis=0))
varplotx = [varmean[0]-varstd[0],varmean[0]+varstd[0],None,varmean[0],varmean[0]]
varploty = [varmean[1],varmean[1],None,varmean[1]-varstd[1],varmean[1]+varstd[1]]
#
# generate covariance samples
#
mean = [7,5]
covariance = [[2.5,-2.1],[-2.1,2.5]]
covarsamples = np.random.multivariate_normal(mean,covariance,npts)
#
# find mean, covariance, eigenvalues, and eigenvectors
#
covarmean = np.mean(covarsamples,axis=0)
covar = np.cov(covarsamples,rowvar=False)
evalu,evect = np.linalg.eig(covar)
dx0 = evect[0,0]*np.sqrt(evalu[0])
dx1 = evect[1,0]*np.sqrt(evalu[1])
dy0 = evect[0,1]*np.sqrt(evalu[0])
dy1 = evect[1,1]*np.sqrt(evalu[1])
covarplotx = [covarmean[0]-dx0,covarmean[0]+dx0,None,covarmean[0]-dx1,covarmean[0]+dx1]
covarploty = [covarmean[1]+dy0,covarmean[1]-dy0,None,covarmean[1]+dy1,covarmean[1]-dy1]
#
# plot and print
#
print("covariance matrix:")
print(covar)
samples = np.vstack((varsamples,covarsamples))
plt.figure()
plt.hist2d(samples[:,0],samples[:,1],bins=30,cmap='binary')
plt.plot(samples[:,0],samples[:,1],'o',markersize=1.5,alpha=0.3)
plt.plot(varmean[0],varmean[1],'ro')
plt.plot(covarmean[0],covarmean[1],'ro')
plt.plot(varplotx,varploty,'r')
plt.plot(covarplotx,covarploty,'r')
plt.text(2.5,-4,"variance",fontsize=15)
plt.text(-1.25,8.5,"covariance",fontsize=15)
plt.axis('off')
plt.show()
#
# print covariance matrices
#
print("covariance matrix:")
varcovar = np.cov(varsamples,rowvar=False)
print(varcovar)
covariance matrix:
[[ 2.46 -2.11]
 [-2.11  2.53]]
covariance matrix:
[[ 0.57 -0.01]
 [-0.01  3.77]]
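entropy¶
$H = -\sum_i p(x_i)\log_2 p(x_i)$
- measures the information in a distribution, in bits when the log is base 2
- maximized by a uniform distribution, zero for a one-hot distribution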
import numpy as np
import matplotlib.pyplot as plt
nbins = 256
xmin = -4
xmax = 4
x = np.linspace(xmin,xmax,nbins)
print(f"{nbins} bins = {np.log2(nbins):.0f} bits")
#
def entropy(dist):
    positives = dist[np.where(dist > 0)] # 0 log(0) = 0
    return -np.sum(positives*np.log2(positives))
#
uniform = np.ones(nbins)/nbins
#
mean = 0
std = 1
normal = np.exp(-(x-mean)**2/(2*std**2))
normal = normal/np.sum(normal)
#
onehot = np.zeros(nbins)
onehot[nbins//2] = 1
#
fig,axs = plt.subplots(3,1)
fig.canvas.header_visible = False
width = 1.1*(xmax-xmin)/nbins
axs[0].bar(x,uniform)
axs[0].set_title(f"uniform entropy: {entropy(uniform):.1f} bits")
axs[1].bar(x,normal,width=width)
axs[1].set_title(f"Gaussian entropy: {entropy(normal):.1f} bits")
axs[2].bar(x,onehot,width=width)
axs[2].set_title(f"one-hot entropy: {entropy(onehot):.1f} bits")
plt.tight_layout()
256 bins = 8 bits
mutual information¶
- $H(x)+H(y)-H(x,y) = -\sum_i p(x_i)\log_2 p(x_i) - \sum_j p(y_j)\log_2 p(y_j) + \sum_i \sum_j p(x_i,y_j)\log_2 p(x_i,y_j)$
- equal to the difference between the information in the variables taken separately and taken together
- measures nonlinear as well as linear relationships
- without priors, the amount of data needed for estimation grows exponentially with dimension
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(10)
np.set_printoptions(precision=1)
np.set_printoptions(suppress=True)
npts = int(1e5)
nbins = 256
print(f"{nbins} bins\n")
#
def entropy(dist):
    index = np.where(dist > 0) # 0 log(0) = 0
    positives = dist[index]
    return -np.sum(positives*np.log2(positives))
def entropy2(dist):
    indexx,indexy = np.where(dist > 0) # 0 log(0) = 0
    positives = dist[indexx,indexy]
    return -np.sum(positives*np.log2(positives))
def information(x,y):
    xhist,xedges = np.histogram(x,nbins)
    xdist = xhist/np.sum(xhist)
    yhist,yedges = np.histogram(y,nbins)
    ydist = yhist/np.sum(yhist)
    xyhist,xedges,yedges = np.histogram2d(x,y,[nbins,nbins])
    xydist = xyhist/np.sum(xyhist)
    Hx = entropy(xdist)
    Hy = entropy(ydist)
    Hxy = entropy2(xydist)
    return Hx+Hy-Hxy
#
xuniform = np.random.uniform(-1,1,npts)
yuniform = np.random.uniform(-1,1,npts)
covar = np.cov(np.c_[xuniform,yuniform],rowvar=False)
print(f"{npts:.0e} points")
print(f"uniform covariance:\n{covar}")
I = information(xuniform,yuniform)
plt.plot(xuniform,yuniform,'o')
plt.title(f"uniform mutual information: {I:.1f} bits")
plt.show()
#
xuniform = np.random.uniform(-1,1,npts//100)
yuniform = np.random.uniform(-1,1,npts//100)
covar = np.cov(np.c_[xuniform,yuniform],rowvar=False)
print(f"{npts//100:.0e} points")
print(f"uniform covariance:\n{covar}")
I = information(xuniform,yuniform)
plt.plot(xuniform,yuniform,'o')
plt.title(f"uniform mutual information: {I:.1f} bits")
plt.show()
#
angles = np.random.uniform(0,2*np.pi,npts)
r = 1
xcircle = r*np.cos(angles)
ycircle = r*np.sin(angles)
covar = np.cov(np.c_[xcircle,ycircle],rowvar=False)
print(f"{npts:.0e} points")
print(f"circle covariance:\n{covar}")
I = information(xcircle,ycircle)
plt.plot(xcircle,ycircle,'o')
plt.title(f"circle mutual information: {I:.1f} bits")
plt.show()
#
xlinear = np.random.uniform(-1,1,npts)
ylinear = xlinear
covar = np.cov(np.c_[xlinear,ylinear],rowvar=False)
print(f"{npts:.0e} points")
print(f"linear covariance:\n{covar}")
I = information(xlinear,ylinear)
plt.plot(xlinear,ylinear,'o')
plt.title(f"linear mutual information: {I:.1f} bits")
plt.show()
256 bins

1e+05 points
uniform covariance:
[[0.3 0. ]
 [0.  0.3]]
1e+03 points
uniform covariance:
[[ 0.3 -0. ]
 [-0.  0.3]]
1e+05 points
circle covariance:
[[ 0.5 -0. ]
 [-0.  0.5]]
1e+05 points
linear covariance:
[[0.3 0.3]
 [0.3 0.3]]
(c) Neil Gershenfeld for Fab Futures, 12/14/25