What is Density Estimation in Data Science?
Density estimation is the process of estimating the probability distribution of a dataset, that is, working out how the data are spread across the range of possible values.
Why Density Estimation is Important
In data science, density estimation helps to:
Understand data distribution
Detect outliers and anomalies
Perform probabilistic modeling
Support clustering and classification
Estimate probabilities for unseen data
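Types of Density Estimation
1️⃣ Parametric Density Estimation
Assumes the data follow a known distribution family (for example, normal) and estimates its parameters
2️⃣ Non-Parametric Density Estimation
Makes no assumption about the form of the distribution; the estimate is built directly from the data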
a) Histogram
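Simplest form of density estimation: counts the observations in each bin and normalizes so the bar areas sum to 1
The shape of the estimate depends on the bin size (and on where the bin edges fall)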
In [1]:
import matplotlib.pyplot as plt
import seaborn as sns
# Total marks data
total_marks = [234, 230, 255, 196, 179, 206, 284, 222, 262, 251]
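# Plot a density-normalized histogram with a KDE overlay.
# stat="density" makes the y-axis a density; the default is raw counts.
plt.figure(figsize=(7,5))
sns.histplot(total_marks, bins=6, stat="density", kde=True, color='skyblue')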
plt.xlabel("Total Marks")
plt.ylabel("Density")
plt.title("Density Estimation using Histogram (Total Marks)")
plt.grid(True)
plt.show()
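The dependence on bin size noted above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; `histogram_density` is a hypothetical helper introduced only for this illustration, and the bin counts 3, 6 and 10 are arbitrary choices. It computes the normalized histogram for several bin counts and confirms that the bar areas sum to 1 each time, even though the density values themselves change.

```python
import numpy as np

# Same total marks data as in the cell above
total_marks = [234, 230, 255, 196, 179, 206, 284, 222, 262, 251]

def histogram_density(data, bins):
    """Hypothetical helper: return (densities, edges) of a normalized histogram."""
    densities, edges = np.histogram(data, bins=bins, density=True)
    return densities, edges

# Compare a few bin counts (arbitrary choices for illustration)
for bins in (3, 6, 10):
    densities, edges = histogram_density(total_marks, bins)
    widths = np.diff(edges)                      # width of each bin
    area = float(np.sum(densities * widths))     # total area under the bars
    print(f"bins={bins:2d}  max density={densities.max():.4f}  area={area:.2f}")
```

Fewer bins give a coarse, blocky estimate, while many bins on only ten observations leave some bins empty, so the bin count is a trade-off rather than a setting with one correct value.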
b) Kernel Density Estimation (KDE)
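Produces a smooth curve by placing a kernel (for example, a Gaussian) on each data point and summing them
The smoothness depends on the bandwidth, which plays the role that bin size plays for a histogram
A widely used non-parametric method in data science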
In [2]:
import matplotlib.pyplot as plt
import seaborn as sns
# Total marks data
total_marks = [234, 230, 255, 196, 179, 206, 284, 222, 262, 251]
# KDE Plot
plt.figure(figsize=(7,5))
sns.kdeplot(total_marks, fill=True, color='purple')
plt.xlabel("Total Marks")
plt.ylabel("Density")
plt.title("Kernel Density Estimation (KDE) of Total Marks")
plt.grid(True)
plt.show()
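The plot shows the shape of the estimate, but a fitted KDE can also be evaluated at values that are not in the data. The following is a minimal sketch, assuming SciPy is installed; it uses `scipy.stats.gaussian_kde` directly rather than seaborn, and the query marks and the 220 to 260 interval are made-up values chosen only for illustration.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Same total marks data as in the cell above
total_marks = np.asarray(
    [234, 230, 255, 196, 179, 206, 284, 222, 262, 251], dtype=float
)

# Fit a Gaussian KDE; the bandwidth defaults to Scott's rule
kde = gaussian_kde(total_marks)

# Evaluate the estimated density at marks not present in the data
# (illustrative query values, not from the original notebook)
new_marks = np.array([200.0, 240.0, 300.0])
densities = kde(new_marks)
for mark, dens in zip(new_marks, densities):
    print(f"estimated density at {mark:.0f}: {dens:.5f}")

# Estimated probability that a mark falls between 220 and 260
prob = kde.integrate_box_1d(220, 260)
print(f"estimated P(220 <= mark <= 260): {prob:.3f}")
```

A density value is not itself a probability; probabilities come from integrating the density over an interval, which is what `integrate_box_1d` does here. With only ten observations the numbers are rough and sensitive to the bandwidth.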