[Dorji Tshezom] - Fab Futures - Data Science

What is Density Estimation in Data Science?

Density estimation is the process of estimating the probability distribution that a dataset was drawn from, that is, figuring out how the data are spread across the range of possible values.

Why Density Estimation is Important

In data science, density estimation helps to:

Understand data distribution

Detect outliers and anomalies

Perform probabilistic modeling

Support clustering and classification

Estimate probabilities for unseen data
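As a rough illustration of the last two points, the sketch below fits a normal distribution to the total marks used in the plots later in this notebook and then queries it. The normality assumption, the query values (240 and 270), and the |z| > 2 outlier threshold are all illustrative choices, not something the data dictates.

```python
import numpy as np
from scipy import stats

# Same total marks data used in the plotting cells of this notebook
total_marks = np.array([234, 230, 255, 196, 179, 206, 284, 222, 262, 251])

# Assumption: the marks are roughly normal. norm.fit returns the
# maximum-likelihood mean and standard deviation.
mu, sigma = stats.norm.fit(total_marks)

# Estimated density at a value that does not appear in the data
density_at_240 = stats.norm.pdf(240, loc=mu, scale=sigma)

# Estimated probability of a total above 270 (survival function = 1 - CDF)
prob_above_270 = stats.norm.sf(270, loc=mu, scale=sigma)

# Flag points far from the mean as possible outliers.
# The threshold of 2 standard deviations is an arbitrary choice.
z_scores = (total_marks - mu) / sigma
possible_outliers = total_marks[np.abs(z_scores) > 2]

print(f"mean={mu:.1f}, std={sigma:.1f}")
print(f"density at 240: {density_at_240:.5f}")
print(f"P(marks > 270): {prob_above_270:.3f}")
print("possible outliers:", possible_outliers)
```

With only ten observations, these numbers are very rough; the point is only to show how a fitted density can be queried for values that were never observed.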

Types of Density Estimation

1️⃣ Parametric Density Estimation

2️⃣ Non-Parametric Density Estimation

a) Histogram

The simplest form of density estimation

The shape of the estimate depends on the bin size

In [1]:
import matplotlib.pyplot as plt
import seaborn as sns

# Total marks data
total_marks = [234, 230, 255, 196, 179, 206, 284, 222, 262, 251]

# Plot histogram normalized to a density (stat='density'), with a KDE curve overlaid
plt.figure(figsize=(7,5))
sns.histplot(total_marks, bins=6, stat='density', kde=True, color='skyblue')

plt.xlabel("Total Marks")
plt.ylabel("Density")
plt.title("Density Estimation using Histogram (Total Marks)")
plt.grid(True)
plt.show() 
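To make the dependence on bin size concrete, the following sketch computes the histogram density numerically with `numpy.histogram` for two bin counts. The choice of 3 and 6 bins is only for illustration; the `results` dictionary is a helper introduced here, not part of the original notebook.

```python
import numpy as np

# Same total marks data as in the cell above
total_marks = np.array([234, 230, 255, 196, 179, 206, 284, 222, 262, 251])

results = {}
for bins in (3, 6):
    # density=True rescales the counts so that the bar areas sum to 1
    density, edges = np.histogram(total_marks, bins=bins, density=True)
    widths = np.diff(edges)
    area = np.sum(density * widths)
    results[bins] = (density, edges)
    print(f"bins={bins}: bin width={widths[0]:.1f}, total area={area:.2f}")
    print("  densities:", np.round(density, 4))
```

Both estimates integrate to 1, but the bar heights and the apparent shape of the distribution change with the number of bins, which is why the bin size should be chosen with care on small samples like this one.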

b) Kernel Density Estimation (KDE)

Fits a smooth curve over the data

Widely used in data science

In [2]:
import matplotlib.pyplot as plt
import seaborn as sns

# Total marks data
total_marks = [234, 230, 255, 196, 179, 206, 284, 222, 262, 251]

# KDE Plot
plt.figure(figsize=(7,5))
sns.kdeplot(total_marks, fill=True, color='purple')

plt.xlabel("Total Marks")
plt.ylabel("Density")
plt.title("Kernel Density Estimation (KDE) of Total Marks")
plt.grid(True)
plt.show() 
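The plot shows the shape of the estimate, but a KDE can also be queried numerically. The sketch below uses `scipy.stats.gaussian_kde` directly; this is an assumption made for illustration, and its default bandwidth (Scott's rule) may not reproduce the seaborn curve exactly. The query values and the bandwidth factors 0.2 and 1.0 are arbitrary.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Same total marks data as in the cell above
total_marks = np.array([234, 230, 255, 196, 179, 206, 284, 222, 262, 251],
                       dtype=float)

# Gaussian KDE with the default bandwidth (Scott's rule)
kde = gaussian_kde(total_marks)

# Estimated density at a mark that is not in the data
density_at_240 = kde([240.0])[0]

# Estimated probability that a total falls between 220 and 260
prob_220_260 = kde.integrate_box_1d(220, 260)

# The bandwidth controls smoothness: a small factor follows individual
# points closely, a large factor gives a flatter, smoother curve.
kde_narrow = gaussian_kde(total_marks, bw_method=0.2)
kde_wide = gaussian_kde(total_marks, bw_method=1.0)

print(f"density at 240: {density_at_240:.5f}")
print(f"P(220 <= marks <= 260): {prob_220_260:.3f}")
print(f"density at 240 (narrow): {kde_narrow([240.0])[0]:.5f}")
print(f"density at 240 (wide):   {kde_wide([240.0])[0]:.5f}")
```

The bandwidth plays the same role for KDE that the bin size plays for a histogram, so it is worth trying a few values before reading too much into the shape of the curve.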