[Karma Tshering] - Fab Futures - Data Science

Session 6: Density Estimation¶

Assignment¶

Fit a probability distribution to your data.

In [2]:
import pandas as pd

# Load the data
df = pd.read_csv("datasets/firecounts.csv")

# Display the data
print(df)

# Optional: access the columns
years = df["Year"].values
fire_counts = df["Fire Counts"].values

print("Years:", years)
print("Fire Counts:", fire_counts)
    Year  Fire Counts
0   2001          199
1   2002          127
2   2003          170
3   2004          219
4   2005          204
5   2006          354
6   2007          339
7   2008          225
8   2009          369
9   2010          442
10  2011          264
11  2012          425
12  2013          236
13  2014          342
14  2015          260
15  2016          221
16  2017          231
17  2018          139
18  2019          116
19  2020          138
20  2021          250
21  2022           97
22  2023          227
23  2024          169
Years: [2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014
 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024]
Fire Counts: [199 127 170 219 204 354 339 225 369 442 264 425 236 342 260 221 231 139
 116 138 250  97 227 169]
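One way to carry out the assignment with these data is to fit a parametric distribution to the yearly fire counts, assuming they are roughly normal; the sketch below estimates the mean and standard deviation with scipy.stats.norm.fit and overlays the fitted density on a histogram (the bin count and grid are arbitrary choices).

In [ ]:
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# fit a normal distribution to the yearly fire counts (assumes a roughly bell-shaped spread)
mu, sigma = stats.norm.fit(fire_counts)
print(f"Fitted normal distribution: mean = {mu:.1f}, std = {sigma:.1f}")

# compare the fitted density with a histogram of the data
grid = np.linspace(fire_counts.min(), fire_counts.max(), 200)
plt.hist(fire_counts, bins=8, density=True, alpha=0.5, label="Fire count histogram")
plt.plot(grid, stats.norm.pdf(grid, mu, sigma), label="Fitted normal pdf")
plt.xlabel("Fire Counts")
plt.ylabel("Density")
plt.legend()
plt.show()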

For me, understanding the concept of K-means clustering and plotting the clusters with their Voronoi-like partition was a strange lesson to tackle without the basics of coding. However, I am grateful to have learned how to use AI prompts for self-learning. I used ChatGPT for my lessons with the following prompts.¶

[Screenshots of the ChatGPT prompts used for this lesson]

K-means Clustering and Density Estimation¶

In [1]:
import matplotlib.pyplot as plt
from scipy.spatial import Voronoi, voronoi_plot_2d
import numpy as np
import pandas as pd

#
# k-means parameters
#
nclusters = 3      # choose the number of clusters
nsteps = 10
np.random.seed(0)

#
# load data
#
df = pd.read_csv("datasets/firecounts.csv")

# extract Year and Fire Counts as coordinates
x = df["Year"].values.astype(float)
y = df["Fire Counts"].values.astype(float)

#
# choose starting cluster centers randomly from data
#
indices = np.random.choice(len(x), nclusters, replace=False)
mux = x[indices].astype(float)
muy = y[indices].astype(float)

#
# plot before iteration
#
fig, ax = plt.subplots()
plt.scatter(x, y)
vor = Voronoi(np.stack((mux, muy), axis=1))
voronoi_plot_2d(vor, ax=ax, show_points=True, show_vertices=False, point_size=20)
plt.title("Before k-means iterations")
plt.xlabel("Year")
plt.ylabel("Fire Counts")
plt.show()

#
# k-means iteration
#
for step in range(nsteps):
    # compute the Euclidean distance from every point to every center:
    # the outer products tile the data and the centers into
    # (npoints x nclusters) matrices, so distances[i, k] is the
    # distance from point i to center k
    xm = np.outer(x, np.ones(len(mux)))
    ym = np.outer(y, np.ones(len(muy)))
    muxm = np.outer(np.ones(len(x)), mux)
    muym = np.outer(np.ones(len(x)), muy)

    distances = np.sqrt((xm - muxm)**2 + (ym - muym)**2)

    # assign each point to the nearest cluster
    mins = np.argmin(distances, axis=1)

    # update cluster centers
    for k in range(nclusters):
        index = np.where(mins == k)[0]
        if len(index) > 0:
            mux[k] = np.mean(x[index])
            muy[k] = np.mean(y[index])

#
# plot after iteration
#
fig, ax = plt.subplots()
plt.scatter(x, y)

vor = Voronoi(np.stack((mux, muy), axis=1))
voronoi_plot_2d(vor, ax=ax, show_points=True, show_vertices=False, point_size=20)

plt.title("After k-means iterations")
plt.xlabel("Year")
plt.ylabel("Fire Counts")
plt.show()
[Scatter plots of Year vs Fire Counts with the Voronoi partition of the cluster centers, before and after the k-means iterations]
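The hand-written loop above can be cross-checked against scikit-learn's KMeans. One caveat for this sketch: Year and Fire Counts are on very different scales, so the Euclidean distance is dominated by the fire counts unless both columns are standardized first (here with StandardScaler, an added step not used in the loop above).

In [ ]:
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# stack Year and Fire Counts into an (n_samples, 2) feature matrix
X = np.stack((x, y), axis=1)

# standardize both columns so neither dominates the distance
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# run k-means with the same number of clusters as the loop above
kmeans = KMeans(n_clusters=nclusters, n_init=10, random_state=0).fit(X_scaled)
print("Cluster label for each year:", kmeans.labels_)

# map the centers back to the original units to compare with mux and muy
centers = scaler.inverse_transform(kmeans.cluster_centers_)
print("Cluster centers (Year, Fire Counts):")
print(centers)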