Svavar Konráðsson - Fab Futures - Data Science
Home About

Class 6: Density Estimation¶

I fit a probability distribution to the Æðey island ocean temperature data. Let's load the data:

In [1]:
import pandas as pd
import numpy as np
import matplotlib.dates as mdates
from datetime import datetime
import matplotlib.pyplot as plt

df = pd.read_csv('datasets/ocean-temperature-westfjords-2024/cmems_obs-ins_nws_phybgcwav_mynrt_na_irr_1763979809373_noheader.csv')
time = df["time"]
temperature = df["value"]

iso_date_string = "2023-05-29T10:30:00Z"
parsed_date = datetime.fromisoformat(iso_date_string[:-1])  # Removing the 'Z' at the end
parsed_time = [0] * len(time)
for i in range(len(time)):
    parsed_time[i] = datetime.fromisoformat(time[i])

I continued to use Neil's code for density estimation on the ocean temperature data and fit a Gaussian probability distribution to it:

In [2]:
npts = 1000
mean = np.mean(temperature)
stddev = np.std(temperature)
plt.hist(temperature,bins=npts//50,density=True)
plt.plot(temperature,0*temperature,'|',ms=npts/20)
plt.text(6,0.23,'Mean = ' + str(round(mean,3)))
plt.text(6,0.22, 'Standard deviation = ' + str(round(stddev,3)))
plt.title('Histogram of Æðey ocean temperature 2024')
plt.xlabel('Temperature [ËšC]')
plt.ylabel('Occurance')
#
# plot Gaussian
#
xi = np.linspace(mean-3*stddev,mean+3*stddev,100)
yi = np.exp(-(xi-mean)**2/(2*stddev**2))/np.sqrt(2*np.pi*stddev**2)
plt.plot(xi,yi,'r')
plt.show()
No description has been provided for this image

The Gaussian curve doesn't fit very well because there are two peaks in the data. If I can split the data into summer and winter and fit a normal distrbution to each of them, then I'm sure the distribution will describe the data much better.

In [24]:
temp_summer = temperature[17280:34992]
temp_winter1 = temperature[:17280]
temp_winter2 = temperature[34992:]
temp_winter = [temp_winter1 + temp_winter2]

mean_summer = np.mean(temp_summer)
stddev_summer = np.std(temp_summer)
plt.hist(temp_summer,bins=npts//50,density=True)
plt.plot(temp_summer,0*temp_summer,'|',ms=npts/20)
plt.text(6,0.24,'Summer mean = ' + str(round(mean_summer,3)))
plt.text(6,0.22, 'Summer standard deviation = ' + str(round(stddev_summer,3)))

mean_winter = np.mean(temp_winter)
stddev_winter = np.std(temp_winter)
#plt.hist(temp_winter,bins=npts//50,density=True)
#plt.plot(temp_winter,0*temp_winter,'|',ms=npts/20)
#plt.text(6,0.23,'Winter mean = ' + str(round(mean_winter,3)))
#plt.text(6,0.22, 'Winter standard deviation = ' + str(round(stddev_winter,3)))


plt.title('Histogram of Æðey ocean temperature 2024')
plt.xlabel('Temperature [ËšC]')
plt.ylabel('Occurance')
#
# plot Gaussian
#
xi_summer = np.linspace(mean_summer-3*stddev_summer,mean_summer+3*stddev_summer,100)
yi_summer = np.exp(-(xi-mean_summer)**2/(2*stddev_summer**2))/np.sqrt(2*np.pi*stddev_summer**2)
plt.plot(xi_summer,yi_summer,'r')

xi_winter = np.linspace(mean_winter-3*stddev_winter,mean_winter+3*stddev_winter,100)
yi_winter = np.exp(-(xi-mean_winter)**2/(2*stddev_winter**2))/np.sqrt(2*np.pi*stddev_winter**2)
plt.plot(xi_winter,yi_winter,'g')

plt.show()
No description has been provided for this image

I'm having trouble getting the winter data to work, but that's just because I'm not used to separating and joining lists in Python. I'll get there.