Class 6: Density Estimation¶
I fit a probability distribution to the Æðey island ocean temperature data. Let's load the data:
In [1]:
import pandas as pd
import numpy as np
import matplotlib.dates as mdates
from datetime import datetime
import matplotlib.pyplot as plt
df = pd.read_csv('datasets/ocean-temperature-westfjords-2024/cmems_obs-ins_nws_phybgcwav_mynrt_na_irr_1763979809373_noheader.csv')
time = df["time"]
temperature = df["value"]
iso_date_string = "2023-05-29T10:30:00Z"
parsed_date = datetime.fromisoformat(iso_date_string[:-1]) # Removing the 'Z' at the end
parsed_time = [0] * len(time)
for i in range(len(time)):
parsed_time[i] = datetime.fromisoformat(time[i])
I continued to use Neil's code for density estimation on the ocean temperature data and fit a Gaussian probability distribution to it:
In [2]:
npts = 1000
mean = np.mean(temperature)
stddev = np.std(temperature)
plt.hist(temperature,bins=npts//50,density=True)
plt.plot(temperature,0*temperature,'|',ms=npts/20)
plt.text(6,0.23,'Mean = ' + str(round(mean,3)))
plt.text(6,0.22, 'Standard deviation = ' + str(round(stddev,3)))
plt.title('Histogram of Æðey ocean temperature 2024')
plt.xlabel('Temperature [ËšC]')
plt.ylabel('Occurance')
#
# plot Gaussian
#
xi = np.linspace(mean-3*stddev,mean+3*stddev,100)
yi = np.exp(-(xi-mean)**2/(2*stddev**2))/np.sqrt(2*np.pi*stddev**2)
plt.plot(xi,yi,'r')
plt.show()
The Gaussian curve doesn't fit very well because there are two peaks in the data. If I can split the data into summer and winter and fit a normal distrbution to each of them, then I'm sure the distribution will describe the data much better.
In [24]:
temp_summer = temperature[17280:34992]
temp_winter1 = temperature[:17280]
temp_winter2 = temperature[34992:]
temp_winter = [temp_winter1 + temp_winter2]
mean_summer = np.mean(temp_summer)
stddev_summer = np.std(temp_summer)
plt.hist(temp_summer,bins=npts//50,density=True)
plt.plot(temp_summer,0*temp_summer,'|',ms=npts/20)
plt.text(6,0.24,'Summer mean = ' + str(round(mean_summer,3)))
plt.text(6,0.22, 'Summer standard deviation = ' + str(round(stddev_summer,3)))
mean_winter = np.mean(temp_winter)
stddev_winter = np.std(temp_winter)
#plt.hist(temp_winter,bins=npts//50,density=True)
#plt.plot(temp_winter,0*temp_winter,'|',ms=npts/20)
#plt.text(6,0.23,'Winter mean = ' + str(round(mean_winter,3)))
#plt.text(6,0.22, 'Winter standard deviation = ' + str(round(stddev_winter,3)))
plt.title('Histogram of Æðey ocean temperature 2024')
plt.xlabel('Temperature [ËšC]')
plt.ylabel('Occurance')
#
# plot Gaussian
#
xi_summer = np.linspace(mean_summer-3*stddev_summer,mean_summer+3*stddev_summer,100)
yi_summer = np.exp(-(xi-mean_summer)**2/(2*stddev_summer**2))/np.sqrt(2*np.pi*stddev_summer**2)
plt.plot(xi_summer,yi_summer,'r')
xi_winter = np.linspace(mean_winter-3*stddev_winter,mean_winter+3*stddev_winter,100)
yi_winter = np.exp(-(xi-mean_winter)**2/(2*stddev_winter**2))/np.sqrt(2*np.pi*stddev_winter**2)
plt.plot(xi_winter,yi_winter,'g')
plt.show()
I'm having trouble getting the winter data to work, but that's just because I'm not used to separating and joining lists in Python. I'll get there.