Class 2: Tools
I found a log of ocean temperature in the Westfjords of Iceland in the Copernicus Marine Data Store. The assignment for this class is to visualize the data.
2024 ocean temperature at Æðey weather station
The file is a CSV with ten columns, but Pandas recognized only one of them. I opened the file in a text editor and made a copy of it with the header removed. The header starts with a # sign and contains nothing useful as far as I can see.
import pandas as pd
df = pd.read_csv('datasets/ocean-temperature-westfjords-2024/cmems_obs-ins_nws_phybgcwav_mynrt_na_irr_1763979809373_noheader.csv')
print(df.info())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 45965 entries, 0 to 45964
Data columns (total 10 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   parameter     45965 non-null  object
 1   platformId    45965 non-null  object
 2   platformType  45965 non-null  object
 3   time          45965 non-null  object
 4   longitude     45965 non-null  float64
 5   latitude      45965 non-null  float64
 6   depth         45965 non-null  float64
 7   pressure      0 non-null      float64
 8   value         45965 non-null  float64
 9   valueQc       45965 non-null  int64
dtypes: float64(5), int64(1), object(4)
memory usage: 3.5+ MB
None
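Instead of making a header-free copy by hand, `pd.read_csv` can be told to skip comment lines through its `comment` parameter. The sketch below is only an illustration: the two `#` lines and the data rows are hypothetical stand-ins for the real file, and it assumes that every header line starts with `#` while the row of column names does not.

```python
import io

import pandas as pd

# Hypothetical stand-in for the downloaded file: two comment lines,
# then the column names, then data rows.
sample_file = io.StringIO(
    "# hypothetical header line 1\n"
    "# hypothetical header line 2\n"
    "parameter,platformId,time,value\n"
    "TEMP,AEdey,2024-01-02T00:00:00.000Z,2.865\n"
    "TEMP,AEdey,2024-01-02T00:10:00.000Z,2.876\n"
)

# comment="#" makes pandas ignore lines that start with "#",
# so the first remaining line is used for the column names.
df_sample = pd.read_csv(sample_file, comment="#")

print(df_sample.columns.tolist())
print(df_sample.shape)
```

If the column names in the real file also sit on a line that starts with `#`, this would drop them too, and the names would have to be passed in separately.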
Now I've successfully imported the data and I can start working with it. Let's see what it looks like:
print(df)
parameter platformId platformType time longitude \
0 TEMP AEdey MO 2024-01-02T00:00:00.000Z -22.6667
1 TEMP AEdey MO 2024-01-02T00:10:00.000Z -22.6667
2 TEMP AEdey MO 2024-01-02T00:20:00.000Z -22.6667
3 TEMP AEdey MO 2024-01-02T00:30:00.000Z -22.6667
4 TEMP AEdey MO 2024-01-02T00:40:00.000Z -22.6667
... ... ... ... ... ...
45960 TEMP 6401759 DB 2024-12-28T15:00:00.000Z -24.5645
45961 TEMP 6401759 DB 2024-12-28T16:00:00.000Z -24.5969
45962 TEMP 6401759 DB 2024-12-28T17:00:00.000Z -24.6290
45963 TEMP 6401759 DB 2024-12-28T18:00:00.000Z -24.6527
45964 TEMP 6401759 DB 2024-12-28T19:00:00.000Z -24.6749
latitude depth pressure value valueQc
0 66.1000 0.0 NaN 2.865 1
1 66.1000 0.0 NaN 2.876 1
2 66.1000 0.0 NaN 2.856 1
3 66.1000 0.0 NaN 2.867 1
4 66.1000 0.0 NaN 2.876 1
... ... ... ... ... ...
45960 65.5043 0.5 NaN 3.710 1
45961 65.4985 0.5 NaN 3.690 1
45962 65.4954 0.5 NaN 3.740 1
45963 65.5000 0.5 NaN 4.170 1
45964 65.5111 0.5 NaN 4.350 1
[45965 rows x 10 columns]
display(df.head(11))
| | parameter | platformId | platformType | time | longitude | latitude | depth | pressure | value | valueQc |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | TEMP | AEdey | MO | 2024-01-02T00:00:00.000Z | -22.6667 | 66.1 | 0.0 | NaN | 2.865 | 1 |
| 1 | TEMP | AEdey | MO | 2024-01-02T00:10:00.000Z | -22.6667 | 66.1 | 0.0 | NaN | 2.876 | 1 |
| 2 | TEMP | AEdey | MO | 2024-01-02T00:20:00.000Z | -22.6667 | 66.1 | 0.0 | NaN | 2.856 | 1 |
| 3 | TEMP | AEdey | MO | 2024-01-02T00:30:00.000Z | -22.6667 | 66.1 | 0.0 | NaN | 2.867 | 1 |
| 4 | TEMP | AEdey | MO | 2024-01-02T00:40:00.000Z | -22.6667 | 66.1 | 0.0 | NaN | 2.876 | 1 |
| 5 | TEMP | AEdey | MO | 2024-01-02T00:50:00.000Z | -22.6667 | 66.1 | 0.0 | NaN | 2.898 | 1 |
| 6 | TEMP | AEdey | MO | 2024-01-02T01:00:00.000Z | -22.6667 | 66.1 | 0.0 | NaN | 2.927 | 1 |
| 7 | TEMP | AEdey | MO | 2024-01-02T01:10:00.000Z | -22.6667 | 66.1 | 0.0 | NaN | 2.955 | 1 |
| 8 | TEMP | AEdey | MO | 2024-01-02T01:20:00.000Z | -22.6667 | 66.1 | 0.0 | NaN | 2.969 | 1 |
| 9 | TEMP | AEdey | MO | 2024-01-02T01:30:00.000Z | -22.6667 | 66.1 | 0.0 | NaN | 2.971 | 1 |
| 10 | TEMP | AEdey | MO | 2024-01-02T01:40:00.000Z | -22.6667 | 66.1 | 0.0 | NaN | 2.974 | 1 |
time = df["time"]
temperature = df["value"]
display(time)
0 2024-01-02T00:00:00.000Z
1 2024-01-02T00:10:00.000Z
2 2024-01-02T00:20:00.000Z
3 2024-01-02T00:30:00.000Z
4 2024-01-02T00:40:00.000Z
...
45960 2024-12-28T15:00:00.000Z
45961 2024-12-28T16:00:00.000Z
45962 2024-12-28T17:00:00.000Z
45963 2024-12-28T18:00:00.000Z
45964 2024-12-28T19:00:00.000Z
Name: time, Length: 45965, dtype: object
display(temperature)
0 2.865
1 2.876
2 2.856
3 2.867
4 2.876
...
45960 3.710
45961 3.690
45962 3.740
45963 4.170
45964 4.350
Name: value, Length: 45965, dtype: float64
I used the W3Schools tutorial to learn Matplotlib. I'll plot the first thousand of the roughly 46,000 values in this dataset:
import matplotlib.pyplot as plt
plt.plot(time[0:1000],temperature[0:1000])
plt.xlabel("Date")
plt.ylabel("Temperature [°C]")
plt.show()
Now I need to figure out how to display the date properly. I'll try to format date ticks using ConciseDateFormatter:
import matplotlib.dates as mdates
import numpy as np
fig, axs = plt.subplots(3, 1, layout='constrained', figsize=(6, 6))
# January: lims = [2024-01-02T00:00:00.000Z,2024-01-31T23:50:00.000Z]
lims = [(np.datetime64('2024-01'), np.datetime64('2024-02'))]
for nn, ax in enumerate(axs):
    locator = mdates.AutoDateLocator(minticks=3, maxticks=7)
    formatter = mdates.ConciseDateFormatter(locator)
    ax.xaxis.set_major_locator(locator)
    ax.xaxis.set_major_formatter(formatter)
    ax.plot(time, temperature)
    ax.set_xlim(lims[nn])
axs[0].set_title('Concise Date Formatter')
plt.show()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
File /opt/conda/lib/python3.13/site-packages/matplotlib/axis.py:1811, in Axis.convert_units(self, x)
   1810 try:
-> 1811     ret = self._converter.convert(x, self.units, self)
   1812 except Exception as e:

File /opt/conda/lib/python3.13/site-packages/matplotlib/category.py:57, in StrCategoryConverter.convert(value, unit, axis)
     56 # force an update so it also does type checking
---> 57 unit.update(values)
     58 s = np.vectorize(unit._mapping.__getitem__, otypes=[float])(values)

File /opt/conda/lib/python3.13/site-packages/matplotlib/category.py:217, in UnitData.update(self, data)
    215 for val in OrderedDict.fromkeys(data):
    216     # OrderedDict just iterates over unique values in data.
--> 217     _api.check_isinstance((str, bytes), value=val)
    218     if convertible:
    219         # this will only be called so long as convertible is True.

File /opt/conda/lib/python3.13/site-packages/matplotlib/_api/__init__.py:92, in check_isinstance(types, **kwargs)
     91     names.append("None")
---> 92 raise TypeError(
     93     "{!r} must be an instance of {}, not a {}".format(
     94         k,
     95         ", ".join(names[:-1]) + " or " + names[-1]
     96         if len(names) > 1 else names[0],
     97         type_name(type(v))))

TypeError: 'value' must be an instance of str or bytes, not a numpy.datetime64

The above exception was the direct cause of the following exception:

ConversionError                           Traceback (most recent call last)
Cell In[16], line 13
     10     ax.xaxis.set_major_formatter(formatter)
     12     ax.plot(time, temperature)
---> 13     ax.set_xlim(lims[nn])
     14 axs[0].set_title('Concise Date Formatter')
     16 plt.show()

File /opt/conda/lib/python3.13/site-packages/matplotlib/axes/_base.py:3828, in _AxesBase.set_xlim(self, left, right, emit, auto, xmin, xmax)
   3826         raise TypeError("Cannot pass both 'right' and 'xmax'")
   3827     right = xmax
-> 3828 return self.xaxis._set_lim(left, right, emit=emit, auto=auto)

File /opt/conda/lib/python3.13/site-packages/matplotlib/axis.py:1216, in Axis._set_lim(self, v0, v1, emit, auto)
   1213 name = self._get_axis_name()
   1215 self.axes._process_unit_info([(name, (v0, v1))], convert=False)
-> 1216 v0 = self.axes._validate_converted_limits(v0, self.convert_units)
   1217 v1 = self.axes._validate_converted_limits(v1, self.convert_units)
   1219 if v0 is None or v1 is None:
   1220     # Axes init calls set_xlim(0, 1) before get_xlim() can be called,
   1221     # so only grab the limits if we really need them.

File /opt/conda/lib/python3.13/site-packages/matplotlib/axes/_base.py:3744, in _AxesBase._validate_converted_limits(self, limit, convert)
   3734 """
   3735 Raise ValueError if converted limits are non-finite.
   3736
   (...)
   3741 The limit value after call to convert(), or None if limit is None.
   3742 """
   3743 if limit is not None:
-> 3744     converted_limit = convert(limit)
   3745     if isinstance(converted_limit, np.ndarray):
   3746         converted_limit = converted_limit.squeeze()

File /opt/conda/lib/python3.13/site-packages/matplotlib/axis.py:1813, in Axis.convert_units(self, x)
   1811     ret = self._converter.convert(x, self.units, self)
   1812 except Exception as e:
-> 1813     raise munits.ConversionError('Failed to convert value(s) to axis '
   1814                                  f'units: {x!r}') from e
   1815 return ret

ConversionError: Failed to convert value(s) to axis units: np.datetime64('2024-01')
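The first TypeError in the traceback is raised inside Matplotlib's category converter (category.py), which suggests the x-axis was set up for text labels rather than dates. That fits the df.info() output, where the time column has dtype object. A minimal check, sketched here on three hypothetical values in the same format as the time column, shows one way to confirm that a column holds strings and not datetimes:

```python
import pandas as pd

# Hypothetical sample in the same format as the "time" column.
sample_time = pd.Series([
    "2024-01-02T00:00:00.000Z",
    "2024-01-02T00:10:00.000Z",
    "2024-01-02T00:20:00.000Z",
])

# Each entry is a plain Python string ...
first_is_string = isinstance(sample_time.iloc[0], str)

# ... and pandas does not regard the column as a datetime column.
column_is_datetime = pd.api.types.is_datetime64_any_dtype(sample_time)

print(first_is_string, column_is_datetime)
```

With strings on the x-axis, a limit given as np.datetime64 has nothing to be converted against, which would explain the ConversionError at the end of the traceback.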
I'm starting to miss MATLAB and its wonderful documentation. I think I need to format the date better, so that Matplotlib can parse it. The datetime strings in the file all end with the letter Z. I did some searching and found this:
The ‘Z’ at the end of an ISO 8601 date indicates that the time is in UTC (Coordinated Universal Time).
Source: "Parse ISO 8601 date ending in Z with Python"
Here's a simple example given on that page:
from datetime import datetime
iso_date_string = "2023-05-29T10:30:00Z"
parsed_date = datetime.fromisoformat(iso_date_string[:-1]) # Removing the 'Z' at the end
print(parsed_date)
2023-05-29 10:30:00
Let's try this on the first datetime entry in my data:
print(time[0])
2024-01-02T00:00:00.000Z
parsed_date_test = datetime.fromisoformat(time[0])
print(parsed_date_test)
2024-01-02 00:00:00+00:00
OK, let's convert the whole time list:
# Creating a list the same length as time, filled with 0
#parsed_time = [0] * len(time)
#parsed_time = datetime.fromisoformat(time[:])
I don't understand. fromisoformat parsed a single value just fine. Why not the whole list? Fine, let's try a for loop instead:
#print(parsed_time[0:10])
parsed_time = [0] * len(time)
for i in range(len(time)):
    parsed_time[i] = datetime.fromisoformat(time[i])
print(parsed_time)
IOPub data rate exceeded. The Jupyter server will temporarily stop sending output to the client in order to avoid crashing it. To change this limit, set the config variable `--ServerApp.iopub_data_rate_limit`. Current values: ServerApp.iopub_data_rate_limit=1000000.0 (bytes/sec) ServerApp.rate_limit_window=3.0 (secs)
I get the error IOPub data rate exceeded when I try to print the whole parsed_time list. Let's get a few samples instead:
print(parsed_time[0:10])
print(parsed_time[10000:10010])
print(parsed_time[40000:40010])
[datetime.datetime(2024, 1, 2, 0, 0, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 1, 2, 0, 10, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 1, 2, 0, 20, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 1, 2, 0, 30, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 1, 2, 0, 40, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 1, 2, 0, 50, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 1, 2, 1, 0, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 1, 2, 1, 10, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 1, 2, 1, 20, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 1, 2, 1, 30, tzinfo=datetime.timezone.utc)] [datetime.datetime(2024, 3, 11, 10, 40, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 3, 11, 10, 50, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 3, 11, 11, 0, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 3, 11, 11, 10, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 3, 11, 11, 20, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 3, 11, 11, 30, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 3, 11, 11, 40, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 3, 11, 11, 50, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 3, 11, 12, 0, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 3, 11, 12, 10, tzinfo=datetime.timezone.utc)] [datetime.datetime(2024, 10, 7, 1, 50, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 10, 7, 2, 0, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 10, 7, 2, 10, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 10, 7, 2, 20, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 10, 7, 2, 30, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 10, 7, 2, 40, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 10, 7, 2, 50, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 10, 7, 3, 0, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 10, 7, 3, 10, tzinfo=datetime.timezone.utc), 
datetime.datetime(2024, 10, 7, 3, 20, tzinfo=datetime.timezone.utc)]
OK, this seems to have worked. Let's see if I can plot this:
plt.plot(parsed_time[0:10],temperature[0:10])
[<matplotlib.lines.Line2D at 0xe5dc8032bc50>]
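Back to the earlier question of why fromisoformat parsed a single value but not the whole column: datetime.fromisoformat accepts one string at a time, so it has to be applied to each element. The for loop does exactly that. A more compact way to write it is a list comprehension, sketched below on three hypothetical timestamps in the same format as the time column. The helper name parse_utc is made up for this example, and rewriting the trailing Z as +00:00 is an assumption made so that the sketch also runs on Python versions older than 3.11.

```python
from datetime import datetime, timezone

# Hypothetical sample in the same format as the "time" column.
sample_time = [
    "2024-01-02T00:00:00.000Z",
    "2024-01-02T00:10:00.000Z",
    "2024-01-02T00:20:00.000Z",
]


def parse_utc(stamp):
    """Parse one ISO 8601 string; a trailing 'Z' is rewritten as '+00:00'."""
    if stamp.endswith("Z"):
        stamp = stamp[:-1] + "+00:00"
    return datetime.fromisoformat(stamp)


# Apply the single-value parser to every element of the list.
parsed_sample = [parse_utc(stamp) for stamp in sample_time]

print(parsed_sample[0])
print(parsed_sample[1] - parsed_sample[0])
```

This builds the same kind of list as the loop, without first creating a list of zeros.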
fig, axs = plt.subplots(12, 1, layout='constrained', figsize=(6, 30))
# January: lims = [2024-01-02T00:00:00.000Z,2024-01-31T23:50:00.000Z]
lims = [(np.datetime64('2024-01'), np.datetime64('2024-02')),
        (np.datetime64('2024-02'), np.datetime64('2024-03')),
        (np.datetime64('2024-03'), np.datetime64('2024-04')),
        (np.datetime64('2024-04'), np.datetime64('2024-05')),
        (np.datetime64('2024-05'), np.datetime64('2024-06')),
        (np.datetime64('2024-06'), np.datetime64('2024-07')),
        (np.datetime64('2024-07'), np.datetime64('2024-08')),
        (np.datetime64('2024-08'), np.datetime64('2024-09')),
        (np.datetime64('2024-09'), np.datetime64('2024-10')),
        (np.datetime64('2024-10'), np.datetime64('2024-11')),
        (np.datetime64('2024-11'), np.datetime64('2024-12')),
        (np.datetime64('2024-12'), np.datetime64('2025-01'))]
for nn, ax in enumerate(axs):
    locator = mdates.AutoDateLocator(minticks=3, maxticks=7)
    formatter = mdates.ConciseDateFormatter(locator)
    ax.xaxis.set_major_locator(locator)
    ax.xaxis.set_major_formatter(formatter)
    ax.plot(parsed_time, temperature)
    ax.set_xlim(lims[nn])
axs[0].set_title('Æðey sea temperature 2024')
plt.ylabel("Temperature [°C]")
plt.show()
Either the data is bad (unlikely) or I'm doing something strange with it.
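One possible explanation, judging from the print(df) output: the file holds more than one platform. The first rows belong to platformId AEdey (platformType MO, fixed longitude and latitude), while the last rows belong to platformId 6401759 (platformType DB, with a position that changes from row to row). Plotting every row draws both records on the same axes. The sketch below shows one way to separate them. It uses a small hypothetical DataFrame with four rows copied from the output above; whether the real file contains further platforms is not known from what is shown here.

```python
import pandas as pd

# Hypothetical four-row sample, copied from the first and last rows of print(df).
sample = pd.DataFrame({
    "platformId": ["AEdey", "AEdey", "6401759", "6401759"],
    "platformType": ["MO", "MO", "DB", "DB"],
    "time": [
        "2024-01-02T00:00:00.000Z",
        "2024-01-02T00:10:00.000Z",
        "2024-12-28T18:00:00.000Z",
        "2024-12-28T19:00:00.000Z",
    ],
    "value": [2.865, 2.876, 4.170, 4.350],
})

# How many rows does each platform contribute?
rows_per_platform = sample["platformId"].value_counts()
print(rows_per_platform)

# Keep only the fixed station, parse its timestamps and sort by time.
aedey = sample[sample["platformId"] == "AEdey"].copy()
aedey["time"] = pd.to_datetime(aedey["time"])
aedey = aedey.sort_values("time")
print(aedey)
```

On the full dataset the same filter would be df[df["platformId"] == "AEdey"], and the value_counts() line would show which platforms are actually in the file.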