Svavar Konráðsson - Fab Futures - Data Science

Class 2: Tools

I found a log of ocean temperature in the Westfjords of Iceland in the Copernicus Marine Data Store. The assignment for this class is to visualize the data.

2024 ocean temperature at Æðey weather station

This is a CSV file with ten columns, but at first Pandas only recognized one. I looked at the file in a text editor and made a copy of it with the header removed. The header starts with a # sign and contains no useful information as far as I can see.
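An alternative to editing the file by hand, assuming the unwanted header lines all begin with `#` and the file is comma-separated, is the `comment` parameter of `pd.read_csv`. This is only a sketch on a made-up stand-in for the file: the two `#` lines are invented, while the column names and data rows are copied from this dataset.

```python
import io
import pandas as pd

# Made-up stand-in for the downloaded file: two invented '#' header lines,
# then the real column names and two data rows copied from this dataset.
sample = """\
# hypothetical header line 1
# hypothetical header line 2
parameter,platformId,platformType,time,longitude,latitude,depth,pressure,value,valueQc
TEMP,AEdey,MO,2024-01-02T00:00:00.000Z,-22.6667,66.1,0.0,,2.865,1
TEMP,AEdey,MO,2024-01-02T00:10:00.000Z,-22.6667,66.1,0.0,,2.876,1
"""

# comment="#" tells read_csv to ignore lines that begin with '#'
df_sample = pd.read_csv(io.StringIO(sample), comment="#")
print(df_sample.shape)
```

The rest of this page uses the hand-edited copy of the file.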

In [9]:
import pandas as pd

df = pd.read_csv('datasets/ocean-temperature-westfjords-2024/cmems_obs-ins_nws_phybgcwav_mynrt_na_irr_1763979809373_noheader.csv')

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 45965 entries, 0 to 45964
Data columns (total 10 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   parameter     45965 non-null  object 
 1   platformId    45965 non-null  object 
 2   platformType  45965 non-null  object 
 3   time          45965 non-null  object 
 4   longitude     45965 non-null  float64
 5   latitude      45965 non-null  float64
 6   depth         45965 non-null  float64
 7   pressure      0 non-null      float64
 8   value         45965 non-null  float64
 9   valueQc       45965 non-null  int64  
dtypes: float64(5), int64(1), object(4)
memory usage: 3.5+ MB

Now I've successfully imported the data and I can start working with it. Let's see what it looks like:

In [10]:
print(df)
      parameter platformId platformType                      time  longitude  \
0          TEMP      AEdey           MO  2024-01-02T00:00:00.000Z   -22.6667   
1          TEMP      AEdey           MO  2024-01-02T00:10:00.000Z   -22.6667   
2          TEMP      AEdey           MO  2024-01-02T00:20:00.000Z   -22.6667   
3          TEMP      AEdey           MO  2024-01-02T00:30:00.000Z   -22.6667   
4          TEMP      AEdey           MO  2024-01-02T00:40:00.000Z   -22.6667   
...         ...        ...          ...                       ...        ...   
45960      TEMP    6401759           DB  2024-12-28T15:00:00.000Z   -24.5645   
45961      TEMP    6401759           DB  2024-12-28T16:00:00.000Z   -24.5969   
45962      TEMP    6401759           DB  2024-12-28T17:00:00.000Z   -24.6290   
45963      TEMP    6401759           DB  2024-12-28T18:00:00.000Z   -24.6527   
45964      TEMP    6401759           DB  2024-12-28T19:00:00.000Z   -24.6749   

       latitude  depth  pressure  value  valueQc  
0       66.1000    0.0       NaN  2.865        1  
1       66.1000    0.0       NaN  2.876        1  
2       66.1000    0.0       NaN  2.856        1  
3       66.1000    0.0       NaN  2.867        1  
4       66.1000    0.0       NaN  2.876        1  
...         ...    ...       ...    ...      ...  
45960   65.5043    0.5       NaN  3.710        1  
45961   65.4985    0.5       NaN  3.690        1  
45962   65.4954    0.5       NaN  3.740        1  
45963   65.5000    0.5       NaN  4.170        1  
45964   65.5111    0.5       NaN  4.350        1  

[45965 rows x 10 columns]
In [11]:
display(df.head(11))
    parameter  platformId  platformType                      time  longitude  latitude  depth  pressure  value  valueQc
0        TEMP       AEdey            MO  2024-01-02T00:00:00.000Z   -22.6667      66.1    0.0       NaN  2.865        1
1        TEMP       AEdey            MO  2024-01-02T00:10:00.000Z   -22.6667      66.1    0.0       NaN  2.876        1
2        TEMP       AEdey            MO  2024-01-02T00:20:00.000Z   -22.6667      66.1    0.0       NaN  2.856        1
3        TEMP       AEdey            MO  2024-01-02T00:30:00.000Z   -22.6667      66.1    0.0       NaN  2.867        1
4        TEMP       AEdey            MO  2024-01-02T00:40:00.000Z   -22.6667      66.1    0.0       NaN  2.876        1
5        TEMP       AEdey            MO  2024-01-02T00:50:00.000Z   -22.6667      66.1    0.0       NaN  2.898        1
6        TEMP       AEdey            MO  2024-01-02T01:00:00.000Z   -22.6667      66.1    0.0       NaN  2.927        1
7        TEMP       AEdey            MO  2024-01-02T01:10:00.000Z   -22.6667      66.1    0.0       NaN  2.955        1
8        TEMP       AEdey            MO  2024-01-02T01:20:00.000Z   -22.6667      66.1    0.0       NaN  2.969        1
9        TEMP       AEdey            MO  2024-01-02T01:30:00.000Z   -22.6667      66.1    0.0       NaN  2.971        1
10       TEMP       AEdey            MO  2024-01-02T01:40:00.000Z   -22.6667      66.1    0.0       NaN  2.974        1
In [12]:
time = df["time"]
temperature = df["value"]
display(time)
0        2024-01-02T00:00:00.000Z
1        2024-01-02T00:10:00.000Z
2        2024-01-02T00:20:00.000Z
3        2024-01-02T00:30:00.000Z
4        2024-01-02T00:40:00.000Z
                   ...           
45960    2024-12-28T15:00:00.000Z
45961    2024-12-28T16:00:00.000Z
45962    2024-12-28T17:00:00.000Z
45963    2024-12-28T18:00:00.000Z
45964    2024-12-28T19:00:00.000Z
Name: time, Length: 45965, dtype: object
In [13]:
display(temperature)
0        2.865
1        2.876
2        2.856
3        2.867
4        2.876
         ...  
45960    3.710
45961    3.690
45962    3.740
45963    4.170
45964    4.350
Name: value, Length: 45965, dtype: float64

I used the W3Schools tutorial to learn how to use Matplotlib. I'll plot the first thousand of the roughly 46,000 values in this dataset:

In [15]:
import matplotlib.pyplot as plt
plt.plot(time[0:1000],temperature[0:1000])
plt.xlabel("Date")
plt.ylabel("Temperature [°C]")
plt.show()
[Figure: temperature in °C against date for the first 1000 samples]

Now I need to figure out how to display the date properly. I'll try to format date ticks using ConciseDateFormatter:

In [16]:
import matplotlib.dates as mdates
import numpy as np
fig, axs = plt.subplots(3, 1, layout='constrained', figsize=(6, 6))
# January: lims = [2024-01-02T00:00:00.000Z,2024-01-31T23:50:00.000Z]
lims = [(np.datetime64('2024-01'), np.datetime64('2024-02'))]
for nn, ax in enumerate(axs):
    locator = mdates.AutoDateLocator(minticks=3, maxticks=7)
    formatter = mdates.ConciseDateFormatter(locator)
    ax.xaxis.set_major_locator(locator)
    ax.xaxis.set_major_formatter(formatter)

    ax.plot(time, temperature)
    ax.set_xlim(lims[nn])
axs[0].set_title('Concise Date Formatter')

plt.show()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
File /opt/conda/lib/python3.13/site-packages/matplotlib/axis.py:1811, in Axis.convert_units(self, x)
   1810 try:
-> 1811     ret = self._converter.convert(x, self.units, self)
   1812 except Exception as e:

File /opt/conda/lib/python3.13/site-packages/matplotlib/category.py:57, in StrCategoryConverter.convert(value, unit, axis)
     56 # force an update so it also does type checking
---> 57 unit.update(values)
     58 s = np.vectorize(unit._mapping.__getitem__, otypes=[float])(values)

File /opt/conda/lib/python3.13/site-packages/matplotlib/category.py:217, in UnitData.update(self, data)
    215 for val in OrderedDict.fromkeys(data):
    216     # OrderedDict just iterates over unique values in data.
--> 217     _api.check_isinstance((str, bytes), value=val)
    218     if convertible:
    219         # this will only be called so long as convertible is True.

File /opt/conda/lib/python3.13/site-packages/matplotlib/_api/__init__.py:92, in check_isinstance(types, **kwargs)
     91     names.append("None")
---> 92 raise TypeError(
     93     "{!r} must be an instance of {}, not a {}".format(
     94         k,
     95         ", ".join(names[:-1]) + " or " + names[-1]
     96         if len(names) > 1 else names[0],
     97         type_name(type(v))))

TypeError: 'value' must be an instance of str or bytes, not a numpy.datetime64

The above exception was the direct cause of the following exception:

ConversionError                           Traceback (most recent call last)
Cell In[16], line 13
     10     ax.xaxis.set_major_formatter(formatter)
     12     ax.plot(time, temperature)
---> 13     ax.set_xlim(lims[nn])
     14 axs[0].set_title('Concise Date Formatter')
     16 plt.show()

File /opt/conda/lib/python3.13/site-packages/matplotlib/axes/_base.py:3828, in _AxesBase.set_xlim(self, left, right, emit, auto, xmin, xmax)
   3826         raise TypeError("Cannot pass both 'right' and 'xmax'")
   3827     right = xmax
-> 3828 return self.xaxis._set_lim(left, right, emit=emit, auto=auto)

File /opt/conda/lib/python3.13/site-packages/matplotlib/axis.py:1216, in Axis._set_lim(self, v0, v1, emit, auto)
   1213 name = self._get_axis_name()
   1215 self.axes._process_unit_info([(name, (v0, v1))], convert=False)
-> 1216 v0 = self.axes._validate_converted_limits(v0, self.convert_units)
   1217 v1 = self.axes._validate_converted_limits(v1, self.convert_units)
   1219 if v0 is None or v1 is None:
   1220     # Axes init calls set_xlim(0, 1) before get_xlim() can be called,
   1221     # so only grab the limits if we really need them.

File /opt/conda/lib/python3.13/site-packages/matplotlib/axes/_base.py:3744, in _AxesBase._validate_converted_limits(self, limit, convert)
   3734 """
   3735 Raise ValueError if converted limits are non-finite.
   3736 
   (...)   3741 The limit value after call to convert(), or None if limit is None.
   3742 """
   3743 if limit is not None:
-> 3744     converted_limit = convert(limit)
   3745     if isinstance(converted_limit, np.ndarray):
   3746         converted_limit = converted_limit.squeeze()

File /opt/conda/lib/python3.13/site-packages/matplotlib/axis.py:1813, in Axis.convert_units(self, x)
   1811     ret = self._converter.convert(x, self.units, self)
   1812 except Exception as e:
-> 1813     raise munits.ConversionError('Failed to convert value(s) to axis '
   1814                                  f'units: {x!r}') from e
   1815 return ret

ConversionError: Failed to convert value(s) to axis units: np.datetime64('2024-01')

I'm starting to miss MATLAB and its wonderful documentation. I think I need to format the date better, so that Matplotlib can parse it. The datetime strings in the file all end with the letter Z. I did some searching and found this:

The ‘Z’ at the end of an ISO 8601 date indicates that the time is in UTC (Coordinated Universal Time).

(Quoted from the page "Parse ISO 8601 date ending in Z with Python".) Here's a simple example that is given on that page:

In [17]:
from datetime import datetime

iso_date_string = "2023-05-29T10:30:00Z"
parsed_date = datetime.fromisoformat(iso_date_string[:-1])  # Removing the 'Z' at the end

print(parsed_date)
2023-05-29 10:30:00

Let's try this on the first datetime entry in my data:

In [18]:
print(time[0])
2024-01-02T00:00:00.000Z
In [19]:
parsed_date_test = datetime.fromisoformat(time[0])
print(parsed_date_test)
2024-01-02 00:00:00+00:00
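The two results above differ in one detail: the example from the web page drops the Z and prints no offset, while my string kept the Z and prints `+00:00`. As far as I know, `fromisoformat` only accepts the trailing Z directly on Python 3.11 or newer. A minimal sketch of the difference, using the first timestamp from the data and replacing the Z with an explicit offset as one way that should also work on older versions:

```python
from datetime import datetime, timezone

stamp = "2024-01-02T00:00:00.000Z"  # first timestamp in the dataset

# Dropping the 'Z' gives a naive datetime (no time zone attached)
naive = datetime.fromisoformat(stamp[:-1])

# Replacing 'Z' with an explicit UTC offset gives a timezone-aware datetime
aware = datetime.fromisoformat(stamp.replace("Z", "+00:00"))

print(naive.tzinfo)
print(aware.tzinfo)
```

Mixing naive and aware datetimes in comparisons raises an error in Python, so it is probably safest to pick one form and use it for the whole column.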

OK, let's convert the whole time list:

In [ ]:
# Creating a list the same length as time, filled with 0
#parsed_time = [0] * len(time)
#parsed_time = datetime.fromisoformat(time[:])

I don't understand. fromisoformat parsed a single value just fine. Why not the whole list? Fine, let's try a for loop instead:

In [ ]:
#print(parsed_time[0:10])
In [20]:
parsed_time = [0] * len(time)
for i in range(len(time)):
    parsed_time[i] = datetime.fromisoformat(time[i])
print(parsed_time)
IOPub data rate exceeded.
The Jupyter server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--ServerApp.iopub_data_rate_limit`.

Current values:
ServerApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
ServerApp.rate_limit_window=3.0 (secs)

I get the error IOPub data rate exceeded when I try to print the whole parsed_time list. Let's get a few samples instead:

In [21]:
print(parsed_time[0:10])
print(parsed_time[10000:10010])
print(parsed_time[40000:40010])
[datetime.datetime(2024, 1, 2, 0, 0, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 1, 2, 0, 10, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 1, 2, 0, 20, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 1, 2, 0, 30, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 1, 2, 0, 40, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 1, 2, 0, 50, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 1, 2, 1, 0, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 1, 2, 1, 10, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 1, 2, 1, 20, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 1, 2, 1, 30, tzinfo=datetime.timezone.utc)]
[datetime.datetime(2024, 3, 11, 10, 40, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 3, 11, 10, 50, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 3, 11, 11, 0, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 3, 11, 11, 10, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 3, 11, 11, 20, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 3, 11, 11, 30, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 3, 11, 11, 40, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 3, 11, 11, 50, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 3, 11, 12, 0, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 3, 11, 12, 10, tzinfo=datetime.timezone.utc)]
[datetime.datetime(2024, 10, 7, 1, 50, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 10, 7, 2, 0, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 10, 7, 2, 10, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 10, 7, 2, 20, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 10, 7, 2, 30, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 10, 7, 2, 40, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 10, 7, 2, 50, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 10, 7, 3, 0, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 10, 7, 3, 10, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 10, 7, 3, 20, tzinfo=datetime.timezone.utc)]
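A likely answer to the earlier question of why `fromisoformat` would not take the whole column: it expects a single string per call, and `time[:]` is a Pandas Series, not a string. Pandas has its own converter, `pd.to_datetime`, that works on a whole Series at once. A minimal sketch on three timestamps copied from the `time` column (the names `time_sample` and `parsed_sample` are just for this illustration):

```python
import pandas as pd

# Three timestamps copied from the 'time' column
time_sample = pd.Series([
    "2024-01-02T00:00:00.000Z",
    "2024-01-02T00:10:00.000Z",
    "2024-01-02T00:20:00.000Z",
])

# fromisoformat takes one str; to_datetime converts the whole Series at once.
# utc=True keeps the result timezone-aware in UTC.
parsed_sample = pd.to_datetime(time_sample, utc=True)
print(parsed_sample.dtype)
```

The result is a Series with a datetime dtype, which Matplotlib should be able to plot on a date axis without the loop.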

OK, this seems to have worked. Let's see if I can plot this:

In [22]:
plt.plot(parsed_time[0:10],temperature[0:10])
Out[22]:
[<matplotlib.lines.Line2D at 0xe5dc8032bc50>]
[Figure: the first ten temperature values against parsed time]
In [23]:
fig, axs = plt.subplots(12, 1, layout='constrained', figsize=(6, 30))
# January: lims = [2024-01-02T00:00:00.000Z,2024-01-31T23:50:00.000Z]
lims = [(np.datetime64('2024-01'), np.datetime64('2024-02')),
        (np.datetime64('2024-02'), np.datetime64('2024-03')),
        (np.datetime64('2024-03'), np.datetime64('2024-04')),
        (np.datetime64('2024-04'), np.datetime64('2024-05')),
        (np.datetime64('2024-05'), np.datetime64('2024-06')),
        (np.datetime64('2024-06'), np.datetime64('2024-07')),
        (np.datetime64('2024-07'), np.datetime64('2024-08')),
        (np.datetime64('2024-08'), np.datetime64('2024-09')),
        (np.datetime64('2024-09'), np.datetime64('2024-10')),
        (np.datetime64('2024-10'), np.datetime64('2024-11')),
        (np.datetime64('2024-11'), np.datetime64('2024-12')),
        (np.datetime64('2024-12'), np.datetime64('2025-01'))]
for nn, ax in enumerate(axs):
    locator = mdates.AutoDateLocator(minticks=3, maxticks=7)
    formatter = mdates.ConciseDateFormatter(locator)
    ax.xaxis.set_major_locator(locator)
    ax.xaxis.set_major_formatter(formatter)

    ax.plot(parsed_time, temperature)
    ax.set_xlim(lims[nn])
axs[0].set_title('Æðey sea temperature 2024')
plt.ylabel("Temperature [°C]")
plt.show()
[Figure: twelve stacked monthly panels titled "Æðey sea temperature 2024"]

Either the data is bad (unlikely) or I'm doing something strange with it.
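One possible cause, judging only from the `print(df)` output earlier on this page: the first rows have platformId `AEdey` and platformType `MO` at a fixed position, while the last rows have platformId `6401759` and platformType `DB` with longitude and latitude that change from row to row. So the file seems to contain more than one platform, and plotting every row as one line would mix them. A sketch of checking and filtering this, on four rows copied from that output (whether this explains the plot is an assumption, not something verified here):

```python
import pandas as pd

# A few rows copied from the print(df) output (first and last rows)
rows = pd.DataFrame({
    "platformId": ["AEdey", "AEdey", "6401759", "6401759"],
    "platformType": ["MO", "MO", "DB", "DB"],
    "time": ["2024-01-02T00:00:00.000Z", "2024-01-02T00:10:00.000Z",
             "2024-12-28T18:00:00.000Z", "2024-12-28T19:00:00.000Z"],
    "value": [2.865, 2.876, 4.170, 4.350],
})

# How many rows does each platform contribute?
print(rows["platformId"].value_counts())

# Keep only the Æðey station before plotting
aedey = rows[rows["platformId"] == "AEdey"]
print(aedey)
```

On the full DataFrame the same two steps would be `df["platformId"].value_counts()` and `df[df["platformId"] == "AEdey"]`.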