Luis Diaz-Faes - Fab Futures - Data Science

Day 8: Presentation

As Pontes Lake and the Power Plant

In [2]:
import pandas as pd
from IPython.display import display, HTML


df = pd.read_csv('datasets/1363X-20190215-20200416.csv', delimiter=';')

display(HTML(f"<h4>Show first 10 columns</h4>"))
print(df.head(10)) ## the number limit the result to show

display(HTML(f"<h4>Table and column information:</h4>"))
df.info()

display(HTML(f"<h4>Show select data</h4>"))
df[["UTC", "Temp", "Prec", "Hum"]]

Show first 10 rows

      Id       Lon       Lat    Alt     Nombre                  UTC  Prec  \
0  1363X -7.861476  43.44597  343.0  AS PONTES  2019-02-15T21:00:00   0.0   
1  1363X -7.861476  43.44597  343.0  AS PONTES  2019-02-15T22:00:00   0.0   
2  1363X -7.861476  43.44597  343.0  AS PONTES  2019-02-15T23:00:00   0.0   
3  1363X -7.861476  43.44597  343.0  AS PONTES  2019-02-16T00:00:00   0.0   
4  1363X -7.861476  43.44597  343.0  AS PONTES  2019-02-16T01:00:00   0.0   
5  1363X -7.861476  43.44597  343.0  AS PONTES  2019-02-16T02:00:00   0.0   
6  1363X -7.861476  43.44597  343.0  AS PONTES  2019-02-16T03:00:00   0.0   
7  1363X -7.861476  43.44597  343.0  AS PONTES  2019-02-16T04:00:00   0.0   
8  1363X -7.861476  43.44597  343.0  AS PONTES  2019-02-16T05:00:00   0.0   
9  1363X -7.861476  43.44597  343.0  AS PONTES  2019-02-16T06:00:00   0.0   

    Hum  Temp  TempMin  TempMax  
0  81.0   7.3      7.3      8.9  
1  85.0   5.7      5.7      7.1  
2  89.0   4.3      4.3      5.4  
3  90.0   3.4      3.4      4.2  
4  93.0   2.6      2.6      3.2  
5  94.0   2.5      2.5      2.7  
6  94.0   1.7      1.7      2.3  
7  95.0   1.1      1.1      1.6  
8  96.0   0.7      0.7      1.1  
9  97.0   0.9      0.7      0.9  

Table and column information:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 52884 entries, 0 to 52883
Data columns (total 11 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   Id       52884 non-null  object 
 1   Lon      52884 non-null  float64
 2   Lat      52884 non-null  float64
 3   Alt      52884 non-null  float64
 4   Nombre   52884 non-null  object 
 5   UTC      52884 non-null  object 
 6   Prec     52556 non-null  float64
 7   Hum      52247 non-null  float64
 8   Temp     52248 non-null  float64
 9   TempMin  52248 non-null  float64
 10  TempMax  52248 non-null  float64
dtypes: float64(8), object(3)
memory usage: 4.4+ MB

Show selected columns

Out[2]:
                            UTC  Temp  Prec   Hum
0           2019-02-15T21:00:00   7.3   0.0  81.0
1           2019-02-15T22:00:00   5.7   0.0  85.0
2           2019-02-15T23:00:00   4.3   0.0  89.0
3           2019-02-16T00:00:00   3.4   0.0  90.0
4           2019-02-16T01:00:00   2.6   0.0  93.0
...                         ...   ...   ...   ...
52879  2025-06-23T01:00:00+0000  13.5   0.0  91.0
52880  2025-06-23T02:00:00+0000  12.4   0.0  92.0
52881  2025-06-23T03:00:00+0000  11.8   0.0  94.0
52882  2025-06-23T04:00:00+0000  11.5   0.0  95.0
52883  2025-06-23T05:00:00+0000  11.4   0.0  95.0

52884 rows × 4 columns
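
A quick check before the main analysis: the info() output above already hints that Prec, Hum and Temp carry a few hundred missing values each. A minimal sketch to quantify those gaps before the next cell drops them, assuming the same df as above:

missing_pct = df.isna().mean().mul(100).round(2)  # % of missing values per column
print(missing_pct.sort_values(ascending=False))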

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from matplotlib.gridspec import GridSpec
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
from sklearn.decomposition import PCA

# ----------------------------------------------------
# 1. LOAD AND PREPARE DATA
# ----------------------------------------------------
df = pd.read_csv("datasets/1363X-20190215-20200416.csv", sep=";")

df["UTC"] = pd.to_datetime(df["UTC"], errors="coerce", utc=True)
df = df.dropna(subset=["UTC", "Temp", "Hum", "Prec"]).copy()
df = df.sort_values("UTC").reset_index(drop=True)

df["YEAR"] = df["UTC"].dt.year
df["DATE"] = df["UTC"].dt.date
df["HOUR"] = df["UTC"].dt.hour

# Focus on 2019–2024
df = df[df["YEAR"].between(2019, 2024)].copy()

print("Years present:", sorted(df["YEAR"].unique()))
print("Total rows:", len(df))

# Fog-like condition
fog_mask = (
    (df["Hum"] >= 95.0) &
    (df["Temp"] >= -2.0) &
    (df["Temp"] <= 15.0)
)
df["FOG_FLAG"] = fog_mask
df["RAIN_FLAG"] = df["Prec"] > 0

# ----------------------------------------------------
# 2. YEARLY SUMMARY (for humidity and fog)
# ----------------------------------------------------
summary = (
    df.groupby("YEAR")
    .agg(
        temp_mean=("Temp", "mean"),
        hum_mean=("Hum", "mean"),
        hum_median=("Hum", "median"),
        hum_p95=("Hum", lambda x: np.percentile(x, 95)),
        fog_hours=("FOG_FLAG", "sum"),
        total_hours=("FOG_FLAG", "size"),
        rainy_hours=("RAIN_FLAG", "sum"),
        prec_mean=("Prec", "mean")
    )
    .reset_index()
)

summary["fog_pct"] = 100.0 * summary["fog_hours"] / summary["total_hours"]

years_arr = summary["YEAR"].values

# ----------------------------------------------------
# 3. FOG SUBSET AND TRANSFORMS (K-means, GMM, PCA)
# ----------------------------------------------------
df_fog = df[df["FOG_FLAG"]].copy()
print("Fog-like hourly records:", len(df_fog))

# Features for clustering / density: (Hum, Temp)
X_fog_raw = df_fog[["Hum", "Temp"]].values
scaler_fog = StandardScaler()
X_fog = scaler_fog.fit_transform(X_fog_raw)

# --- K-means on fog (Hum, Temp) ---
K_kmeans = 4
K_kmeans = min(K_kmeans, len(df_fog))  # safety
kmeans = KMeans(n_clusters=K_kmeans, random_state=0, n_init=10)
labels_km = kmeans.fit_predict(X_fog)

centers_scaled = kmeans.cluster_centers_
centers = scaler_fog.inverse_transform(centers_scaled)

# --- GMM density on fog (Hum, Temp) ---
max_components = min(5, len(df_fog))
bics = []
gmms = []
components_range = range(1, max_components + 1)

for m in components_range:
    gmm = GaussianMixture(
        n_components=m,
        covariance_type="full",
        random_state=0
    )
    gmm.fit(X_fog)
    gmms.append(gmm)
    bics.append(gmm.bic(X_fog))

best_idx = int(np.argmin(bics))
best_M = components_range[best_idx]
gmm_best = gmms[best_idx]
print("Best GMM components for fog (by BIC):", best_M)

# Grid for density
hum_min, hum_max = df_fog["Hum"].min(), df_fog["Hum"].max()
temp_min, temp_max = df_fog["Temp"].min(), df_fog["Temp"].max()

hum_vals = np.linspace(hum_min, hum_max, 80)
temp_vals = np.linspace(temp_min, temp_max, 80)
HH, TT = np.meshgrid(hum_vals, temp_vals)
grid_points = np.column_stack([HH.ravel(), TT.ravel()])
grid_scaled = scaler_fog.transform(grid_points)
logp = gmm_best.score_samples(grid_scaled)
P = np.exp(logp).reshape(HH.shape)

# --- PCA on fog (Hum, Temp, Prec) ---
X_pca_raw = df_fog[["Hum", "Temp", "Prec"]].values
scaler_pca = StandardScaler()
X_pca_scaled = scaler_pca.fit_transform(X_pca_raw)

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_pca_scaled)
expl_var = pca.explained_variance_ratio_

# ----------------------------------------------------
# 4. COMPRESSED-SENSING STYLE RECONSTRUCTION (TEMP)
# ----------------------------------------------------
df_2019 = df[df["YEAR"] == 2019].copy()
temp_series_2019 = df_2019["Temp"].values
N_max = 512
N = min(N_max, len(temp_series_2019))
signal = temp_series_2019[:N]
t_seg = np.arange(N)

# FFT and sparse reconstruction
F = np.fft.rfft(signal)
K_sparse = 20
idx_sorted = np.argsort(np.abs(F))[::-1]
F_sparse = np.zeros_like(F, dtype=complex)
F_sparse[idx_sorted[:K_sparse]] = F[idx_sorted[:K_sparse]]
signal_recon = np.fft.irfft(F_sparse, n=N)
error_signal = signal - signal_recon
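# Note: keeping the K largest-magnitude rFFT coefficients is a simple sparse
# Fourier approximation ("compressed-sensing style"); true compressed sensing
# would recover the signal from few random measurements via optimisation.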

# ----------------------------------------------------
# 5. BUILD 1920x1080 SUMMARY FIGURE
# ----------------------------------------------------
# 1920x1080 at 100 dpi -> 19.2 x 10.8 inches
fig = plt.figure(figsize=(19.2, 10.8), dpi=100)
gs = GridSpec(2, 3, figure=fig)

# --- Panel 1: yearly mean Temp & Hum ---
ax1 = fig.add_subplot(gs[0, 0])
ax1.plot(summary["YEAR"], summary["temp_mean"], marker="o", label="Mean Temp (°C)")
ax1.plot(summary["YEAR"], summary["hum_mean"], marker="s", label="Mean Hum (%)")
ax1.set_xlabel("Year")
ax1.set_ylabel("Value")
ax1.set_title("Yearly mean temperature and humidity\n(2019–2024)")
ax1.grid(True, alpha=0.3)
ax1.set_xticks(years_arr)
ax1.legend(fontsize=9)

# --- Panel 2: humidity + fog% with plant shutdown ---
ax2 = fig.add_subplot(gs[0, 1])
ax2.plot(summary["YEAR"], summary["hum_mean"], marker="o", linewidth=2, label="Mean Humidity")
ax2.fill_between(
    summary["YEAR"],
    summary["hum_median"],
    summary["hum_p95"],
    alpha=0.15,
    label="Median–95th percentile"
)
ax2.set_xlabel("Year")
ax2.set_ylabel("Humidity (%)")
ax2.set_title("Humidity and fog-like hours\nAs Pontes power plant shutdown (2023)")
ax2.grid(True, alpha=0.3)
ax2.set_xticks(years_arr)

ax2.axvline(x=2023, color="gray", linestyle="--", alpha=0.7)

ax2b = ax2.twinx()
ax2b.plot(summary["YEAR"], summary["fog_pct"], marker="s", linewidth=2, color="tab:red",
          label="Fog-like hours (%)")
ax2b.set_ylabel("Fog-like hours (%)", color="tab:red")
ax2b.tick_params(axis="y", labelcolor="tab:red")

lines1, labels1 = ax2.get_legend_handles_labels()
lines2, labels2 = ax2b.get_legend_handles_labels()
ax2.legend(lines1 + lines2, labels1 + labels2, loc="upper left", fontsize=8)

# --- Panel 3: K-means clustering of fog (Hum, Temp) ---
ax3 = fig.add_subplot(gs[0, 2])
sc3 = ax3.scatter(
    df_fog["Hum"],
    df_fog["Temp"],
    c=labels_km,
    s=5,
    alpha=0.5
)
ax3.scatter(
    centers[:, 0],
    centers[:, 1],
    marker="X",
    s=80,
    edgecolor="white",
    linewidth=1.0,
    c=range(K_kmeans)
)
ax3.set_xlabel("Humidity (Hum, %)")
ax3.set_ylabel("Temperature (Temp, °C)")
ax3.set_title("Fog regimes: K-means clustering\nin (Hum, Temp) space")
ax3.grid(True, alpha=0.3)

# --- Panel 4: GMM density for fog (Hum, Temp) ---
ax4 = fig.add_subplot(gs[1, 0])
cont = ax4.contourf(HH, TT, P, levels=15)
fig.colorbar(cont, ax=ax4, fraction=0.046, pad=0.04)
ax4.scatter(
    df_fog["Hum"],
    df_fog["Temp"],
    s=3,
    alpha=0.25,
    edgecolors="none"
)
ax4.set_xlabel("Humidity (Hum, %)")
ax4.set_ylabel("Temperature (Temp, °C)")
ax4.set_title(f"GMM density estimation for fog\nM = {best_M} components")
ax4.grid(True, alpha=0.3)

# --- Panel 5: PCA of fog (Hum, Temp, Prec) ---
ax5 = fig.add_subplot(gs[1, 1])
ax5.scatter(
    X_pca[:, 0],
    X_pca[:, 1],
    s=5,
    alpha=0.6
)
ax5.set_xlabel(f"PC1 ({expl_var[0]*100:.1f}% var)")
ax5.set_ylabel(f"PC2 ({expl_var[1]*100:.1f}% var)")
ax5.set_title("PCA of fog conditions\n(Hum, Temp, Prec)")
ax5.grid(True, alpha=0.3)

# --- Panel 6: compressed-sensing style reconstruction (Temp) ---
ax6 = fig.add_subplot(gs[1, 2])
ax6.plot(t_seg, signal, label="Original Temp (2019 segment)", linewidth=1.0)
ax6.plot(t_seg, signal_recon, label=f"Reconstructed (K={K_sparse} Fourier coeffs)",
         linewidth=1.0)
ax6.set_xlabel("Sample index")
ax6.set_ylabel("Temp (°C)")
ax6.set_title("Sparse Fourier reconstruction of temperature")
ax6.grid(True, alpha=0.3)
ax6.legend(fontsize=8)

# Overall title (optional)
fig.suptitle("Data Science for Fab Futures – As Pontes climate analysis (2019–2024)",
             fontsize=16, y=0.98)

plt.tight_layout(rect=[0, 0, 1, 0.96])
plt.show()
Years present: [2019, 2020, 2021, 2022, 2023, 2024]
Total rows: 43343
Fog-like hourly records: 11988
Best GMM components for fog (by BIC): 5
[Summary figure: 2 × 3 grid of panels, described point by point below]

The summary figure brings together the main ideas explored during the course using my AEMET data for As Pontes (2019–2024):

  • Top left: yearly mean temperature and humidity, showing the overall climate context of the period.
  • Top centre: mean humidity and percentage of fog-like hours per year, with a vertical line marking 2023, when the As Pontes thermal power plant stopped operating and its cooling towers no longer contributed extra moisture to the atmosphere.
  • Top right: a K-means clustering of fog records in the (Hum, Temp) plane, revealing distinct fog regimes (colder vs slightly warmer, different humidity bands).
  • Bottom left: a GMM density estimation over the same (Hum, Temp) space, showing where fog conditions are most likely to occur.
  • Bottom centre: a PCA projection of fog conditions using (Hum, Temp, Prec), compressing three correlated variables into two principal components and highlighting structure within fog episodes.
  • Bottom right: a compressed-sensing style reconstruction of temperature, where a short segment of the 2019 series is rebuilt from only a small number of Fourier coefficients, illustrating sparsity and transform-based representations (a small numeric sketch of this trade-off follows the list).

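To make the sparsity trade-off of the last panel concrete, the sketch below rebuilds a toy hourly temperature signal from progressively more Fourier coefficients and prints the reconstruction error at each step. The synthetic signal is an assumption standing in for the real 2019 segment; the mechanics are identical to the cell above:

import numpy as np

rng = np.random.default_rng(0)
t = np.arange(512)
# Toy stand-in for an hourly temperature segment: daily cycle plus noise
signal = 10 + 5 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 0.8, t.size)

F = np.fft.rfft(signal)
for K in (5, 10, 20, 40):
    keep = np.argsort(np.abs(F))[::-1][:K]   # K largest-magnitude coefficients
    F_sparse = np.zeros_like(F)
    F_sparse[keep] = F[keep]
    recon = np.fft.irfft(F_sparse, n=t.size)
    rmse = np.sqrt(np.mean((signal - recon) ** 2))
    print(f"K={K:3d}  RMSE={rmse:.3f} °C")

The error drops quickly once the dominant daily-cycle coefficients are kept, which is why a small K already tracks the temperature series in the panel.
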
Together, these panels summarise the path of the project: from basic statistics and visualisation, through clustering and probabilistic modelling of fog, to advanced transforms (PCA, density estimation and sparse reconstructions) used to explore how local climate patterns may have changed around the shutdown of the As Pontes power plant.
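
For anyone reproducing the PCA panel, the fitted model also exposes the loadings, i.e. how much Hum, Temp and Prec contribute to each principal component. A minimal sketch, assuming the pca object fitted in the cell above:

import pandas as pd

# Rows are principal components, columns the original (scaled) features
loadings = pd.DataFrame(
    pca.components_,
    columns=["Hum", "Temp", "Prec"],
    index=["PC1", "PC2"],
)
print(loadings.round(3))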

See you in the Fab Labs of the Future.
