Day 7: Transform¶
Assignment 9/12/2025¶
- Analyze your data set
- prepare a notebook with the analysis of your data set, store it in your repo, and call it presentation.ipynb
- include a 1920x1080 summary slide describing you, your data, and your analysis, store it in your repo's images folder, and call it presentation.png
PCA of fog conditions (Hum, Temp, Prec)¶
I applied Principal Component Analysis (PCA) to the fog-only records using the features (Hum, Temp, Prec). After standardising the variables, I projected the data onto the first two principal components. The PCA plot shows how the fog situations are distributed in a low-dimensional space that captures most of the variance. When coloured by the K-means clusters, the figure also reveals how the different fog regimes are separated along the main directions of variability in humidity, temperature and precipitation.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
# ----------------------------------------------------
# 0. LOAD AND PREPARE DATA (FOG CONDITIONS)
# ----------------------------------------------------
df = pd.read_csv("datasets/1363X-20190215-20200416.csv", sep=";")
df["UTC"] = pd.to_datetime(df["UTC"], errors="coerce", utc=True)
df = df.dropna(subset=["UTC", "Hum", "Temp", "Prec"]).copy()
df["YEAR"] = df["UTC"].dt.year
# 2019–2024
df = df[df["YEAR"].between(2019, 2024)].copy()
# Fog-like condition
fog_mask = (
(df["Hum"] >= 95.0) &
(df["Temp"] >= -2.0) &
(df["Temp"] <= 15.0)
)
df_fog = df[fog_mask].copy()
print("Fog-like hourly records for PCA:", len(df_fog))
# ----------------------------------------------------
# 1. FEATURE MATRIX FOR PCA: X = [Hum, Temp, Prec]
# ----------------------------------------------------
X_raw = df_fog[["Hum", "Temp", "Prec"]].values
# Standardize before PCA
scaler_pca = StandardScaler()
X_scaled = scaler_pca.fit_transform(X_raw)
# ----------------------------------------------------
# 2. PCA: REDUCE TO 2 COMPONENTS
# ----------------------------------------------------
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
expl_var = pca.explained_variance_ratio_
print("Explained variance ratio (PC1, PC2):", expl_var)
# ----------------------------------------------------
# 3. TRY TO USE K-MEANS LABELS IF ALREADY DEFINED
# ----------------------------------------------------
try:
    # If labels_km already exists from previous K-means on fog
    color_labels = labels_km[:len(df_fog)]
    use_clusters = True
except NameError:
    # Otherwise, use a single color
    color_labels = "tab:blue"
    use_clusters = False
# ----------------------------------------------------
# 4. PLOT: PCA SCATTER
# ----------------------------------------------------
plt.figure(figsize=(7, 5))
plt.scatter(
X_pca[:, 0],
X_pca[:, 1],
c=color_labels,
s=8,
alpha=0.6
)
plt.xlabel(f"PC1 ({expl_var[0]*100:.1f}% var)")
plt.ylabel(f"PC2 ({expl_var[1]*100:.1f}% var)")
title_extra = " (coloured by K-means cluster)" if use_clusters else ""
plt.title("PCA of fog conditions (Hum, Temp, Prec)" + title_extra)
plt.grid(True)
plt.tight_layout()
plt.show()
Fog-like hourly records for PCA: 11988
Explained variance ratio (PC1, PC2): [0.42381013 0.31488568]
Principal Components Analysis (PCA)¶
Principal Components Analysis (PCA) is a linear transform that finds a new set of axes (the principal components) such that:
- The components are uncorrelated with each other.
- The first components maximize the explained variance of the data.
In practice, PCA is used to:
- Reduce dimensionality, keeping only the most informative directions.
- Discover features or patterns in high-dimensional data.
- Make low-dimensional plots (2D or 3D) of high-dimensional datasets, so that structure and clusters are easier to visualise.
Mathematically, PCA is based on diagonalising the covariance matrix of the data. The principal components correspond to the eigenvectors of this covariance matrix, and the eigenvalues indicate how much variance is explained by each component.
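As a quick, self-contained illustration of this equivalence (a minimal sketch on synthetic data, independent of the weather dataset used elsewhere in this notebook), one can check that the principal axes found by scikit-learn's `PCA` match the eigenvectors of the empirical covariance matrix, and that the explained-variance ratios match the normalised eigenvalues:

```python
# Minimal sketch (synthetic data): PCA components vs. eigenvectors of the
# covariance matrix. Purely illustrative, not part of the weather analysis.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Correlated 3D data: independent noise passed through a fixed mixing matrix
X = rng.normal(size=(500, 3)) @ np.array([[2.0, 0.0, 0.0],
                                          [0.5, 1.0, 0.0],
                                          [0.0, 0.3, 0.2]])
Xc = X - X.mean(axis=0)                      # centre the data

# Eigendecomposition of the covariance matrix (sorted by decreasing eigenvalue)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# PCA on the same (centred) data
pca = PCA(n_components=3).fit(Xc)

# Explained-variance ratios should equal the normalised eigenvalues
print("Eigenvalue ratios:            ", eigvals / eigvals.sum())
print("PCA explained_variance_ratio_:", pca.explained_variance_ratio_)

# Principal axes should match the eigenvectors up to a sign flip
print("Axes match eigenvectors (up to sign):",
      np.allclose(np.abs(pca.components_), np.abs(eigvecs.T), atol=1e-6))
```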
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
# ----------------------------------------------------
# 1. LOAD DATA AND BASIC PREPROCESSING
# ----------------------------------------------------
df = pd.read_csv("datasets/1363X-20190215-20200416.csv", sep=";")
# Parse UTC datetime (with timezone)
df["UTC"] = pd.to_datetime(df["UTC"], errors="coerce", utc=True)
# Keep only rows with valid main variables
df = df.dropna(subset=["UTC", "Temp", "Hum", "Prec"]).copy()
# Time fields
df["YEAR"] = df["UTC"].dt.year
# Restrict to 2019–2024
df = df[df["YEAR"].between(2019, 2024)].copy()
df = df.sort_values("UTC").reset_index(drop=True)
print("Rows used:", len(df))
print("Years present:", df["YEAR"].unique())
# ----------------------------------------------------
# 2. BUILD FEATURE MATRIX: X = [Temp, Hum, Prec]
# ----------------------------------------------------
X_raw = df[["Temp", "Hum", "Prec"]].values
# Standardize features (zero mean, unit variance)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_raw)
# ----------------------------------------------------
# 3. PCA: REDUCE TO 2 PRINCIPAL COMPONENTS
# ----------------------------------------------------
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
expl_var = pca.explained_variance_ratio_
print(f"Explained variance ratio: PC1={expl_var[0]:.3f}, PC2={expl_var[1]:.3f}")
# ----------------------------------------------------
# 4. PLOT: PCA SCATTER (PC1 vs PC2), COLOURED BY YEAR
# ----------------------------------------------------
plt.figure(figsize=(8, 6))
scatter = plt.scatter(
X_pca[:, 0],
X_pca[:, 1],
c=df["YEAR"],
s=8,
alpha=0.6,
cmap="viridis"
)
cbar = plt.colorbar(scatter)
cbar.set_label("Year")
plt.xlabel(f"PC1 ({expl_var[0]*100:.1f}% variance)")
plt.ylabel(f"PC2 ({expl_var[1]*100:.1f}% variance)")
plt.title("PCA of weather data (Temp, Hum, Prec)\n2019–2024, coloured by year")
plt.grid(True, alpha=0.4)
plt.tight_layout()
plt.show()
Rows used: 43343
Years present: [2019 2020 2021 2022 2023 2024]
Explained variance ratio: PC1=0.565, PC2=0.321
from sklearn.datasets import fetch_openml
import numpy as np
mnist = fetch_openml('mnist_784', version=1, as_frame=False)
X = mnist.data # shape (N, 784)
y = mnist.target # labels 0–9
print("MNIST X shape:", X.shape) # e.g. (70000, 784)
# If you want to see one image as 28x28 again:
img0 = X[0].reshape(28, 28)
MNIST X shape: (70000, 784)
X_raw = df[["Temp", "Hum", "Prec"]].values
print(X_raw.shape) # (N_hours, 3)
(43343, 3)
from sklearn.datasets import fetch_openml
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
# 1. Load MNIST
mnist = fetch_openml('mnist_784', version=1, as_frame=False)
X = mnist.data # shape (N, 784)
y = mnist.target.astype(int)
print("MNIST X shape:", X.shape) # e.g. (70000, 784)
# (Optionally subsample to go faster)
N = 5000
X_sub = X[:N]
y_sub = y[:N]
# 2. PCA to 2 components
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_sub)
print("Explained variance ratio:", pca.explained_variance_ratio_)
# 3. Plot PC1 vs PC2 coloured by digit
plt.figure(figsize=(7, 6))
scatter = plt.scatter(
X_pca[:, 0],
X_pca[:, 1],
c=y_sub,
s=5,
alpha=0.6,
cmap="tab10"
)
plt.colorbar(scatter, label="Digit label")
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.title("PCA of MNIST (subsample)")
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
MNIST X shape: (70000, 784)
Explained variance ratio: [0.09867566 0.07404546]
Although PCA is often introduced with image datasets like MNIST, the same idea applies directly to my weather and fog data. In MNIST, each 28×28 image is flattened into a 784-dimensional vector, so the data matrix has shape (N, 784); PCA is then used to find a low-dimensional embedding (e.g. 2D) that preserves as much variance as possible and reveals structure between digit classes. In my case, each hourly record is a much smaller feature vector, for example (Temp, Hum, Prec), so the data matrix has shape (N, 3). The dimensionality reduction is therefore less dramatic, but PCA still plays the same role: it produces uncorrelated principal components that concentrate most of the variability in fog conditions and makes it easier to visualise different “weather regimes” in a 2D plot, in exactly the same spirit as visualising clusters of digits in MNIST.
Independent Components Analysis (ICA)¶
Independent Components Analysis (ICA) is a statistical technique used to recover latent sources from observed mixed signals. A classic example is the cocktail party problem: several people are speaking at the same time, microphones record linear mixtures of their voices, and we want to recover the individual speakers from those mixed recordings.
In general, ICA assumes that:
- The observed data are linear mixtures of some unknown source signals.
- These latent sources are statistically independent and non-Gaussian.
- The mixing process is unknown.
The goal is to find a linear transform
$$X = A\,S \quad\Rightarrow\quad S = W\,X$$
where:
- $X$ is the matrix of observed signals (each row/column is a mixed signal),
- $S$ is the matrix of recovered independent components (estimated sources),
- $A$ is the unknown mixing matrix, and
- $W$ is an unmixing matrix learned from the data.
Unlike PCA, which decorrelates the data and orders components by variance, ICA goes further and tries to make the components as statistically independent as possible. This makes ICA particularly useful for blind source separation.
In practice, ICA is often implemented using algorithms like FastICA, which iteratively finds directions that maximize a measure of non-Gaussianity (such as kurtosis or negentropy). The resulting components can be interpreted as underlying “sources” that, when linearly combined, explain the observed signals.
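ICA is not applied to the weather data in this notebook, so the following minimal sketch illustrates the idea on synthetic signals instead: two known, non-Gaussian sources are linearly mixed by a hypothetical mixing matrix `A`, and `FastICA` recovers them (up to scale, sign and permutation) from the mixtures alone.

```python
# Minimal FastICA sketch on synthetic signals (cocktail-party style).
# Everything here is synthetic; it is independent of the weather dataset.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import FastICA

t = np.linspace(0, 8, 2000)

# Two independent, non-Gaussian sources: a sine wave and a square wave
s1 = np.sin(2 * np.pi * t)
s2 = np.sign(np.sin(3 * np.pi * t))
S_true = np.column_stack([s1, s2])

# Hypothetical mixing matrix A: the "microphones" observe X = S_true @ A.T
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])
X = S_true @ A.T

# Blind source separation: estimate the sources from the mixtures only
ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)   # recovered sources (up to scale/sign/permutation)

fig, axes = plt.subplots(3, 1, figsize=(9, 6), sharex=True)
axes[0].plot(t, S_true)
axes[0].set_title("True sources")
axes[1].plot(t, X)
axes[1].set_title("Observed mixtures")
axes[2].plot(t, S_est)
axes[2].set_title("Sources recovered by FastICA")
plt.tight_layout()
plt.show()
```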
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# -----------------------------------------
# LOAD DATA
# -----------------------------------------
df = pd.read_csv("datasets/1363X-20190215-20200416.csv", sep=";")
df["UTC"] = pd.to_datetime(df["UTC"], errors="coerce", utc=True)
# Keep only rows with valid values
df = df.dropna(subset=["UTC", "Hum", "Prec"]).copy()
df["YEAR"] = df["UTC"].dt.year
print("Rows:", len(df))
print("Years:", df["YEAR"].unique())
print(df[["UTC", "Hum", "Prec"]].head())
Rows: 43343
Years: [2019 2020 2021 2022 2023 2024]
UTC Hum Prec
0 2019-02-15 21:00:00+00:00 81.0 0.0
1 2019-02-15 22:00:00+00:00 85.0 0.0
2 2019-02-15 23:00:00+00:00 89.0 0.0
3 2019-02-16 00:00:00+00:00 90.0 0.0
4 2019-02-16 01:00:00+00:00 93.0 0.0
# -----------------------------------------
# SCATTER: HUMIDITY vs PRECIPITATION
# -----------------------------------------
plt.figure(figsize=(7, 5))
plt.scatter(
df["Hum"],
df["Prec"],
s=5,
alpha=0.3
)
plt.xlabel("Humidity (Hum, %)")
plt.ylabel("Precipitation (Prec, mm)")
plt.title("Humidity vs precipitation (all records)")
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
Now we want to quantify the relationship. For each humidity band (e.g. 0–5%, 5–10%, …, 95–100%) we compute:
- the probability of rain (fraction of records with Prec > 0), and
- the mean precipitation in that band (averaged over all records in the band, including dry hours).
# -----------------------------------------
# BINNED RELATION: HUMIDITY → RAIN PROBABILITY & MEAN PREC
# -----------------------------------------
# Define humidity bins (0–100% in steps of 5)
bins = np.arange(0, 105, 5) # 0,5,10,...,100
df["hum_bin"] = pd.cut(df["Hum"], bins=bins, include_lowest=True)
# For each bin, compute:
# - rain_freq: fraction of times with Prec > 0
# - prec_mean: mean Prec (including zeros)
# - count: number of samples in that bin
stats = (
df.groupby("hum_bin", observed=False) # <-- añadido observed=False
.agg(
rain_freq=("Prec", lambda x: (x > 0).mean()),
prec_mean=("Prec", "mean"),
count=("Prec", "size")
)
.reset_index()
)
# Compute bin centres for plotting
bin_centers = np.array([interval.mid for interval in stats["hum_bin"]])
print(stats.head())
fig, ax1 = plt.subplots(figsize=(8, 5))
# Left axis: probability of rain vs humidity
ax1.plot(bin_centers, stats["rain_freq"], marker="o", linewidth=2, label="P(Prec > 0)")
ax1.set_xlabel("Humidity (%)")
ax1.set_ylabel("Rain probability", color="tab:blue")
ax1.tick_params(axis="y", labelcolor="tab:blue")
ax1.set_ylim(0, 1)
ax1.grid(True, axis="both", alpha=0.3)
# Right axis: mean precipitation vs humidity
ax2 = ax1.twinx()
ax2.plot(bin_centers, stats["prec_mean"], marker="s", linewidth=2, color="tab:red", label="Mean Prec (mm)")
ax2.set_ylabel("Mean precipitation (mm)", color="tab:red")
ax2.tick_params(axis="y", labelcolor="tab:red")
plt.title("Relation between humidity and precipitation")
plt.tight_layout()
plt.show()
         hum_bin  rain_freq  prec_mean  count
0  (-0.001, 5.0]        NaN        NaN      0
1    (5.0, 10.0]        NaN        NaN      0
2   (10.0, 15.0]        NaN        NaN      0
3   (15.0, 20.0]        0.0        0.0      1
4   (20.0, 25.0]        0.0        0.0     20
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# ----------------------------------------------------
# 1. LOAD DATA AND EXTRACT A 1D TEMPERATURE SIGNAL
# ----------------------------------------------------
df = pd.read_csv("datasets/1363X-20190215-20200416.csv", sep=";")
# Parse time
df["UTC"] = pd.to_datetime(df["UTC"], errors="coerce", utc=True)
df = df.dropna(subset=["UTC", "Temp"]).copy()
df = df.sort_values("UTC").reset_index(drop=True)
# Optional: restrict to one specific year (e.g. 2019) for a cleaner segment
df["YEAR"] = df["UTC"].dt.year
df_year = df[df["YEAR"] == 2019].copy()
# Extract a contiguous segment of N points from the temperature series
N = 512 # length of segment (power of 2 works nicely with FFT)
temp_series = df_year["Temp"].values
if len(temp_series) < N:
    raise ValueError(f"Not enough samples in selected year for N={N}")
signal = temp_series[:N] # take the first N samples
t = np.arange(N) # time index (sample index)
print("Segment length:", N)
# ----------------------------------------------------
# 2. FFT: TRANSFORM TO FREQUENCY DOMAIN
# ----------------------------------------------------
# Real FFT: we work with complex frequency coefficients
F = np.fft.rfft(signal) # shape ~ (N/2+1,)
freqs = np.fft.rfftfreq(N, d=1.0) # not strictly needed, but good to have
# ----------------------------------------------------
# 3. COMPRESSED REPRESENTATION: KEEP ONLY K LARGEST COEFFICIENTS
# ----------------------------------------------------
K = 20 # number of non-zero frequency components to keep
print(f"Keeping only the {K} largest frequency coefficients (by magnitude).")
# Indices of coefficients sorted by magnitude (descending)
idx_sorted = np.argsort(np.abs(F))[::-1]
# Build a sparse spectrum: zeros except the K largest coefficients
F_sparse = np.zeros_like(F, dtype=complex)
F_sparse[idx_sorted[:K]] = F[idx_sorted[:K]]
# ----------------------------------------------------
# 4. RECONSTRUCTION FROM SPARSE COEFFICIENTS (INVERSE FFT)
# ----------------------------------------------------
signal_recon = np.fft.irfft(F_sparse, n=N)
# ----------------------------------------------------
# 5. PLOTS: ORIGINAL VS COMPRESSED-SENSING-STYLE RECONSTRUCTION
# ----------------------------------------------------
fig, axes = plt.subplots(3, 1, figsize=(10, 8), sharex=True)
# Original signal
axes[0].plot(t, signal, label="Original Temp")
axes[0].set_ylabel("Temp (°C)")
axes[0].set_title("Original temperature segment")
axes[0].grid(True, alpha=0.3)
# Reconstructed signal (from K coefficients)
axes[1].plot(t, signal_recon, color="tab:orange", label="Reconstructed Temp (K sparse)")
axes[1].set_ylabel("Temp (°C)")
axes[1].set_title(f"Reconstruction from {K} largest Fourier coefficients")
axes[1].grid(True, alpha=0.3)
# Error (difference)
axes[2].plot(t, signal - signal_recon)
axes[2].set_xlabel("Sample index")
axes[2].set_ylabel("Error")
axes[2].set_title("Reconstruction error (Original - Reconstructed)")
axes[2].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
Segment length: 512
Keeping only the 20 largest frequency coefficients (by magnitude).
Compressed sensing–style sparse reconstruction of temperature¶
To illustrate the idea of compressed sensing and sparse representations, I took a 1D segment of the hourly temperature signal Temp (length $N = 512$) and transformed it to the frequency domain using a real FFT. In the Fourier domain, each coefficient represents a sinusoidal component of a specific frequency.
Instead of using all frequency coefficients, I kept only the $K$ coefficients with the largest magnitude (e.g. $K = 20$) and set all the others to zero. This produces a sparse representation of the signal in the Fourier basis. I then reconstructed the time-domain signal by applying the inverse FFT to this sparse spectrum.
The resulting plots show:
- The original temperature segment.
- The reconstructed signal using only $K$ non-zero Fourier coefficients.
- The reconstruction error (original minus reconstructed).
Even though most of the frequency coefficients are discarded, the reconstructed signal is still a good approximation of the original. This demonstrates the core idea behind compressed sensing: if a signal is sparse (or compressible) in a suitable transform domain (here, the Fourier domain), it can be represented and reconstructed accurately from a relatively small number of coefficients.
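To make "a relatively small number of coefficients" more concrete, a short sweep over K can be run after the cell above (a sketch; it assumes the `signal` array from that cell is still in memory). For a smooth signal like temperature, the RMSE should drop quickly as K grows:

```python
# Sketch: reconstruction RMSE as a function of the number K of kept Fourier
# coefficients. Assumes `signal` (the temperature segment of length N) exists.
import numpy as np

N = len(signal)
F = np.fft.rfft(signal)
idx_sorted = np.argsort(np.abs(F))[::-1]   # coefficients sorted by magnitude

for K in [5, 10, 20, 40, 80]:
    F_sparse = np.zeros_like(F)            # complex zeros, same shape as F
    F_sparse[idx_sorted[:K]] = F[idx_sorted[:K]]
    recon = np.fft.irfft(F_sparse, n=N)
    rmse = np.sqrt(np.mean((signal - recon) ** 2))
    kept_pct = 100.0 * K / len(F)
    print(f"K = {K:3d} ({kept_pct:4.1f}% of coefficients kept): RMSE = {rmse:.3f} °C")
```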
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# ----------------------------------------------------
# 1. LOAD DATA AND EXTRACT HUMIDITY & PRECIP SIGNALS
# ----------------------------------------------------
df = pd.read_csv("datasets/1363X-20190215-20200416.csv", sep=";")
# Parse time
df["UTC"] = pd.to_datetime(df["UTC"], errors="coerce", utc=True)
df = df.dropna(subset=["UTC", "Hum", "Prec"]).copy()
df = df.sort_values("UTC").reset_index(drop=True)
# Pick one year to get a reasonably continuous segment
df["YEAR"] = df["UTC"].dt.year
df_year = df[df["YEAR"] == 2019].copy()
hum_series = df_year["Hum"].values
prec_series = df_year["Prec"].values
# Segment length
N = 512 # power of 2 works nicely
N = min(N, len(hum_series), len(prec_series))
if N < 32:
    raise ValueError(f"Not enough samples in 2019 for N={N}")
hum_signal = hum_series[:N]
prec_signal = prec_series[:N]
t = np.arange(N)
print("Segment length used:", N)
# ----------------------------------------------------
# 2. HELPER FUNCTION: SPARSE FFT RECONSTRUCTION
# ----------------------------------------------------
def sparse_fft_reconstruction(signal, K, var_name, y_label):
    """
    signal: 1D numpy array (length N)
    K: number of largest Fourier coefficients to keep
    var_name: string for titles
    y_label: label for y-axis
    """
    N = len(signal)
    # Full FFT (real-valued input -> rfft)
    F = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(N, d=1.0)
    # Keep only K largest coefficients by magnitude
    idx_sorted = np.argsort(np.abs(F))[::-1]
    F_sparse = np.zeros_like(F, dtype=complex)
    F_sparse[idx_sorted[:K]] = F[idx_sorted[:K]]
    # Reconstruct from sparse spectrum
    recon = np.fft.irfft(F_sparse, n=N)
    # Plot
    fig, axes = plt.subplots(3, 1, figsize=(10, 8), sharex=True)
    t = np.arange(N)
    # Original
    axes[0].plot(t, signal)
    axes[0].set_ylabel(y_label)
    axes[0].set_title(f"Original {var_name} segment")
    axes[0].grid(True, alpha=0.3)
    # Reconstructed
    axes[1].plot(t, recon, color="tab:orange")
    axes[1].set_ylabel(y_label)
    axes[1].set_title(f"Reconstruction of {var_name} from {K} largest Fourier coefficients")
    axes[1].grid(True, alpha=0.3)
    # Error
    axes[2].plot(t, signal - recon)
    axes[2].set_xlabel("Sample index")
    axes[2].set_ylabel("Error")
    axes[2].set_title(f"Reconstruction error ({var_name} original - reconstructed)")
    axes[2].grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()
    return recon
# ----------------------------------------------------
# 3. APPLY TO HUMIDITY AND PRECIPITATION
# ----------------------------------------------------
K = 20 # number of non-zero Fourier coefficients to keep
print(f"\n=== Humidity (Hum) sparse reconstruction with K={K} ===")
hum_recon = sparse_fft_reconstruction(hum_signal, K, "Humidity", "Hum (%)")
print(f"\n=== Precipitation (Prec) sparse reconstruction with K={K} ===")
prec_recon = sparse_fft_reconstruction(prec_signal, K, "Precipitation", "Prec (mm)")
Segment length used: 512
=== Humidity (Hum) sparse reconstruction with K=20 ===
=== Precipitation (Prec) sparse reconstruction with K=20 ===
Compressed sensing–style sparse reconstruction of humidity and precipitation¶
I repeated the sparse Fourier reconstruction experiment for the humidity Hum and precipitation Prec signals. Using a segment of length $N$ from 2019, I computed the FFT of each variable and then kept only the $K$ frequency coefficients with the largest magnitude (e.g. $K = 20$). All other coefficients were set to zero, and I reconstructed the time-domain signal using the inverse FFT.
For humidity, the reconstructed signal is very close to the original one: the main slow variations of Hum can be represented by a relatively small number of Fourier modes, and the reconstruction error remains small. This indicates that humidity is quite compressible in the frequency domain, dominated by low-frequency components such as the daily cycle and slower trends.
For precipitation, the behaviour is different: Prec is much more spiky and intermittent, with many zeros and sudden peaks. A sparse reconstruction with only a few Fourier coefficients can approximate the general level, but it tends to smooth out sharp rainfall events. This contrast illustrates how compressed sensing works better for signals that are naturally sparse or smooth in the chosen transform domain (like humidity and temperature in the Fourier basis) than for highly irregular, bursty signals like precipitation.
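One way to quantify this difference in compressibility is to look at how much of each signal's spectral energy is carried by its K largest Fourier coefficients. The sketch below assumes the `hum_signal` and `prec_signal` arrays from the cell above are still in memory; a higher fraction for humidity would support the interpretation given above.

```python
# Sketch: fraction of spectral energy carried by the K largest Fourier
# coefficients, for humidity vs precipitation. Assumes `hum_signal` and
# `prec_signal` from the cell above are in memory.
import numpy as np

def energy_fraction(signal, K):
    """Fraction of total |F|^2 contained in the K largest rFFT coefficients."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    top_k = np.sort(power)[::-1][:K]
    return top_k.sum() / power.sum()

K = 20
print(f"Humidity:      top-{K} coefficients carry "
      f"{100 * energy_fraction(hum_signal, K):.1f}% of the spectral energy")
print(f"Precipitation: top-{K} coefficients carry "
      f"{100 * energy_fraction(prec_signal, K):.1f}% of the spectral energy")
```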
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# ----------------------------------------------------
# 1. LOAD DATA
# ----------------------------------------------------
df = pd.read_csv("datasets/1363X-20190215-20200416.csv", sep=";")
df["UTC"] = pd.to_datetime(df["UTC"], errors="coerce", utc=True)
df = df.dropna(subset=["UTC", "Hum", "Prec"]).copy()
df = df.sort_values("UTC").reset_index(drop=True)
df["YEAR"] = df["UTC"].dt.year
print("Years present:", df["YEAR"].unique())
# ----------------------------------------------------
# 2. SELECT YEARS 2019 AND 2024
# ----------------------------------------------------
df_2019 = df[df["YEAR"] == 2019].copy()
df_2024 = df[df["YEAR"] == 2024].copy()
print("Samples 2019:", len(df_2019))
print("Samples 2024:", len(df_2024))
if len(df_2019) == 0 or len(df_2024) == 0:
    raise ValueError("No data for 2019 or 2024 in the CSV.")
# ----------------------------------------------------
# 3. CHOOSE SEGMENT LENGTH N (SAME FOR BOTH YEARS)
# ----------------------------------------------------
N_max = 1024  # desired maximum segment length
N = min(N_max, len(df_2019), len(df_2024))
if N < 64:
    raise ValueError(f"Not enough samples for a meaningful comparison, N={N}")
print("Segment length used for each year:", N)
hum_2019 = df_2019["Hum"].values[:N]
hum_2024 = df_2024["Hum"].values[:N]
prec_2019 = df_2019["Prec"].values[:N]
prec_2024 = df_2024["Prec"].values[:N]
t = np.arange(N)
# ----------------------------------------------------
# 4. HELPER FUNCTION: SPARSE FFT RECONSTRUCTION + ERROR
# ----------------------------------------------------
def sparse_fft_reconstruction(signal, K):
    """
    signal: 1D array of length N
    K: number of largest Fourier coefficients to keep
    Returns: reconstructed_signal, rmse
    """
    N = len(signal)
    # Real FFT
    F = np.fft.rfft(signal)
    # Indices sorted by magnitude (descending)
    idx_sorted = np.argsort(np.abs(F))[::-1]
    # Sparse spectrum: keep only the K largest coefficients
    F_sparse = np.zeros_like(F, dtype=complex)
    F_sparse[idx_sorted[:K]] = F[idx_sorted[:K]]
    # Inverse FFT
    recon = np.fft.irfft(F_sparse, n=N)
    # Root mean squared error
    rmse = np.sqrt(np.mean((signal - recon) ** 2))
    return recon, rmse
# ----------------------------------------------------
# 5. APPLY TO HUMIDITY AND PRECIPITATION (2019 vs 2024)
# ----------------------------------------------------
K = 20  # number of non-zero Fourier coefficients to keep
print(f"Using K={K} largest Fourier coefficients for each signal/year")
# Humidity
hum_2019_recon, hum_2019_rmse = sparse_fft_reconstruction(hum_2019, K)
hum_2024_recon, hum_2024_rmse = sparse_fft_reconstruction(hum_2024, K)
print(f"Humidity RMSE 2019: {hum_2019_rmse:.3f}")
print(f"Humidity RMSE 2024: {hum_2024_rmse:.3f}")
# Precipitation
prec_2019_recon, prec_2019_rmse = sparse_fft_reconstruction(prec_2019, K)
prec_2024_recon, prec_2024_rmse = sparse_fft_reconstruction(prec_2024, K)
print(f"Precipitation RMSE 2019: {prec_2019_rmse:.3f}")
print(f"Precipitation RMSE 2024: {prec_2024_rmse:.3f}")
# ----------------------------------------------------
# 6. PLOTS: HUMIDITY 2019 vs 2024
# ----------------------------------------------------
fig, axes = plt.subplots(3, 2, figsize=(12, 8), sharex=True)
# Column 0: 2019, Column 1: 2024
# Row 0: original, Row 1: reconstructed, Row 2: error
# --- 2019 HUMIDITY ---
axes[0, 0].plot(t, hum_2019)
axes[0, 0].set_title("Humidity 2019 - Original")
axes[0, 0].set_ylabel("Hum (%)")
axes[0, 0].grid(True, alpha=0.3)
axes[1, 0].plot(t, hum_2019_recon)
axes[1, 0].set_title(f"Humidity 2019 - Reconstructed (K={K})")
axes[1, 0].set_ylabel("Hum (%)")
axes[1, 0].grid(True, alpha=0.3)
axes[2, 0].plot(t, hum_2019 - hum_2019_recon)
axes[2, 0].set_title(f"Humidity 2019 - Error (RMSE={hum_2019_rmse:.2f})")
axes[2, 0].set_xlabel("Sample index")
axes[2, 0].set_ylabel("Error")
axes[2, 0].grid(True, alpha=0.3)
# --- 2024 HUMIDITY ---
axes[0, 1].plot(t, hum_2024)
axes[0, 1].set_title("Humidity 2024 - Original")
axes[0, 1].grid(True, alpha=0.3)
axes[1, 1].plot(t, hum_2024_recon)
axes[1, 1].set_title(f"Humidity 2024 - Reconstructed (K={K})")
axes[1, 1].grid(True, alpha=0.3)
axes[2, 1].plot(t, hum_2024 - hum_2024_recon)
axes[2, 1].set_title(f"Humidity 2024 - Error (RMSE={hum_2024_rmse:.2f})")
axes[2, 1].set_xlabel("Sample index")
axes[2, 1].grid(True, alpha=0.3)
plt.suptitle("Sparse Fourier reconstruction of humidity (2019 vs 2024)", y=1.02)
plt.tight_layout()
plt.show()
# ----------------------------------------------------
# 7. PLOTS: PRECIPITATION 2019 vs 2024
# ----------------------------------------------------
fig, axes = plt.subplots(3, 2, figsize=(12, 8), sharex=True)
# --- 2019 PRECIP ---
axes[0, 0].plot(t, prec_2019)
axes[0, 0].set_title("Precipitation 2019 - Original")
axes[0, 0].set_ylabel("Prec (mm)")
axes[0, 0].grid(True, alpha=0.3)
axes[1, 0].plot(t, prec_2019_recon)
axes[1, 0].set_title(f"Precipitation 2019 - Reconstructed (K={K})")
axes[1, 0].set_ylabel("Prec (mm)")
axes[1, 0].grid(True, alpha=0.3)
axes[2, 0].plot(t, prec_2019 - prec_2019_recon)
axes[2, 0].set_title(f"Precipitation 2019 - Error (RMSE={prec_2019_rmse:.2f})")
axes[2, 0].set_xlabel("Sample index")
axes[2, 0].set_ylabel("Error")
axes[2, 0].grid(True, alpha=0.3)
# --- 2024 PRECIP ---
axes[0, 1].plot(t, prec_2024)
axes[0, 1].set_title("Precipitation 2024 - Original")
axes[0, 1].grid(True, alpha=0.3)
axes[1, 1].plot(t, prec_2024_recon)
axes[1, 1].set_title(f"Precipitation 2024 - Reconstructed (K={K})")
axes[1, 1].grid(True, alpha=0.3)
axes[2, 1].plot(t, prec_2024 - prec_2024_recon)
axes[2, 1].set_title(f"Precipitation 2024 - Error (RMSE={prec_2024_rmse:.2f})")
axes[2, 1].set_xlabel("Sample index")
axes[2, 1].grid(True, alpha=0.3)
plt.suptitle("Sparse Fourier reconstruction of precipitation (2019 vs 2024)", y=1.02)
plt.tight_layout()
plt.show()
Years present: [2019 2020 2021 2022 2023 2024]
Samples 2019: 7198
Samples 2024: 1763
Segment length used for each year: 1024
Using K=20 largest Fourier coefficients for each signal/year
Humidity RMSE 2019: 8.851
Humidity RMSE 2024: 5.980
Precipitation RMSE 2019: 0.367
Precipitation RMSE 2024: 0.967
Compressed sensing–style comparison between 2019 and 2024 (humidity and precipitation)¶
To compare how compressible the humidity Hum and precipitation Prec signals are in different years, I took equally long segments of length $N$ from 2019 and 2024 and applied a sparse Fourier reconstruction to each of them. For each variable and year, I computed the FFT, kept only the $K$ largest Fourier coefficients (e.g. $K = 20$) and reconstructed the signal using the inverse FFT.
The resulting figures show, for both 2019 and 2024:
- the original time series segment,
- the reconstructed signal using only $K$ non-zero Fourier coefficients,
- and the reconstruction error together with the RMSE.
Humidity is relatively smooth in time, so both years can be approximated reasonably well with a small number of Fourier modes. Precipitation is more intermittent and spiky, so the sparse reconstruction tends to smooth out individual rainfall events and the RMSE is larger. Comparing 2019 and 2024 side by side shows how the same sparse-reconstruction method behaves under different weather conditions, since the algorithm itself is identical in both cases.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# ----------------------------------------------------
# 1. LOAD DATA
# ----------------------------------------------------
df = pd.read_csv("datasets/1363X-20190215-20200416.csv", sep=";")
df["UTC"] = pd.to_datetime(df["UTC"], errors="coerce", utc=True)
df = df.dropna(subset=["UTC", "Hum", "Prec"]).copy()
df = df.sort_values("UTC").reset_index(drop=True)
df["YEAR"] = df["UTC"].dt.year
# Restrict to 2019–2024
years = list(range(2019, 2025))
df = df[df["YEAR"].isin(years)].copy()
print("Years present:", sorted(df["YEAR"].unique()))
print("Total rows:", len(df))
# ----------------------------------------------------
# 2. DEFINE "FOG-LIKE" HUMID CONDITIONS (same criterion as before)
# Hum ≥ 95 % and -2 °C ≤ Temp ≤ 15 °C
# ----------------------------------------------------
if "Temp" not in df.columns:
raise KeyError("Column 'Temp' not found in dataframe; check CSV columns.")
fog_mask = (
(df["Hum"] >= 95.0) &
(df["Temp"] >= -2.0) &
(df["Temp"] <= 15.0)
)
df["FOG_FLAG"] = fog_mask
# Also define "rainy" as Prec > 0 for reference
df["RAIN_FLAG"] = df["Prec"] > 0
# ----------------------------------------------------
# 3. YEARLY SUMMARY STATISTICS
# ----------------------------------------------------
summary = (
df.groupby("YEAR")
.agg(
hum_mean=("Hum", "mean"),
hum_median=("Hum", "median"),
hum_p95=("Hum", lambda x: np.percentile(x, 95)),
fog_hours=("FOG_FLAG", "sum"),
total_hours=("FOG_FLAG", "size"),
rainy_hours=("RAIN_FLAG", "sum"),
prec_mean=("Prec", "mean")
)
.reset_index()
)
summary["fog_pct"] = 100.0 * summary["fog_hours"] / summary["total_hours"]
summary["rainy_pct"] = 100.0 * summary["rainy_hours"] / summary["total_hours"]
print(summary)
# ----------------------------------------------------
# 4. PLOTS: HUMIDITY + FOG PERCENTAGE 2019–2024
# ----------------------------------------------------
years_arr = summary["YEAR"].values
fig, ax1 = plt.subplots(figsize=(9, 5))
# --- Left axis: mean humidity per year ---
ax1.plot(years_arr, summary["hum_mean"], marker="o", linewidth=2, label="Mean Humidity")
ax1.fill_between(
years_arr,
summary["hum_median"],
summary["hum_p95"],
alpha=0.15,
label="Median–95th percentile range"
)
ax1.set_xlabel("Year")
ax1.set_ylabel("Humidity (%)")
ax1.grid(True, axis="both", alpha=0.3)
ax1.set_xticks(years_arr)
# Mark the power plant shutdown year (2023)
ax1.axvline(x=2023, color="gray", linestyle="--", alpha=0.7)
ax1.text(2023 + 0.05, ax1.get_ylim()[1] * 0.95,
"Power plant shutdown\n(cooling towers off)",
fontsize=9, va="top", ha="left", color="gray")
# --- Right axis: percentage of fog-like hours ---
ax2 = ax1.twinx()
ax2.plot(
years_arr,
summary["fog_pct"],
marker="s",
linewidth=2,
color="tab:red",
label="Fog-like hours (%)"
)
ax2.set_ylabel("Fog-like hours (%)", color="tab:red")
ax2.tick_params(axis="y", labelcolor="tab:red")
# Combine legends from both axes
lines1, labels1 = ax1.get_legend_handles_labels()
lines2, labels2 = ax2.get_legend_handles_labels()
ax1.legend(lines1 + lines2, labels1 + labels2, loc="upper left")
plt.title("Humidity and fog-like conditions per year (2019–2024)")
plt.tight_layout()
plt.show()
# ----------------------------------------------------
# 5. OPTIONAL: PRECIPITATION / RAINY HOURS PER YEAR
# ----------------------------------------------------
fig, ax3 = plt.subplots(figsize=(9, 4))
ax3.bar(years_arr - 0.15, summary["rainy_pct"], width=0.3, label="Rainy hours (%)")
ax3.bar(years_arr + 0.15, summary["prec_mean"], width=0.3, label="Mean Prec (mm)")
ax3.set_xticks(years_arr)
ax3.set_xlabel("Year")
ax3.set_ylabel("Value")
ax3.grid(True, axis="y", alpha=0.3)
ax3.axvline(x=2023, color="gray", linestyle="--", alpha=0.7)
ax3.set_title("Rain statistics per year (2019–2024)")
ax3.legend()
plt.tight_layout()
plt.show()
Years present: [np.int32(2019), np.int32(2020), np.int32(2021), np.int32(2022), np.int32(2023), np.int32(2024)]
Total rows: 43343
   YEAR   hum_mean  hum_median  hum_p95  fog_hours  total_hours  rainy_hours  \
0  2019  83.189219        89.0    100.0       2015         7198         1145
1  2020  85.372612        90.0    100.0       2916         8690         1501
2  2021  84.572478        90.0    100.0       2811         8713         1573
3  2022  82.986023        87.0     99.0       1910         8228         1116
4  2023  83.006056        88.0     99.0       1926         8751         1436
5  2024  85.655133        88.0     99.0        410         1763          477
   prec_mean    fog_pct  rainy_pct
0   0.191803  27.993887  15.907196
1   0.195075  33.555811  17.272727
2   0.209388  32.262137  18.053483
3   0.151920  23.213418  13.563442
4   0.252063  22.008913  16.409553
5   0.416563  23.255814  27.056154
Yearly comparison 2019–2024: humidity, fog and the power plant shutdown¶
To highlight the impact of the local power plant on humidity and fog, I summarised my hourly weather data year by year from 2019 to 2024. For each year I computed:
- the mean relative humidity,
- the median and 95th percentile of `Hum` (to characterise the upper tail of very humid situations),
- the percentage of fog-like hours, defined as records with `Hum ≥ 95 %` and `−2 °C ≤ Temp ≤ 15 °C`,
- and, for reference, the percentage of rainy hours and the mean precipitation.
The main plot shows mean humidity per year together with a shaded band between the median and the 95th percentile, and a second curve with the percentage of fog-like hours. A vertical dashed line marks 2023, the year when the thermal power plant was shut down and the cooling towers stopped injecting additional moisture into the local atmosphere.
This year-by-year view makes it easier to see whether there is a systematic drop in high-humidity and fog conditions after 2023. If the cooling towers were significantly increasing local humidity, we would expect:
- lower mean humidity and/or a lower 95th percentile in 2023–2024 compared to previous years, and
- a reduction in the percentage of fog-like hours after the shutdown.
Although this analysis is purely observational and does not prove causality by itself, it provides a clear visual summary of how the humidity and fog regime evolved between 2019 and 2024 around the power plant shutdown.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# ----------------------------------------------------
# 1. LOAD DATA
# ----------------------------------------------------
df = pd.read_csv("datasets/1363X-20190215-20200416.csv", sep=";")
df["UTC"] = pd.to_datetime(df["UTC"], errors="coerce", utc=True)
df = df.dropna(subset=["UTC", "Hum", "Temp", "Prec"]).copy()
df = df.sort_values("UTC").reset_index(drop=True)
df["YEAR"] = df["UTC"].dt.year
df["HOUR"] = df["UTC"].dt.hour
# Restrict to 2019–2024
years = list(range(2019, 2025))
df = df[df["YEAR"].isin(years)].copy()
print("Years present:", sorted(df["YEAR"].unique()))
print("Total rows:", len(df))
# ----------------------------------------------------
# 2. DEFINE FOG-LIKE CONDITIONS
# same criterion as before: Hum ≥ 95 %, -2 °C ≤ Temp ≤ 15 °C
# ----------------------------------------------------
fog_mask = (
(df["Hum"] >= 95.0) &
(df["Temp"] >= -2.0) &
(df["Temp"] <= 15.0)
)
df["FOG_FLAG"] = fog_mask
df["RAIN_FLAG"] = df["Prec"] > 0
# ----------------------------------------------------
# 3. YEARLY SUMMARY (ALL HOURS)
# ----------------------------------------------------
summary_all = (
df.groupby("YEAR")
.agg(
hum_mean=("Hum", "mean"),
hum_median=("Hum", "median"),
hum_p95=("Hum", lambda x: np.percentile(x, 95)),
fog_hours=("FOG_FLAG", "sum"),
total_hours=("FOG_FLAG", "size"),
rainy_hours=("RAIN_FLAG", "sum"),
prec_mean=("Prec", "mean"),
)
.reset_index()
)
summary_all["fog_pct"] = 100.0 * summary_all["fog_hours"] / summary_all["total_hours"]
summary_all["rainy_pct"] = 100.0 * summary_all["rainy_hours"] / summary_all["total_hours"]
print("\n=== Yearly summary (all hours) ===")
print(summary_all)
# ----------------------------------------------------
# 4. YEARLY SUMMARY (DAWN HOURS ONLY: 00–07)
# ----------------------------------------------------
dawn_mask = df["HOUR"].between(0, 7) # 0,1,2,3,4,5,6,7
df_dawn = df[dawn_mask].copy()
summary_dawn = (
df_dawn.groupby("YEAR")
.agg(
hum_mean=("Hum", "mean"),
hum_median=("Hum", "median"),
hum_p95=("Hum", lambda x: np.percentile(x, 95)),
fog_hours=("FOG_FLAG", "sum"),
total_hours=("FOG_FLAG", "size"),
rainy_hours=("RAIN_FLAG", "sum"),
prec_mean=("Prec", "mean"),
)
.reset_index()
)
summary_dawn["fog_pct"] = 100.0 * summary_dawn["fog_hours"] / summary_dawn["total_hours"]
summary_dawn["rainy_pct"] = 100.0 * summary_dawn["rainy_hours"] / summary_dawn["total_hours"]
print("\n=== Yearly summary (dawn hours 00–07) ===")
print(summary_dawn)
# ----------------------------------------------------
# 5. SIMPLE PRE vs POST AGGREGATION
# pre: 2019–2022, post: 2023–2024
# ----------------------------------------------------
pre_years = [2019, 2020, 2021, 2022]
post_years = [2023, 2024]
def aggregate_period(summary_df, years, label):
    sub = summary_df[summary_df["YEAR"].isin(years)]
    out = {
        "period": label,
        "years": f"{years[0]}–{years[-1]}",
        "hum_mean": sub["hum_mean"].mean(),
        "hum_p95": sub["hum_p95"].mean(),
        "fog_pct": sub["fog_pct"].mean(),
        "rainy_pct": sub["rainy_pct"].mean(),
        "prec_mean": sub["prec_mean"].mean(),
    }
    return out
# Global (all hours)
pre_all = aggregate_period(summary_all, pre_years, "pre_all")
post_all = aggregate_period(summary_all, post_years, "post_all")
comparison_all = pd.DataFrame([pre_all, post_all])
print("\n=== Pre vs Post comparison (ALL HOURS) ===")
print(comparison_all)
# Dawn only
pre_dawn = aggregate_period(summary_dawn, pre_years, "pre_dawn")
post_dawn = aggregate_period(summary_dawn, post_years, "post_dawn")
comparison_dawn = pd.DataFrame([pre_dawn, post_dawn])
print("\n=== Pre vs Post comparison (DAWN HOURS 00–07) ===")
print(comparison_dawn)
# ----------------------------------------------------
# 6. PLOTS: GLOBAL AND DAWN COMPARISON (HUMIDITY + FOG%)
# ----------------------------------------------------
years_arr = summary_all["YEAR"].values
# Global
fig, ax1 = plt.subplots(figsize=(9, 5))
ax1.plot(years_arr, summary_all["hum_mean"], marker="o", linewidth=2, label="Mean Humidity (all hours)")
ax1.set_xlabel("Year")
ax1.set_ylabel("Humidity (%)")
ax1.grid(True, alpha=0.3)
ax1.set_xticks(years_arr)
ax2 = ax1.twinx()
ax2.plot(years_arr, summary_all["fog_pct"], marker="s", linewidth=2, color="tab:red",
label="Fog-like hours (%) (all hours)")
ax2.set_ylabel("Fog-like hours (%)", color="tab:red")
ax2.tick_params(axis="y", labelcolor="tab:red")
ax1.axvline(x=2023, color="gray", linestyle="--", alpha=0.7)
ax1.text(2023 + 0.05, ax1.get_ylim()[1] * 0.95,
"Power plant shutdown\ncooling towers off (2023)",
fontsize=9, va="top", ha="left", color="gray")
lines1, labels1 = ax1.get_legend_handles_labels()
lines2, labels2 = ax2.get_legend_handles_labels()
ax1.legend(lines1 + lines2, labels1 + labels2, loc="upper left")
plt.title("2019–2024: humidity and fog-like hours (all hours)")
plt.tight_layout()
plt.show()
# Dawn-only plot
years_arr_dawn = summary_dawn["YEAR"].values
fig, ax3 = plt.subplots(figsize=(9, 5))
ax3.plot(years_arr_dawn, summary_dawn["hum_mean"], marker="o", linewidth=2,
label="Mean Humidity (dawn 00–07)")
ax3.set_xlabel("Year")
ax3.set_ylabel("Humidity (%)")
ax3.grid(True, alpha=0.3)
ax3.set_xticks(years_arr_dawn)
ax4 = ax3.twinx()
ax4.plot(years_arr_dawn, summary_dawn["fog_pct"], marker="s", linewidth=2, color="tab:red",
label="Fog-like hours (%) (dawn 00–07)")
ax4.set_ylabel("Fog-like hours (%)", color="tab:red")
ax4.tick_params(axis="y", labelcolor="tab:red")
ax3.axvline(x=2023, color="gray", linestyle="--", alpha=0.7)
ax3.text(2023 + 0.05, ax3.get_ylim()[1] * 0.95,
"Power plant shutdown (2023)",
fontsize=9, va="top", ha="left", color="gray")
lines3, labels3 = ax3.get_legend_handles_labels()
lines4, labels4 = ax4.get_legend_handles_labels()
ax3.legend(lines3 + lines4, labels3 + labels4, loc="upper left")
plt.title("2019–2024: humidity and fog-like hours (dawn 00–07)")
plt.tight_layout()
plt.show()
Years present: [np.int32(2019), np.int32(2020), np.int32(2021), np.int32(2022), np.int32(2023), np.int32(2024)]
Total rows: 43343
=== Yearly summary (all hours) ===
YEAR hum_mean hum_median hum_p95 fog_hours total_hours rainy_hours \
0 2019 83.189219 89.0 100.0 2015 7198 1145
1 2020 85.372612 90.0 100.0 2916 8690 1501
2 2021 84.572478 90.0 100.0 2811 8713 1573
3 2022 82.986023 87.0 99.0 1910 8228 1116
4 2023 83.006056 88.0 99.0 1926 8751 1436
5 2024 85.655133 88.0 99.0 410 1763 477
prec_mean fog_pct rainy_pct
0 0.191803 27.993887 15.907196
1 0.195075 33.555811 17.272727
2 0.209388 32.262137 18.053483
3 0.151920 23.213418 13.563442
4 0.252063 22.008913 16.409553
5 0.416563 23.255814 27.056154
=== Yearly summary (dawn hours 00–07) ===
YEAR hum_mean hum_median hum_p95 fog_hours total_hours rainy_hours \
0 2019 93.409532 95.0 100.0 1184 2371 380
1 2020 94.368949 96.0 100.0 1696 2892 539
2 2021 93.401721 96.0 100.0 1622 2905 551
3 2022 91.895560 94.0 100.0 1196 2748 390
4 2023 92.030822 94.0 100.0 1146 2920 510
5 2024 90.705387 92.0 99.0 210 594 156
prec_mean fog_pct rainy_pct
0 0.173260 49.936736 16.026993
1 0.205325 58.644537 18.637621
2 0.210396 55.834768 18.967298
3 0.154585 43.522562 14.192140
4 0.239315 39.246575 17.465753
5 0.353199 35.353535 26.262626
=== Pre vs Post comparison (ALL HOURS) ===
period years hum_mean hum_p95 fog_pct rainy_pct prec_mean
0 pre_all 2019–2022 84.030083 99.75 29.256313 16.199212 0.187047
1 post_all 2023–2024 84.330595 99.00 22.632364 21.732854 0.334313
=== Pre vs Post comparison (DAWN HOURS 00–07) ===
period years hum_mean hum_p95 fog_pct rainy_pct prec_mean
0 pre_dawn 2019–2022 93.268941 100.0 51.984650 16.956013 0.185892
1 post_dawn 2023–2024 91.368105 99.5 37.300055 21.864190 0.296257
Simple pre/post comparison around the power plant shutdown¶
To quantify the potential impact of the thermal power plant and its cooling towers on local humidity and fog, I split the data into two periods:
- Pre-shutdown: 2019–2022
- Post-shutdown: 2023–2024
For each period, I computed simple averages over the years:
- mean relative humidity `Hum`,
- 95th percentile of humidity (to characterise very humid conditions),
- percentage of fog-like hours, defined as `Hum ≥ 95 %` and `−2 °C ≤ Temp ≤ 15 °C`,
- percentage of rainy hours and mean precipitation `Prec` (for context).
I did this both for all hours and for dawn hours only (00:00–07:00), as fog is especially sensitive to humidity during the night and early morning.
The pre/post tables and plots show how these quantities change between the two periods. In particular, I looked for:
- any drop in mean humidity and in the 95th percentile after 2023,
- and any reduction in the fraction of fog-like hours, especially during dawn.
These trends are consistent with the qualitative expectation that, once the cooling towers stop operating, the additional moisture input to the local atmosphere should decrease.
However, this analysis has an important limitation: I did not have access to detailed production or operation data for the power plant. In reality, the different units were progressively shut down between 2022 and 2023, so the effect on humidity is likely gradual rather than instantaneous. The pre/post comparison in this assignment should therefore be interpreted as an exploratory signal in the meteorological data, not as a definitive causal proof of the impact of the plant.
After almost 47 years in operation, the As Pontes thermal power plant stopped producing electricity. Beyond its industrial role, the plant and its cooling towers were a defining element of the local landscape and microclimate. In this analysis I use meteorological data from 2019–2024 to explore whether the shutdown and progressive phase-out of the units between 2022 and 2023 left a detectable signal in humidity and fog conditions.