import pandas as pd
import matplotlib.pyplot as plt
# URL of the dataset
url = "https://data.humdata.org/dataset/be41fcba-9ca4-461e-8628-549f330e6eee/resource/90da9bc3-e18b-4dc9-bd35-39131115e4af/download/education_btn.csv"
# Read the CSV file
df = pd.read_csv(url)
# Display basic information
print("Data shape:", df.shape)
df.head()
Data shape: (11784, 6)
| | Country Name | Country ISO3 | Year | Indicator Name | Indicator Code | Value |
|---|---|---|---|---|---|---|
| 0 | #country+name | #country+code | #date+year | #indicator+name | #indicator+code | #indicator+value+num |
| 1 | Bhutan | BTN | 2020 | Enrolment in pre-primary education, both sexes... | SE.PRE.ENRL | 8025 |
| 2 | Bhutan | BTN | 2018 | Enrolment in pre-primary education, both sexes... | SE.PRE.ENRL | 8499 |
| 3 | Bhutan | BTN | 2017 | Enrolment in pre-primary education, both sexes... | SE.PRE.ENRL | 7250 |
| 4 | Bhutan | BTN | 2016 | Enrolment in pre-primary education, both sexes... | SE.PRE.ENRL | 7125 |
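Row 0 of the preview holds HXL hashtags (`#country+name`, `#date+year`, ...) rather than data, so pandas reads every column, including `Year` and `Value`, as text. The sketch below shows one possible way to drop that row while loading. It uses a small inline sample instead of the real download; the indicator names are shortened for illustration, and `skiprows=[1]` assumes the tag row is always the second line of the file.

```python
import io

import pandas as pd

# Inline stand-in for the downloaded CSV: header line, HXL tag line, then data.
# Indicator names are shortened here; the real file has longer labels.
sample_csv = io.StringIO(
    "Country Name,Country ISO3,Year,Indicator Name,Indicator Code,Value\n"
    "#country+name,#country+code,#date+year,#indicator+name,#indicator+code,#indicator+value+num\n"
    "Bhutan,BTN,2020,Enrolment in pre-primary education,SE.PRE.ENRL,8025\n"
    "Bhutan,BTN,2018,Enrolment in pre-primary education,SE.PRE.ENRL,8499\n"
)

# skiprows=[1] drops the second physical line (the HXL tags) and keeps the header.
clean_df = pd.read_csv(sample_csv, skiprows=[1])

# Year and Value should now be numeric instead of object/text columns.
print(clean_df.dtypes)
print(clean_df)
```

The same `skiprows=[1]` argument could be passed to `pd.read_csv(url, ...)` if the real file follows the same layout.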
import pandas as pd
# URL of the dataset
url = "https://data.humdata.org/dataset/be41fcba-9ca4-461e-8628-549f330e6eee/resource/90da9bc3-e18b-4dc9-bd35-39131115e4af/download/education_btn.csv"
# Read the CSV file
df = pd.read_csv(url)
# Set pandas options to display all rows and columns
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
# Display the entire dataset
print(df)
Wavelet¶
A wavelet is a small, oscillatory, wave-like function used to analyze signals or data at multiple scales and locations. Unlike the sine and cosine waves used in Fourier analysis, which extend infinitely, a wavelet exists only for a short duration, so it is localized in both time (or space) and frequency. This makes wavelets especially effective for analyzing non-stationary signals locally, not just globally.
- Key Idea: Wavelets are “zoomable”: you can stretch them to look at slow trends or compress them to see fast changes. They let you study both frequency (what) and time/position (when/where) simultaneously.
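To make the "zoomable" idea concrete, here is a minimal NumPy sketch that builds one wavelet shape at two scales. It uses the Mexican-hat (Ricker) wavelet as an illustrative choice, not the `db4` wavelet used later; the helper names `ricker` and `support_width` are made up for this example.

```python
import numpy as np

def ricker(t, scale=1.0, shift=0.0):
    """Mexican-hat (Ricker) wavelet, stretched by `scale` and centred at `shift`."""
    u = (t - shift) / scale
    norm = 2.0 / (np.sqrt(3.0 * scale) * np.pi ** 0.25)  # gives unit energy
    return norm * (1.0 - u ** 2) * np.exp(-0.5 * u ** 2)

def support_width(psi, t, frac=0.01):
    """Width of the interval where |psi| exceeds `frac` of its peak value."""
    active = t[np.abs(psi) > frac * np.abs(psi).max()]
    return active[-1] - active[0]

t = np.linspace(-20.0, 20.0, 4001)
dt = t[1] - t[0]

narrow = ricker(t, scale=0.5)  # compressed: reacts to fast changes
wide = ricker(t, scale=2.0)    # stretched: reacts to slow trends

narrow_width = support_width(narrow, t)
wide_width = support_width(wide, t)
narrow_area = narrow.sum() * dt            # close to 0: a wavelet has zero mean
narrow_energy = (narrow ** 2).sum() * dt   # close to 1: unit energy

print("effective width (scale 0.5):", narrow_width)
print("effective width (scale 2.0):", wide_width)
print("area:", narrow_area, "energy:", narrow_energy)
```

Both versions are the same shape; only the scale changes how much of the time axis each one "sees".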
!pip install pywavelets
Discrete Wavelet Transform (DWT)¶
import pywt
import numpy as np
import matplotlib.pyplot as plt
# Sample signal
t = np.linspace(0, 1, 400)
signal = np.sin(2 * np.pi * 5 * t) + np.sin(2 * np.pi * 20 * t)
# Perform DWT
coeffs = pywt.dwt(signal, 'db4')
cA, cD = coeffs # Approximation and Detail coefficients
# Plot results
plt.figure(figsize=(10, 6))
plt.subplot(3, 1, 1)
plt.plot(signal)
plt.title("Original Signal")
plt.subplot(3, 1, 2)
plt.plot(cA)
plt.title("Approximation Coefficients (cA)")
plt.subplot(3, 1, 3)
plt.plot(cD)
plt.title("Detail Coefficients (cD)")
plt.tight_layout()
plt.show()
- Wavelets separate signals into low- and high-frequency parts.
- Approximation (cA) = general trend / smooth part.
- Detail (cD) = fine features / rapid changes.
- DWT is useful for denoising, compression, and feature extraction.
- The choice of wavelet (here db4) affects how the signal is decomposed.
- Wavelet analysis gives time-localized frequency information, unlike the FFT, which only gives global frequencies.
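To see where approximation and detail coefficients come from, the following sketch computes a single DWT level by hand with NumPy. It swaps in the Haar wavelet (the simplest one) for `db4`, and the functions `haar_dwt` and `haar_idwt` are illustrative helpers, not part of PyWavelets.

```python
import numpy as np

def haar_dwt(x):
    """One level of the Haar DWT for an even-length 1-D array."""
    pairs = np.asarray(x, dtype=float).reshape(-1, 2)
    approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2.0)  # scaled local averages
    detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2.0)  # scaled local differences
    return approx, detail

def haar_idwt(approx, detail):
    """Invert haar_dwt by recovering each pair of samples."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x

# Same test signal as in the DWT cell: 5 Hz plus 20 Hz.
t = np.linspace(0, 1, 400)
x = np.sin(2 * np.pi * 5 * t) + np.sin(2 * np.pi * 20 * t)

cA, cD = haar_dwt(x)
x_rec = haar_idwt(cA, cD)

print("lengths:", len(x), len(cA), len(cD))
print("perfect reconstruction:", np.allclose(x, x_rec))
print("energy in cA:", np.sum(cA ** 2), "energy in cD:", np.sum(cD ** 2))
```

Each output has half the length of the input, and almost all of the energy of this smooth signal ends up in `cA`, which is why the detail coefficients are the natural place to look for noise.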
Filters¶
A filter modifies a signal by allowing certain components to pass and blocking others. Think of it as a sieve for data:
- Low-pass filter → allows low frequencies (slow changes) and removes high frequencies (fast noise).
- High-pass filter → allows high frequencies (fast changes) and removes low frequencies (slow trends).
- Band-pass filter → allows a specific frequency range to pass.
- Band-stop filter → blocks a specific frequency range.
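Before reaching for a designed filter, the low-pass and high-pass ideas can be illustrated with a plain moving average. This is only a sketch with NumPy: a moving average is a crude low-pass filter, and subtracting it from the input gives a crude high-pass filter. The window width and the `rms` helper are choices made for this example.

```python
import numpy as np

fs = 500                       # sampling frequency (Hz)
t = np.arange(fs) / fs         # 1 second of samples
slow = np.sin(2 * np.pi * 5 * t)    # 5 Hz component
fast = np.sin(2 * np.pi * 50 * t)   # 50 Hz component

def moving_average(x, width):
    """Average each sample with its neighbours (a simple low-pass filter)."""
    kernel = np.ones(width) / width
    return np.convolve(x, kernel, mode="same")

width = 10  # 10 samples = 20 ms, exactly one period of the 50 Hz tone

# Low-pass: the slow tone survives, the fast tone averages out.
slow_lp = moving_average(slow, width)
fast_lp = moving_average(fast, width)

# High-pass: whatever the low-pass removed.
slow_hp = slow - slow_lp
fast_hp = fast - fast_lp

def rms(x):
    """Root-mean-square level, ignoring the edges affected by the window."""
    return np.sqrt(np.mean(x[width:-width] ** 2))

print("low-pass  RMS  slow:", rms(slow_lp), " fast:", rms(fast_lp))
print("high-pass RMS  slow:", rms(slow_hp), " fast:", rms(fast_hp))
```

Band-pass and band-stop filters follow the same logic with two cutoffs instead of one.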
!pip install scipy matplotlib numpy
import numpy as np
import matplotlib.pyplot as plt
from scipy import signal
Signal composed of two sine waves¶
- One slow (5 Hz) and one fast (50 Hz) component, sampled at 500 Hz
fs = 500 # Sampling frequency (Hz)
t = np.linspace(0, 1, fs, endpoint=False) # 1 second
signal_data = np.sin(2*np.pi*5*t) + 0.5*np.sin(2*np.pi*50*t)
plt.figure(figsize=(10, 4))
plt.plot(t, signal_data)
plt.title("Original Signal")
plt.xlabel("Time [s]")
plt.ylabel("Amplitude")
plt.show()
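A quick way to confirm which frequencies the test signal contains is to look at its FFT. The sketch below rebuilds the same signal and reads off the peaks; the 0.1 amplitude cut used to pick peaks is an arbitrary choice for this example.

```python
import numpy as np

fs = 500                                    # sampling frequency (Hz)
t = np.linspace(0, 1, fs, endpoint=False)   # 1 second
x = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 50 * t)

# One-sided spectrum; with 1 s of data the bins are 1 Hz apart.
spectrum = np.fft.rfft(x)
freqs = np.fft.rfftfreq(len(x), d=1 / fs)
amplitude = 2 * np.abs(spectrum) / len(x)   # scale bins to sine amplitudes

peaks = freqs[amplitude > 0.1]
print("peak frequencies (Hz):", peaks)
print("amplitudes at 5 Hz and 50 Hz:", amplitude[[5, 50]])
```

Knowing that the energy sits at 5 Hz and 50 Hz makes it easier to judge the cutoffs chosen in the filter cells that follow.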
Low-Pass Filter¶
Keeps slow-changing signals and removes high-frequency noise.
# Design Butterworth low-pass filter
cutoff = 10 # Hz
b, a = signal.butter(4, cutoff / (0.5*fs), btype='low')
filtered_signal = signal.filtfilt(b, a, signal_data)
# Plot
plt.figure(figsize=(10, 4))
plt.plot(t, signal_data, label='Original')
plt.plot(t, filtered_signal, label='Low-Pass Filtered', color='red')
plt.title("Low-Pass Filter")
plt.xlabel("Time [s]")
plt.ylabel("Amplitude")
plt.legend()
plt.show()
High-Pass Filter¶
Keeps fast-changing signals and removes slow trends.
cutoff = 20 # Hz
b, a = signal.butter(4, cutoff / (0.5*fs), btype='high')
high_passed_signal = signal.filtfilt(b, a, signal_data)
plt.figure(figsize=(10, 4))
plt.plot(t, signal_data, label='Original')
plt.plot(t, high_passed_signal, label='High-Pass Filtered', color='green')
plt.title("High-Pass Filter")
plt.xlabel("Time [s]")
plt.ylabel("Amplitude")
plt.legend()
plt.show()
Band-Pass Filter¶
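Keeps only a specific frequency range (here 5 to 20 Hz). Note that the 5 Hz component sits exactly on the lower cutoff, so it is attenuated rather than passed unchanged.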
lowcut = 5
highcut = 20
b, a = signal.butter(4, [lowcut / (0.5*fs), highcut / (0.5*fs)], btype='band')
band_passed_signal = signal.filtfilt(b, a, signal_data)
plt.figure(figsize=(10, 4))
plt.plot(t, signal_data, label='Original')
plt.plot(t, band_passed_signal, label='Band-Pass Filtered', color='purple')
plt.title("Band-Pass Filter")
plt.xlabel("Time [s]")
plt.ylabel("Amplitude")
plt.legend()
plt.show()
Band-Stop Filter (Notch Filter)¶
Removes a specific frequency range (e.g., 50 Hz noise).
lowcut = 48
highcut = 52
b, a = signal.butter(4, [lowcut / (0.5*fs), highcut / (0.5*fs)], btype='bandstop')
band_stopped_signal = signal.filtfilt(b, a, signal_data)
plt.figure(figsize=(10, 4))
plt.plot(t, signal_data, label='Original')
plt.plot(t, band_stopped_signal, label='Band-Stop Filtered', color='orange')
plt.title("Band-Stop Filter")
plt.xlabel("Time [s]")
plt.ylabel("Amplitude")
plt.legend()
plt.show()
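When the goal is to remove a single frequency such as 50 Hz, SciPy also offers a dedicated notch design, `signal.iirnotch`. The sketch below is an alternative to the Butterworth band-stop used in the cell, not a replacement for it; the quality factor `Q = 30` is an assumed value chosen for illustration.

```python
import numpy as np
from scipy import signal

fs = 500     # sampling frequency (Hz)
f0 = 50.0    # frequency to remove (Hz)
Q = 30.0     # quality factor (assumed): notch bandwidth is f0 / Q, about 1.7 Hz

# Second-order IIR notch centred on f0.
b_notch, a_notch = signal.iirnotch(f0, Q, fs=fs)

# Evaluate the filter gain at the two tones of the test signal.
_, h = signal.freqz(b_notch, a_notch, worN=[5.0, 50.0], fs=fs)
gain_5hz, gain_50hz = np.abs(h)

print("gain at 5 Hz :", gain_5hz)
print("gain at 50 Hz:", gain_50hz)
```

The coefficients can be applied with `signal.filtfilt(b_notch, a_notch, signal_data)` in the same way as the Butterworth filters. A higher `Q` gives a narrower notch but a longer transient at the edges of the signal.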
- signal.butter() → designs a Butterworth filter.
- signal.filtfilt() → applies the filter forward and backward for zero-phase distortion.
- btype → 'low', 'high', 'band', or 'bandstop'.
- Filters can remove noise, extract trends, or isolate specific frequencies.
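The effect of a designed filter can be checked without plotting by evaluating its gain at a few frequencies with `signal.freqz`. The sketch below does this for the 10 Hz low-pass design. It passes the cutoff in Hz through the `fs=` keyword, which should give the same coefficients as dividing by `0.5*fs`; the probe frequencies are chosen for this example.

```python
import numpy as np
from scipy import signal

fs = 500      # sampling frequency (Hz)
cutoff = 10   # cutoff frequency (Hz)

# Cutoff given in Hz via fs=, instead of normalising by the Nyquist frequency.
b, a = signal.butter(4, cutoff, btype='low', fs=fs)

# Gain of a single pass of the filter at three probe frequencies.
probe = [5.0, 10.0, 50.0]
_, h = signal.freqz(b, a, worN=probe, fs=fs)
single_pass = np.abs(h)

# filtfilt runs the filter forward and backward, so away from the signal
# edges the magnitude response is applied twice (squared).
forward_backward = single_pass ** 2

for f, g1, g2 in zip(probe, single_pass, forward_backward):
    print(f"{f:5.1f} Hz  single pass: {g1:.4f}  filtfilt: {g2:.4f}")
```

One consequence worth keeping in mind: a Butterworth filter has a gain of about 0.707 at its cutoff, so after `filtfilt` the gain at the cutoff is about 0.5.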
Noisy Signal¶
import numpy as np
import pywt
import matplotlib.pyplot as plt
fs = 500 # sampling frequency
t = np.linspace(0, 1, fs, endpoint=False)
# Original clean signal: 5 Hz sine wave
clean_signal = np.sin(2 * np.pi * 5 * t)
# Add white Gaussian noise (std 0.5); it is spread over all frequencies, not only high ones
noisy_signal = clean_signal + 0.5*np.random.randn(len(t))
# Plot
plt.figure(figsize=(10, 4))
plt.plot(t, noisy_signal, label='Noisy Signal')
plt.plot(t, clean_signal, label='Clean Signal', linewidth=2)
plt.title("Noisy vs Clean Signal")
plt.xlabel("Time [s]")
plt.ylabel("Amplitude")
plt.legend()
plt.show()
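To put a number on how noisy the signal is, one option is the signal-to-noise ratio (SNR) in decibels. The sketch below rebuilds the same kind of signal with a fixed random seed (an assumption added here so the result repeats) and uses a hypothetical helper, `snr_db`.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed (assumption) for repeatable numbers

fs = 500
t = np.linspace(0, 1, fs, endpoint=False)
clean = np.sin(2 * np.pi * 5 * t)
noisy = clean + 0.5 * rng.standard_normal(len(t))

def snr_db(reference, estimate):
    """SNR in dB: power of the reference over power of the error."""
    error = estimate - reference
    return 10 * np.log10(np.sum(reference ** 2) / np.sum(error ** 2))

noisy_snr = snr_db(clean, noisy)
print(f"SNR of the noisy signal: {noisy_snr:.2f} dB")
```

A unit-amplitude sine has power 0.5 and the noise has power of roughly 0.25, so the result should land near 3 dB. The same helper can be applied to a denoised signal to check whether a method actually improved things.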
Wavelet Denoising¶
# Perform Discrete Wavelet Transform
wavelet = 'db4'
coeffs = pywt.wavedec(noisy_signal, wavelet, level=4)
# Thresholding function
def thresholding(detail_coeffs, threshold):
    return pywt.threshold(detail_coeffs, threshold, mode='soft')
# Apply threshold to detail coefficients (cD)
threshold = 0.3
coeffs[1:] = [thresholding(cD, threshold) for cD in coeffs[1:]]
# Reconstruct the denoised signal
denoised_signal = pywt.waverec(coeffs, wavelet)
# Plot
plt.figure(figsize=(10, 4))
plt.plot(t, noisy_signal, label='Noisy Signal')
plt.plot(t, denoised_signal, label='Denoised Signal', color='red')
plt.plot(t, clean_signal, label='Original Clean Signal', linewidth=2, color='green')
plt.title("Wavelet Denoising")
plt.xlabel("Time [s]")
plt.ylabel("Amplitude")
plt.legend()
plt.show()
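The wavelet transform decomposes the signal into multiple levels:
- Approximation → low-frequency (signal trend)
- Detail → high-frequency (noise or fine features)
Thresholding shrinks the detail coefficients, which removes most of the noise.
The inverse wavelet transform reconstructs a cleaned signal.
This method can work better than a standard low-pass filter on signals with sharp features, because small coefficients are suppressed at every scale while large ones (edges, spikes) are kept instead of being blurred. For a smooth signal such as the 5 Hz sine used here, a low-pass filter may do just as well.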
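The fixed threshold of 0.3 used in the cell is below the noise level (std 0.5), so a fair amount of noise survives. A common alternative is to estimate the threshold from the data with the "universal" rule, sigma * sqrt(2 ln n), where sigma is estimated from the finest detail coefficients. The sketch below shows that idea end to end with NumPy only. It swaps in a hand-written Haar transform for `db4` and PyWavelets, uses 512 samples (a power of two, so every level splits evenly), and all function names are illustrative.

```python
import numpy as np

def haar_step(x):
    """One Haar analysis level: (approximation, detail)."""
    pairs = x.reshape(-1, 2)
    return ((pairs[:, 0] + pairs[:, 1]) / np.sqrt(2.0),
            (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2.0))

def haar_unstep(approx, detail):
    """One Haar synthesis level (inverse of haar_step)."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x

def soft(c, thr):
    """Soft thresholding: shrink coefficients toward zero by thr."""
    return np.sign(c) * np.maximum(np.abs(c) - thr, 0.0)

def haar_denoise(x, levels):
    """Decompose, threshold the details with the universal rule, reconstruct."""
    approx = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)                      # details[0] is the finest level
    # Robust noise estimate from the finest details (median absolute deviation).
    sigma = np.median(np.abs(details[0])) / 0.6745
    thr = sigma * np.sqrt(2.0 * np.log(len(x)))
    for d in reversed(details):                # coarsest level first
        approx = haar_unstep(approx, soft(d, thr))
    return approx, thr

rng = np.random.default_rng(0)   # fixed seed (assumption) for repeatable numbers
n = 512                          # power of two so each level has even length
t = np.arange(n) / n
clean = np.sin(2 * np.pi * 5 * t)
noisy = clean + 0.5 * rng.standard_normal(n)

denoised, thr = haar_denoise(noisy, levels=4)
mse_noisy = np.mean((noisy - clean) ** 2)
mse_denoised = np.mean((denoised - clean) ** 2)

print("threshold:", thr)
print("MSE noisy:", mse_noisy, " MSE denoised:", mse_denoised)
```

Haar gives a blocky reconstruction, so a smoother wavelet such as `db4` would normally look better on a sine; the point here is only how the threshold is chosen from the data rather than fixed by hand.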