[Wangd Lhamo] - Fab Futures - Data Science
In [2]:
import pandas as pd
import matplotlib.pyplot as plt

# URL of the dataset
url = "https://data.humdata.org/dataset/be41fcba-9ca4-461e-8628-549f330e6eee/resource/90da9bc3-e18b-4dc9-bd35-39131115e4af/download/education_btn.csv"

# Read the CSV file
df = pd.read_csv(url)

# Display basic information
print("Data shape:", df.shape)
df.head()
Data shape: (11784, 6)
Out[2]:
Country Name Country ISO3 Year Indicator Name Indicator Code Value
0 #country+name #country+code #date+year #indicator+name #indicator+code #indicator+value+num
1 Bhutan BTN 2020 Enrolment in pre-primary education, both sexes... SE.PRE.ENRL 8025
2 Bhutan BTN 2018 Enrolment in pre-primary education, both sexes... SE.PRE.ENRL 8499
3 Bhutan BTN 2017 Enrolment in pre-primary education, both sexes... SE.PRE.ENRL 7250
4 Bhutan BTN 2016 Enrolment in pre-primary education, both sexes... SE.PRE.ENRL 7125
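Row 0 in the output above holds HXL hashtags such as `#country+name` rather than data, so pandas will most likely read `Year` and `Value` as text. Below is a minimal sketch of skipping that row at load time. It uses a small inline sample copied from the rows shown above instead of the full download; the names `sample_csv` and `df_clean` are made up for this example.

```python
import io
import pandas as pd

# Inline sample copied from the rows shown above (not the full dataset).
sample_csv = io.StringIO(
    "Country Name,Country ISO3,Year,Indicator Name,Indicator Code,Value\n"
    "#country+name,#country+code,#date+year,#indicator+name,#indicator+code,#indicator+value+num\n"
    'Bhutan,BTN,2020,"Enrolment in pre-primary education, both sexes...",SE.PRE.ENRL,8025\n'
    'Bhutan,BTN,2018,"Enrolment in pre-primary education, both sexes...",SE.PRE.ENRL,8499\n'
)

# skiprows=[1] skips the second line of the file (the HXL tag row);
# line 0 is still used as the header.
df_clean = pd.read_csv(sample_csv, skiprows=[1])

print(df_clean.dtypes)          # Year and Value should now be numeric
print(df_clean["Value"].sum())  # arithmetic works on the Value column
```

With the real file the same idea would be `pd.read_csv(url, skiprows=[1])`, assuming the tag row is always the second line of the CSV.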
In [ ]:
import pandas as pd

# URL of the dataset
url = "https://data.humdata.org/dataset/be41fcba-9ca4-461e-8628-549f330e6eee/resource/90da9bc3-e18b-4dc9-bd35-39131115e4af/download/education_btn.csv"

# Read the CSV file
df = pd.read_csv(url)

# Set pandas options to display all rows and columns
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

# Display the entire dataset
print(df)
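Note that `pd.set_option` changes the display settings for the rest of the session, and printing all 11,784 rows produces very long output. A possible alternative is `pd.option_context`, which lifts the limits only inside a `with` block. It is sketched here on a small stand-in DataFrame (the name `demo` and its contents are invented for the example).

```python
import pandas as pd

# Stand-in data; in the notebook this would be the `df` loaded above.
demo = pd.DataFrame({"Year": range(2000, 2100), "Value": range(100)})

before = pd.get_option("display.max_rows")
short_text = repr(demo)  # uses the current (limited) display settings

# The display limits are lifted only inside this block.
with pd.option_context("display.max_rows", None, "display.max_columns", None):
    full_text = repr(demo)

after = pd.get_option("display.max_rows")

print(len(short_text.splitlines()), "lines with the usual limits")
print(len(full_text.splitlines()), "lines inside the context")
print("global setting unchanged:", before == after)
```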

Wavelet

A wavelet is a small, wave-like function that is localized in both time (or space) and frequency. Sine and cosine waves, the building blocks of Fourier analysis, extend infinitely, so they describe a signal only globally. A wavelet lasts for a short duration, so it can analyze a signal locally and at multiple scales, which makes wavelets especially effective for non-stationary signals.

Key idea: wavelets are “zoomable”. You can stretch a wavelet to look at slow trends or compress it to see fast changes, so you can study both frequency (what) and time/position (when/where) together.
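The "stretch and shift" idea is usually written as a family of functions built from one mother wavelet $\psi$. The symbols $a$ (scale) and $b$ (position) are notation introduced here, not taken from the notebook:

```latex
\psi_{a,b}(t) = \frac{1}{\sqrt{a}}\,\psi\!\left(\frac{t-b}{a}\right), \qquad a > 0,\; b \in \mathbb{R}
```

A large $a$ stretches the wavelet (slow trends), a small $a$ compresses it (fast changes), and $b$ slides it along the signal. The factor $1/\sqrt{a}$ keeps the energy of the wavelet the same at every scale.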
In [1]:
!pip install pywavelets
Requirement already satisfied: pywavelets in /opt/conda/lib/python3.13/site-packages (1.9.0)
Requirement already satisfied: numpy<3,>=1.25 in /opt/conda/lib/python3.13/site-packages (from pywavelets) (2.3.3)

Discrete Wavelet Transform (DWT)

In [2]:
import pywt
import numpy as np
import matplotlib.pyplot as plt

# Sample signal
t = np.linspace(0, 1, 400)
signal = np.sin(2 * np.pi * 5 * t) + np.sin(2 * np.pi * 20 * t)

# Perform DWT
coeffs = pywt.dwt(signal, 'db4')
cA, cD = coeffs  # Approximation and Detail coefficients

# Plot results
plt.figure(figsize=(10, 6))

plt.subplot(3, 1, 1)
plt.plot(signal)
plt.title("Original Signal")

plt.subplot(3, 1, 2)
plt.plot(cA)
plt.title("Approximation Coefficients (cA)")

plt.subplot(3, 1, 3)
plt.plot(cD)
plt.title("Detail Coefficients (cD)")

plt.tight_layout()
plt.show()
[Figure: three stacked panels showing the original signal, the approximation coefficients (cA), and the detail coefficients (cD)]

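A single-level DWT separates a signal into low- and high-frequency parts:

  • Approximation (cA) = general trend / smooth part (the lower half of the frequency band).
  • Detail (cD) = fine features / rapid changes (the upper half of the frequency band).

Each output has roughly half as many samples as the input. DWT is useful for denoising, compression, and feature extraction. The choice of wavelet (here db4) affects how the signal is decomposed. Wavelet analysis gives time-localized frequency information, unlike a plain FFT, which gives only global frequencies. In this example both components (5 Hz and 20 Hz) are slow relative to the sampling rate of about 400 Hz, so nearly all of the signal lands in cA and cD stays close to zero away from the edges.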
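To see what "approximation" and "detail" mean at the level of individual samples, here is a minimal sketch of a one-level transform written by hand. It uses the Haar wavelet, the simplest one, not the db4 wavelet used in the cell above, and the function names `haar_dwt` and `haar_idwt` are made up for this example. The approximation is a scaled sum of neighbouring samples, the detail is a scaled difference, and together they rebuild the input exactly.

```python
import math

def haar_dwt(x):
    """One-level Haar DWT of an even-length sequence.

    Returns (cA, cD): scaled pairwise sums and scaled pairwise differences.
    """
    if len(x) % 2:
        raise ValueError("this sketch expects an even number of samples")
    s = math.sqrt(2)
    cA = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    cD = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return cA, cD

def haar_idwt(cA, cD):
    """Invert haar_dwt: rebuild the original samples from (cA, cD)."""
    s = math.sqrt(2)
    x = []
    for a, d in zip(cA, cD):
        x.extend([(a + d) / s, (a - d) / s])
    return x

# Flat stretches with one sudden jump between 9.0 and 1.0.
x = [4.0, 4.0, 5.0, 5.0, 9.0, 1.0, 2.0, 2.0]
cA, cD = haar_dwt(x)
x_rec = haar_idwt(cA, cD)

print("cA:", [round(v, 3) for v in cA])
print("cD:", [round(v, 3) for v in cD])
print("reconstructed:", [round(v, 3) for v in x_rec])
```

The detail coefficients are zero wherever neighbouring samples are equal and large only at the jump, which is the time-localized information the text describes.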


Filters

A filter modifies a signal by letting certain components pass and blocking others. Think of it as a sieve for data:

  • Low-pass filter → allows low frequencies (slow changes) and removes high frequencies (fast noise).
  • High-pass filter → allows high frequencies (fast changes) and removes low frequencies (slow trends).
  • Band-pass filter → allows a specific frequency range to pass.
  • Band-stop filter → blocks a specific frequency range.
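As a rough illustration of the low-pass / high-pass idea, here is a sketch using a three-point weighted average. This is not a Butterworth filter, just about the simplest smoothing filter there is, and the name `smooth3` is made up for this example. Smoothing acts as a low-pass filter, and whatever the smoothing removes is the high-pass part.

```python
def smooth3(x):
    """Crude low-pass: weights 0.25, 0.5, 0.25 over each sample and its
    two neighbours. The first and last samples are copied through unchanged."""
    out = list(x)
    for i in range(1, len(x) - 1):
        out[i] = 0.25 * x[i - 1] + 0.5 * x[i] + 0.25 * x[i + 1]
    return out

# A slow ramp plus a fast component that flips sign at every sample.
slow = [0.1 * i for i in range(20)]
fast = [1.0 if i % 2 == 0 else -1.0 for i in range(20)]
x = [s + f for s, f in zip(slow, fast)]

low = smooth3(x)                            # low-pass output: close to the ramp
high = [xi - li for xi, li in zip(x, low)]  # high-pass output: what was removed

print("low :", [round(v, 2) for v in low[1:6]])
print("high:", [round(v, 2) for v in high[1:6]])
```

Away from the two end samples, `low` recovers the ramp and `high` recovers the alternating component, because these particular weights cancel a sample-to-sample flip completely.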

In [12]:
!pip install scipy matplotlib numpy
import numpy as np
import matplotlib.pyplot as plt
from scipy import signal
Requirement already satisfied: scipy in /opt/conda/lib/python3.13/site-packages (1.16.2)
Requirement already satisfied: matplotlib in /opt/conda/lib/python3.13/site-packages (3.10.7)
Requirement already satisfied: numpy in /opt/conda/lib/python3.13/site-packages (2.3.3)
Requirement already satisfied: contourpy>=1.0.1 in /opt/conda/lib/python3.13/site-packages (from matplotlib) (1.3.3)
Requirement already satisfied: cycler>=0.10 in /opt/conda/lib/python3.13/site-packages (from matplotlib) (0.12.1)
Requirement already satisfied: fonttools>=4.22.0 in /opt/conda/lib/python3.13/site-packages (from matplotlib) (4.60.1)
Requirement already satisfied: kiwisolver>=1.3.1 in /opt/conda/lib/python3.13/site-packages (from matplotlib) (1.4.9)
Requirement already satisfied: packaging>=20.0 in /opt/conda/lib/python3.13/site-packages (from matplotlib) (25.0)
Requirement already satisfied: pillow>=8 in /opt/conda/lib/python3.13/site-packages (from matplotlib) (11.3.0)
Requirement already satisfied: pyparsing>=3 in /opt/conda/lib/python3.13/site-packages (from matplotlib) (3.2.5)
Requirement already satisfied: python-dateutil>=2.7 in /opt/conda/lib/python3.13/site-packages (from matplotlib) (2.9.0.post0)
Requirement already satisfied: six>=1.5 in /opt/conda/lib/python3.13/site-packages (from python-dateutil>=2.7->matplotlib) (1.17.0)

Signal composed of two sine waves

  • One slow (5 Hz) and one fast (50 Hz, at half the amplitude), sampled at 500 Hz for 1 second
In [13]:
fs = 500  # Sampling frequency (Hz)
t = np.linspace(0, 1, fs, endpoint=False)  # 1 second
signal_data = np.sin(2*np.pi*5*t) + 0.5*np.sin(2*np.pi*50*t)

plt.figure(figsize=(10, 4))
plt.plot(t, signal_data)
plt.title("Original Signal")
plt.xlabel("Time [s]")
plt.ylabel("Amplitude")
plt.show()
[Figure: "Original Signal", amplitude vs. time in seconds]

Low-Pass Filter

Keeps slow-changing signals and removes high-frequency noise. With a 10 Hz cutoff, the 5 Hz component passes and the 50 Hz component is removed.

In [14]:
# Design Butterworth low-pass filter
cutoff = 10  # Hz
b, a = signal.butter(4, cutoff / (0.5*fs), btype='low')
filtered_signal = signal.filtfilt(b, a, signal_data)

# Plot
plt.figure(figsize=(10, 4))
plt.plot(t, signal_data, label='Original')
plt.plot(t, filtered_signal, label='Low-Pass Filtered', color='red')
plt.title("Low-Pass Filter")
plt.xlabel("Time [s]")
plt.ylabel("Amplitude")
plt.legend()
plt.show()
[Figure: "Low-Pass Filter", original and low-pass filtered signals, amplitude vs. time in seconds]
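The expression `cutoff / (0.5*fs)` in the cell above is there because `signal.butter` by default expects the cutoff as a fraction of the Nyquist frequency (half the sampling rate), not in Hz. A small sketch of that conversion, with a hypothetical helper name `normalized_cutoff` and the cutoff values used in this notebook:

```python
def normalized_cutoff(cutoff_hz, fs_hz):
    """Express a cutoff in Hz as a fraction of the Nyquist frequency (fs / 2)."""
    nyquist = 0.5 * fs_hz
    if not 0 < cutoff_hz < nyquist:
        raise ValueError("cutoff must lie strictly between 0 and fs/2")
    return cutoff_hz / nyquist

fs = 500  # Hz, as in the notebook
for cutoff in (10, 20, 48, 52):
    print(cutoff, "Hz ->", normalized_cutoff(cutoff, fs))
```

SciPy's `butter` also accepts the sampling rate directly, for example `signal.butter(4, 10, btype='low', fs=500)`, which avoids the manual division.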

High-Pass Filter

Keeps fast-changing signals and removes slow trends. With a 20 Hz cutoff, the 50 Hz component passes and the 5 Hz component is removed.

In [15]:
# Design Butterworth high-pass filter
cutoff = 20  # Hz
b, a = signal.butter(4, cutoff / (0.5*fs), btype='high')
high_passed_signal = signal.filtfilt(b, a, signal_data)

plt.figure(figsize=(10, 4))
plt.plot(t, signal_data, label='Original')
plt.plot(t, high_passed_signal, label='High-Pass Filtered', color='green')
plt.title("High-Pass Filter")
plt.xlabel("Time [s]")
plt.ylabel("Amplitude")
plt.legend()
plt.show()
[Figure: "High-Pass Filter", original and high-pass filtered signals, amplitude vs. time in seconds]

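Band-Pass Filter

Keeps only a specific frequency range, here 5 to 20 Hz. The 50 Hz component is removed. The 5 Hz component sits exactly on the lower band edge, so it is attenuated rather than passed cleanly.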

In [16]:
# Design Butterworth band-pass filter
lowcut = 5    # Hz
highcut = 20  # Hz
b, a = signal.butter(4, [lowcut / (0.5*fs), highcut / (0.5*fs)], btype='band')
band_passed_signal = signal.filtfilt(b, a, signal_data)

plt.figure(figsize=(10, 4))
plt.plot(t, signal_data, label='Original')
plt.plot(t, band_passed_signal, label='Band-Pass Filtered', color='purple')
plt.title("Band-Pass Filter")
plt.xlabel("Time [s]")
plt.ylabel("Amplitude")
plt.legend()
plt.show()
[Figure: "Band-Pass Filter", original and band-pass filtered signals, amplitude vs. time in seconds]

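Band-Stop Filter (Notch Filter)

Removes a specific frequency range (e.g., 50 Hz noise). Here the stop band is 48 to 52 Hz, which takes out the 50 Hz component and leaves the 5 Hz component.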

In [17]:
# Design Butterworth band-stop filter around 50 Hz
lowcut = 48   # Hz
highcut = 52  # Hz
b, a = signal.butter(4, [lowcut / (0.5*fs), highcut / (0.5*fs)], btype='bandstop')
band_stopped_signal = signal.filtfilt(b, a, signal_data)

plt.figure(figsize=(10, 4))
plt.plot(t, signal_data, label='Original')
plt.plot(t, band_stopped_signal, label='Band-Stop Filtered', color='orange')
plt.title("Band-Stop Filter")
plt.xlabel("Time [s]")
plt.ylabel("Amplitude")
plt.legend()
plt.show()
[Figure: "Band-Stop Filter", original and band-stop filtered signals, amplitude vs. time in seconds]

  • signal.butter() → designs a Butterworth filter. The cutoff is given as a fraction of the Nyquist frequency, hence cutoff / (0.5*fs).
  • signal.filtfilt() → applies the filter forward and backward, so the output has zero phase distortion (no time shift).
  • btype → 'low', 'high', 'band', or 'bandstop'.

Filters can remove noise, extract trends, or isolate specific frequencies.
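To show why filtering forward and then backward cancels the time shift, here is a sketch with a very simple one-pole smoothing filter in plain Python. It illustrates the idea behind `filtfilt` and is not SciPy's implementation (which, among other things, also pads the signal edges). The function names `one_pole_lowpass` and `forward_backward` are made up for this example.

```python
def one_pole_lowpass(x, alpha):
    """Causal smoothing: y[n] = alpha * x[n] + (1 - alpha) * y[n - 1]."""
    y, prev = [], 0.0
    for sample in x:
        prev = alpha * sample + (1 - alpha) * prev
        y.append(prev)
    return y

def forward_backward(x, alpha):
    """Filter, reverse, filter again, reverse back."""
    forward = one_pole_lowpass(x, alpha)
    backward = one_pole_lowpass(forward[::-1], alpha)
    return backward[::-1]

# A single spike in the middle of a long stretch of zeros.
n, centre = 201, 100
impulse = [0.0] * n
impulse[centre] = 1.0

one_pass = one_pole_lowpass(impulse, alpha=0.3)
two_pass = forward_backward(impulse, alpha=0.3)

# One pass smears the spike only to the right (a delay);
# forward-backward smears it equally to both sides (no net delay).
print("one pass :", [round(v, 3) for v in one_pass[centre - 2: centre + 3]])
print("two pass :", [round(v, 3) for v in two_pass[centre - 2: centre + 3]])
```

The price of the second pass is that the signal is attenuated twice, so a forward-backward filter is effectively steeper than the single filter it was designed from.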


Noisy Signal

In [19]:
import numpy as np
import pywt
import matplotlib.pyplot as plt
In [20]:
fs = 500  # sampling frequency
t = np.linspace(0, 1, fs, endpoint=False)
# Original clean signal: 5 Hz sine wave
clean_signal = np.sin(2 * np.pi * 5 * t)
# Add white Gaussian noise (spread over all frequencies, not only high ones)
noisy_signal = clean_signal + 0.5*np.random.randn(len(t))

# Plot
plt.figure(figsize=(10, 4))
plt.plot(t, noisy_signal, label='Noisy Signal')
plt.plot(t, clean_signal, label='Clean Signal', linewidth=2)
plt.title("Noisy vs Clean Signal")
plt.xlabel("Time [s]")
plt.ylabel("Amplitude")
plt.legend()
plt.show()
[Figure: "Noisy vs Clean Signal", amplitude vs. time in seconds]
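Because the noise is random, each run of the cell above gives a slightly different signal. Below is a sketch of generating the same kind of signal reproducibly and putting a number on how noisy it is. It uses the standard library instead of NumPy so it runs without extra packages; the seed value and the helper name `snr_db` are choices made for this example.

```python
import math
import random

def snr_db(clean, noisy):
    """Signal-to-noise ratio in dB: mean signal power over mean noise power."""
    signal_power = sum(c * c for c in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)

fs = 500
rng = random.Random(0)  # fixed seed so the noise is the same on every run
t = [i / fs for i in range(fs)]
clean = [math.sin(2 * math.pi * 5 * ti) for ti in t]
noisy = [c + rng.gauss(0, 0.5) for c in clean]

print(f"SNR of the noisy signal: {snr_db(clean, noisy):.1f} dB")
```

The same `snr_db` function can be applied to a filtered or denoised signal to check whether a method actually improved things. In NumPy, the equivalent of the fixed seed would be a generator created with `np.random.default_rng(0)`.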

Wavelet Denoising

In [21]:
# Perform Discrete Wavelet Transform
wavelet = 'db4'
coeffs = pywt.wavedec(noisy_signal, wavelet, level=4)

# Thresholding function
def thresholding(detail_coeffs, threshold):
    return pywt.threshold(detail_coeffs, threshold, mode='soft')

# Apply threshold to detail coefficients (cD)
threshold = 0.3
coeffs[1:] = [thresholding(cD, threshold) for cD in coeffs[1:]]

# Reconstruct the denoised signal
denoised_signal = pywt.waverec(coeffs, wavelet)

# Plot
plt.figure(figsize=(10, 4))
plt.plot(t, noisy_signal, label='Noisy Signal')
plt.plot(t, denoised_signal, label='Denoised Signal', color='red')
plt.plot(t, clean_signal, label='Original Clean Signal', linewidth=2, color='green')
plt.title("Wavelet Denoising")
plt.xlabel("Time [s]")
plt.ylabel("Amplitude")
plt.legend()
plt.show()
[Figure: "Wavelet Denoising", noisy, denoised, and original clean signals, amplitude vs. time in seconds]
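Two things in the cell above are worth unpacking: what soft thresholding does to each coefficient, and where the threshold comes from. The value 0.3 is hand-picked. One common alternative, which is not what this notebook uses, is the universal threshold, which estimates the noise level from the detail coefficients themselves. The sketch below writes both in plain Python; the function names are made up for this example.

```python
import math
import statistics

def soft_threshold(values, threshold):
    """Set values within +/- threshold to zero and shrink the rest
    toward zero by the threshold."""
    out = []
    for v in values:
        if v > threshold:
            out.append(v - threshold)
        elif v < -threshold:
            out.append(v + threshold)
        else:
            out.append(0.0)
    return out

def universal_threshold(finest_detail_coeffs, n_samples):
    """Universal threshold: sigma * sqrt(2 * ln(N)), with sigma estimated
    as median(|cD|) / 0.6745 from the finest-level detail coefficients."""
    sigma = statistics.median(abs(c) for c in finest_detail_coeffs) / 0.6745
    return sigma * math.sqrt(2 * math.log(n_samples))

shrunk = soft_threshold([-1.0, -0.2, 0.0, 0.25, 0.8], 0.3)
print([round(v, 3) for v in shrunk])
```

In the notebook's terms, the finest-level detail coefficients would be `coeffs[-1]` from `pywt.wavedec`, and `n_samples` would be `len(noisy_signal)`.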
The wavelet transform decomposes the signal into multiple levels:

  • Approximation → low-frequency (signal trend)
  • Detail → high-frequency (noise or fine features)

Thresholding shrinks the small detail coefficients, which are mostly noise. The inverse wavelet transform then reconstructs a cleaned signal.

Compared with a standard low-pass filter, this method can remove noise with less blurring of sharp features in the signal. For a smooth signal such as the 5 Hz sine used here, a low-pass filter may perform just as well.