Assignments¶
- Fit a function
- Fit a machine learning model to your data
Fit a Function¶
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import linregress
# Load dataset
df = pd.read_csv("datasets/data.csv")
# Choose columns
x = df["Kindly Rate your Sleep Quality 😴"]
y = df["How would you rate your stress levels?"]
size = df["How many times a week do you suffer headaches 🤕?"] * 50 # scale bubble size
# Fit linear regression
slope, intercept, r_value, p_value, std_err = linregress(x, y)
print(f"Slope: {slope:.2f}, Intercept: {intercept:.2f}, R²: {r_value**2:.2f}")
# Predicted values
y_pred = slope * x + intercept
# Create bubble chart
plt.figure(figsize=(10,6))
scatter = plt.scatter(x, y, s=size, alpha=0.6, c=size, cmap='viridis', edgecolors='w', linewidth=0.5)
# Plot regression line
plt.plot(x, y_pred, color='red', linewidth=2, label=f"Linear Fit: y = {slope:.2f}x + {intercept:.2f}")
plt.title("Bubble Chart: Sleep Quality vs Stress Levels")
plt.xlabel("Sleep Quality")
plt.ylabel("Stress Levels")
plt.colorbar(scatter, label="Number of headaches per week")
plt.legend()
plt.grid(True)
plt.show()
Slope: 0.33, Intercept: 1.75, R²: 0.08
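Computes the best-fit line using ordinary least squares (scipy.stats.linregress).
Prints the slope, intercept, and R² for the fit. Here R² = 0.08, so the line explains only about 8% of the variation in stress ratings, which indicates a weak linear relationship.
Plots the regression line in red on top of the bubble chart.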
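To make the least-squares step more concrete, here is a minimal sketch of the same calculation done by hand with NumPy. The data is synthetic (a made-up stand-in for the survey columns, not the real dataset), and the variable names are only illustrative.

```python
import numpy as np

# Synthetic stand-in for two survey columns (assumption: NOT the real data)
rng = np.random.default_rng(0)
x = rng.integers(1, 6, size=200).astype(float)      # ratings from 1 to 5
y = 0.3 * x + 2.0 + rng.normal(0, 1.0, size=200)    # noisy linear relation

# Closed-form ordinary least-squares estimates
x_mean, y_mean = x.mean(), y.mean()
slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
intercept = y_mean - slope * x_mean

# R² as the squared Pearson correlation between x and y
r_squared = np.corrcoef(x, y)[0, 1] ** 2

# The same R² from the residuals: 1 - SS_res / SS_tot
y_pred = slope * x + intercept
ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - y_mean) ** 2)

# Cross-check against NumPy's degree-1 polynomial fit
slope_np, intercept_np = np.polyfit(x, y, 1)

print(f"Slope: {slope:.2f}, Intercept: {intercept:.2f}, R²: {r_squared:.2f}")
```

For a straight-line fit with one predictor, the squared correlation and 1 - SS_res / SS_tot give the same R², which is why squaring r_value in the cell above is valid.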
Fit a polynomial function to the daily CO₂ emissions in Sweden using numpy.polyfit¶
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# 1. Load dataset
df = pd.read_csv("datasets/climate.csv")
df['date'] = pd.to_datetime(df['date'])
# 2. Filter for Sweden
df_sweden = df[df['country'] == "Sweden"].sort_values('date')
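# 3. Convert dates to numeric values (ordinal), shifted so the first day is 0
#    (raw ordinals are in the hundreds of thousands, which makes polyfit poorly conditioned)
x = df_sweden['date'].map(pd.Timestamp.toordinal).values
x = x - x.min()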
y = df_sweden['co2_emission'].values
# 4. Fit a polynomial (degree 3)
degree = 3
coeffs = np.polyfit(x, y, degree)
poly = np.poly1d(coeffs)
# 5. Generate fitted values
y_fit = poly(x)
# 6. Plot original data and fitted curve
plt.figure(figsize=(14,6))
plt.scatter(df_sweden['date'], y, s=10, alpha=0.5, label="Daily CO2 Emission")
plt.plot(df_sweden['date'], y_fit, color='red', linewidth=2, label=f"Polynomial Fit (degree {degree})")
plt.title("Daily CO2 Emission in Sweden with Polynomial Fit")
plt.xlabel("Date")
plt.ylabel("CO2 Emission")
plt.legend()
plt.show()
What I did:¶
I looked at daily CO₂ emissions in Sweden.
Since the data covers many days, I wanted to see a smooth trend instead of just individual points.
I used a polynomial function (a curved line) to approximate the trend.
The curve is calculated with numpy.polyfit, which picks the coefficients that minimise the squared error between the curve and the data.
degree=3 means it's a cubic curve, which can change direction at most twice while following the data.
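A small sketch of what polyfit returns may help here. It uses synthetic data (a made-up cubic trend plus noise, not the Sweden data), and the coefficient values are arbitrary assumptions chosen only for illustration.

```python
import numpy as np

# Synthetic daily series (assumption: invented cubic trend plus noise)
rng = np.random.default_rng(1)
days = np.arange(365, dtype=float)            # day index 0..364
true_coeffs = [2e-6, -1e-3, 0.1, 50.0]        # highest power first
y_true = np.polyval(true_coeffs, days)
y = y_true + rng.normal(0, 0.5, size=days.size)

# Fit a degree-3 polynomial: returns 4 coefficients, highest power first
coeffs = np.polyfit(days, y, 3)
poly = np.poly1d(coeffs)                      # callable polynomial object
y_fit = poly(days)

# How far the fitted curve is from the noise-free trend, on average
rmse_vs_trend = float(np.sqrt(np.mean((y_fit - y_true) ** 2)))
print("coefficients:", coeffs)
print("RMSE against the true trend:", rmse_vs_trend)
```

A degree-n fit always returns n + 1 coefficients, and np.poly1d wraps them so the curve can be evaluated at any x.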
Reading the graph:¶
Dots (scatter points): Each dot shows the actual CO₂ emission on that day.
Red curve (polynomial fit): This line shows the general trend of CO₂ emissions over time.
When the line goes up, it means emissions are generally increasing.
When the line goes down, it means emissions are generally decreasing.
The curve smooths out daily fluctuations so you can see the big picture instead of tiny day-to-day changes.
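Whether the fitted curve is going up or down can also be read from its derivative rather than by eye. The sketch below assumes an exact, made-up cubic (not the emissions data) so the result is easy to check.

```python
import numpy as np

# Made-up cubic that rises, falls, then rises again (assumption: toy data)
x = np.linspace(-2.0, 2.0, 401)
y = x**3 - 3 * x

poly = np.poly1d(np.polyfit(x, y, 3))

# Derivative of the fitted polynomial: positive where the trend is rising
slope = poly.deriv()
rising = slope(x) > 0

# Turning points are where the derivative is zero
turning_points = np.sort(np.real(slope.roots))
print("turning points:", turning_points)
```

For this toy curve the trend rises up to x = -1, falls until x = 1, and rises again afterwards. The same deriv() call works on the poly object from the Sweden fit.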
Daily CO2 Emission of China with Polynomial Trend Using a Bar Graph¶
# %% [markdown]
# # Daily CO2 Emission of China with Polynomial Trend (Fixed)
# %%
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
sns.set_theme(style="whitegrid")
plt.rcParams["figure.figsize"] = (16,6)
# %%
# Load dataset
df = pd.read_csv("datasets/climate.csv")
df['date'] = pd.to_datetime(df['date'])
# Filter for China
df_china = df[df['country'] == "China"].sort_values('date')
# %%
# Prepare data for polynomial fit
x = np.arange(len(df_china)) # numeric x for polyfit
y = df_china['co2_emission'].values
# Fit polynomial of degree 5
degree = 5
coeffs = np.polyfit(x, y, degree)
poly = np.poly1d(coeffs)
y_fit = poly(x)
# %%
# Plot bar graph (numeric x) with trend line
plt.figure(figsize=(16,6))
plt.bar(x, y, color='skyblue', edgecolor='black', width=1, label='Daily CO2 Emission')
plt.plot(x, y_fit, color='red', linewidth=2, label=f'Polynomial Fit (deg={degree})')
# Set x-ticks to show some dates (e.g., every 30 days for readability)
step = 30
plt.xticks(x[::step], df_china['date'].dt.strftime('%Y-%m-%d')[::step], rotation=45)
plt.title("Daily CO2 Emission of China with Polynomial Trend", fontsize=16, fontweight='bold')
plt.xlabel("Date")
plt.ylabel("CO2 Emission")
plt.legend()
plt.tight_layout()
plt.show()
Using Two Years of Data¶
# %% [markdown]
# # Daily CO2 Emission of China (Two Years) with Polynomial Trend
# %%
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
sns.set_theme(style="whitegrid")
plt.rcParams["figure.figsize"] = (16,6)
# %%
# Load dataset
df = pd.read_csv("datasets/climate.csv")
df['date'] = pd.to_datetime(df['date'])
# Filter for China and only two years (2020 and 2021)
df_china = df[(df['country'] == "China") & (df['date'].dt.year.isin([2020, 2021]))].sort_values('date')
# %%
# Prepare data for polynomial fit
x = np.arange(len(df_china)) # numeric x for polyfit
y = df_china['co2_emission'].values
# Fit polynomial of degree 5
degree = 5
coeffs = np.polyfit(x, y, degree)
poly = np.poly1d(coeffs)
y_fit = poly(x)
# %%
# Plot bar graph with polynomial trend
plt.figure(figsize=(16,6))
plt.bar(x, y, color='skyblue', edgecolor='black', width=1, label='Daily CO2 Emission')
plt.plot(x, y_fit, color='red', linewidth=2, label=f'Polynomial Fit (deg={degree})')
# Set x-ticks every 15 days for readability
step = 15
plt.xticks(x[::step], df_china['date'].dt.strftime('%Y-%m-%d')[::step], rotation=45)
plt.title("Daily CO2 Emission of China (2020-2021) with Polynomial Trend", fontsize=16, fontweight='bold')
plt.xlabel("Date")
plt.ylabel("CO2 Emission")
plt.legend()
plt.tight_layout()
plt.show()
Graph Explanation¶
Bars (blue): Each bar shows the daily CO2 emission in China for the years 2020 and 2021. Taller bars mean more CO2 was emitted that day.
Red line: This is a polynomial trend line (degree 5). It smooths out daily ups and downs to show the overall pattern in emissions over time.
X-axis: Shows the date, with labels every 15 days to make it easier to read.
Y-axis: Shows the amount of CO2 emitted each day.
Explanation:
The daily emissions go up and down a lot. This is normal because of day-to-day changes in industrial activity, energy use, or other factors.
The red trend line gives a clear picture of the overall pattern, showing when emissions were generally rising or falling during these two years.
You can use this trend to see periods of higher or lower CO2 emissions, even if individual days vary a lot.
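The x-axis labelling used in the plot can be sketched on its own. The example assumes a complete daily date range for 2020 and 2021 with no missing days; with gaps in the data, every 15th row would no longer be exactly 15 days apart.

```python
import numpy as np
import pandas as pd

# Assumption: one row per calendar day across 2020 and 2021
dates = pd.Series(pd.date_range("2020-01-01", "2021-12-31", freq="D"))
x = np.arange(len(dates))                    # integer bar positions

step = 15
tick_positions = x[::step]                   # every 15th bar
tick_labels = dates.dt.strftime("%Y-%m-%d")[::step].tolist()

print(len(tick_positions), tick_labels[:3])
```

These two sequences are what plt.xticks receives in the cell above: the positions say where to draw a tick, and the labels say what date to print there.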
References¶
Reference Prompts for Function Fitting¶
- Polynomial Fit
Prompt: "Fit a polynomial function to the daily CO₂ emissions of China using numpy.polyfit. Use the date converted to numeric values (ordinal) as the x-axis. Overlay the polynomial trend on a bar graph of daily emissions. For readability, consider using only two years of data (e.g., 2020–2021)."
- Polynomial Degree Selection
Prompt: "Explore polynomial fits of different degrees (e.g., 3, 5, 7) to the daily CO₂ emissions of China. Compare the fitted curves and evaluate which degree balances trend capture and overfitting. Plot each polynomial curve over the bar graph of actual emissions."
- Linear Regression Fit
Prompt: "Fit a linear regression model to predict daily CO₂ emissions of China using features such as date (converted to ordinal), energy consumption, average temperature, and humidity. Split the data into training and test sets. Overlay the predicted values on a bar graph of actual emissions for comparison."
- Train/Test Evaluation
Prompt: "After fitting a regression model (linear or polynomial) to daily CO₂ emissions, evaluate model performance using mean squared error (MSE) and R² score. Visualize the predicted values alongside actual values using a bar graph or line plot to assess fit quality."
- Two-Year Focused Fit
Prompt: "Filter the CO₂ emissions data for China to include only two years (e.g., 2020–2021). Fit a polynomial or linear regression function to the filtered dataset. Plot the actual daily CO₂ emissions as bars and overlay the fitted function to highlight trends within the selected period."
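As a rough illustration of the degree-selection and train/test prompts above, the sketch below compares degrees 3, 5 and 7 on a held-out test set using MSE and R². The series is synthetic (an invented wavy trend plus noise, not the China data), the mse and r2 helpers are hypothetical names written for this example, and the random split is only one possible way to hold out data.

```python
import numpy as np

# Synthetic two-year daily series (assumption: invented trend plus noise)
rng = np.random.default_rng(42)
n = 730
t = np.linspace(-1.0, 1.0, n)                # time rescaled to [-1, 1]
y = 10 + 2 * np.sin(2 * np.pi * t) + rng.normal(0, 0.5, n)

# Random 80/20 train/test split
idx = rng.permutation(n)
test_idx, train_idx = idx[: n // 5], idx[n // 5:]

def mse(y_true, y_pred):
    """Mean squared error."""
    return float(np.mean((y_true - y_pred) ** 2))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1 - ss_res / ss_tot)

# Fit each degree on the training points, score on the test points
results = {}
for degree in (3, 5, 7):
    coeffs = np.polyfit(t[train_idx], y[train_idx], degree)
    pred = np.polyval(coeffs, t[test_idx])
    results[degree] = (mse(y[test_idx], pred), r2(y[test_idx], pred))

for degree, (err, score) in results.items():
    print(f"degree {degree}: test MSE = {err:.3f}, test R² = {score:.3f}")
```

A degree whose test error stops improving (or gets worse) compared with a lower degree is a sign that the extra flexibility is fitting noise rather than trend.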