week03: function fitting
Lessons learned from this session
This session covered the mathematical foundations for fitting functions to data. It began with scalar, vector, and matrix operations: transposition, multiplication, identity and inverse matrices, and determinants. It then surveyed function types, from linear and affine functions to polynomials and nonlinear models (trigonometric functions, radial basis functions, neural networks). Core calculus tools were reviewed: summation, integration, derivatives (the first derivative measures slope, the second measures curvature), and distance metrics. For fitting, the Vandermonde matrix arranges powers of the input values in columns, so that a polynomial fit becomes a linear system in the coefficients. Residuals are the differences between the data and the fitted values, and a loss function condenses them into a single measure of fit quality: least squares uses the squared L² norm, while the L¹ norm is more robust to outliers. The fitting methods covered were linear least squares via the SVD, for models that are linear in their parameters, and polynomial routines such as polyfit for one-dimensional data.
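The Vandermonde matrix, residuals, and the two loss functions mentioned above can be sketched in a few lines. This is a minimal illustration, not the session's own code: the data are synthetic (a known quadratic plus noise, chosen here only for demonstration), and the variable names are assumptions.

```python
import numpy as np

# Synthetic data (an assumption for illustration): a known quadratic plus noise
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 50)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(scale=0.1, size=x.size)

# Vandermonde matrix with columns [1, x, x^2]
degree = 2
V = np.vander(x, degree + 1, increasing=True)

# Linear least squares; np.linalg.lstsq relies on an SVD-based LAPACK routine
coefs, _, rank, singular_values = np.linalg.lstsq(V, y, rcond=None)

# Residuals: data minus fitted values
residuals = y - V @ coefs

# Two ways to condense the residuals into one number
l2_loss = np.sum(residuals**2)       # least-squares loss (squared L2 norm)
l1_loss = np.sum(np.abs(residuals))  # L1 loss, less sensitive to outliers

print("coefficients (ascending degree):", coefs)
print("L2 loss:", l2_loss)
print("L1 loss:", l1_loss)
```

Note that the least-squares solution minimizes the L² loss only; the L¹ value is computed here for comparison, and minimizing it directly would need a different solver.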
function fitting on datasets
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from numpy.polynomial.polynomial import Polynomial
# -------------------------------
# 1. Load CSV
# -------------------------------
df = pd.read_csv("datasets/viii_2023.csv")
print(df.head())
# -------------------------------
# 2. Choose a column to fit
# -------------------------------
subject = "Maths" # change to your subject
# Convert to numeric, coercing non-numeric entries to NaN
y = pd.to_numeric(df[subject], errors='coerce').to_numpy()
# Use the row position as x, then drop entries where y is NaN
x = np.arange(len(y))
mask = ~np.isnan(y)
x_clean = x[mask]
y_clean = y[mask]
# -------------------------------
# 3. Fit a polynomial (degree = 2)
# -------------------------------
degree = 2
coefs = Polynomial.fit(x_clean, y_clean, degree).convert().coef
print("Function coefficients:", coefs)
# Create fitted values
p = Polynomial(coefs)
y_fit = p(x_clean)
# -------------------------------
# 4. Plot real data & fitted curve
# -------------------------------
plt.figure(figsize=(8,5))
plt.scatter(x_clean, y_clean, label="Actual Data")
plt.plot(x_clean, y_fit, linewidth=2, label=f"Polynomial Fit (degree {degree})")
plt.xlabel("Student Index")
plt.ylabel(subject)
plt.title(f"Function Fit for {subject}")
plt.legend()
plt.grid(True)
plt.show()
                Unnamed: 0 Dzongkha English Geography History    ICT  Maths  \
0            Sangay Tenzin    66.63    59.8        57   60.45  47.06   58.3
1          Sujandeep Sunar    72.13   79.35     81.88    77.2  64.75   77.6
2             Singye Dorji    69.32    70.9     58.25    63.6  59.38  60.28
3  Tenzin Wangyal Tshering    70.25   83.95      86.7    81.5     71  79.85
4            Sushmita Kami    73.69   81.85     73.18   82.05  63.31  62.05

  Science Unnamed: 8
0   48.35       Pass
1   69.53       Pass
2   55.05       Pass
3   73.95       Pass
4    61.8       Pass
Function coefficients: [ 574.4655493 -130.72887369 5.05377835]
EXPLANATION
This code reads a CSV file of student marks and fits a second-degree polynomial to one subject column, here "Maths". It imports numpy for numerical operations, pandas for data handling, matplotlib.pyplot for plotting, and Polynomial from numpy.polynomial.polynomial for the fit. After loading the CSV into a DataFrame and printing the first five rows, it converts the chosen column to numeric values, turning any non-numeric entries into NaN, and drops those entries. The x values are simply the row positions (0, 1, 2, ...), so the fit describes how the scores vary with the students' order in the file rather than with any measured variable. Polynomial.fit performs the least-squares fit in a scaled domain, and convert() maps the result back to the original x values; the printed coefficients are ordered from the constant term up to the x² term. Finally, the code evaluates the polynomial at the cleaned x values and plots the actual scores as a scatter plot with the fitted curve as a line, adding axis labels, a title, a legend, and a grid.
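Coefficient order is easy to get wrong when switching between fitting routines, so the sketch below compares Polynomial.fit with np.polyfit. It uses synthetic, noise-free data (an assumption made here so the true coefficients are known), not the student dataset.

```python
import numpy as np
from numpy.polynomial.polynomial import Polynomial

# Synthetic, noise-free data (an assumption): y = 4 - 0.5*x + 0.25*x^2
x = np.arange(20, dtype=float)
y = 4.0 - 0.5 * x + 0.25 * x**2

fitted = Polynomial.fit(x, y, 2)

# Coefficients in the internally scaled domain; not valid for raw x values
scaled_coefs = fitted.coef

# Coefficients in terms of the original x, lowest degree first
ascending = fitted.convert().coef

# np.polyfit returns the highest degree first
descending = np.polyfit(x, y, 2)

print("scaled domain:", scaled_coefs)
print("ascending    :", ascending)
print("descending   :", descending)
```

Skipping convert() is a common source of confusing coefficients, because the scaled values only make sense together with the domain stored on the fitted object.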