[Kuenzang Dorji] - Fab Futures - Data Science

week03: function fitting

Lessons learned from this session

This session covers the mathematical foundations for fitting functions to data. It starts with scalar, vector, and matrix operations: transposition, multiplication, identity and inverse matrices, and determinants. It then surveys function types, from linear and affine to polynomial and nonlinear models (trigonometric, radial basis, neural networks). Core calculus concepts are reviewed: summation, integration, derivatives (which measure slope and curvature), and distance metrics. For fitting, the Vandermonde matrix arranges powers of the input data as columns, residuals measure the difference between each data point and the fit, and a loss function summarizes overall fit quality: least squares uses the L² norm, while the L¹ norm is more robust to outliers. The specific fitting methods covered are linear least squares via SVD for models that are linear in their parameters, and polynomial fitting routines such as polyfit for the one-dimensional case.
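
The ideas in the summary above can be sketched in a few lines of NumPy. This is a minimal illustration and not code from the session: the data are synthetic (an assumed noise-free quadratic), and the variable names are made up for the example. It builds a Vandermonde matrix, solves the linear least-squares problem with `np.linalg.lstsq` (which relies on an SVD-based solver), and computes the residuals together with their L² and L¹ losses.

```python
import numpy as np

# Synthetic data, assumed for illustration only: y = 1 + 2x + 3x^2 with no noise
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x + 3.0 * x**2

# Vandermonde matrix with columns [1, x, x^2] (lowest power first)
V = np.vander(x, N=3, increasing=True)

# Linear least squares: find coef minimizing ||V @ coef - y||^2
coef, _, rank, _ = np.linalg.lstsq(V, y, rcond=None)

# Residuals are the differences between the data and the fit
residuals = y - V @ coef

# Two loss functions on the same residuals
l2_loss = np.sum(residuals**2)        # least squares (L2 norm squared)
l1_loss = np.sum(np.abs(residuals))   # L1 norm, less sensitive to outliers

print("coefficients:", coef)
print("L2 loss:", l2_loss, "L1 loss:", l1_loss)
```

Because the model is linear in its parameters, the same pattern should carry over to other basis functions by changing the columns of `V`.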

Function fitting on datasets

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from numpy.polynomial.polynomial import Polynomial

# -------------------------------
# 1. Load CSV
# -------------------------------
df = pd.read_csv("datasets/viii_2023.csv")
print(df.head())

# -------------------------------
# 2. Choose a column to fit
# -------------------------------
subject = "Maths"  # change to your subject

# Convert to numeric; non-numeric entries become NaN
y = pd.to_numeric(df[subject], errors='coerce').to_numpy()

# Use the row index as x, then drop rows where the score is NaN
x = np.arange(len(y))
mask = ~np.isnan(y)
x_clean = x[mask]
y_clean = y[mask]

# -------------------------------
# 3. Fit a polynomial (degree = 2)
# -------------------------------
degree = 2
coefs = Polynomial.fit(x_clean, y_clean, degree).convert().coef
print("Function coefficients:", coefs)

# Create fitted values
p = Polynomial(coefs)
y_fit = p(x_clean)

# -------------------------------
# 4. Plot real data & fitted curve
# -------------------------------
plt.figure(figsize=(8,5))
plt.scatter(x_clean, y_clean, label="Actual Data")
plt.plot(x_clean, y_fit, linewidth=2, label=f"Polynomial Fit (degree {degree})")
plt.xlabel("Student Index")
plt.ylabel(subject)
plt.title(f"Function Fit for {subject}")
plt.legend()
plt.grid(True)
plt.show()
                Unnamed: 0 Dzongkha English Geography History   ICT   Maths  \
0            Sangay Tenzin    66.63    59.8        57   60.45  47.06   58.3   
1          Sujandeep Sunar    72.13   79.35     81.88    77.2  64.75   77.6   
2             Singye Dorji    69.32    70.9     58.25    63.6  59.38  60.28   
3  Tenzin Wangyal Tshering    70.25   83.95      86.7    81.5     71  79.85   
4            Sushmita Kami    73.69   81.85     73.18   82.05  63.31  62.05   

  Science Unnamed: 8  
0   48.35       Pass  
1   69.53       Pass  
2   55.05       Pass  
3   73.95       Pass  
4    61.8       Pass  
Function coefficients: [ 574.4655493  -130.72887369    5.05377835]
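[Figure: "Function Fit for Maths". Scatter plot of the actual Maths scores with the degree-2 polynomial fit drawn as a line; x-axis "Student Index", y-axis "Maths".]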

EXPLANATION

This Python code reads a CSV file of student scores and fits a polynomial to one chosen subject column, here "Maths". It first imports the libraries it needs: numpy for numerical operations, pandas for data handling, matplotlib.pyplot for plotting, and Polynomial from numpy.polynomial.polynomial for the fit. It loads the CSV into a DataFrame and prints the first few rows. It then converts the chosen column to numeric values, turning any non-numeric entries into NaN, and drops those rows. The x values are simply the row positions (0, 1, 2, ...), so the fitted curve describes how scores vary with the order of the rows in the file, not with another measured variable. On the cleaned data it fits a second-degree polynomial with Polynomial.fit and calls .convert() so that the coefficients refer to the original x values; they are listed from the constant term up to the x² term. Finally, it evaluates the polynomial at each x and plots the actual scores as a scatter plot with the fitted curve as a line, adding axis labels, a title, a legend, and a grid.
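
The cleaning and fitting steps described above can be checked on data where the answer is known. The following is a small sketch under stated assumptions: the DataFrame `df_demo`, the entry "absent", and the coefficients 50, 2, and 0.5 are all invented for the example and do not come from the CSV. It shows that the non-numeric entry is dropped and that the fit recovers the coefficients used to generate the scores.

```python
import numpy as np
import pandas as pd
from numpy.polynomial.polynomial import Polynomial

# Hypothetical scores generated from 50 + 2x + 0.5x^2 (not real student data)
x_all = np.arange(8)
true_scores = 50 + 2 * x_all + 0.5 * x_all**2

# Store the scores as text and replace one with a non-numeric entry
raw = [str(v) for v in true_scores]
raw[3] = "absent"
df_demo = pd.DataFrame({"Maths": raw})

# Same cleaning steps as the notebook cell: coerce to numeric, mask out NaN
y = pd.to_numeric(df_demo["Maths"], errors="coerce").to_numpy()
mask = ~np.isnan(y)
x_clean, y_clean = x_all[mask], y[mask]

# Fit a degree-2 polynomial; convert() expresses it in the original x values
fit = Polynomial.fit(x_clean, y_clean, 2)
coefs = fit.convert().coef  # ordered constant, x, x^2

# Root-mean-square error of the fit on the cleaned data
residuals = y_clean - fit(x_clean)
rmse = np.sqrt(np.mean(residuals**2))

print("points kept:", mask.sum())
print("coefficients:", coefs)
print("RMSE:", rmse)
```

On real data the RMSE will not be near zero; comparing it across a few polynomial degrees is one simple way to judge whether a higher degree is actually improving the fit.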
