CLASS 3: Fitting data
For this assignment I asked ChatGPT for a list of birth data for a hypothetical case (so that I have clean data I can check).
The file contains the year and the number of births for each year from 1975 to 2024.
Now let's do the fitting.
Fitting the data means adjusting a mathematical function to the data you have. The fitted function is useful for later work with the data, for example to make predictions.
If x = year and y = number of births, we want to find a function of the year that gives a value close to the actual number of births.
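As a minimal sketch of what "fitting" means, the example below fits a straight line y ≈ a·x + b to five made-up points (not the births data) by least squares, using np.polyfit as in the rest of this notebook. The variable names and numbers are assumptions chosen only for illustration.

```python
import numpy as np

# Made-up example points (NOT the births dataset): they lie exactly on y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Least-squares fit of a degree-1 polynomial: returns [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)

# Residuals are the differences between the data and the fitted function
residuals = y - (slope * x + intercept)

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
print(f"sum of squared residuals = {np.sum(residuals ** 2):.3e}")
```

With real data the residuals are not zero; the fit picks the coefficients that make the sum of their squares as small as possible.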
ChatGPT conversation link:
To do the fitting in Python we need to:
1. Load the JSON file
2. Extract an array of years and an array of numbers of births
3. Fit linear and polynomial models
4. Plot the results
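The loading code in this notebook indexes each record with d["year"] and d["births"], so the file is presumably a JSON array of objects with those two keys. A minimal sketch of that assumed layout, using an inline string with invented numbers instead of the real file:

```python
import json
import numpy as np

# Hypothetical sample in the layout the notebook's code expects:
# a JSON array of objects with "year" and "births" keys (values invented).
sample_json = """
[
  {"year": 1975, "births": 650000},
  {"year": 1976, "births": 640000},
  {"year": 1977, "births": 630000}
]
"""

data = json.loads(sample_json)

# One array per field, in the same order as the records
years = np.array([d["year"] for d in data])
births = np.array([d["births"] for d in data])

print(years)
print(births)
```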
import json
import numpy as np
import matplotlib.pyplot as plt
# 1. Load the data from the JSON file
with open("datasets/nacimientos_espana_1975_2024.json") as f:
    data = json.load(f)

years = np.array([d["year"] for d in data])
births = np.array([d["births"] for d in data])

# 2. Linear fit
coeffs_linear = np.polyfit(years, births, 1)
linear_fit = np.poly1d(coeffs_linear)

# 3. Polynomial fit (degree 2)
coeffs_poly2 = np.polyfit(years, births, 2)
poly2_fit = np.poly1d(coeffs_poly2)

# 4. Create points for drawing smooth curves
years_smooth = np.linspace(years.min(), years.max(), 300)

# 5. Plot
plt.figure(figsize=(10, 6))
plt.scatter(years, births, label="Actual data")
plt.plot(years_smooth, linear_fit(years_smooth), label="Linear fit", linewidth=2)
plt.plot(years_smooth, poly2_fit(years_smooth), label="Degree-2 polynomial fit", linewidth=2)
plt.title("Fitting of births in Spain (1975–2024)")
plt.xlabel("Year")
plt.ylabel("Births")
plt.legend()
plt.grid(True)
plt.show()

# 6. Show the fitted models
print("Linear model:")
print(linear_fit)
print("\nPolynomial model (degree 2):")
print(poly2_fit)

Linear model:
-4279 x + 9.005e+06

Polynomial model (degree 2):
       2
147.5 x - 5.941e+05 x + 5.987e+08
This code produces a linear fit (which approximates the data poorly) and a degree-2 polynomial fit (a bit better, but not perfect).
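To put a number on how good or bad a fit is, one common choice is the coefficient of determination, R² = 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the sum of squared deviations from the mean. The sketch below computes it by hand with NumPy on made-up points that follow a parabola; the helper name r_squared is an assumption for illustration, not part of any library.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (illustrative helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation
    return 1.0 - ss_res / ss_tot

# Made-up data following a parabola (NOT the births dataset)
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = x ** 2

# Fit a straight line and a degree-2 polynomial to the same points
line = np.poly1d(np.polyfit(x, y, 1))
parabola = np.poly1d(np.polyfit(x, y, 2))

r2_line = r_squared(y, line(x))
r2_parabola = r_squared(y, parabola(x))

print(f"R² line:     {r2_line:.3f}")
print(f"R² parabola: {r2_parabola:.3f}")
```

On these symmetric points the best straight line is flat at the mean, so it explains none of the variation, while the parabola reproduces the points exactly.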
Let's try another approach: fit polynomials of degree 1, 2 and 3 and compare them using R²:
import json
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import r2_score
# 1. Load the JSON file
with open("datasets/nacimientos_espana_1975_2024.json") as f:
    data = json.load(f)

years = np.array([d["year"] for d in data])
births = np.array([d["births"] for d in data])

# 2. Try polynomials of degree 1, 2 and 3
models = {}
r2_scores = {}

for degree in [1, 2, 3]:
    coeffs = np.polyfit(years, births, degree)
    model = np.poly1d(coeffs)
    models[degree] = model

    # Compute R² on the same points used for the fit
    preds = model(years)
    r2 = r2_score(births, preds)
    r2_scores[degree] = r2

# 3. Show the results
print("R² of each model:")
for degree, r2 in r2_scores.items():
    print(f"Degree {degree}: R² = {r2:.6f}")

best_degree = max(r2_scores, key=r2_scores.get)
print(f"\n➡️ The best model according to R² is the degree-{best_degree} polynomial")
print(models[best_degree], "\n")

# 4. Plot the models
years_smooth = np.linspace(years.min(), years.max(), 300)

plt.figure(figsize=(10, 6))
plt.scatter(years, births, label="Actual data")

colors = {1: "red", 2: "green", 3: "blue"}
for degree, model in models.items():
    plt.plot(years_smooth, model(years_smooth),
             label=f"Degree-{degree} polynomial",
             color=colors[degree])

plt.title("Comparison of polynomial models")
plt.xlabel("Year")
plt.ylabel("Births")
plt.grid(True)
plt.legend()
plt.show()

R² of each model:
Degree 1: R² = 0.464473
Degree 2: R² = 0.556298
Degree 3: R² = 0.874157

➡️ The best model according to R² is the degree-3 polynomial
        3             2
-21.69 x + 1.302e+05 x - 2.607e+08 x + 1.739e+11
The degree-3 polynomial fits this data best (R² ≈ 0.87, against 0.46 and 0.56 for degrees 1 and 2): it has the rising and falling curves needed to follow the shape of the data.
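One practical caveat with raw years as x: values near 2000 raised to the third power are very large, which helps explain the huge coefficients in the printed models and can make the least-squares problem poorly conditioned. A common remedy is to subtract a reference year before fitting. The sketch below shows the idea on synthetic data generated from a known cubic; the reference year 2000, the helper predict and all coefficients are assumptions for illustration, not values from the births file.

```python
import numpy as np

years = np.arange(1975, 2025)   # 1975..2024 inclusive
ref_year = 2000                 # assumed reference year
t = years - ref_year            # centered x values, from -25 to 24

# Synthetic "births" from a known cubic in t (invented coefficients)
true_coeffs = np.array([2.0, -50.0, 1000.0, 400000.0])
births = np.polyval(true_coeffs, t)

# Fit a degree-3 polynomial on the centered variable
centered_fit = np.poly1d(np.polyfit(t, births, 3))

def predict(year):
    """Evaluate the centered model at a calendar year."""
    return centered_fit(year - ref_year)

print(centered_fit.coeffs)
print(predict(2000))
```

The fitted curve is the same polynomial expressed in a shifted variable, so predictions are unchanged in exact arithmetic; only the coefficients become smaller and easier to read.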