[Pablo Nuñez] - Fab Futures 2025 - Data Science

CLASS 3: Fitting data

For this assignment I asked ChatGPT to generate a dataset of births for a hypothetical case, so that I would have clean data that I could check.

The file contains the year and the number of births for each year from 1975 to 2024.
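The loading code on this page reads the file with `json.load`. It expects a list of records, each with a `"year"` key and a `"births"` key. The sketch below shows that layout. The layout is inferred from the code, not from the file itself, and the numbers are obviously made-up placeholders, not values from the real dataset:

```python
import json
import numpy as np

# Hypothetical sample in the layout the notebook code expects:
# a JSON list of objects with "year" and "births" keys.
# The numbers are placeholders, not values from the real file.
sample_json = """
[
  {"year": 1975, "births": 1000},
  {"year": 1976, "births": 1100},
  {"year": 1977, "births": 1050}
]
"""

data = json.loads(sample_json)  # a list of dicts

# Same extraction as in the notebook: one NumPy array per field
years = np.array([d["year"] for d in data])
births = np.array([d["births"] for d in data])

print(years)   # [1975 1976 1977]
print(births)  # [1000 1100 1050]
```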

Now let's do the fitting.

Fitting the data means adjusting the parameters of a mathematical function so that it follows the data as closely as possible. The fitted function is useful for later work with the data, for example to make predictions.
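The toy example below illustrates this idea. The points are made up and lie exactly on the line y = 2x + 1. It fits a straight line with `np.polyfit`, the same NumPy function used later on this page, and then uses the fitted function to predict a value outside the data:

```python
import numpy as np

# Toy data lying exactly on the line y = 2x + 1 (made up for illustration)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2 * x + 1

# Fit a degree-1 polynomial (a straight line) by least squares
coeffs = np.polyfit(x, y, 1)   # highest power first: [slope, intercept]
line = np.poly1d(coeffs)       # callable polynomial built from the coefficients

print(np.round(coeffs, 6))     # slope ≈ 2, intercept ≈ 1

# Use the fitted function to predict a value outside the data
print(round(float(line(10.0)), 6))  # ≈ 21.0
```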

If x is the year and y is the number of births, we want to find a function f(x) of the year whose value is close to the actual number of births in that year.
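In symbols, assume f is a polynomial of degree d with coefficients a_0, …, a_d. The least-squares fit computed by `np.polyfit` chooses the coefficients that minimise the sum of squared differences between the data and the function over the n data points (x_i, y_i):

```latex
f(x) = a_d x^d + a_{d-1} x^{d-1} + \dots + a_1 x + a_0
\qquad
\min_{a_0,\dots,a_d} \; S = \sum_{i=1}^{n} \bigl(y_i - f(x_i)\bigr)^2
```

Here x_i is a year and y_i the number of births in that year. Choosing d = 1 gives the straight line, and d = 2 or d = 3 gives the polynomials tried below.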

ChatGPT conversation link:

To do the fitting in Python we need to:
1. Load the JSON file.
2. Extract an array of years and an array of numbers of births.
3. Fit linear and polynomial models.
4. Plot the results.

```python
import json
import numpy as np
import matplotlib.pyplot as plt

# 1. Load the data from the JSON file
with open("datasets/nacimientos_espana_1975_2024.json") as f:
    data = json.load(f)

years = np.array([d["year"] for d in data])
births = np.array([d["births"] for d in data])

# 2. Linear fit (degree-1 polynomial)
coeffs_linear = np.polyfit(years, births, 1)
linear_fit = np.poly1d(coeffs_linear)

# 3. Polynomial fit (degree 2)
coeffs_poly2 = np.polyfit(years, births, 2)
poly2_fit = np.poly1d(coeffs_poly2)

# 4. Create evenly spaced points to draw smooth curves
years_smooth = np.linspace(years.min(), years.max(), 300)

# 5. Plot the data and both fits
plt.figure(figsize=(10, 6))
plt.scatter(years, births, label="Actual data")
plt.plot(years_smooth, linear_fit(years_smooth), label="Linear fit", linewidth=2)
plt.plot(years_smooth, poly2_fit(years_smooth), label="Degree-2 polynomial fit", linewidth=2)

plt.title("Fitting births in Spain (1975–2024)")
plt.xlabel("Year")
plt.ylabel("Births")
plt.legend()
plt.grid(True)
plt.show()

# 6. Print the fitted models
print("Linear model:")
print(linear_fit)

print("\nPolynomial model (degree 2):")
print(poly2_fit)
```

[Figure: Fitting births in Spain (1975–2024): actual data, linear fit and degree-2 polynomial fit]

```
Linear model:
 
-4279 x + 9.005e+06

Polynomial model (degree 2):
       2
147.5 x - 5.941e+05 x + 5.987e+08
```

This code gives two approximations. The linear fit is poor, because a straight line cannot follow the ups and downs in the data. The degree-2 polynomial is somewhat better, but still not a good fit.
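The printed coefficients are very large (for example 5.987e+08 in the degree-2 model) because x is the raw calendar year, so x² is in the millions. A common remedy, not used in the original notebook, is to fit against the number of years from a reference year, here the mean year. The sketch below uses made-up data with an assumed quadratic shape, not the real births. It shows that centering changes the coefficients but not the fitted values:

```python
import numpy as np

# Made-up data for illustration only: a smooth quadratic trend over 1975-2024
years = np.arange(1975, 2025)
births = 400_000 + 50 * (years - 2000) ** 2

# Fit on the raw years, as in the notebook
raw_model = np.poly1d(np.polyfit(years, births, 2))

# Fit on years measured from the mean year (1999.5 for 1975-2024)
center = years.mean()
centered_model = np.poly1d(np.polyfit(years - center, births, 2))

print(raw_model.coeffs)        # large, hard-to-read coefficients
print(centered_model.coeffs)   # coefficients on a human scale

# Both models give (numerically) the same fitted values;
# remember to subtract `center` before evaluating the centered model
max_diff = np.max(np.abs(raw_model(years) - centered_model(years - center)))
print(max_diff)
```

When predicting with the centered model, pass the year minus `center`. Forgetting that shift is the usual mistake with this technique.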

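Let's try another approach: fit polynomials of degree 1, 2 and 3, and compare how well each one fits the data using the coefficient of determination, R²: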

```python
import json
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import r2_score

# 1. Load the JSON file
with open("datasets/nacimientos_espana_1975_2024.json") as f:
    data = json.load(f)

years = np.array([d["year"] for d in data])
births = np.array([d["births"] for d in data])

# 2. Try polynomials of degree 1, 2 and 3
models = {}
r2_scores = {}

for degree in [1, 2, 3]:
    coeffs = np.polyfit(years, births, degree)
    model = np.poly1d(coeffs)
    models[degree] = model

    # Compute R² on the same data used for the fit
    preds = model(years)
    r2_scores[degree] = r2_score(births, preds)

# 3. Show the results
print("R² of each model:")
for degree, r2 in r2_scores.items():
    print(f"Degree {degree}: R² = {r2:.6f}")

best_degree = max(r2_scores, key=r2_scores.get)
print(f"\n➡️ The best model according to R² is the degree-{best_degree} polynomial")
print(models[best_degree], "\n")

# 4. Plot the models
years_smooth = np.linspace(years.min(), years.max(), 300)

plt.figure(figsize=(10, 6))
plt.scatter(years, births, label="Actual data")

colors = {1: "red", 2: "green", 3: "blue"}

for degree, model in models.items():
    plt.plot(years_smooth, model(years_smooth),
             label=f"Degree-{degree} polynomial",
             color=colors[degree])

plt.title("Comparison of polynomial models")
plt.xlabel("Year")
plt.ylabel("Births")
plt.grid(True)
plt.legend()
plt.show()
```

```
R² of each model:
Degree 1: R² = 0.464473
Degree 2: R² = 0.556298
Degree 3: R² = 0.874157

➡️ The best model according to R² is the degree-3 polynomial
        3             2
-21.69 x + 1.302e+05 x - 2.607e+08 x + 1.739e+11 
```

[Figure: Comparison of polynomial models: actual data with degree-1, degree-2 and degree-3 polynomial fits]
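`r2_score` compares the model's errors with the spread of the data around its mean: R² = 1 − SS_res / SS_tot. A value of 1 means a perfect fit. A value of 0 means the model does no better than always predicting the mean number of births. Below is a minimal NumPy sketch of the same formula. The function name `r_squared` is ours, not part of any library, and the check uses a tiny example that can be computed by hand:

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # squared errors of the model
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # squared errors of "always predict the mean"
    return 1.0 - ss_res / ss_tot

# Tiny example: only the last prediction is off, by 1
y = [1, 2, 3, 4]
pred = [1, 2, 3, 5]
print(r_squared(y, pred))  # SS_res = 1, SS_tot = 5, so R² = 0.8
```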
The degree-3 polynomial fits this data best (R² = 0.874157, against 0.556298 for degree 2 and 0.464473 for degree 1). Its curve has the ups and downs that follow the data more closely.
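One caveat applies when choosing the degree by R². On the same data, a higher-degree polynomial can never have a lower R² than a lower-degree one, because it includes the lower-degree polynomials as special cases. Adjusted R² is a standard variant that penalises each extra coefficient: R²_adj = 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of data points and p is the number of predictors (here the degree). The small sketch below applies this formula to the R² values printed above. It assumes n = 50, that is, one data point per year from 1975 to 2024:

```python
def adjusted_r2(r2, n, p):
    """Adjusted R² for a model with p predictors (here p = polynomial degree) fitted to n points."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# R² values printed by the notebook cell above
r2_scores = {1: 0.464473, 2: 0.556298, 3: 0.874157}
n = 50  # assumption: one data point per year, 1975-2024

for degree, r2 in r2_scores.items():
    print(f"Degree {degree}: R² = {r2:.4f}, adjusted R² = {adjusted_r2(r2, n, degree):.4f}")
```

With these numbers the degree-3 polynomial still comes out ahead, so the conclusion does not change. The adjustment matters more when the R² gain from one degree to the next is small.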