Adrian Torres - Fab Lab León - Fab Futures - Data Science

Week 1: Tools

November 20, 2025

In this second session, Neil talked to us about how we can visualize our datasets on a web page.

To do this, we can use programming languages such as:

  • Javascript
  • Rust -> Web assembly
  • Python -> Highly recommended. It has a loop function. Version 3.13

For the documentation and coding platform, we will use:

  • Jupyter -> notebooks -> cells -> kernel -> result
  • Jupyter widgets -> charts

For package management:

  • Conda
  • Miniconda

And for distributed version control:

  • Git

Assignment:

  • Visualize your data set(s)

In my case, I want to visualize data from Renfe Data. I need to save the data in a folder (datasets). The files can be CSV or XML, and I'll use Pandas, a data analysis library for Python.

I'm going to start with a simple file: a list of stations throughout Spain. It is available as CSV or XLS; I download the CSV, which contains more information.

Pandas for CSV

Next, I need Pandas to analyze the data. (I'm a bit lost, so I'm using ChatGPT 5.1 to help me with the Python code. The prompt I'm using is that I have a CSV file inside a Jupyter server, and I want to display a table containing the station code, name, longitude, and latitude.)

In [35]:
import pandas as pd

# Load the CSV (adjust the path if your file is in a different folder)
df = pd.read_csv("datasets/estaciones.csv", encoding="latin-1", sep=";")

# Select only the necessary columns
selected = df[["CODIGO", "DESCRIPCION", "LATITUD", "LONGITUD"]]

# Show the table
selected
Out[35]:
      CODIGO  DESCRIPCION            LATITUD    LONGITUD
0     1003    ARAHAL                 37.268081  -5.548514
1     1005    MARCHENA               37.334282  -5.425519
2     1007    OSUNA                  37.233899  -5.115026
3     1009    PEDRERA                37.222396  -4.893519
4     2002    PUENTE GENIL-HERRERA   37.357900  -4.821638
...   ...     ...                    ...        ...
1675  99501   ANDORRA                NaN        NaN
1676  99800   CERCEDILLA TURÍSTICO   NaN        NaN
1677  99801   PUERTO NAVACERRADA     NaN        NaN
1678  99802   COTOS                  NaN        NaN
1679  99853   VIELHA-BUS             42.702661  0.794352

1680 rows × 4 columns
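
Some stations in the table show NaN in LATITUD and LONGITUD. Here is a minimal sketch of how those rows could be counted before mapping. It assumes the same column names and ";" separator as the real file, and uses a small inline sample (rows copied from the output above) in place of datasets/estaciones.csv.

```python
import io
import pandas as pd

# Inline sample standing in for datasets/estaciones.csv (assumption:
# same column names and ";" separator as the real file)
sample = io.StringIO(
    "CODIGO;DESCRIPCION;LATITUD;LONGITUD\n"
    "1003;ARAHAL;37.268081;-5.548514\n"
    "1005;MARCHENA;37.334282;-5.425519\n"
    "99501;ANDORRA;;\n"
    "99802;COTOS;;\n"
)
df_sample = pd.read_csv(sample, sep=";")

# Rows where either coordinate is missing
missing = df_sample[df_sample[["LATITUD", "LONGITUD"]].isna().any(axis=1)]

print(len(missing), "stations without coordinates")
print(missing["DESCRIPCION"].tolist())
```

Running the same check on the full file would show how many stations will be left off the map.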

Folium

Now I want to plot that data on a map of Spain. I'm going to use Folium, a Python library for creating interactive maps, built on top of Leaflet.js.

In [36]:
!pip install folium
Collecting folium
  Downloading folium-0.20.0-py2.py3-none-any.whl.metadata (4.2 kB)
Collecting branca>=0.6.0 (from folium)
  Downloading branca-0.8.2-py3-none-any.whl.metadata (1.7 kB)
Requirement already satisfied: jinja2>=2.9 in /opt/conda/lib/python3.13/site-packages (from folium) (3.1.6)
Requirement already satisfied: numpy in /opt/conda/lib/python3.13/site-packages (from folium) (2.3.3)
Requirement already satisfied: requests in /opt/conda/lib/python3.13/site-packages (from folium) (2.32.5)
Requirement already satisfied: xyzservices in /opt/conda/lib/python3.13/site-packages (from folium) (2025.4.0)
Requirement already satisfied: MarkupSafe>=2.0 in /opt/conda/lib/python3.13/site-packages (from jinja2>=2.9->folium) (3.0.3)
Requirement already satisfied: charset_normalizer<4,>=2 in /opt/conda/lib/python3.13/site-packages (from requests->folium) (3.4.4)
Requirement already satisfied: idna<4,>=2.5 in /opt/conda/lib/python3.13/site-packages (from requests->folium) (3.11)
Requirement already satisfied: urllib3<3,>=1.21.1 in /opt/conda/lib/python3.13/site-packages (from requests->folium) (2.5.0)
Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.13/site-packages (from requests->folium) (2025.10.5)
Downloading folium-0.20.0-py2.py3-none-any.whl (113 kB)
Downloading branca-0.8.2-py3-none-any.whl (26 kB)
Installing collected packages: branca, folium
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2/2 [folium]
Successfully installed branca-0.8.2 folium-0.20.0
In [37]:
import pandas as pd
import folium

# 1. Load the CSV
df = pd.read_csv("datasets/estaciones.csv", encoding="latin-1", sep=";")

# 2. We only keep the necessary columns
df = df[["DESCRIPCION", "LATITUD", "LONGITUD"]]

# 3. Delete rows without coordinates
df = df.dropna(subset=["LATITUD", "LONGITUD"])

# 4. Create the map centered on Spain
mapa = folium.Map(location=[40.0, -3.7], zoom_start=6)

# 5. Add a marker for each station
for _, fila in df.iterrows():
    folium.Marker(
        location=[fila["LATITUD"], fila["LONGITUD"]],
        popup=fila["DESCRIPCION"]
    ).add_to(mapa)

# 6. Display the map in Jupyter
mapa
Out[37]:
[Interactive map: station markers on a map of Spain]
In [38]:
mapa.save("mapa_estaciones.html")
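
The map cell centers the view on a fixed point, [40.0, -3.7]. As an alternative sketch, the center could be derived from the data itself, once the rows without coordinates have been dropped. The sample rows below are taken from the stations table and stand in for the full CSV.

```python
import pandas as pd

# A few stations from the table, standing in for the full CSV
df_sample = pd.DataFrame({
    "DESCRIPCION": ["ARAHAL", "OSUNA", "VIELHA-BUS", "ANDORRA"],
    "LATITUD": [37.268081, 37.233899, 42.702661, None],
    "LONGITUD": [-5.548514, -5.115026, 0.794352, None],
})

# Same cleaning step as in the map cell
df_sample = df_sample.dropna(subset=["LATITUD", "LONGITUD"])

# Midpoint of the bounding box of the remaining stations
center = [
    (df_sample["LATITUD"].min() + df_sample["LATITUD"].max()) / 2,
    (df_sample["LONGITUD"].min() + df_sample["LONGITUD"].max()) / 2,
]
print(center)
```

The resulting list can be passed as location= to folium.Map in place of the fixed values.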

Matplotlib

While investigating, I also found data on passenger boarding and alighting at stations, in this case for the Asturias region (there is data for León on the narrow-gauge line, but the figures are very poor). To plot it, I use Matplotlib.

In [39]:
import pandas as pd
import matplotlib.pyplot as plt

# Load CSV
df = pd.read_csv("datasets/asturias_viajeros_por_franja_csv.csv", encoding="latin-1", sep=";")

# Filter target stations
estaciones_objetivo = [
    "GIJON-SANZ CRESPO",
    "OVIEDO",
    "POLA DE LENA",
    "MIERES-PUENTE"
]

df_filtrado = df[df["NOMBRE_ESTACION"].isin(estaciones_objetivo)]

# Group by station and sum travellers
resumen = df_filtrado.groupby("NOMBRE_ESTACION")[["VIAJEROS_SUBIDOS", "VIAJEROS_BAJADOS"]].sum()

# Custom colored bar chart
# Create the figure and axes first and draw on them, so that pandas
# does not open a second figure and leave this one empty
fig, ax = plt.subplots(figsize=(10, 6))
resumen.plot(
    kind="bar",
    color=["green", "red"],   # SUBIDOS, BAJADOS
    ax=ax
)

ax.set_xlabel("Station")
ax.set_ylabel("Number of travelers")
ax.set_title("Passengers boarding and alighting per station (Asturias)")
plt.tight_layout()
plt.show()
[Figure: bar chart of passengers boarding and alighting per station (Asturias)]
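
The file name suggests the data comes broken down by time slot, so each station appears in several rows, and the groupby step adds them up into one row per station. A toy sketch of that step follows. The numbers are invented for illustration, and "FRANJA" is a hypothetical column name, not necessarily the one in the real file.

```python
import pandas as pd

# Made-up rows for illustration only: two time slots per station.
# "FRANJA" is a hypothetical column name; the counts are not real data.
df_toy = pd.DataFrame({
    "NOMBRE_ESTACION": ["OVIEDO", "OVIEDO", "POLA DE LENA", "POLA DE LENA"],
    "FRANJA": ["morning", "evening", "morning", "evening"],
    "VIAJEROS_SUBIDOS": [100, 150, 20, 30],
    "VIAJEROS_BAJADOS": [120, 130, 25, 25],
})

# One row per station, with the time slots added together
resumen_toy = df_toy.groupby("NOMBRE_ESTACION")[
    ["VIAJEROS_SUBIDOS", "VIAJEROS_BAJADOS"]
].sum()
print(resumen_toy)
```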

Plotly

Next, I represent the same data with Plotly, which produces an interactive chart.

In [40]:
!pip install plotly
Requirement already satisfied: plotly in /opt/conda/lib/python3.13/site-packages (6.5.0)
Requirement already satisfied: narwhals>=1.15.1 in /opt/conda/lib/python3.13/site-packages (from plotly) (2.9.0)
Requirement already satisfied: packaging in /opt/conda/lib/python3.13/site-packages (from plotly) (25.0)
In [41]:
import pandas as pd
import plotly.express as px

# Load CSV
df = pd.read_csv("datasets/asturias_viajeros_por_franja_csv.csv",
                 encoding="latin-1", sep=";")

# Filter stations
estaciones_objetivo = [
    "GIJON-SANZ CRESPO",
    "OVIEDO",
    "POLA DE LENA",
    "MIERES-PUENTE"
]

df_filtrado = df[df["NOMBRE_ESTACION"].isin(estaciones_objetivo)]

# Group by station
resumen = df_filtrado.groupby("NOMBRE_ESTACION")[
    ["VIAJEROS_SUBIDOS", "VIAJEROS_BAJADOS"]
].sum().reset_index()

# Interactive chart
fig = px.bar(
    resumen,
    x="NOMBRE_ESTACION",
    y=["VIAJEROS_SUBIDOS", "VIAJEROS_BAJADOS"],
    barmode="group",
    title="Passengers boarding and alighting per station (Asturias)",
    labels={
        "NOMBRE_ESTACION": "Station",
        "value": "Number of travellers",
        "variable": "Type"
    }
)

fig.show()
[Figure: interactive grouped bar chart of passengers boarding and alighting per station (Asturias)]
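
Passing a list of columns to y makes Plotly Express treat the table as wide-form data. The "value" and "variable" keys in labels are the default names it gives to the reshaped columns. A rough pandas equivalent of that reshaping is melt, sketched here on invented totals rather than the real figures.

```python
import pandas as pd

# Made-up totals for illustration only (not the real Renfe figures)
resumen_toy = pd.DataFrame({
    "NOMBRE_ESTACION": ["OVIEDO", "POLA DE LENA"],
    "VIAJEROS_SUBIDOS": [250, 50],
    "VIAJEROS_BAJADOS": [250, 50],
})

# Wide -> long: one row per (station, column) pair.
# melt() names the new columns "variable" and "value" by default.
largo = resumen_toy.melt(id_vars="NOMBRE_ESTACION")
print(largo)
```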

Sankey diagram

I don't have enough information in my data to create a meaningful Sankey diagram, but I was curious to see how it works. In this case, I used ChatGPT 5.1 (the prompt I used was, "With this CSV file, I want you to generate a Sankey diagram showing the flow of passengers getting on and off"). Ideally, I would know where each passenger gets on and where they get off; then it would be a very nice diagram. However, I only have the totals for each station, and the number is the same for both boarding and alighting.

In [45]:
import pandas as pd
import plotly.graph_objects as go

# 1. Load CSV
df = pd.read_csv("datasets/asturias_viajeros_por_franja_csv.csv",
                 encoding="latin-1", sep=";")

# (Optional) Filter for Asturias only, in case there are others
df = df[df["NUCLEO_CERCANIAS"] == "ASTURIAS"]

# 2. Group by station and add up travelers
resumen = df.groupby("NOMBRE_ESTACION")[["VIAJEROS_SUBIDOS", "VIAJEROS_BAJADOS"]].sum()

# 3. Create the list of nodes:
#    0 -> "Passengers boarding"
#    1 -> "Passengers alighting"
#    2..n -> stations
nombres_estaciones = resumen.index.tolist()

labels = ["Passengers boarding", "Passengers alighting"] + nombres_estaciones

# Node indices
idx_suben = 0
idx_bajan = 1
idx_estaciones = {est: i + 2 for i, est in enumerate(nombres_estaciones)}

# 4. Build the Sankey links

source = []
target = []
value  = []

# Flow: "Passengers BOARD" -> Stations (VIAJEROS_SUBIDOS)
for est, row in resumen.iterrows():
    source.append(idx_suben)
    target.append(idx_estaciones[est])
    value.append(row["VIAJEROS_SUBIDOS"])

# Flow: Stations -> "Passengers GET OFF" (VIAJEROS_BAJADOS)
for est, row in resumen.iterrows():
    source.append(idx_estaciones[est])
    target.append(idx_bajan)
    value.append(row["VIAJEROS_BAJADOS"])

# 5. Create the Sankey diagram
fig = go.Figure(data=[go.Sankey(
    node=dict(
        pad=20,
        thickness=20,
        label=labels
    ),
    link=dict(
        source=source,
        target=target,
        value=value
    )
)])

fig.update_layout(
    title_text="Passenger flows boarding and alighting at each station (Asturias)",
    font_size=10
)

fig.show()
[Figure: Sankey diagram of passenger flows boarding and alighting at each station (Asturias)]
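
With origin and destination data, the links would run from station to station instead of through two artificial nodes. Here is a minimal sketch of how such a table could be turned into the source, target, and value lists used by go.Sankey. The table and its ORIGEN, DESTINO, and VIAJEROS columns are hypothetical, and the counts are invented, because the dataset used here has no such information.

```python
import pandas as pd

# Hypothetical origin-destination counts (invented for illustration)
viajes = pd.DataFrame({
    "ORIGEN":   ["OVIEDO", "OVIEDO", "MIERES-PUENTE"],
    "DESTINO":  ["GIJON-SANZ CRESPO", "MIERES-PUENTE", "POLA DE LENA"],
    "VIAJEROS": [300, 120, 40],
})

# Separate node sets for origins and destinations, so that every link
# runs from the left column of nodes to the right one
origenes = sorted(viajes["ORIGEN"].unique())
destinos = sorted(viajes["DESTINO"].unique())
labels = [f"{e} (from)" for e in origenes] + [f"{e} (to)" for e in destinos]

# Map each station name to its position in the labels list
idx_origen = {e: i for i, e in enumerate(origenes)}
idx_destino = {e: i + len(origenes) for i, e in enumerate(destinos)}

source = viajes["ORIGEN"].map(idx_origen).tolist()
target = viajes["DESTINO"].map(idx_destino).tolist()
value = viajes["VIAJEROS"].tolist()

print(labels)
print(source, target, value)
```

These lists have the same shape as the ones built in the Sankey cell, so they could be passed to go.Sankey in the same way.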