Class 2: Showing data visualization¶
In this first class Neil introduced us to the world of data science, explaining how understanding data and interpreting it can make a huge difference in many areas of life, from tracing the spread of cholera in London to discovering the Higgs boson as a very small deviation from the expected curve in a plot of data.
So our goal in this first class is to understand the variety of uses that data has nowadays. Right now data is being collected everywhere: from our surroundings, online services, governments and private companies. Some of it is publicly available... but we don't know what to do with it.
Taking this data and making something with it is the goal of this course: learning the tools and resources to work with it. So, let's start!
Starting example¶
First of all, I'm going to try to visualize some data downloaded from a website. I'm going to use the JSON file, and I will follow the example from this website: https://www.w3schools.com/python/pandas/pandas_json.asp
I'm going to download a dataset from datos.gob.es, a Spanish government open data initiative.
The data I will download is the ageing index of the population from 1975 to 2024: https://datos.gob.es/es/catalogo/ea0010587-indice-de-envejecimiento-idb-identificador-api-1418
Then let's visualize it:
import pandas as pd  # pd is the conventional alias for the pandas module
# load the dataset into the variable df
df = pd.read_csv('datasets/1418.csv')
df.head(10)  # show the first 10 rows of the data
| Totales Territoriales;Periodo;Total | |
|---|---|
| Total Nacional;2024;142 | 35.0 |
| Total Nacional;2023;137 | 33.0 |
| Total Nacional;2022;133 | 64.0 |
| Total Nacional;2021;129 | 16.0 |
| Total Nacional;2020;125 | 82.0 |
| Total Nacional;2019;123 | NaN |
| Total Nacional;2018;120 | 56.0 |
| Total Nacional;2017;118 | 36.0 |
| Total Nacional;2016;116 | 33.0 |
| Total Nacional;2015;114 | 69.0 |
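The table above suggests the file was not split into columns as intended: the header and each row end up in a single cell, and the only split happens at the comma. A likely cause is that the file uses `;` as the field separator and `,` as the decimal mark. Here is a minimal sketch of how that could be handled, assuming that format; the sample rows are reconstructed from the output above (not read from the real file), and `df_fixed` is just an illustrative name.

```python
import io
import pandas as pd

# Sample rows reconstructed from the output shown above (assumption:
# ';' separates the fields and ',' is the decimal mark).
raw = (
    "Totales Territoriales;Periodo;Total\n"
    "Total Nacional;2024;142,35\n"
    "Total Nacional;2023;137,33\n"
    "Total Nacional;2022;133,64\n"
    "Total Nacional;2019;123\n"
)

# sep and decimal tell pandas how to split the fields and read the numbers
df_fixed = pd.read_csv(io.StringIO(raw), sep=";", decimal=",")

print(df_fixed.dtypes)  # 'Total' should now be a float column
print(df_fixed.head())
```

With the real file, the same two arguments would be passed to the original call, for example `pd.read_csv('datasets/1418.csv', sep=';', decimal=',')`, provided the file really has this layout.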
Visualizing example¶
Base cartography of Spain in shapefile format, from the cartography institute's download center: https://centrodedescargas.cnig.es/CentroDescargas/catalogo.do?Serie=CAANE
The downloaded folder contains several files (dbf, prj...), so I made a folder called "espana" inside the datasets folder and uploaded all the files into it.
Now I need to install geopandas on my server, so let's try something like: conda install -c conda-forge geopandas
Here is the prompt that I gave Gemini to ask for a way to plot a map of this data: "jupyter visualizar mapa españa shapefile"
and it gave me this code:
import geopandas as gpd
file_path = './datasets/espana/ee89_14_admin_pais_a.shp'
mapa_es = gpd.read_file(file_path)
mapa_es.plot()
It shows a map of the whole world, so let's see if I can adjust it to show only Spain:
import pandas as pd
df = pd.read_csv('datasets/data.csv')
print(df.to_string()) #Display all rows and columns of a DataFrame as a string
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Cell In[1], line 1
----> 1 import geopandas as gpd
      2 file_path = './datasets/espana/ee89_14_admin_pais_a.shp'
      3 mapa_es = gpd.read_file(file_path)

ModuleNotFoundError: No module named 'geopandas'
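A `ModuleNotFoundError` like this one usually means the package is not installed in the environment that the notebook kernel is actually running, which may differ from the one where `conda install` was executed. Here is a small sketch, using only the standard library, to check what the kernel can see; the helper name `is_installed` is my own.

```python
import importlib.util
import sys


def is_installed(module_name: str) -> bool:
    """Return True if the running interpreter can find the module."""
    return importlib.util.find_spec(module_name) is not None


# Shows which Python the kernel uses; packages must be installed there
print("Kernel Python:", sys.executable)

for name in ("pandas", "geopandas"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If geopandas is reported as missing, installing it into the environment printed by `sys.executable` and restarting the kernel is the usual next step.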
Now we are going to try another example I found on the internet, because I want to show a more detailed map of Spain, with the provinces in different colours. I tried the tutorial that Gemini showed me for this prompt: "jupyter plot un mapa de españa conlas provincias"
First of all I need a map of Spain divided by provinces, which you can download here: https://gist.github.com/josemamira/3af52a4698d42b3f676fbc23f807a605?short_path=45ec3d9
import geopandas as gpd
import matplotlib.pyplot as plt
# Load the GeoJSON file with the provinces of Spain
provincias = gpd.read_file("./datasets/provincias_spain.geojson")
fig, ax = plt.subplots(figsize=(12, 8))
provincias.plot(ax=ax, edgecolor='black', linewidth=0.5, cmap='Set3')
ax.set_title('Map of Spain with provinces', fontsize=16)
ax.set_axis_off()
plt.show()
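Going back to the earlier shapefile that plotted the whole world: one possible way to keep only the area around Spain is geopandas' coordinate indexer `.cx`, which selects the geometries that intersect a bounding box. This is only a sketch under assumptions: the layer below is a hypothetical stand-in made of two rectangles, the bounding box values are rough numbers I chose to cover Spain including the Canary Islands, and the coordinates are assumed to be longitude/latitude (EPSG:4326).

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical stand-in for a layer that covers more than Spain:
# two rectangles given in longitude/latitude (EPSG:4326).
layer = gpd.GeoDataFrame(
    {"name": ["inside_spain_bbox", "far_away"]},
    geometry=[box(-4.0, 40.0, -3.0, 41.0), box(100.0, 10.0, 101.0, 11.0)],
    crs="EPSG:4326",
)

# Assumed rough bounding box around Spain, in degrees
MIN_LON, MAX_LON = -19.0, 5.0
MIN_LAT, MAX_LAT = 27.0, 44.5

# .cx keeps the geometries that intersect the box [lon range, lat range]
spain_only = layer.cx[MIN_LON:MAX_LON, MIN_LAT:MAX_LAT]
print(spain_only["name"].tolist())
```

The result can be plotted with `spain_only.plot()` as before. Note that `.cx` keeps whole geometries and does not cut them, so if the file stores everything as one big shape this will not help; in that case `geopandas.clip` may be a better fit. If the layer is not in longitude/latitude, it would need `to_crs("EPSG:4326")` first.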
Now let's use this Plotly example to show data on a map.
import plotly.graph_objects as go
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/2014_us_cities.csv')
df.head()
df['text'] = df['name'] + '<br>Population ' + (df['pop']/1e6).astype(str)+' million'
limits = [(0,3),(3,11),(11,21),(21,50),(50,3000)]
colors = ["royalblue","crimson","lightseagreen","orange","lightgrey"]
cities = []
scale = 5000
fig = go.Figure()
for i in range(len(limits)):
    lim = limits[i]
    df_sub = df[lim[0]:lim[1]]
    fig.add_trace(go.Scattergeo(
        locationmode = 'USA-states',
        lon = df_sub['lon'],
        lat = df_sub['lat'],
        text = df_sub['text'],
        marker = dict(
            size = df_sub['pop']/scale,
            color = colors[i],
            line_color='rgb(40,40,40)',
            line_width=0.5,
            sizemode = 'area'
        ),
        name = '{0} - {1}'.format(lim[0],lim[1])))

fig.update_layout(
        title_text = '2014 US city populations<br>(Click legend to toggle traces)',
        showlegend = True,
        geo = dict(
            scope = 'usa',
            landcolor = 'rgb(217, 217, 217)',
        )
    )
fig.show()
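One detail of the example above that is easy to misread: the pairs in `limits` are row ranges, not population thresholds. `df[lim[0]:lim[1]]` slices rows by position, so if the table is sorted from the largest city to the smallest (which seems to be the case for this dataset, but I have not verified it), each trace is a band of ranks. Here is a small sketch with hypothetical placeholder values to show the idea:

```python
import pandas as pd

# Hypothetical placeholder values, already sorted from largest to smallest
cities = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E", "F"],
    "pop": [600, 500, 400, 300, 200, 100],
})

# Each pair is (first row, one past the last row); a stop value larger
# than the table simply means "until the end"
limits = [(0, 2), (2, 5), (5, 3000)]

groups = {}
for start, stop in limits:
    subset = cities[start:stop]  # positional row slice
    groups[f"{start} - {stop}"] = subset["name"].tolist()

print(groups)
```

So to reuse the example with another dataset, the table should probably be sorted by the value of interest first, for example with `sort_values`.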
LINKS¶