Class 2: Tools¶
Assignment¶
- Plot your data set
Load Dataset¶
In [2]:
# Using NumPy, Pandas' dataframes and Matplotlib
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
%matplotlib inline
import matplotlib.pyplot as plt
A CSV of the dataset was extracted from the Excel file and saved as Doughnut_Data_Portraits_Indicator_Library_v1_0_FINAL-Database.csv.
In [3]:
de_pil = pd.read_csv("datasets/Doughnut_Data_Portraits_Indicator_Library_v1_0_FINAL-Database.csv")
In [4]:
de_pil.head() # quick check of data import and structure
Out[4]:
| | index | place | placeCode | country | iso3c | scale | domain | lens | globalConnection | dimension | indicatorType | vision | snapshot | dataYear | indicator | target | value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Amsterdam | AMS | Netherlands | NLD | local | social | local-social | NaN | Food | Activity Monitoring | A target is currently under development. | In 2018, over 1,200 households made use of the... | 2018 | NaN | NaN | NaN |
| 1 | 1 | Amsterdam | AMS | Netherlands | NLD | local | social | local-social | NaN | Water | Activity Monitoring | Public water is accessible, attractive, clean ... | Tap water quality in 2017 was rated well above... | 2017 | NaN | NaN | NaN |
| 2 | 2 | Amsterdam | AMS | Netherlands | NLD | local | social | local-social | NaN | Health | Status Snapshot | All citizens have an equal chance of living a ... | Around 40% of citizens are overweight and almo... | NaN | NaN | NaN | NaN |
| 3 | 3 | Amsterdam | AMS | Netherlands | NLD | local | social | local-social | NaN | Housing | Activity Monitoring | There is sufficient availability of affordable... | In 2018, 60,000 homeseekers applied online for... | 2018 | NaN | NaN | NaN |
| 4 | 4 | Amsterdam | AMS | Netherlands | NLD | local | social | local-social | NaN | Education | Response Tools | Every child receives a good education in a hig... | In 2019 there were 175 unfilled teaching posts... | 2019 | NaN | NaN | NaN |
In [5]:
# Quick look at the distribution of data across dataYear values
counts_all = de_pil['dataYear'].value_counts(dropna=False) # number of year values present including number missing
counts_sort_all = counts_all.sort_index()
counts_sort_all
Out[5]:
dataYear
.         1
2006      1
2007      2
2008      3
2009      9
2010      7
2011     13
2012      1
2013      8
2014     34
2015     33
2016     37
2017     65
2018    113
2019    292
2020    177
2021    317
2022    185
2023    184
2024    119
2025      4
NaN     448
Name: count, dtype: int64
About 22% of the records (449 of 2,053) were not date tagged: 448 were left blank and one had "." in the year field.
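The share of untagged records can also be computed directly rather than read off the counts. The following is a minimal sketch, not part of the original notebook: `share_untagged` is a hypothetical helper name, and `toy_years` is a small made-up Series standing in for `de_pil['dataYear']`. It assumes that any value which cannot be parsed as a number (blank or ".") counts as "not date tagged".

```python
import pandas as pd


def share_untagged(years: pd.Series) -> float:
    """Fraction of records whose year is blank (NaN) or not numeric (e.g. ".")."""
    # errors="coerce" turns anything that is not a number into NaN
    numeric = pd.to_numeric(years, errors="coerce")
    # The mean of a boolean mask is the fraction of True values
    return float(numeric.isna().mean())


# Toy stand-in for de_pil['dataYear'] (illustrative values only)
toy_years = pd.Series(["2018", "2017", None, ".", "2019", "2020", "2021", "2021"])
print(f"{share_untagged(toy_years):.0%} of the toy records are not date tagged")
```

On the real data the call would be `share_untagged(de_pil['dataYear'])`.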
Visualize Dataset¶
In [7]:
# A visual of the year for records that were date tagged.
fig, axes = plt.subplots(nrows=2, figsize=(10, 10))
fig.subplots_adjust(hspace=0.30)
de_pil_filtered = de_pil[de_pil['dataYear'] != '.'] # remove the non-year value "."
counts = de_pil_filtered['dataYear'].value_counts() # value_counts() excludes NaN by default, which drops the blank year values
counts_sort = counts.sort_index()
counts_sort.plot.bar(ax=axes[0], title="Bar Chart of year data present")
counts_sort.plot.line(ax=axes[1], linestyle="dotted", marker="o", rot=90, title="Line Plot of year data present");
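Because of the "." entry, the year values in this column are text, so both plots treat each year as a category label rather than a position on a numeric axis. One possible alternative, sketched here on made-up data, is to convert the years to integers and reindex over the full year range so that any year with no records shows up as zero. `year_counts` is a hypothetical helper name and `toy_years` is illustrative; neither comes from the original notebook.

```python
import pandas as pd


def year_counts(years: pd.Series) -> pd.Series:
    """Records per year, indexed by integer year, with missing years filled as 0."""
    # Coerce text to numbers; blanks and "." become NaN and are dropped
    numeric = pd.to_numeric(years, errors="coerce").dropna().astype(int)
    counts = numeric.value_counts().sort_index()
    # Assumes at least one valid year is present, otherwise min()/max() fail
    full_range = range(counts.index.min(), counts.index.max() + 1)
    return counts.reindex(full_range, fill_value=0)


# Toy stand-in for de_pil['dataYear'] (illustrative values only)
toy_years = pd.Series(["2018", "2020", ".", None, "2020", "2021"])
print(year_counts(toy_years))
```

The result can be plotted the same way as `counts_sort`, for example `year_counts(de_pil['dataYear']).plot.bar(ax=axes[0])`.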