Blair Evans - Fab Futures - Data Science

Class 2: Tools

Assignment

  • Plot your data set

Load Dataset

In [2]:
# Using NumPy, Pandas' dataframes and Matplotlib

import numpy as np

import pandas as pd

from pandas import Series, DataFrame

%matplotlib inline

import matplotlib.pyplot as plt

A CSV of the dataset was extracted from the Excel file and saved as Doughnut_Data_Portraits_Indicator_Library_v1_0_FINAL-Database.csv.

In [3]:
de_pil = pd.read_csv("datasets/Doughnut_Data_Portraits_Indicator_Library_v1_0_FINAL-Database.csv")
In [4]:
de_pil.head() # quick check of data import and structure
Out[4]:
  | index | place | placeCode | country | iso3c | scale | domain | lens | globalConnection | dimension | indicatorType | vision | snapshot | dataYear | indicator | target | value
0 | 0 | Amsterdam | AMS | Netherlands | NLD | local | social | local-social | NaN | Food | Activity Monitoring | A target is currently under development. | In 2018, over 1,200 households made use of the... | 2018 | NaN | NaN | NaN
1 | 1 | Amsterdam | AMS | Netherlands | NLD | local | social | local-social | NaN | Water | Activity Monitoring | Public water is accessible, attractive, clean ... | Tap water quality in 2017 was rated well above... | 2017 | NaN | NaN | NaN
2 | 2 | Amsterdam | AMS | Netherlands | NLD | local | social | local-social | NaN | Health | Status Snapshot | All citizens have an equal chance of living a ... | Around 40% of citizens are overweight and almo... | NaN | NaN | NaN | NaN
3 | 3 | Amsterdam | AMS | Netherlands | NLD | local | social | local-social | NaN | Housing | Activity Monitoring | There is sufficient availability of affordable... | In 2018, 60,000 homeseekers applied online for... | 2018 | NaN | NaN | NaN
4 | 4 | Amsterdam | AMS | Netherlands | NLD | local | social | local-social | NaN | Education | Response Tools | Every child receives a good education in a hig... | In 2019 there were 175 unfilled teaching posts... | 2019 | NaN | NaN | NaN
In [5]:
# Quick look at the distribution of data across dataYear values

counts_all = de_pil['dataYear'].value_counts(dropna=False) # number of records per year value, including missing (NaN) entries

counts_sort_all = counts_all.sort_index()

counts_sort_all
Out[5]:
dataYear
.         1
2006      1
2007      2
2008      3
2009      9
2010      7
2011     13
2012      1
2013      8
2014     34
2015     33
2016     37
2017     65
2018    113
2019    292
2020    177
2021    317
2022    185
2023    184
2024    119
2025      4
NaN     448
Name: count, dtype: int64

About 22% of the records (449 of 2,053) were not date tagged: 448 were left blank and one had "." in the year field.
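The share of untagged records can be recomputed from the column rather than read off the counts by hand. The following is a minimal sketch, not part of the original notebook: the helper name year_tag_summary and the small sample series are hypothetical stand-ins, and it assumes the year column holds strings such as "2018" plus NaN for blanks, as the read_csv output above suggests.

```python
import pandas as pd

def year_tag_summary(years: pd.Series) -> dict:
    """Summarise how many entries hold a usable year (hypothetical helper)."""
    # Anything that is not a number (blanks, ".") becomes NaN here
    numeric = pd.to_numeric(years, errors="coerce")
    blank = int(years.isna().sum())               # entries missing in the source
    invalid = int(numeric.isna().sum()) - blank   # present but not a number, e.g. "."
    total = len(years)
    untagged_pct = round(100 * (blank + invalid) / total, 1) if total else 0.0
    return {"total": total, "blank": blank, "invalid": invalid,
            "untagged_pct": untagged_pct}

# Tiny stand-in for de_pil['dataYear']; the values are made up for illustration
sample = pd.Series(["2018", "2017", float("nan"), ".", "2019"])
summary = year_tag_summary(sample)
print(summary)
```

In the notebook, calling year_tag_summary(de_pil['dataYear']) should give the blank and invalid counts separately, which keeps the "." entry from being hidden among the NaN values.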

Visualize Dataset

In [7]:
# A visual of the year for records that were date tagged.

fig, axes = plt.subplots(nrows=2, figsize=(10, 10))
fig.subplots_adjust(hspace=0.30)

de_pil_filtered = de_pil[de_pil['dataYear'] != '.'] # remove the non-year value "."

counts = de_pil_filtered['dataYear'].value_counts() # value_counts() excludes NaN by default, which drops the blank year values

counts_sort = counts.sort_index()

counts_sort.plot.bar(ax=axes[0], title="Bar Chart of year data present")

counts_sort.plot.line(ax=axes[1], linestyle="dotted", marker="o", rot=90, title="Line Plot of year data present");
[Figure: two stacked panels of record counts by dataYear. Top: "Bar Chart of year data present". Bottom: "Line Plot of year data present".]
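Because the column contains ".", pandas reads the years as strings, so the sorted index above is ordered lexicographically. That happens to work for four-digit years, but a numeric index is more robust. The sketch below is one possible alternative, not the notebook's method: the helper counts_by_year and the sample values are hypothetical. It converts the years to integers and reindexes over the full range so that any year with no records appears as zero instead of being skipped.

```python
import pandas as pd

def counts_by_year(years: pd.Series) -> pd.Series:
    """Count records per integer year, filling absent years with 0 (hypothetical helper)."""
    # Coerce non-numeric entries such as "." to NaN, then drop them with the blanks
    numeric = pd.to_numeric(years, errors="coerce").dropna().astype(int)
    counts = numeric.value_counts().sort_index()
    if counts.empty:
        return counts
    # Reindex over every year between the first and last so gaps show up as 0
    full_range = range(counts.index.min(), counts.index.max() + 1)
    return counts.reindex(full_range, fill_value=0)

# Made-up stand-in for de_pil['dataYear'] with a gap at 2007
sample = pd.Series(["2006", "2008", "2008", ".", float("nan"), "2009"])
filled = counts_by_year(sample)
print(filled)
```

In this data set every year from 2006 to 2025 already has at least one record, so the reindex step would change nothing here; it only guards against gaps in other extracts. The result can be passed to the same plotting calls, for example counts_by_year(de_pil['dataYear']).plot.bar(ax=axes[0]).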