Week 2: Tools (20 November 2025)
In our session, the tutor introduced several important tools and concepts that we will use in class, covering both open-source and closed-source (commercial) technologies. Open-source tools are freely available, cross-platform, and extensible, and they are backed by large communities, although relying on community support can sometimes be challenging. Commercial tools, by contrast, offer dedicated support and do not depend on volunteer developers.

The tutor also explained performance tools such as Numba, showing how just-in-time (JIT) compilation and parallel processing can significantly speed up Python code. We then explored different types of data tools: flat files, which are practical only up to the limits of a computer’s memory; Pandas, which provides powerful routines for working with data in various formats; and MySQL, a popular database system for storing large, structured datasets. Overall, the session gave a clear understanding of what these tools are and how they will support our learning in this course.
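A minimal sketch of the Numba idea, assuming Numba and NumPy are installed: `@njit` compiles a function to machine code on its first call, and `parallel=True` together with `prange` lets Numba split a loop across CPU cores. The function name `sum_of_squares` is only illustrative. So that the sketch still runs on a machine without Numba, it falls back to plain Python, which gives the same result without the speed-up.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:
    # Fallback (an assumption for portability): a no-op decorator and the
    # built-in range, so the code runs without JIT compilation or parallelism
    def njit(*args, **kwargs):
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]
        return lambda func: func

    prange = range


@njit(parallel=True)
def sum_of_squares(values):
    """Return the sum of squares of a 1-D float array."""
    total = 0.0
    # prange marks the loop as parallel; Numba treats `total` as a reduction
    for i in prange(values.shape[0]):
        total += values[i] * values[i]
    return total


data = np.arange(1_000, dtype=np.float64)
# With Numba present, this first call also triggers compilation
print(sum_of_squares(data))
```

On a small array like this the one-off compilation time dominates; the benefit of JIT compilation shows up on large arrays or when the function is called many times.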
Assignment: We were asked to choose one dataset
Compiled Dataset: Alcohol-Related Deaths / Burden in Bhutan
Introduction to the Dataset
This dataset presents a compiled summary of alcohol-related deaths and alcohol-attributable health indicators in Bhutan, drawn from publicly available national and international sources. The data combines information from the Ministry of Health’s Annual Health Bulletins, the National Statistics Bureau’s Vital Statistics Reports, WHO country profiles, and published research such as the Bhutan Health Journal. It includes annual figures on alcohol-related liver disease (ALD) deaths, the proportion of deaths attributed to alcohol in health facilities, trends across multiple years, and population-level alcohol-consumption indicators. The dataset is designed to provide a clear picture of how alcohol contributes to mortality and public health challenges in Bhutan, enabling further analysis, comparison, and interpretation for academic or policy-related purposes.
Reports (key, sourced facts)
- A 2018 paper summarizing national data reports 190 deaths due to harmful use of alcohol in 2016, mostly from alcohol-related liver disease. (Source)
- WHO’s SAFER country snapshot for Bhutan (published July 28, 2025) provides national indicators of alcohol consumption and alcohol-related burden, with links to the underlying data sources; it is useful for recent indicators and references. (Source)
- Bhutan’s Vital Statistics Report (2021) and other National Statistics Bureau (NSB) publications include downloadable tables of registered deaths by cause. These are the primary national records for yearly cause-of-death counts. (Source)
- Global databases such as the IHME Global Burden of Disease (GBD) study, with its Global Health Data Exchange (GHDx), and the WHO Data Portal provide modeled estimates and downloadable cause-of-death datasets, which are useful when comparable, age-standardized estimates are needed. (Source)
Table summarizing key data points from public sources (Annual Health Bulletin, WHO, national reports):
| Year | Metric | Value | Source / Notes |
|---|---|---|---|
| 2016 | Alcohol-related (ALD) deaths | 190 | From the Bhutan Health Journal study. (Source) |
| 2012 → 2016 | Trend in ALD deaths | ~140 (2012) → 190 (2016) | Annual Health Bulletin 2017. (Source) |
| 2020 | Number of deaths for ALD (in health facilities) | 166 | Vital Statistics Report, Bhutan’s 2021 VSR. (Source) |
| 2021 | Number of deaths for ALD | 141 | Reported by the Ministry of Health. (Source) |
| 2022 | ALD share of facility-reported mortality | 12.22% | Annual Health Bulletin 2023, health facility deaths. (Source) |
| 2022 | Change in ALD incidence (from 2021) | −0.26% decline | Reported in AHB 2023. (Source) |
| 2023 | ALD deaths | 129 | Reported in media citing the AHB. (Source) |
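The code on this page reads the table from `datasets/ALD_Data.csv`. One way that file could be produced is to type the table rows into a DataFrame and save it. This is a hypothetical reconstruction, assuming the CSV has exactly the four columns shown in the printed output; the variable names `rows`, `table`, and `out_path` are illustrative.

```python
from pathlib import Path

import pandas as pd

# Rows copied from the table; values stay as text because some of them
# ("12.22%", "~140 (2012) → 190 (2016)") are not plain numbers
rows = [
    ("2016", "Alcohol-related (ALD) deaths", "190",
     "From the Bhutan Health Journal study."),
    ("2012 → 2016", "Trend in ALD deaths", "~140 (2012) → 190 (2016)",
     "Annual Health Bulletin 2017."),
    ("2020", "Number of deaths for ALD (in health facilities)", "166",
     "Vital Statistics Report, Bhutan’s 2021 VSR."),
    ("2021", "Number of deaths for ALD", "141",
     "Reported by Ministry of Health."),
    ("2022", "ALD share of facility-reported mortality", "12.22%",
     "Annual Health Bulletin 2023."),
    ("2022", "Change in ALD incidence (from 2021)", "−0.26% decline",
     "Reported in AHB 2023."),
    ("2023", "ALD deaths", "129", "Reported in media citing AHB."),
]
table = pd.DataFrame(rows, columns=["Year", "Metric", "Value", "Source/Notes"])

# Save to the path used by the reading code (creating the folder if needed);
# to_csv quotes cells that contain commas, such as the VSR note
out_path = Path("datasets") / "ALD_Data.csv"
out_path.parent.mkdir(parents=True, exist_ok=True)
table.to_csv(out_path, index=False, encoding="utf-8")
```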
```python
import pandas as pd

# Read the CSV file
ald_data = pd.read_csv("datasets/ALD_Data.csv")

# Display the data
print(ald_data)
```
```text
Year Metric \
0 2016 Alcohol-related (ALD) deaths
1 2012 → 2016 Trend in ALD deaths
2 2020 Number of deaths for ALD (in health facilities)
3 2021 Number of deaths for ALD
4 2022 ALD share of facility-reported mortality
5 2022 Change in ALD incidence (from 2021)
6 2023 ALD deaths
Value Source/Notes
0 190 From the Bhutan Health Journal study.
1 ~140 (2012) → 190 (2016) Annual Health Bulletin 2017.
2 166 Vital Statistics Report, Bhutan’s 2021 VSR.
3 141 Reported by Ministry of Health.
4 12.22% Annual Health Bulletin 2023.
5 −0.26% decline Reported in AHB 2023.
6 129 Reported in media citing AHB.
```
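The printed table shows why a cleaning step is needed before plotting: because cells such as `2012 → 2016` and `12.22%` are not plain numbers, pandas reads the `Year` and `Value` columns as text rather than numbers. The check below uses three rows copied into an in-memory CSV (via `io.StringIO`) as a stand-in for `datasets/ALD_Data.csv`; `pd.to_numeric(..., errors="coerce")` then turns every cell that is not a plain number into `NaN`.

```python
import io

import pandas as pd

# Three rows copied from the table, standing in for datasets/ALD_Data.csv
csv_text = """Year,Metric,Value,Source/Notes
2016,Alcohol-related (ALD) deaths,190,From the Bhutan Health Journal study.
2012 → 2016,Trend in ALD deaths,~140 (2012) → 190 (2016),Annual Health Bulletin 2017.
2022,ALD share of facility-reported mortality,12.22%,Annual Health Bulletin 2023.
"""
sample = pd.read_csv(io.StringIO(csv_text))

# Year and Value come back as text columns because of the non-numeric cells
print(sample.dtypes)

# Coercion keeps real numbers and turns everything else into NaN
value_numeric = pd.to_numeric(sample["Value"], errors="coerce")
print(value_numeric.isna().tolist())  # → [False, True, True]
```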
Plotting Data into Graphical Representations
```python
import pandas as pd
import matplotlib.pyplot as plt

# Read the CSV file
ald_data = pd.read_csv("datasets/ALD_Data.csv")

# Keep only rows that hold a single year and a plain death count.
# Cells such as "2012 → 2016", "12.22%" (a share, not a count) and
# "−0.26% decline" cannot be parsed, so errors="coerce" turns them into NaN
# and dropna removes those rows.
ald_counts = (
    ald_data.assign(
        Year=pd.to_numeric(ald_data["Year"], errors="coerce"),
        Value=pd.to_numeric(ald_data["Value"], errors="coerce"),
    )
    .dropna(subset=["Year", "Value"])
    .astype({"Year": int})
)

# Scatter plot (not a line graph) of yearly ALD deaths
plt.figure(figsize=(10, 6))
plt.scatter(ald_counts["Year"], ald_counts["Value"], s=120)  # s = dot size
plt.title("Alcohol-related Liver Disease (ALD) Deaths by Year")
plt.xlabel("Year")
plt.ylabel("Number of Deaths")
plt.xticks(ald_counts["Year"], rotation=45)  # one tick per observed year
plt.grid(True)
plt.tight_layout()
plt.show()
```
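Alongside the scatter plot, the same counts can be compared numerically. The sketch below computes the percentage change between consecutive available observations and records how many years each step spans, because the series has gaps (no death count for 2017 to 2019 or for 2022). The counts are copied from the table; the 2016 figure comes from a different source than the later facility figures, so the first step is only indicative. Column names such as `years_elapsed` are illustrative.

```python
import pandas as pd

# ALD death counts copied from the table (year -> deaths)
counts = pd.DataFrame(
    {"year": [2016, 2020, 2021, 2023], "ald_deaths": [190, 166, 141, 129]}
)

# Number of years between consecutive observations (the series has gaps)
counts["years_elapsed"] = counts["year"].diff()

# Percentage change relative to the previous available observation
previous = counts["ald_deaths"].shift()
counts["pct_change"] = (counts["ald_deaths"] - previous) / previous * 100

print(counts.round(1))
```

Computing the change by hand with `shift()` keeps the arithmetic explicit: each row is compared only with the observation before it, whatever the gap.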