Bijay Rai – Fab Futures – Data Science

Week 6: Assignment ~ Investigating the Probability Distribution of my Data

In this assignment I investigate how reported female diarrhoea cases are distributed across ten age groups, from newborns (0–29 days) to adults aged 60 and over.

I load the counts from the common diseases dataset, collect them into a tidy table, and then look at them in two ways: a histogram of the case counts, and a bar graph of the cases per age group.

Step 1: Loading and Cleaning the Data

In [5]:
# Import libraries
import pandas as pd
import matplotlib.pyplot as plt

# Load data — skip the first header row and handle empty rows
df = pd.read_csv("datasets/DataSet_CommonDiseases.csv", header=1)
df = df.dropna(how='all')  # remove fully empty rows
df = df.fillna(0)          # replace blanks with 0

# Find the row for "Diarrhoea"
diarrhoea_row = df[df.iloc[:, 0] == 'Diarrhoea']

# Female columns sit at every second position (2, 4, ..., 20), one per age group;
# pandas labels them F, F.1, F.2, ..., F.9 because the header repeats "F"
female_counts = diarrhoea_row.iloc[:, list(range(2, 21, 2))].values.flatten()
female_counts = female_counts.astype(int)  # make sure they're integers

# Age group labels
age_groups = [
    '0–29 Days', '1–11 Months', '1–4 Years', '5–9 Years',
    '10–14 Years', '15–19 Years', '20–24 Years',
    '25–49 Years', '50–59 Years', '60+ Years'
]

# Create a tidy DataFrame
data = pd.DataFrame({
    'Age Group': age_groups,
    'Female Cases': female_counts
})
data
Out[5]:
     Age Group  Female Cases
0    0–29 Days            66
1  1–11 Months           906
2    1–4 Years          2800
3    5–9 Years          1753
4  10–14 Years          1378
5  15–19 Years           997
6  20–24 Years           920
7  25–49 Years          2928
8  50–59 Years           865
9    60+ Years          1387
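
The loading cell picks the female columns by position, which breaks silently if the sheet gains or loses a column. A minimal sketch of selecting them by name instead is shown here. It assumes the header repeats "M" and "F" for each age group, so that pandas labels the female columns F, F.1, F.2 and so on. The helper `select_female_columns` and the toy DataFrame are hypothetical and only illustrate the idea; they are not the real dataset.

```python
import pandas as pd


def select_female_columns(frame):
    """Return the columns labelled 'F', 'F.1', 'F.2', ... in their original order.

    Hypothetical helper: assumes pandas has de-duplicated a repeated "F" header.
    """
    female_cols = [
        col for col in frame.columns
        if col == "F" or (str(col).startswith("F.") and str(col)[2:].isdigit())
    ]
    return frame[female_cols]


# Toy stand-in for the real sheet (made-up numbers, two age groups only)
toy = pd.DataFrame(
    [["Diarrhoea", 1, 2, 3, 4]],
    columns=["Disease", "M", "F", "M.1", "F.1"],
)

female_toy = select_female_columns(toy)
print(list(female_toy.columns))
```

With the real file, the same helper could be applied to `diarrhoea_row` before flattening, provided the column labels really follow this pattern.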

Step 2: Plotting a Histogram

In [6]:
plt.figure(figsize=(8, 4))
plt.hist(data['Female Cases'], bins=8, color='lightcoral', edgecolor='black', alpha=0.8)
plt.title("Histogram of Female Diarrhoea Cases (All Age Groups)")
plt.xlabel("Case Count")
plt.ylabel("Number of Age Groups")
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.show()
[Figure: Histogram of Female Diarrhoea Cases (All Age Groups); x-axis: Case Count, y-axis: Number of Age Groups]
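
With only ten values, the histogram is easier to read alongside a few summary numbers. The sketch below re-enters the counts from the Step 1 table so it runs on its own, and computes the mean, median, sample standard deviation and skewness with pandas. The choice of these four statistics is mine and is only one reasonable way to describe the shape.

```python
import pandas as pd

# Counts copied from the Step 1 output table
female_cases = pd.Series(
    [66, 906, 2800, 1753, 1378, 997, 920, 2928, 865, 1387],
    name="Female Cases",
)

summary = {
    "mean": female_cases.mean(),
    "median": female_cases.median(),
    "std": female_cases.std(),    # sample standard deviation (ddof=1)
    "skew": female_cases.skew(),  # positive means a longer right tail
}

for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```

The mean (1400) is above the median (1187.5), which suggests that a couple of age groups with very high counts pull the distribution to the right.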

Step 3: Plotting a Bar Graph

In [7]:
plt.figure(figsize=(10, 4))
plt.bar(data['Age Group'], data['Female Cases'], color='steelblue')
plt.title("Female Diarrhoea Cases by Age Group")
plt.ylabel("Number of Cases")
plt.xticks(rotation=45)
plt.grid(axis='y', linestyle='--', alpha=0.6)
plt.tight_layout()
plt.show()
[Figure: Female Diarrhoea Cases by Age Group; x-axis: age group, y-axis: Number of Cases]
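
To read the bar graph as a probability distribution, the counts can be divided by their total so that each age group gets a share between 0 and 1. This is a minimal sketch that rebuilds the Step 1 table so it runs on its own. The column name `Share` is my own addition.

```python
import pandas as pd

# Table rebuilt from the Step 1 output
data = pd.DataFrame({
    "Age Group": [
        "0–29 Days", "1–11 Months", "1–4 Years", "5–9 Years",
        "10–14 Years", "15–19 Years", "20–24 Years",
        "25–49 Years", "50–59 Years", "60+ Years",
    ],
    "Female Cases": [66, 906, 2800, 1753, 1378, 997, 920, 2928, 865, 1387],
})

# Empirical probability of a reported case falling in each age group
total = data["Female Cases"].sum()
data["Share"] = data["Female Cases"] / total

# Three age groups with the largest share of cases
top = data.sort_values("Share", ascending=False).head(3)
print(top)
```

Note that the age groups cover very different spans (29 days versus 25 years), so a large share partly reflects a wide age band rather than a higher risk per person.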