[karma Tshomo] - Fab Futures - Data Science

In [45]:
import matplotlib as mp
print(mp.__version__)
3.10.7
In [39]:
import pandas as pd
df=pd.read_csv("datasets/Housing.csv")
df.head()
Out[39]:
price area bedrooms bathrooms stories mainroad guestroom basement hotwaterheating airconditioning parking prefarea furnishingstatus
0 13300000 7420 4 2 3 yes no no no yes 2 yes furnished
1 12250000 8960 4 4 4 yes no no no yes 3 no furnished
2 12250000 9960 3 2 2 yes no yes no no 2 yes semi-furnished
3 12215000 7500 4 2 2 yes no yes no yes 3 yes furnished
4 11410000 7420 4 1 2 yes yes yes no yes 2 no furnished

import matplotlib.pyplot as plt

In [40]:
plt.hist(df["bedrooms"], bins=30)
plt.xlabel("bedrooms")
plt.ylabel("frequency")
plt.title("Distribution of bedrooms")
plt.show()
[Figure: histogram titled "Distribution of bedrooms"; x-axis: bedrooms, y-axis: frequency]

Interpretation:

The histogram shows how many houses have each number of bedrooms.

Most houses have 3 or 4 bedrooms.

Very few houses have 1 or 2 bedrooms.
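
Because bedrooms takes only a few whole-number values, the exact counts can be read off directly instead of being estimated from bar heights. A minimal sketch, assuming the five rows shown by df.head() as stand-in data (the name `sample` is hypothetical; in the notebook, use df["bedrooms"] instead):

```python
import pandas as pd

# Stand-in data: the bedrooms values from the five rows shown by df.head().
# In the notebook, replace `sample` with df["bedrooms"].
sample = pd.Series([4, 4, 3, 4, 4], name="bedrooms")

# Count how many houses have each bedroom count, ordered by bedroom count.
bedroom_counts = sample.value_counts().sort_index()

# Share of houses per bedroom count, as a fraction of all rows.
bedroom_share = bedroom_counts / bedroom_counts.sum()

print(bedroom_counts)
print(bedroom_share.round(2))
```

Running the same two steps on the full column is a quick way to check the statements above against the data.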

In [41]:
plt.boxplot(df["price"])
plt.ylabel("Price")
plt.title("Boxplot of House Prices")
plt.show()
[Figure: box plot titled "Boxplot of House Prices"; y-axis: Price]

Interpretation:

The middle line in the box shows the median house price.

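The box spans the first to the third quartile (the interquartile range, IQR). By matplotlib's default, each whisker extends to the most extreme price that still lies within 1.5 × IQR of the box edge.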

Points outside the whiskers are outliers, meaning some houses are much more expensive than most.
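
The outlier rule behind the box plot can be reproduced numerically. A minimal sketch, assuming matplotlib's default whisker length of 1.5 × IQR and using the five prices shown by df.head() as stand-in data (the helper `iqr_outliers` is hypothetical, not part of pandas or matplotlib; in the notebook, pass df["price"]):

```python
import pandas as pd


def iqr_outliers(values: pd.Series, whis: float = 1.5) -> pd.Series:
    """Return the values lying outside the whis * IQR fences."""
    q1 = values.quantile(0.25)  # lower edge of the box
    q3 = values.quantile(0.75)  # upper edge of the box
    iqr = q3 - q1
    lower = q1 - whis * iqr
    upper = q3 + whis * iqr
    return values[(values < lower) | (values > upper)]


# Stand-in data: the price values from the five rows shown by df.head().
prices = pd.Series(
    [13300000, 12250000, 12250000, 12215000, 11410000], name="price"
)

outliers = iqr_outliers(prices)
print("median:", prices.median())
print("outliers:", outliers.tolist())
```

With only five rows the fences are very tight, so this only illustrates the rule; the counts that matter come from running it on the full price column.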

In [42]:
plt.scatter(df["parking"], df["price"])
plt.xlabel("parking")
plt.ylabel("Price")
plt.title("parking vs Price")
plt.show()
[Figure: scatter plot titled "parking vs Price"; x-axis: parking, y-axis: Price]

Interpretation:

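The scatter plot shows the relationship between the number of parking spaces and house price.

Houses with more parking spaces may tend to have higher prices, but parking is a whole-number count, so the points stack in vertical columns and the trend is hard to judge by eye.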
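
One way to check a trend like this is to summarize price for each parking value. A minimal sketch, assuming the price and parking values from the five rows shown by df.head() as stand-in data (the names `sample` and `summary` are hypothetical; in the notebook, group df directly):

```python
import pandas as pd

# Stand-in data: price and parking from the five rows shown by df.head().
sample = pd.DataFrame(
    {
        "price": [13300000, 12250000, 12250000, 12215000, 11410000],
        "parking": [2, 3, 2, 3, 2],
    }
)

# Number of houses and median price for each parking count.
summary = sample.groupby("parking")["price"].agg(["count", "median"])
print(summary)
```

On the full dataset, a median that rises steadily with parking would support the interpretation above; the count column shows how many houses each median rests on.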

In [43]:
import seaborn as sns

sns.pairplot(df[["price", "area", "bedrooms", "bathrooms", "stories"]])
plt.show()
[Figure: pair plot of price, area, bedrooms, bathrooms, and stories]
In [29]:
df["furnishingstatus"].value_counts().plot(kind="bar")
plt.xlabel("Furnishing Status")
plt.ylabel("Count")
plt.title("Furnishing Status Distribution")
plt.show()
[Figure: bar chart titled "Furnishing Status Distribution"; x-axis: Furnishing Status, y-axis: Count]
In [20]:
# Select only numeric columns
numeric_df = df.select_dtypes(include="number")

plt.figure(figsize=(10,6))
sns.heatmap(numeric_df.corr(), annot=True, cmap="coolwarm")
plt.title("Correlation Heatmap")
plt.show()
[Figure: heatmap titled "Correlation Heatmap" of the numeric columns]

Interpretation:

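With the coolwarm colormap, red cells are the highest correlations and blue cells the lowest; the number printed in each cell is the correlation coefficient itself. By default seaborn fits the color scale to the values present in the matrix, so dark blue means strong negative correlation (and pale colors mean correlation near zero) only if the scale is fixed to span -1 to 1, for example with vmin=-1 and vmax=1.

The price row shows which numeric features are most strongly correlated with house price. Correlation alone does not show that a feature affects price.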
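
The ranking can also be read as numbers instead of colors. A minimal sketch, assuming the numeric columns from the five rows shown by df.head() as stand-in data (the names `sample` and `corr_with_price` are hypothetical; in the notebook, start from numeric_df):

```python
import pandas as pd

# Stand-in data: numeric columns from the five rows shown by df.head().
sample = pd.DataFrame(
    {
        "price": [13300000, 12250000, 12250000, 12215000, 11410000],
        "area": [7420, 8960, 9960, 7500, 7420],
        "bedrooms": [4, 4, 3, 4, 4],
        "bathrooms": [2, 4, 2, 2, 1],
        "stories": [3, 4, 2, 2, 2],
        "parking": [2, 3, 2, 3, 2],
    }
)

# Pearson correlation of every feature with price, strongest first.
# Sorting by absolute value keeps strong negative correlations near the top.
corr_with_price = (
    sample.corr()["price"]
    .drop("price")
    .sort_values(key=lambda s: s.abs(), ascending=False)
)
print(corr_with_price.round(2))
```

Five rows are far too few for meaningful correlations, so the printed values only demonstrate the steps; the ranking worth reading is the one computed from the full dataset.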