In [45]:
import matplotlib as mp
print(mp.__version__)
3.10.7
In [39]:
import pandas as pd
df=pd.read_csv("datasets/Housing.csv")
df.head()
Out[39]:
| | price | area | bedrooms | bathrooms | stories | mainroad | guestroom | basement | hotwaterheating | airconditioning | parking | prefarea | furnishingstatus |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 13300000 | 7420 | 4 | 2 | 3 | yes | no | no | no | yes | 2 | yes | furnished |
| 1 | 12250000 | 8960 | 4 | 4 | 4 | yes | no | no | no | yes | 3 | no | furnished |
| 2 | 12250000 | 9960 | 3 | 2 | 2 | yes | no | yes | no | no | 2 | yes | semi-furnished |
| 3 | 12215000 | 7500 | 4 | 2 | 2 | yes | no | yes | no | yes | 3 | yes | furnished |
| 4 | 11410000 | 7420 | 4 | 1 | 2 | yes | yes | yes | no | yes | 2 | no | furnished |
In [40]:
import matplotlib.pyplot as plt
plt.hist(df["bedrooms"], bins=30)
plt.xlabel("bedrooms")
plt.ylabel("frequency")
plt.title("Distribution of bedrooms")
plt.show()
Interpretation:
The histogram shows how many houses have each number of bedrooms.
Most houses have 3 or 4 bedrooms.
Very few houses have 1 or 2 bedrooms.
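The bar heights can be checked against exact counts instead of being read off the plot. A minimal sketch, assuming the same `bedrooms` column; the helper name `bedroom_counts` and the small `sample` frame are made up for illustration and stand in for `df`:

```python
import pandas as pd


def bedroom_counts(frame: pd.DataFrame) -> pd.Series:
    """Return the number of houses for each bedroom count, ordered by bedroom count."""
    return frame["bedrooms"].value_counts().sort_index()


# Hypothetical stand-in data, not rows from Housing.csv.
sample = pd.DataFrame({"bedrooms": [3, 3, 2, 4, 3, 2, 5]})
counts = bedroom_counts(sample)
print(counts)
```

Calling `bedroom_counts(df)` on the real data gives the numbers behind each bar.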
In [41]:
plt.boxplot(df["price"])
plt.ylabel("Price")
plt.title("Boxplot of House Prices")
plt.show()
Interpretation:
The middle line in the box shows the median house price.
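The box spans the first to third quartile, and the whiskers extend to the lowest and highest prices within 1.5 times the interquartile range of the box (matplotlib's default).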
Points outside the whiskers are outliers, meaning some houses are much more expensive than most.
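The outlier rule can be reproduced by hand. A minimal sketch, assuming the default `whis=1.5` setting of `plt.boxplot`; the helper `whisker_bounds` and the `prices` values are hypothetical and only illustrate the calculation:

```python
import pandas as pd


def whisker_bounds(values: pd.Series) -> tuple[float, float]:
    """Return the lower and upper 1.5 * IQR fences for a numeric series."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


# Hypothetical prices, not taken from Housing.csv.
prices = pd.Series([3.0, 4.0, 4.5, 5.0, 5.5, 6.0, 13.0])
low, high = whisker_bounds(prices)

# Values beyond the fences are the points drawn individually on the boxplot.
outliers = prices[(prices < low) | (prices > high)]
print(low, high, outliers.tolist())
```

The drawn whiskers stop at the most extreme data points that lie inside these fences, so they can be shorter than the fences themselves. Passing `df["price"]` to `whisker_bounds` gives the cut-offs for the real data.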
In [42]:
plt.scatter(df["parking"], df["price"])
plt.xlabel("parking")
plt.ylabel("Price")
plt.title("Parking vs Price")
plt.show()
Interpretation:
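The scatter plot shows the relationship between the number of parking spaces and house price.
Because parking takes only a few whole-number values, the points stack into vertical columns. If the columns shift upward from left to right, more parking spaces are associated with higher prices.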
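With so few distinct parking values, a per-group summary is often easier to read than overlapping points. A minimal sketch, assuming the `parking` and `price` columns; the helper `median_price_by` and the `sample` frame are hypothetical:

```python
import pandas as pd


def median_price_by(frame: pd.DataFrame, column: str) -> pd.Series:
    """Return the median price for each distinct value of `column`."""
    return frame.groupby(column)["price"].median()


# Hypothetical stand-in data, not rows from Housing.csv.
sample = pd.DataFrame({
    "parking": [0, 0, 1, 1, 2, 2],
    "price": [30, 40, 45, 55, 60, 80],
})
medians = median_price_by(sample, "parking")
print(medians)
```

Running `median_price_by(df, "parking")` on the real data shows whether the typical price actually rises with each additional parking space.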
In [43]:
import seaborn as sns
sns.pairplot(df[["price", "area", "bedrooms", "bathrooms", "stories"]])
plt.show()
In [29]:
df["furnishingstatus"].value_counts().plot(kind="bar")
plt.xlabel("Furnishing Status")
plt.ylabel("Count")
plt.title("Furnishing Status Distribution")
plt.show()
In [20]:
# Select only numeric columns
numeric_df = df.select_dtypes(include="number")
plt.figure(figsize=(10,6))
sns.heatmap(numeric_df.corr(), annot=True, cmap="coolwarm")
plt.title("Correlation Heatmap")
plt.show()
Interpretation:
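On the coolwarm palette, red cells mark the highest correlations in the matrix and blue cells the lowest. The color scale is fitted to the values present, so read the annotated numbers to judge how strong each correlation is.
The price row shows which numeric features are most strongly correlated with house price. Correlation alone does not show that a feature causes higher prices.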
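The price row of the heatmap can also be pulled out as a ranked list. A minimal sketch, assuming a `price` column among the numeric columns; the helper `price_correlations` and the `sample` frame are hypothetical:

```python
import pandas as pd


def price_correlations(frame: pd.DataFrame) -> pd.Series:
    """Rank numeric features by the absolute value of their correlation with price."""
    numeric = frame.select_dtypes(include="number")
    corr = numeric.corr()["price"].drop("price")
    # Sort by strength regardless of sign, strongest first.
    return corr.sort_values(key=abs, ascending=False)


# Hypothetical stand-in data, not rows from Housing.csv.
sample = pd.DataFrame({
    "price": [10, 20, 30, 40, 50],
    "area": [1, 2, 3, 4, 5],
    "stories": [5, 4, 3, 1, 2],
    "parking": [2, 1, 2, 1, 2],
    "furnishingstatus": ["furnished", "unfurnished", "furnished", "furnished", "unfurnished"],
})
ranked = price_correlations(sample)
print(ranked)
```

Sorting by absolute value keeps strong negative correlations near the top instead of burying them at the bottom of the list.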