I selected the Housing Prices Dataset from Kaggle because it contains detailed information about residential houses. The dataset includes features such as the size of the house, the number of bedrooms and bathrooms, the number of stories, availability of parking, furnishing status, and the final sale price. These variables make the dataset suitable for a wide range of data analysis and machine learning tasks.
Number of Observations:
The dataset contains more than 5,000 rows and 12 to 20 columns, depending on the version downloaded. This is small enough to work with comfortably yet large enough to yield meaningful insights.
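Because the row and column counts differ between versions of the file, it helps to confirm them on the copy actually downloaded. The following is a minimal sketch, assuming pandas is installed. The helper name `describe_shape`, the file name `Housing.csv` mentioned in the comment, and the three-row sample (made-up values, not taken from the dataset) are all illustrative assumptions.

```python
import pandas as pd


def describe_shape(df: pd.DataFrame) -> dict:
    """Return the row count, column count, and missing values per column."""
    n_rows, n_cols = df.shape
    return {
        "rows": n_rows,
        "columns": n_cols,
        "missing": df.isna().sum().to_dict(),
    }


# Made-up stand-in rows. With the real file, something like
# df = pd.read_csv("Housing.csv") would be used instead
# (the file name is an assumption).
sample = pd.DataFrame(
    {
        "price": [100000, 150000, 200000],
        "area": [1000, 1500, 2000],
        "bedrooms": [2, 3, 4],
        "mainroad": ["yes", "no", "yes"],
    }
)

summary = describe_shape(sample)
print(summary["rows"], summary["columns"])
```

Running the same helper on the full dataset gives the exact dimensions for the version in use, along with a quick check for missing values.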
Target Variable:
The main variable I aim to predict is price, which is a continuous variable. This makes the dataset especially appropriate for regression analysis.
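To illustrate what regressing on a continuous target such as price looks like, here is a rough sketch of ordinary least squares using NumPy. The data is synthetic and generated from a known linear rule, so it says nothing about the real dataset. The function names `fit_linear` and `predict` are hypothetical helpers, not part of any library.

```python
import numpy as np


def fit_linear(X, y):
    """Fit ordinary least squares with an intercept; return (intercept, coefs)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta[0], beta[1:]


def predict(X, intercept, coefs):
    """Predict target values for feature rows X."""
    return intercept + np.asarray(X, dtype=float) @ coefs


# Synthetic example: price = 50000 + 100 * area + 20000 * bedrooms, no noise.
area = np.array([1000, 1500, 2000, 2500, 3000])
bedrooms = np.array([2, 3, 3, 4, 2])
X = np.column_stack([area, bedrooms])
y = 50_000 + 100 * area + 20_000 * bedrooms

intercept, coefs = fit_linear(X, y)
predicted = predict([[1800, 3]], intercept, coefs)
```

On real data the fit would not be exact, and a held-out test set would be needed to judge how well the model predicts unseen prices.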
Features:
The key features included in the dataset are:
1. area (square feet)
2. bedrooms
3. bathrooms
4. stories
5. parking
6. mainroad (yes/no)
7. furnishingstatus (furnished/semifurnished/unfurnished)
8. guestroom
9. basement
10. hotwaterheating
11. airconditioning
12. prefarea
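Several of these features are categorical, so most models need them converted to numbers first. The sketch below shows one common approach, assuming pandas is installed: yes/no columns are mapped to 1/0 and furnishingstatus is one-hot encoded. Only mainroad is stated above to be yes/no; treating guestroom, basement, hotwaterheating, airconditioning, and prefarea the same way is an assumption. The helper name `encode_features` and the sample rows are hypothetical.

```python
import pandas as pd

# Columns assumed to hold "yes"/"no" strings (only mainroad is confirmed above).
BINARY_COLS = [
    "mainroad",
    "guestroom",
    "basement",
    "hotwaterheating",
    "airconditioning",
    "prefarea",
]


def encode_features(df: pd.DataFrame) -> pd.DataFrame:
    """Map yes/no columns to 1/0 and one-hot encode furnishingstatus."""
    out = df.copy()
    for col in BINARY_COLS:
        if col in out.columns:
            # Values other than "yes"/"no" become NaN and should be inspected.
            out[col] = out[col].map({"yes": 1, "no": 0})
    if "furnishingstatus" in out.columns:
        out = pd.get_dummies(out, columns=["furnishingstatus"], dtype=int)
    return out


# Made-up rows for illustration only.
sample = pd.DataFrame(
    {
        "area": [1000, 1500, 2000],
        "mainroad": ["yes", "no", "yes"],
        "furnishingstatus": ["furnished", "semifurnished", "unfurnished"],
    }
)

encoded = encode_features(sample)
```

One-hot encoding is used for furnishingstatus because its three levels have no guaranteed numeric spacing; an ordinal encoding would be an alternative if the levels are treated as ordered.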
Purpose of Selecting this Dataset:
It allows me to apply regression, classification, smoothing, and detection techniques.
It is clean and simple enough for beginners to work with, yet complex enough to support meaningful analysis.
It offers a balanced mix of numeric and categorical variables, making it ideal for developing and testing different machine learning models.