Week 1: Selecting a Dataset for My Data Science Project
Agriculture plays a vital role in India’s economy, providing livelihood and food security for millions. Understanding agricultural patterns at the district level is essential for improving crop production, planning irrigation, managing resources, and guiding policy decisions.
I selected the ICRISAT District-Level Agricultural Dataset because it offers detailed and reliable data on crops, rainfall, production, and land usage across Indian districts. This dataset allows me to analyze real-world agricultural challenges, explore trends, and apply data science techniques to derive meaningful insights that can support better decision-making in the agricultural sector.
Source: kaggle.com
This dataset is published by the International Crops Research Institute for the Semi-Arid Tropics (ICRISAT) and contains district-wise agricultural data for India.
About the Dataset
This dataset provides district-level agricultural statistics collected from various states across India. It includes information related to:
Crop production
Crop area (sown/harvested)
Crop yield
Rainfall
Irrigation
Agricultural inputs
District and state identifiers
These fields support analysis of productivity trends, climatic impact, cropping patterns, and regional variation across India.
Purpose of Choosing This Dataset
I selected this dataset because:
It contains rich numerical and categorical features, suitable for:
Exploratory Data Analysis (EDA)
Data visualization
Feature engineering
Machine learning modeling
It is relevant to Indian agriculture, one of the most important sectors of the Indian economy.
It is appropriate for learning data science methods such as:
Correlation analysis
Regression modeling
Trend analysis
State/district-wise comparisons
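The methods listed above can be tried on a few rows before working with the full file. The following is only a minimal sketch using the Python standard library: the records, the field names (`state`, `year`, `rainfall`, `yield`), and the helper functions are made up for illustration and are not taken from the ICRISAT data.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean

# Made-up records for illustration only; they are NOT values from the dataset.
records = [
    {"state": "State A", "year": 2015, "rainfall": 800.0, "yield": 2.0},
    {"state": "State A", "year": 2016, "rainfall": 900.0, "yield": 2.3},
    {"state": "State B", "year": 2015, "rainfall": 600.0, "yield": 1.6},
    {"state": "State B", "year": 2016, "rainfall": 700.0, "yield": 1.8},
]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def slope(xs, ys):
    """Least-squares slope of ys regressed on xs (simple linear regression)."""
    mx, my = mean(xs), mean(ys)
    numerator = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    denominator = sum((x - mx) ** 2 for x in xs)
    return numerator / denominator


def mean_by(rows, key, value):
    """Average `value` within each group defined by `key`."""
    groups = defaultdict(list)
    for rec in rows:
        groups[rec[key]].append(rec[value])
    return {group: mean(values) for group, values in groups.items()}


rainfall = [rec["rainfall"] for rec in records]
crop_yield = [rec["yield"] for rec in records]

# Correlation analysis: how closely yield moves with rainfall.
r = pearson(rainfall, crop_yield)

# Regression modeling: change in yield per unit of rainfall.
yield_per_rainfall = slope(rainfall, crop_yield)

# State-wise comparison: average yield per state.
state_means = mean_by(records, "state", "yield")

# Trend analysis: slope of the yearly average yield over time.
year_means = mean_by(records, "year", "yield")
years = sorted(year_means)
trend = slope(years, [year_means[y] for y in years])

print(f"correlation={r:.3f}, slope={yield_per_rainfall:.4f}, trend={trend:.3f}")
print(state_means)
```

With the real data, the same steps would more likely be done with a data analysis library. The plain-Python version is used here only to show what each method computes.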
Structure of the Dataset
The dataset contains multiple columns such as:
State Name
District Name
Crop Name
Area (in hectares)
Production (in tonnes)
Yield
Rainfall / Irrigation variables (if present)
Year
The file may also include several additional agricultural and demographic indicators, depending on the file version.
The dataset is typically distributed as a CSV file in which the rows represent districts, which makes detailed district-level and regional comparisons possible.
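A file with this structure could be read and checked roughly as follows. This is a sketch under assumptions: the column names are copied from the list above and may not match the headers in the actual file, the sample rows are invented, and `load_rows`, `to_float`, and `add_yield` are hypothetical helpers. Yield is derived here as production divided by area, in whatever units the source columns use.

```python
import csv
import io

# Hypothetical sample rows; the real file's headers and values may differ.
SAMPLE_CSV = """State Name,District Name,Crop Name,Year,Area,Production
State A,District 1,Rice,2015,1000,2500
State A,District 2,Rice,2015,800,1600
State B,District 3,Wheat,2015,500,
"""

# Columns assumed from the structure described above.
EXPECTED_COLUMNS = {
    "State Name", "District Name", "Crop Name", "Year", "Area", "Production",
}


def load_rows(handle):
    """Read CSV rows into dictionaries and check that expected columns exist."""
    reader = csv.DictReader(handle)
    missing = EXPECTED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return list(reader)


def to_float(text):
    """Convert a CSV cell to float, returning None for blank cells."""
    text = (text or "").strip()
    return float(text) if text else None


def add_yield(rows):
    """Add a 'Yield' entry (production per unit area) where both inputs exist."""
    for row in rows:
        area = to_float(row["Area"])
        production = to_float(row["Production"])
        if area and production is not None:
            row["Yield"] = production / area
        else:
            # Blank production or blank/zero area: leave yield undefined.
            row["Yield"] = None
    return rows


# With a real file, replace io.StringIO(...) with open(path, newline="").
rows = add_yield(load_rows(io.StringIO(SAMPLE_CSV)))
for row in rows:
    print(row["District Name"], row["Year"], row["Yield"])
```

Checking the header up front gives a clear error if a different file version uses other column names, and blank cells are kept as `None` rather than silently treated as zero.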