Understanding the Data
I chose the ICRISAT District-Level Agricultural Dataset because it offers detailed district-level data on Indian agriculture. The file used here records area, production, and yield for major crops across Indian districts from 1966 to 2017. It lets me analyze real-world agricultural challenges, explore trends, and apply data science techniques to derive insights that can support better decision-making in the agricultural sector.
Source: kaggle.com
Loading the Dataset
In [2]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv("datasets/ICRISAT-District Level Data.csv")
In [3]:
df.head()
Out[3]:
| | Dist Code | Year | State Code | State Name | Dist Name | RICE AREA (1000 ha) | RICE PRODUCTION (1000 tons) | RICE YIELD (Kg per ha) | WHEAT AREA (1000 ha) | WHEAT PRODUCTION (1000 tons) | ... | SUGARCANE YIELD (Kg per ha) | COTTON AREA (1000 ha) | COTTON PRODUCTION (1000 tons) | COTTON YIELD (Kg per ha) | FRUITS AREA (1000 ha) | VEGETABLES AREA (1000 ha) | FRUITS AND VEGETABLES AREA (1000 ha) | POTATOES AREA (1000 ha) | ONION AREA (1000 ha) | FODDER AREA (1000 ha) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1966 | 14 | Chhattisgarh | Durg | 548.0 | 185.0 | 337.59 | 44.0 | 20.0 | ... | 1777.78 | 0.0 | 0.0 | 0.0 | 5.95 | 6.64 | 12.59 | 0.01 | 0.60 | 0.47 |
| 1 | 1 | 1967 | 14 | Chhattisgarh | Durg | 547.0 | 409.0 | 747.71 | 50.0 | 26.0 | ... | 1500.00 | 0.0 | 0.0 | 0.0 | 5.77 | 7.24 | 13.02 | 0.01 | 0.56 | 1.23 |
| 2 | 1 | 1968 | 14 | Chhattisgarh | Durg | 556.3 | 468.0 | 841.27 | 53.7 | 30.0 | ... | 1000.00 | 0.0 | 0.0 | 0.0 | 5.41 | 7.40 | 12.81 | 0.10 | 0.58 | 1.02 |
| 3 | 1 | 1969 | 14 | Chhattisgarh | Durg | 563.4 | 400.8 | 711.40 | 49.4 | 26.5 | ... | 1900.00 | 0.0 | 0.0 | 0.0 | 5.52 | 7.16 | 12.69 | 0.01 | 0.56 | 0.84 |
| 4 | 1 | 1970 | 14 | Chhattisgarh | Durg | 571.6 | 473.6 | 828.55 | 44.2 | 29.0 | ... | 2000.00 | 0.0 | 0.0 | 0.0 | 5.45 | 7.19 | 12.64 | 0.02 | 0.52 | 0.42 |
5 rows × 80 columns
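With 80 columns, it helps to notice that the crop columns follow a `<CROP> <MEASURE> (<unit>)` naming pattern. The snippet below is a minimal sketch of how those names could be split into parts; the regular expression and the `parsed` dictionary are illustrative assumptions, not part of the dataset or of pandas.

```python
import re

# A few column names copied from the dataset, plus one identifier column
columns = [
    "Dist Code",
    "RICE AREA (1000 ha)",
    "RICE PRODUCTION (1000 tons)",
    "KHARIF SORGHUM YIELD (Kg per ha)",
    "FRUITS AND VEGETABLES AREA (1000 ha)",
]

# Assumed pattern: crop name, then AREA/PRODUCTION/YIELD, then the unit in parentheses
pattern = re.compile(
    r"^(?P<crop>.+) (?P<measure>AREA|PRODUCTION|YIELD) \((?P<unit>[^)]+)\)$"
)

parsed = {}
for name in columns:
    match = pattern.match(name)
    if match:  # identifier columns such as "Dist Code" do not match and are skipped
        parsed[name] = match.groupdict()

for name, parts in parsed.items():
    print(name, "->", parts)
```

The same loop could be run over `df.columns` to group columns by crop or by measure.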
In [4]:
df.info()
df.describe()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16146 entries, 0 to 16145
Data columns (total 80 columns):
| # | Column | Non-Null Count | Dtype |
|---|---|---|---|
| 0 | Dist Code | 16146 non-null | int64 |
| 1 | Year | 16146 non-null | int64 |
| 2 | State Code | 16146 non-null | int64 |
| 3 | State Name | 16146 non-null | object |
| 4 | Dist Name | 16146 non-null | object |
| 5 | RICE AREA (1000 ha) | 16146 non-null | float64 |
| 6 | RICE PRODUCTION (1000 tons) | 16146 non-null | float64 |
| 7 | RICE YIELD (Kg per ha) | 16146 non-null | float64 |
| 8 | WHEAT AREA (1000 ha) | 16146 non-null | float64 |
| 9 | WHEAT PRODUCTION (1000 tons) | 16146 non-null | float64 |
| 10 | WHEAT YIELD (Kg per ha) | 16146 non-null | float64 |
| 11 | KHARIF SORGHUM AREA (1000 ha) | 16146 non-null | float64 |
| 12 | KHARIF SORGHUM PRODUCTION (1000 tons) | 16146 non-null | float64 |
| 13 | KHARIF SORGHUM YIELD (Kg per ha) | 16146 non-null | float64 |
| 14 | RABI SORGHUM AREA (1000 ha) | 16146 non-null | float64 |
| 15 | RABI SORGHUM PRODUCTION (1000 tons) | 16146 non-null | float64 |
| 16 | RABI SORGHUM YIELD (Kg per ha) | 16146 non-null | float64 |
| 17 | SORGHUM AREA (1000 ha) | 16146 non-null | float64 |
| 18 | SORGHUM PRODUCTION (1000 tons) | 16146 non-null | float64 |
| 19 | SORGHUM YIELD (Kg per ha) | 16146 non-null | float64 |
| 20 | PEARL MILLET AREA (1000 ha) | 16146 non-null | float64 |
| 21 | PEARL MILLET PRODUCTION (1000 tons) | 16146 non-null | float64 |
| 22 | PEARL MILLET YIELD (Kg per ha) | 16146 non-null | float64 |
| 23 | MAIZE AREA (1000 ha) | 16146 non-null | float64 |
| 24 | MAIZE PRODUCTION (1000 tons) | 16146 non-null | float64 |
| 25 | MAIZE YIELD (Kg per ha) | 16146 non-null | float64 |
| 26 | FINGER MILLET AREA (1000 ha) | 16146 non-null | float64 |
| 27 | FINGER MILLET PRODUCTION (1000 tons) | 16146 non-null | float64 |
| 28 | FINGER MILLET YIELD (Kg per ha) | 16146 non-null | float64 |
| 29 | BARLEY AREA (1000 ha) | 16146 non-null | float64 |
| 30 | BARLEY PRODUCTION (1000 tons) | 16146 non-null | float64 |
| 31 | BARLEY YIELD (Kg per ha) | 16146 non-null | float64 |
| 32 | CHICKPEA AREA (1000 ha) | 16146 non-null | float64 |
| 33 | CHICKPEA PRODUCTION (1000 tons) | 16146 non-null | float64 |
| 34 | CHICKPEA YIELD (Kg per ha) | 16146 non-null | float64 |
| 35 | PIGEONPEA AREA (1000 ha) | 16146 non-null | float64 |
| 36 | PIGEONPEA PRODUCTION (1000 tons) | 16146 non-null | float64 |
| 37 | PIGEONPEA YIELD (Kg per ha) | 16146 non-null | float64 |
| 38 | MINOR PULSES AREA (1000 ha) | 16146 non-null | float64 |
| 39 | MINOR PULSES PRODUCTION (1000 tons) | 16146 non-null | float64 |
| 40 | MINOR PULSES YIELD (Kg per ha) | 16146 non-null | float64 |
| 41 | GROUNDNUT AREA (1000 ha) | 16146 non-null | float64 |
| 42 | GROUNDNUT PRODUCTION (1000 tons) | 16146 non-null | float64 |
| 43 | GROUNDNUT YIELD (Kg per ha) | 16146 non-null | float64 |
| 44 | SESAMUM AREA (1000 ha) | 16146 non-null | float64 |
| 45 | SESAMUM PRODUCTION (1000 tons) | 16146 non-null | float64 |
| 46 | SESAMUM YIELD (Kg per ha) | 16146 non-null | float64 |
| 47 | RAPESEED AND MUSTARD AREA (1000 ha) | 16146 non-null | float64 |
| 48 | RAPESEED AND MUSTARD PRODUCTION (1000 tons) | 16146 non-null | float64 |
| 49 | RAPESEED AND MUSTARD YIELD (Kg per ha) | 16146 non-null | float64 |
| 50 | SAFFLOWER AREA (1000 ha) | 16146 non-null | float64 |
| 51 | SAFFLOWER PRODUCTION (1000 tons) | 16146 non-null | float64 |
| 52 | SAFFLOWER YIELD (Kg per ha) | 16146 non-null | float64 |
| 53 | CASTOR AREA (1000 ha) | 16146 non-null | float64 |
| 54 | CASTOR PRODUCTION (1000 tons) | 16146 non-null | float64 |
| 55 | CASTOR YIELD (Kg per ha) | 16146 non-null | float64 |
| 56 | LINSEED AREA (1000 ha) | 16146 non-null | float64 |
| 57 | LINSEED PRODUCTION (1000 tons) | 16146 non-null | float64 |
| 58 | LINSEED YIELD (Kg per ha) | 16146 non-null | float64 |
| 59 | SUNFLOWER AREA (1000 ha) | 16146 non-null | float64 |
| 60 | SUNFLOWER PRODUCTION (1000 tons) | 16146 non-null | float64 |
| 61 | SUNFLOWER YIELD (Kg per ha) | 16146 non-null | float64 |
| 62 | SOYABEAN AREA (1000 ha) | 16146 non-null | float64 |
| 63 | SOYABEAN PRODUCTION (1000 tons) | 16146 non-null | float64 |
| 64 | SOYABEAN YIELD (Kg per ha) | 16146 non-null | float64 |
| 65 | OILSEEDS AREA (1000 ha) | 16146 non-null | float64 |
| 66 | OILSEEDS PRODUCTION (1000 tons) | 16146 non-null | float64 |
| 67 | OILSEEDS YIELD (Kg per ha) | 16146 non-null | float64 |
| 68 | SUGARCANE AREA (1000 ha) | 16146 non-null | float64 |
| 69 | SUGARCANE PRODUCTION (1000 tons) | 16146 non-null | float64 |
| 70 | SUGARCANE YIELD (Kg per ha) | 16146 non-null | float64 |
| 71 | COTTON AREA (1000 ha) | 16146 non-null | float64 |
| 72 | COTTON PRODUCTION (1000 tons) | 16146 non-null | float64 |
| 73 | COTTON YIELD (Kg per ha) | 16146 non-null | float64 |
| 74 | FRUITS AREA (1000 ha) | 16146 non-null | float64 |
| 75 | VEGETABLES AREA (1000 ha) | 16146 non-null | float64 |
| 76 | FRUITS AND VEGETABLES AREA (1000 ha) | 16146 non-null | float64 |
| 77 | POTATOES AREA (1000 ha) | 16146 non-null | float64 |
| 78 | ONION AREA (1000 ha) | 16146 non-null | float64 |
| 79 | FODDER AREA (1000 ha) | 16146 non-null | float64 |
dtypes: float64(75), int64(3), object(2)
memory usage: 9.9+ MB
Out[4]:
| | Dist Code | Year | State Code | RICE AREA (1000 ha) | RICE PRODUCTION (1000 tons) | RICE YIELD (Kg per ha) | WHEAT AREA (1000 ha) | WHEAT PRODUCTION (1000 tons) | WHEAT YIELD (Kg per ha) | KHARIF SORGHUM AREA (1000 ha) | ... | SUGARCANE YIELD (Kg per ha) | COTTON AREA (1000 ha) | COTTON PRODUCTION (1000 tons) | COTTON YIELD (Kg per ha) | FRUITS AREA (1000 ha) | VEGETABLES AREA (1000 ha) | FRUITS AND VEGETABLES AREA (1000 ha) | POTATOES AREA (1000 ha) | ONION AREA (1000 ha) | FODDER AREA (1000 ha) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 16146.000000 | 16146.000000 | 16146.000000 | 16146.000000 | 16146.000000 | 16146.000000 | 16146.000000 | 16146.000000 | 16146.000000 | 16146.000000 | ... | 16146.00000 | 16146.000000 | 16146.000000 | 16146.000000 | 16146.000000 | 16146.000000 | 16146.000000 | 16146.000000 | 16146.000000 | 16146.000000 |
| mean | 269.769231 | 1991.496841 | 9.568562 | 128.593192 | 224.889565 | 1486.924784 | 77.057946 | 182.012746 | 1492.419859 | 22.632268 | ... | 4500.15306 | 28.018367 | 7.229225 | 124.644823 | 7.750478 | 11.086250 | 18.677877 | 3.177038 | 1.194604 | 21.550328 |
| std | 278.309125 | 15.011185 | 4.988538 | 160.078825 | 326.629828 | 956.185281 | 100.394479 | 348.834254 | 1081.255367 | 45.062714 | ... | 3153.97042 | 74.239648 | 25.042132 | 207.681147 | 13.591135 | 18.003257 | 25.881842 | 8.029509 | 4.285067 | 60.062601 |
| min | 1.000000 | 1966.000000 | 1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | ... | -1.00000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 |
| 25% | 78.000000 | 1978.000000 | 6.000000 | 10.400000 | 9.460000 | 800.000000 | 1.770000 | 2.000000 | 750.000000 | 0.000000 | ... | 2000.00000 | 0.000000 | 0.000000 | 0.000000 | 0.310000 | 1.600000 | 2.520000 | 0.000000 | 0.060000 | 0.000000 |
| 50% | 156.000000 | 1991.000000 | 10.000000 | 66.800000 | 95.840000 | 1333.210000 | 36.800000 | 42.700000 | 1347.450000 | 2.050000 | ... | 4502.21000 | 0.050000 | 0.000000 | 0.000000 | 2.220000 | 4.740000 | 8.945000 | 0.390000 | 0.280000 | 1.200000 |
| 75% | 241.000000 | 2005.000000 | 12.000000 | 191.390000 | 315.715000 | 2113.517500 | 123.000000 | 215.192500 | 2131.580000 | 20.900000 | ... | 6704.60500 | 10.097500 | 2.000000 | 202.270000 | 8.790000 | 12.510000 | 23.570000 | 3.150000 | 0.880000 | 16.762500 |
| max | 917.000000 | 2017.000000 | 20.000000 | 1154.230000 | 3215.010000 | 5653.830000 | 879.490000 | 4305.500000 | 5541.520000 | 334.800000 | ... | 22062.30000 | 800.890000 | 376.610000 | 5000.000000 | 159.540000 | 200.060000 | 240.990000 | 111.610000 | 131.350000 | 1162.660000 |
8 rows × 78 columns
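The summary shows a minimum of -1 in every crop column, even though `df.info()` reports no nulls. Areas, production, and yields cannot be negative, so -1 is probably a placeholder for missing values; that reading is an assumption, not something the dataset documentation quoted here confirms. A minimal sketch of converting such placeholders to `NaN`, using two rows from `df.head()` plus one row altered to -1 purely for illustration:

```python
import pandas as pd

# First two rows are from df.head(); the third row is hypothetical (values set to -1)
sample = pd.DataFrame({
    "Year": [1966, 1967, 1968],
    "RICE AREA (1000 ha)": [548.0, 547.0, -1.0],
    "RICE PRODUCTION (1000 tons)": [185.0, 409.0, -1.0],
})

# Only the measurement columns are cleaned; identifier columns are left alone
value_cols = sample.columns.drop("Year")

cleaned = sample.copy()
# mask() replaces entries where the condition is True with NaN
cleaned[value_cols] = cleaned[value_cols].mask(cleaned[value_cols] < 0)

print(cleaned.isna().sum())
```

Once the placeholders are `NaN`, pandas aggregations such as `mean()` and `sum()` skip them by default instead of treating them as real measurements.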
Data Visualization
- Bar chart
In [8]:
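# Sum production across districts for each year. Plotting the raw rows would
# draw one overlapping bar per district at every year. Use .mean() instead of
# .sum() for a per-district average.
rice_by_year = df.groupby("Year")["RICE PRODUCTION (1000 tons)"].sum()
plt.figure(figsize=(12,6))
plt.bar(rice_by_year.index, rice_by_year.values)
plt.xlabel("Year")
plt.ylabel("Total Rice Production (1000 tons)")
plt.title("Rice Production Over the Years")
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()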
- Line chart
In [11]:
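# Aggregate to one value per year first. Plotting the raw rows would connect
# every district record in file order and produce a zig-zag, not a trend.
rice_by_year = df.groupby("Year")["RICE PRODUCTION (1000 tons)"].sum()
plt.figure(figsize=(12,6))
plt.plot(rice_by_year.index, rice_by_year.values, marker='o')
plt.xlabel("Year")
plt.ylabel("Total Rice Production (1000 tons)")
plt.title("Rice Production Trend Over the Years")
plt.xticks(rotation=45)
plt.grid(True)
plt.tight_layout()
plt.show()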
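A trend line is easier to interpret alongside year-over-year changes. The following is a small sketch using only the five Durg district values shown in `df.head()`; the variable names are illustrative, and the same call could be applied to any per-year series.

```python
import pandas as pd

# Rice production for Durg district, 1966-1970, copied from df.head()
rice = pd.Series(
    [185.0, 409.0, 468.0, 400.8, 473.6],
    index=pd.Index([1966, 1967, 1968, 1969, 1970], name="Year"),
    name="RICE PRODUCTION (1000 tons)",
)

# pct_change() gives the fractional change from the previous year;
# the first year has no predecessor, so its value is NaN
yoy_pct = rice.pct_change() * 100

print(yoy_pct.round(1))
```

Negative values mark years in which production fell relative to the previous year, as it does here between 1968 and 1969.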
- Scatter chart
In [13]:
plt.figure(figsize=(8,5))
plt.scatter(df["RICE AREA (1000 ha)"], df["RICE PRODUCTION (1000 tons)"])
plt.xlabel("Rice Area (1000 ha)")
plt.ylabel("Rice Production (1000 tons)")
plt.title("Rice Area vs Rice Production")
plt.show()
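Area and production are tied together by yield: production divided by area gives tons per hectare, and multiplying by 1000 converts that to kilograms per hectare. The sketch below checks this relationship against the yield column for the five Durg rows shown in `df.head()`; the 0.01 tolerance is an assumption that allows for two-decimal rounding in the published values.

```python
import numpy as np
import pandas as pd

# Durg district rows copied from the df.head() output
sample = pd.DataFrame({
    "RICE AREA (1000 ha)": [548.0, 547.0, 556.3, 563.4, 571.6],
    "RICE PRODUCTION (1000 tons)": [185.0, 409.0, 468.0, 400.8, 473.6],
    "RICE YIELD (Kg per ha)": [337.59, 747.71, 841.27, 711.40, 828.55],
})

# (1000 tons) / (1000 ha) = tons per ha; multiply by 1000 for kg per ha
derived_yield = (
    sample["RICE PRODUCTION (1000 tons)"] / sample["RICE AREA (1000 ha)"] * 1000
)

# Compare with the reported yield, allowing for rounding to two decimals
matches = np.isclose(
    derived_yield, sample["RICE YIELD (Kg per ha)"], rtol=0, atol=0.01
)
print(matches.all())
```

On the full dataset, rows with zero area would need to be excluded before dividing.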
- Histogram
In [14]:
plt.figure(figsize=(8,5))
plt.hist(df["RICE YIELD (Kg per ha)"], bins=20)
plt.xlabel("Rice Yield (Kg per ha)")
plt.ylabel("Frequency")
plt.title("Distribution of Rice Yield")
plt.show()
- Correlation heatmap
In [15]:
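corr = df.corr(numeric_only=True)
plt.figure(figsize=(10,8))
# Fix the colour scale to [-1, 1] with a diverging map so zero correlation sits at the midpoint
plt.imshow(corr, cmap="coolwarm", vmin=-1, vmax=1)
plt.colorbar(label="Pearson correlation")
plt.title("Correlation Heatmap of Numeric Features")
plt.show()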
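A heatmap over 78 numeric columns is hard to read cell by cell, so it can help to list the most strongly correlated pairs as a table. This is a minimal sketch on the five rows from `df.head()` with a handful of columns; the names `mask` and `pairs` are illustrative, and with so few rows the coefficients themselves are not meaningful.

```python
import numpy as np
import pandas as pd

# Five Durg district rows copied from df.head()
sample = pd.DataFrame({
    "RICE AREA (1000 ha)": [548.0, 547.0, 556.3, 563.4, 571.6],
    "RICE PRODUCTION (1000 tons)": [185.0, 409.0, 468.0, 400.8, 473.6],
    "RICE YIELD (Kg per ha)": [337.59, 747.71, 841.27, 711.40, 828.55],
    "WHEAT AREA (1000 ha)": [44.0, 50.0, 53.7, 49.4, 44.2],
    "WHEAT PRODUCTION (1000 tons)": [20.0, 26.0, 30.0, 26.5, 29.0],
})

corr = sample.corr(numeric_only=True)

# Keep only the upper triangle (k=1 skips the diagonal) so each pair appears once
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)

# Flatten to a Series indexed by (column A, column B), drop the masked cells,
# and sort by absolute correlation, strongest first
pairs = (
    corr.where(mask)
    .stack()
    .dropna()
    .sort_values(key=lambda s: s.abs(), ascending=False)
)

print(pairs.head())
```

Applied to the full `df`, the top of this list would show which columns move together most closely, for example a crop's production and its yield.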