
Lesson 1: Selecting Dataset

The dataset is a flat file containing an agricultural production table for Bhutan, with information about fruit crops across regions or administrative divisions for the year 2021. Each fruit category, such as apple, areca nut, mandarin, watermelon, peach, plum, walnut, pineapple, and passion fruit, is represented by several attributes. These typically include Total Trees, Bearing Trees, and Production (MT), which indicate the scale of cultivation and the productivity of each crop.

Structurally, the dataset has 76 columns and 22 rows, and several columns are labeled Unnamed. This suggests that the dataset was extracted from a formatted table (possibly a PDF or report) whose merged or multi-level headers were split into individual columns. The first two rows appear to contain header-like descriptors rather than actual data, with labels such as “Total Tree”, “Bearing Tree”, and “Production”. The remaining rows contain numeric values, often stored as text strings, giving the agricultural statistics for each fruit type.

Overall, the dataset provides a broad snapshot of fruit production in Bhutan in 2021. It is useful for analyzing patterns in crop distribution, productivity differences across fruit types, and overall agricultural output. Because of its layout, especially the many Unnamed columns, it needs cleaning and restructuring before analysis: combining the header rows, converting numeric strings to numbers, and labeling each variable properly.
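The cleaning steps mentioned above could be sketched roughly as follows. This is a minimal illustration, assuming pandas is available; the inline CSV, the `Region` column, the region names, and every number in it are made up to mimic the described layout (one header row with fruit names spread over blank cells, and a second row with attribute labels) and are not taken from the actual file.

```python
import io

import pandas as pd

# Hypothetical miniature of the layout described above; all values are invented.
raw_csv = """Region,Apple,,,Mandarin,,
,Total Tree,Bearing Tree,Production (MT),Total Tree,Bearing Tree,Production (MT)
Region A,"1,200",900,35.5,"4,000","3,100",120
Region B,300,150,4.2,"2,500","1,800",75.0
"""

# Read everything as text with no header, so both header rows stay visible.
raw = pd.read_csv(io.StringIO(raw_csv), header=None, dtype=str)

# Step 1: combine the two header rows.
# Blank cells after a fruit name belong to that fruit, so forward-fill them.
fruit = raw.iloc[0].ffill()
attribute = raw.iloc[1]
names = [
    f if pd.isna(a) else f"{f} - {a}"
    for f, a in zip(fruit, attribute)
]

# Step 2: keep only the data rows and attach the combined labels.
data = raw.iloc[2:].reset_index(drop=True)
data.columns = names

# Step 3: convert numeric strings (with thousands separators) to numbers.
# Values that cannot be parsed become NaN instead of raising an error.
for col in names[1:]:
    cleaned = data[col].str.replace(",", "", regex=False)
    data[col] = pd.to_numeric(cleaned, errors="coerce")

print(data.dtypes)
print(data)
```

On the real file, the number of header rows and the position of the region column would need to be checked by inspecting the first few rows before applying the same steps.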
