Week 2: Assignment and Tools (Packages, Modules and Programming Languages)¶

November 20,2025¶

The second week of Professor Neil's data science course focused on essential tools for data analysis. The class introduced various tools, including programming languages like Python, JavaScript, and Rust.

The main goal was to get students familiar with several key Python modules (packages), which are:

Matplotlib: For creating static, interactive, and animated visualizations in Python.
Pandas: For data manipulation and analysis, especially with structured data like tables.
Seaborn: For creating attractive and informative statistical graphics (it's built on Matplotlib).
ipywidgets: For adding interactive controls (like sliders and buttons) to Jupyter notebooks.
Numpy, Scipy, and others: Numpy is vital for numerical computing (especially arrays and matrices), and Scipy is used for more advanced scientific and technical computing.In short, the second week was a hands-on introduction to the programming languages and specific Python libraries needed to efficiently handle, analyze, and visualize data.

Assignment objectives: To visualize the dataset chosen in the first assignment using data visualization tools

Sample Data¶

No description has been provided for this image

Exploration that I have done on Data Visualization¶

For this assignment I have explored the Mathplotlib library of Python Programming Language. The followigns were the points and information i have searched for while doing this presenation.

What is Mathplotlib
What are the features of the Mathplotlib library.
Example codings and applications using Mathplotlib After reading the docmumentation and infomration from various sources, I have learnt that Mathplotlib library is a third-party library.The primary purpose of Mathplotlib is to create static, animated, and interactive plots and figures to help analysts and scientists understand and communicate data insights.

Features I have explored¶

It has extensive plotting capabilities like 2D,3D and specailized plot.
Highly customizable meaning, the library creates a visualization and gives you the flexibility to fine tune each visual elements.
Generates a publication quality output meaning the visualization generated by mathplotlib could be of top notch quality that could be used in academic writings too.
Integration with other data science library meaning that mathplotlib library can work seamlessly with other data science libraries such as Panda,seaborn and Numpy
Dual Interface meaning it provides two main ways of creating plots. (Pylot interface and Object Oriented Interface)

Coding concpets exlpored¶

Importing of the library and modules to current workspace--> from mathplotlib import pyplot(plotting interface) as pyt(namespace to be used in the code)
Loading the external data to the code --> I have understood that Panda modules are used for loading the data to the Pandas Data Frame.

Using Panda to load the data files to panda data frame¶

No description has been provided for this image The above inforamtion has been retrieved from Gemenie. The importing of CSV files and other extensioned files are also quite same.

Using CSV to Load the file and converting each rows into dictionary format¶

example code learnt import csv with open("filepath") as csv_file: csv_reader = csv.DictReader(csv_file)

Methods Explored on Pyplot interface as of 2nd week¶

Plot Method : Plots the datapoints in the form a line chart by default.
Bar Method: Plots the datapoints in the form of a bar graph.
title: Methods used for naming the visualiztion
xlabel: Naming the X-Axis
YLabel: Nameing the Y-Axis
Legend: setting the legend for clear visualization

Video Tutorial I have refered to learn about Mathplotlib Library.¶

No description has been provided for this image

In [40]:

'''
For the completion of this assignment and learning purpose, i have worked with CSV file dataset titled "Salary insights job role" before moving with my real dataset
I have retrieved this data set from Kaggle, for this data viaualization i have just used worked with one field of the dataset
i.e Job title 
The analysis i have done here is to find out the most popular jobs in USA
For the completion of this assignment i have used the following libraries and modules 
csv, matplotlib and counter 
csv for loading the dataset in the program, matplotlib for visualizing and counter for counting the occurence of a value.

From this analyis i have found out that Data engineer is the most popular jobs in USA.
'''
import csv # importing csv module
import matplotlib.pyplot as pt # importing the matplotlib library
from collections import Counter # importing the counter module 

jobs_list = [] # empty list for storing the jobs name retrieved form csv_reader
with open("datasets/test_data.csv") as csv_file: # opening the CSV file 
    csv_reader = csv.DictReader(csv_file) #Creating a dictionary for each rows.
    data_list = list(csv_reader) # Converting it to list data structure 
for row in data_list:
    jobs_list.append(row["job_title"])
    #Iterating though the list where each element is an independent dict

count_of_jobs = Counter(jobs_list) # Counting the occurence of the particual jobs which returns dict packed inside a tuple
jobs = []
popularity = []
for item in count_of_jobs.most_common(5): # extracting elements from the tuple and appending it with the two empty list
    jobs.append(item[0]) 
    popularity.append(item[1])

#Data visualization part 
pt.barh(jobs,popularity)
pt.title("Populatiry of Technical Jobs in USA")
pt.xlabel("Popularity")
pt.ylabel("Jobs")
pt.grid(True)

No description has been provided for this image

Challenges Faced¶

Difficulty in understading the codning concept, however i was able to soldier through that
Due to huge collection of data points, there was challenges while displaying the raw datas.

In [ ]: