
WEEK 1. Introduction to Data Science and Exploration¶

NOVEMBER 18

Data science uses mathematics, statistics, programming, advanced analytics, AI, machine learning, and subject-specific knowledge to discover useful insights hidden within an organization’s data.

ASSIGNMENT

Select and document a dataset to analyze.
Connect to a JupyterLab server and become familiar with the user interface.

DATASET

This dataset records all police-reported motor vehicle collisions in NYC.

DATASET

The code below imports the dataset into this notebook and previews its first and last rows.

In [8]:

import pandas as pd
motor_vehicle = pd.read_csv("datasets/Motor_Vehicle_Collisions_-_Crashes.csv", low_memory=False)
motor_vehicle.head()

Out[8]:

	CRASH DATE	CRASH TIME	BOROUGH	ZIP CODE	LATITUDE	LONGITUDE	LOCATION	ON STREET NAME	CROSS STREET NAME	OFF STREET NAME	...	CONTRIBUTING FACTOR VEHICLE 2	CONTRIBUTING FACTOR VEHICLE 3	CONTRIBUTING FACTOR VEHICLE 4	CONTRIBUTING FACTOR VEHICLE 5	COLLISION_ID	VEHICLE TYPE CODE 1	VEHICLE TYPE CODE 2	VEHICLE TYPE CODE 3	VEHICLE TYPE CODE 4	VEHICLE TYPE CODE 5
0	09/11/2021	2:39	NaN	NaN	NaN	NaN	NaN	WHITESTONE EXPRESSWAY	20 AVENUE	NaN	...	Unspecified	NaN	NaN	NaN	4455765	Sedan	Sedan	NaN	NaN	NaN
1	03/26/2022	11:45	NaN	NaN	NaN	NaN	NaN	QUEENSBORO BRIDGE UPPER	NaN	NaN	...	NaN	NaN	NaN	NaN	4513547	Sedan	NaN	NaN	NaN	NaN
2	11/01/2023	1:29	BROOKLYN	11230	40.62179	-73.970024	(40.62179, -73.970024)	OCEAN PARKWAY	AVENUE K	NaN	...	Unspecified	Unspecified	NaN	NaN	4675373	Moped	Sedan	Sedan	NaN	NaN
3	06/29/2022	6:55	NaN	NaN	NaN	NaN	NaN	THROGS NECK BRIDGE	NaN	NaN	...	Unspecified	NaN	NaN	NaN	4541903	Sedan	Pick-up Truck	NaN	NaN	NaN
4	09/21/2022	13:21	NaN	NaN	NaN	NaN	NaN	BROOKLYN BRIDGE	NaN	NaN	...	Unspecified	NaN	NaN	NaN	4566131	Station Wagon/Sport Utility Vehicle	NaN	NaN	NaN	NaN

5 rows × 29 columns

In [7]:

motor_vehicle.tail()

Out[7]:

	CRASH DATE	CRASH TIME	BOROUGH	ZIP CODE	LATITUDE	LONGITUDE	LOCATION	ON STREET NAME	CROSS STREET NAME	OFF STREET NAME	...	CONTRIBUTING FACTOR VEHICLE 2	CONTRIBUTING FACTOR VEHICLE 3	CONTRIBUTING FACTOR VEHICLE 4	CONTRIBUTING FACTOR VEHICLE 5	COLLISION_ID	VEHICLE TYPE CODE 1	VEHICLE TYPE CODE 2	VEHICLE TYPE CODE 3	VEHICLE TYPE CODE 4	VEHICLE TYPE CODE 5
2221554	11/16/2025	17:05	BROOKLYN	11235	40.588512	-73.94935	(40.588512, -73.94935)	AVENUE Z	OCEAN AVE	NaN	...	Unspecified	NaN	NaN	NaN	4857807	Station Wagon/Sport Utility Vehicle	Station Wagon/Sport Utility Vehicle	NaN	NaN	NaN
2221555	11/16/2025	9:40	BROOKLYN	11216	40.677650	-73.94977	(40.67765, -73.94977)	PACIFIC ST	NOSTRAND AVE	NaN	...	Driver Inattention/Distraction	NaN	NaN	NaN	4858253	Sedan	NaN	NaN	NaN	NaN
2221556	11/13/2025	21:30	MANHATTAN	10034	40.865100	-73.92188	(40.8651, -73.92188)	W 204 ST	SHERMAN AVE	NaN	...	NaN	NaN	NaN	NaN	4858271	Sedan	NaN	NaN	NaN	NaN
2221557	11/15/2025	2:45	BRONX	10457	40.844720	-73.91227	(40.84472, -73.91227)	E 174 ST	WALTON AVE	NaN	...	NaN	NaN	NaN	NaN	4858285	Station Wagon/Sport Utility Vehicle	NaN	NaN	NaN	NaN
2221558	11/16/2025	1:30	QUEENS	11419	40.684383	-73.82326	(40.684383, -73.82326)	107 AVE	LEFFERTS BLVD	NaN	...	Unspecified	NaN	NaN	NaN	4857856	Sedan	Station Wagon/Sport Utility Vehicle	NaN	NaN	NaN

5 rows × 29 columns
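The previews above show many NaN values in the location columns. As a minimal sketch of how such gaps could be documented, the block below rebuilds five rows from the `head()` output by hand (only a few columns, so it runs without the CSV file), parses the date strings, and counts missing values per column. The variable names `sample`, `missing_counts`, and `crashes_per_year` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Five rows copied from the head() output above, limited to a few columns
# so the sketch runs without the full CSV file.
sample = pd.DataFrame(
    {
        "CRASH DATE": ["09/11/2021", "03/26/2022", "11/01/2023", "06/29/2022", "09/21/2022"],
        "CRASH TIME": ["2:39", "11:45", "1:29", "6:55", "13:21"],
        "BOROUGH": [None, None, "BROOKLYN", None, None],
        "ZIP CODE": [None, None, "11230", None, None],
        "COLLISION_ID": [4455765, 4513547, 4675373, 4541903, 4566131],
    }
)

# Parse the month/day/year strings into real datetime values.
sample["CRASH DATE"] = pd.to_datetime(sample["CRASH DATE"], format="%m/%d/%Y")

# Count missing values in each column.
missing_counts = sample.isna().sum()

# Count how many of the sampled crashes fall in each year.
crashes_per_year = sample["CRASH DATE"].dt.year.value_counts().sort_index()

print(missing_counts)
print(crashes_per_year)
```

The same two steps should carry over to the full `motor_vehicle` DataFrame by replacing `sample` with it, although on more than two million rows the date parsing may take noticeably longer.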

In [ ]:

import pandas as pd

df = pd.read_csv("datasets/Motor_Vehicle_Collisions_-_Crashes.csv", low_memory=False)
df.head(200)

Connect to a JupyterLab Server

JupyterLab Server
- Signed up and created an account.
Public Website URL
- Accessed and explored the URL.
Git Repository

Your Jupyter notebook URL: https://jupyter.fabcloud.org/user/adip-rai

Your public website URL: https://class.academany.org/futures/data-science/2025/labs/dgi/students/adip-rai/

Your project repository (on our git platform): https://gitlab.fabcloud.org/academany/futures/units/data-science/2025/labs/dgi/students/adip-rai/

JupyterLab Server

Data Science Tools¶

Goal¶

This section outlines the tools used in the Data Science curriculum, focusing on those that support data analysis and computing workflows.

Types of Tools¶

Specific Programming Languages and Environments¶

Jupyter – the environment where we work for this project (notebooks, cells, kernels, interactive outputs).

Python – a versatile language for data science tasks, including expressions, functions, and data structures.

Packages and Libraries¶

The curriculum lists many tools that support data analysis and visualization:

NumPy – numerical computing

SciPy – scientific algorithms

scikit-learn – machine learning

ipywidgets – interactive UI elements

Matplotlib / Plotly / D3.js – visualization libraries

Git – version control

Conda – package manager for environments and dependencies
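A quick way to see which of the listed Python packages are available in the current kernel is sketched below. It uses only the standard library. The import names are the usual ones (scikit-learn is imported as `sklearn`); D3.js, Git, and Conda are left out because they are not Python packages. The `packages` and `installed` names are illustrative.

```python
from importlib.util import find_spec

# Display name -> import name for the Python packages listed above.
# Note that scikit-learn is imported under the name "sklearn".
packages = {
    "NumPy": "numpy",
    "SciPy": "scipy",
    "scikit-learn": "sklearn",
    "ipywidgets": "ipywidgets",
    "Matplotlib": "matplotlib",
    "Plotly": "plotly",
}

# find_spec returns None when a top-level module cannot be found,
# so this check does not actually import anything.
installed = {label: find_spec(module) is not None for label, module in packages.items()}

for label, ok in installed.items():
    print(f"{label}: {'installed' if ok else 'missing'}")
```

Anything reported as missing would need to be installed through the environment's package manager before it can be used in a notebook.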

Data Science: Fitting¶

This page focuses on how to fit models to data, an essential part of learning data science. It explains both the idea of fitting and the practical tools used in the curriculum.

Core Learning Goals¶

Fit mathematical and statistical functions to real data – this includes understanding how a model (like a line or curve) can represent patterns in data.

Understand different modeling methods, such as simple linear trends, nonlinear fits, and more complex supervised learning models.

Use models to interpret and make predictions based on datasets, not just to visualize but to extract meaningful patterns.

Learn practical machine learning workflows, including evaluating how well fits or models perform on data.

Communicate results clearly with visualizations and explanations so that others can understand what the model shows.
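The goals above can be illustrated with a small sketch: fit a straight line and a curved model to the same data, then compare how well each predicts points that were held out of the fit. Everything here is an assumption made for illustration, not taken from the curriculum: the data are synthetic (a noisy quadratic trend), the held-out set is every third point, and the score is the coefficient of determination R² computed by hand with NumPy.

```python
import numpy as np

# Synthetic data (an assumption for illustration): a noisy quadratic trend.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 60)
y = 0.5 * x**2 - 2.0 * x + 3.0 + rng.normal(0.0, 1.0, size=x.size)

# Hold out every third point for testing and fit on the rest.
test_mask = np.arange(x.size) % 3 == 0
x_train, y_train = x[~test_mask], y[~test_mask]
x_test, y_test = x[test_mask], y[test_mask]


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 minus residual over total variation."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


# Fit a degree-1 polynomial (a line) and a degree-2 polynomial (a curve)
# by least squares, then score each on the held-out points.
scores = {}
for degree in (1, 2):
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    y_pred = np.polyval(coeffs, x_test)
    scores[degree] = r_squared(y_test, y_pred)

for degree, score in scores.items():
    print(f"degree {degree}: held-out R^2 = {score:.3f}")
```

Because the synthetic data were generated from a quadratic, the degree-2 fit should score higher on the held-out points. With real data the true shape is unknown, which is why comparing models on data they were not fitted to is a useful habit.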
