Adip Rai - Fab Futures - Data Science

WEEK 1. Introduction to Data Science and Exploration

NOVEMBER 18

Data science uses mathematics, statistics, programming, advanced analytics, AI, machine learning, and subject-specific knowledge to discover useful insights hidden within an organization’s data.

ASSIGNMENT

  • Select and document a dataset to analyze.
  • Connect to a JupyterLab server and become familiar with the user interface.

DATASET

This dataset, Motor Vehicle Collisions - Crashes, contains information on all police-reported motor vehicle collisions in NYC.

The following code imports the dataset into this notebook:
In [8]:
import pandas as pd
motor_vehicle = pd.read_csv("datasets/Motor_Vehicle_Collisions_-_Crashes.csv", low_memory=False)
motor_vehicle.head()
Out[8]:
  | CRASH DATE | CRASH TIME | BOROUGH | ZIP CODE | LATITUDE | LONGITUDE | LOCATION | ON STREET NAME | CROSS STREET NAME | OFF STREET NAME | ... | CONTRIBUTING FACTOR VEHICLE 2 | CONTRIBUTING FACTOR VEHICLE 3 | CONTRIBUTING FACTOR VEHICLE 4 | CONTRIBUTING FACTOR VEHICLE 5 | COLLISION_ID | VEHICLE TYPE CODE 1 | VEHICLE TYPE CODE 2 | VEHICLE TYPE CODE 3 | VEHICLE TYPE CODE 4 | VEHICLE TYPE CODE 5
0 | 09/11/2021 | 2:39 | NaN | NaN | NaN | NaN | NaN | WHITESTONE EXPRESSWAY | 20 AVENUE | NaN | ... | Unspecified | NaN | NaN | NaN | 4455765 | Sedan | Sedan | NaN | NaN | NaN
1 | 03/26/2022 | 11:45 | NaN | NaN | NaN | NaN | NaN | QUEENSBORO BRIDGE UPPER | NaN | NaN | ... | NaN | NaN | NaN | NaN | 4513547 | Sedan | NaN | NaN | NaN | NaN
2 | 11/01/2023 | 1:29 | BROOKLYN | 11230 | 40.62179 | -73.970024 | (40.62179, -73.970024) | OCEAN PARKWAY | AVENUE K | NaN | ... | Unspecified | Unspecified | NaN | NaN | 4675373 | Moped | Sedan | Sedan | NaN | NaN
3 | 06/29/2022 | 6:55 | NaN | NaN | NaN | NaN | NaN | THROGS NECK BRIDGE | NaN | NaN | ... | Unspecified | NaN | NaN | NaN | 4541903 | Sedan | Pick-up Truck | NaN | NaN | NaN
4 | 09/21/2022 | 13:21 | NaN | NaN | NaN | NaN | NaN | BROOKLYN BRIDGE | NaN | NaN | ... | Unspecified | NaN | NaN | NaN | 4566131 | Station Wagon/Sport Utility Vehicle | NaN | NaN | NaN | NaN

5 rows × 29 columns

In [7]:
motor_vehicle.tail()
Out[7]:
  | CRASH DATE | CRASH TIME | BOROUGH | ZIP CODE | LATITUDE | LONGITUDE | LOCATION | ON STREET NAME | CROSS STREET NAME | OFF STREET NAME | ... | CONTRIBUTING FACTOR VEHICLE 2 | CONTRIBUTING FACTOR VEHICLE 3 | CONTRIBUTING FACTOR VEHICLE 4 | CONTRIBUTING FACTOR VEHICLE 5 | COLLISION_ID | VEHICLE TYPE CODE 1 | VEHICLE TYPE CODE 2 | VEHICLE TYPE CODE 3 | VEHICLE TYPE CODE 4 | VEHICLE TYPE CODE 5
2221554 | 11/16/2025 | 17:05 | BROOKLYN | 11235 | 40.588512 | -73.94935 | (40.588512, -73.94935) | AVENUE Z | OCEAN AVE | NaN | ... | Unspecified | NaN | NaN | NaN | 4857807 | Station Wagon/Sport Utility Vehicle | Station Wagon/Sport Utility Vehicle | NaN | NaN | NaN
2221555 | 11/16/2025 | 9:40 | BROOKLYN | 11216 | 40.677650 | -73.94977 | (40.67765, -73.94977) | PACIFIC ST | NOSTRAND AVE | NaN | ... | Driver Inattention/Distraction | NaN | NaN | NaN | 4858253 | Sedan | NaN | NaN | NaN | NaN
2221556 | 11/13/2025 | 21:30 | MANHATTAN | 10034 | 40.865100 | -73.92188 | (40.8651, -73.92188) | W 204 ST | SHERMAN AVE | NaN | ... | NaN | NaN | NaN | NaN | 4858271 | Sedan | NaN | NaN | NaN | NaN
2221557 | 11/15/2025 | 2:45 | BRONX | 10457 | 40.844720 | -73.91227 | (40.84472, -73.91227) | E 174 ST | WALTON AVE | NaN | ... | NaN | NaN | NaN | NaN | 4858285 | Station Wagon/Sport Utility Vehicle | NaN | NaN | NaN | NaN
2221558 | 11/16/2025 | 1:30 | QUEENS | 11419 | 40.684383 | -73.82326 | (40.684383, -73.82326) | 107 AVE | LEFFERTS BLVD | NaN | ... | Unspecified | NaN | NaN | NaN | 4857856 | Sedan | Station Wagon/Sport Utility Vehicle | NaN | NaN | NaN

5 rows × 29 columns
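
A first pass at documenting the dataset could look at its size, column types, and missing values. The sketch below is a minimal illustration, not the notebook's actual analysis. To run without the CSV, it rebuilds the five rows from the `tail()` output above using only a few of the 29 columns. The names `sample`, `missing_share`, and `borough_counts` are introduced here for illustration.

```python
import pandas as pd

# Five rows copied from the tail() output above, limited to a few columns.
sample = pd.DataFrame(
    {
        "CRASH DATE": ["11/16/2025", "11/16/2025", "11/13/2025", "11/15/2025", "11/16/2025"],
        "CRASH TIME": ["17:05", "9:40", "21:30", "2:45", "1:30"],
        "BOROUGH": ["BROOKLYN", "BROOKLYN", "MANHATTAN", "BRONX", "QUEENS"],
        "CONTRIBUTING FACTOR VEHICLE 2": [
            "Unspecified", "Driver Inattention/Distraction", None, None, "Unspecified",
        ],
        "COLLISION_ID": [4857807, 4858253, 4858271, 4858285, 4857856],
    }
)

# Size and column types
print(sample.shape)
print(sample.dtypes)

# Share of missing values per column, largest first
missing_share = sample.isna().mean().sort_values(ascending=False)
print(missing_share)

# Parse the date strings (MM/DD/YYYY in the output above) into datetimes
sample["CRASH DATE"] = pd.to_datetime(sample["CRASH DATE"], format="%m/%d/%Y")

# Count collisions per borough
borough_counts = sample["BOROUGH"].value_counts()
print(borough_counts)
```

The same calls should apply to the full `motor_vehicle` DataFrame, where the share of missing values in columns such as BOROUGH would show how complete each field is.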

In [ ]:
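import pandas as pd

# low_memory=False parses the file in one pass, avoiding the mixed-type columns that chunked type inference can produce
df = pd.read_csv("datasets/Motor_Vehicle_Collisions_-_Crashes.csv", low_memory=False)
# Show the first 200 rows
df.head(200)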
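
Previewing 200 rows needs only a small part of the file. As a hedged alternative (not what the cell above does), `pd.read_csv` accepts `nrows` and `usecols` to limit what is parsed. The sketch below uses an in-memory CSV built from three rows of the `head()` output shown earlier so that it runs on its own. The names `csv_text` and `preview` are illustrative.

```python
import io
import pandas as pd

# Stand-in for the real file: three rows and four columns taken from the head() output.
csv_text = """CRASH DATE,CRASH TIME,BOROUGH,COLLISION_ID
09/11/2021,2:39,,4455765
03/26/2022,11:45,,4513547
11/01/2023,1:29,BROOKLYN,4675373
"""

# nrows caps the number of data rows parsed; usecols keeps only the named columns.
preview = pd.read_csv(
    io.StringIO(csv_text),
    nrows=2,
    usecols=["CRASH DATE", "BOROUGH", "COLLISION_ID"],
)
print(preview)
```

With the real file, `io.StringIO(csv_text)` would be replaced by the CSV path and `nrows` set to 200.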

Connect to a JupyterLab Server
  • JupyterLab Server
    • Signed up and created an account.
  • Public Website URL
    • Accessed and explored the URL.
  • Git Repository

Jupyter notebook URL: https://jupyter.fabcloud.org/user/adip-rai

Public website URL: https://class.academany.org/futures/data-science/2025/labs/dgi/students/adip-rai/

Project repository (GitLab): https://gitlab.fabcloud.org/academany/futures/units/data-science/2025/labs/dgi/students/adip-rai/

JupyterLab Server

Data Science Tools

Goal

This section outlines the tools used in the Data Science curriculum, focusing on those that support data analysis and computing workflows.

Types of Tools

Specific Programming Languages and Environments

  • JavaScript: mentioned as a language for scripting and web-based interaction.
  • Rust: highlighted for its performance and safety features, especially for systems programming.
  • Jupyter: the environment used for this project (notebooks, cells, kernels, interactive outputs).
  • Python: described as a versatile language for data science tasks, including expressions, functions, and data structures.

Packages and Libraries

The curriculum lists many tools that support data analysis and visualization:

  • NumPy: numerical computing
  • SciPy: scientific algorithms
  • scikit-learn: machine learning
  • ipywidgets: interactive UI elements
  • Matplotlib / Plotly / D3.js: visualization libraries
  • Git: version control
  • Conda and similar package managers: managing environments and dependencies
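
Before using these packages in a notebook, it can help to check which ones the current kernel can find. The sketch below is one possible way to do that with the standard library's `importlib.util.find_spec`. It is not part of the curriculum, and the `packages` mapping and the `is_installed` helper are names made up for this example.

```python
import importlib.util

# Import names for the Python packages listed above.
# Note: scikit-learn is imported as "sklearn".
packages = {
    "NumPy": "numpy",
    "SciPy": "scipy",
    "scikit-learn": "sklearn",
    "ipywidgets": "ipywidgets",
    "Matplotlib": "matplotlib",
    "Plotly": "plotly",
}


def is_installed(import_name: str) -> bool:
    """Return True if the top-level module can be found without importing it."""
    return importlib.util.find_spec(import_name) is not None


availability = {label: is_installed(name) for label, name in packages.items()}
for label, found in availability.items():
    print(f"{label}: {'installed' if found else 'missing'}")
```

D3.js, Git, and Conda are left out because they are not Python modules.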

Data Science: Fitting

This page focuses on how to fit models to data, an essential part of learning data science. It explains both the idea of fitting and the practical tools used in the curriculum.
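
As a minimal illustration of the idea of fitting, the sketch below fits a straight line to noisy points by least squares using `np.polyfit`. The data are synthetic and generated only for this example (an assumed true line y = 2x + 1 with added noise). They do not come from the collisions dataset or the course material.

```python
import numpy as np

# Synthetic data (an assumption for illustration): y = 2x + 1 plus small noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

# Least-squares fit of a degree-1 polynomial; coefficients come highest power first.
slope, intercept = np.polyfit(x, y, deg=1)

# Predictions from the fitted line and the root-mean-square error of the fit
y_fit = slope * x + intercept
rmse = np.sqrt(np.mean((y - y_fit) ** 2))
print(f"slope={slope:.2f}, intercept={intercept:.2f}, rmse={rmse:.2f}")
```

The fitted slope and intercept should land close to the values used to generate the data, and the error should be on the order of the noise level.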

Core Learning Goals

Fit mathematical and statistical functions to real data, including understanding how a model (such as a line or curve) can represent patterns in the data.

Understand different modeling methods, such as simple linear trends, nonlinear fits, and more complex supervised learning models.

Use models to interpret datasets and make predictions, extracting meaningful patterns rather than only visualizing the data.

Learn practical machine learning workflows, including evaluating how well fits or models perform on data.

Communicate results clearly with visualizations and explanations so that others can understand what the model shows.
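
The goals above mention comparing modeling methods and evaluating how well a fit performs. One common way to do this, sketched here under assumptions, is to fit on part of the data and score on a held-out part. The data are synthetic (an assumed quadratic trend plus noise), and `r_squared` is a small helper written for this example rather than a library function.

```python
import numpy as np

# Synthetic data (an assumption for illustration): a quadratic trend plus noise.
rng = np.random.default_rng(1)
x = rng.uniform(-3.0, 3.0, size=200)
y = 0.5 * x**2 - x + 2.0 + rng.normal(scale=0.3, size=x.size)

# Hold out the last 25% of the (already randomly drawn) points for testing.
split = int(0.75 * x.size)
x_train, y_train = x[:split], y[:split]
x_test, y_test = x[split:], y[split:]


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


# Fit on the training points only, then score on the held-out points.
scores = {}
for degree in (1, 2):
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    scores[degree] = r_squared(y_test, np.polyval(coeffs, x_test))
    print(f"degree {degree}: test R^2 = {scores[degree]:.3f}")
```

Because the data were generated from a quadratic, the degree-2 fit should score higher on the held-out points than the straight line.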
