Sedat Yalcin - Fab Futures - Data Science

Data Science - Week 1: Introduction Assignment

Student: Sedat Yalcin
Data Source: İBB Açık Veri Portalı (Istanbul Metropolitan Municipality Open Data Portal)


1. Dataset Selection

İstanbul Trafik Endeksi (Istanbul Traffic Index)

Source: İstanbul Büyükşehir Belediyesi Açık Veri Portalı (Istanbul Metropolitan Municipality Open Data Portal)
URL: https://data.ibb.gov.tr/dataset/istanbul-trafik-indeksi
License: İBB Açık Veri Lisansı (İBB Open Data License)
Period: August 2015 - Present
Format: CSV

Variables:

  • trafficindexdate - Measurement date (stored as a string such as 2015-08-06 00:00:00 +0000 +0000)
  • minimum_traffic_index - Daily minimum
  • maximum_traffic_index - Daily maximum
  • average_traffic_index - Daily average

Analysis Focus:

  • Temporal trends in Istanbul traffic
  • Weekday vs weekend patterns
  • Seasonal variations

2. Data Exploration

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")
In [3]:
df = pd.read_csv('datasets/traffic_index.csv')
print(f"Shape: {df.shape[0]} rows × {df.shape[1]} columns")
Shape: 3332 rows × 4 columns

2.1 Basic Information

In [4]:
# Display the first 10 rows
df.head(10)
Out[4]:
                  trafficindexdate  minimum_traffic_index  maximum_traffic_index  average_traffic_index
0  2015-08-06 00:00:00 +0000 +0000                     24                     63              57.858116
1  2015-08-07 00:00:00 +0000 +0000                      2                     49              23.770492
2  2015-08-11 00:00:00 +0000 +0000                     11                     62              38.601266
3  2015-08-12 00:00:00 +0000 +0000                      1                     63              29.715278
4  2015-08-13 00:00:00 +0000 +0000                      2                     56              28.557491
5  2015-08-14 00:00:00 +0000 +0000                      2                     69              32.300000
6  2015-08-15 00:00:00 +0000 +0000                      1                     47              24.184028
7  2015-08-16 00:00:00 +0000 +0000                      1                     29              10.590278
8  2015-08-17 00:00:00 +0000 +0000                      1                     58              25.745645
9  2015-08-18 00:00:00 +0000 +0000                      1                     65              31.427083
In [5]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3332 entries, 0 to 3331
Data columns (total 4 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   trafficindexdate       3332 non-null   object 
 1   minimum_traffic_index  3332 non-null   int64  
 2   maximum_traffic_index  3332 non-null   int64  
 3   average_traffic_index  3332 non-null   float64
dtypes: float64(1), int64(2), object(1)
memory usage: 104.3+ KB
In [6]:
df.isnull().sum()
Out[6]:
trafficindexdate         0
minimum_traffic_index    0
maximum_traffic_index    0
average_traffic_index    0
dtype: int64
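df.isnull() only finds empty cells. A day that is missing from the CSV altogether is not counted, and the Days column of the yearly table (for example 349 rows for 2016) suggests such gaps exist. The sketch below lists absent calendar days, assuming trafficindexdate has already been parsed to datetimes as in the preprocessing step. find_missing_dates is a hypothetical helper, and the sample is simply the ten dates shown by df.head(10):

```python
import pandas as pd


def find_missing_dates(dates: pd.Series) -> pd.DatetimeIndex:
    """Return calendar days between the first and last date that have no row."""
    # Drop the time of day so each observation maps to a single calendar day
    observed = pd.DatetimeIndex(pd.to_datetime(dates)).normalize().unique()
    # Complete daily calendar spanning the observed range
    full_range = pd.date_range(observed.min(), observed.max(), freq="D")
    # Calendar days that never appear in the data
    return full_range.difference(observed)


# Sample: the first ten dates printed by df.head(10)
sample = pd.Series(pd.to_datetime([
    "2015-08-06", "2015-08-07", "2015-08-11", "2015-08-12", "2015-08-13",
    "2015-08-14", "2015-08-15", "2015-08-16", "2015-08-17", "2015-08-18",
]))
missing = find_missing_dates(sample)
print(len(missing), [d.strftime("%Y-%m-%d") for d in missing])
```

For this sample the helper reports 8, 9 and 10 August 2015. On the full data, find_missing_dates(df['trafficindexdate']) would list every absent day.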

2.2 Preprocessing

In [7]:
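# format='mixed' (pandas >= 2.0) infers the layout of each timestamp string individually
df['trafficindexdate'] = pd.to_datetime(df['trafficindexdate'], format='mixed')
# Calendar features used by the statistics and plots below
df['year'] = df['trafficindexdate'].dt.year
df['month'] = df['trafficindexdate'].dt.month
df['dayofweek'] = df['trafficindexdate'].dt.dayofweek  # Monday=0 ... Sunday=6
df['dayname'] = df['trafficindexdate'].dt.day_name()
df['is_weekend'] = df['dayofweek'].isin([5, 6]).astype(int)  # Saturday/Sunday -> 1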
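format='mixed' infers the layout of every string separately. If all values follow the pattern shown by df.head(10), 2015-08-06 00:00:00 +0000 +0000 (an assumption worth checking on the full file), a stricter alternative is to keep the first 19 characters and parse them with an explicit format, so any row that deviates raises an error instead of being guessed. The sketch wraps this and the same feature columns in a hypothetical helper, add_calendar_features; the duplicated +0000 (UTC) offset is discarded, leaving timezone-naive timestamps:

```python
import pandas as pd


def add_calendar_features(df: pd.DataFrame, col: str = "trafficindexdate") -> pd.DataFrame:
    """Parse the raw timestamp strings and add year/month/weekday columns."""
    out = df.copy()
    # Assumes every value looks like 'YYYY-MM-DD HH:MM:SS +0000 +0000';
    # the repeated '+0000' (UTC) suffix is dropped, giving naive timestamps
    out[col] = pd.to_datetime(out[col].str.slice(0, 19), format="%Y-%m-%d %H:%M:%S")
    out["year"] = out[col].dt.year
    out["month"] = out[col].dt.month
    out["dayofweek"] = out[col].dt.dayofweek  # Monday=0 ... Sunday=6
    out["dayname"] = out[col].dt.day_name()
    out["is_weekend"] = out["dayofweek"].isin([5, 6]).astype(int)
    return out


# Three raw rows copied from the df.head(10) output
raw = pd.DataFrame({
    "trafficindexdate": ["2015-08-14 00:00:00 +0000 +0000",
                         "2015-08-15 00:00:00 +0000 +0000",
                         "2015-08-16 00:00:00 +0000 +0000"],
    "average_traffic_index": [32.300000, 24.184028, 10.590278],
})
result = add_calendar_features(raw)
print(result[["trafficindexdate", "dayname", "is_weekend"]])
```

These three dates fall on a Friday, Saturday and Sunday, so is_weekend comes out as 0, 1, 1.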

2.3 Statistics

In [8]:
df[['minimum_traffic_index', 'maximum_traffic_index', 'average_traffic_index']].describe()
Out[8]:
       minimum_traffic_index  maximum_traffic_index  average_traffic_index
count            3332.000000            3332.000000            3332.000000
mean                2.075630              59.956483              27.830846
std                 2.751651              15.613316               8.252579
min                 1.000000               4.000000               1.083916
25%                 1.000000              53.000000              23.651568
50%                 1.000000              63.000000              28.979524
75%                 2.000000              71.000000              33.491313
max                58.000000              90.000000              59.428571
In [9]:
yearly_avg = df.groupby('year')['average_traffic_index'].agg(['mean', 'std', 'min', 'max', 'count'])
yearly_avg.columns = ['Mean', 'Std', 'Min', 'Max', 'Days']
yearly_avg
Out[9]:
           Mean        Std       Min        Max  Days
year
2015  30.475363   8.081503  7.731707  57.858116   145
2016  30.311479   8.615723  4.680556  59.428571   349
2017  25.076298   5.705280  5.153846  43.804196   356
2018  24.680702   5.927610  4.344948  40.487633   365
2019  25.681933   6.387881  4.699301  49.461538   364
2020  24.094842   9.425063  2.635688  44.482639   366
2021  28.452302  11.069990  1.083916  46.298246   365
2022  29.234242   6.723550  8.383275  44.911111   365
2023  31.241237   7.123850  4.651568  58.187500   365
2024  31.416273   7.268290  2.233333  48.375000   292

3. Visualizations

In [10]:
plt.figure(figsize=(15, 6))
plt.plot(df['trafficindexdate'], df['average_traffic_index'], linewidth=0.8, alpha=0.7)
plt.title('İstanbul Traffic Index Over Time', fontsize=14)
plt.xlabel('Date')
plt.ylabel('Average Traffic Index')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
[Figure: İstanbul Traffic Index Over Time]
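The daily line is noisy, so a smoothed trend can make long-term changes easier to see. A sketch using a time-based rolling mean; the 30-day window is an arbitrary assumption, and rolling_trend is a hypothetical helper. An offset window such as '30D' covers the preceding 30 calendar days, so days missing from the file do not stretch it the way a 30-row window would:

```python
import pandas as pd


def rolling_trend(df: pd.DataFrame, window: str = "30D") -> pd.Series:
    """Time-based rolling mean of the daily average traffic index."""
    series = (
        df.sort_values("trafficindexdate")
          .set_index("trafficindexdate")["average_traffic_index"]
    )
    # An offset window covers the preceding calendar span, however many rows fall in it;
    # for offset windows pandas defaults to min_periods=1
    return series.rolling(window).mean()


# Three rows from df.head(10); a 3-day window keeps the arithmetic visible
sample = pd.DataFrame({
    "trafficindexdate": pd.to_datetime(["2015-08-06", "2015-08-07", "2015-08-11"]),
    "average_traffic_index": [57.858116, 23.770492, 38.601266],
})
trend = rolling_trend(sample, window="3D")
print(trend)
```

With the real data, trend = rolling_trend(df) could be drawn over the raw line with plt.plot(trend.index, trend.values).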
In [11]:
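# df.boxplot(by=...) without ax= creates its own figure, which left the earlier
# plt.figure() empty ("<Figure size 1400x600 with 0 Axes>"); pass an axes instead
fig, ax = plt.subplots(figsize=(14, 6))
df.boxplot(column='average_traffic_index', by='year', ax=ax)
plt.suptitle('')  # remove pandas' automatic "Boxplot grouped by year" title
ax.set_title('Traffic Index Distribution by Year', fontsize=14)
ax.set_xlabel('Year')
ax.set_ylabel('Average Traffic Index')
plt.tight_layout()
plt.show()
[Figure: Traffic Index Distribution by Year]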
In [12]:
plt.figure(figsize=(12, 6))
day_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
day_avg = df.groupby('dayname')['average_traffic_index'].mean().reindex(day_order)
colors = ['#4ecdc4' if day not in ['Saturday', 'Sunday'] else '#ff6b6b' for day in day_order]
plt.bar(range(len(day_avg)), day_avg.values, color=colors)
plt.xticks(range(len(day_avg)), day_order, rotation=45)
plt.title('Average Traffic by Day of Week', fontsize=14)
plt.ylabel('Average Traffic Index')
plt.grid(True, alpha=0.3, axis='y')
plt.tight_layout()
plt.show()
[Figure: Average Traffic by Day of Week]
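The findings list Thursday as the usual weekly peak. A short helper makes the per-day means behind the bar chart inspectable as numbers. In this sketch, day_of_week_means is a hypothetical helper and the sample is the ten rows from df.head(10). Thursday comes out on top there largely because of the unusually high first row, so the sample illustrates the mechanics only:

```python
import pandas as pd

DAY_ORDER = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def day_of_week_means(df: pd.DataFrame) -> pd.Series:
    """Mean of the daily average index per weekday, in calendar order."""
    names = df["trafficindexdate"].dt.day_name()
    return df.groupby(names)["average_traffic_index"].mean().reindex(DAY_ORDER)


# The ten rows printed by df.head(10); far too few to draw conclusions from
sample = pd.DataFrame({
    "trafficindexdate": pd.to_datetime([
        "2015-08-06", "2015-08-07", "2015-08-11", "2015-08-12", "2015-08-13",
        "2015-08-14", "2015-08-15", "2015-08-16", "2015-08-17", "2015-08-18",
    ]),
    "average_traffic_index": [57.858116, 23.770492, 38.601266, 29.715278, 28.557491,
                              32.300000, 24.184028, 10.590278, 25.745645, 31.427083],
})
means = day_of_week_means(sample)
print(means.round(2))
print("Peak day:", means.idxmax())
```

Applied to the full df, the same call gives the exact values plotted in the bar chart.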
In [13]:
pivot_table = df.pivot_table(
    values='average_traffic_index',
    index='month',
    columns='year',
    aggfunc='mean'
)
plt.figure(figsize=(14, 8))
sns.heatmap(pivot_table, annot=True, fmt='.1f', cmap='YlOrRd')
plt.title('Monthly Traffic Index by Year', fontsize=14)
plt.xlabel('Year')
plt.ylabel('Month')
plt.tight_layout()
plt.show()
[Figure: Monthly Traffic Index by Year]
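The heatmap suggests lower traffic in July and August. One way to put a number on that per year is to compare the mean index in those months with the mean of all other months. A minimal sketch, assuming July and August as the summer months (as in the findings); summer_vs_rest is a hypothetical helper, and the sample values are illustrative, not real measurements:

```python
import pandas as pd


def summer_vs_rest(df: pd.DataFrame, summer_months=(7, 8)) -> pd.DataFrame:
    """Per-year mean index in the summer months versus all other months."""
    dates = df["trafficindexdate"]
    season = dates.dt.month.isin(summer_months).map({True: "summer", False: "rest"})
    table = (
        df.assign(year=dates.dt.year, season=season)
          .pivot_table(values="average_traffic_index", index="year",
                       columns="season", aggfunc="mean")
    )
    # Relative difference in percent; negative means summer traffic is lower
    table["summer_vs_rest_pct"] = (table["summer"] / table["rest"] - 1) * 100
    return table


# Illustrative values only, not real measurements
sample = pd.DataFrame({
    "trafficindexdate": pd.to_datetime(
        ["2016-01-15", "2016-03-15", "2016-07-15", "2016-08-15"]),
    "average_traffic_index": [30.0, 30.0, 20.0, 20.0],
})
result = summer_vs_rest(sample)
print(result.round(2))
```

Years missing either season in the real file would show NaN in summer_vs_rest_pct; the partial 2015 and 2024 years deserve a closer look for that reason.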
In [14]:
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

weekend_data = df[df['is_weekend'] == 1]['average_traffic_index']
weekday_data = df[df['is_weekend'] == 0]['average_traffic_index']

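# density=True puts both histograms on the same scale; there are about 2.5 times
# as many weekdays as weekend days, so raw counts would not be comparable
axes[0].hist(weekday_data, bins=30, alpha=0.7, density=True, label='Weekday')
axes[0].hist(weekend_data, bins=30, alpha=0.7, density=True, label='Weekend')
axes[0].set_title('Weekday vs Weekend Distribution')
axes[0].set_xlabel('Average Traffic Index')
axes[0].set_ylabel('Density')
axes[0].legend()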
axes[0].grid(True, alpha=0.3)

df.boxplot(column='average_traffic_index', by='is_weekend', ax=axes[1])
axes[1].set_title('Weekday vs Weekend Comparison')
axes[1].set_xlabel('0=Weekday, 1=Weekend')
plt.suptitle('')
plt.tight_layout()
plt.show()
[Figure: Weekday vs Weekend Distribution (histogram) and Weekday vs Weekend Comparison (box plot)]
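The findings quote weekday traffic as roughly 30% above weekend traffic. A small helper makes such a figure reproducible. In this sketch, weekday_weekend_gap is a hypothetical helper and the gap is defined as the ratio of the two mean daily averages minus one. That definition is an assumption and may differ from how the quoted figure was derived. The sample is the ten rows from df.head(10), so its result says nothing about the full dataset:

```python
import pandas as pd


def weekday_weekend_gap(df: pd.DataFrame) -> float:
    """Percent by which the mean weekday index exceeds the mean weekend index."""
    is_weekend = df["trafficindexdate"].dt.dayofweek.isin([5, 6])  # Saturday/Sunday
    weekday_mean = df.loc[~is_weekend, "average_traffic_index"].mean()
    weekend_mean = df.loc[is_weekend, "average_traffic_index"].mean()
    return (weekday_mean / weekend_mean - 1) * 100


# The ten rows printed by df.head(10); only two of them fall on a weekend
sample = pd.DataFrame({
    "trafficindexdate": pd.to_datetime([
        "2015-08-06", "2015-08-07", "2015-08-11", "2015-08-12", "2015-08-13",
        "2015-08-14", "2015-08-15", "2015-08-16", "2015-08-17", "2015-08-18",
    ]),
    "average_traffic_index": [57.858116, 23.770492, 38.601266, 29.715278, 28.557491,
                              32.300000, 24.184028, 10.590278, 25.745645, 31.427083],
})
gap = weekday_weekend_gap(sample)
print(f"Weekday mean is {gap:.1f}% above the weekend mean")
```

Running weekday_weekend_gap(df) on the full dataset would give the number to compare against the finding below.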

4. Findings

Data Coverage:

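  • 9+ years (August 2015 to 2024; the downloaded file holds 292 days of 2024)
  • 3,332 daily observations
  • No null values, although some calendar days are absent from the file (e.g. 349 rows for 2016 and 356 for 2017)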

Key Patterns:

  • Weekday traffic ~30% higher than weekends
  • Summer months (July-August) show reduced traffic
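  • Visible impact of major events and holidays: 2020 has the lowest yearly mean (24.09), and 2021 has both the lowest single-day value (1.08) and the largest spread (std 11.07)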
  • Thursday typically has peak weekly traffic

Potential Further Analysis:

  • Time series forecasting
  • Anomaly detection (lockdowns, events)
  • Correlation with weather/holidays
  • Trend decomposition
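One simple starting point for the anomaly-detection idea is a rolling z-score. This technique is swapped in here and is not part of this notebook. Each day is compared with the mean and standard deviation of the preceding days, and it is flagged when it sits more than a threshold number of standard deviations away. The 28-row window and the threshold of 3 are arbitrary assumptions. The window counts rows rather than calendar days, so gaps in the file stretch it slightly. flag_anomalies is a hypothetical helper and the sample series is synthetic:

```python
import pandas as pd


def flag_anomalies(series: pd.Series, window: int = 28, threshold: float = 3.0) -> pd.Series:
    """Boolean mask of points far from the mean of the preceding `window` points."""
    # shift(1) so each day is compared only with earlier days, never with itself
    history = series.shift(1).rolling(window, min_periods=window)
    z = (series - history.mean()) / history.std()
    # NaN z-scores (not enough history yet) compare as False
    return z.abs() > threshold


# Synthetic series: alternating 30/32 with one artificial one-day drop
values = [30.0 if i % 2 == 0 else 32.0 for i in range(40)]
values[35] = 5.0
sample = pd.Series(values, index=pd.date_range("2016-01-01", periods=40, freq="D"))
flags = flag_anomalies(sample)
print(sample[flags])
```

For the real data the input would be df.sort_values('trafficindexdate').set_index('trafficindexdate')['average_traffic_index']. Flagged days would still need to be matched by hand against known lockdowns, holidays or events.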

Tools Used:
Python (pandas, numpy, matplotlib, seaborn), JupyterLab

AI Assistance:
Claude AI (Anthropic) - code generation, visualization, analysis structure

Data Source:
İBB Açık Veri Portalı - https://data.ibb.gov.tr
