Data Science - Week 1: Introduction Assignment
Student: Sedat Yalcin
Data Source: İBB Açık Veri Portalı (İBB Open Data Portal)
1. Dataset Selection
İstanbul Trafik Endeksi (Istanbul Traffic Index)
Source: İstanbul Büyükşehir Belediyesi (Istanbul Metropolitan Municipality, İBB) Açık Veri Portalı
URL: https://data.ibb.gov.tr/dataset/istanbul-trafik-indeksi
License: İBB Açık Veri Lisansı
Period: August 2015 - Present
Format: CSV
Variables:
- trafficindexdate: Measurement date
- minimum_traffic_index: Daily minimum
- maximum_traffic_index: Daily maximum
- average_traffic_index: Daily average
Analysis Focus:
- Temporal trends in Istanbul traffic
- Weekday vs weekend patterns
- Seasonal variations
2. Data Exploration
In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")
In [3]:
df = pd.read_csv('datasets/traffic_index.csv')
print(f"Shape: {df.shape[0]} rows × {df.shape[1]} columns")
Shape: 3332 rows × 4 columns
2.1 Basic Information
In [4]:
# Display first few rows
df.head(10)
Out[4]:
| | trafficindexdate | minimum_traffic_index | maximum_traffic_index | average_traffic_index |
|---|---|---|---|---|
| 0 | 2015-08-06 00:00:00 +0000 +0000 | 24 | 63 | 57.858116 |
| 1 | 2015-08-07 00:00:00 +0000 +0000 | 2 | 49 | 23.770492 |
| 2 | 2015-08-11 00:00:00 +0000 +0000 | 11 | 62 | 38.601266 |
| 3 | 2015-08-12 00:00:00 +0000 +0000 | 1 | 63 | 29.715278 |
| 4 | 2015-08-13 00:00:00 +0000 +0000 | 2 | 56 | 28.557491 |
| 5 | 2015-08-14 00:00:00 +0000 +0000 | 2 | 69 | 32.300000 |
| 6 | 2015-08-15 00:00:00 +0000 +0000 | 1 | 47 | 24.184028 |
| 7 | 2015-08-16 00:00:00 +0000 +0000 | 1 | 29 | 10.590278 |
| 8 | 2015-08-17 00:00:00 +0000 +0000 | 1 | 58 | 25.745645 |
| 9 | 2015-08-18 00:00:00 +0000 +0000 | 1 | 65 | 31.427083 |
In [5]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3332 entries, 0 to 3331
Data columns (total 4 columns):
 #   Column                 Non-Null Count  Dtype
---  ------                 --------------  -----
 0   trafficindexdate       3332 non-null   object
 1   minimum_traffic_index  3332 non-null   int64
 2   maximum_traffic_index  3332 non-null   int64
 3   average_traffic_index  3332 non-null   float64
dtypes: float64(1), int64(2), object(1)
memory usage: 104.3+ KB
In [6]:
df.isnull().sum()
Out[6]:
trafficindexdate         0
minimum_traffic_index    0
maximum_traffic_index    0
average_traffic_index    0
dtype: int64
2.2 Preprocessing
In [7]:
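# Parse the date strings (stored as object dtype); format='mixed' needs pandas 2.0+
df['trafficindexdate'] = pd.to_datetime(df['trafficindexdate'], format='mixed')
# Derive calendar features used for grouping in the later cells
df['year'] = df['trafficindexdate'].dt.year
df['month'] = df['trafficindexdate'].dt.month
df['dayofweek'] = df['trafficindexdate'].dt.dayofweek  # Monday=0 ... Sunday=6
df['dayname'] = df['trafficindexdate'].dt.day_name()
# Flag Saturday (5) and Sunday (6) as weekend
df['is_weekend'] = df['dayofweek'].isin([5, 6]).astype(int)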
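The null check in 2.1 only confirms that no cell is empty; it does not show whether every calendar day has a row (the first rows jump from 2015-08-07 to 2015-08-11). A minimal sketch of a gap check follows. The helper name `missing_days` and the small sample series are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

def missing_days(dates: pd.Series) -> pd.DatetimeIndex:
    """Return calendar days between the first and last date that have no row."""
    # Drop the time-of-day part so each row maps to one calendar day
    days = pd.to_datetime(dates).dt.normalize()
    full_range = pd.date_range(days.min(), days.max(), freq="D")
    return full_range.difference(pd.DatetimeIndex(days))

# Small sample mirroring the first dates shown in df.head() (hypothetical input)
sample = pd.Series(["2015-08-06", "2015-08-07", "2015-08-11", "2015-08-12"])
gaps = missing_days(sample)
print(list(gaps.strftime("%Y-%m-%d")))
```

On the real data, the same helper could be called as `missing_days(df['trafficindexdate'])` once the column has been parsed.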
2.3 Statistics
In [8]:
df[['minimum_traffic_index', 'maximum_traffic_index', 'average_traffic_index']].describe()
Out[8]:
| | minimum_traffic_index | maximum_traffic_index | average_traffic_index |
|---|---|---|---|
| count | 3332.000000 | 3332.000000 | 3332.000000 |
| mean | 2.075630 | 59.956483 | 27.830846 |
| std | 2.751651 | 15.613316 | 8.252579 |
| min | 1.000000 | 4.000000 | 1.083916 |
| 25% | 1.000000 | 53.000000 | 23.651568 |
| 50% | 1.000000 | 63.000000 | 28.979524 |
| 75% | 2.000000 | 71.000000 | 33.491313 |
| max | 58.000000 | 90.000000 | 59.428571 |
In [9]:
yearly_avg = df.groupby('year')['average_traffic_index'].agg(['mean', 'std', 'min', 'max', 'count'])
yearly_avg.columns = ['Mean', 'Std', 'Min', 'Max', 'Days']
yearly_avg
Out[9]:
| year | Mean | Std | Min | Max | Days |
|---|---|---|---|---|---|
| 2015 | 30.475363 | 8.081503 | 7.731707 | 57.858116 | 145 |
| 2016 | 30.311479 | 8.615723 | 4.680556 | 59.428571 | 349 |
| 2017 | 25.076298 | 5.705280 | 5.153846 | 43.804196 | 356 |
| 2018 | 24.680702 | 5.927610 | 4.344948 | 40.487633 | 365 |
| 2019 | 25.681933 | 6.387881 | 4.699301 | 49.461538 | 364 |
| 2020 | 24.094842 | 9.425063 | 2.635688 | 44.482639 | 366 |
| 2021 | 28.452302 | 11.069990 | 1.083916 | 46.298246 | 365 |
| 2022 | 29.234242 | 6.723550 | 8.383275 | 44.911111 | 365 |
| 2023 | 31.241237 | 7.123850 | 4.651568 | 58.187500 | 365 |
| 2024 | 31.416273 | 7.268290 | 2.233333 | 48.375000 | 292 |
3. Visualizations
In [10]:
plt.figure(figsize=(15, 6))
plt.plot(df['trafficindexdate'], df['average_traffic_index'], linewidth=0.8, alpha=0.7)
plt.title('İstanbul Traffic Index Over Time', fontsize=14)
plt.xlabel('Date')
plt.ylabel('Average Traffic Index')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
In [11]:
# df.boxplot creates its own figure, so a separate plt.figure() call would only leave an empty one
df.boxplot(column='average_traffic_index', by='year', figsize=(14, 6))
plt.suptitle('')
plt.title('Traffic Index Distribution by Year', fontsize=14)
plt.xlabel('Year')
plt.ylabel('Average Traffic Index')
plt.tight_layout()
plt.show()
In [12]:
plt.figure(figsize=(12, 6))
day_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
day_avg = df.groupby('dayname')['average_traffic_index'].mean().reindex(day_order)
colors = ['#4ecdc4' if day not in ['Saturday', 'Sunday'] else '#ff6b6b' for day in day_order]
plt.bar(range(len(day_avg)), day_avg.values, color=colors)
plt.xticks(range(len(day_avg)), day_order, rotation=45)
plt.title('Average Traffic by Day of Week', fontsize=14)
plt.ylabel('Average Traffic Index')
plt.grid(True, alpha=0.3, axis='y')
plt.tight_layout()
plt.show()
In [13]:
pivot_table = df.pivot_table(
    values='average_traffic_index',
    index='month',
    columns='year',
    aggfunc='mean'
)
plt.figure(figsize=(14, 8))
sns.heatmap(pivot_table, annot=True, fmt='.1f', cmap='YlOrRd')
plt.title('Monthly Traffic Index by Year', fontsize=14)
plt.xlabel('Year')
plt.ylabel('Month')
plt.tight_layout()
plt.show()
In [14]:
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
weekend_data = df[df['is_weekend'] == 1]['average_traffic_index']
weekday_data = df[df['is_weekend'] == 0]['average_traffic_index']
axes[0].hist(weekday_data, bins=30, alpha=0.7, label='Weekday')
axes[0].hist(weekend_data, bins=30, alpha=0.7, label='Weekend')
axes[0].set_title('Weekday vs Weekend Distribution')
axes[0].set_xlabel('Average Traffic Index')
axes[0].legend()
axes[0].grid(True, alpha=0.3)
df.boxplot(column='average_traffic_index', by='is_weekend', ax=axes[1])
axes[1].set_title('Weekday vs Weekend Comparison')
axes[1].set_xlabel('0=Weekday, 1=Weekend')
plt.suptitle('')
plt.tight_layout()
plt.show()
4. Findings
Data Coverage:
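- About 9 years of data, from 6 August 2015 into 2024 (292 days recorded for 2024)
- 3,332 daily observations
- No null values in any column, though some calendar days have no row (for example, 349 days recorded in 2016)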
Key Patterns:
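- Weekday traffic is roughly 30% higher than weekend traffic (estimated from the charts; not computed explicitly in the cells above)
- Summer months (July-August) show reduced traffic in the monthly heatmap
- Major events and holidays leave a visible mark: the yearly standard deviation peaks in 2020 (9.4) and 2021 (11.1), and the lowest daily average (1.08) falls in 2021
- Thursday typically has the highest average traffic of the week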
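The weekday versus weekend difference can be checked numerically rather than read off a chart. The sketch below assumes a frame with the `is_weekend` and `average_traffic_index` columns built in section 2.2; the function name `weekday_weekend_gap` and the toy values are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

def weekday_weekend_gap(frame: pd.DataFrame,
                        value_col: str = "average_traffic_index") -> float:
    """Percent by which the weekday mean exceeds the weekend mean."""
    means = frame.groupby("is_weekend")[value_col].mean()
    # is_weekend == 0 is weekday, is_weekend == 1 is weekend
    return (means.loc[0] / means.loc[1] - 1) * 100

# Toy values for illustration only (NOT real traffic data)
toy = pd.DataFrame({
    "is_weekend": [0, 0, 0, 1, 1],
    "average_traffic_index": [30.0, 32.0, 28.0, 20.0, 20.0],
})
gap = weekday_weekend_gap(toy)
print(f"Weekday mean is {gap:.1f}% above the weekend mean")
```

Calling `weekday_weekend_gap(df)` on the full dataset would give the figure to compare against the rough 30% estimate.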
Potential Further Analysis:
- Time series forecasting
- Anomaly detection (lockdowns, events)
- Correlation with weather/holidays
- Trend decomposition
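As one possible starting point for the anomaly detection idea, the sketch below flags days whose value deviates strongly from the preceding 30 days, using a rolling z-score. This is only one of several reasonable techniques; the function name, the 30-day window, the threshold of 3, and the synthetic series are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

def rolling_zscore_flags(series: pd.Series, window: int = 30,
                         threshold: float = 3.0) -> pd.Series:
    """Flag points that lie more than `threshold` std devs from the prior window."""
    # shift(1) keeps the current day out of its own baseline
    baseline = series.shift(1).rolling(window, min_periods=window)
    z = (series - baseline.mean()) / baseline.std()
    # Early points without a full window give NaN, which compares as False
    return z.abs() > threshold

# Synthetic series: a smooth oscillation around 30 with one injected drop
values = 30 + 2 * np.sin(np.arange(60))
values[45] = 5.0
series = pd.Series(values, index=pd.date_range("2020-01-01", periods=60, freq="D"))

flags = rolling_zscore_flags(series)
print(series[flags])
```

On the real data the input would be `df.set_index('trafficindexdate')['average_traffic_index']`; the window and threshold would need tuning, since weekly seasonality alone produces regular dips.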
Tools Used:
Python (pandas, numpy, matplotlib, seaborn), JupyterLab
AI Assistance:
Claude AI (Anthropic) - code generation, visualization, analysis structure
Data Source:
İBB Açık Veri Portalı - https://data.ibb.gov.tr