Week 2: tools - "Anomaly detection in network traffic" dataset
Context
- Source: Kaggle
- Description: This dataset is designed for developing and evaluating anomaly detection models for embedded system network security. It contains network traffic features that simulate both normal and malicious behavior, which makes it suitable for supervised learning tasks that identify security breaches in networked embedded systems. Features include packet size, inter-arrival time, protocol type, source and destination IPs, TCP flags, and frequency-domain features extracted using the Wavelet Transform. A target column marks each entry as either normal (0) or anomalous (1).
Load dataset
In [7]:
import pandas as pd
import seaborn as sb
import matplotlib.pyplot as plt
df_traffic = pd.read_csv("datasets/embedded_system_network_security_dataset.csv")
# 🧾 Display dataset information
print("Traffic dataset shape:", df_traffic.shape)
# df_traffic.info()  # uncomment to list column dtypes and non-null counts
Traffic dataset shape: (1000, 18)
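The shape alone does not show how the two classes are distributed, which matters for a supervised task. A minimal sketch of a class-balance check, assuming the target column is named `label` and uses 0 for normal and 1 for anomalous as stated in the description. The helper name `summarize_labels` and the four-row `toy` frame are hypothetical and only there to make the block runnable on its own.

```python
import pandas as pd

def summarize_labels(df, target="label"):
    """Return row count, missing-value count and class balance for `target`."""
    counts = df[target].value_counts().sort_index()
    return {
        "rows": len(df),
        "missing_values": int(df.isna().sum().sum()),
        "class_counts": {int(k): int(v) for k, v in counts.items()},
        "anomaly_ratio": float((df[target] == 1).mean()),
    }

# Hypothetical toy data, not taken from the dataset
toy = pd.DataFrame({
    "packet_size": [0.41, 0.53, 0.23, 0.57],
    "label": [0.0, 0.0, 1.0, 0.0],
})
summary = summarize_labels(toy)
print(summary)
```

In the notebook, the same call would be `summarize_labels(df_traffic)`.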
Explore content
In [8]:
df_traffic.head()
Out[8]:
| | packet_size | inter_arrival_time | src_port | dst_port | packet_count_5s | mean_packet_size | spectral_entropy | frequency_band_energy | label | protocol_type_TCP | protocol_type_UDP | src_ip_192.168.1.2 | src_ip_192.168.1.3 | dst_ip_192.168.1.5 | dst_ip_192.168.1.6 | tcp_flags_FIN | tcp_flags_SYN | tcp_flags_SYN-ACK |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.405154 | 0.620362 | 62569 | 443 | 0.857143 | 0.0 | 0.834066 | 0.534891 | 0.0 | False | True | True | False | False | False | False | False | False |
| 1 | 0.527559 | 0.741288 | 59382 | 443 | 0.785714 | 0.0 | 0.147196 | 0.990757 | 0.0 | False | True | False | False | False | True | False | True | False |
| 2 | 0.226199 | 0.485116 | 65484 | 80 | 0.285714 | 0.0 | 0.855192 | 0.031781 | 0.0 | False | True | False | False | True | False | False | False | False |
| 3 | 0.573372 | 0.450965 | 51707 | 53 | 0.142857 | 0.0 | 0.153220 | 0.169958 | 0.0 | False | False | False | True | False | False | False | False | False |
| 4 | 0.651396 | 0.888740 | 26915 | 53 | 0.714286 | 0.0 | 0.923916 | 0.552053 | 0.0 | True | False | False | True | False | False | False | True | False |
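The preview shows that the categorical features are one-hot encoded (`protocol_type_*`, `src_ip_*`, `dst_ip_*`, `tcp_flags_*`), and that some rows have no flag set in a group (row 3 is neither TCP nor UDP). For plotting or grouping it can be handy to collapse a group back into a single column. A minimal sketch, assuming each group shares a common column prefix. The helper `collapse_one_hot` and the placeholder value `"other"` are hypothetical, since the source does not say which category the all-False rows represent.

```python
import pandas as pd

def collapse_one_hot(df, prefix, default="other"):
    """Turn the one-hot columns starting with `prefix` into one categorical Series."""
    cols = [c for c in df.columns if c.startswith(prefix)]
    flags = df[cols].astype(int)
    # Name of the first column set to 1 in each row, with the prefix removed
    names = flags.idxmax(axis=1).str[len(prefix):]
    # Rows with no flag set fall back to the placeholder category
    return names.where(flags.any(axis=1), default)

# The two protocol columns of the five rows shown in the preview
preview = pd.DataFrame({
    "protocol_type_TCP": [False, False, False, False, True],
    "protocol_type_UDP": [True, True, True, False, False],
})
protocol = collapse_one_hot(preview, "protocol_type_")
print(protocol.tolist())
```

On the full frame, `collapse_one_hot(df_traffic, "tcp_flags_")` would do the same for the TCP flag columns.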
Display a nice chart
In [9]:
import plotly.graph_objects as go
import plotly.offline as pyo
pyo.init_notebook_mode()
# Plot every row; to keep only anomalies, use df_traffic[df_traffic['label'] > 0]
df_traffic_sample = df_traffic
# Port numbers are used directly as node indices, so the labels must cover every index up to the largest port (65535)
series_from_range = pd.Series(range(100000))
fig = go.Figure(data=[go.Sankey(
    node=dict(
        pad=15,
        thickness=20,
        line=dict(color="grey", width=0.1),
        label=series_from_range,
        color="red",
    ),
    link=dict(
        source=df_traffic_sample['src_port'].to_numpy(),  # node index where each link starts
        target=df_traffic_sample['dst_port'].to_numpy(),  # node index where each link ends
        value=df_traffic_sample['packet_size'].to_numpy(),  # link width
    ),
)])
fig.update_layout(title_text="Source to destination traffic", font_size=5, height=1200)
fig.show()
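The chart above uses raw port numbers as node indices, which requires a label for every possible index and puts source and destination ports in the same node space. A possible alternative, sketched here under assumptions, is to map only the ports that actually occur to compact indices and to sum `packet_size` per (source, destination) pair. The helper `build_sankey_inputs` and the `src:`/`dst:` label prefixes are hypothetical choices, not part of the original notebook. The sample rows are the five shown in the preview.

```python
import pandas as pd

def build_sankey_inputs(df):
    """Map observed ports to compact node indices and aggregate link values."""
    # Prefixes keep source and destination ports as separate nodes
    src = "src:" + df["src_port"].astype(str)
    dst = "dst:" + df["dst_port"].astype(str)
    links = (
        pd.DataFrame({"src": src, "dst": dst, "value": df["packet_size"]})
        .groupby(["src", "dst"], as_index=False, sort=False)["value"]
        .sum()
    )
    # One node per distinct label, in order of first appearance
    labels = pd.Index(pd.concat([links["src"], links["dst"]], ignore_index=True).unique())
    return {
        "labels": labels.tolist(),
        "source": labels.get_indexer(links["src"]).tolist(),
        "target": labels.get_indexer(links["dst"]).tolist(),
        "value": links["value"].tolist(),
    }

# The first five rows of the dataset, as shown in the preview
rows = pd.DataFrame({
    "src_port": [62569, 59382, 65484, 51707, 26915],
    "dst_port": [443, 443, 80, 53, 53],
    "packet_size": [0.405154, 0.527559, 0.226199, 0.573372, 0.651396],
})
sankey = build_sankey_inputs(rows)
print(sankey["labels"])
print(sankey["source"], sankey["target"])
```

The returned lists can be passed to `go.Sankey` as `node=dict(label=sankey["labels"])` and `link=dict(source=sankey["source"], target=sankey["target"], value=sankey["value"])`, replacing the 100000-entry label series.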