Philippe Libioulle - Fab Futures - Data Science

Week 2: tools - "Anomaly detection in network traffic" dataset

Context

  • Source: Kaggle
  • Description: This dataset is designed for developing and evaluating anomaly detection models for embedded system network security. It contains network traffic features that simulate both normal and malicious behavior, which makes it suitable for supervised learning tasks that identify security breaches in networked embedded systems. The features include packet size, inter-arrival time, protocol type, source and destination IPs, TCP flags, and frequency-domain features extracted using a wavelet transform. A target column marks each entry as either normal (0) or anomalous (1).

Load dataset

In [7]:
import pandas as pd 
import seaborn as sb
import matplotlib.pyplot as plt

df_traffic = pd.read_csv("datasets/embedded_system_network_security_dataset.csv")

# Display basic dataset information: (rows, columns)
print("Traffic dataset shape:", df_traffic.shape)
# df_traffic.info()  # uncomment to list column dtypes and non-null counts
Traffic dataset shape: (1000, 18)
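
Since the description says the target column marks each row as normal (0) or anomalous (1), a class-balance check is a natural next step after loading. The following is a minimal sketch, assuming the target column is named `label`; the helper `class_balance` and the tiny `demo` frame are made up for illustration and are not part of the Kaggle file.

```python
import pandas as pd


def class_balance(df: pd.DataFrame, target: str = "label") -> pd.DataFrame:
    """Return the count and share of each class in the target column."""
    counts = df[target].value_counts().sort_index()
    return pd.DataFrame({"count": counts, "share": counts / counts.sum()})


# Made-up stand-in rows (assumption), NOT taken from the real dataset.
demo = pd.DataFrame({"label": [0.0, 0.0, 0.0, 1.0]})

balance = class_balance(demo)
print(balance)
```

On the real data the call would presumably be `class_balance(df_traffic)`. A strongly skewed share would suggest that plain accuracy is a poor metric for any model trained on it.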

Explore content

In [8]:
df_traffic.head()
Out[8]:
packet_size inter_arrival_time src_port dst_port packet_count_5s mean_packet_size spectral_entropy frequency_band_energy label protocol_type_TCP protocol_type_UDP src_ip_192.168.1.2 src_ip_192.168.1.3 dst_ip_192.168.1.5 dst_ip_192.168.1.6 tcp_flags_FIN tcp_flags_SYN tcp_flags_SYN-ACK
0 0.405154 0.620362 62569 443 0.857143 0.0 0.834066 0.534891 0.0 False True True False False False False False False
1 0.527559 0.741288 59382 443 0.785714 0.0 0.147196 0.990757 0.0 False True False False False True False True False
2 0.226199 0.485116 65484 80 0.285714 0.0 0.855192 0.031781 0.0 False True False False True False False False False
3 0.573372 0.450965 51707 53 0.142857 0.0 0.153220 0.169958 0.0 False False False True False False False False False
4 0.651396 0.888740 26915 53 0.714286 0.0 0.923916 0.552053 0.0 True False False True False False False True False
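
The preview shows that protocol, IP addresses, and TCP flags are already one-hot encoded, and that some rows (row 3, for example) have every `protocol_type_*` flag set to False. That pattern suggests one category was dropped during encoding, although this is an assumption. The sketch below collapses such a column group back into a single categorical column for easier grouping and plotting. The helper `collapse_one_hot` and the placeholder name `"other"` are hypothetical; the demo rows mirror the protocol flags of rows 0, 3, and 4 in the preview.

```python
import pandas as pd


def collapse_one_hot(df: pd.DataFrame, prefix: str, other: str = "other") -> pd.Series:
    """Collapse one-hot columns sharing `prefix` into one categorical Series.

    Rows where no column in the group is set receive the `other` placeholder.
    """
    cols = [c for c in df.columns if c.startswith(prefix)]
    dummies = df[cols].astype(int)
    # idxmax picks the first column holding the row maximum; strip the prefix.
    names = dummies.idxmax(axis=1).str[len(prefix):]
    # Keep the name only where at least one flag is set.
    return names.where(dummies.any(axis=1), other)


# Protocol flags copied from rows 0, 3 and 4 of the preview above.
demo = pd.DataFrame({
    "protocol_type_TCP": [False, False, True],
    "protocol_type_UDP": [True, False, False],
})

protocol = collapse_one_hot(demo, "protocol_type_")
print(protocol.tolist())
```

The same helper could be applied with the `src_ip_`, `dst_ip_`, and `tcp_flags_` prefixes.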

Display a Sankey chart

In [9]:
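import plotly.graph_objects as go
import plotly.offline as pyo

# Enable inline Plotly rendering in the notebook
pyo.init_notebook_mode()

# Use every row; to keep anomalies only, use df_traffic[df_traffic['label'] > 0]
df_traffic_sample = df_traffic

# Sankey links refer to nodes by their position in the label list. Port numbers
# are used directly as node indices here, so the label list has to be long
# enough to cover the whole 0..65535 port range.
series_from_range = pd.Series(range(100000))

fig = go.Figure(data=[go.Sankey(
    node=dict(
        pad=15,
        thickness=20,
        line=dict(color="grey", width=0.1),
        label=series_from_range,
        color="red",
    ),
    link=dict(
        source=df_traffic_sample['src_port'].to_numpy(),   # node index = source port
        target=df_traffic_sample['dst_port'].to_numpy(),   # node index = destination port
        value=df_traffic_sample['packet_size'].to_numpy(),  # link width = packet size
    ),
)])

fig.update_layout(title_text="Source to destination traffic", font_size=5, height=1200)
fig.show()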
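
The chart above uses raw port numbers as node indices, which forces a label list that covers every possible port. An alternative is to give each observed port a compact index. The sketch below is one possible variant, not the method used in this notebook. The helpers `sankey_links` and `make_figure` are hypothetical names. Prefixing labels with `src` and `dst` is a design choice that keeps source and destination ports as separate nodes. The demo rows reuse the ports and packet sizes of rows 0 to 2 from the preview.

```python
import pandas as pd


def sankey_links(df, src="src_port", dst="dst_port", value="packet_size"):
    """Map ports to compact node indices and sum the value per (src, dst) pair."""
    flows = df.groupby([src, dst], as_index=False)[value].sum()
    src_labels = [f"src {p}" for p in sorted(flows[src].unique())]
    dst_labels = [f"dst {p}" for p in sorted(flows[dst].unique())]
    labels = src_labels + dst_labels
    index = {name: i for i, name in enumerate(labels)}
    return {
        "labels": labels,
        "source": [index[f"src {p}"] for p in flows[src]],
        "target": [index[f"dst {p}"] for p in flows[dst]],
        "value": flows[value].tolist(),
    }


def make_figure(links):
    """Build the Sankey figure; plotly is imported lazily, only when called."""
    import plotly.graph_objects as go
    return go.Figure(go.Sankey(
        node=dict(label=links["labels"], pad=15, thickness=20),
        link=dict(source=links["source"], target=links["target"], value=links["value"]),
    ))


# Ports and packet sizes copied from rows 0 to 2 of the preview.
demo = pd.DataFrame({
    "src_port": [62569, 59382, 65484],
    "dst_port": [443, 443, 80],
    "packet_size": [0.405154, 0.527559, 0.226199],
})

links = sankey_links(demo)
print(links["labels"])
```

In the notebook, `make_figure(sankey_links(df_traffic)).show()` would then draw only the ports that actually occur, which should keep the node list short and the labels readable.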