Week 2: tools - "AI impact jobs by 2030" dataset¶
Context¶
- Source: Kaggle
- Description: this dataset simulates the future of work in the age of artificial intelligence. It models how various professions, skills, and education levels might be impacted by AI-driven automation by the year 2030.
Load dataset¶
In [1]:
import pandas as pd
import seaborn as sb
import matplotlib.pyplot as plt
df = pd.read_csv("datasets/AI_Impact_on_Jobs_2030.csv")
# 🧾 Display dataset information
print("Dataset shape:", df.shape)
print(df)  # df.info without parentheses only prints the bound method; call df.info() for a dtype summary
Dataset shape: (3000, 18)
                Job_Title  Average_Salary  Years_Experience  Education_Level  \
0          Security Guard           45795                28         Master's
1      Research Scientist          133355                20              PhD
2     Construction Worker          146216                 2      High School
3       Software Engineer          136530                13              PhD
4       Financial Analyst           70397                22      High School
...                   ...             ...               ...              ...
2995               Doctor          111319                 6       Bachelor's
2996        UX Researcher           44363                29              PhD
2997       Data Scientist           61325                23         Master's
2998     Graphic Designer          110296                 7              PhD
2999     Graphic Designer          123909                25              PhD

      AI_Exposure_Index  Tech_Growth_Factor  Automation_Probability_2030  \
0                  0.18                1.28                         0.85
1                  0.62                1.11                         0.05
2                  0.86                1.18                         0.81
3                  0.39                0.68                         0.60
4                  0.52                1.46                         0.64
...                 ...                 ...                          ...
2995               0.24                1.18                         0.20
2996               0.65                0.74                         0.35
2997               0.64                0.94                         0.39
2998               0.95                1.23                         0.46
2999               0.69                0.56                         0.49

      Risk_Category  Skill_1  Skill_2  Skill_3  Skill_4  Skill_5  Skill_6  \
0              High     0.45     0.10     0.46     0.33     0.14     0.65
1               Low     0.02     0.52     0.40     0.05     0.97     0.23
2              High     0.01     0.94     0.56     0.39     0.02     0.23
3            Medium     0.43     0.21     0.57     0.03     0.84     0.45
4            Medium     0.75     0.54     0.59     0.97     0.61     0.28
...             ...      ...      ...      ...      ...      ...      ...
2995            Low     0.73     0.37     0.99     0.07     0.08     0.92
2996         Medium     0.23     0.48     0.05     0.88     0.56     0.29
2997         Medium     0.28     0.62     0.73     0.21     0.96     0.01
2998         Medium     0.21     0.18     0.14     0.22     0.55     0.68
2999         Medium     0.77     0.54     0.95     0.05     0.29     0.22

      Skill_7  Skill_8  Skill_9  Skill_10
0        0.06     0.72     0.94      0.00
1        0.09     0.62     0.38      0.98
2        0.24     0.68     0.61      0.83
3        0.40     0.93     0.73      0.33
4        0.30     0.17     0.02      0.42
...       ...      ...      ...       ...
2995     0.65     0.33     0.76      0.45
2996     0.69     0.80     0.61      0.20
2997     0.70     0.29     0.48      0.57
2998     0.31     0.55     0.34      0.70
2999     0.77     0.52     0.14      0.29

[3000 rows x 18 columns]
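For a per-column summary (dtypes and non-null counts), `DataFrame.info` has to be called as a method, with parentheses. The following is a minimal sketch of that call. It assumes a small hand-built frame named `sample` (a hypothetical stand-in holding a few values copied from the output above) rather than the real CSV, so it runs without the dataset file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV: three rows and four columns from the output above.
sample = pd.DataFrame(
    {
        "Job_Title": ["Security Guard", "Research Scientist", "Construction Worker"],
        "Average_Salary": [45795, 133355, 146216],
        "Years_Experience": [28, 20, 2],
        "Education_Level": ["Master's", "PhD", "High School"],
    }
)

# info() writes its report to stdout by default; passing a buffer keeps it as text.
buffer = io.StringIO()
sample.info(buf=buffer)
summary = buffer.getvalue()
print(summary)

# Count missing values per column, a common follow-up check after loading.
missing = sample.isna().sum()
print(missing)
```

On the full dataset the same calls would be `df.info()` and `df.isna().sum()`.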
Explore content¶
In [2]:
df.head()
Out[2]:
| | Job_Title | Average_Salary | Years_Experience | Education_Level | AI_Exposure_Index | Tech_Growth_Factor | Automation_Probability_2030 | Risk_Category | Skill_1 | Skill_2 | Skill_3 | Skill_4 | Skill_5 | Skill_6 | Skill_7 | Skill_8 | Skill_9 | Skill_10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Security Guard | 45795 | 28 | Master's | 0.18 | 1.28 | 0.85 | High | 0.45 | 0.10 | 0.46 | 0.33 | 0.14 | 0.65 | 0.06 | 0.72 | 0.94 | 0.00 |
| 1 | Research Scientist | 133355 | 20 | PhD | 0.62 | 1.11 | 0.05 | Low | 0.02 | 0.52 | 0.40 | 0.05 | 0.97 | 0.23 | 0.09 | 0.62 | 0.38 | 0.98 |
| 2 | Construction Worker | 146216 | 2 | High School | 0.86 | 1.18 | 0.81 | High | 0.01 | 0.94 | 0.56 | 0.39 | 0.02 | 0.23 | 0.24 | 0.68 | 0.61 | 0.83 |
| 3 | Software Engineer | 136530 | 13 | PhD | 0.39 | 0.68 | 0.60 | Medium | 0.43 | 0.21 | 0.57 | 0.03 | 0.84 | 0.45 | 0.40 | 0.93 | 0.73 | 0.33 |
| 4 | Financial Analyst | 70397 | 22 | High School | 0.52 | 1.46 | 0.64 | Medium | 0.75 | 0.54 | 0.59 | 0.97 | 0.61 | 0.28 | 0.30 | 0.17 | 0.02 | 0.42 |
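Beyond `head()`, a quick way to explore the content is to count the categories and compare a numeric column across them. The sketch below is one possible approach, not part of the original notebook. It assumes a hand-built frame named `sample` (hypothetical, holding the five rows shown above) in place of the full CSV.

```python
import pandas as pd

# Hypothetical stand-in for the CSV: the five rows from the head() output above.
sample = pd.DataFrame(
    {
        "Job_Title": [
            "Security Guard",
            "Research Scientist",
            "Construction Worker",
            "Software Engineer",
            "Financial Analyst",
        ],
        "Automation_Probability_2030": [0.85, 0.05, 0.81, 0.60, 0.64],
        "Risk_Category": ["High", "Low", "High", "Medium", "Medium"],
    }
)

# How many rows fall into each risk category.
risk_counts = sample["Risk_Category"].value_counts()
print(risk_counts)

# Mean automation probability within each risk category.
mean_by_risk = sample.groupby("Risk_Category")["Automation_Probability_2030"].mean()
print(mean_by_risk)
```

Replacing `sample` with `df` applies the same two steps to all 3000 rows.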
Display a nice chart¶
In [3]:
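# Column whose category frequencies are plotted
target = 'Job_Title'
plt.figure(figsize=(10, 8))
# One bar per job title; hue=target gives each title its own color from the pastel palette
sb.countplot(data=df, x=target, palette='pastel', hue=target, edgecolor='black')
plt.title(f'Distribution by {target}', fontsize=14)
plt.xlabel(target, fontsize=12)
# Rotate the job titles so the labels do not overlap
plt.xticks(rotation=90)
plt.ylabel('Count', fontsize=12)
plt.grid(axis='y', linestyle='--', alpha=0.5)
plt.tight_layout()
plt.show()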
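With many job titles, rotated tick labels can be hard to read. One possible alternative, sketched here with plain matplotlib rather than seaborn, is a horizontal bar chart sorted by frequency. The helper `plot_job_counts` and the frame `sample` are hypothetical names introduced for illustration. `sample` only reuses job titles visible in the output above.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_job_counts(frame, column="Job_Title"):
    """Draw one horizontal bar per category, most frequent at the top."""
    # Ascending sort: barh draws the first entry at the bottom of the axis.
    counts = frame[column].value_counts().sort_values()
    fig, ax = plt.subplots(figsize=(10, 8))
    ax.barh(counts.index, counts.values, color="lightsteelblue", edgecolor="black")
    ax.set_title(f"Distribution by {column}", fontsize=14)
    ax.set_xlabel("Count", fontsize=12)
    ax.set_ylabel(column, fontsize=12)
    ax.grid(axis="x", linestyle="--", alpha=0.5)
    fig.tight_layout()
    return ax


# Hypothetical stand-in for the CSV, using job titles visible in the output above.
sample = pd.DataFrame(
    {"Job_Title": ["Graphic Designer", "Graphic Designer", "Doctor", "Data Scientist"]}
)
ax = plot_job_counts(sample)
```

In a notebook the figure renders at the end of the cell. Calling `plot_job_counts(df)` would draw the same chart for the full dataset.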