Philippe Libioulle - Fab Futures - Data Science

Week 2: tools - "AI impact jobs by 2030" dataset

Context

  • Source: Kaggle
  • Description: a simulated dataset on the future of work in the age of artificial intelligence. It models how professions, skills, and education levels might be affected by AI-driven automation by 2030.

Load dataset

In [1]:
import pandas as pd 
import seaborn as sb
import matplotlib.pyplot as plt

df = pd.read_csv("datasets/AI_Impact_on_Jobs_2030.csv")

# 🧾 Display dataset information
print("Dataset shape:", df.shape)
# Preview the first and last rows; df.info() would list dtypes and non-null counts instead
print(df)
Dataset shape: (3000, 18)
                Job_Title  Average_Salary  Years_Experience Education_Level  \
0          Security Guard           45795                28        Master's   
1      Research Scientist          133355                20             PhD   
2     Construction Worker          146216                 2     High School   
3       Software Engineer          136530                13             PhD   
4       Financial Analyst           70397                22     High School   
...                   ...             ...               ...             ...   
2995               Doctor          111319                 6      Bachelor's   
2996        UX Researcher           44363                29             PhD   
2997       Data Scientist           61325                23        Master's   
2998     Graphic Designer          110296                 7             PhD   
2999     Graphic Designer          123909                25             PhD   

      AI_Exposure_Index  Tech_Growth_Factor  Automation_Probability_2030  \
0                  0.18                1.28                         0.85   
1                  0.62                1.11                         0.05   
2                  0.86                1.18                         0.81   
3                  0.39                0.68                         0.60   
4                  0.52                1.46                         0.64   
...                 ...                 ...                          ...   
2995               0.24                1.18                         0.20   
2996               0.65                0.74                         0.35   
2997               0.64                0.94                         0.39   
2998               0.95                1.23                         0.46   
2999               0.69                0.56                         0.49   

     Risk_Category  Skill_1  Skill_2  Skill_3  Skill_4  Skill_5  Skill_6  \
0             High     0.45     0.10     0.46     0.33     0.14     0.65   
1              Low     0.02     0.52     0.40     0.05     0.97     0.23   
2             High     0.01     0.94     0.56     0.39     0.02     0.23   
3           Medium     0.43     0.21     0.57     0.03     0.84     0.45   
4           Medium     0.75     0.54     0.59     0.97     0.61     0.28   
...            ...      ...      ...      ...      ...      ...      ...   
2995           Low     0.73     0.37     0.99     0.07     0.08     0.92   
2996        Medium     0.23     0.48     0.05     0.88     0.56     0.29   
2997        Medium     0.28     0.62     0.73     0.21     0.96     0.01   
2998        Medium     0.21     0.18     0.14     0.22     0.55     0.68   
2999        Medium     0.77     0.54     0.95     0.05     0.29     0.22   

      Skill_7  Skill_8  Skill_9  Skill_10  
0        0.06     0.72     0.94      0.00  
1        0.09     0.62     0.38      0.98  
2        0.24     0.68     0.61      0.83  
3        0.40     0.93     0.73      0.33  
4        0.30     0.17     0.02      0.42  
...       ...      ...      ...       ...  
2995     0.65     0.33     0.76      0.45  
2996     0.69     0.80     0.61      0.20  
2997     0.70     0.29     0.48      0.57  
2998     0.31     0.55     0.34      0.70  
2999     0.77     0.52     0.14      0.29  

[3000 rows x 18 columns]
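
A row preview shows values but not column types or missing-value counts. The sketch below illustrates how `DataFrame.info()` (called, with parentheses) and `describe()` report them. It assumes a small frame rebuilt from the first five rows printed above, limited to four columns; the names `sample`, `buffer`, and `info_text` are illustrative and not part of the original notebook.

```python
import io

import pandas as pd

# Five rows copied from the preview above, restricted to a few columns
sample = pd.DataFrame({
    "Job_Title": ["Security Guard", "Research Scientist", "Construction Worker",
                  "Software Engineer", "Financial Analyst"],
    "Average_Salary": [45795, 133355, 146216, 136530, 70397],
    "Education_Level": ["Master's", "PhD", "High School", "PhD", "High School"],
    "Automation_Probability_2030": [0.85, 0.05, 0.81, 0.60, 0.64],
})

# info() writes its report to a stream and returns None,
# so capture it in a text buffer to keep it as a string
buffer = io.StringIO()
sample.info(buf=buffer)
info_text = buffer.getvalue()
print(info_text)

# describe() returns summary statistics for the numeric columns
summary = sample.describe()
print(summary)
```

On the full dataset the same two calls would be `df.info()` and `df.describe()`.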

Explore content

In [2]:
df.head()
Out[2]:
                Job_Title  Average_Salary  Years_Experience Education_Level  \
0          Security Guard           45795                28        Master's   
1      Research Scientist          133355                20             PhD   
2     Construction Worker          146216                 2     High School   
3       Software Engineer          136530                13             PhD   
4       Financial Analyst           70397                22     High School   

      AI_Exposure_Index  Tech_Growth_Factor  Automation_Probability_2030  \
0                  0.18                1.28                         0.85   
1                  0.62                1.11                         0.05   
2                  0.86                1.18                         0.81   
3                  0.39                0.68                         0.60   
4                  0.52                1.46                         0.64   

     Risk_Category  Skill_1  Skill_2  Skill_3  Skill_4  Skill_5  Skill_6  \
0             High     0.45     0.10     0.46     0.33     0.14     0.65   
1              Low     0.02     0.52     0.40     0.05     0.97     0.23   
2             High     0.01     0.94     0.56     0.39     0.02     0.23   
3           Medium     0.43     0.21     0.57     0.03     0.84     0.45   
4           Medium     0.75     0.54     0.59     0.97     0.61     0.28   

      Skill_7  Skill_8  Skill_9  Skill_10  
0        0.06     0.72     0.94      0.00  
1        0.09     0.62     0.38      0.98  
2        0.24     0.68     0.61      0.83  
3        0.40     0.93     0.73      0.33  
4        0.30     0.17     0.02      0.42  
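
Beyond looking at the first rows, a natural next step is to summarise a numeric column per category. The sketch below groups `Automation_Probability_2030` by `Risk_Category` on a frame rebuilt from the five rows shown above. The names `head_rows`, `risk_counts`, and `mean_by_risk` are illustrative, and five rows are far too few to draw conclusions about the dataset.

```python
import pandas as pd

# The five rows shown by df.head(), restricted to the columns used here
head_rows = pd.DataFrame({
    "Job_Title": ["Security Guard", "Research Scientist", "Construction Worker",
                  "Software Engineer", "Financial Analyst"],
    "Risk_Category": ["High", "Low", "High", "Medium", "Medium"],
    "AI_Exposure_Index": [0.18, 0.62, 0.86, 0.39, 0.52],
    "Automation_Probability_2030": [0.85, 0.05, 0.81, 0.60, 0.64],
})

# Number of rows in each risk category
risk_counts = head_rows["Risk_Category"].value_counts()
print(risk_counts)

# Mean automation probability within each risk category
mean_by_risk = (
    head_rows.groupby("Risk_Category")["Automation_Probability_2030"].mean()
)
print(mean_by_risk)
```

Replacing `head_rows` with `df` would run the same summary over all 3000 rows.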

Display a nice chart

In [3]:
target = 'Job_Title'
plt.figure(figsize=(10, 8))
sb.countplot(data=df, x=target, palette='pastel', hue=target, edgecolor='black')
plt.title(f'Distribution by {target}', fontsize=14)
plt.xlabel(target, fontsize=12)
plt.xticks(rotation=90)
plt.ylabel('Count', fontsize=12)
plt.grid(axis='y', linestyle='--', alpha=0.5)
plt.tight_layout()

plt.show()
[Figure: count plot titled "Distribution by Job_Title", one bar per job title, y-axis "Count"]
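
The count plot draws one bar per job title, so the numbers behind it can also be read directly with `value_counts()`. The sketch below assumes only the ten job titles visible in the printed preview (first five and last five rows); `titles`, `counts`, `shares`, and `order` are illustrative names, not part of the original notebook.

```python
import pandas as pd

# Job titles from the ten rows visible in the printed preview
titles = pd.Series(
    ["Security Guard", "Research Scientist", "Construction Worker",
     "Software Engineer", "Financial Analyst", "Doctor", "UX Researcher",
     "Data Scientist", "Graphic Designer", "Graphic Designer"],
    name="Job_Title",
)

# Rows per job title, most frequent first
counts = titles.value_counts()
print(counts)

# The same counts as a share of all rows
shares = titles.value_counts(normalize=True)
print(shares)

# Category order sorted by frequency
order = counts.index.tolist()
```

On the full dataset, `df['Job_Title'].value_counts()` gives the bar heights, and a list such as `order` could be passed to `countplot` through its `order` parameter to sort the bars by frequency.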