Week 3: fitting - "AI impact jobs by 2030" dataset¶
Context¶
- Source: Kaggle
- Description: This dataset simulates the future of work in the age of artificial intelligence. It models how professions, skills, and education levels might be affected by AI-driven automation by 2030.
Load dataset¶
In [6]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.read_csv("datasets/AI_Impact_on_Jobs_2030.csv")
print("Dataset shape:", df.shape)
Dataset shape: (168, 18)
Explore content¶
In [7]:
df.head()
Out[7]:
| | Job_Title | Average_Salary | Years_Experience | Education_Level | AI_Exposure_Index | Tech_Growth_Factor | Automation_Probability_2030 | Risk_Category | Skill_1 | Skill_2 | Skill_3 | Skill_4 | Skill_5 | Skill_6 | Skill_7 | Skill_8 | Skill_9 | Skill_10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Security Guard | 45795 | 28 | Master's | 0.18 | 1.28 | 0.85 | High | 0.45 | 0.10 | 0.46 | 0.33 | 0.14 | 0.65 | 0.06 | 0.72 | 0.94 | 0.00 |
| 1 | Research Scientist | 133355 | 20 | PhD | 0.62 | 1.11 | 0.05 | Low | 0.02 | 0.52 | 0.40 | 0.05 | 0.97 | 0.23 | 0.09 | 0.62 | 0.38 | 0.98 |
| 2 | Construction Worker | 146216 | 2 | High School | 0.86 | 1.18 | 0.81 | High | 0.01 | 0.94 | 0.56 | 0.39 | 0.02 | 0.23 | 0.24 | 0.68 | 0.61 | 0.83 |
| 3 | Software Engineer | 136530 | 13 | PhD | 0.39 | 0.68 | 0.60 | Medium | 0.43 | 0.21 | 0.57 | 0.03 | 0.84 | 0.45 | 0.40 | 0.93 | 0.73 | 0.33 |
| 4 | Financial Analyst | 70397 | 22 | High School | 0.52 | 1.46 | 0.64 | Medium | 0.75 | 0.54 | 0.59 | 0.97 | 0.61 | 0.28 | 0.30 | 0.17 | 0.02 | 0.42 |
Display how automation probability is related (or not) to salary¶
In [8]:
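x = df['Average_Salary']
y = df['Automation_Probability_2030']
# Scatter plot: one marker per row of the dataset
plt.plot(x, y, 'o')
plt.xlabel('Average_Salary')
plt.ylabel('Automation_Probability_2030')
plt.show()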
The graph shows no obvious link between salary and automation probability.
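To put a number on that visual impression, one option is the Pearson correlation coefficient. The sketch below is not part of the original notebook: `pearson_r` is a hypothetical helper, and the sample values are only the five rows shown in `df.head()` above, so the printed result says nothing about the full dataset.

```python
import numpy as np

def pearson_r(x, y):
    """Return the Pearson correlation coefficient between two 1-D sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # np.corrcoef returns the 2x2 correlation matrix; the off-diagonal entry is r
    return float(np.corrcoef(x, y)[0, 1])

# Five rows copied from df.head() (illustration only, not the full dataset)
salary = [45795, 133355, 146216, 136530, 70397]
automation = [0.85, 0.05, 0.81, 0.60, 0.64]

r = pearson_r(salary, automation)
print(f"Pearson r on the sample rows: {r:.2f}")
```

On the full data frame the call would be `pearson_r(df['Average_Salary'], df['Automation_Probability_2030'])`. A value close to 0 would agree with the scatter plot, although Pearson's r only measures linear association.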
Display how automation probability is related (or not) to years of experience¶
In [9]:
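x = df['Years_Experience']
y = df['Automation_Probability_2030']
# Scatter plot: one marker per row of the dataset
plt.plot(x, y, 'o')
plt.xlabel('Years_Experience')
plt.ylabel('Automation_Probability_2030')
plt.show()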
Nothing useful here. Let's try again and focus on job title¶
In [12]:
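x = df['Job_Title']
y = df['Automation_Probability_2030']
# Job_Title is categorical: matplotlib places one tick per distinct title
plt.plot(x, y, 'o')
plt.xticks(rotation=90)  # rotate the labels so the titles stay readable
plt.ylabel('Automation_Probability_2030')
plt.show()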
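A plot over a categorical axis can be hard to read, so a per-title summary table may help. The following is a minimal sketch, assuming job titles repeat in the full dataset: `summarize_by_title` is a hypothetical helper, and `sample` contains only the five rows shown in `df.head()`, where each title appears once (so the spread is undefined and shows as NaN).

```python
import pandas as pd

# Five rows copied from df.head() (illustration only; the real frame has 168 rows)
sample = pd.DataFrame({
    "Job_Title": ["Security Guard", "Research Scientist", "Construction Worker",
                  "Software Engineer", "Financial Analyst"],
    "Automation_Probability_2030": [0.85, 0.05, 0.81, 0.60, 0.64],
})

def summarize_by_title(frame):
    """Mean, standard deviation and row count of automation probability per job title."""
    return (
        frame.groupby("Job_Title")["Automation_Probability_2030"]
        .agg(["mean", "std", "count"])
        .sort_values("mean", ascending=False)  # most exposed titles first
    )

summary = summarize_by_title(sample)
print(summary)
```

In the notebook, `summarize_by_title(df)` would produce the same table for all rows. A small `std` within each title, together with clearly different means across titles, would suggest that the job title largely determines the automation probability.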
Conclusion: this dataset is not a good candidate for fitting¶
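One way to back this conclusion with a number is to fit a straight line by least squares and look at the coefficient of determination R². The sketch below is an illustration rather than part of the original notebook: `linear_fit_r2` is a hypothetical helper, and the inputs are the five `Years_Experience` and `Automation_Probability_2030` values shown in `df.head()`.

```python
import numpy as np

def linear_fit_r2(x, y):
    """Fit y = slope * x + intercept by least squares and return (slope, intercept, R^2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (slope * x + intercept)
    ss_res = np.sum(residuals ** 2)        # variation left unexplained by the line
    ss_tot = np.sum((y - y.mean()) ** 2)   # total variation around the mean
    return float(slope), float(intercept), float(1.0 - ss_res / ss_tot)

# Five rows copied from df.head() (illustration only, not the full dataset)
years = [28, 20, 2, 13, 22]
automation = [0.85, 0.05, 0.81, 0.60, 0.64]

slope, intercept, r2 = linear_fit_r2(years, automation)
print(f"slope={slope:.4f}, intercept={intercept:.3f}, R^2={r2:.3f}")
```

Applied to the full columns, an R² near 0 would mean the fitted line explains almost none of the variation, which is what the scatter plots suggest.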