Detailed AI responses I used to understand the assignment:
The following response answers two questions:
- What data would I use to train a machine learning model, given the data I already have?
- How can I apply MLPClassifier to the data I have? Please provide an explanation on the theory first.
- What data would I use to train a machine learning model?
You typically use the exact same data you already have, but you split it into two specific roles.
To a computer, "Training Data" just means "History." For your Bhutan Health project, here is how you assign the roles:
- The Input (Features / X): The Year. Why? This is the information you "know" and want to use to make a guess. Data: [2018, 2019, 2020, 2021, 2022]
- The Output (Target / y): The Death Count. Why? This is the answer you want the computer to learn how to find. Data: [50, 55, 60, 58, 70] (Example numbers for one disease)
- The "Training" Process: You feed both $X$ and $y$ into the model. The model looks at them and learns: "Ah, as the Year gets bigger, the Death Count seems to go up."
- The "Prediction" Process: Once trained, you give the model a New Input (e.g., 2024). It uses the pattern it learned to give you a New Output.
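The roles above can be sketched in a few lines. This is a minimal illustration using the example numbers from the list (not real data). `LinearRegression` is only a simple stand-in model chosen here to show the train-then-predict pattern; it is not the model the assignment asks for.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Example numbers from the text above, not real data.
years = np.array([2018, 2019, 2020, 2021, 2022])
deaths = np.array([50, 55, 60, 58, 70])

# scikit-learn expects X as a 2-D array: one row per sample, one column per feature.
X = years.reshape(-1, 1)   # shape (5, 1)
y = deaths                 # shape (5,)

# Training: the model sees both X and y.
model = LinearRegression()
model.fit(X, y)

# Prediction: the model sees only a new X and returns a new y.
pred_2024 = model.predict(np.array([[2024]]))[0]
print(round(pred_2024, 1))  # about 75.8 for these example numbers
```

The same `fit` then `predict` pattern applies to every scikit-learn model, including the neural network used later.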
- How can I apply MLPClassifier?
Important Correction: In Machine Learning, names matter!
- Classifier: Used when the answer is a Category (e.g., "Cat" vs. "Dog", or "High Risk" vs. "Low Risk").
- Regressor: Used when the answer is a Number (e.g., "50 deaths", "100.5 degrees").
Since you want to predict the number of deaths (a curve), you actually want an MLPRegressor. You use it the same way (fit, then predict), but it outputs numbers instead of labels.
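To see the difference in practice, here is a small sketch that trains both kinds of model on the same example numbers. The "High"/"Low" cutoff of 57 is a hypothetical threshold invented for this illustration, and the small network size and `solver="lbfgs"` are choices made here for a tiny dataset, not requirements.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier, MLPRegressor

# Example numbers from the text, not real data.
years = np.array([2018, 2019, 2020, 2021, 2022])
deaths = np.array([50, 55, 60, 58, 70])

# Scale the years to 0..1 by hand so the networks train more easily.
X = ((years - years.min()) / (years.max() - years.min())).reshape(-1, 1)

# Regressor target: the numbers themselves.
y_number = deaths.astype(float)

# Classifier target: categories. The cutoff of 57 is hypothetical.
y_label = np.where(deaths > 57, "High", "Low")

reg = MLPRegressor(hidden_layer_sizes=(10,), solver="lbfgs",
                   max_iter=500, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(10,), solver="lbfgs",
                    max_iter=500, random_state=0)
reg.fit(X, y_number)
clf.fit(X, y_label)

reg_out = reg.predict(X)   # array of floats
clf_out = clf.predict(X)   # array of labels ("High" / "Low")
print(reg_out)
print(clf_out)
```

The regressor returns floating-point numbers you can plot as a curve, while the classifier can only ever return one of the labels it was trained on.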
Part A: The Theory (Simple Explanation)
MLP stands for Multi-Layer Perceptron. It is a fancy name for a basic Neural Network, the same family of technology behind things like Siri or ChatGPT, just much smaller. Imagine a "Brain" with three layers:
- Input Layer (The Eyes): It sees the data (The Year: "2018").
- Hidden Layer (The Neurons): A group of mathematical cells that process the information. They mix the numbers, add weights, and try to find complex patterns (not just straight lines). Example: One neuron might look for "slow growth," another for "sudden spikes."
- Output Layer (The Mouth): It shouts out the answer ("50 Deaths").
- How it learns (Backpropagation): The model starts with random weights, so its first guess is essentially random. It checks the answer key (your CSV) and measures how wrong it was. It then goes backwards through the neurons, adjusting their knobs (weights) slightly to reduce the error. It repeats this many times, until the error stops improving or the iteration limit (max_iter) is reached.
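The learning loop described above can be sketched by hand with NumPy. This is a simplified illustration, not what scikit-learn does internally: it assumes one hidden layer of 8 tanh neurons, plain gradient descent, and a learning rate of 0.05, all chosen here for readability. The data are the example numbers, scaled down so the arithmetic stays small.

```python
import numpy as np

rng = np.random.default_rng(0)

# Scaled years (0..1) and example death counts divided by 100.
x = np.array([[0.0], [0.25], [0.5], [0.75], [1.0]])
y = np.array([[50.0], [55.0], [60.0], [58.0], [70.0]]) / 100.0

# Random starting weights: the first guess is essentially random.
W1 = rng.normal(0, 0.5, size=(1, 8))
b1 = np.zeros((1, 8))
W2 = rng.normal(0, 0.5, size=(8, 1))
b2 = np.zeros((1, 1))
lr = 0.05  # how big each adjustment is (assumed value)

losses = []
for step in range(2000):
    # Forward pass: input -> hidden layer -> output.
    h = np.tanh(x @ W1 + b1)
    pred = h @ W2 + b2

    # Compare with the answer key (mean squared error).
    err = pred - y
    losses.append(np.mean(err ** 2))

    # Backward pass: push the error back through each layer.
    d_pred = 2 * err / len(x)
    dW2 = h.T @ d_pred
    db2 = d_pred.sum(axis=0, keepdims=True)
    d_h = d_pred @ W2.T
    d_z = d_h * (1 - h ** 2)      # derivative of tanh
    dW1 = x.T @ d_z
    db1 = d_z.sum(axis=0, keepdims=True)

    # Adjust the knobs slightly in the direction that reduces the error.
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2

print(losses[0], losses[-1])  # the error at the start and at the end
```

MLPRegressor automates all of this (with a different default activation and optimizer), so in practice you only call `fit`.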
Here is the code to swap out your "Polynomial Curve" for a "Neural Network (MLPRegressor)."
Note: Neural Networks are sensitive to the scale of their inputs. Large raw values like "2018" can make training slow or unstable, so we usually "scale" the data (turn 2018 into 0, 2019 into 0.25, and so on up to 2022 as 1) for them to work well.
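Before the full function, the scaling step from the note can be checked on its own. This is a minimal sketch using the same five years; the extra year 2023 is included only to show what happens to a value outside the training range.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

years = np.array([2018, 2019, 2020, 2021, 2022]).reshape(-1, 1)

# fit_transform learns the min (2018) and max (2022), then rescales to 0..1.
scaler = MinMaxScaler()
scaled = scaler.fit_transform(years)
print(scaled.ravel().tolist())   # [0.0, 0.25, 0.5, 0.75, 1.0]

# transform reuses the learned min and max, so a later year lands above 1.
future = scaler.transform(np.array([[2023]]))
print(future.ravel().tolist())   # [1.25]
```

Note that 2023 becomes 1.25, outside the range the network was trained on, so predictions for future years are extrapolations and should be read with caution.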
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import MinMaxScaler
import numpy as np
import plotly.graph_objects as go

def analyze_neural_network(category_name):
    # 1. PREPARE DATA
    # Get the data just like before (df_grouped comes from the earlier cells)
    if category_name not in df_grouped.index:
        return
    y_values = df_grouped.loc[category_name].values
    x_values = np.array([2018, 2019, 2020, 2021, 2022])

    # 2. FORMATTING (Crucial for Neural Networks)
    # Scikit-learn needs data shaped like vertical columns [[2018], [2019]...]
    X_train = x_values.reshape(-1, 1)
    y_train = y_values

    # Scaling: Neural Networks train poorly on big raw numbers like "2022".
    # We squash them between 0 and 1.
    scaler = MinMaxScaler()
    X_scaled = scaler.fit_transform(X_train)

    # 3. CREATE THE BRAIN (MLPRegressor)
    # hidden_layer_sizes=(100,): one hidden layer with 100 neurons
    # max_iter=2000: stop after at most 2000 training iterations
    # random_state=42: fix the random starting weights so results repeat
    model = MLPRegressor(hidden_layer_sizes=(100,), max_iter=2000, random_state=42)

    # 4. TRAIN (Fit)
    model.fit(X_scaled, y_train)

    # 5. PREDICT
    # Let's predict for 2018-2023
    x_future = np.array([[2018], [2019], [2020], [2021], [2022], [2023]])
    x_future_scaled = scaler.transform(x_future)  # Don't forget to scale the future too!
    predictions = model.predict(x_future_scaled)

    # 6. VISUALIZE
    fig = go.Figure()

    # Real Data
    fig.add_trace(go.Scatter(x=x_values, y=y_values, mode='markers', name='Actual Data',
                             marker=dict(color='black', size=12)))

    # Neural Network Prediction
    fig.add_trace(go.Scatter(x=x_future.flatten(), y=predictions, mode='lines', name='Neural Net Fit',
                             line=dict(color='purple', width=3)))

    fig.update_layout(title=f"Neural Network Analysis: {category_name}")
    fig.show()
Run it
analyze_neural_network('Infectious Diseases')