Assignment: Machine Learning
scikit-learn
explain this code:
from sklearn.neural_network import MLPClassifier
import numpy as np
X = [[0,0],[0,1],[1,0],[1,1]]
y = [0,1,1,0]
classifier = MLPClassifier(solver='lbfgs', hidden_layer_sizes=(4), activation='tanh', random_state=1)
classifier.fit(X,y)
print(f"score: {classifier.score(X,y)}")
print("Predictions:")
np.c_[X,classifier.predict(X)]
The response that I got from ChatGPT was:
Full Code Explanation (Simple Terms)
from sklearn.neural_network import MLPClassifier
import numpy as np
You import a neural network model (MLPClassifier).
You import NumPy for handling arrays.
X = [[0,0],[0,1],[1,0],[1,1]]
y = [0,1,1,0]
X = your input data. Each pair represents two inputs. Example: [0, 1]
y = your output labels. Example: input [0,1] → output 1
These data points represent the XOR problem:
0 XOR 0 = 0
0 XOR 1 = 1
1 XOR 0 = 1
1 XOR 1 = 0
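Side note (not part of the original code or the ChatGPT answer): XOR is the classic example of a problem that a purely linear classifier cannot solve, which is why the hidden layer matters. A minimal sketch comparing a linear model with the MLP:

from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]

# A single linear decision boundary cannot separate the XOR classes
linear = LogisticRegression().fit(X, y)
print("Linear model accuracy:", linear.score(X, y))   # typically around 0.5

# One hidden layer is enough to learn XOR
mlp = MLPClassifier(solver='lbfgs', hidden_layer_sizes=(4,),
                    activation='tanh', random_state=1).fit(X, y)
print("MLP accuracy:", mlp.score(X, y))               # 1.0, as in the assignment code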
classifier = MLPClassifier(solver='lbfgs', hidden_layer_sizes=(4), activation='tanh', random_state=1)
This creates a neural network classifier with:
solver='lbfgs' → algorithm used to train the model
hidden_layer_sizes=(4) → one hidden layer with 4 neurons (note: (4) is just the integer 4 in Python; the explicit one-element tuple would be written (4,))
activation='tanh' → type of function in each neuron
random_state=1 → makes results reproducible
This neural network will try to learn the XOR pattern.
classifier.fit(X, y)
This trains the neural network.
The model learns the relationship between inputs (X) and outputs (y).
After this, the model knows how XOR works.
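Optional check (my addition, assuming the fitted classifier from the code above is still in scope): after fit, the learned weights can be inspected, and their shapes reflect the 2 inputs → 4 hidden neurons → 1 output structure.

# classifier.coefs_ holds one weight matrix per layer-to-layer connection
print([w.shape for w in classifier.coefs_])        # [(2, 4), (4, 1)]
print([b.shape for b in classifier.intercepts_])   # [(4,), (1,)]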
print(f"score: {classifier.score(X,y)}")
This prints the accuracy of the model.
Since XOR has only 4 samples and the neural network learns them perfectly, the score is 1.0 (100% correct).
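For reference, score on a classifier is mean accuracy, so the same number can be computed explicitly (again assuming classifier, X and y from the code above; this snippet is my addition, not part of the ChatGPT answer):

from sklearn.metrics import accuracy_score

# Equivalent to classifier.score(X, y): the fraction of predictions that match y
print(accuracy_score(y, classifier.predict(X)))   # 1.0 when all four XOR cases are correct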
print("Predictions:")
np.c_[X, classifier.predict(X)]
classifier.predict(X) → model predicts outputs for each input
np.c_[] → concatenates input and predicted output side-by-side
You get a table like:
input1  input2  prediction
0       0       0
0       1       1
1       0       1
1       1       0
This shows that the neural network has learned XOR.
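To make the np.c_ output easier to read, the same columns can be wrapped in a small pandas DataFrame. This is only a presentation choice, not part of the original code, and it assumes classifier and X from above:

import numpy as np
import pandas as pd

# Same values as np.c_[X, classifier.predict(X)], but with column headers
table = pd.DataFrame(np.c_[X, classifier.predict(X)],
                     columns=["input1", "input2", "prediction"])
print(table)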
Super Simple Summary
You gave the neural network data for XOR.
It learned the XOR rule.
It predicts correctly.
You print its accuracy and predictions.
What is sklearn?
Scikit-learn, often referred to as sklearn, is a free and open-source machine learning library for Python. It provides a wide range of algorithms and tools for various machine learning tasks, making it a fundamental library for data scientists and machine learning practitioners.
What is MLPClassifier?
A Multilayer Perceptron (MLP) classifier is a type of artificial neural network (ANN) used for classification tasks. It is a fundamental deep learning model that learns to map input data to output categories by training on a dataset using a process called backpropagation.
The classifier is structured into layers of neurons (nodes), with each layer connected to the next (see the sketch after this list):
Input Layer: the initial data or features
Hidden Layers: one or more layers positioned between the input and output layers
Output Layer: the final layer, which produces the prediction
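A small sketch of how hidden_layer_sizes maps onto this layer structure (the sizes and data here are made up for illustration, not taken from the assignment):

import numpy as np
from sklearn.neural_network import MLPClassifier

X = np.random.RandomState(0).rand(20, 3)   # 20 samples, 3 input features
y = np.tile([0, 1], 10)                    # two classes

# hidden_layer_sizes=(5, 3) means: input layer -> 5 neurons -> 3 neurons -> output layer
mlp = MLPClassifier(hidden_layer_sizes=(5, 3), max_iter=2000, random_state=0).fit(X, y)

print(mlp.n_layers_)                   # 4 layers in total: input, two hidden, output
print([w.shape for w in mlp.coefs_])   # [(3, 5), (5, 3), (3, 1)]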
import pandas as pd
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
# Load & parse data from Large Numbers.pdf
data = [
["Abhishek Subba", "69.10%", 6951, 748, "95.90%", 29, "Gold", 4, 0, 0, "2025-10-22T14:53:12"],
["Abishek Adhikari", "63.71%", 4985, 785, "100.64%", 30, "Diamond", 4, 0, 0, "2025-08-18T11:21:05"],
["Anjana Subba", "82.00%", 5311, 846, "108.46%", 33, "Diamond", 2, 2, 0, "2025-09-10T13:22:29"],
["Arpan Rai", "82.81%", 5547, 790, "101.28%", 29, "Diamond", 4, 0, 0, "2025-08-09T18:04:17"],
["Arpana Ghimirey", "78.34%", 4773, 509, "65.26%", 21, "Bronze", 1, 0, 0, "2025-10-22T12:40:02"],
["Chimi Dolma Gurung", "70.38%", 4093, 468, "60.00%", 23, "Bronze", 1, 0, 0, "2025-10-01T12:20:50"],
["Dawa Kelwang Keltshok", "73.07%", 4601, 782, "100.26%", 31, "Diamond", 4, 0, 0, "2025-05-20T13:15:02"],
["Jamyang Gurung", "77.57%", 5469, 781, "100.13%", 30, "Diamond", 4, 0, 0, "2025-05-15T20:20:30"],
["Jamyang Tenzin Namgyel", "75.23%", 5180, 797, "102.18%", 30, "Diamond", 2, 3, 0, "2025-09-03T14:34:27"],
["Jigme Tenzin Wangpo", "75.83%", 5037, 782, "100.26%", 30, "Diamond", 2, 0, 0, "2025-10-22T08:31:26"],
["Karma Dema Chokey", "75.55%", 16432, 788, "101.03%", 30, "Diamond", 4, 0, 0, "2025-09-25T13:18:29"],
["Kishan Rai", "79.70%", 4460, 800, "102.56%", 31, "Diamond", 0, 3, 0, "2025-09-29T12:12:10"],
["Kuenga Rinchen", "62.28%", 9502, 451, "57.82%", 22, "Bronze", 1, 0, 0, "2025-08-08T17:23:50"],
["Leki Tshomo", "65.09%", 15455, 782, "100.26%", 30, "Diamond", 1, 3, 0, "2025-11-03T20:48:32"],
["Lhakey Choden", "83.23%", 2665, 459, "58.85%", 20, "Bronze", 0, 2, 0, "2025-09-09T13:46:40"],
["Melan Rai", "71.27%", 7520, 448, "57.44%", 21, "Bronze", 1, 1, 0, "2025-08-28T13:22:57"],
["Mercy Jeshron Subba", "67.59%", 7630, 786, "100.77%", 31, "Diamond", 3, 0, 0, "2025-10-15T15:00:19"],
["Najimul Mia", "56.79%", 10148, 788, "101.03%", 30, "Diamond", 3, 1, 1, "2025-08-29T19:06:48"],
["Nima Kelwang Keltshok", "77.49%", 5491, 785, "100.64%", 30, "Diamond", 4, 0, 0, "2025-05-13T17:56:59"],
["Radha Dulal", "72.74%", 7431, 800, "102.56%", 31, "Diamond", 3, 1, 0, "2025-09-10T17:06:07"],
["Rigyel Singer", "76.60%", 10525, 787, "100.90%", 30, "Diamond", 0, 4, 1, "2025-10-08T13:28:29"],
["Susil Acharja", "73.73%", 5372, 794, "101.79%", 31, "Diamond", 4, 0, 0, "2025-06-08T19:19:10"],
["Tashi Tshokey Wangmo", "81.97%", 9897, 800, "102.56%", 30, "Diamond", 4, 0, 0, "2025-08-20T12:29:57"],
["Tashi Wangchuk", "85.39%", 5708, 472, "60.51%", 22, "Bronze", 0, 3, 0, "2025-09-08T12:30:39"],
["Tenzin Sonam Dolkar", "80.88%", 9247, 808, "103.59%", 31, "Diamond", 1, 2, 0, "2025-09-29T13:38:06"],
["Yeshey Tshoki", "85.73%", 2958, 412, "52.82%", 19, "Bronze", 1, 0, 0, "2025-08-06T14:36:48"],
["Yogira Kami", "80.36%", 7782, 783, "100.38%", 31, "Diamond", 2, 0, 0, "2025-10-08T13:25:35"],
]
df = pd.DataFrame(data, columns=[
"Name","Accuracy","Time (total)","Score (780)","Score (%)",
"Exercises started","Trophies","Easy","Moderate","Hard","Last submission date"
])
# Clean and prepare
df["Accuracy"] = df["Accuracy"].str.rstrip('%').astype(float)
df["Score (%)"] = df["Score (%)"].str.rstrip('%').astype(float)
# Features: numerical performance indicators
X = df[["Accuracy", "Exercises started", "Score (%)", "Time (total)"]].values
# Target: 1 if Diamond, 0 otherwise (Bronze/Gold → not Diamond)
y = (df["Trophies"] == "Diamond").astype(int).values
# Scale features (important for neural nets)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Train MLP: 2 hidden layers (4, 3 neurons), tanh activation
mlp = MLPClassifier(
solver='lbfgs',
hidden_layer_sizes=(4, 3),
activation='tanh',
random_state=1,
max_iter=1000
)
mlp.fit(X_scaled, y)
# Evaluate
train_score = mlp.score(X_scaled, y)
print(f"Training Accuracy: {train_score:.4f}")
# Show predictions vs actual
predictions = mlp.predict(X_scaled)
result = np.c_[X, y, predictions]
print("\n[Accuracy, Exercises, Score%, Time, Actual (1=Diamond), Pred]")
print(result.round(2))
Training Accuracy: 1.0000

[Accuracy, Exercises, Score%, Time, Actual (1=Diamond), Pred]
[[6.9100e+01 2.9000e+01 9.5900e+01 6.9510e+03 0.0000e+00 0.0000e+00]
 [6.3710e+01 3.0000e+01 1.0064e+02 4.9850e+03 1.0000e+00 1.0000e+00]
 [8.2000e+01 3.3000e+01 1.0846e+02 5.3110e+03 1.0000e+00 1.0000e+00]
 [8.2810e+01 2.9000e+01 1.0128e+02 5.5470e+03 1.0000e+00 1.0000e+00]
 [7.8340e+01 2.1000e+01 6.5260e+01 4.7730e+03 0.0000e+00 0.0000e+00]
 [7.0380e+01 2.3000e+01 6.0000e+01 4.0930e+03 0.0000e+00 0.0000e+00]
 [7.3070e+01 3.1000e+01 1.0026e+02 4.6010e+03 1.0000e+00 1.0000e+00]
 [7.7570e+01 3.0000e+01 1.0013e+02 5.4690e+03 1.0000e+00 1.0000e+00]
 [7.5230e+01 3.0000e+01 1.0218e+02 5.1800e+03 1.0000e+00 1.0000e+00]
 [7.5830e+01 3.0000e+01 1.0026e+02 5.0370e+03 1.0000e+00 1.0000e+00]
 [7.5550e+01 3.0000e+01 1.0103e+02 1.6432e+04 1.0000e+00 1.0000e+00]
 [7.9700e+01 3.1000e+01 1.0256e+02 4.4600e+03 1.0000e+00 1.0000e+00]
 [6.2280e+01 2.2000e+01 5.7820e+01 9.5020e+03 0.0000e+00 0.0000e+00]
 [6.5090e+01 3.0000e+01 1.0026e+02 1.5455e+04 1.0000e+00 1.0000e+00]
 [8.3230e+01 2.0000e+01 5.8850e+01 2.6650e+03 0.0000e+00 0.0000e+00]
 [7.1270e+01 2.1000e+01 5.7440e+01 7.5200e+03 0.0000e+00 0.0000e+00]
 [6.7590e+01 3.1000e+01 1.0077e+02 7.6300e+03 1.0000e+00 1.0000e+00]
 [5.6790e+01 3.0000e+01 1.0103e+02 1.0148e+04 1.0000e+00 1.0000e+00]
 [7.7490e+01 3.0000e+01 1.0064e+02 5.4910e+03 1.0000e+00 1.0000e+00]
 [7.2740e+01 3.1000e+01 1.0256e+02 7.4310e+03 1.0000e+00 1.0000e+00]
 [7.6600e+01 3.0000e+01 1.0090e+02 1.0525e+04 1.0000e+00 1.0000e+00]
 [7.3730e+01 3.1000e+01 1.0179e+02 5.3720e+03 1.0000e+00 1.0000e+00]
 [8.1970e+01 3.0000e+01 1.0256e+02 9.8970e+03 1.0000e+00 1.0000e+00]
 [8.5390e+01 2.2000e+01 6.0510e+01 5.7080e+03 0.0000e+00 0.0000e+00]
 [8.0880e+01 3.1000e+01 1.0359e+02 9.2470e+03 1.0000e+00 1.0000e+00]
 [8.5730e+01 1.9000e+01 5.2820e+01 2.9580e+03 0.0000e+00 0.0000e+00]
 [8.0360e+01 3.1000e+01 1.0038e+02 7.7820e+03 1.0000e+00 1.0000e+00]]
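Note that 1.0000 here is training accuracy, measured on the same 27 rows the model was fitted on, so it says nothing about how the model would do on unseen students. A sketch of a hold-out check using the train_test_split already imported above (the 80/20 split and the stratify choice are my assumptions, not part of the original notebook):

# Hold out ~20% of the rows to estimate performance on unseen data
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, stratify=y, random_state=1
)

mlp_holdout = MLPClassifier(solver='lbfgs', hidden_layer_sizes=(4, 3),
                            activation='tanh', random_state=1, max_iter=1000)
mlp_holdout.fit(X_train, y_train)

print(f"Train accuracy: {mlp_holdout.score(X_train, y_train):.4f}")
print(f"Test accuracy:  {mlp_holdout.score(X_test, y_test):.4f}")

In a stricter setup the scaler would also be fitted on the training split only, so that no information from the test rows leaks into the scaling.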
import pandas as pd
# Parse the markdown table from Large Numbers.pdf (as provided in your KB)
data_lines = [
["Abhishek Subba", "69.10%", 6951, 748, "95.90%", 29, "Gold", 4, 0, 0, "2025-10-22T14:53:12"],
["Abishek Adhikari", "63.71%", 4985, 785, "100.64%", 30, "Diamond", 4, 0, 0, "2025-08-18T11:21:05"],
["Anjana Subba", "82.00%", 5311, 846, "108.46%", 33, "Diamond", 2, 2, 0, "2025-09-10T13:22:29"],
["Arpan Rai", "82.81%", 5547, 790, "101.28%", 29, "Diamond", 4, 0, 0, "2025-08-09T18:04:17"],
["Arpana Ghimirey", "78.34%", 4773, 509, "65.26%", 21, "Bronze", 1, 0, 0, "2025-10-22T12:40:02"],
["Chimi Dolma Gurung", "70.38%", 4093, 468, "60.00%", 23, "Bronze", 1, 0, 0, "2025-10-01T12:20:50"],
["Dawa Kelwang Keltshok", "73.07%", 4601, 782, "100.26%", 31, "Diamond", 4, 0, 0, "2025-05-20T13:15:02"],
["Jamyang Gurung", "77.57%", 5469, 781, "100.13%", 30, "Diamond", 4, 0, 0, "2025-05-15T20:20:30"],
["Jamyang Tenzin Namgyel", "75.23%", 5180, 797, "102.18%", 30, "Diamond", 2, 3, 0, "2025-09-03T14:34:27"],
["Jigme Tenzin Wangpo", "75.83%", 5037, 782, "100.26%", 30, "Diamond", 2, 0, 0, "2025-10-22T08:31:26"],
["Karma Dema Chokey", "75.55%", 16432, 788, "101.03%", 30, "Diamond", 4, 0, 0, "2025-09-25T13:18:29"],
["Kishan Rai", "79.70%", 4460, 800, "102.56%", 31, "Diamond", 0, 3, 0, "2025-09-29T12:12:10"],
["Kuenga Rinchen", "62.28%", 9502, 451, "57.82%", 22, "Bronze", 1, 0, 0, "2025-08-08T17:23:50"],
["Leki Tshomo", "65.09%", 15455, 782, "100.26%", 30, "Diamond", 1, 3, 0, "2025-11-03T20:48:32"],
["Lhakey Choden", "83.23%", 2665, 459, "58.85%", 20, "Bronze", 0, 2, 0, "2025-09-09T13:46:40"],
["Melan Rai", "71.27%", 7520, 448, "57.44%", 21, "Bronze", 1, 1, 0, "2025-08-28T13:22:57"],
["Mercy Jeshron Subba", "67.59%", 7630, 786, "100.77%", 31, "Diamond", 3, 0, 0, "2025-10-15T15:00:19"],
["Najimul Mia", "56.79%", 10148, 788, "101.03%", 30, "Diamond", 3, 1, 1, "2025-08-29T19:06:48"],
["Nima Kelwang Keltshok", "77.49%", 5491, 785, "100.64%", 30, "Diamond", 4, 0, 0, "2025-05-13T17:56:59"],
["Radha Dulal", "72.74%", 7431, 800, "102.56%", 31, "Diamond", 3, 1, 0, "2025-09-10T17:06:07"],
["Rigyel Singer", "76.60%", 10525, 787, "100.90%", 30, "Diamond", 0, 4, 1, "2025-10-08T13:28:29"],
["Susil Acharja", "73.73%", 5372, 794, "101.79%", 31, "Diamond", 4, 0, 0, "2025-06-08T19:19:10"],
["Tashi Tshokey Wangmo", "81.97%", 9897, 800, "102.56%", 30, "Diamond", 4, 0, 0, "2025-08-20T12:29:57"],
["Tashi Wangchuk", "85.39%", 5708, 472, "60.51%", 22, "Bronze", 0, 3, 0, "2025-09-08T12:30:39"],
["Tenzin Sonam Dolkar", "80.88%", 9247, 808, "103.59%", 31, "Diamond", 1, 2, 0, "2025-09-29T13:38:06"],
["Yeshey Tshoki", "85.73%", 2958, 412, "52.82%", 19, "Bronze", 1, 0, 0, "2025-08-06T14:36:48"],
["Yogira Kami", "80.36%", 7782, 783, "100.38%", 31, "Diamond", 2, 0, 0, "2025-10-08T13:25:35"],
]
# Create DataFrame with correct column names
df = pd.DataFrame(data_lines, columns=[
"Name", "Accuracy", "Time (total)", "Score (780)", "Score (%)",
"Exercises started", "Trophies", "Easy", "Moderate", "Hard", "Last submission date"
])
# Preview first 5 rows
print(df.head())
                 Name Accuracy  Time (total)  Score (780) Score (%)  \
0      Abhishek Subba   69.10%          6951          748    95.90%
1    Abishek Adhikari   63.71%          4985          785   100.64%
2        Anjana Subba   82.00%          5311          846   108.46%
3           Arpan Rai   82.81%          5547          790   101.28%
4     Arpana Ghimirey   78.34%          4773          509    65.26%

   Exercises started Trophies  Easy  Moderate  Hard Last submission date
0                 29     Gold     4         0     0  2025-10-22T14:53:12
1                 30  Diamond     4         0     0  2025-08-18T11:21:05
2                 33  Diamond     2         2     0  2025-09-10T13:22:29
3                 29  Diamond     4         0     0  2025-08-09T18:04:17
4                 21   Bronze     1         0     0  2025-10-22T12:40:02
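A likely next cleaning step (an assumption about where this cell is headed, mirroring the cleaning done in the earlier cell): convert the percentage strings to numbers and the submission date to a proper datetime so they can be used as features.

# Convert "69.10%" -> 69.10 and ISO timestamps -> pandas datetime
df["Accuracy"] = df["Accuracy"].str.rstrip('%').astype(float)
df["Score (%)"] = df["Score (%)"].str.rstrip('%').astype(float)
df["Last submission date"] = pd.to_datetime(df["Last submission date"])

print(df.dtypes)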