Dawa Tshering - Fab Futures - Data Science

Assignment: Machine Learning

scikit-learn

explain this code:

from sklearn.neural_network import MLPClassifier
import numpy as np
X = [[0,0],[0,1],[1,0],[1,1]]
y = [0,1,1,0]
classifier = MLPClassifier(solver='lbfgs',hidden_layer_sizes=(4),activation='tanh',random_state=1)
classifier.fit(X,y)
print(f"score: {classifier.score(X,y)}")
print("Predictions:")
np.c_[X,classifier.predict(X)]

The response that I got from ChatGPT was:

✅ Full Code Explanation (Simple Terms)

from sklearn.neural_network import MLPClassifier

import numpy as np

You import a neural network model (MLPClassifier).

You import NumPy for handling arrays.

X = [[0,0],[0,1],[1,0],[1,1]]
y = [0,1,1,0]

X = your input data. Each pair represents two inputs. Example: [0, 1]

y = your output labels. Example: input [0,1] → output 1

💡 These data points represent the XOR problem:

0 XOR 0 = 0

0 XOR 1 = 1

1 XOR 0 = 1

1 XOR 1 = 0
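The truth table above can be checked directly in Python. This is a minimal sketch that reuses the X and y from the example; the name xor_labels is introduced here only for illustration.

```python
# Inputs and labels copied from the example above.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]

# Python's ^ operator is bitwise XOR; on 0/1 values it matches logical XOR.
xor_labels = [a ^ b for a, b in X]

print(xor_labels)        # [0, 1, 1, 0]
print(xor_labels == y)   # True
```

So the labels in y are exactly the XOR of the two inputs in each row of X.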

classifier = MLPClassifier(solver='lbfgs', hidden_layer_sizes=(4), activation='tanh', random_state=1)

This creates a neural network classifier with:

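solver='lbfgs' → algorithm used to train the model (L-BFGS, a quasi-Newton optimizer)

hidden_layer_sizes=(4) → one hidden layer with 4 neurons. In Python, (4) is just the integer 4, not a tuple; the one-element tuple is written (4,)

activation='tanh' → type of function in each hidden neuron

random_state=1 → fixes the random weight initialization, which makes results reproducible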

👉 This neural network will try to learn the XOR pattern.
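A small point about the hidden_layer_sizes argument: parentheses alone do not make a tuple in Python. The sketch below only demonstrates that language rule; the variable names are made up for illustration.

```python
one_layer_int = (4)        # parentheses only group: this is the int 4
one_layer_tuple = (4,)     # the trailing comma makes a one-element tuple
two_layers = (4, 3)        # two hidden layers: 4 neurons, then 3

print(type(one_layer_int).__name__)    # int
print(type(one_layer_tuple).__name__)  # tuple
print(len(two_layers))                 # 2
```

Writing (4,) makes the intent explicit and matches the tuple style used for networks with more than one hidden layer.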

classifier.fit(X, y)

This trains the neural network.

The model learns the relationship between inputs (X) and outputs (y).

👉 After this, the model has adjusted its weights to fit the four XOR examples.

print(f"score: {classifier.score(X,y)}")

This prints the accuracy of the model.

XOR has only 4 samples, so if the neural network fits all of them the score is 1.0 (100% correct). This is accuracy on the training data itself, not on unseen data.
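For a classifier, score returns the mean accuracy: the fraction of samples whose predicted label equals the true label. A minimal sketch of that calculation in plain Python follows. The accuracy helper and both prediction lists are hypothetical and are not output from the model above.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)

y_true = [0, 1, 1, 0]

# Hypothetical predictions, for illustration only.
all_correct = [0, 1, 1, 0]
one_wrong = [0, 1, 1, 1]

print(accuracy(y_true, all_correct))  # 1.0
print(accuracy(y_true, one_wrong))    # 0.75
```

With only four samples, each wrong prediction costs 25 percentage points of accuracy.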

print("Predictions:")

np.c_[X, classifier.predict(X)]

classifier.predict(X) → model predicts outputs for each input

np.c_[] → concatenates input and predicted output side-by-side

You get a table like:

input1  input2  prediction
0       0       0
0       1       1
1       0       1
1       1       0

👉 This shows that the neural network has learned XOR.
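To see what np.c_ does on its own, here is a small sketch that stacks the inputs and a label column without training anything. The predictions array is a hypothetical stand-in for classifier.predict(X).

```python
import numpy as np

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
# Hypothetical predictions standing in for classifier.predict(X).
predictions = np.array([0, 1, 1, 0])

# The 1-D array is treated as a column and appended to the right of X.
table = np.c_[X, predictions]
print(table.shape)  # (4, 3)
print(table)
```

Each row of the result holds input1, input2 and the prediction, which is the table layout shown above.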

⭐ Super Simple Summary

You gave the neural network data for XOR.

It learned the XOR rule.

It predicts correctly.

You print its accuracy and predictions.

What is sklearn?

Scikit-learn, often referred to as sklearn, is a free and open-source machine learning library for Python. It provides a wide range of algorithms and tools for various machine learning tasks, making it a fundamental library for data scientists and machine learning practitioners.

What is MLPClassifier?

A Multilayer Perceptron (MLP) classifier is a type of artificial neural network (ANN) used for classification tasks. It is a fundamental deep learning model that learns to map input data to output categories by training on a dataset using a process called backpropagation.

The classifier is organized into layers of neurons (nodes), and each layer is fully connected to the next one:

Input Layer: the initial data or features

Hidden Layers: one or more layers positioned between the input and output layers

Output Layer: the final layer, which produces the prediction
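The three kinds of layers can be sketched as a forward pass in NumPy. The weights below are hand-picked for illustration so that a 2-2-1 network with tanh hidden units reproduces XOR. They are assumptions, not the weights scikit-learn would learn, and the logistic output is assumed here as the usual choice for a binary classifier.

```python
import numpy as np

# Input layer: the four XOR input pairs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

# Hand-picked (hypothetical) weights: hidden unit 0 acts like OR, unit 1 like AND.
W_hidden = np.array([[10.0, 10.0],
                     [10.0, 10.0]])   # shape (2 inputs, 2 hidden neurons)
b_hidden = np.array([-5.0, -15.0])
W_out = np.array([[1.0], [-1.0]])     # shape (2 hidden neurons, 1 output)
b_out = np.array([-1.0])

hidden = np.tanh(X @ W_hidden + b_hidden)   # hidden layer activations
z = hidden @ W_out + b_out                  # output layer, before activation
prob = 1.0 / (1.0 + np.exp(-z))             # logistic output in (0, 1)
pred = (prob > 0.5).astype(int).ravel()     # threshold to a class label

print(pred)  # [0 1 1 0]
```

The output is "OR and not AND", which is XOR. Training replaces the hand-picked numbers with weights found by the solver.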

In [4]:
import pandas as pd
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# ✅ Load & parse data from Large Numbers.pdf
data = [
    ["Abhishek Subba", "69.10%", 6951, 748, "95.90%", 29, "Gold", 4, 0, 0, "2025-10-22T14:53:12"],
    ["Abishek Adhikari", "63.71%", 4985, 785, "100.64%", 30, "Diamond", 4, 0, 0, "2025-08-18T11:21:05"],
    ["Anjana Subba", "82.00%", 5311, 846, "108.46%", 33, "Diamond", 2, 2, 0, "2025-09-10T13:22:29"],
    ["Arpan Rai", "82.81%", 5547, 790, "101.28%", 29, "Diamond", 4, 0, 0, "2025-08-09T18:04:17"],
    ["Arpana Ghimirey", "78.34%", 4773, 509, "65.26%", 21, "Bronze", 1, 0, 0, "2025-10-22T12:40:02"],
    ["Chimi Dolma Gurung", "70.38%", 4093, 468, "60.00%", 23, "Bronze", 1, 0, 0, "2025-10-01T12:20:50"],
    ["Dawa Kelwang Keltshok", "73.07%", 4601, 782, "100.26%", 31, "Diamond", 4, 0, 0, "2025-05-20T13:15:02"],
    ["Jamyang Gurung", "77.57%", 5469, 781, "100.13%", 30, "Diamond", 4, 0, 0, "2025-05-15T20:20:30"],
    ["Jamyang Tenzin Namgyel", "75.23%", 5180, 797, "102.18%", 30, "Diamond", 2, 3, 0, "2025-09-03T14:34:27"],
    ["Jigme Tenzin Wangpo", "75.83%", 5037, 782, "100.26%", 30, "Diamond", 2, 0, 0, "2025-10-22T08:31:26"],
    ["Karma Dema Chokey", "75.55%", 16432, 788, "101.03%", 30, "Diamond", 4, 0, 0, "2025-09-25T13:18:29"],
    ["Kishan Rai", "79.70%", 4460, 800, "102.56%", 31, "Diamond", 0, 3, 0, "2025-09-29T12:12:10"],
    ["Kuenga Rinchen", "62.28%", 9502, 451, "57.82%", 22, "Bronze", 1, 0, 0, "2025-08-08T17:23:50"],
    ["Leki Tshomo", "65.09%", 15455, 782, "100.26%", 30, "Diamond", 1, 3, 0, "2025-11-03T20:48:32"],
    ["Lhakey Choden", "83.23%", 2665, 459, "58.85%", 20, "Bronze", 0, 2, 0, "2025-09-09T13:46:40"],
    ["Melan Rai", "71.27%", 7520, 448, "57.44%", 21, "Bronze", 1, 1, 0, "2025-08-28T13:22:57"],
    ["Mercy Jeshron Subba", "67.59%", 7630, 786, "100.77%", 31, "Diamond", 3, 0, 0, "2025-10-15T15:00:19"],
    ["Najimul Mia", "56.79%", 10148, 788, "101.03%", 30, "Diamond", 3, 1, 1, "2025-08-29T19:06:48"],
    ["Nima Kelwang Keltshok", "77.49%", 5491, 785, "100.64%", 30, "Diamond", 4, 0, 0, "2025-05-13T17:56:59"],
    ["Radha Dulal", "72.74%", 7431, 800, "102.56%", 31, "Diamond", 3, 1, 0, "2025-09-10T17:06:07"],
    ["Rigyel Singer", "76.60%", 10525, 787, "100.90%", 30, "Diamond", 0, 4, 1, "2025-10-08T13:28:29"],
    ["Susil Acharja", "73.73%", 5372, 794, "101.79%", 31, "Diamond", 4, 0, 0, "2025-06-08T19:19:10"],
    ["Tashi Tshokey Wangmo", "81.97%", 9897, 800, "102.56%", 30, "Diamond", 4, 0, 0, "2025-08-20T12:29:57"],
    ["Tashi Wangchuk", "85.39%", 5708, 472, "60.51%", 22, "Bronze", 0, 3, 0, "2025-09-08T12:30:39"],
    ["Tenzin Sonam Dolkar", "80.88%", 9247, 808, "103.59%", 31, "Diamond", 1, 2, 0, "2025-09-29T13:38:06"],
    ["Yeshey Tshoki", "85.73%", 2958, 412, "52.82%", 19, "Bronze", 1, 0, 0, "2025-08-06T14:36:48"],
    ["Yogira Kami", "80.36%", 7782, 783, "100.38%", 31, "Diamond", 2, 0, 0, "2025-10-08T13:25:35"],
]

df = pd.DataFrame(data, columns=[
    "Name","Accuracy","Time (total)","Score (780)","Score (%)",
    "Exercises started","Trophies","Easy","Moderate","Hard","Last submission date"
])

# ✅ Clean and prepare
df["Accuracy"] = df["Accuracy"].str.rstrip('%').astype(float)
df["Score (%)"] = df["Score (%)"].str.rstrip('%').astype(float)

# Features: numerical performance indicators
X = df[["Accuracy", "Exercises started", "Score (%)", "Time (total)"]].values
# Target: 1 if Diamond, 0 otherwise (Bronze/Gold → not Diamond)
y = (df["Trophies"] == "Diamond").astype(int).values

# ✅ Scale features (important for neural nets)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# ✅ Train MLP: 2 hidden layers (4, 3 neurons), tanh activation
mlp = MLPClassifier(
    solver='lbfgs',
    hidden_layer_sizes=(4, 3),
    activation='tanh',
    random_state=1,
    max_iter=1000
)

mlp.fit(X_scaled, y)

# ✅ Evaluate (on the same data the model was trained on)
train_score = mlp.score(X_scaled, y)
print(f"Training Accuracy: {train_score:.4f}")

# ✅ Show predictions vs actual
predictions = mlp.predict(X_scaled)
result = np.c_[X, y, predictions]
print("\n[Accuracy, Exercises, Score%, Time, Actual (1=Diamond), Pred]")
print(result.round(2))
Training Accuracy: 1.0000

[Accuracy, Exercises, Score%, Time, Actual (1=Diamond), Pred]
[[6.9100e+01 2.9000e+01 9.5900e+01 6.9510e+03 0.0000e+00 0.0000e+00]
 [6.3710e+01 3.0000e+01 1.0064e+02 4.9850e+03 1.0000e+00 1.0000e+00]
 [8.2000e+01 3.3000e+01 1.0846e+02 5.3110e+03 1.0000e+00 1.0000e+00]
 [8.2810e+01 2.9000e+01 1.0128e+02 5.5470e+03 1.0000e+00 1.0000e+00]
 [7.8340e+01 2.1000e+01 6.5260e+01 4.7730e+03 0.0000e+00 0.0000e+00]
 [7.0380e+01 2.3000e+01 6.0000e+01 4.0930e+03 0.0000e+00 0.0000e+00]
 [7.3070e+01 3.1000e+01 1.0026e+02 4.6010e+03 1.0000e+00 1.0000e+00]
 [7.7570e+01 3.0000e+01 1.0013e+02 5.4690e+03 1.0000e+00 1.0000e+00]
 [7.5230e+01 3.0000e+01 1.0218e+02 5.1800e+03 1.0000e+00 1.0000e+00]
 [7.5830e+01 3.0000e+01 1.0026e+02 5.0370e+03 1.0000e+00 1.0000e+00]
 [7.5550e+01 3.0000e+01 1.0103e+02 1.6432e+04 1.0000e+00 1.0000e+00]
 [7.9700e+01 3.1000e+01 1.0256e+02 4.4600e+03 1.0000e+00 1.0000e+00]
 [6.2280e+01 2.2000e+01 5.7820e+01 9.5020e+03 0.0000e+00 0.0000e+00]
 [6.5090e+01 3.0000e+01 1.0026e+02 1.5455e+04 1.0000e+00 1.0000e+00]
 [8.3230e+01 2.0000e+01 5.8850e+01 2.6650e+03 0.0000e+00 0.0000e+00]
 [7.1270e+01 2.1000e+01 5.7440e+01 7.5200e+03 0.0000e+00 0.0000e+00]
 [6.7590e+01 3.1000e+01 1.0077e+02 7.6300e+03 1.0000e+00 1.0000e+00]
 [5.6790e+01 3.0000e+01 1.0103e+02 1.0148e+04 1.0000e+00 1.0000e+00]
 [7.7490e+01 3.0000e+01 1.0064e+02 5.4910e+03 1.0000e+00 1.0000e+00]
 [7.2740e+01 3.1000e+01 1.0256e+02 7.4310e+03 1.0000e+00 1.0000e+00]
 [7.6600e+01 3.0000e+01 1.0090e+02 1.0525e+04 1.0000e+00 1.0000e+00]
 [7.3730e+01 3.1000e+01 1.0179e+02 5.3720e+03 1.0000e+00 1.0000e+00]
 [8.1970e+01 3.0000e+01 1.0256e+02 9.8970e+03 1.0000e+00 1.0000e+00]
 [8.5390e+01 2.2000e+01 6.0510e+01 5.7080e+03 0.0000e+00 0.0000e+00]
 [8.0880e+01 3.1000e+01 1.0359e+02 9.2470e+03 1.0000e+00 1.0000e+00]
 [8.5730e+01 1.9000e+01 5.2820e+01 2.9580e+03 0.0000e+00 0.0000e+00]
 [8.0360e+01 3.1000e+01 1.0038e+02 7.7820e+03 1.0000e+00 1.0000e+00]]
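The 1.0000 above is measured on the same rows the network was trained on, so it does not show how the model behaves on students it has not seen. A hedged sketch of a hold-out evaluation follows. It uses only the first 12 rows of the table to stay short, and the 25% test size, the stratified split and the pipeline are choices made here for illustration, not part of the original cell.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# First 12 rows of the table above:
# [Accuracy, Exercises started, Score (%), Time (total)]
X = np.array([
    [69.10, 29, 95.90, 6951],
    [63.71, 30, 100.64, 4985],
    [82.00, 33, 108.46, 5311],
    [82.81, 29, 101.28, 5547],
    [78.34, 21, 65.26, 4773],
    [70.38, 23, 60.00, 4093],
    [73.07, 31, 100.26, 4601],
    [77.57, 30, 100.13, 5469],
    [75.23, 30, 102.18, 5180],
    [75.83, 30, 100.26, 5037],
    [75.55, 30, 101.03, 16432],
    [79.70, 31, 102.56, 4460],
])
# 1 if Diamond, 0 otherwise (same rule as the cell above).
y = np.array([0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1])

# Hold out 25% of the rows; stratify keeps both classes in each part.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1, stratify=y
)

# The pipeline fits the scaler on the training rows only,
# so nothing about the test rows leaks into training.
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(solver="lbfgs", hidden_layer_sizes=(4, 3),
                  activation="tanh", random_state=1, max_iter=1000),
)
model.fit(X_train, y_train)

train_score = model.score(X_train, y_train)
test_score = model.score(X_test, y_test)
print(f"Training accuracy: {train_score:.4f}")
print(f"Test accuracy:     {test_score:.4f}")
```

With so few rows the test accuracy is a rough indication only; cross-validation over all 27 rows would give a steadier estimate.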
In [6]:
import pandas as pd

# Rows transcribed from the table in Large Numbers.pdf
data_lines = [
    ["Abhishek Subba", "69.10%", 6951, 748, "95.90%", 29, "Gold", 4, 0, 0, "2025-10-22T14:53:12"],
    ["Abishek Adhikari", "63.71%", 4985, 785, "100.64%", 30, "Diamond", 4, 0, 0, "2025-08-18T11:21:05"],
    ["Anjana Subba", "82.00%", 5311, 846, "108.46%", 33, "Diamond", 2, 2, 0, "2025-09-10T13:22:29"],
    ["Arpan Rai", "82.81%", 5547, 790, "101.28%", 29, "Diamond", 4, 0, 0, "2025-08-09T18:04:17"],
    ["Arpana Ghimirey", "78.34%", 4773, 509, "65.26%", 21, "Bronze", 1, 0, 0, "2025-10-22T12:40:02"],
    ["Chimi Dolma Gurung", "70.38%", 4093, 468, "60.00%", 23, "Bronze", 1, 0, 0, "2025-10-01T12:20:50"],
    ["Dawa Kelwang Keltshok", "73.07%", 4601, 782, "100.26%", 31, "Diamond", 4, 0, 0, "2025-05-20T13:15:02"],
    ["Jamyang Gurung", "77.57%", 5469, 781, "100.13%", 30, "Diamond", 4, 0, 0, "2025-05-15T20:20:30"],
    ["Jamyang Tenzin Namgyel", "75.23%", 5180, 797, "102.18%", 30, "Diamond", 2, 3, 0, "2025-09-03T14:34:27"],
    ["Jigme Tenzin Wangpo", "75.83%", 5037, 782, "100.26%", 30, "Diamond", 2, 0, 0, "2025-10-22T08:31:26"],
    ["Karma Dema Chokey", "75.55%", 16432, 788, "101.03%", 30, "Diamond", 4, 0, 0, "2025-09-25T13:18:29"],
    ["Kishan Rai", "79.70%", 4460, 800, "102.56%", 31, "Diamond", 0, 3, 0, "2025-09-29T12:12:10"],
    ["Kuenga Rinchen", "62.28%", 9502, 451, "57.82%", 22, "Bronze", 1, 0, 0, "2025-08-08T17:23:50"],
    ["Leki Tshomo", "65.09%", 15455, 782, "100.26%", 30, "Diamond", 1, 3, 0, "2025-11-03T20:48:32"],
    ["Lhakey Choden", "83.23%", 2665, 459, "58.85%", 20, "Bronze", 0, 2, 0, "2025-09-09T13:46:40"],
    ["Melan Rai", "71.27%", 7520, 448, "57.44%", 21, "Bronze", 1, 1, 0, "2025-08-28T13:22:57"],
    ["Mercy Jeshron Subba", "67.59%", 7630, 786, "100.77%", 31, "Diamond", 3, 0, 0, "2025-10-15T15:00:19"],
    ["Najimul Mia", "56.79%", 10148, 788, "101.03%", 30, "Diamond", 3, 1, 1, "2025-08-29T19:06:48"],
    ["Nima Kelwang Keltshok", "77.49%", 5491, 785, "100.64%", 30, "Diamond", 4, 0, 0, "2025-05-13T17:56:59"],
    ["Radha Dulal", "72.74%", 7431, 800, "102.56%", 31, "Diamond", 3, 1, 0, "2025-09-10T17:06:07"],
    ["Rigyel Singer", "76.60%", 10525, 787, "100.90%", 30, "Diamond", 0, 4, 1, "2025-10-08T13:28:29"],
    ["Susil Acharja", "73.73%", 5372, 794, "101.79%", 31, "Diamond", 4, 0, 0, "2025-06-08T19:19:10"],
    ["Tashi Tshokey Wangmo", "81.97%", 9897, 800, "102.56%", 30, "Diamond", 4, 0, 0, "2025-08-20T12:29:57"],
    ["Tashi Wangchuk", "85.39%", 5708, 472, "60.51%", 22, "Bronze", 0, 3, 0, "2025-09-08T12:30:39"],
    ["Tenzin Sonam Dolkar", "80.88%", 9247, 808, "103.59%", 31, "Diamond", 1, 2, 0, "2025-09-29T13:38:06"],
    ["Yeshey Tshoki", "85.73%", 2958, 412, "52.82%", 19, "Bronze", 1, 0, 0, "2025-08-06T14:36:48"],
    ["Yogira Kami", "80.36%", 7782, 783, "100.38%", 31, "Diamond", 2, 0, 0, "2025-10-08T13:25:35"],
]

# Create DataFrame with correct column names
df = pd.DataFrame(data_lines, columns=[
    "Name", "Accuracy", "Time (total)", "Score (780)", "Score (%)",
    "Exercises started", "Trophies", "Easy", "Moderate", "Hard", "Last submission date"
])

# Preview first 5 rows
print(df.head())
               Name Accuracy  Time (total)  Score (780) Score (%)  \
0    Abhishek Subba   69.10%          6951          748    95.90%   
1  Abishek Adhikari   63.71%          4985          785   100.64%   
2      Anjana Subba   82.00%          5311          846   108.46%   
3         Arpan Rai   82.81%          5547          790   101.28%   
4   Arpana Ghimirey   78.34%          4773          509    65.26%   

   Exercises started Trophies  Easy  Moderate  Hard Last submission date  
0                 29     Gold     4         0     0  2025-10-22T14:53:12  
1                 30  Diamond     4         0     0  2025-08-18T11:21:05  
2                 33  Diamond     2         2     0  2025-09-10T13:22:29  
3                 29  Diamond     4         0     0  2025-08-09T18:04:17  
4                 21   Bronze     1         0     0  2025-10-22T12:40:02  
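In the preview, Accuracy, Score (%) and Last submission date are still text. A minimal sketch of converting them follows, using two rows copied from the preview and trimmed to the relevant columns. Parsing the date with pd.to_datetime is an addition here; the earlier cell only strips the percent signs.

```python
import pandas as pd

# Two rows copied from the preview above, trimmed to the columns being cleaned.
df = pd.DataFrame({
    "Name": ["Abhishek Subba", "Abishek Adhikari"],
    "Accuracy": ["69.10%", "63.71%"],
    "Last submission date": ["2025-10-22T14:53:12", "2025-08-18T11:21:05"],
})

# "69.10%" -> 69.1: drop the trailing percent sign, then convert to float.
df["Accuracy"] = df["Accuracy"].str.rstrip("%").astype(float)

# ISO 8601 strings -> pandas timestamps, so dates can be sorted and compared.
df["Last submission date"] = pd.to_datetime(df["Last submission date"])

print(df.dtypes)
```

After this step the numeric columns can be used as features and the date column supports operations such as sorting by most recent submission.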