[Your-Name-Here] - Fab Futures - Data Science

Machine Learning¶

What is my data, and what did I do in the fitting assignment?¶

My data is historical Bitcoin (BTC) price data, focused on the daily closing price. The goal is to see whether I can fit a model to it that predicts future prices.¶

The Bitcoin price goes up and down every day, so at daily resolution it looks noisy and chaotic.¶

To find a pattern, we look at 5 years of BTC closing prices. Gaussian smoothing (with `sigma = 20` in the code below) removes the daily noise so that we can focus on the long-term trend.¶
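To make the smoothing step concrete, here is a minimal NumPy sketch of what a Gaussian filter does: each point is replaced by a weighted average of its neighbours, with weights that fall off like a bell curve. The helper name `gaussian_smooth` and the synthetic series are assumptions for illustration only; the notebook itself uses `scipy.ndimage.gaussian_filter1d`, and this sketch is not guaranteed to match it value for value.

```python
import numpy as np

def gaussian_smooth(values, sigma, truncate=4.0):
    """Smooth a 1-D series with a normalized Gaussian kernel (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    radius = int(truncate * sigma + 0.5)           # kernel half-width in samples
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()                         # weights sum to 1, so the overall level is kept
    padded = np.pad(values, radius, mode="symmetric")  # mirror the ends instead of padding with zeros
    return np.convolve(padded, kernel, mode="valid")

# Synthetic stand-in for the price series: a slow trend plus daily noise.
rng = np.random.default_rng(0)
trend = np.linspace(100.0, 200.0, 500)
noisy = trend + rng.normal(0.0, 10.0, size=trend.size)

smooth = gaussian_smooth(noisy, sigma=20)
print("deviation from trend before smoothing:", np.std(noisy - trend))
print("deviation from trend after smoothing: ", np.std(smooth - trend))
```

A larger `sigma` averages over more days, which removes more noise but also flattens and delays genuine turning points.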

What is the goal of the Machine Learning Model for this data?¶

The goal is to fit a machine learning model to this data so that it can act like a BTC investor's assistant.¶

So to do that:¶

* We show the ML model the smoothed line (plotted in red below)
* It learns the general up-down cycle
* Then it tries to guess where the line goes next
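One common way to turn "show the model the past, then let it guess" into a training set is a sliding window: each example is a short stretch of past values, and the target is the value that comes right after it. The sketch below uses a made-up toy series and a hypothetical helper `make_windows`; the LSTM cells later in this notebook build their inputs in the same spirit with a 60-day window.

```python
import numpy as np

def make_windows(series, window):
    """Return (X, y): each row of X holds `window` past values, y is the value that follows."""
    series = np.asarray(series, dtype=float)
    X = np.array([series[i - window:i] for i in range(window, len(series))])
    y = series[window:]
    return X, y

# Toy series (made-up numbers, not BTC prices).
toy = np.array([1.0, 2.0, 4.0, 7.0, 11.0, 16.0])
X, y = make_windows(toy, window=3)

for past, nxt in zip(X, y):
    print(past, "->", nxt)
```

With 6 values and a window of 3, this yields 3 training pairs; a longer window gives the model more context per example but fewer examples overall.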

Let us begin with a simple model and then move on to more complex ones.¶

Stage 1: Basic Linear Regression¶

In [1]:
# Load the data
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.ndimage import gaussian_filter1d
from sklearn.linear_model import LinearRegression

df = pd.read_csv("datasets/BTC_USD_full_data.csv")

df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
df['Close'] = pd.to_numeric(df['Close'], errors='coerce')
df = df.dropna(subset=['Date', 'Close'])

df['DayIndex'] = (df['Date'] - df['Date'].min()).dt.days

sigma = 20
df['Smooth'] = gaussian_filter1d(df['Close'], sigma=sigma)

X = df['DayIndex'].values.reshape(-1, 1)
y = df['Smooth'].values
In [2]:
model = LinearRegression()
model.fit(X, y)

df['LR_Past'] = model.predict(X)

plt.figure(figsize=(14,6))
plt.plot(df['Date'], df['Smooth'], label="Smoothed", color='red')
plt.plot(df['Date'], df['LR_Past'], label="LR Fit", color='green')
plt.title("Baseline Linear Regression Fit")
plt.legend()
plt.show()
[Figure: Baseline Linear Regression Fit]
In [3]:
try:
    future_days = 30
    last_day = df['DayIndex'].max()

    future_index = np.arange(last_day + 1, last_day + future_days + 1).reshape(-1,1)

    future_pred = model.predict(future_index)

    future_df = pd.DataFrame({
        "Date": pd.date_range(start=df['Date'].max(), periods=future_days+1, closed='right'),
        "LR_Forecast": future_pred
    })

    plt.figure(figsize=(14,6))
    plt.plot(df["Date"], df["Smooth"], label="Smoothed", color='red')
    plt.plot(df["Date"], df["LR_Past"], label="LR Fit", color='green')
    plt.plot(future_df["Date"], future_df["LR_Forecast"], label="LR Forecast", color='blue')
    plt.title("Linear Regression Forecast (Next 30 Days)")
    plt.legend()
    plt.show()

    print("Stage 1 Successful: Linear Regression Forecast Generated.")

except Exception as e:
    print("Stage 1 Failed, moving to Stage 2...")
    print(e)
Stage 1 Failed, moving to Stage 2...
DatetimeArray._generate_range() got an unexpected keyword argument 'closed'

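Stage 1 failed because of the date-generation call, not the model: recent pandas versions no longer accept the `closed` keyword in `pd.date_range` (it was replaced by `inclusive`). Apart from that, the fit above was only ever evaluated on the same x-values it was trained on.¶

But the goal is prediction: the model has to work on new x-values by extending the previous trend, so the next stage holds out the last 20% of the data as a test set.¶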
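The error message in the Stage 1 output points at the `closed` argument of `pd.date_range`, which newer pandas releases replaced with `inclusive` (for example `inclusive="right"`). A version-independent alternative is to start the range one day after the last date, as the LSTM forecast cell later in this notebook does. The sketch below shows that approach on a toy series; the data and variable names are made up, and `np.polyfit` stands in for the notebook's `LinearRegression`.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the notebook's DataFrame (values are made up).
dates = pd.date_range("2024-01-01", periods=10, freq="D")
day_index = np.arange(len(dates))
smooth = 100.0 + 2.0 * day_index          # a perfectly linear "trend"

# Fit a straight line (degree-1 polynomial) to the toy trend.
slope, intercept = np.polyfit(day_index, smooth, deg=1)

# Build the next 30 day indices and the matching calendar dates.
future_days = 30
future_index = np.arange(day_index[-1] + 1, day_index[-1] + 1 + future_days)
future_dates = pd.date_range(start=dates[-1] + pd.Timedelta(days=1), periods=future_days)

# A linear model can be evaluated at x-values it has never seen.
future_pred = slope * future_index + intercept
forecast = pd.DataFrame({"Date": future_dates, "LR_Forecast": future_pred})
print(forecast.head())
```

Starting at "last date + 1 day" avoids the need to exclude the left endpoint, so no `closed` or `inclusive` argument is required.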

Stage 2: Linear Regression Forecasting¶

In [10]:
from sklearn.metrics import mean_squared_error
import math

# Split train/test
split = int(len(df) * 0.8)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

model2 = LinearRegression()
model2.fit(X_train, y_train)

y_pred_test = model2.predict(X_test)

rmse = math.sqrt(mean_squared_error(y_test, y_pred_test))
print("Stage 2 RMSE:", rmse)

plt.figure(figsize=(14,6))
plt.plot(df["Date"], y, label="True Smoothed Data", color='red' )
plt.plot(df["Date"].iloc[split:], y_pred_test, label="LR Test Predictions", color='orange')
plt.title("Stage 2: Linear Regression Test Performance")
plt.legend()
plt.show()
Stage 2 RMSE: 50023.91259288457
[Figure: Stage 2: Linear Regression Test Performance]

As you can see, the model now produces predictions for the held-out test period, but they are far from the smoothed data: the RMSE is about 50,024 USD.¶
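For reference, RMSE (root mean squared error) is the square root of the average squared difference between true and predicted values, so it is expressed in the same units as the target (here USD). The small sketch below computes it by hand on made-up numbers; the helper name `rmse` is just for illustration and mirrors the `math.sqrt(mean_squared_error(...))` call used above.

```python
import math
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between two equally long sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return math.sqrt(np.mean((y_true - y_pred) ** 2))

# Made-up values: errors are 1, 0, -2 -> squared 1, 0, 4 -> mean 5/3.
y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 9.0]
print(rmse(y_true, y_pred))
```

Because the errors are squared before averaging, a few large misses dominate the score, which is one reason a straight line extrapolated across a volatile test period scores so poorly.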

In [5]:
!pip install tensorflow
!pip install --upgrade protobuf
Requirement already satisfied: tensorflow in /opt/conda/lib/python3.13/site-packages (2.20.0)
Requirement already satisfied: protobuf in /opt/conda/lib/python3.13/site-packages (6.33.1)

Stage 3: LSTM¶

In [6]:
from sklearn.preprocessing import MinMaxScaler
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

prices = df['Smooth'].values.reshape(-1,1)

scaler = MinMaxScaler()
scaled = scaler.fit_transform(prices)

# Create sequences (e.g., 60-day windows)
window = 60

X_lstm = []
y_lstm = []

for i in range(window, len(scaled)):
    X_lstm.append(scaled[i-window:i])
    y_lstm.append(scaled[i])

X_lstm = np.array(X_lstm)
y_lstm = np.array(y_lstm)
In [ ]:
model_lstm = Sequential([
    LSTM(50, return_sequences=True, input_shape=(X_lstm.shape[1], 1)),
    LSTM(50),
    Dense(1)
])

model_lstm.compile(optimizer="adam", loss="mse")

history = model_lstm.fit(X_lstm, y_lstm, epochs=20, batch_size=32, verbose=1)
Epoch 1/20
/opt/conda/lib/python3.13/site-packages/keras/src/layers/rnn/rnn.py:199: UserWarning: Do not pass an `input_shape`/`input_dim` argument to a layer. When using Sequential models, prefer using an `Input(shape)` object as the first layer in the model instead.
  super().__init__(**kwargs)
55/55 ━━━━━━━━━━━━━━━━━━━━ 5s 49ms/step - loss: 0.0153
Epoch 2/20
55/55 ━━━━━━━━━━━━━━━━━━━━ 3s 48ms/step - loss: 4.2011e-04
Epoch 3/20
55/55 ━━━━━━━━━━━━━━━━━━━━ 3s 48ms/step - loss: 3.2951e-04
Epoch 4/20
55/55 ━━━━━━━━━━━━━━━━━━━━ 3s 49ms/step - loss: 2.9605e-04
Epoch 5/20
55/55 ━━━━━━━━━━━━━━━━━━━━ 3s 49ms/step - loss: 2.5285e-04
Epoch 6/20
55/55 ━━━━━━━━━━━━━━━━━━━━ 3s 49ms/step - loss: 2.1249e-04
Epoch 7/20
55/55 ━━━━━━━━━━━━━━━━━━━━ 3s 50ms/step - loss: 1.6734e-04
Epoch 8/20
55/55 ━━━━━━━━━━━━━━━━━━━━ 3s 48ms/step - loss: 1.1370e-04
Epoch 9/20
55/55 ━━━━━━━━━━━━━━━━━━━━ 3s 49ms/step - loss: 8.6034e-05
Epoch 10/20
55/55 ━━━━━━━━━━━━━━━━━━━━ 3s 49ms/step - loss: 6.7202e-05
Epoch 11/20
55/55 ━━━━━━━━━━━━━━━━━━━━ 3s 49ms/step - loss: 5.6006e-05
Epoch 12/20
15/55 ━━━━━━━━━━━━━━━━━━━━ 1s 48ms/step - loss: 4.7401e-05
In [11]:
future_input = scaled[-window:]

future_predictions = []

for _ in range(30):
    pred = model_lstm.predict(future_input.reshape(1,window,1))[0]
    future_predictions.append(pred[0])
    future_input = np.append(future_input[1:], pred).reshape(window,1)

future_prices = scaler.inverse_transform(np.array(future_predictions).reshape(-1,1))

future_dates = pd.date_range(start=df['Date'].max() + pd.Timedelta(days=1), periods=30)

plt.figure(figsize=(14,6))
plt.plot(df["Date"], df["Smooth"], label="Smoothed Historical", color='red')
plt.plot(future_dates, future_prices, label="LSTM Forecast", color='purple')
plt.title("LSTM Forecast (Next 30 Days)")
plt.legend()
plt.show()

print("Stage 3 Completed: LSTM Forecast Generated.")
[Figure: LSTM Forecast (Next 30 Days)]
Stage 3 Completed: LSTM Forecast Generated.

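As the graph shows, this code only forecasts the 30 days after the last historical date, starting from the last 60-day window of data and feeding each prediction back in as input for the next one. The model never predicts on a test portion of the existing data, so there is nothing to compare the forecast against. Because all 30 predictions start after the last historical date, the purple line is drawn separately from the historical curve and looks disconnected from it.¶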
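The forecasting loop above is a recursive (multi-step) forecast: predict one step, append that prediction to the window, drop the oldest value, and repeat. The sketch below isolates that pattern with a deliberately trivial stand-in for the trained network. Both `recursive_forecast` and `last_step_model` are hypothetical names, and the stand-in simply repeats the most recent change, which is not what an LSTM does.

```python
import numpy as np

def recursive_forecast(predict_next, history, window, steps):
    """Roll a one-step predictor forward `steps` times, feeding predictions back in."""
    buf = list(history[-window:])          # start from the last `window` observed values
    out = []
    for _ in range(steps):
        nxt = float(predict_next(np.array(buf)))
        out.append(nxt)
        buf = buf[1:] + [nxt]              # slide the window: drop oldest, append prediction
    return np.array(out)

# Hypothetical stand-in for the trained model: continue the last observed step.
def last_step_model(win):
    return win[-1] + (win[-1] - win[-2])

history = np.array([10.0, 12.0, 14.0, 16.0])
forecast = recursive_forecast(last_step_model, history, window=3, steps=3)
print(forecast)
```

Since every later step is built on earlier predictions rather than on real data, errors can accumulate over the horizon, so a 30-step recursive forecast is best read as a rough extrapolation.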

In [12]:
from sklearn.preprocessing import MinMaxScaler
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

#Prepare data
prices = df['Smooth'].values.reshape(-1,1)

scaler = MinMaxScaler()
scaled = scaler.fit_transform(prices)

#Train/test split
split = int(len(scaled) * 0.8)
train_scaled = scaled[:split]
test_scaled = scaled[split:]

window = 60

#Create sequences for LSTM
def create_sequences(data, window):
    X = []
    y = []
    for i in range(window, len(data)):
        X.append(data[i-window:i])
        y.append(data[i])
    return np.array(X), np.array(y)

X_train, y_train = create_sequences(train_scaled, window)
X_test, y_test = create_sequences(test_scaled, window)

#Build & train LSTM
model_lstm = Sequential([
    LSTM(50, return_sequences=True, input_shape=(X_train.shape[1], 1)),
    LSTM(50),
    Dense(1)
])
model_lstm.compile(optimizer="adam", loss="mse")
history = model_lstm.fit(X_train, y_train, epochs=20, batch_size=32, verbose=1)

# Predict on test set
y_pred_scaled = model_lstm.predict(X_test, verbose=0)
y_pred = scaler.inverse_transform(y_pred_scaled)
y_test_actual = scaler.inverse_transform(y_test)

# Create dates for test set
test_dates = df['Date'].iloc[split + window:].reset_index(drop=True)

# Create table of actual vs predicted
results_df = pd.DataFrame({
    "Date": test_dates,
    "Actual": y_test_actual.flatten(),
    "Predicted": y_pred.flatten()
})

print(results_df.head())

# Plot test predictions vs historical trend
plt.figure(figsize=(14,6))
plt.plot(df['Date'], df['Smooth'], label="Historical Smoothed Data", color='red')
plt.plot(test_dates, y_pred, label="LSTM Predicted (Test)", color='purple')
plt.title("LSTM: Test Predictions vs Actual")
plt.xlabel("Date")
plt.ylabel("BTC Price (USD)")
plt.legend()
plt.show()
Epoch 1/20
/opt/conda/lib/python3.13/site-packages/keras/src/layers/rnn/rnn.py:199: UserWarning: Do not pass an `input_shape`/`input_dim` argument to a layer. When using Sequential models, prefer using an `Input(shape)` object as the first layer in the model instead.
  super().__init__(**kwargs)
44/44 ━━━━━━━━━━━━━━━━━━━━ 5s 49ms/step - loss: 0.0109
Epoch 2/20
44/44 ━━━━━━━━━━━━━━━━━━━━ 2s 48ms/step - loss: 2.4973e-04
Epoch 3/20
44/44 ━━━━━━━━━━━━━━━━━━━━ 2s 49ms/step - loss: 1.6991e-04
Epoch 4/20
44/44 ━━━━━━━━━━━━━━━━━━━━ 2s 49ms/step - loss: 1.2698e-04
Epoch 5/20
44/44 ━━━━━━━━━━━━━━━━━━━━ 2s 49ms/step - loss: 7.8550e-05
Epoch 6/20
44/44 ━━━━━━━━━━━━━━━━━━━━ 2s 49ms/step - loss: 4.3725e-05
Epoch 7/20
44/44 ━━━━━━━━━━━━━━━━━━━━ 2s 49ms/step - loss: 3.8212e-05
Epoch 8/20
44/44 ━━━━━━━━━━━━━━━━━━━━ 2s 48ms/step - loss: 4.2404e-05
Epoch 9/20
44/44 ━━━━━━━━━━━━━━━━━━━━ 2s 49ms/step - loss: 2.5922e-05
Epoch 10/20
44/44 ━━━━━━━━━━━━━━━━━━━━ 2s 49ms/step - loss: 6.9658e-05
Epoch 11/20
44/44 ━━━━━━━━━━━━━━━━━━━━ 2s 49ms/step - loss: 2.4112e-05
Epoch 12/20
44/44 ━━━━━━━━━━━━━━━━━━━━ 2s 49ms/step - loss: 1.7959e-05
Epoch 13/20
44/44 ━━━━━━━━━━━━━━━━━━━━ 2s 49ms/step - loss: 2.1017e-05
Epoch 14/20
44/44 ━━━━━━━━━━━━━━━━━━━━ 2s 48ms/step - loss: 1.8248e-05
Epoch 15/20
44/44 ━━━━━━━━━━━━━━━━━━━━ 2s 49ms/step - loss: 2.2219e-05
Epoch 16/20
44/44 ━━━━━━━━━━━━━━━━━━━━ 2s 50ms/step - loss: 1.2086e-05
Epoch 17/20
44/44 ━━━━━━━━━━━━━━━━━━━━ 2s 48ms/step - loss: 1.4268e-05
Epoch 18/20
44/44 ━━━━━━━━━━━━━━━━━━━━ 2s 49ms/step - loss: 1.0890e-05
Epoch 19/20
44/44 ━━━━━━━━━━━━━━━━━━━━ 2s 50ms/step - loss: 1.1845e-05
Epoch 20/20
44/44 ━━━━━━━━━━━━━━━━━━━━ 2s 49ms/step - loss: 1.5910e-05
        Date        Actual     Predicted
0 2025-02-05  96705.526548  98733.085938
1 2025-02-06  96507.887013  98541.242188
2 2025-02-07  96300.370815  98336.484375
3 2025-02-08  96083.286930  98119.078125
4 2025-02-09  95857.011925  97889.390625
[Figure: LSTM: Test Predictions vs Actual]
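As a quick sanity check on the table printed above, the sketch below measures the gap between the "Predicted" and "Actual" columns for the five rows shown. It uses only those five printed rows, so it is not a test-set metric, and the variable names are chosen here for illustration.

```python
import numpy as np

# The five rows printed by results_df.head() above.
actual = np.array([96705.526548, 96507.887013, 96300.370815, 96083.286930, 95857.011925])
predicted = np.array([98733.085938, 98541.242188, 98336.484375, 98119.078125, 97889.390625])

gap = predicted - actual
mean_gap = gap.mean()                      # average over-prediction in USD
relative_gap = mean_gap / actual.mean()    # same gap as a fraction of the price level

print(f"mean gap: {mean_gap:.2f} USD")
print(f"relative gap: {relative_gap:.2%}")
```

On these rows the prediction sits roughly 2,000 USD (about 2%) above the smoothed value. Computing the RMSE over the whole `results_df` would give a figure that can be compared directly with the Stage 2 linear regression.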