Philippe Libioulle - Fab Futures - Data Science


Week 3: fitting - "House Property Sales Time Series" dataset

Context

  • Source: Kaggle
  • Description: property sales data for the 2007-2019 period in one specific region. The data contains sale prices for houses and units with 1 to 5 bedrooms. These are cross-dependent variables.

Load dataset

In [1]:
import pandas as pd 
import numpy as np
import seaborn as sb
import matplotlib.pyplot as plt
import random

raw_df = pd.read_csv("datasets/House_Property_Sales_Time_Series.csv", usecols=["datesold", "price","bedrooms"],parse_dates=["datesold"])
df = raw_df[(raw_df['bedrooms'] == 4) & (raw_df['price'] > 1300000)]   # data is filtered with the intent to get a nice cloud to analyze 

# 🧾 Display dataset information
print("House sales dataset shape:", df.shape)
#df.info()
House sales dataset shape: (395, 3)

Explore content

In [2]:
df.head()
Out[2]:
       datesold    price  bedrooms
7    2007-04-30  1530000         4
26   2007-07-21  1780000         4
691  2008-12-20  1375000         4
781  2009-01-27  2100000         4
880  2009-02-25  1580000         4

Display a nice chart

In [3]:
# Let's display a basic chart
plt.rcParams["figure.figsize"] = (20,9)
plt.plot(df['datesold'], df['price'],'o')
plt.xlabel('Date sold')
plt.ylabel('Price')
plt.show()
[Figure: scatter plot of price against date sold]

Fitting using radial basis function (RBF)
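
A short statement of the model may help when reading the cell below. This is a sketch inferred from the code, and the symbols $c_i$, $b_i$, $M$ and $N$ are notation introduced here rather than taken from the notebook. The fit uses $N$ = `ncenters` cubic radial basis functions, each centered on a randomly chosen data point $c_i$:

```latex
\begin{aligned}
\hat{y}(x) &= \sum_{i=1}^{N} b_i \,\lvert x - c_i \rvert^{3}, \\
M_{ji} &= \lvert x_j - c_i \rvert^{3}, \\
b &= \arg\min_{b} \,\lVert M b - y \rVert_2^{2}.
\end{aligned}
```

Here $x_j$ are the scaled sale dates and $y_j$ the prices. The last line is the least-squares problem that `np.linalg.lstsq` solves. Note that, as written, the expansion has no constant or polynomial term, so the basis functions alone must account for the overall price level.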

In [24]:
ncenters = 15
x = df['datesold'].to_numpy(dtype='int64') / 1e13  # integer timestamps (nanoseconds here) scaled down to keep x small
#print(x)
xmin = x.min()
print("MinX=", xmin)
xmax = x.max()
print("MaxX=", xmax)
npts = len(x)  # number of data points (count_nonzero would skip any zero-valued x)
print("CountX=", npts)
y = df['price'].to_numpy(dtype='int64')
#print(y)
ymin = y.min()
print("MinY=", ymin)
ymax = y.max()
print("MaxY=", ymax)
indices = np.random.uniform(low=0,high=len(x),size=ncenters).astype(int) # choose random RBF centers from data
#print("Indices=",indices)
centers = x[indices]
print("Centers=", centers) 
M = np.abs(np.outer(x,np.ones(ncenters)) # construct matrix of basis terms
   -np.outer(np.ones(npts),centers))**3
#print("M=",M)
b,residuals,rank,values = np.linalg.lstsq(M,y,rcond=None) # do SVD-based least-squares fit
xfit = np.linspace(xmin,xmax,npts)
yfit = (np.abs(np.outer(xfit,np.ones(ncenters))-np.outer(np.ones(npts),centers))**3)@b # evaluate fit
#print("yfit=",yfit)
plt.plot(x,y,'o')
plt.plot(xfit,yfit,'g-',label='RBF fit')
plt.plot(xfit,np.abs(xfit-centers[0])**3,'b-',label='$r^3$ basis functions') # r = |x - center|, so take the absolute value
#print((xfit-centers[0])**3)
for i in range(ncenters):
    plt.plot(xfit,np.abs(xfit-centers[i])**3,color=(0.75,0.75,0.75))
plt.ylim(0,8100000)
plt.legend()
plt.show()
MinX= 117789.12
MaxX= 156098.88
CountX= 395
MinY= 1305000
MaxY= 8000000
Centers= [141557.76 126740.16 150914.88 151096.32 139285.44 145938.24 150595.2
 151182.72 144780.48 142344.   149947.2  138542.4  144037.44 155122.56
 151295.04]
[Figure: data points with the RBF fit (green) and the $r^3$ basis functions]
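
The same fit can be written more compactly with NumPy broadcasting instead of `np.outer`. The block below is a minimal sketch under stated assumptions, not the notebook's own code: the helper names `rbf_design`, `rbf_fit` and `rbf_eval` are hypothetical, and the data is a synthetic noisy sine wave standing in for the house sales, so the printed error says nothing about the real dataset.

```python
import numpy as np


def rbf_design(x, centers):
    """Matrix of cubic basis terms |x_j - c_i|**3, shape (len(x), len(centers))."""
    return np.abs(x[:, None] - centers[None, :]) ** 3


def rbf_fit(x, y, centers):
    """Least-squares coefficients b for the expansion sum_i b_i |x - c_i|**3."""
    coeffs, *_ = np.linalg.lstsq(rbf_design(x, centers), y, rcond=None)
    return coeffs


def rbf_eval(x_new, centers, coeffs):
    """Evaluate the fitted expansion at new x values."""
    return rbf_design(x_new, centers) @ coeffs


# Synthetic stand-in data (an assumption for illustration, NOT the sales data)
rng = np.random.default_rng(0)
x_demo = np.sort(rng.uniform(0.0, 1.0, 200))
y_demo = np.sin(2 * np.pi * x_demo) + 0.1 * rng.normal(size=x_demo.size)

# Pick 15 distinct data points as centers, fit, and measure the residual error
centers_demo = rng.choice(x_demo, size=15, replace=False)
coeffs_demo = rbf_fit(x_demo, y_demo, centers_demo)
y_hat = rbf_eval(x_demo, centers_demo, coeffs_demo)
rmse_demo = np.sqrt(np.mean((y_hat - y_demo) ** 2))
print("RMSE on synthetic data:", rmse_demo)
```

Fixing the seed makes the choice of centers reproducible, and sampling without replacement avoids picking the same index twice. To apply the sketch to the notebook's data, `x` and `y` from the cell above would take the place of `x_demo` and `y_demo`.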