FAB Futures - Data Science

Research > Cluster Weighted Modeling

Tutorial 1

4 Basic Types of Cluster Analysis used in Data Analytics

Notes:

Centroid Clustering
- Choose the number of clusters (e.g. the number of segmentation categories)
- Place a centroid for each cluster
- Assign each data point to its nearest centroid
- Recenter each centroid on the mean of the points assigned to it, then repeat the assignment and recentering steps until the clusters stop changing and distinct clusters remain
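The centroid steps above are what the k-means (Lloyd's) algorithm does. Here is a minimal pure-Python sketch. It assumes the points are tuples of numbers. The name `kmeans` and its parameters are hypothetical, and the starting centroids are simply random data points. Libraries such as scikit-learn use smarter seeding (k-means++ by default).

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Minimal k-means sketch (hypothetical helper) on a list of numeric tuples."""
    rng = random.Random(seed)
    # Choose k and place one centroid per cluster (here: k random data points)
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(iterations):
        # Assign each data point to its nearest centroid
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Recenter each centroid on the mean of its assigned points
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # centroids stopped moving, so the clusters are stable
        centroids = new_centroids
    return centroids, labels
```

For example, `kmeans([(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)], k=2)` separates the three points near (1, 1) from the three near (8.5, 8.5).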

Density Clustering
- Group data points based on their proximity to one another, i.e. the distance from one point to another
- The denser a grouping, the more likely its points belong to the same cluster
- Density clustering can find odd-shaped clusters that the centroid clustering method would be unable to identify
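One common density-based algorithm is DBSCAN. It is named here as an example, not necessarily the method the tutorial had in mind. In this hypothetical `dbscan` helper, `eps` is the neighbourhood radius. `min_pts` is how many points must fall inside that radius for a region to count as dense. Points that are not reachable from any dense region are labelled -1 (noise). Clusters grow by chaining from one dense neighbourhood to the next, so they are not forced into round shapes around a centroid.

```python
import math

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN sketch: returns a cluster id per point, or -1 for noise."""
    NOISE, UNVISITED = -1, None
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        # Indices of all points within distance eps of point i (including i itself)
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point of a cluster
            continue
        # Point i sits in a dense region: start a new cluster and grow it
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: reachable, but not dense itself
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is also dense, so keep expanding from it
        cluster_id += 1
    return labels
```

For example, with `eps=1.5` and `min_pts=3`, a tight square of four points and a tight triangle of three points each become a cluster, and a lone far-away point is labelled -1.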

Distribution Clustering
- Looks at the probability that a data point belongs to each cluster
- Choose the number of clusters
- Determine a centroid for each cluster
- The distance from a centroid determines the probability of belonging to that cluster: the closer the point, the higher the probability
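Distribution clustering is often done with a Gaussian mixture model. This sketch shows only the "distance determines probability" step, and it makes simplifying assumptions. The centroids are fixed. Every cluster is treated as a round Gaussian with the same spread `sigma`, and each cluster gets equal prior weight. A full mixture model would also re-estimate the means, spreads and weights, typically with the EM algorithm. `membership_probabilities` is a hypothetical name.

```python
import math

def membership_probabilities(point, centroids, sigma=1.0):
    """Soft assignment: probability that `point` belongs to each cluster.

    Assumes isotropic Gaussian clusters with a shared spread `sigma`
    and equal prior weights (a simplified Gaussian mixture).
    """
    sq_dists = [math.dist(point, c) ** 2 for c in centroids]
    closest = min(sq_dists)
    # Gaussian likelihood decays with squared distance; subtracting the smallest
    # distance avoids underflow and cancels out after normalisation
    weights = [math.exp(-(d - closest) / (2 * sigma ** 2)) for d in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]
```

A point halfway between two centroids gets probability 0.5 for each. A point much closer to one centroid gets most of the probability for that cluster.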

Connectivity Clustering
- Each data point starts as its own cluster
- Determine how closely one data point is related to another, based on the data points' 'behavior', 'characteristics' or 'features', and merge the most closely related clusters
- Merging repeats until the desired number of clusters is reached, so the result ultimately depends on that number
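This bottom-up process is agglomerative hierarchical clustering. Below is a minimal single-linkage sketch. It assumes each data point's features form a numeric vector and that "related" means small Euclidean distance. `agglomerative` is a hypothetical helper. Single linkage (closest pair of members) is only one of several ways to measure how related two clusters are. Complete and average linkage are common alternatives.

```python
import math
from itertools import combinations

def agglomerative(points, n_clusters):
    """Minimal single-linkage agglomerative clustering (hypothetical helper)."""
    # Each data point starts as its own cluster, stored as a list of indices
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    # Repeatedly merge the two most closely related clusters
    while len(clusters) > n_clusters:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]))
        clusters[a].extend(clusters[b])
        del clusters[b]  # combinations gives a < b, so index a stays valid
    return clusters
```

For example, `agglomerative([(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)], 3)` returns `[[0, 1], [2, 3], [4]]`. Asking for fewer clusters keeps merging.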

Bayesian Optimization

I found this short explanation useful. This video was also super helpful.

Joint Probability = The probability of 2 events occurring together.
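When the two events depend on each other, the joint probability follows the product rule, P(A ∩ B) = P(A) · P(B | A). This is a small check using a hypothetical example: drawing two aces in a row from a standard 52-card deck without replacement. The probability is computed once by counting every ordered pair of draws and once with the product rule.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical deck: 4 aces and 48 other cards
deck = ["ace"] * 4 + ["other"] * 48

# Count every ordered pair of draws (without replacement) where both are aces
draws = list(permutations(range(len(deck)), 2))
both_aces = sum(1 for i, j in draws if deck[i] == "ace" and deck[j] == "ace")
joint_by_counting = Fraction(both_aces, len(draws))

# Product rule: P(A and B) = P(A) * P(B | A)
p_a = Fraction(4, 52)          # first card is an ace
p_b_given_a = Fraction(3, 51)  # second card is an ace, given the first one was
joint_by_rule = p_a * p_b_given_a
```

Both approaches give 1/221. If the events were independent, P(B | A) would simply equal P(B).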

Bayesian Network

Bayesian Optimization


Bayesian Hyperparameter Optimization with Python

K Means

K Means Clustering