Clustering Steps
- Initialization: Place k centroids. This demo samples the initial centroids from the data points.
- Assignment: Assign each point to the nearest centroid (Euclidean distance).
\( c(i) = \arg\min_j \|x_i - \mu_j\|^2 \)
- Update: Recompute each centroid as the mean of its assigned points; empty clusters are re-seeded randomly.
- Convergence: Stop when centroids no longer move or assignments stop changing.
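The steps above can be sketched in plain Python as Lloyd's algorithm. This is a minimal sketch, not the demo's own code. The names `kmeans`, `nearest`, `sq_dist` and `mean`, the `max_iter` cap, and the choice to re-seed an empty cluster at a randomly chosen data point are all assumptions. The sketch stops as soon as an assignment pass changes no labels.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Assignment rule: index j minimizing ||x - mu_j||^2 (ties go to the lowest j)."""
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, max_iter=100, seed=None):
    """Hypothetical Lloyd's k-means on a list of tuples; labels are in 0..k-1."""
    rng = random.Random(seed)
    # Initialization: sample k data points (without replacement) as starting centroids.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment: give every point the index of its nearest centroid.
        new_labels = [nearest(p, centroids) for p in points]
        # Convergence: stop when no assignment changed since the previous pass.
        if new_labels == labels:
            break
        labels = new_labels
        # Update: move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, c in zip(points, labels) if c == j]
            if members:
                centroids[j] = mean(members)
            else:
                # Empty cluster: re-seed at a randomly chosen data point (assumed strategy).
                centroids[j] = tuple(rng.choice(points))
    return labels, centroids
```

For example, with the points `(0, 0)`, `(0, 1)`, `(1, 0)`, `(10, 10)`, `(10, 11)` and `(11, 10)` and `k=2`, the three points near the origin should end up sharing one label and the other three the other label, whichever seed is used; only the label numbering depends on the seed.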
Notation: \(c(i)\)
\(c(i)\) is the index (label) of the cluster for point \(x_i\). Math often uses \(1..k\); this demo uses \(0..k-1\). \(\mu_{c(i)}\) is the assigned centroid.
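A small illustration of the notation, using hypothetical toy values: `labels[i]` plays the role of \(c(i)\) in the demo's 0-based convention, and indexing the centroid list with it gives \(\mu_{c(i)}\). The variable names are illustrative only.

```python
# Hypothetical toy values: three 2-D centroids (k = 3) and labels for four points.
centroids = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
labels = [0, 2, 2, 1]  # labels[i] is c(i), using the 0..k-1 convention

i = 1
mu_ci = centroids[labels[i]]  # mu_{c(i)}: the centroid assigned to point x_1
one_based = [c + 1 for c in labels]  # the same labels in the 1..k convention
print(mu_ci, one_based)  # prints: (10.0, 0.0) [1, 3, 3, 2]
```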
What is SSE?
Measures within-cluster variation: how far points are from their assigned centroids.
Formula: \( \mathrm{SSE} = \sum_i \|x_i - \mu_{c(i)}\|^2 \), the sum of squared distances from each point to its assigned centroid.
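Properties: non-increasing over iterations. The best achievable SSE never increases as k grows, although a single run with a larger k can end higher if it lands in a worse local optimum.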
Purpose: quantify how compact the clusters are. SSE is also known as inertia.
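The SSE formula translates directly into code. This is a minimal sketch with a hypothetical `sse` helper, assuming points and centroids are tuples and labels are 0-based as in this demo; the worked example uses made-up values.

```python
def sse(points, labels, centroids):
    """Sum over i of ||x_i - mu_{c(i)}||^2 for 0-based labels."""
    total = 0.0
    for x, c in zip(points, labels):
        mu = centroids[c]  # assigned centroid mu_{c(i)}
        total += sum((a - b) ** 2 for a, b in zip(x, mu))
    return total


# Worked example: (0,0) and (2,0) share centroid (1,0); (10,0) sits on its own centroid.
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
labels = [0, 0, 1]
centroids = [(1.0, 0.0), (10.0, 0.0)]
print(sse(points, labels, centroids))  # 1 + 1 + 0 = 2.0
```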
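Clusters not separating well? Re-initialize the centroids or adjust k. k-means converges to a local optimum, which may not be the best clustering, so the result depends on the initial centroids.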
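To see why initialization matters, the sketch below runs the same hypothetical `lloyd` routine from two hand-picked initializations on a made-up 1-D dataset and keeps the run with the lowest SSE; taking the best of several random restarts works the same way. The names `lloyd` and `sq_dist` are illustrative, and keeping an empty cluster's centroid in place is a simplification, not the demo's re-seeding.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd(points, centroids, max_iter=100):
    """Hypothetical k-means from given starting centroids; returns (labels, centroids, sse)."""
    centroids = [tuple(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # converged: no assignment changed
            break
        labels = new_labels
        # Update: mean of assigned points; an empty cluster keeps its old centroid.
        for j in range(len(centroids)):
            members = [p for p, c in zip(points, labels) if c == j]
            if members:
                centroids[j] = tuple(sum(v) / len(members) for v in zip(*members))
    sse = sum(sq_dist(p, centroids[c]) for p, c in zip(points, labels))
    return labels, centroids, sse


points = [(0.0,), (1.0,), (10.0,), (11.0,), (20.0,), (21.0,)]
inits = [
    [(0.0,), (1.0,), (15.0,)],   # poor start: two centroids crowd the left pair
    [(0.0,), (10.0,), (20.0,)],  # one centroid near each pair
]
runs = [lloyd(points, init) for init in inits]
print([r[2] for r in runs])  # prints: [101.0, 1.5]
best_labels, best_centroids, best_sse = min(runs, key=lambda r: r[2])
print(best_sse, best_labels)  # prints: 1.5 [0, 0, 1, 1, 2, 2]
```

The first run converges immediately to a local optimum in which one centroid covers four points, so its SSE stays far above the second run's.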