How K-Means Compression Works

Understanding the algorithm behind intelligent color quantization

The Big Picture

Images typically use 24-bit color (8 bits each for Red, Green, and Blue), allowing for over 16 million unique colors. However, most images don't need all these colors to look visually appealing.

K-Means compression reduces the number of colors to a smaller palette (e.g., 16 or 32 colors), significantly reducing file size while maintaining visual quality.

Compression Example

A 128×128 image with 24-bit color uses 393,216 bits. With 16 colors, it uses only 65,920 bits — a 6x reduction!

The K-Means Algorithm

1

Initialize Centroids

Randomly select K colors from the image as initial cluster centers (centroids). We use K-Means++ for smarter initialization.

centroids = init_centroids(pixels, K)
2

Assign Pixels to Clusters

For each pixel, find the centroid with the smallest Euclidean distance in RGB color space.

c[i] = argmin(||x[i] - μ[j]||²)
3

Update Centroids

Recalculate each centroid as the mean color of all pixels assigned to it.

μ[k] = (1/|C[k]|) × Σ x[i]
4

Iterate Until Convergence

Repeat steps 2-3 until centroids stop moving significantly or max iterations reached.

while (shift > tolerance && iter < max_iters)
5

Reconstruct Image

Replace each pixel with the color of its assigned centroid to create the compressed image.

compressed[i] = centroids[assignment[i]]

The Mathematics

Distance Calculation

d(x, μ) = √[(R₁-R₂)² + (G₁-G₂)² + (B₁-B₂)²]

Euclidean distance in 3D RGB color space

Centroid Update

μₖ = (1/m) × Σ xᵢ for all xᵢ ∈ Cₖ

Mean of all points in cluster k

Objective Function

J = Σᵢ Σₖ wᵢₖ × ||xᵢ - μₖ||²

Minimize within-cluster sum of squares

K-Means++ Initialization

We use K-Means++ for smarter centroid initialization, which leads to faster convergence and better results.

1 Choose the first centroid randomly from data points
2 For each point, compute distance to nearest centroid
3 Select next centroid with probability proportional to distance²
4 Repeat until K centroids are chosen

Understanding Trade-offs

Fewer Colors (K=4-8)

  • Smaller file size
  • Faster processing
  • More visible banding
  • Loss of detail

Balanced (K=16-32)

  • Good compression ratio
  • Acceptable quality
  • Moderate processing time
  • Best for most use cases

More Colors (K=64-128)

  • Better visual quality
  • Preserves gradients
  • Larger file size
  • Slower processing

Ready to Try It?

Experience the K-Means compression algorithm in action!

Compress an Image