Clustering

Statistical Clustering.k-Means .*;

k-Means: Pattern Recognition

k-Means is mildly interesting by itself, but what else can it do? How about pattern recognition?

It's the centroids that make k-Means interesting. In the last two examples, the centroids were continually adjusted until an equilibrium was found. At the point of equilibrium, the centroids became a unique signature representing the data points in each cluster.

Suppose we use the same character samples from previous illustrations:

Figure 1. The letter "A" formed in a grid with 1's

Then we interpret the 1's as grid coordinates:

From this sample, we'd have a set with these coordinate pairs: [(2,0), (3,0), (3,1), (3,2), (2,3), (4,3), (2,4), (4,4), (1,5), (2,5), (3,5), (4,5), (5,5), (1,6), (5,6), (1,7), (5,7), (0,8), (1,8), (2,8), (4,8), (5,8), (6,8)].

When we run k-means on this set, the resulting centroids will represent this letter "A". We can even present A's in slightly different fonts and the centroids will come out about the same. Samples with the letter "B" will have centroids that distinctly tune to "B". And so on.

Unlike Example 2, we can't initialize the centroids at random values. They'll have to start from the same places each time. In this example, four centroids are initialized, one to each corner of the 7x9 grid.

There is some supervised training that needs to happen. The algorithm is made to collect centroid coordinates from specific letters, cluster them and average them accordingly. Future samples are then compared to these averages to see which cluster (i.e.: letter) the sample is closest to.

 

Example Results

Not perfect, but pretty good.

Expected: A, Actual: A
Expected: B, Actual: B
Expected: C, Actual: C
Expected: D, Actual: D
Expected: E, Actual: K
Expected: J, Actual: J
Expected: K, Actual: K
Expected: A, Actual: A
Expected: B, Actual: B
Expected: C, Actual: C
Expected: D, Actual: D
Expected: E, Actual: C
Expected: J, Actual: J
Expected: K, Actual: C
Expected: A, Actual: A
Expected: B, Actual: B
Expected: C, Actual: C
Expected: D, Actual: D
Expected: E, Actual: E
Expected: J, Actual: J
Expected: K, Actual: K

 

public void footer() {
About | Contact | Privacy Policy | Terms of Service | Site Map
Copyright© 2009-2012 John McCullock. All Rights Reserved.
}