Self Organizing Maps

6/18/2018

Recently I've become really fascinated by the idea of manifold learning, the main idea behind it being that most data sets are actually too high-dimensional, and we can reduce the dimension of the data by viewing it as a manifold.

Since I need to start preparing for a senior thesis, I decided that I want to try and implement some manifold learning techniques in order to better understand how all of it currently works. The simplest (at least at first glance to me) technique was called a "Self Organizing Map".

Most C++ implementations of SOMs are restricted to two dimensions, and I wanted to understand SOMs for myself; so I decided to make my own. The current iteration of it is here.

The image to the right is an example of an output of an SOM, it is a 2 dimensional clustering of the colors based off of eight random colors. You'll notice how similar colors sorta ended up in similar places (with the exception of that one lost colony of green), and how there are usually nice gradients between everything.

But what is a SOM?

The idea with an SOM is that we start with a lattice of nodes in the dimension of our map (i.e. above we have a 2d lattice of nodes) and each node has a randomized image vector in the dimension of our data space. So in the color example above, we have our lattice of nodes (each represented by a pixel) which each has a 3 dimensional vector (represented by the color of the pixel).

From there we train our map by dragging this lattice of nodes onto our data to form a manifold. How exactly we do that is via the below steps:

Choose a random vector from our training data
Find the node who's image is the closest (usually determined via euclidean distance) to the training vector, this is our BMU
Drag our BMU's image vector closer to the training vector, and drag all of it's neighbors closer to the training vector.

What we call a neighbor, and how exactly we drag the vectors closer is sorta up to the implementation. In my library I made it so that you can give it a function pointer to a different update function if you so desire.

What does this look like though? Well, let's see the algorithm train. In the below example we are feeding the algorithm 5000 RGB values where the green component is set to 0. So we are approximating a plane with a plane. We then ran the algorithm over 3000 iterations on a lattice of 100 x 100 nodes.

We can see how the algorithm starts with a noisy random plane and converges to a smooth plane approximating the dataset that we gave it.

It's important to note that SOMs don't have to be planes. We can have a SOM line, something that approximates a path. (It is important to note though that SOMs don't preserve "orientation" of the data, the path generated by the SOM could go in any direction).

So let's see it approximate a path.

I generated a dataset of random points on a circle with a gaussian noise (mean 0.0, std. dev 0.1), and ran it through a 1d SOM to try and see how well we could approximate the circle.

If we had no idea that the data was derived from a circle, then the approximation that we get from the SOM looks incredibly reasonable. It underestimates the data a little bit, and tends towards the center of the circle, but it definitely did a good job of approximating the arc of the circle.

Notable though, we end up with a horseshoe shape, not a circle, and that's due to a limitation of the model of our map space. We are mapping on the real line, so we don't get the nice looping effect of a circle. The end of our line is completely distinct from the start, whereas in a circle the end is the same as the start.

It would be an interesting experiment to modify the SOM to instead of having our map space be an N-dimensional cube (as I currently have it implemented), to have it be an N-dimensional sphere.

But all in all, this is an SOM, and I think it's pretty nifty.

0 Comments

Self Organizing Maps

Leave a Reply.

Author

Archives

Categories