Recently, for a class in Machine Translation, I worked on another task with a manifold-learning flavor.
The premise of the problem was this: using Google's Word2Vec, you can take a corpus of text and learn "word embeddings": N-dimensional Euclidean vectors that are supposed to encode the meaning of each word, so that similar words are embedded in similar places. What if you do this for two translations of the same text? Can you use the word embeddings in one language to predict the embeddings in the other?
The answer is a solid sorta.
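To make the setup concrete, here is a rough sketch of one standard way to attempt that prediction: train embeddings separately on each side, then fit a least-squares linear map between the two spaces (the "translation matrix" idea from Mikolov et al.). The sketch below assumes the gensim library, toy corpora, and a made-up seed dictionary purely for illustration; it is not necessarily the exact pipeline from the class project.

```python
import numpy as np
from gensim.models import Word2Vec

# Toy "parallel" corpora; stand-ins for two real translations of a text.
english = [["the", "cat", "sat", "down"], ["the", "dog", "ran", "away"]]
spanish = [["el", "gato", "se", "sento"], ["el", "perro", "se", "fue"]]

# Train embeddings separately on each language (gensim's Word2Vec here;
# the original experiment used Google's implementation).
en = Word2Vec(english, vector_size=25, min_count=1, seed=0).wv
es = Word2Vec(spanish, vector_size=25, min_count=1, seed=0).wv

# A hypothetical seed dictionary of known translation pairs.
pairs = [("the", "el"), ("cat", "gato"), ("dog", "perro")]

# Fit a linear map W minimizing ||XW - Y||^2 over the seed pairs.
X = np.array([en[e] for e, _ in pairs])
Y = np.array([es[s] for _, s in pairs])
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Predict the Spanish-space vector for an English word, then look up
# its nearest neighbors among the Spanish embeddings.
predicted = en["cat"] @ W
print(es.similar_by_vector(predicted, topn=3))
```

Note the strong assumption baked into this sketch: a single linear map says the two embedding spaces differ by little more than a rotation and a scaling, which is a lot to ask of two independently trained vector spaces.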
Recently I've become really fascinated by the idea of manifold learning. The main idea behind it is that many data sets are represented in far more dimensions than they need: if the data actually lies on (or near) a lower-dimensional manifold, we can reduce its dimension by working with that manifold instead.
Since I need to start preparing for a senior thesis, I decided to try implementing some manifold learning techniques in order to better understand how they work. The simplest technique, at least at first glance to me, is the "Self-Organizing Map" (SOM).
Most C++ implementations of SOMs are restricted to two dimensions, and I wanted to understand SOMs for myself, so I decided to write my own. The current iteration of it is here.
But what is a SOM?