The Intuition Behind Feature Representations using Sculptures
You may have heard terms such as features, bottlenecks, embeddings, feature vectors, feature representations, feature spaces, latent vectors and latent spaces. I'm sure you are also well aware of the incredible amount of jargon dominating the field of machine learning. The aim of this post is to demystify this jargon and understand what exactly a feature is and why feature representations are required when dealing with high-dimensional data.
Let's start with a simple example where we would not need feature extraction, because the dimensionality and the scope of the problem are reasonable. Consider the example below:
In this example we are tasked with developing a set of rules to discriminate between red and blue points. These points are 2-dimensional (x and y coordinates). This set of rules becomes a decision boundary, which is to say a surface in the vector space that we can use to decide which class a point belongs to. In this particular example a few things can be observed. The decision boundary shown is non-linear and would require many parameters, but it has perfect classification accuracy. A simple linear decision boundary would do quite well with far fewer parameters, but it clearly would not have perfect classification accuracy. Aside: The advantages of a "rougher" decision boundary are explained in my Fish Species Classification post. In that post I also discuss the concept of over-fitting models, which I will not discuss here for the sake of brevity.
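To make the trade-off concrete, here is a minimal sketch using NumPy. The data is synthetic (two overlapping Gaussian blobs standing in for the red and blue points), and the two models are my own choices for illustration, not the ones behind the figure: a least-squares line with 3 parameters, and a 1-nearest-neighbour rule that effectively memorises every training point.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily

# Synthetic stand-ins for the red and blue points (an assumption, not the post's data).
n = 200
red = rng.normal(loc=[-1.0, -1.0], scale=1.0, size=(n, 2))
blue = rng.normal(loc=[1.0, 1.0], scale=1.0, size=(n, 2))
X = np.vstack([red, blue])
y = np.concatenate([-np.ones(n), np.ones(n)])  # -1 = red, +1 = blue

# Linear boundary w0 + w1*x + w2*y = 0, fitted by least squares: 3 parameters.
A = np.hstack([np.ones((2 * n, 1)), X])
w, *_ = np.linalg.lstsq(A, y, rcond=None)
linear_acc = np.mean(np.sign(A @ w) == y)

# Non-linear boundary: 1-nearest-neighbour, which stores all 400 points.
# Each training point is its own nearest neighbour, so it scores perfectly
# on the data it was built from.
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
nn_acc = np.mean(y[np.argmin(d2, axis=1)] == y)

print(f"linear (3 parameters):     {linear_acc:.3f}")
print(f"1-NN (stores {X.size} numbers): {nn_acc:.3f}")
```

The nearest-neighbour rule is perfect on the points it has seen, at the cost of keeping all of them around, while the line gets most points right with only three numbers.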
With this trivial example in mind of drawing a two-class decision boundary on 2-dimensional space, let us try a harder example.
The sculpture below is made of coloured plexiglass suspended on strings, and the position of each piece can be described using 3 spatial dimensions. If I tasked you to draw a decision boundary to separate the colours in three-dimensional space, could you do it? Probably. Maybe it would require non-linear functions and many parameters, but it is totally possible.
It would be a little bit challenging though, right? Certainly harder than the toy 2D example from before. What used to be a decision line is now a decision surface, and it would be quite difficult to separate the colours (i.e. classes). It can be observed that as the dimensionality goes up, creating classification rules becomes more difficult.
Now what if you were to shine light through it in just the right way?
Shining light through the sculpture is akin to finding a literal projection of the original object where the information we care about is more discernible.
If I were to task you with drawing a decision boundary on this lower-dimensional projection, would you have an easier time? That is to say, would that boundary require fewer parameters? In essence, what we have done is find a lower-dimensional version (2D) of our original data (3D) and choose to draw our boundaries in that vector space as opposed to the original one. This is what feature extraction and dimensionality reduction are.
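The light-and-wall picture can be sketched in a few lines of NumPy. Everything here is an assumption made for illustration: the 3D points are synthetic, the helper names `project` and `centroid_accuracy` are hypothetical, and a nearest-centroid rule is used only as a rough measure of how separable the classes look on each "wall".

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
n = 300

# Synthetic 3D data: the classes differ only along x; y and z are broad shared noise.
spread = [0.3, 2.0, 2.0]
a = rng.normal(size=(n, 3)) * spread + [-1.0, 0.0, 0.0]
b = rng.normal(size=(n, 3)) * spread + [1.0, 0.0, 0.0]
X = np.vstack([a, b])
y = np.repeat([0, 1], n)

def project(points, light_dir):
    """Orthogonally project 3D points onto the wall perpendicular to light_dir."""
    d = light_dir / np.linalg.norm(light_dir)
    # Pick any vector not parallel to d, then build an orthonormal basis (u, v) of the wall.
    helper = np.array([1.0, 0.0, 0.0]) if abs(d[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    u = np.cross(d, helper)
    u /= np.linalg.norm(u)
    v = np.cross(d, u)
    return points @ np.stack([u, v], axis=1)  # shape (N, 2)

def centroid_accuracy(P, labels):
    """Accuracy of assigning each point to the nearer class centroid."""
    c0, c1 = P[labels == 0].mean(axis=0), P[labels == 1].mean(axis=0)
    pred = np.linalg.norm(P - c1, axis=1) < np.linalg.norm(P - c0, axis=1)
    return np.mean(pred.astype(int) == labels)

good = centroid_accuracy(project(X, np.array([0.0, 0.0, 1.0])), y)  # light along z keeps x
bad = centroid_accuracy(project(X, np.array([1.0, 0.0, 0.0])), y)   # light along x discards x

print(f"light along z: {good:.3f}")
print(f"light along x: {bad:.3f}")
```

Both projections are 2D, but only one of them keeps the direction that distinguishes the classes, which is why the angle of the light matters as much as the reduction in dimensions.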
In the field of computer vision it is common to use pre-trained convolutional neural network feature extractors to convert the original high-dimensional data (images) into a set of lower-dimensional features. These are large neural networks (see Inception, ResNet), which are just a collection of linear and non-linear transformations. The parameters that create the decision boundaries in these networks were fitted on large datasets (see ImageNet). These neural networks shine light through the data in just the right way, such that the lower-dimensional projection on the wall has more discernible features than the original data. They take a 256 by 256 RGB image of a cat and convert it to a lower-dimensional vector that is more refined and descriptive.
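As a rough illustration of the sizes involved, the sketch below pushes a 256 by 256 RGB array through a tiny stack of linear and non-linear transformations. This is a toy stand-in and not Inception or ResNet: the function `toy_extractor`, the layer sizes, and the 64-value output are all assumptions, and the weights are random rather than fitted on ImageNet, so the resulting vector only shows the shape of the idea, not useful features.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily

# Stand-in for a 256 x 256 RGB image: 256 * 256 * 3 = 196,608 raw values.
image = rng.random((256, 256, 3), dtype=np.float32)

def relu(x):
    return np.maximum(x, 0.0)

def toy_extractor(img, w1, w2):
    """Hypothetical extractor: pooling, then two linear maps each followed by a ReLU."""
    # Average-pool 8 x 8 blocks, shrinking the image to 32 x 32 x 3.
    pooled = img.reshape(32, 8, 32, 8, 3).mean(axis=(1, 3))
    hidden = relu(pooled.reshape(-1) @ w1)  # linear transformation + non-linearity
    return relu(hidden @ w2)                # second linear transformation + non-linearity

# Random weights; in a real pre-trained network these would have been learned.
w1 = rng.normal(scale=0.02, size=(32 * 32 * 3, 256)).astype(np.float32)
w2 = rng.normal(scale=0.02, size=(256, 64)).astype(np.float32)

features = toy_extractor(image, w1, w2)
print(image.size, "->", features.size)  # prints: 196608 -> 64
```

A downstream classifier would then draw its decision boundary in the 64-dimensional feature space rather than over the 196,608 raw pixel values.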