I mustn’t divulge too much detail, as I’ve been told there are some as-yet-unspecified confidentiality agreements going on, but for reasons I will keep to myself, I expect to be doing a lot of work on state-of-the-art machine learning in the coming year. Let me tell you a little bit about the state of the art of machine learning, known as “Deep Learning,” and particularly a neat little system called an AutoEncoder.

So, if we deconstruct the term “AutoEncoder,” a general idea of what it is should become relatively clear right away: an AutoEncoder automatically creates an encoding. What does this mean? Why is it important? Well, to explain, let me present this image:

We humans can encode this image in language. It is a picture of a cat in a funny position with a caption. Notice that this description, while it accurately describes the picture, does not completely describe it. There are an unlimited number of ways to describe this picture, just as the description above could be applied to an unlimited number of pictures. This is common knowledge, commonly expressed in the aphorism “a picture is worth a thousand words.”

But wait, if this description loses much of the detail of the picture, how is it useful? This is the key: when we humans encode something in words, we focus on the elements that will be most meaningful to a given context. If I’m explaining this picture to someone who has never seen a LOLcat, the above description may suffice. If I want a description that will capture the humor of the picture, it will be much, much more difficult.

Now what does this have to do with what computers can do? Computers of course don’t use English to encode things; they use numbers. Instead of a sentence, an AutoEncoder’s goal would be to encode the most relevant details of this image in a “vector,” which is a fixed-length list of numbers. To accomplish this, the AutoEncoder will take many, many images and, using some clever math, convert the hundreds of thousands of underlying numbers (a very long vector) that represent the image verbatim into a more manageable list of numbers (a shorter vector, maybe 200 numbers). Then, to see if it did a good job, it tries to reconstitute the images from the numbers. With more clever math, it evaluates the reconstituted images against their originals, and then it adjusts its encoding scheme accordingly. After doing this hundreds, thousands, or millions of times, the AutoEncoder, if everything went well, has a decent way of representing an image in a smaller space.
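For the curious, here is a minimal sketch of that encode, reconstitute, compare, adjust loop. It is not the system I'll be working on: it assumes the simplest possible "clever math" (a linear encoder and decoder trained by plain gradient descent on squared error), it uses made-up toy "images" of 20 numbers squeezed into a 2-number code, and every name in it (`train_autoencoder`, `code_size`, and so on) is my own invention for illustration.

```python
import numpy as np

def train_autoencoder(images, code_size=2, steps=500, lr=0.01, seed=0):
    """Train a tiny linear autoencoder (an illustrative assumption,
    not a real deep network). Returns (encoder, decoder, history)."""
    rng = np.random.default_rng(seed)
    n_pixels = images.shape[1]
    # Start from small random encoding and decoding schemes.
    encoder = rng.normal(scale=0.1, size=(n_pixels, code_size))
    decoder = rng.normal(scale=0.1, size=(code_size, n_pixels))
    history = []
    for _ in range(steps):
        codes = images @ encoder      # long vector -> short vector
        rebuilt = codes @ decoder     # short vector -> reconstituted image
        error = rebuilt - images      # how far off was the reconstruction?
        history.append(float(np.mean(error ** 2)))
        # Gradients of the squared error, up to a constant factor
        # that is absorbed into the learning rate lr.
        grad_decoder = codes.T @ error / len(images)
        grad_encoder = images.T @ (error @ decoder.T) / len(images)
        # Adjust the encoding scheme accordingly.
        decoder -= lr * grad_decoder
        encoder -= lr * grad_encoder
    return encoder, decoder, history

# Toy data: 100 fake "images" of 20 numbers each that secretly
# depend on only 2 underlying factors, plus a little noise.
data_rng = np.random.default_rng(1)
hidden = data_rng.normal(size=(100, 2))
mixing = data_rng.normal(size=(2, 20))
images = hidden @ mixing + 0.01 * data_rng.normal(size=(100, 20))

encoder, decoder, history = train_autoencoder(images)
print("reconstruction error at start:", history[0])
print("reconstruction error at end:  ", history[-1])
```

A real AutoEncoder would stack several nonlinear layers and work on actual pixels, but the loop has the same shape: encode, rebuild, measure the error, nudge the numbers, repeat.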

Note that this is different from the kind of compression we use to store files. We would not want to use this as a compression algorithm because it’s generally extremely lossy; that is, the reconstructed image will be noticeably different from the original. This matches our experience using language to describe pictures.

So what is it good for? Well, remember when I mentioned context? Say we wanted to make a machine to automatically identify LOLcats that I would find funny. I could rate hundreds of LOLcats as funny or not funny, and provide this set of ratings to the AutoEncoder as a context. So, in addition to trying to accurately encode the image, the AutoEncoder wants to encode whether it’s a funny image or not. This context can change what the AutoEncoder focuses on in its encoding. Just like you or I wouldn’t mention the beer can in the photo there, a well-constructed AutoEncoder may be clever enough to realize that the beer can is not likely to have much of an impact on how funny I find the picture, so it can leave it out.
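One way to picture "encoding with a context" is as a single score that adds up two kinds of mistakes: how badly the image was reconstituted, and how badly the code predicted my funny/not-funny rating. The sketch below is only one assumed way of combining them (squared error plus a logistic rating penalty), and `combined_loss`, `rater`, and `weight` are hypothetical names I made up for the example.

```python
import numpy as np

def combined_loss(images, funny, encoder, decoder, rater, weight=1.0):
    """Reconstruction error plus a penalty for mispredicting ratings.

    images: (n, pixels) array; funny: (n,) array of 0.0 / 1.0 ratings.
    encoder, decoder, rater: weight arrays (hypothetical linear model).
    """
    codes = images @ encoder                  # the short encoding
    rebuilt = codes @ decoder                 # reconstituted images
    reconstruction = np.mean((rebuilt - images) ** 2)
    # Predicted probability that I find each image funny.
    predicted = 1.0 / (1.0 + np.exp(-(codes @ rater)))
    eps = 1e-9                                # avoids log(0)
    rating = -np.mean(funny * np.log(predicted + eps)
                      + (1.0 - funny) * np.log(1.0 - predicted + eps))
    # weight sets how much the ratings matter relative to
    # reproducing the picture faithfully.
    return reconstruction + weight * rating

# Example call on made-up numbers: 4 "images" of 3 pixels each.
rng = np.random.default_rng(0)
toy_images = rng.normal(size=(4, 3))
toy_ratings = np.array([1.0, 0.0, 1.0, 0.0])
loss = combined_loss(toy_images, toy_ratings,
                     rng.normal(size=(3, 2)),   # encoder
                     rng.normal(size=(2, 3)),   # decoder
                     rng.normal(size=2))        # rater
print("combined loss:", loss)
```

Turning `weight` up pushes the encoding toward whatever predicts the ratings, which is the sense in which a detail like the beer can could end up left out.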

AutoEncoders and deep learning in general represent a departure from the machine learning of previous decades in that they can use context and this encoding concept to develop their own features. Before, we humans would decide how an image should be encoded, using our own ingenuity to figure out what does and does not make Sam laugh when he looks at pictures of cats. Now the computer can do it on its own, and this is a big deal for the future of computer science. As amazing as it may seem, it is conceivable that within our lifetimes a time may come when I never have to look at a boring cat again.


3 thoughts on “AutoEncoders”

  1. This post reminds me of an old poem…

    But seriously, this sounds like exciting work. For one thing, it might be helpful for researchers.

  2. What could the computer possibly know about this picture? Isn’t it just a bunch of little colored dots from the computer’s point of view?


    1. It is, but the computer can look at those colored dots and get a very basic idea of what’s going on, like finding edges where one color changes to another color. Then we look at these basic ideas with another “layer” of artificial intelligence and build more complex notions such as nose, paws, and hands. Then a third layer may be able to recognize a cat. It’s an open question how many layers it would take to model even a single person’s visual sense of humor or if it’s even possible, but the layers as I described them are already well-documented in the image recognition literature.
