Humans are good at a great many tasks. For some of the tasks we learn, we have a clear procedure in mind. For example, we know how to multiply two numbers because we learned the exact steps in school as children. We can use these exact steps to instruct a computer to multiply two numbers, and with these instructions the computer can perform this particular task much faster than we can: billions of numbers can be multiplied per second on today’s computers.
On the other hand, there are tasks that we, as humans, can do easily without having a clear procedure in mind. An example of such a task is the infamous problem of distinguishing cats from dogs based on pictures. The interesting difference compared with multiplying numbers is that we have no exact procedure for it. How do you distinguish a cat from a dog? Do you look at the size of the animal? The color of the fur? Do you measure the length of the fur or tail, or look at the shape of the ears? How do you still manage to tell dogs from cats if you see only their head, or even only one of their limbs? Likely you will answer: I don’t know precisely – a cat just looks different from a dog… Throughout our lives we have seen many cats and dogs and built a mental model of what these animals are supposed to look like. The problem is that if we want a computer to do this task, we have no clear instructions for solving it.
How do you teach a computer a task when you do not know the steps to solve it yourself? We try to imitate nature. Just as we were inspired by birds to build aircraft, we would like the computer to learn the task from examples, similar to how we do it ourselves. Instead of a mental model, we specify the structure of the model as a black-box function with millions of adjustable parameters. The system then learns from examples and self-adapts these parameters to solve the task at hand better and better. This self-adaptive approach has proven successful again and again on tasks that were previously thought to be unsolvable.
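The idea of a function that self-adapts its parameters from examples can be shown in miniature. The sketch below is purely illustrative – a model with a single parameter, a made-up toy dataset, and plain gradient descent as the update rule – rather than any specific real system:

```python
# Toy illustration of "self-adapting parameters": a model with one
# adjustable parameter w learns a mapping from examples alone.
# The data and the hidden rule (y = 3 * x) are made up for this sketch.

examples = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]

w = 0.0              # the model's single adjustable parameter
learning_rate = 0.01

for _ in range(1000):              # repeatedly revisit the examples
    for x, y in examples:
        prediction = w * x
        error = prediction - y
        # Nudge w in the direction that reduces the squared error
        # (gradient descent on 0.5 * error**2 with respect to w).
        w -= learning_rate * error * x

print(round(w, 3))  # close to 3.0 – learned from examples, not rules
```

A real deep learning model does the same thing with millions of parameters and far more elaborate functions, but the principle – adjust the parameters a little at a time to reduce the error on examples – is the same.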
How do computers learn?
The simplest approach to incorporating knowledge into a computer algorithm is to let experts define explicit rules. For example, we can look at the height of the animal and, if it is below 25 cm, classify it as a cat. We can also combine multiple criteria for more accurate results. Such rule-based systems are considered Artificial Intelligence.
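Such a rule-based system amounts to a hand-written decision procedure. In the sketch below, the 25 cm height rule comes from the text; the weight rule and all thresholds beyond that are illustrative additions, not established facts about cats and dogs:

```python
# A hand-written rule-based classifier: every threshold is fixed by an
# "expert" rather than learned from data. The weight rule is invented
# purely to show how multiple criteria can be combined.

def classify(height_cm, weight_kg):
    # Rule 1 (from the text): small animals are classified as cats.
    if height_cm < 25:
        return "cat"
    # Rule 2 (illustrative): heavy animals are classified as dogs.
    if weight_kg > 8:
        return "dog"
    # No rule fired – the system cannot decide.
    return "unknown"

print(classify(20, 4))   # cat
print(classify(60, 30))  # dog
```

The brittleness is visible immediately: a large cat or a small dog falls through the rules, and improving the system means an expert manually inventing and tuning ever more thresholds.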
How do we come up with these thresholds? It would be much easier if we could learn suitable thresholds from the data. Automatically learning rules based on a set of features brings us to Machine Learning. These features can be hand-designed, e.g. the color of the fur or the length of the tail; typically, domain experts define features that make sense. However, extracting such features can be tricky depending on the domain, and often it is not even clear what suitable features are. It would be far more elegant if there were a way to learn the features themselves directly from the data.
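Learning a threshold instead of fixing it by hand can be sketched in a few lines. The example below is a minimal version of this idea (essentially a one-feature decision stump); the heights and labels are made up for illustration:

```python
# Learn a height threshold from labelled examples instead of letting an
# expert guess one: try candidate thresholds and keep the one that best
# separates the training data. All numbers here are invented toy data.

data = [(18, "cat"), (22, "cat"), (26, "cat"),
        (30, "dog"), (45, "dog"), (60, "dog")]  # (height_cm, label)

def accuracy(threshold):
    correct = 0
    for height, label in data:
        prediction = "cat" if height < threshold else "dog"
        correct += (prediction == label)
    return correct / len(data)

# Candidate thresholds: midpoints between consecutive observed heights.
heights = sorted(h for h, _ in data)
candidates = [(a + b) / 2 for a, b in zip(heights, heights[1:])]

best = max(candidates, key=accuracy)
print(best, accuracy(best))  # 28.0 1.0
```

Here the feature (height) is still hand-designed; only the rule is learned. Deep learning goes one step further and learns the features themselves.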
Today we have the power to achieve such automatic feature extraction with Deep Learning. Given a large collection of images, it allows us to learn to distinguish cats from dogs with high accuracy. Unfortunately, this approach has its drawbacks. Large amounts of data are not available in every domain. Furthermore, the outputs of these models are often not explainable. We turned to this approach precisely because we could not describe the exact steps that solve the problem, so expecting a clear explanation from the model is somewhat paradoxical. For domains where interpretability is a necessity, there are ways to trade some performance for it.
What is Machine Learning?
So what is machine learning, apart from a trendy buzzword? In a nutshell, machine learning is a set of statistical tools that enable data-driven, informed decision making. These tools allow us to transform data into knowledge.
In the past few years the field has gained a lot of interest, both in academia and in industry. There are two main reasons for its emergence:
- Data availability
- Processing power
The first reason is what was called The Age of Big Data a few years back. Many industries have moved to digital processes, sensors keep track of the world around us, and we decided to store all of it. The difference is that now we not only have the capacity to record and store massive amounts of data, but also the processing power to actually extract meaningful insights from it. With the advances in processing power of recent years, it has become possible to run these models at scale in production.
All of this has led to the revolution we see today. For many tasks, machine learning models now outperform humans – be it in the game of chess, image classification, cancer detection, inventory forecasting, or many other domains.