
Graphics Cards And Machine Learning

Neural networks were once reserved for the world of supercomputers. They can now easily be handled by commodity hardware.

Efficient machine learning algorithms are highly dependent on the raw speed of mathematical operations.
For decades, we thought that neural network algorithms did not work well because of all sorts of complex problems, mainly related to an obscure mathematical issue called local optima in gradient descent.
But as Prof. Geoffrey Hinton puts it, this story was actually nonsense.
With hindsight, it turns out that what was really wrong was that we didn't have fast enough computers.

In fact, the recent revolution in machine learning is directly correlated with our ability to crunch more data, more rapidly.
The mathematical tools that we are using today, including neural networks themselves, were already here back in the 70s.
But in the meantime, computers have become much faster, and much cheaper. An iPad 2 has roughly the same processing power as the mythical Cray-2 supercomputer. For the record, the Cray-2 had a 1985 price tag of $16 million.
In essence, the development of Artificial Intelligence is simply a consequence of the phenomenal increase in available computing power.

Graphics Cards are massively parallel devices

It turns out that almost every modern device, from desktop computers to smartphones, includes a wonderful little unit called a graphics card.
A graphics card (or GPU, for Graphics Processing Unit) is responsible for rendering the images on your screen.
The traditional architecture of software is based on a separation of tasks: the CPU (the Central Processing Unit, i.e. the processor of your computer) is responsible for all the logic of the application, while the GPU is responsible for displaying the images.
If your computer needs to verify a password, or encrypt your data, the CPU will take care of it. If your computer needs to zoom in on the screen or to change the background color, the GPU will take care of it.
The architecture of a GPU is engineered with this specialization in mind:

  • GPUs almost always do the same simple tasks. Change a pixel value. Zoom in. Zoom out. Move right. Move left.
  • GPUs don't need to be extraordinarily fast. After all, your eye only catches 24 images per second. A GPU is considered high-end if it can render images 60 times per second.
  • But GPUs need to be able to work on a massive amount of data in parallel. If your screen has a 1920 x 1080 pixel resolution, this means at least 2 million tasks in each round.
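
To make the "2 million tasks in each round" point concrete, here is a rough sketch in plain Python with NumPy, used purely to show the shape of the workload (not how a GPU is actually programmed): one simple operation applied to every pixel of a full-HD frame at once.

    import numpy as np

    # A full-HD frame: 1080 rows x 1920 columns x 3 color channels.
    frame = np.zeros((1080, 1920, 3), dtype=np.uint8)

    # One simple, identical task for every pixel: brighten the image a little.
    # A GPU spreads exactly this kind of operation across thousands of parallel units.
    brighter = np.clip(frame.astype(np.uint16) + 40, 0, 255).astype(np.uint8)

    print(frame.shape[0] * frame.shape[1])  # 2,073,600 pixels touched in one round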

GPUs can do much more than graphics

It turns out that the GPU would be an excellent tool for machine learning algorithms.
Machine learning always involves the same basic mathematical operations. Add. Subtract. Multiply. And a bit more, of course... But GPUs are perfectly capable of handling this.
And machine learning requires these operations to be processed in parallel on massive amounts of data. In fact, it is much more efficient to do 2 million tasks in one shot on a slow GPU than it is to do them sequentially on an extremely fast CPU.
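
As a rough illustration of "one shot" versus "one at a time", here is a small Python sketch, with NumPy's batched execution standing in for the way a GPU digests work (the exact timings will of course depend on your machine):

    import time
    import numpy as np

    data = np.random.rand(2_000_000).astype(np.float32)

    # Sequential: 2 million little tasks done one after another.
    t0 = time.perf_counter()
    out_loop = [x * 2.0 + 1.0 for x in data]
    t1 = time.perf_counter()

    # Batched: the same 2 million multiply-adds expressed as a single operation.
    t2 = time.perf_counter()
    out_batch = data * 2.0 + 1.0
    t3 = time.perf_counter()

    print(f"sequential: {t1 - t0:.3f} s, batched: {t3 - t2:.3f} s")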

Naturally, the next question becomes: can we really use the GPU to do this?
In other words, can we use the GPU to run smart machine learning algorithms instead of displaying dumb pixels on a screen?
And as often in software development, the answer is: yes.
Of course, yes. The GPU is just a piece of silicon. It doesn't know whether we are giving it valid pixels or not; we can fool it and feed it whatever data we want.
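
This is exactly what general-purpose GPU computing libraries make possible. Here is a minimal sketch, assuming a CUDA-capable graphics card and the CuPy library (that particular choice is my assumption; any similar GPU array library would do):

    import numpy as np
    import cupy as cp  # assumes a CUDA GPU and CuPy installed

    # This is "just data", not pixels: the GPU does not care.
    x_cpu = np.random.rand(2_000_000).astype(np.float32)

    x_gpu = cp.asarray(x_cpu)   # copy the data onto the graphics card
    y_gpu = x_gpu * 2.0 + 1.0   # 2 million multiply-adds, run in parallel on the GPU
    y_cpu = cp.asnumpy(y_gpu)   # copy the result back to ordinary memory

    print(y_cpu[:3])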

Graphics Cards have impressive computational power

Once we do this, the raw processing power of GPUs that becomes available can be staggering.
By way of example, many commercial graphics cards in the $500-$1000 range advertise capabilities well above 1 TeraFLOPS, i.e. 1,000 billion operations per second.

Commercial name            TeraFLOPS   Parallel units
Nvidia Tesla K20           3.520       2496
Nvidia Tesla K40           4.290       2880
Nvidia GeForce GTX 1080    8.228       2560
AMD R9 290X                5.632       2816
AMD R9 390X                5.912       2816

Obviously, these specifications are taken from commercial brochures, and real-world performance will vary depending on many factors.
Nonetheless, the numbers are impressive.

Neural networks can naturally exploit the parallel power of GPUs

In machine learning, there is one field that can hugely benefit from GPUs: neural networks.
Let me put this in perspective with an example:

  • Let's say that you are trying to learn the features of a database of 60,000 images.
  • Let's say that each of these images has a size of 784 pixels.
  • Let's say that you want to use a neural network of 1000 neurons.

This example might look theoretical, and indeed it is; but people familiar with machine learning will recognize the parameters of a standard industry benchmark called MNIST.
I will not go into the details of the maths involved here, but here are some key numbers:

  • The database is a matrix of 60,000 x 784 elements, i.e. almost 50 million numbers.
  • The neural network is a matrix of 784 x 1,000 parameters, i.e. almost a million parameters.

The mathematical foundations of neural networks rest on linear algebra. Behind this cryptic term, the computing reality is that I need to multiply these two matrices together.
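
In code, that single step looks like this (a sketch with NumPy, using random values in place of the real MNIST pixels and learned weights):

    import numpy as np

    # 60,000 images of 784 pixels each, and a layer of 1,000 neurons.
    # Random values stand in for the real MNIST data and trained parameters.
    images = np.random.rand(60_000, 784).astype(np.float32)
    weights = np.random.rand(784, 1_000).astype(np.float32)

    activations = images @ weights   # a 60,000 x 784 by 784 x 1,000 matrix product
    print(activations.shape)         # (60000, 1000)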

Now here comes the issue.
When you multiply a 60,000 x 784 matrix by a 784 x 1,000 matrix, you need to perform 60,000 x 784 x 1,000 multiplications, and as many additions.
In total, this means about 100 billion operations for just 1 matrix multiplication.
Because the algorithm needs to do a lot of these matrix operations before it gives a result, the computational workload of a neural network is - let's put it simply - huge.
But if you now have a GPU that can handle 1,000 billion operations per second, something that was impossible a second ago becomes absolutely feasible.
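
The back-of-the-envelope arithmetic is worth spelling out (a theoretical lower bound; real throughput also depends on memory bandwidth and many other factors):

    # One multiplication and one addition per (image, pixel, neuron) triple.
    multiplications = 60_000 * 784 * 1_000   # 47.04 billion
    operations = 2 * multiplications         # ~94 billion, i.e. roughly 100 billion

    gpu_speed = 1e12                         # 1 TeraFLOPS, in operations per second
    print(operations / gpu_speed)            # ~0.094 seconds per matrix product, in theory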

The human brain itself is massively parallel

It can be tempting to compare the structure of the human brain with the parallel architecture of a GPU.
If you think about it, the human brain is not extremely fast, and not extremely good at computational tasks. In fact, it is striking to observe how painful it is for the brain to perform a simple addition or multiplication beyond 2 digits. Whereas a CPU will do it instantly and without effort.
However, the brain is hugely parallel. It has billions of synaptic connections. Far more than the biggest GPU farms. And it is equally striking to observe how easy it is for the brain to recognize a face, or understand a sentence. Whereas a CPU is unable to do this.
And after all, neural networks were designed to mimic the connective structure that exists between real biological neurons.
So maybe it is not a complete surprise after all that a slow but hugely parallel chip can help us achieve some form of deeper intelligence (let's call it that) than a very fast but non-parallel chip.

But that's about all we can say

Or rather, it is a complete surprise.
Because in all fairness, we have absolutely no idea about what happens in the brain.
Surely, our brain has billions of synaptic connections. But equally surely, our brain is not performing matrix multiplications up there.
So, the beauty of neural networks on GPUs is that they look like the brain, but they do not work like the brain.
For now, at least.