How to intuitively understand neural networks

A framework for thinking about neural networks and supervised machine learning.

I was recently asked the question:

How can I better understand neural networks?

There are two powerful things which enable neural networks to be useful.

  1. Universality
  2. Parameter optimization

These things may sound mystical to you now, but you already have an intuitive understanding of the concepts.

But before we get started, let's establish the basics.

A neural network is just a function.

Really, a neural network is just a function, like $f(x)$.

This function depends on a number of parameters. This is also not anything special; you’ve probably seen something like

$$f(x)=ax+b$$

Here, $a$ and $b$ are parameters that change how the function behaves. With some fancy notation, we can write $\theta = [a,b]$.

This again is not scary; it’s just a fancy way of writing both $a$ and $b$ using a single symbol. (We must save the trees, right?)

We can now write $f_\theta(x)$, which means we have a function $f$ that depends on the parameters $\theta$ and takes the independent variable $x$ as its input.

Take a moment to appreciate that writing $f_\theta(x)=\theta_1 x + \theta_2$ is still fundamentally no different from writing $f(x)=ax+b$, where $\theta_1 = a$ and $\theta_2=b$.
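In code, a parameterized function is just a function that takes the parameters alongside the input. Here is a minimal Python sketch; the name `f` and storing $\theta$ as a plain list are illustrative choices of mine. Note that Python counts from 0, so `theta[0]` plays the role of $\theta_1$:

```python
def f(theta, x):
    """Evaluate f_theta(x) = theta_1 * x + theta_2 for theta = [a, b]."""
    a, b = theta  # unpack the parameters: theta_1 = a, theta_2 = b
    return a * x + b


theta = [2.0, 1.0]  # a = 2, b = 1
print(f(theta, 3.0))  # 2 * 3 + 1 = 7.0
```

Changing `theta` changes how the same function behaves, exactly as changing $a$ and $b$ does.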

The reason people are interested in neural network functions is that they have some neat properties. One of them is that, if we find the right parameters $\theta$, a neural network can approximate any other function with arbitrary precision (some conditions apply).

This means that you can find a value for $\theta$ such that your neural network is, for all practical purposes, equal to $f(x)=ax+b$, or to $f(x) = \int_{-\infty}^\infty \hat f(\xi)\,e^{2 \pi i \xi x} \,d\xi$, or to anything else you might write.
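To make universality a little less abstract, here is a hand-built illustration (not a proof): a small network with one hidden layer of ReLU units, whose parameters are chosen directly so that the network traces a piecewise-linear approximation of $\sin x$ on $[0, \pi]$. ReLU is an assumption here, just one common choice of activation, and the helper names are hypothetical. Each hidden unit computes $\max(0, x - k)$ for some knot $k$, and its output weight is the change in slope at that knot. Adding hidden units, that is, more parameters in $\theta$, shrinks the worst-case error:

```python
import math


def relu(z):
    """ReLU activation: max(0, z)."""
    return max(0.0, z)


def build_relu_network(target, lo, hi, n_hidden):
    """Choose theta for a one-hidden-layer ReLU network that linearly
    interpolates `target` at n_hidden + 1 evenly spaced knots on [lo, hi]."""
    knots = [lo + (hi - lo) * i / n_hidden for i in range(n_hidden + 1)]
    values = [target(k) for k in knots]
    # Slope of the interpolant on each interval [knots[i], knots[i + 1]].
    slopes = [(values[i + 1] - values[i]) / (knots[i + 1] - knots[i])
              for i in range(n_hidden)]
    # Hidden unit i outputs relu(x - knots[i]); its output weight is the
    # change in slope at that knot (the first unit sets the initial slope).
    out_weights = [slopes[0]] + [slopes[i] - slopes[i - 1] for i in range(1, n_hidden)]
    hidden_biases = [-k for k in knots[:n_hidden]]
    out_bias = values[0]
    return hidden_biases, out_weights, out_bias


def network(theta, x):
    """Evaluate the network: out_bias + sum_i w_i * relu(x + b_i)."""
    hidden_biases, out_weights, out_bias = theta
    return out_bias + sum(w * relu(x + b) for w, b in zip(out_weights, hidden_biases))


def max_error(target, theta, lo, hi, samples=1000):
    """Largest |network - target| over evenly spaced sample points."""
    xs = [lo + (hi - lo) * i / samples for i in range(samples + 1)]
    return max(abs(network(theta, x) - target(x)) for x in xs)


for n in [2, 8, 32]:
    theta = build_relu_network(math.sin, 0.0, math.pi, n)
    print(n, "hidden units, max error:", max_error(math.sin, theta, 0.0, math.pi))
```

In practice nobody sets $\theta$ by hand like this; the point is only that a suitable $\theta$ exists.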

This is really, really powerful!

With this general form $f_\theta(x)$, you can do anything that you can describe as a function. And since so much can be described as a function, this includes everything from finding an optimal route, to understanding the contents of an image or a book, to knowing which movie you want to watch or which dress you want to buy.

Anything!

I can’t stress enough how powerful this universality is.

But we still have one huge problem.

How do we find $\theta$?

This is where the second powerful thing comes in.

In order to find the optimal value for $\theta$, we just need a way of expressing how bad or wrong our current estimate $f_\theta(x)$ is, and then minimize that expression.
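In supervised learning, “how wrong” is usually measured on example pairs $(x, y)$ where we already know the desired output. One common choice, assumed here and by no means the only one, is the mean squared error: the average of $(f_\theta(x) - y)^2$ over all examples. A minimal sketch with made-up data generated by $y = 2x + 1$ (the function and variable names are illustrative):

```python
def f(theta, x):
    """f_theta(x) = theta_1 * x + theta_2."""
    a, b = theta
    return a * x + b


def mse_loss(theta, xs, ys):
    """Mean squared error: average of (f_theta(x) - y)^2 over the examples."""
    return sum((f(theta, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # hypothetical example data from y = 2x + 1

print(mse_loss([2.0, 1.0], xs, ys))  # 0.0: these parameters fit exactly
print(mse_loss([1.0, 0.0], xs, ys))  # 7.5: worse parameters, larger loss
```

The loss never needs to know that the “right” answer is $a = 2$, $b = 1$; it only compares predictions with examples.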

This too is really, really powerful.

We don’t need to know the solution beforehand; all we need is a way of assessing how good our current solution is, and then the computer automagically figures out an optimal solution.
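How does the computer do the minimizing? One standard answer is gradient descent: repeatedly nudge $\theta$ a small step in the direction that decreases the loss fastest. The sketch below assumes plain gradient descent on the mean squared error, with the gradient worked out by hand for $f_\theta(x) = \theta_1 x + \theta_2$; neural-network libraries compute these gradients automatically via backpropagation. The function names, learning rate, and step count are illustrative choices. Starting from $\theta = [0, 0]$ and given only the example data, it recovers $a \approx 2$ and $b \approx 1$:

```python
def f(theta, x):
    """f_theta(x) = theta_1 * x + theta_2."""
    a, b = theta
    return a * x + b


def mse_loss(theta, xs, ys):
    """Mean squared error over the examples."""
    return sum((f(theta, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mse_gradient(theta, xs, ys):
    """Partial derivatives of the MSE loss with respect to a and b."""
    n = len(xs)
    residuals = [f(theta, x) - y for x, y in zip(xs, ys)]
    d_a = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
    d_b = 2.0 / n * sum(residuals)
    return [d_a, d_b]


def gradient_descent(xs, ys, theta, learning_rate=0.05, steps=2000):
    """Repeatedly step theta against the gradient of the loss."""
    for _ in range(steps):
        grad = mse_gradient(theta, xs, ys)
        theta = [t - learning_rate * g for t, g in zip(theta, grad)]
    return theta


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # hypothetical example data from y = 2x + 1
theta = gradient_descent(xs, ys, theta=[0.0, 0.0])
print(theta, mse_loss(theta, xs, ys))  # theta ends up very close to [2.0, 1.0]
```

Too large a learning rate can make the steps overshoot and diverge; too small a one just makes convergence slow.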

I don’t know how to describe how awesome that is. When I first read about it, my brain just went, ‘wow, this is cheating.’

If you read through all of that, you now know much more about neural networks than the people worrying that neural networks will become sentient and take over the world. You now understand why researchers are amused by comments like those.

You know it’s silly that a function which is fundamentally no different from $f(x)=ax+b$ should suddenly become sentient.

If you want a more mathematically rigorous introduction that addresses all the “some conditions apply” notes, as well as exactly how neural network functions are constructed and how we optimize $\theta$, you may want to read Introduction to Neural Networks.