# Optimization


In the chapter “What is a Neural Network?” we covered the concept of training a neural network. This training process is the compute-heavy number-crunching we associate with machine learning, and it is called **optimization**.

The good news is that this is what TensorFlow.js is good at. In this lecture, we’ll cover the mechanics of **optimization** using the low-level core library.

## Code

The code for this lecture, and the next lecture on optimization, is in the `tensorflow-optimization` folder in the source code associated with this course.

That folder has three files, like so:

```
.
├── index.html   (1)
├── main.js      (2)
└── scratch.js   (3)
```

1. This `index.html` file loads TensorFlow.js and only the `scratch.js` file.
2. This file contains all the completed code for this lecture.
3. This file should be empty.

Open `index.html` as we taught in the setup-instructions lecture, then open the console in the browser; this is where the messages will go.

Add your code to scratch.js and refresh the browser to execute it. If you have problems, check main.js to see the correct completed code.

## Use Case

To demonstrate how optimization works, let’s take an embarrassingly simple use case, something so simple we can deduce the best value in our minds, and then let’s use TensorFlow.js to figure it out for us.

Imagine we have an array `[2, -5, 16, -24, 3]`. We want to multiply each value of the array by a number `x` so that after the multiplication, they all add up to 0.

If `x` was `0` then the array would end up looking like `[2 * 0, -5 * 0, 16 * 0, -24 * 0, 3 * 0]`, which results in `[0, 0, 0, 0, 0]`.

If you multiply everything by 0, you get 0. I did mention that this was an embarrassingly simple use case!

We know the optimal value for `x` is `0`. But what if `x` started life as `4.12`? How would you use TensorFlow.js to optimize, to *train*, `x` to become `0`?

## Variables

We first need to define our values, like so:

```
// tf.variable expects a Tensor, so wrap the starting value in tf.scalar
var x = tf.variable(tf.scalar(4.12));
var ys = tf.tensor([2, -5, 16, -24, 3]);
```

`x` is our variable Tensor, which we create using the special `tf.variable` function; this tells TensorFlow.js that `x` is *trainable*.

`ys` is a Tensor to hold a sample list of numbers.

#### Note

We want TensorFlow.js to train `x` to `0`, so we define it as a variable.

## Loss Function

In any optimization, there is a *loss* function, a function that returns a number which indicates how **wrong** we are. In this case, it will return how wrong our value of `x` is.

In the previous lecture I introduced you to the handy Mean Squared Error equation. We can use that here as our loss function, like so:

`ys.mul(x).square().mean().print();`

With a value of `4.12` this initially results in:

```
Tensor
2953.54541015625
```

With a value of `0` this results in:

```
Tensor
0
```

With a value of `-4.12` this results in:

```
Tensor
2953.54541015625
```

So we know that if the value of `x` is trained to below `0`, the loss will increase again. The minimum value of our loss will be 0.
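As a cross-check, the same loss can be worked out by hand in plain JavaScript, without TensorFlow.js. This is only a sketch: `meanSquaredLoss` is a made-up helper name, and because JavaScript numbers are 64-bit floats while TensorFlow.js Tensors default to 32-bit floats, the last few digits differ slightly from the printed Tensor values.

```javascript
// Plain-JavaScript cross-check of the loss; no TensorFlow.js required.
const ys = [2, -5, 16, -24, 3];

// Hypothetical helper: multiply each value by x, square it, then average.
function meanSquaredLoss(x) {
  const squared = ys.map((y) => (y * x) ** 2);
  return squared.reduce((sum, value) => sum + value, 0) / squared.length;
}

console.log(meanSquaredLoss(4.12));  // roughly 2953.5456
console.log(meanSquaredLoss(0));     // 0
console.log(meanSquaredLoss(-4.12)); // same value as for 4.12
```

Because every value is squared, the sign of `x` does not matter, which is why `4.12` and `-4.12` give the same loss.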

## Gradient Descent

Another reason to use such a simple use case with one variable is that we can visualize the process in a graph. As you add variables, the number of dimensions of the graph increases, and it becomes harder to reason about.

The graph plots the value of `x` on the horizontal axis and the loss on the vertical axis.

As `x` moves towards `0`, the mean squared error goes down, so our loss goes down. As we go past `0` into negative territory, the number starts going up again.

The lowest point of the curve is the optimal value of `x`.

We start at 4.12, and we *slide* down the gradient of the curve till we reach the lowest point.

The thing is, a computer doesn’t know the slope of the curve, so how would it figure out how to slide down it?

One solution is just to try 4.11 and 4.13. If the loss is less with 4.11, then the curve seems to be sloping down in that direction, so just follow it, and perhaps later try 4.10, then 4.09.

You keep on doing that until you reach the lowest point and the loss starts going up again; then you are reasonably sure you are at the lowest point.

That’s called *Gradient Descent*, and it’s a conventional algorithm for training Machine Learning models.
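The probing idea described above can be sketched in a few lines of plain JavaScript. This is an illustration of the intuition only: `probeDownhill` and its fixed step of `0.01` are assumptions made for the example, and real gradient descent computes the slope directly instead of testing neighbouring values.

```javascript
// Sketch of the "try a little to the left and right" idea in plain JavaScript.
const ys = [2, -5, 16, -24, 3];

// Same loss as in this lecture: the mean of (y * x) squared.
function loss(x) {
  return ys.reduce((sum, y) => sum + (y * x) ** 2, 0) / ys.length;
}

// Hypothetical helper: keep stepping towards whichever neighbour has the lower loss.
function probeDownhill(start, step, maxSteps) {
  let x = start;
  for (let i = 0; i < maxSteps; i++) {
    const here = loss(x);
    const left = loss(x - step);
    const right = loss(x + step);
    if (left < here && left <= right) {
      x -= step; // downhill is to the left
    } else if (right < here) {
      x += step; // downhill is to the right
    } else {
      break; // both neighbours are worse, so we are at the bottom
    }
  }
  return x;
}

const best = probeDownhill(4.12, 0.01, 10000);
console.log(best); // very close to 0 (within one step)
```

With a fixed step the search can only get within one step of the true minimum, which is one reason practical optimizers scale each move by the slope instead.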

Optimizing for one variable is a 2D curve; optimizing for two variables is a 3D surface, and the lowest point on that surface is the optimal value of those two variables.


## Optimizer

That’s what we are doing theoretically. How do we do it practically with TensorFlow.js? We use something called an `optimizer`.

```
var learningRate = 0.0001;
var optimizer = tf.train.sgd(learningRate);
```

We construct an optimizer by using one of the available Training Optimizers^{[1]} in TensorFlow.js. The one above is the Stochastic Gradient Descent^{[2]} (`sgd`) optimizer, a faster implementation of the Gradient Descent mechanism discussed above.

The `sgd` optimizer takes the *learning rate* as a parameter; the lower the *learning rate*, the smaller the increments by which it tunes the variables. A large *learning rate* means the training will be fast, but if it’s too large it might never converge to the actual optimum value. A small learning rate will train slower but is less likely to step over the optimum value, so it is more likely to converge.
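To see the effect of the learning rate, here is a plain-JavaScript sketch of the basic gradient descent update, `x = x - learningRate * gradient`, applied to the loss from this lecture. It assumes the gradient is worked out by hand: the loss equals `x² × mean(ys²)`, so its slope is `2 × x × mean(ys²)`. The helper names `gradient` and `train` are made up for illustration, and TensorFlow.js computes gradients automatically rather than from a hand-written formula.

```javascript
const ys = [2, -5, 16, -24, 3];
const meanOfSquares = ys.reduce((sum, y) => sum + y * y, 0) / ys.length; // 174

// Slope of the loss x^2 * mean(ys^2) with respect to x, derived by hand.
function gradient(x) {
  return 2 * x * meanOfSquares;
}

// Hypothetical helper: apply the update rule a fixed number of times from 4.12.
function train(learningRate, steps) {
  let x = 4.12;
  for (let i = 0; i < steps; i++) {
    x = x - learningRate * gradient(x);
  }
  return x;
}

console.log(train(0.0001, 50)); // small steps: still well above 0
console.log(train(0.001, 50));  // bigger steps: practically 0
console.log(train(0.01, 50));   // too big: overshoots further each step and blows up
```

For this particular loss, each step multiplies `x` by `1 - learningRate × 348`, so any learning rate that pushes that factor below `-1` makes `x` grow instead of shrink.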

#### Note

^{[3]}.

The TensorFlow.js documentation for every optimizer apart from `sgd` links to an academic paper discussing the use of that optimizer. For example, adam^{[4]} links to the paper Adam: A Method for Stochastic Optimization^{[5]}.

But a simple guide for beginners might be to use `sgd` for shallow networks without many layers and use `adam` or `rmsprop` for bigger networks with more layers.

Once you’ve created an optimizer, you call `optimizer.minimize` to perform the optimization. In our use case, we have one variable `x` which we want TensorFlow.js to try to optimize for us.

`optimizer.minimize` takes as input a loss function, a function that needs to return a *loss* as a Tensor, like so:

```
console.log(x.dataSync()); // (1)
optimizer.minimize(() => { // (2)
  return ys
    .mul(x)
    .square()
    .mean();
});
console.log(x.dataSync()); // (3)
```

1. This prints out the current value of `x`, which at the start should be `4.12`.
2. Our `minimize` function, which takes as input a loss function, a function that returns a Tensor telling the optimizer how wrong the current value of `x` is.
3. This prints out the value of `x` after optimization.

The loss function **has** to use `x` somewhere in its calculation. If `x` isn’t used then there is no point optimizing for it, and TensorFlow.js will throw an error. We used the mean squared error function we discussed above.

After the single iteration of optimization above, the value of `x` should be different. On my computer, with the learning rate of `0.0001`, the variable `x` becomes `3.98`, from a starting point of `4.12`.

How do we get it to `0`? We simply run it again and again with **the same data**. For our example, let’s run it 200 times with a simple loop like so:

```
var x = tf.variable(tf.scalar(4.12));
var ys = tf.tensor([2, -5, 16, -24, 3]);
var optimizer = tf.train.sgd(0.0001);
console.log(x.dataSync());
for (let i = 0; i < 200; i++) {
  optimizer.minimize(() => {
    return ys
      .mul(x)
      .square()
      .mean();
  });
  console.log(x.dataSync());
}
```

By the end of 200 iterations (*epochs*), I get `0.00345`, not zero but close. If I run it 1000 times, I get `1.707e-15`.
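These figures can be reproduced approximately without TensorFlow.js. The sketch below assumes plain gradient descent with a hand-derived slope of `2 × x × mean(ys²)` for this loss. `runSteps` is a hypothetical helper, and JavaScript's 64-bit floats will not match the 32-bit Tensor values digit for digit.

```javascript
const ys = [2, -5, 16, -24, 3];
const meanOfSquares = ys.reduce((sum, y) => sum + y * y, 0) / ys.length;

// Hypothetical helper: repeat x = x - learningRate * slope, starting from 4.12.
function runSteps(steps, learningRate = 0.0001) {
  let x = 4.12;
  for (let i = 0; i < steps; i++) {
    const slope = 2 * x * meanOfSquares; // hand-derived gradient of the loss
    x -= learningRate * slope;
  }
  return x;
}

console.log(runSteps(1).toFixed(2));   // "3.98"
console.log(runSteps(200).toFixed(5)); // "0.00345"
console.log(runSteps(1000));           // on the order of 1e-15
```

Each step shrinks `x` by the same small fraction, which is why the value creeps towards `0` without ever landing on it exactly.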

That’s the simplicity of supervised machine learning; you get some data, define a loss function, choose an optimizer, and run it across the data repeatedly until you get the desired outcome. That’s training, that’s Machine Learning.

## Cleaning Up

JavaScript does much of the cleaning up after you. In other languages, if you create a variable, you have to remember to tell the computer when you are finished with it, so it knows it can clean it up and let the memory be used by something else. Tensors, however, use your Graphics Card, your GPU. When using your GPU, JavaScript can’t automatically clean up after itself, so you need to clean up after yourself. If you fail to do this, then your application will have a memory leak and will eventually consume all the memory on your computer and die.

There are two methods of doing this, either using the `dispose` function or the `tf.tidy` function. Let’s first look at the `dispose` function.

```
for (let i = 0; i < 200; i++) {
  var loss = null;
  optimizer.minimize(() => {
    loss = ys
      .mul(x)
      .square()
      .mean();
    return loss;
  });
  loss.dispose(); // (1)
  console.log(x.dataSync());
}
```

1. However you do it, make sure to call `dispose` on a Tensor after you have finished with it.

It can become tedious and error-prone to remember all the Tensors that are getting created, so TensorFlow.js has a helper function called `tf.tidy` which you can use like so:

```
for (let i = 0; i < 200; i++) {
  tf.tidy(() => { // (1)
    optimizer.minimize(() => {
      return ys
        .mul(x)
        .square()
        .mean();
    });
    console.log(x.dataSync());
  }); // (1)
}
```

1. We wrap all the code in our app that creates Tensors in a `tf.tidy` function. When the inner function returns, it automatically deletes the Tensors that were created inside it, apart from any it returns.
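To build intuition for what `tf.tidy` does, here is a toy model in plain JavaScript. Everything in it (`ToyTensor`, `toyTidy`, the `liveCount` counter) is invented for illustration and is an assumption about the general idea, not how TensorFlow.js is implemented: objects created while the callback runs are remembered, and everything except the returned value is disposed when the callback finishes.

```javascript
// Toy model of scope-based clean-up. This is NOT TensorFlow.js code.
let liveCount = 0;       // how many toy tensors still hold "memory"
let currentScope = null; // tensors created inside the active toyTidy call

class ToyTensor {
  constructor(values) {
    this.values = values;
    this.disposed = false;
    liveCount++;
    if (currentScope) currentScope.push(this);
  }
  square() {
    return new ToyTensor(this.values.map((v) => v * v));
  }
  dispose() {
    if (this.disposed) return; // disposing twice is harmless in this toy
    this.disposed = true;
    liveCount--;
  }
}

// Hypothetical stand-in for tf.tidy: dispose everything created in fn except its result.
function toyTidy(fn) {
  const outerScope = currentScope;
  currentScope = [];
  try {
    const result = fn();
    for (const t of currentScope) {
      if (t !== result) t.dispose();
    }
    return result;
  } finally {
    currentScope = outerScope;
  }
}

const input = new ToyTensor([1, 2, 3]); // created outside, so never auto-disposed
const output = toyTidy(() => input.square().square()); // one intermediate, one result
console.log(liveCount); // 2: input and output survive, the intermediate is gone
output.dispose();
input.dispose();
console.log(liveCount); // 0
```

In a real app, `tf.memory().numTensors` reports how many Tensors are currently allocated, which makes it a handy way to confirm that a training loop is not leaking.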

## Summary

The process of training a Neural Network is called optimization.

A Neural Network is just a large TensorFlow.js graph of different Tensors and operations performed on those Tensors. Some of those Tensors are read-only, for instance, training data; some are `variables`, for example, weights.

An optimizer is the thing that tunes those variables, adjusts them based on the information it gets about how wrong a Neural Network is, information it gets from a loss function.

We can run the optimizer as many times as we want, each iteration we call an *epoch*.

This is how we do supervised machine learning. We run the neural network with some sample data, compare the result it gives with the known good result, calculate a loss, then let the optimizer tune the variables, and then repeat it until we think we have optimized enough.
