MobileNet Inputs and Outputs

The MobileNet app you built, although incredibly cool, wasn’t that good at predicting what’s inside an image. In this lecture, we’ll look at why. By the end, you’ll understand how a classification model is structured, along with its inputs and outputs.

Classes

The MobileNet model is a type of classification model: given a set of inputs (the image), it decides which class they belong to, for example, toilet tissue.

Taking a look at the MobileNet code, there is a file called imagenet_classes.ts which you can find online here: https://github.com/tensorflow/tfjs-models/blob/master/mobilenet/src/imagenet_classes.ts

Inside that file you will find an object that lists 1000 things, like so:

export const IMAGENET_CLASSES: {[classId: number]: string} = {
  0: 'tench, Tinca tinca',
  1: 'goldfish, Carassius auratus',
  2: 'great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias',
  .
  .
  .
  996: 'hen-of-the-woods, hen of the woods, Polyporus frondosus, Grifola ' +
      'frondosa',
  997: 'bolete',
  998: 'ear, spike, capitulum',
  999: 'toilet tissue, toilet paper, bathroom tissue'
};

The MobileNet model has been trained to recognize only these 1000 things. Models that can identify a lot more in an image exist; however, they are very large. MobileNet, as the name suggests, has been optimized to keep the model size small. It’s currently about 18 MB of data, which is fine for use in a mobile context or, at a stretch, in a web context.

You can also see that each textual description has a number associated with it, so 999 equals 'toilet tissue, toilet paper, bathroom tissue'.
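
To make the number-to-description mapping concrete, here is a minimal sketch of a lookup. CLASSES_EXCERPT is a hypothetical three-entry excerpt copied from the listing above, and labelFor is a made-up helper for illustration; neither is part of the MobileNet package.

```typescript
// Hypothetical excerpt of IMAGENET_CLASSES; the real object has 1000 entries.
const CLASSES_EXCERPT: { [classId: number]: string } = {
  0: 'tench, Tinca tinca',
  1: 'goldfish, Carassius auratus',
  999: 'toilet tissue, toilet paper, bathroom tissue',
};

// Hypothetical helper: returns the description for a class number,
// or a fallback string when the number is not in the excerpt.
function labelFor(classId: number): string {
  const label = CLASSES_EXCERPT[classId];
  return label !== undefined ? label : `unknown class ${classId}`;
}

console.log(labelFor(999)); // the toilet tissue entry
console.log(labelFor(5)); // not in this excerpt, so the fallback is used
```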

Inputs and Outputs

What does the model take as inputs, and what does it output?

We saw in the previous lecture that the model takes an image as its input, like so:

  const image = await camera.capture();
  const predictions = await model.classify(image);

model.classify is just a helper function. The function that actually pumps data into the model and grabs its outputs is called model.infer, and we can use it like so:

  const logits = await model.infer(image);
  logits.print();

Calling infer returns something which we call logits. You can console.log(logits), but it won’t reveal any useful data because it’s a type of object called a Tensor. To get a Tensor to print something useful to the console we need to use its print() function, which prints something like so:

Tensor
     [[0.7322746, 0.0774543, 0.609502, ..., -4.9255857, 2.2073834, 6.5471358],]

Logits are the raw numerical output of the model. You can see it prints three numbers, then "...", then three more numbers. Tensors can be huge and printing them in full is impractical, so the print function shows just the first and last three numbers.

To see the size and shape of the Tensor we can log the shape property, like so:

  const logits = await model.infer(image);
  console.log(logits.shape);
  logits.print();

This will print out [1, 1000] to the console, which means 1 row and 1000 columns: one image went in, and 1000 numbers came out for it.
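
If the shape notation is unfamiliar, it can help to picture it with plain arrays. The sketch below is only an analogy and does not use TensorFlow.js: fakeLogits and shapeOf are hypothetical names, and the values are zeros rather than real model output. A shape of [1, 1000] corresponds to an outer array holding one inner array of 1000 numbers.

```typescript
// Hypothetical stand-in for the logits: an outer array with one inner array
// of 1000 numbers (all zeros here, purely for illustration).
const fakeLogits: number[][] = [new Array<number>(1000).fill(0)];

// Hypothetical helper: reports [outer length, inner length] for a nested array.
function shapeOf(matrix: number[][]): [number, number] {
  const innerLength = matrix.length > 0 ? matrix[0].length : 0;
  return [matrix.length, innerLength];
}

const fakeShape = shapeOf(fakeLogits);
console.log(fakeShape); // first entry is 1, second entry is 1000
```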

So our logits Tensor contains 1000 numbers. Does this give a clue to what the output of our model might be?

Classification Outputs

To simplify, let’s imagine we have a model that’s trained only to detect if a black and white image is of a cat or a dog.

Perhaps we break the image down into a set of black and white pixels, with values ranging from 0 for black to 255 for white, and then pass those in as inputs to our model.
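
As a rough sketch of what "breaking the image down" could look like, the snippet below flattens a tiny 2x2 grid of pixel values into a single list of numbers. The grid, its values, and the flattenPixels helper are all made up for illustration; a real model may expect its inputs arranged or scaled differently.

```typescript
// Made-up 2x2 black and white image: 0 is black, 255 is white.
const tinyImage: number[][] = [
  [0, 255],
  [128, 64],
];

// Hypothetical helper: joins the rows into one flat list of input numbers.
function flattenPixels(image: number[][]): number[] {
  return image.reduce<number[]>((flat, row) => flat.concat(row), []);
}

const modelInputs = flattenPixels(tinyImage);
console.log(modelInputs); // the four pixel values in row order: 0, 255, 128, 64
```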

What would the output be if we passed an image of a cat to the model? At first, you might guess that the model might return the string “cat”, like so:

Figure 1. Does a model output the string “cat”?

However, that’s not how classification models work. Models are just equations: we pump some numbers in, the model does various forms of maths, and it pumps some numbers out. So the output has to be numbers.

You then might assume that the model outputs a number, perhaps 1 for Cat and 2 for Dog, like so:

Figure 2. Does a model output the number 1?

This isn’t quite how a classification model works either. Instead, it outputs one number for each class, like so:

Figure 3. The model outputs a probability distribution?

Each number is related to the probability of that class being the one that matches the inputs.
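
Here is a small sketch of how such an output could be read for our imaginary cat/dog model. The labels, the two output values, and the indexOfLargest helper are invented for illustration and don't come from a real model.

```typescript
// One label per output position of the imaginary model.
const PET_LABELS: string[] = ['cat', 'dog'];

// Made-up model output: one number per class, in the same order as PET_LABELS.
const petOutputs: number[] = [0.9, 0.1];

// Hypothetical helper: returns the position of the largest number.
function indexOfLargest(values: number[]): number {
  let best = 0;
  for (let i = 1; i < values.length; i++) {
    if (values[i] > values[best]) {
      best = i;
    }
  }
  return best;
}

const predictedPet = PET_LABELS[indexOfLargest(petOutputs)];
console.log(predictedPet); // cat
```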

Softmax

Looking back at the output of our MobileNet model, we can see that we are outputting 1000 numbers. Each of these numbers maps to one element in our IMAGENET_CLASSES object.

Tensor
     [[0.7322746, 0.0774543, 0.609502, ..., -4.9255857, 2.2073834, 6.5471358],]

But they don’t look like probabilities; some of them are even negative.

With classification models, we take the raw outputs and pass them through a softmax function. Softmax[1] is a mathematical function that turns an array of numbers into a probability distribution: every value ends up between 0 and 1, and together they add up to 1. You can find out more details about how softmax works mathematically in the article Understand the Softmax Function in Minutes[2].
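
To get a feel for what softmax does, here is a minimal sketch in plain TypeScript. It is not the TensorFlow.js implementation. The input is just the six logits visible in the printout above, not the full 1000, so the resulting probabilities will not match the real model's. Subtracting the largest logit before exponentiating is a common trick to avoid overflow and does not change the result.

```typescript
// Minimal softmax sketch: exponentiate each value, then divide by the total.
function softmaxSketch(logits: number[]): number[] {
  const maxLogit = Math.max(...logits);
  // Subtracting the max keeps Math.exp from overflowing on large inputs.
  const exps = logits.map((value) => Math.exp(value - maxLogit));
  const total = exps.reduce((sum, value) => sum + value, 0);
  return exps.map((value) => value / total);
}

// Only the six logits shown in the printout, used here as sample data.
const sampleLogits = [0.7322746, 0.0774543, 0.609502, -4.9255857, 2.2073834, 6.5471358];
const sampleProbs = softmaxSketch(sampleLogits);

// Every entry is now between 0 and 1, including the one that started negative.
console.log(sampleProbs);
// The entries add up to 1, give or take floating point rounding.
console.log(sampleProbs.reduce((sum, value) => sum + value, 0));
```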

TensorFlow.js comes with a handy softmax() function which we can use like so:

  const logits = await model.infer(image);
  logits.softmax().print();

And this prints out something like:

Tensor
     [[0.0000087, 0.0000045, 0.0000077, ..., 0, 0.0000378, 0.0029015],]

This looks far more like a probability distribution: all the numbers add up to one, and the class with the highest probability is what the model thinks the image is.
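
Picking out the most likely classes from a list of probabilities could be sketched as below. The probabilities are made up, and topClasses and the Candidate type are hypothetical names for illustration. The classId in each result is the position in the list, which is the number you would look up in IMAGENET_CLASSES.

```typescript
// A class number paired with its probability.
interface Candidate {
  classId: number;
  probability: number;
}

// Hypothetical helper: returns the k most probable classes, best first.
function topClasses(probabilities: number[], k: number): Candidate[] {
  return probabilities
    .map((probability, classId) => ({ classId, probability }))
    .sort((a, b) => b.probability - a.probability)
    .slice(0, k);
}

// Made-up probabilities for a four-class model.
const madeUpProbabilities = [0.05, 0.7, 0.05, 0.2];
const bestTwo = topClasses(madeUpProbabilities, 2);
console.log(bestTwo); // class 1 (0.7) first, then class 3 (0.2)
```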

Summary

Models take numbers as inputs and output numbers. Everything you want to solve in Machine Learning needs to be represented in the domain of numbers. This is one of the core challenges of Machine Learning: how to take a problem domain and represent it in numerical form. For a classification model, the typical solution is to output one number per class and pass those numbers through a softmax function, which gives a probability for each class the inputs might fall into.


