
Image Classification


The task of trying to figure out what is in an image is called image classification: we are trying to determine to which class an image belongs.

There are several image classification models out there; one of the most popular is MobileNet, and the clue is in the name. Models can become quite large, but MobileNet is optimized to be small: it's about 20 MB in total, which in the land of image classification models is pretty tiny.

In this example, we'll build a small app that uses MobileNet to figure out what your webcam is pointing at. It will look something like so:

Figure 1. Our application recognizes a coffee mug

In this lecture, I’ll take you through building an app like that step by step.


We will be using a version of MobileNet that has already been optimized and converted for use in TensorFlow.js.

You can use this model in both of the ways I have shown in previous lectures: either loading it via script tags or installing it via npm and using a bundler. We'll be loading it via script tags for simplicity.
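For reference, if you prefer the npm route, the wiring looks something like the sketch below. The package names are the real ones used by this project; the rest of the file layout is up to you and your bundler.

```javascript
// Installed via:
//   npm install @tensorflow/tfjs @tensorflow-models/mobilenet
// The script tags can then be replaced with module imports:
import * as tf from "@tensorflow/tfjs";
import * as mobilenet from "@tensorflow-models/mobilenet";
```

With a bundler, the load order concern discussed below goes away, since the imports declare the dependency explicitly.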


The code for this project is in the image-classification folder in the source code repository for this book.

Inside that folder, you will find five files like so:

├── index.html   (1)
├── utils.js     (2)
├── style.css    (3)
├── start.js     (4)
└── completed.js (5)
1 index.html contains the HTML for our project and loads the other JavaScript and CSS files.
2 utils.js contains some utility code for rendering the results to the screen; feel free to browse it, but we won’t be covering its contents in this book.
3 style.css contains some styles for the application; we won’t be going into the details of styling in this course.
4 start.js is the JavaScript file we are going to start with. There is some boilerplate code there to get you started, but none of the TensorFlow.js code is present; we’ll be adding that in as we go along.
5 completed.js is the completed JavaScript code. If you get stuck, I recommend checking out this file to see where you might have gone wrong.


The start.js file has several TODO comment blocks; as we continue through this lecture, I will leave it to you to fill out those TODO blocks and build out the application. The amount of code you will have to write is trivial.

Loading the JavaScript

In our example we are loading TensorFlow.js and MobileNet using <script> tags in the <head> of index.html, like so:

<!-- Load TensorFlow.js. This is required to use MobileNet. -->
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs"></script>

<!-- Load the MobileNet model. -->
<script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/mobilenet"></script>

<script src="utils.js"></script>
<script src="start.js"></script>


=== If you want to use a bundler, feel free to do so. Refer to Using a bundler in the Pre-Trained Models lecture for more information. ===

We need to load TensorFlow.js before we load the MobileNet files. This is one of the disadvantages of using script tags: you need to know the order in which JavaScript files must be included; a bundler handles this for you.

Understanding the HTML

The body of our HTML looks like so:

<div class="container">
  <video id="video"> (1)
    Video stream not available.
  </video>
  <div class="overlay"> (2)
    <pre id="predictions"></pre>
  </div>
</div>
1 <video> is the HTML element that will load and display the webcam’s contents.
2 This div contains the predictions of what the webcam is seeing.

Loading MobileNet

In our sample code, we initialize the load of MobileNet in our main function. The MobileNet JavaScript file we loaded via our <script> tags is not the whole of MobileNet; it’s just the controlling code. MobileNet itself, the actual trained model, is a set of data files many megabytes in size, which need to be loaded over the internet before you can use MobileNet. We initialize this loading like so:

let model = null; (1)

async function main() {
    // Initialize MobileNet and wait for it to load all its required data files over the internet
    model = await mobilenet.load(); (2)
    await startCamera(); (3)
}
1 We keep a reference to our model so we can use it in other functions.
2 This line starts loading MobileNet’s data files over the network and waits for them to be loaded before moving on to the next line.
3 This starts the camera and runs the rest of our code.
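Keeping the model in a module-level variable, as in callout (1), means we load it once and reuse it everywhere. The sketch below illustrates that load-once pattern in isolation; fakeLoad is a stand-in for mobilenet.load() so the pattern can run anywhere, and is not part of the MobileNet API.

```javascript
let model = null;

// Stand-in for mobilenet.load(): resolves with a "model" object.
// In the real app this is a multi-megabyte network download.
async function fakeLoad() {
  return { name: "mobilenet" };
}

// Loads the model only on the first call; later calls reuse
// the cached reference instead of downloading again.
async function getModel() {
  if (model === null) {
    model = await fakeLoad();
  }
  return model;
}
```

Because the download is expensive, every other function in the app (like startCamera below) should reuse this single reference rather than trigger a fresh load.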

In total, MobileNet is 18 MB of data, which might seem like a lot, but for a neural network that can classify what’s in an image, it’s pretty small. If we were to look in the browser’s network panel right now, we would see several shard files being loaded. This is the MobileNet data, chunked into 4 MB files, loading over the network, like so:

Figure 2. MobileNet’s sharded data loading over the network

Start Camera

The startCamera() function does the bulk of the work. Let’s break down the code line by line:

async function startCamera() {
    let videoElement = document.getElementById("video"); (1)
    const camera = await tf.data.webcam(videoElement); (2)

    setInterval(async () => { (3)
        const image = await camera.capture(); (4)
        let predictions = await model.classify(image); (5)

        renderPredictions(predictions); (6)

        const logits = await model.infer(image); (7)
    }, 1000);
}
1 Gets the <video> HTML element from the document.
2 This helper function from the TensorFlow.js library initializes the camera and starts it running i.e., this line makes the video element start showing the contents of your webcam.
3 Every second we want to grab a still from the video and use MobileNet to figure out what’s inside it, so we use the setInterval(…​, 1000) function with a 1000 ms timer.
4 We capture a still image from the video stream.
5 MobileNet comes with a helper classify function which, if passed an image, will return information about what it thinks is in the image.
6 renderPredictions is a helper function in utils.js which pretty-prints the output from MobileNet on screen.
7 The infer function returns the raw output from the MobileNet model.
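To make callouts (5) and (6) concrete: classify resolves to an array of objects, each with a className and a probability. The formatter below shows one way such predictions could be turned into display text; it is illustrative only, not the actual renderPredictions implementation from utils.js, and the sample data is made up.

```javascript
// classify() resolves to an array shaped like:
//   [{ className: "coffee mug", probability: 0.87 }, ...]
// A minimal formatter in the spirit of renderPredictions:
function formatPredictions(predictions) {
  return predictions
    .map(p => `${p.className}: ${(p.probability * 100).toFixed(1)}%`)
    .join("\n");
}

const sample = [
  { className: "coffee mug", probability: 0.87 },
  { className: "cup", probability: 0.09 },
];

console.log(formatPredictions(sample));
// coffee mug: 87.0%
// cup: 9.0%
```

The probabilities across the returned classes won’t necessarily be shown for every class MobileNet knows about; classify returns only the top few guesses.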

The above code is all you need to create an app that can detect what’s in an image, like so:

Figure 3. Our demo app guessing wrong

However, after using the app for a while, you might notice that the guesses are quite often wrong. In the next lecture, we’ll look into why that might be the case, examine the raw output of a model, and see how to build a neural network model that classifies things.
