add config option for launcher #1349

Closed · wants to merge 4 commits
126 changes: 126 additions & 0 deletions examples/intel_extension_for_pytorch/README.md
@@ -0,0 +1,126 @@
# TorchServe with Intel® Extension for PyTorch*

TorchServe can be used with Intel® Extension for PyTorch* (IPEX) to give a performance boost on Intel hardware.
Here we show how to use TorchServe with IPEX.

## Contents of this Document
* [Install Intel Extension for PyTorch](#install-intel-extension-for-pytorch)
* [Serving model with Intel Extension for PyTorch](#serving-model-with-intel-extension-for-pytorch)
* [Creating and Exporting INT8 model for IPEX](#creating-and-exporting-int8-model-for-ipex)
* [Benchmarking with Launcher](#benchmarking-with-launcher)


## Install Intel Extension for PyTorch
Refer to the documentation [here](https://github.com/intel/intel-extension-for-pytorch#installation).
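
As a quick reference, IPEX is published on PyPI, so a typical CPU-only install looks like the command below. The exact command depends on your PyTorch version, so treat this as an illustration and follow the linked installation instructions.
```
pip install intel_extension_for_pytorch
```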

## Serving model with Intel Extension for PyTorch
After installation, all that needs to be done to use TorchServe with IPEX is to enable it in `config.properties`.
```
ipex_enable=true
```
Once IPEX is enabled, deploying an IPEX-exported model follows the same procedure shown [here](https://pytorch.org/serve/use_cases.html). TorchServe with IPEX can deploy any model and run inference.

## Creating and Exporting INT8 model for IPEX
Intel Extension for PyTorch supports both eager mode and TorchScript mode. In this section, we show how to create and deploy an INT8 model for IPEX.

### 1. Creating a serialized file
First, create a `.pt` serialized file using IPEX INT8 inference. Here we show two examples, BERT and ResNet50.

#### BERT

```
import intel_extension_for_pytorch as ipex
from transformers import AutoModelForSequenceClassification, AutoConfig
import transformers
from datasets import load_dataset
import torch

# load the model
config = AutoConfig.from_pretrained(
    "bert-base-uncased", return_dict=False, torchscript=True, num_labels=2)
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", config=config)
model = model.eval()

max_length = 384
dummy_tensor = torch.ones((1, max_length), dtype=torch.long)
jit_inputs = (dummy_tensor, dummy_tensor, dummy_tensor)
conf = ipex.quantization.QuantConf(qscheme=torch.per_tensor_affine)


# calibration
with torch.no_grad():
    for i in range(100):
        with ipex.quantization.calibrate(conf):
            model(dummy_tensor, dummy_tensor, dummy_tensor)

# optionally save the configuration for later use
conf.save('model_conf.json', default_recipe=True)

# conversion
model = ipex.quantization.convert(model, conf, jit_inputs)

# save to .pt
torch.jit.save(model, 'bert_int8_jit.pt')
```
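
As a quick sanity check (not part of the original steps), the saved TorchScript module can be loaded back and run with the same dummy inputs before packaging it:
```
import torch
import intel_extension_for_pytorch as ipex  # registers the IPEX ops used by the saved graph

loaded = torch.jit.load('bert_int8_jit.pt')
loaded.eval()

max_length = 384
dummy_tensor = torch.ones((1, max_length), dtype=torch.long)

with torch.no_grad():
    output = loaded(dummy_tensor, dummy_tensor, dummy_tensor)
print(output)
```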

#### ResNet50

```
import intel_extension_for_pytorch as ipex
import torchvision.models as models
import torch
import torch.fx.experimental.optimization as optimization
from copy import deepcopy


model = models.resnet50(pretrained=True)
model = model.eval()

dummy_tensor = torch.randn(1, 3, 224, 224).contiguous(memory_format=torch.channels_last)
jit_inputs = (dummy_tensor,)  # one-element tuple of example inputs
conf = ipex.quantization.QuantConf(qscheme=torch.per_tensor_symmetric)

with torch.no_grad():
    for i in range(100):
        with ipex.quantization.calibrate(conf):
            model(dummy_tensor)

model = ipex.quantization.convert(model, conf, jit_inputs)
torch.jit.save(model, 'rn50_int8_jit.pt')
```
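
The same kind of sanity check applies to the ResNet50 artifact, assuming the file above was written to the current directory:
```
import torch
import intel_extension_for_pytorch as ipex  # registers the IPEX ops used by the saved graph

loaded = torch.jit.load('rn50_int8_jit.pt')
loaded.eval()

dummy_tensor = torch.randn(1, 3, 224, 224).contiguous(memory_format=torch.channels_last)
with torch.no_grad():
    print(loaded(dummy_tensor).shape)  # expected: torch.Size([1, 1000])
```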
### 2. Creating a Model Archive
Once the serialized file (`.pt`) is created, it can be used with `torch-model-archiver` as usual. Use the following command to package the model.
```
torch-model-archiver --model-name rn50_ipex_int8 --version 1.0 --serialized-file rn50_int8_jit.pt --handler image_classifier
```
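
By default the archiver writes `rn50_ipex_int8.mar` to the current directory; assuming the `model_store` directory used in the next step, the archive can be moved there:
```
mkdir -p model_store
mv rn50_ipex_int8.mar model_store/
```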
### 3. Start TorchServe to serve the model
Make sure `ipex_enable=true` is set in `config.properties`. Use the following command to start TorchServe with IPEX.
```
torchserve --start --ncs --model-store model_store --ts-config config.properties
```
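
A minimal `config.properties` for this step might look like the following; the `model_store` entry mirrors the `--model-store` flag above and is only an illustration.
```
ipex_enable=true
model_store=model_store
```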

### 4. Registering and Deploying model
Registering and deploying the model follows the same steps shown [here](https://pytorch.org/serve/use_cases.html), for example:
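
This sketch uses the standard TorchServe management and inference APIs on their default ports; the model and image names are placeholders matching the earlier steps.
```
# register the model and spin up one worker
curl -X POST "http://localhost:8081/models?url=rn50_ipex_int8.mar&initial_workers=1"

# run inference
curl http://localhost:8080/predictions/rn50_ipex_int8 -T kitten.jpg
```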

## Benchmarking with Launcher
The `intel_extension_for_pytorch.cpu.launch` launcher can be used with the official TorchServe [benchmark](https://github.com/pytorch/serve/tree/master/benchmarks) to launch the server and benchmark requests with an optimal configuration on Intel hardware.

In this section, we provide an example of using the launcher to benchmark a single instance (worker) on a single socket, using all physical cores on that socket. This avoids thread oversubscription while still using all of the socket's resources.

### 1. Launcher configuration
All that needs to be done to use TorchServe with the launcher is to set its configuration in `config.properties` in the benchmark directory. Note that the number of instances, `--ninstance`, is 1 by default. `--ncore_per_instance` can be set appropriately by checking the number of cores per socket with `lscpu` (see the example after the configuration snippet below).

For a full list of tunable launcher configurations, refer to the [launch script documentation](https://github.com/intel/intel-extension-for-pytorch/blob/master/docs/tutorials/performance_tuning/launch_script.md).

```
ipex_enable=true
cpu_launcher_enable=true
cpu_launcher_args=--ncore_per_instance 28 --socket_id 0
```
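
For example, the core count used above can be checked with `lscpu` (28 is just the value for the machine used here):
```
lscpu | grep 'Core(s) per socket'
```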

### 2. Benchmarking with Launcher
The rest of the benchmarking procedure follows the same steps shown [here](https://github.com/pytorch/serve/tree/master/benchmarks).

CPU usage during the benchmark is shown below.
![sample_launcher](https://user-images.githubusercontent.com/93151422/143912711-cacbd38b-4be9-430a-810b-e5d3a9be9732.gif)
@@ -70,6 +70,8 @@ public final class ConfigManager {

    // IPEX config option that can be set at config.properties
    private static final String TS_IPEX_ENABLE = "ipex_enable";
    private static final String TS_CPU_LAUNCHER_ENABLE = "cpu_launcher_enable";
    private static final String TS_CPU_LAUNCHER_ARGS = "cpu_launcher_args";

    private static final String TS_ASYNC_LOGGING = "async_logging";
    private static final String TS_CORS_ALLOWED_ORIGIN = "cors_allowed_origin";
@@ -339,6 +341,14 @@ public boolean isMetricApiEnable() {
        return Boolean.parseBoolean(getProperty(TS_ENABLE_METRICS_API, "true"));
    }

    public boolean isCPULauncherEnabled() {
        return Boolean.parseBoolean(getProperty(TS_CPU_LAUNCHER_ENABLE, "false"));
    }

    public String getCPULauncherArgs() {
        return getProperty(TS_CPU_LAUNCHER_ARGS, null);
    }

    public int getNettyThreads() {
        return getIntProperty(TS_NUMBER_OF_NETTY_THREADS, 0);
    }
@@ -51,6 +51,20 @@ public void startWorker(int port) throws WorkerInitializationException, Interrup

        ArrayList<String> argl = new ArrayList<String>();
        argl.add(EnvironmentUtils.getPythonRunTime(model));
        if (configManager.isCPULauncherEnabled()) {
            argl.add("-m");
            argl.add("intel_extension_for_pytorch.cpu.launch");
            argl.add("--ninstance");
            argl.add("1");
            String largs = configManager.getCPULauncherArgs();
            if (largs != null && largs.length() > 1) {
                String[] argarray = largs.split(" ");
                for (int i = 0; i < argarray.length; i++) {
                    argl.add(argarray[i]);
                }
            }
        }

        argl.add(new File(workingDir, "ts/model_service_worker.py").getAbsolutePath());
        argl.add("--sock-type");
        argl.add(connector.getSocketType());