QKeras, predict/trace, API enhancements #195

Merged Jun 29, 2020 (66 commits)

Conversation

@thesps (Contributor) commented May 28, 2020

This is a big one...
There are three significant new things in this PR: QKeras support, predict, and enhancements to the API. Both predict and expanded API were essential for the QKeras support.
In reverse order:

API

One can do much more by importing hls4ml than before. Previously this stopped at, basically:

```python
import hls4ml
import yaml

with open('keras-config.yml') as f:
    cfg = yaml.safe_load(f)
hls_model = hls4ml.converters.keras_to_hls(cfg)
```

Now it's much richer and we can do:

```python
import tensorflow as tf
import hls4ml

# load a model
kmodel = tf.keras.models.load_model('my_model.h5')
# get the 'HLSConfig' part of the config 'file' (now just a dictionary)
hls_cfg = hls4ml.utils.config_from_keras_model(kmodel, granularity='name')
# modify the precision and reuse factor here if desired

# create the model object; parameters that used to live outside the
# 'HLSConfig' part of the file are now function arguments
hls_model = hls4ml.converters.convert_from_keras_model(
    kmodel, output_dir='my-hls-test', hls_config=hls_cfg, fpga_part=...)
# write the project, but also compile the csim library (more on that below)
hls_model.compile()
# run C synthesis and logic synthesis
hls_model.build(synth=True, vsynth=True)
```

All the 'old' ways of working still apply: `hls4ml convert -c ...` and `hls4ml build -cs -p ...` from the command line still work, and the API still supports the 'old' flow above (keras_to_hls with a config file, returning an HLSModel) — but you can do more with the model now.
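The "modify the precision and reuse if desired" step above is just editing a plain dictionary. As a rough sketch (the key layout below is my assumption for illustration, not taken verbatim from this PR), it might look like:

```python
# Hypothetical sketch of editing the config dictionary returned by
# hls4ml.utils.config_from_keras_model; the key names here are an
# assumption for illustration, not guaranteed by this PR.
hls_cfg = {
    'Model': {'Precision': 'ap_fixed<16,6>', 'ReuseFactor': 1},
    'LayerName': {'fc1': {'Precision': {'weight': 'ap_fixed<16,6>'}}},
}

# tighten the precision of one layer's weights, and trade latency for area
hls_cfg['LayerName']['fc1']['Precision']['weight'] = 'ap_fixed<8,3>'
hls_cfg['Model']['ReuseFactor'] = 4

print(hls_cfg['Model']['ReuseFactor'])
```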

predict, trace

This is the reason the branch is called csim_integration. Now with an HLSModel object (created as above with the API, for example), one can do:

```python
y = hls_model.predict(X)
```

It's self-explanatory, but awesome. X is an array, just as you would use to predict with the Keras (or whatever) Python model, and the returned y is an array as well. You can then easily compare your predictions against the original float model, make ROC curves, and so on.
Technically, this is implemented by compiling the HLS project (together with the ap_fixed and other headers) into a shared object and binding it to Python using ctypes. So what you get is "csim", but without running Vivado — in fact this works without any installation of Vivado on the system(!).
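As a sketch of the kind of comparison this enables (pure NumPy; `y_keras` and `y_hls` are stand-ins for the outputs of `kmodel.predict(X)` and `hls_model.predict(X)`, generated synthetically here):

```python
import numpy as np

# stand-ins for kmodel.predict(X) and hls_model.predict(X);
# in real use these would come from the two models
rng = np.random.default_rng(0)
y_keras = rng.random((100, 5))
y_hls = y_keras + rng.normal(scale=1e-3, size=y_keras.shape)  # quantization-like error

# summarize the agreement between float and fixed-point inference
residual = y_hls - y_keras
print('max |residual|:', np.abs(residual).max())
print('class agreement:',
      np.mean(y_keras.argmax(axis=1) == y_hls.argmax(axis=1)))
```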

trace is the related capability to get the output of each individual layer of the model, for lower-level debugging. This is useful, for example, when using custom precision throughout the model.
It's used like:

```python
trace = hls_model.trace(X)
```

The returned object is a dictionary keyed by the names of the (HLS) model layers, with each key pointing to the array captured at the output of that layer.
A utility has been added to the profiling module to get the equivalent for a Keras model:

```python
trace = hls4ml.model.profiling.get_ymodel_keras(keras_model, X)
```

hls4ml does things like separating activations out of Dense layers into their own layers, and this utility does that too, to make comparisons easier.
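Comparing the two trace dictionaries layer by layer can be sketched as follows (pure NumPy; the dictionaries here are synthetic stand-ins for the objects returned by `hls_model.trace` and the Keras profiling utility):

```python
import numpy as np

# stand-ins for the layer-name -> output-array dictionaries returned by
# hls_model.trace(X) and the Keras-side profiling utility
rng = np.random.default_rng(0)
keras_trace = {name: rng.random((16, 8)) for name in ('fc1', 'relu1', 'output')}
hls_trace = {name: arr + rng.normal(scale=1e-3, size=arr.shape)
             for name, arr in keras_trace.items()}

# per-layer agreement: find where the fixed-point model starts to diverge
for name in keras_trace:
    err = np.abs(hls_trace[name] - keras_trace[name]).max()
    print(f'{name}: max |error| = {err:.2e}')
```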

QKeras support

We can now support QKeras models using most of the quantizers that QKeras has. Some more detail can be obtained from this presentation.
When a Keras model uses QDense, QConv, etc. layers, the quantizer is extracted and used to quantize the weights, and the HLS data type is set according to the quantizer's settings.
The QKeras quantizer alpha parameter is handled with a new ApplyAlpha layer (using BatchNorm for inference) that scales the weights after the Dense or Conv layer. The conversion is handled at the config-dictionary level, and with new Optimizer passes. To get proper performance, users should use the hls4ml config utility demonstrated above:

```python
hls_cfg = hls4ml.utils.config_from_keras_model(qkeras_model, granularity='name')
hls_model = hls4ml.converters.convert_from_keras_model(qkeras_model, ...)
```
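The ApplyAlpha idea can be illustrated in plain NumPy (a hand-rolled sketch, not the hls4ml implementation): the quantizer factors the weights into a scale-free quantized part and a per-channel scale alpha, the layer runs with the scale-free weights, and alpha is re-applied afterwards like a BatchNorm with scale=alpha and bias=0.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(4, 3))          # float weights of a Dense layer
x = rng.normal(size=(2, 4))          # a batch of inputs

# factor W ≈ W_q * alpha, with one scale per output channel (illustrative)
alpha = np.abs(W).max(axis=0)
W_q = np.round(W / alpha * 7) / 7    # crude 3-bit-like, scale-free quantization

# the Dense layer runs with the scale-free weights...
y_dense = x @ W_q
# ...and ApplyAlpha rescales afterwards, BatchNorm-style
y = y_dense * alpha

# rescaling after the layer is equivalent to folding alpha into the weights
y_fused = x @ (W_q * alpha)
print('equivalent:', np.allclose(y, y_fused))
```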

Other

There is a new implementation of the Softmax activation. The existing implementation had a few hard-coded constants, so it was difficult to configure for different data types, and it had a bug where it would occasionally output the largest value for the smallest input value (I think a saturation issue).
The new version is also more configurable, with the option of different types for the e^x and 1/x tables. I will follow up this PR with a comment benchmarking the performance, but I've found it to be smaller, faster and more accurate.

vloncar and others added 30 commits February 24, 2020 15:37
* add comparison support for each layer's output

* clean up a bit

* take output of layer and activation separately

* API changes and add distribution plot

* new method for normalizing the difference

* mistakenly overwrote file when merge

* complete support for comparing outputs with trace, convert hls_model's outputs to numpy array

* change codes according to comments

* use np.asarray and fix bug on compiling
* bsah not sh

* Fix to precision inference from qkeras config
@Duchstf Duchstf self-requested a review June 2, 2020 14:33
@thesps (Contributor, Author) commented Jun 4, 2020

I promised some more detail on the new Softmax implementation, here it is.

I created a jupyter notebook that you can see here: https://gist.github.com/thesps/c7e59cf5597804693b8663ddc9496b64

To summarise what I did:

  • Loaded a model trained on the jet-tagging dataset
  • Extracted the output array of the layer before the softmax activation with hls4ml.model.profiling.get_ymodel_keras, sampled with the test data
  • Created a new Keras model with only a Softmax layer (no need to train)
  • Simulated and synthesized this model with hls4ml (using the data captured from Keras)
  • Copied the nnet_activation.h from hls4ml v0.2.0 into the PR 195 environment (so everything else stays at PR 195 level)
  • Simulated and synthesized the model again (using the same data captured from Keras)
  • Made plots

So, to be clear: I isolated only the Softmax layer, and I ran inference with the same data each time, extracted from Keras.
Here you can see the residual between Keras (CPU evaluation) and hls_model.predict, a.k.a. CSIM, for the v0.2.0 and PR195 implementations:

[Residual histograms: regular and log axes]

I don't think either is "perfect" but the new version is definitely more peaked around 0.

Perhaps more useful are the ROC curves, where the difference is a bit more visible:

[ROC curves]

And finally, the synthesis results (latency in clock cycles, resources in absolute numbers):

| Version | Latency | BRAM | DSP | FF  | LUT  |
|---------|---------|------|-----|-----|------|
| PR195   | 5       | 4    | 5   | 240 | 123  |
| v0.2.0  | 6       | 13   | 0   | 757 | 4561 |

Generally the new implementation will use, for N classes: N DSPs and ceil(N/2) + 1 BRAMs assuming 18 bits used for the tables.
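That scaling rule can be written down directly (a quick sketch of the formula quoted in this comment, assuming 18-bit tables):

```python
import math

def softmax_resource_estimate(n_classes: int) -> dict:
    """Estimated resources for the new Softmax implementation,
    per the formula quoted above (assuming 18-bit tables)."""
    return {
        'DSP': n_classes,
        'BRAM': math.ceil(n_classes / 2) + 1,
    }

# the 5-class jet-tagging model benchmarked above
print(softmax_resource_estimate(5))  # {'DSP': 5, 'BRAM': 4}, matching the PR195 row
```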

The other new feature is that the two look-up tables can use different precision, with the option exposed in the config file. For the results above I used ap_fixed<18,8> for all the tables in both versions, but you can tune the table types separately to get closer still to the Keras ROC curve.
For example, for this ROC curve I set exp_table_t = ap_fixed<18,8>, inv_table_t = ap_fixed<18,4> and got very close to the Keras floating-point ROC curves:
[ROC curves with tuned table types]
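For reference, a hypothetical sketch of how these table types might be set in the YAML config — the exact key placement here is my assumption, not taken from this PR:

```yaml
HLSConfig:
  LayerName:
    softmax:
      exp_table_t: ap_fixed<18,8>
      inv_table_t: ap_fixed<18,4>
```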

@veyron8800 (Contributor) commented Jun 21, 2020

@thesps I'm attempting to use the QKeras support mentioned above, but no matter how I attempt the conversion (the old-fashioned way from the command line, or using the API) I get the same exception:

```
Exception: ERROR: Unsupported layer type: QConv2D
```

I can confirm that I am attempting the conversion while on the csim_integration branch.

My guess is that the QKeras layers are not being registered here, but I'm not sure exactly how that block of code works.

Edit:
It looks like I was just missing a dependency (pyparsing), which was causing the qkeras_layers.py import to fail. A quick check that it is installed before the try-catch block could keep others from getting stuck on the same problem, since catching ImportError was concealing the real cause.
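The suggested check could look something like this (a sketch only; the package list and the surrounding registration logic are placeholders for the actual hls4ml code):

```python
import importlib.util

def missing_packages(pkgs):
    """Return which of the given optional dependencies are not importable."""
    return [p for p in pkgs if importlib.util.find_spec(p) is None]

# before the try/except ImportError around the QKeras layer registration,
# report the real cause instead of silently skipping the layers
missing = missing_packages(['pyparsing', 'qkeras'])
if missing:
    print('QKeras layers disabled; missing packages: ' + ', '.join(missing))
```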

@thesps (Contributor, Author) commented Jun 29, 2020

Thanks @veyron8800. We think the pyparsing dependency problem is an issue in qkeras: it seems it isn't picked up or installed correctly when you install qkeras, so we're going to look into fixing that at their end. I guess you had installed qkeras in the environment you were using for translation? We're going to clarify in the docs that the qkeras package is needed to translate qkeras models, and look at producing a more useful error, although that would need a bit of a rejig of how the layer imports are handled.

So, I'm going to merge this now since it seems the "core" functionality is working well, and we will continue to work on these user-friendliness aspects as we go.

@thesps thesps merged commit 773039f into fastmachinelearning:master Jun 29, 2020
@thesps thesps mentioned this pull request Jul 24, 2020
@hamzajaved780 hamzajaved780 mentioned this pull request Nov 19, 2020
@vloncar vloncar deleted the csim_integration branch January 15, 2021 11:06
calad0i pushed a commit to calad0i/hls4ml that referenced this pull request Jul 1, 2023