QKeras, predict/trace, API enhancements #195

Merged Jun 29, 2020 (66 commits)

Conversation

@thesps (Contributor) commented May 28, 2020

This is a big one...
There are three significant new things in this PR: QKeras support, predict, and enhancements to the API. Both predict and expanded API were essential for the QKeras support.
In reverse order:

API

One can do much more by importing hls4ml than before. Previously this stopped at, basically:

```python
import hls4ml
import yaml

with open('keras-config.yml') as f:
    cfg = yaml.safe_load(f)
hls_model = hls4ml.converters.keras_to_hls(cfg)
```

Now it's much richer and we can do:

```python
import tensorflow as tf
import hls4ml

# load a model
kmodel = tf.keras.models.load_model('my_model.h5')
# get the 'HLSConfig' part of the config 'file' (now just a dictionary)
hls_cfg = hls4ml.utils.config_from_keras_model(kmodel, granularity='name')
# modify the precision and reuse factor here if desired

# create the model object; parameters that used to live outside the
# 'HLSConfig' part of the file are now function arguments
hls_model = hls4ml.converters.convert_from_keras_model(
    kmodel, output_dir='my-hls-test', hls_config=hls_cfg, fpga_part=...)
# write the project, but also compile the csim library (more on that below)
hls_model.compile()
# run C synthesis and logic synthesis
hls_model.build(synth=True, vsynth=True)
```

All the 'old' ways of working still apply: `hls4ml convert -c ...` and `hls4ml build -cs -p ...` from the command line still work, and the API still supports the 'old' flow above (keras_to_hls with a config file, returning an HLSModel) — but you can do more with the model now.
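The "modify the precision and reuse if desired" step above is just editing a plain dictionary. As a rough sketch (the key layout below is my assumption for illustration, not taken verbatim from this PR), it might look like:

```python
# Hypothetical sketch of editing the config dictionary returned by
# hls4ml.utils.config_from_keras_model; the key names here are an
# assumption for illustration, not guaranteed by this PR.
hls_cfg = {
    'Model': {'Precision': 'ap_fixed<16,6>', 'ReuseFactor': 1},
    'LayerName': {'fc1': {'Precision': {'weight': 'ap_fixed<16,6>'}}},
}

# tighten the precision of one layer's weights, and trade latency for area
hls_cfg['LayerName']['fc1']['Precision']['weight'] = 'ap_fixed<8,3>'
hls_cfg['Model']['ReuseFactor'] = 4

print(hls_cfg['Model']['ReuseFactor'])
```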

predict, trace

This is the reason the branch is called csim_integration. Now with an HLSModel object (created as above with the API, for example), one can do:

```python
y = hls_model.predict(X)
```

It's self-explanatory, but awesome. X is an array, just as you would use to predict with the Keras (or whatever) Python model, and the returned y is an array as well. You can then easily compare your predictions against the original float model, make ROC curves, and so on.
Technically, this is implemented by compiling the HLS project (together with the ap_fixed and other headers) into a shared object and binding it to Python using ctypes. So what you get is "csim", but without running Vivado — in fact this works without any installation of Vivado on the system(!).
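As a sketch of the kind of comparison this enables (pure NumPy; `y_keras` and `y_hls` are stand-ins for the outputs of `kmodel.predict(X)` and `hls_model.predict(X)`, generated synthetically here):

```python
import numpy as np

# stand-ins for kmodel.predict(X) and hls_model.predict(X);
# in real use these would come from the two models
rng = np.random.default_rng(0)
y_keras = rng.random((100, 5))
y_hls = y_keras + rng.normal(scale=1e-3, size=y_keras.shape)  # quantization-like error

# summarize the agreement between float and fixed-point inference
residual = y_hls - y_keras
print('max |residual|:', np.abs(residual).max())
print('class agreement:',
      np.mean(y_keras.argmax(axis=1) == y_hls.argmax(axis=1)))
```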

trace is the related capability to get the output of each individual layer of the model, for lower-level debugging. This is useful, for example, when using custom precision throughout the model.
It's used like:

```python
trace = hls_model.trace(X)
```

The returned object is a dictionary keyed by the names of the (HLS) model layers, with each key pointing to the array captured at the output of that layer.
A utility has been added to the profiling module to get the equivalent for a Keras model:

```python
trace = hls4ml.model.profiling.get_ymodel_keras(keras_model, X)
```

hls4ml does things like separating activations out of Dense layers into their own layers, and this utility does that too, to make comparisons easier.
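Comparing the two trace dictionaries layer by layer can be sketched as follows (pure NumPy; the dictionaries here are synthetic stand-ins for the objects returned by `hls_model.trace` and the Keras profiling utility):

```python
import numpy as np

# stand-ins for the layer-name -> output-array dictionaries returned by
# hls_model.trace(X) and the Keras-side profiling utility
rng = np.random.default_rng(0)
keras_trace = {name: rng.random((16, 8)) for name in ('fc1', 'relu1', 'output')}
hls_trace = {name: arr + rng.normal(scale=1e-3, size=arr.shape)
             for name, arr in keras_trace.items()}

# per-layer agreement: find where the fixed-point model starts to diverge
for name in keras_trace:
    err = np.abs(hls_trace[name] - keras_trace[name]).max()
    print(f'{name}: max |error| = {err:.2e}')
```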

QKeras support

We can now support QKeras models using most of the quantizers that QKeras has. Some more detail can be obtained from this presentation.
When a Keras model uses QDense, QConv, etc. layers, the quantizer is extracted and used to quantize the weights, and the HLS data type is set according to the quantizer's settings.
The QKeras quantizer alpha parameter is handled with a new ApplyAlpha layer (using BatchNorm for inference) that scales the weights after the Dense or Conv layer. The conversion is handled at the config-dictionary level, and with new Optimizer passes. To get proper performance, users should use the hls4ml config utility demonstrated above:

```python
hls_cfg = hls4ml.utils.config_from_keras_model(qkeras_model, granularity='name')
hls_model = hls4ml.converters.convert_from_keras_model(qkeras_model, ...)
```
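The ApplyAlpha idea can be illustrated in plain NumPy (a hand-rolled sketch, not the hls4ml implementation): the quantizer factors the weights into a scale-free quantized part and a per-channel scale alpha, the layer runs with the scale-free weights, and alpha is re-applied afterwards like a BatchNorm with scale=alpha and bias=0.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(4, 3))          # float weights of a Dense layer
x = rng.normal(size=(2, 4))          # a batch of inputs

# factor W ≈ W_q * alpha, with one scale per output channel (illustrative)
alpha = np.abs(W).max(axis=0)
W_q = np.round(W / alpha * 7) / 7    # crude 3-bit-like, scale-free quantization

# the Dense layer runs with the scale-free weights...
y_dense = x @ W_q
# ...and ApplyAlpha rescales afterwards, BatchNorm-style
y = y_dense * alpha

# rescaling after the layer is equivalent to folding alpha into the weights
y_fused = x @ (W_q * alpha)
print('equivalent:', np.allclose(y, y_fused))
```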

Other

There is a new implementation of the Softmax activation. The existing implementation had a few hard-coded constants, so it was difficult to configure for different data types, and it had a bug where it would occasionally output the largest value for the smallest input value (I think a saturation issue).
The new version is also more configurable, with the option of different types for the e^x and 1/x tables. I will follow up this PR with a comment benchmarking the performance, but I've found it to be smaller, faster and more accurate.

vloncar and others added 30 commits February 24, 2020 15:37
* add comparison support for each layer's output

* clean up a bit

* take output of layer and activation separately

* API changes and add distribution plot

* new method for normalizing the difference

* mistakenly overwrote file when merge

* complete support for comparing outputs with trace, convert hls_model's outputs to numpy array

* change codes according to comments

* use np.asarray and fix bug on compiling
* bsah not sh

* Fix to precision inference from qkeras config
@Duchstf Duchstf self-requested a review June 2, 2020 14:33
@thesps (Contributor, Author) commented Jun 4, 2020

I promised some more detail on the new Softmax implementation, here it is.

I created a jupyter notebook that you can see here: https://gist.github.com/thesps/c7e59cf5597804693b8663ddc9496b64

To summarise what I did:

  • Loaded a model trained on the jet-tagging dataset
  • Extracted the output array of the layer before the softmax activation with hls4ml.model.profiling.get_ymodel_keras, sampled with the test data
  • Created a new Keras model with only a Softmax layer (no need to train)
  • Simulated and synthesized this model with hls4ml (using the data captured from Keras)
  • Copied the nnet_activation.h from hls4ml v0.2.0 into the PR 195 environment (so everything else stays at PR 195 level)
  • Simulated and synthesized the model again (using the same data captured from Keras)
  • Made plots

So, to be clear: I isolated only the Softmax layer, and I ran inference with the same data each time, extracted from Keras.
Here you can see the residual between Keras (CPU evaluation) and hls_model.predict, a.k.a. CSIM, for the v0.2.0 and PR195 implementations:

[Residual histograms: regular and log axes]

I don't think either is "perfect" but the new version is definitely more peaked around 0.

Perhaps more useful are the ROC curves, where the difference is a bit more visible:

[ROC curves]

And finally, the synthesis results (latency in clock cycles, resources in absolute numbers):

| Version | Latency | BRAM | DSP | FF  | LUT  |
|---------|---------|------|-----|-----|------|
| PR195   | 5       | 4    | 5   | 240 | 123  |
| v0.2.0  | 6       | 13   | 0   | 757 | 4561 |

Generally the new implementation will use, for N classes: N DSPs and ceil(N/2) + 1 BRAMs assuming 18 bits used for the tables.
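That scaling rule can be written down directly (a quick sketch of the formula quoted in this comment, assuming 18-bit tables):

```python
import math

def softmax_resource_estimate(n_classes: int) -> dict:
    """Estimated resources for the new Softmax implementation,
    per the formula quoted above (assuming 18-bit tables)."""
    return {
        'DSP': n_classes,
        'BRAM': math.ceil(n_classes / 2) + 1,
    }

# the 5-class jet-tagging model benchmarked above
print(softmax_resource_estimate(5))  # {'DSP': 5, 'BRAM': 4}, matching the PR195 row
```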

The other new feature is that the two look-up tables can use different precision, with the option exposed in the config file. For the results above I used ap_fixed<18,8> for all the tables in both versions, but you can tune the table types separately to get closer still to the Keras ROC curve.
For example, for this ROC curve I set exp_table_t = ap_fixed<18,8>, inv_table_t = ap_fixed<18,4> and got very close to the Keras floating-point ROC curves:
[ROC curves with tuned table types]
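For reference, a hypothetical sketch of how these table types might be set in the YAML config — the exact key placement here is my assumption, not taken from this PR:

```yaml
HLSConfig:
  LayerName:
    softmax:
      exp_table_t: ap_fixed<18,8>
      inv_table_t: ap_fixed<18,4>
```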

@veyron8800 (Contributor) commented Jun 21, 2020

@thesps I'm attempting to use the QKeras support mentioned above, but no matter how I attempt the conversion (the old-fashioned way from the command line, or using the API) I get the same exception:

```
Exception: ERROR: Unsupported layer type: QConv2D
```

I can confirm that I am attempting the conversion while on the csim_integration branch.

My guess is that the QKeras layers are not being registered here, but I'm not sure exactly how that block of code works.

Edit:
It looks like I was just missing a dependency (pyparsing), which was causing the qkeras_layers.py import to fail. A quick check that it is installed before the try-catch block could keep others from getting stuck on the same problem, since catching ImportError was concealing the real cause.
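The suggested check could look something like this (a sketch only; the package list and the surrounding registration logic are placeholders for the actual hls4ml code):

```python
import importlib.util

def missing_packages(pkgs):
    """Return which of the given optional dependencies are not importable."""
    return [p for p in pkgs if importlib.util.find_spec(p) is None]

# before the try/except ImportError around the QKeras layer registration,
# report the real cause instead of silently skipping the layers
missing = missing_packages(['pyparsing', 'qkeras'])
if missing:
    print('QKeras layers disabled; missing packages: ' + ', '.join(missing))
```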

@thesps (Contributor, Author) commented Jun 29, 2020

Thanks @veyron8800. We think the pyparsing dependency problem is an issue in qkeras: it seems it isn't picked up or installed correctly when you install qkeras, so we're going to look into fixing that at their end. I guess you had installed qkeras in the environment you were using for translation? We're going to clarify in the docs that the qkeras package is needed to translate qkeras models, and look at producing a more useful error, although that would need a bit of a rejig of how the layer imports are handled.

So, I'm going to merge this now since it seems the "core" functionality is working well, and we will continue to work on these user-friendliness aspects as we go.

@thesps thesps merged commit 773039f into fastmachinelearning:master Jun 29, 2020
@thesps thesps mentioned this pull request Jul 24, 2020
@hamzajaved780 hamzajaved780 mentioned this pull request Nov 19, 2020
@vloncar vloncar deleted the csim_integration branch January 15, 2021 11:06
calad0i pushed a commit to calad0i/hls4ml that referenced this pull request Jul 1, 2023