
Manually unrolled HLS #41

Merged · 4 commits merged on Apr 3, 2023
Conversation

@thesps (Owner) commented Mar 30, 2023

This PR is inspired by #30.

In the main branch, the loop over ensemble trees in the HLS is like this (simplified):

for(int i = 0; i < n_trees; i++){
    tree_scores[i] = trees[i].decision_function(x);
}

Vivado/Vitis HLS seems to have a hard time unrolling this loop, for an unknown reason. In this PR we manually unroll this loop in the backend writer instead. So for a model with 5 trees this looks like (simplified):

  scores[0] = tree_0.decision_function(x);
  scores[1] = tree_1.decision_function(x);
  scores[2] = tree_2.decision_function(x);
  scores[3] = tree_3.decision_function(x);
  scores[4] = tree_4.decision_function(x);

Now we also need to create an instance of each tree (tree_0, tree_1, etc.) in parameters.h instead of an array of them, and create one model-specific function containing the unrolled inference code.
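As a rough illustration of the approach, a backend writer can generate the unrolled body with a simple loop over tree indices. This is a minimal sketch, not conifer's actual writer code; the function name is illustrative:

```python
# Illustrative sketch of manual unrolling in a backend writer: emit one
# explicit C++ call per tree instance instead of a C++ for-loop.
def write_unrolled_decision_function(n_trees: int) -> str:
    lines = []
    for i in range(n_trees):
        lines.append(f"  scores[{i}] = tree_{i}.decision_function(x);")
    return "\n".join(lines)

print(write_unrolled_decision_function(3))
```

The generated text is then written into the model-specific HLS source file, so the HLS tool never sees a loop it has to unroll itself.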

The impact is:

  • reduced resources (including after Vivado logic synthesis)
  • equal or slightly increased latency
  • much reduced C Synthesis time (6 hours reduced to 12 minutes in the best case)
  • increased C Synthesis memory usage

For large models (e.g. in the scan referenced below, with 100 trees and a depth of 8), the master branch may not be able to synthesize the model at all, while this PR can.
The old implementation with a for loop in the C++ is still available; a new configuration parameter 'Unroll' selects between the implementations (defaults to True).
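The selection logic can be pictured as below. The 'Unroll' key and its default of True come from this PR; the dispatch function itself is a hypothetical sketch, not conifer's actual API:

```python
# Hypothetical sketch of the new configuration switch between the two
# code-generation paths described in this PR.
def select_implementation(cfg):
    if cfg.get('Unroll', True):
        return 'unrolled'   # emit manually unrolled, per-tree code
    return 'loop'           # fall back to the original C++ for-loop

# Omitting 'Unroll' from the configuration picks the unrolled path by default.
```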

For reference, here are some plots from a scan of randomized trees (blue markers are master branch, red markers are this PR):

[Four screenshots (2023-03-30): scan plots of resource usage and latency, master branch (blue) vs this PR (red)]

@brightneedle commented Oct 10, 2023

Is there a particular version of XGBoost/Scikit-Learn used here?

Trying out the example pruned_xgboost_to_hls.py:

  • With XGBoost 2.0.0 I get: AttributeError: 'list' object has no attribute 'get'
  • With XGBoost 1.7.6 this changes to: IndexError: list assignment index out of range (raised in addParentAndDepth at parents[j] = i)

In either case I have not been able to get pruned trees to compile.

@thesps (Owner, Author) commented Oct 10, 2023

This is likely an issue with the XGBoost converter rather than the pruned-model support. I can confirm that this was tested with XGBoost 1.7.5. I just tried 2.0.0 and see the same issue as you in the XGBoost converter tests. I'm not sure about the error you see with 1.7.6; we can look into that, as it might be a corner case specific to that model.

I'll open a new issue about updating the XGBoost converter again for 2.0.0.

@thesps mentioned this pull request on Oct 10, 2023
@brightneedle

Thanks @thesps! In the meantime, can you confirm that the XGBoost converter works for pruned trees with 1.7.5? I still get IndexError: list assignment index out of range at parents[j] = i with pruned_xgboost_to_hls.py.

For completeness, I'm using scikit-learn==1.3.0 and conifer==1.2.

@thesps (Owner, Author) commented Oct 13, 2023

Thanks for the info. I checked that example as well and replicated your error. I haven't worked out the fix yet, but I can give some insight into the issue:
Since the error comes from the XilinxHLS backend, I converted the model with no configuration (Python backend), which is available in the master branch but not yet in a release (it will be in v1.3). Then I checked whether any trees have child indices out of range of the number of nodes:

>>> import numpy as np
>>> import conifer
>>> model = conifer.converters.convert_from_xgboost(bst)
>>> np.any([(np.any(np.array(tree.children_left) > tree.n_nodes()), np.any(np.array(tree.children_right) > tree.n_nodes())) for trees in model.trees for tree in trees])
True

So somehow the xgboost converter has read those out of range values from the model. This is where those child indices are set: https://github.com/thesps/conifer/blob/master/conifer/converters/xgboost.py#L51-L52
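The check above can be illustrated on toy stand-ins: plain Python objects mimicking the children_left / children_right / n_nodes() interface of a converted tree, not actual conifer objects. (Here I use >= since valid node indices run from 0 to n_nodes()-1, slightly stricter than the > in the snippet above.)

```python
import numpy as np

# Toy stand-in for a converted tree; mimics the children_left /
# children_right / n_nodes() interface used in the check above.
class ToyTree:
    def __init__(self, children_left, children_right):
        self.children_left = children_left
        self.children_right = children_right

    def n_nodes(self):
        return len(self.children_left)

def has_out_of_range_children(tree):
    # Valid child indices are 0 .. n_nodes()-1; -1 conventionally marks a leaf.
    n = tree.n_nodes()
    return bool(np.any(np.array(tree.children_left) >= n)
                or np.any(np.array(tree.children_right) >= n))

good = ToyTree([1, -1, -1], [2, -1, -1])  # well-formed 3-node tree
bad = ToyTree([1, -1, -1], [5, -1, -1])   # right child index 5 exceeds n_nodes
```

A pruned model triggering this check suggests the converter is emitting child indices for nodes that were removed by pruning.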

I will keep investigating!
