Expression balancing #66

thesps · 2024-04-26T14:44:19Z

The implementation of summation over tree scores in HLS backend prevents automatic expression balancing when saturation and rounding are used in the score precision type.

    Trees:
    for(int i = 0; i < n_trees; i++){
      Classes:
      for(int j = 0; j < fn_classes(n_classes); j++){
        score[j] += scores[i][j];
      }
    }

This should be replaced with a balanced tree reduce implementation.

francescobrivio · 2024-05-01T14:55:44Z

Hi @thesps!

I think I have a possible implementation of the solution in francescobrivio@341627c (but further testing is needed on this preliminary implementation!) which drastically reduce the latency in case the AP_SAT flag is used in the ScorePrecision.
However this comes at the cost of a slight increase in resources usage due to the need of "inverting"
scores[n_trees][fn_classes(n_classes)] (in order to apply the tree reduce method).

So I think this can potentially be further improved if we can change this line:

conifer/conifer/backends/xilinxhls/firmware/BDT_unrolled.h

Line 103 in bfe3d18

    
           void tree_scores(input_t x, score_t scores[n_trees][fn_classes(n_classes)]) const;

to fill the scores in "reversed order", i.e. score_t scores[fn_classes(n_classes)][n_trees], but I'm not sure if this requires much deeper changes in the code and/or if it's acceptable :D

What do you think?

francescobrivio · 2024-05-01T15:50:52Z

I have run a few tests using the same identical xgb model and changing the Conifer "version" and the ScorePrecision:

Conifer version	ScorePrecision	vsynth LUT	vsynth FF	Latency
`master`	`ap_fixed<11,4,AP_RND_CONV,AP_SAT>`	51936	5032	133
My commit	`ap_fixed<11,4,AP_RND_CONV,AP_SAT>`	51163	4489	9
--------	--------	--------	--------	--------
`master`	`ap_fixed<11,4,AP_RND_CONV>`	43329	3542	4
My commit	`ap_fixed<11,4,AP_RND_CONV>`	42992	4504	5

So I have to correct my previous post: in case of AP_SAT my commit shows better latency and resources, while it has a slightly negative impact on both when AP_SAT is not used (to be improved).

Note:
Please note that I still have to run the inference on few events to validate the conifer scores with respect to xgb.

thesps · 2024-05-02T07:27:49Z

Thanks for the development @francescobrivio this looks really promising! I think your proposal to swap the order of the indices of the scores array makes complete sense to avoid the inversion in silico and there should be no problem doing that.

As well as changing the BDT_rolled.cpp HLS code, you would need to change the Python writer for the fully unrolled optimization, that should be constrained to this line, I think:

newline += f'  scores[{it}][{ic}] = tree_{it}_{ic}.decision_function(x);\n'

francescobrivio · 2024-05-02T15:44:36Z

Thanks for the feedback Sioni!
Ok I have implemented the "swapping" of the scores array and here are the updated results:

Conifer version	ScorePrecision	vsynth LUT	vsynth FF	Latency
`master`	`ap_fixed<11,4,AP_RND_CONV,AP_SAT>`	51936	5032	133
My commit	`ap_fixed<11,4,AP_RND_CONV,AP_SAT>`	51163	4489	9
My commit V2	`ap_fixed<11,4,AP_RND_CONV,AP_SAT>`	51163	4489	9
--------	--------	--------	--------	--------
`master`	`ap_fixed<11,4,AP_RND_CONV>`	43329	3542	4
My commit	`ap_fixed<11,4,AP_RND_CONV>`	42992	4504	5
My commit V2	`ap_fixed<11,4,AP_RND_CONV>`	42988	3800	4

Considerations:

In case of using AP_SAT: the result with the swapped scores is exactly identical...I guess Vitis is smart enough to figure it out?
In case of not using AP_SAT: avoiding to swap scores does bring improvements in both latency and resources
Overall with the latest changes ("V2") there is gain everywhere except for the usage of FF in no-AP_SAT case, but given FF are the most abundant resource I think we can afford it?

@thesps if you have further suggestions I'm happy to try them out! If not, and you agree, I can open a PR with these changes.

thesps · 2024-05-07T08:59:14Z

Nice, this looks great to me. Please go ahead and open a PR 👍

thesps · 2024-05-13T14:00:44Z

Resolved by @francescobrivio in #68

francescobrivio mentioned this issue May 7, 2024

Add expression balancing for xilinxhls backend #68

Merged

thesps closed this as completed May 13, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Expression balancing #66

Expression balancing #66

thesps commented Apr 26, 2024

francescobrivio commented May 1, 2024

francescobrivio commented May 1, 2024

thesps commented May 2, 2024

francescobrivio commented May 2, 2024

thesps commented May 7, 2024

thesps commented May 13, 2024

Expression balancing #66

Expression balancing #66

Comments

thesps commented Apr 26, 2024

francescobrivio commented May 1, 2024

francescobrivio commented May 1, 2024

thesps commented May 2, 2024

francescobrivio commented May 2, 2024

thesps commented May 7, 2024

thesps commented May 13, 2024