Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compiler cleanliness patches and incorporation of HLS APT #1470

Merged
merged 21 commits into from
Sep 25, 2024

Conversation

tomeichlersmith
Copy link
Member

replace this with a while loop instead to be more clear about the possibility of rounding errors affecting performance

For some reason, this only was shown on trunk after merging in unrelated (but in TrigScint?) code #1461 .

Check List

  • I successfully compiled ldmx-sw with my developments
  • I ran my developments and the following shows that they are successful.

replace this with a while loop instead to be more clear about the possibility of rounding errors affecting performance
@tomeichlersmith
Copy link
Member Author

Canceled the PR Validation since I am just making sure this builds with warnings-as-errors.

@tvami
Copy link
Member

tvami commented Sep 20, 2024

For some reason, this only was shown on trunk

Is it because the branch Rory used was not rebased to the version of trunk that had my update for the CI?

@tomeichlersmith
Copy link
Member Author

That's why I think this wasn't caught while test-building that branch, but I still don't know why this wasn't caught in previous rounds of more strict building since this code has been there for awhile. 🤷

Doesn't really matter, this is an easy fix.

@tomeichlersmith
Copy link
Member Author

crap, this just revealed more issues

https://github.com/LDMX-Software/ldmx-sw/actions/runs/10960598567/job/30435575518?pr=1470

the good news is that it builds and runs when not treating warnings-as-errors and so its not super urgent

- update digis{1,2,3} collection variable names
- remove useless (int) casts
- remove unused int index = count variables
@tvami
Copy link
Member

tvami commented Sep 20, 2024

Some of the new stuff is in HLS_arbitrary_Precision_Types, this line https://github.com/LDMX-Software/ldmx-sw/blob/trunk/CMakeLists.txt#L37C36-L37C65 was supressing those in the past but I think with this addition

setup_library(module TrigScint name Event
dependencies ROOT::Core
Hcal::Event
TrigScint::Event
Recon::Event
register_target)

it's resurfucing again

@tomeichlersmith
Copy link
Member Author

I fixed up our internal stuff, all that's left is the HLS stuff.

/home/tom/code/ldmx/ldmx-sw/TrigScint/../Trigger/HLS_arbitrary_Precision_Types/include/etc/ap_private.h:2109:32: warning: The left operand of '<<' is a garbage value [clang-analyzer-core.UndefinedBinaryOperatorResult]
            ? ((((int64_t)VAL) << (excess_bits)) >> (excess_bits))
                               ^
/home/tom/code/ldmx/ldmx-sw/TrigScint/src/TrigScint/TrigScintFirmwareTracker.cxx:293:14: note: Calling 'operator+<12, true, 12, true>'
  float pe = outTrk.Pad1.Seed.Amp + outTrk.Pad1.Sec.Amp;
             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/tom/code/ldmx/ldmx-sw/TrigScint/../Trigger/HLS_arbitrary_Precision_Types/include/ap_int_base.h:1320:1: note: Calling constructor for 'ap_int_base<13, true>'
OP_BIN_AP(+, plus)
^
/home/tom/code/ldmx/ldmx-sw/TrigScint/../Trigger/HLS_arbitrary_Precision_Types/include/ap_int_base.h:1310:37: note: expanded from macro 'OP_BIN_AP'
        _AP_W2, _AP_S2>::Rty##_base lhs(op);                                  \
                                    ^~~~~~~
/home/tom/code/ldmx/ldmx-sw/TrigScint/../Trigger/HLS_arbitrary_Precision_Types/include/ap_int_base.h:179:10: note: Calling default constructor for 'ssdm_int_sim<13, true>'
  INLINE ap_int_base(const ap_int_base<_AP_W2, _AP_S2>& op) {
         ^~~~~~~~~~~
/home/tom/code/ldmx/ldmx-sw/TrigScint/../Trigger/HLS_arbitrary_Precision_Types/include/ap_common.h:248:3: note: Calling default constructor for 'ap_private<13, true, true>'
  ssdm_int_sim() {}
  ^~~~~~~~~~~~
/home/tom/code/ldmx/ldmx-sw/TrigScint/../Trigger/HLS_arbitrary_Precision_Types/include/etc/ap_private.h:1595:5: note: Calling 'ap_private::clearUnusedBits'
    clearUnusedBits();
    ^~~~~~~~~~~~~~~~~
/home/tom/code/ldmx/ldmx-sw/TrigScint/../Trigger/HLS_arbitrary_Precision_Types/include/etc/ap_private.h:2108:9: note: '?' condition is true
        _AP_S
        ^
/home/tom/code/ldmx/ldmx-sw/TrigScint/../Trigger/HLS_arbitrary_Precision_Types/include/etc/ap_private.h:2109:32: note: The left operand of '<<' is a garbage value
            ? ((((int64_t)VAL) << (excess_bits)) >> (excess_bits))
                          ~~~  ^

which, as you point out @tvami , can probably be silenced by informing CMake that the HLS headers are not our problem.

github-actions bot and others added 2 commits September 20, 2024 14:49
we include them as a SYSTEM include in central CMakeLists.txt
and specifying them as SYSTEM informs CMake and its downstream compilers
to avoid analyzing them since they aren't our problem
@tvami
Copy link
Member

tvami commented Sep 24, 2024

So in the end, this is not just something we can hide under the rug, there is some issue in the input in

TrigScint/src/TrigScint/TrigScintFirmwareTracker.cxx

the track level seems fine
but either at the outTrk.Pad1 or outTrk.Pad1.Seed or outTrk.Pad1.Seed.Amp (same with the Sec) there is something uninitialized. I didnt find it tho :(

@tomeichlersmith
Copy link
Member Author

tomeichlersmith commented Sep 24, 2024

This gave me the idea to just use {} within the struct declarations telling C++ to use value initialization which avoids nonsense (random memory) values. e.g.

struct Digi {
int mID, bID;
int adc0, adc1, adc2, adc3, adc4, adc5;
int tdc0, tdc1, tdc2, tdc3, tdc4, tdc5;
};

I /think/ this got past the error on my local computer so I'm pushing it here to double check.

Edit: nevermind. This didn't work locally.

@tvami
Copy link
Member

tvami commented Sep 24, 2024

I was thinking if we could just set the defaults it would fix it, but given that this is C-style, I dont think we can do that. I think having

ap_int<12> mID{0}, bID{-1};

would actually solve the problem, but we cant do that, right?

@tomeichlersmith
Copy link
Member Author

I don't know if this code needs to be C-style or if that was just most convenient when trying to write the emulation.

@tvami
Copy link
Member

tvami commented Sep 24, 2024

OK, I tried, it doesnt work anyway... :(

@tvami
Copy link
Member

tvami commented Sep 24, 2024

I tried if getting rid of calcTCent would resolve it, but then I get this (meaning that the issue roots in constructing the lookup table )

/home/vamitamas/patch-float-for-loop-counter/ldmx-sw/TrigScint/src/TrigScint/TrigScintFirmwareTracker.cxx:63:3: note: Loop condition is true.  Entering loop body
  for (int i = 0; i < NCENT; i++) {
  ^
/home/vamitamas/patch-float-for-loop-counter/ldmx-sw/TrigScint/src/TrigScint/TrigScintFirmwareTracker.cxx:64:5: note: Loop condition is true.  Entering loop body
    for (int j = 0; j < COMBO; j++) {
    ^
/home/vamitamas/patch-float-for-loop-counter/ldmx-sw/TrigScint/src/TrigScint/TrigScintFirmwareTracker.cxx:65:26: note: Calling 'operator-<12, true>'
      LOOKUP[i][j][0] = (i - A[1] + A[0]);
                         ^~~~~~~~

@tvami
Copy link
Member

tvami commented Sep 24, 2024

And honestly I dont understand what's happening here

  ap_int<12> A[3] = {0, 0, 0};
  ap_int<12> LOOKUP[NCENT][COMBO][2];

  LOOKUP[i][j][0] = (i - A[1] + A[0]);

isnt A[1] the same as A[0] and both are zeros?

@rodwyer100
Copy link
Contributor

A[i] is an alignment matrix. The idea is that the three layers (Pad1, Pad2, and Pad3) may be translated w.r.t eachother and said translation may be determined by the vector A as we only care how they are misaligned in one axis (the axis of granularity of the TS). The factor (i - A[1] + A[0]) only becomes nontrivial if A[1] and A[0] are different indicating an initially misaligned state. This will be important when we first start checking LDMX because if there is alignment issues we wont see it without scanning A.

@tvami
Copy link
Member

tvami commented Sep 24, 2024

I see, thanks @rodwyer100 !
Would it be ok to initialize the LOOKUP table to all 0 before you fill in the real values?

@tvami
Copy link
Member

tvami commented Sep 25, 2024

ehh

/home/vamitamas/patch-float-for-loop-counter/ldmx-sw/TrigScint/src/TrigScint/TrigScintFirmwareTracker.cxx:62:3: note: Loop condition is true.  Entering loop body
  for (int i = 0; i < NCENT; ++i) {
  ^
    for (int j = 0; j < COMBO; ++j) {
    ^
      for (int k = 0; k < 2; ++k) {
      ^
/home/vamitamas/patch-float-for-loop-counter/ldmx-sw/TrigScint/src/TrigScint/TrigScintFirmwareTracker.cxx:65:31: note: Calling constructor for 'ap_int<12>'
            LOOKUP[i][j][k] = ap_int<12>(0);
                              ^~~~~~~~~~~~~

it doesnt like anything with ap_int<12> constructor.

@tvami
Copy link
Member

tvami commented Sep 25, 2024

hey @tomeichlersmith @rodwyer100 @bryngemark
At this point I think we should bring the HLS submodule to be part of the ldmx-sw, and fix the VAL thing (I did it locally and it resolves it). The HLS repo was last touched 5 years ago, and it's not very big [funny thing that Shazhad from CMS did try to patch it a year ago -- the PR is still unmerged].
I also think it would have a natural place in Framework or under Tools, given that it's used both by Trigger and TrigScint

@rodwyer100
Copy link
Contributor

It should be okay to initialize it with certain values. I didn't understand your most current comment. Did that not work?

@rodwyer100
Copy link
Contributor

rodwyer100 commented Sep 25, 2024

Sorry now your second to most current. Edit: Ah I see. It doesn't like the ap_int<12> instantiation. I don't see why it wouldn't; Ill look at it but it should be allowed.

Copy link
Member

@tvami tvami left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If pushed in some changes that I explored to fix the problem. Although they did not fix it, I think it still makes sense to have them in, and this PR could be a general fixup PR. I tagged Rory for some specific questions below.

Otherwise I think we should merge this, have a separate PR for the inclusion of the HLS stuff

TrigScint/include/TrigScint/Firmware/objdef.h Show resolved Hide resolved
for (int i = 0; i < NCENT; ++i) {
for (int j = 0; j < COMBO; ++j) {
for (int k = 0; k < 2; ++k) {
LOOKUP[i][j][k] = ap_int<12>(0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope putting this to zero is fine

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will clone this branch and check gimme a sec. The logic in the array does mean some default values will be bad, but 0 may be an okay one.

float one = (float)c.Pad1.Cent;
float two = (float)c.Pad2.Cent;
float three = (float)c.Pad3.Cent;
float mean = (one + two + three) / 3.0;
ap_int<12> Cent = (ap_int<10>)((int)(mean));
ap_int<12> Cent = (ap_int<12>)((int)(mean));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really think this should be as I change it, but please check

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will not significantly affect the firmware part of this equation. If it agrees still on track positions, it will also work. I will clone this branch and check gimme a sec.

@rodwyer100
Copy link
Contributor

So something has been scrambled. I don't know which change did it, but I will need to inspect things before this gets commited. The track number distribution is very high; it is producing a high fake rate.

@rodwyer100
Copy link
Contributor

image
I will need to look event by event to see why this occurs.

@tomeichlersmith
Copy link
Member Author

Personally, I would rather keep a fork of HLS_arbitrary_precision and just redirect our submodule to our fork.

Similar to G4DarkBreM, I just think there is a possibility that this code would be used in other applications (LDMX or not) and so keeping it separate would be helpful.

@tvami
Copy link
Member

tvami commented Sep 25, 2024

rather keep a fork of HLS_arbitrary_precision

but it's such a small repo... having it as a submodule has the advantage that if it changes we can just update it whenever we think it's needed. But this did not and does not change even when it should (see the PR from Shahzad a year ago). And then for your argument about using in other applications: for outside ldmx I doubt people would take an ldmx-sw fork instead of having their own fork. And then for inside ldmx, isnt that an argument to have it in our software directly?

@tomeichlersmith
Copy link
Member Author

Yea.. you've convinced me

I'll put them in Tools and update the CMake

@tomeichlersmith tomeichlersmith changed the title dont use a float as a for-loop counter compiler cleanliness patches and incorporation of HSL APT Sep 25, 2024
@tomeichlersmith tomeichlersmith changed the title compiler cleanliness patches and incorporation of HSL APT compiler cleanliness patches and incorporation of HLS APT Sep 25, 2024
@tomeichlersmith tomeichlersmith marked this pull request as ready for review September 25, 2024 19:26
Copy link
Member

@tvami tvami left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! We are now back to be able to compile with the strict requirements. I hope Rory can find what change changed the track distributions

@tomeichlersmith
Copy link
Member Author

@rodwyer100 I am going to merge this even though it certainly contains the issues that you have observed just because there is a lot of code being introduced. I think your other PR #1473 can be a location for patches while you are introducing the hit stagger.

@tomeichlersmith tomeichlersmith merged commit ebbf27a into trunk Sep 25, 2024
11 of 16 checks passed
@tomeichlersmith tomeichlersmith deleted the patch-float-for-loop-counter branch September 25, 2024 20:11
@rodwyer100
Copy link
Contributor

I figured out what happened. Its a small change thats needed which won't change what was done. I will include it in my hit branch alongside a rebased branch

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants