Add precomputed sums for Unweighted + bug fix #13
CC @wasade
The new logic gives a 1.8x speedup on a 300k input using a GPU (not tested on CPU).
Just a small question
Is the failure from the conda issue you had mentioned?
bool all_zeros=true;

#pragma acc loop seq
for (uint64_t emb=0; emb<filled_embs; emb++) {
This is the bug from the previous PR.
It should have been filled_embs_els_round, as it now is on line 1181.
Use full emb sums when one side is fully zero in Unweighted.
Similar logic to what we have in Unweighted.
Also fix a memory access bug that could result in a core dump.
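For readers unfamiliar with the shortcut, here is a minimal sketch of why a precomputed per-sample sum can be enough when one side is all zeros. It is not the kernel from this PR. It assumes the usual Unweighted formulation, where the numerator sums branch lengths present in exactly one sample (XOR) and the denominator sums branch lengths present in either (OR). All names (`SampleInfo`, `precompute`, `pair_sums_full`, `pair_sums_fast`) are hypothetical, and the data layout is simplified to one byte per embedding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical types and names; a simplified illustration only.
struct Sums { double num; double den; };
struct SampleInfo { double sum; bool all_zeros; };

// Done once per sample: total length of embeddings where the sample is present.
SampleInfo precompute(const std::vector<uint8_t>& x,
                      const std::vector<double>& lengths) {
  assert(x.size() == lengths.size());
  SampleInfo info{0.0, true};
  for (std::size_t emb = 0; emb < x.size(); emb++) {
    if (x[emb]) { info.sum += lengths[emb]; info.all_zeros = false; }
  }
  return info;
}

// Straightforward version: walk every embedding for the pair (u, v).
Sums pair_sums_full(const std::vector<uint8_t>& u,
                    const std::vector<uint8_t>& v,
                    const std::vector<double>& lengths) {
  Sums s{0.0, 0.0};
  for (std::size_t emb = 0; emb < lengths.size(); emb++) {
    if (u[emb] ^ v[emb]) s.num += lengths[emb];  // present in exactly one
    if (u[emb] | v[emb]) s.den += lengths[emb];  // present in either
  }
  return s;
}

// Shortcut: if one side is all zeros, XOR and OR both reduce to the other
// side, so numerator and denominator both equal its precomputed sum.
Sums pair_sums_fast(const std::vector<uint8_t>& u,
                    const std::vector<uint8_t>& v,
                    const std::vector<double>& lengths,
                    const SampleInfo& iu, const SampleInfo& iv) {
  if (iu.all_zeros) return Sums{iv.sum, iv.sum};
  if (iv.all_zeros) return Sums{iu.sum, iu.sum};
  return pair_sums_full(u, v, lengths);
}
```

In this sketch the fast path skips the per-embedding loop entirely for pairs with an all-zero side, which is the kind of saving the commit message describes. How the real code stores embeddings and detects the all-zero case may differ.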