My question refers to the file faiss/faiss/gpu/impl/Distances.cu, lines 230 - 250.
A few notes to agree on:
The query norms are not required to select a query's k nearest neighbors, since for a given query the norm is the same constant for every "centroid"
The "distanceBufView" object covers a large portion of memory
If we want to compute the exact distances for all elements (ignoreOutDistances = false), then we must pass over distanceBufView twice.
Running over the distanceBufView is "slow"
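To make the first note concrete, here is a minimal host-side sketch in plain C++ (not CUDA, and not the Faiss code). The helper names selectMinK, partialScores and addQueryNorm are hypothetical and exist only for this illustration. It computes the partial score ||c||^2 - 2 q.c for each centroid, selects the k smallest, and shows that adding ||q||^2 afterwards shifts every score by the same amount, so the selected indices should not change (up to floating-point rounding on near ties).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: indices of the k smallest scores, in ascending
// score order, with ties broken by the smaller index.
std::vector<std::size_t> selectMinK(const std::vector<float>& scores,
                                    std::size_t k) {
    assert(k <= scores.size());
    std::vector<std::size_t> idx(scores.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::partial_sort(idx.begin(), idx.begin() + k, idx.end(),
                      [&](std::size_t a, std::size_t b) {
                          return scores[a] < scores[b] ||
                                 (scores[a] == scores[b] && a < b);
                      });
    idx.resize(k);
    return idx;
}

// Hypothetical helper: partial[j] = ||c_j||^2 - 2 * dot(q, c_j).
// The query norm is deliberately left out.
std::vector<float> partialScores(
        const std::vector<float>& q,
        const std::vector<std::vector<float>>& centroids) {
    std::vector<float> out;
    out.reserve(centroids.size());
    for (const auto& c : centroids) {
        assert(c.size() == q.size());
        float norm = 0.f;
        float dot = 0.f;
        for (std::size_t d = 0; d < q.size(); ++d) {
            norm += c[d] * c[d];
            dot += q[d] * c[d];
        }
        out.push_back(norm - 2.f * dot);
    }
    return out;
}

// Hypothetical helper: full[j] = partial[j] + ||q||^2, i.e. the exact
// squared L2 distance between q and centroid j.
std::vector<float> addQueryNorm(const std::vector<float>& partial,
                                const std::vector<float>& q) {
    float qNorm = 0.f;
    for (float v : q) {
        qNorm += v * v;
    }
    std::vector<float> full(partial);
    for (float& v : full) {
        v += qNorm;
    }
    return full;
}
```

Calling selectMinK on the output of partialScores and on the output of addQueryNorm should return the same indices. Only the distance values differ, by ||q||^2.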
If we agree on the above statements, here is my question: Why does runL2SelectMin not optionally include queryNormNiew in its operation? I can understand that leaving it out saves an extra addition. However, if that were the reason, the if-statement on "ignoreOutDistances" could simply switch between two versions of the runL2SelectMin function: one with the queryNorms and one without them.
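To illustrate what I mean by "two versions", here is a rough host-side sketch in plain C++. It is not the Faiss implementation and says nothing about how the real CUDA kernel is structured. fusedSelectMinK and selectForQuery are hypothetical names, and I am assuming one row of -2 q.c products per query. The query norm addition is a compile-time template switch, so the version without it pays nothing for the option.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Candidate = std::pair<float, std::size_t>;  // (distance, centroid index)

// Hypothetical fused selection for one query: read each product once,
// add the centroid norm, optionally add the query norm, keep the k smallest.
template <bool AddQueryNorm>
std::vector<Candidate> fusedSelectMinK(
        const std::vector<float>& productRow,     // -2 * dot(q, c_j)
        const std::vector<float>& centroidNorms,  // ||c_j||^2
        float queryNorm,                          // ||q||^2
        std::size_t k) {
    assert(productRow.size() == centroidNorms.size());
    assert(k <= productRow.size());
    std::vector<Candidate> cand;
    cand.reserve(productRow.size());
    for (std::size_t j = 0; j < productRow.size(); ++j) {
        float d = productRow[j] + centroidNorms[j];
        if (AddQueryNorm) {  // compile-time constant, folded by the compiler
            d += queryNorm;
        }
        cand.emplace_back(d, j);
    }
    // Pairs compare by distance first, then by index.
    std::partial_sort(cand.begin(), cand.begin() + k, cand.end());
    cand.resize(k);
    return cand;
}

// Hypothetical dispatch: the runtime flag picks one of the two versions.
std::vector<Candidate> selectForQuery(
        const std::vector<float>& productRow,
        const std::vector<float>& centroidNorms,
        float queryNorm,
        std::size_t k,
        bool ignoreOutDistances) {
    return ignoreOutDistances
            ? fusedSelectMinK<false>(productRow, centroidNorms, queryNorm, k)
            : fusedSelectMinK<true>(productRow, centroidNorms, queryNorm, k);
}
```

With ignoreOutDistances set to true, the returned values are only the partial scores. With false, they are the full squared distances. The selected indices should be the same either way.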
I realize people are busy, and it's possible this optimization is simply not worth the effort. Still, I am wondering whether the design was more deliberate.
Is there a more significant (maybe CUDA-related) reason to keep the queryNorms addition separate? Specifically, say we pass the additional vector "queryNorms" to the "l2SelectMinK" function in the gpu/impl/L2Select.cu file. Maybe including an additional vector hurts the hardware utilization of warps, since they operate under some locality assumptions? Or maybe the extra vector would overflow some behind-the-scenes CUDA cache? I am far from a CUDA expert, so I am merely guessing here.
Any insight would be helpful. Thank you in advance.