This is a general-purpose issue for evaluating the effectiveness of our multithreading strategy in key areas of the protocol. In particular, in some cases it may be possible to achieve a substantial speedup simply by ensuring that work is more evenly distributed across threads. For example, when performing relation algebra (either in sumcheck or PG), we generally divide the work across threads by distributing the rows evenly. But because the gate types are ordered within the polynomials (gates of the same type sit in contiguous blocks), the work for expensive relations (e.g. aux) can end up concentrated on a small number of threads while the others receive relatively trivial work.
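To make the imbalance concrete, here is a minimal, standalone C++ sketch (not barretenberg code) that compares a naive even row split with a cost-weighted split. The per-row costs and the names `RowRange`, `split_rows_evenly`, `split_rows_by_cost` and `cost_per_thread` are hypothetical; in practice the cost of a row would have to be estimated from which relations are active on it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Half-open row range [start, end) assigned to one thread.
struct RowRange {
    size_t start;
    size_t end;
};

// Naive strategy: every thread gets the same number of rows,
// regardless of how expensive those rows are to evaluate.
std::vector<RowRange> split_rows_evenly(size_t num_rows, size_t num_threads)
{
    assert(num_threads > 0);
    const size_t chunk = (num_rows + num_threads - 1) / num_threads;
    std::vector<RowRange> ranges;
    for (size_t t = 0; t < num_threads; ++t) {
        const size_t start = std::min(t * chunk, num_rows);
        const size_t end = std::min(start + chunk, num_rows);
        ranges.push_back({ start, end });
    }
    return ranges;
}

// Cost-weighted strategy: close thread t's chunk once the cumulative cost
// reaches (t + 1) / num_threads of the total. row_cost[i] is a hypothetical
// estimate of how expensive row i is to evaluate.
std::vector<RowRange> split_rows_by_cost(const std::vector<uint64_t>& row_cost, size_t num_threads)
{
    assert(num_threads > 0);
    const uint64_t total = std::accumulate(row_cost.begin(), row_cost.end(), uint64_t{ 0 });
    std::vector<RowRange> ranges;
    size_t row = 0;
    uint64_t running = 0;
    for (size_t t = 0; t < num_threads; ++t) {
        const size_t start = row;
        const uint64_t target = total * (t + 1) / num_threads;
        while (row < row_cost.size() && running < target) {
            running += row_cost[row++];
        }
        // The last thread also takes any trailing zero-cost rows.
        if (t + 1 == num_threads) {
            row = row_cost.size();
        }
        ranges.push_back({ start, row });
    }
    return ranges;
}

// Total cost each thread would incur under a given split.
std::vector<uint64_t> cost_per_thread(const std::vector<uint64_t>& row_cost, const std::vector<RowRange>& ranges)
{
    std::vector<uint64_t> costs;
    for (const auto& r : ranges) {
        costs.push_back(std::accumulate(row_cost.begin() + static_cast<std::ptrdiff_t>(r.start),
                                        row_cost.begin() + static_cast<std::ptrdiff_t>(r.end),
                                        uint64_t{ 0 }));
    }
    return costs;
}
```

With 12 cheap rows (cost 1) followed by a block of 4 expensive rows (cost 10) and 4 threads, the even split gives per-thread costs of 4, 4, 4 and 40, while the cost-weighted split gives 22, 10, 10 and 10. The remaining imbalance comes from only splitting at row boundaries.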
Note: in a similar vein, we saw large improvements in performance in the structured trace setting once we accounted for "inactive rows" in our multithreading strategy. When possible, we now distribute work such that each thread receives an equal number of active rows, rather than naively dividing the domain evenly. This avoids the realistic scenario where some threads receive a set of entirely inactive rows while others evaluate exclusively expensive relations.
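A minimal sketch of the active-row strategy, assuming the active regions of a structured trace are available as a sorted list of disjoint half-open row ranges. `RowRange`, `active_count` and `split_active_rows` are hypothetical names for illustration, not the existing barretenberg API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Half-open row range [start, end).
struct RowRange {
    size_t start;
    size_t end;
};

// Number of rows covered by a list of disjoint ranges.
size_t active_count(const std::vector<RowRange>& ranges)
{
    size_t n = 0;
    for (const auto& r : ranges) {
        n += r.end - r.start;
    }
    return n;
}

// Assign each thread (up to) the same number of *active* rows. active_ranges
// lists the active blocks of the trace in increasing order; inactive rows in
// between are never handed to any thread. A block may be split across threads.
std::vector<std::vector<RowRange>> split_active_rows(const std::vector<RowRange>& active_ranges, size_t num_threads)
{
    assert(num_threads > 0);
    const size_t total = active_count(active_ranges);
    const size_t per_thread = (total + num_threads - 1) / num_threads; // ceiling
    std::vector<std::vector<RowRange>> assignment(num_threads);
    size_t thread = 0;
    size_t budget = per_thread; // active rows the current thread can still take
    for (const auto& r : active_ranges) {
        size_t row = r.start;
        while (row < r.end) {
            const size_t take = std::min(budget, r.end - row);
            assignment[thread].push_back({ row, row + take });
            row += take;
            budget -= take;
            // Move on to the next thread once this one has its share.
            if (budget == 0 && thread + 1 < num_threads) {
                ++thread;
                budget = per_thread;
            }
        }
    }
    return assignment;
}
```

For active ranges [0, 4), [100, 102) and [200, 206) (12 active rows) and 3 threads, each thread receives exactly 4 active rows, and thread 1 gets a sub-range from each of the last two blocks. Naively splitting the domain [0, 206) into three chunks of 69 rows would instead give the threads 4, 2 and 6 active rows.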
A first step might be to evaluate just how unevenly the work is being distributed in performance-critical areas.
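One possible way to start measuring this is to time each thread's share of a parallel region and report the ratio of the slowest thread to the average. This is a standalone sketch using `std::thread` and `std::chrono`; `time_per_thread` and `imbalance_ratio` are hypothetical helpers, and in the real code the timing would wrap whatever per-thread body the existing multithreading utilities invoke.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <numeric>
#include <thread>
#include <vector>

// Run work(thread_idx) on num_threads threads and record the wall-clock
// seconds each thread spent inside its call.
std::vector<double> time_per_thread(size_t num_threads, const std::function<void(size_t)>& work)
{
    assert(num_threads > 0);
    std::vector<double> seconds(num_threads, 0.0);
    std::vector<std::thread> threads;
    for (size_t t = 0; t < num_threads; ++t) {
        // Each thread writes only its own slot, so no synchronization is needed.
        threads.emplace_back([&, t] {
            const auto begin = std::chrono::steady_clock::now();
            work(t);
            const auto end = std::chrono::steady_clock::now();
            seconds[t] = std::chrono::duration<double>(end - begin).count();
        });
    }
    for (auto& th : threads) {
        th.join();
    }
    return seconds;
}

// Slowest thread divided by the average thread. 1.0 means perfectly balanced;
// a ratio of k means the parallel region takes roughly k times longer than a
// perfectly balanced split of the same total work would.
double imbalance_ratio(const std::vector<double>& per_thread)
{
    assert(!per_thread.empty());
    const double max = *std::max_element(per_thread.begin(), per_thread.end());
    const double mean =
        std::accumulate(per_thread.begin(), per_thread.end(), 0.0) / static_cast<double>(per_thread.size());
    return mean > 0.0 ? max / mean : 1.0;
}
```

For per-thread times of 4, 4, 4 and 40 units, `imbalance_ratio` returns 40 / 13 ≈ 3.08, i.e. the region takes about three times longer than an even split would. Wall-clock timings include scheduling noise, so in practice the measurement would be repeated and averaged.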