you'll want to introduce an IR that keeps track of the size of each tensor and the "type" of each operation
you can coalesce operations with the same "type" - for the example you've given, you have elementwise operations of cos / exp / gelu - you can bundle these into a single node
for this, runtime code generation will be needed for each IR node, as you will no longer know ahead of time what your final execution environment will look like
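The suggestion above can be sketched roughly as follows. This is a minimal illustration, not the project's actual IR: `OpKind`, `Node`, and `coalesce` are hypothetical names, and the rule that only adjacent elementwise nodes with identical shapes may be merged is an assumption made for this example.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class OpKind(Enum):
    ELEMENTWISE = auto()  # cos, exp, gelu: one output element per input element
    REINDEX = auto()      # permute and similar data-movement ops


@dataclass
class Node:
    name: str
    kind: OpKind
    shape: tuple[int, ...]                          # tensor size tracked per node
    fused: list[str] = field(default_factory=list)  # op names bundled into this node

    def __post_init__(self):
        if not self.fused:
            self.fused = [self.name]


def coalesce(nodes: list[Node]) -> list[Node]:
    """Merge runs of adjacent elementwise nodes that share a shape."""
    out: list[Node] = []
    for node in nodes:
        prev = out[-1] if out else None
        if (prev is not None
                and prev.kind == node.kind == OpKind.ELEMENTWISE
                and prev.shape == node.shape):
            # Same "type" and same size: bundle into the previous node.
            prev.fused.extend(node.fused)
            prev.name = "+".join(prev.fused)
        else:
            # Copy so the input graph is left untouched.
            out.append(Node(node.name, node.kind, node.shape, list(node.fused)))
    return out


graph = [
    Node("permute", OpKind.REINDEX, (2, 3)),
    Node("cos", OpKind.ELEMENTWISE, (2, 3)),
    Node("exp", OpKind.ELEMENTWISE, (2, 3)),
    Node("gelu", OpKind.ELEMENTWISE, (2, 3)),
]
result = coalesce(graph)
```

Here `result` holds two nodes: the original `permute` and one bundled elementwise node whose `fused` list records the order in which code would have to be generated for it at runtime.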
Crucial and ties into Code Generation.
The above graph demonstrates the success of our current inplacing algorithm.
However, we need to take this a step further and go from Inplacing to Inlining.

The above is our current permute shader. Instead of performing subsequent injective operations on the output buffer of permute, we could inline all of the injective operations like so:

This (contrived) example would cause everything to be collapsed to a single node, and is super important.
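To make the difference between inplacing and inlining concrete, here is a CPU-side Python sketch of the idea. It is not the shader discussed in this issue: `permute_2d` and `permute_2d_inlined` are hypothetical helpers, and the permute is assumed to be a plain 2D transpose over a row-major flat buffer.

```python
import math


def permute_2d(src, rows, cols):
    """Transpose a row-major rows x cols buffer into a new cols x rows buffer."""
    dst = [0.0] * (rows * cols)
    for r in range(rows):
        for c in range(cols):
            dst[c * rows + r] = src[r * cols + c]
    return dst


def permute_2d_inlined(src, rows, cols, ops):
    """Same transpose, but apply every elementwise op before the single write."""
    dst = [0.0] * (rows * cols)
    for r in range(rows):
        for c in range(cols):
            value = src[r * cols + c]
            for op in ops:
                value = op(value)
            dst[c * rows + r] = value
    return dst


src = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]

# Inplacing: permute first, then run each op as a separate pass over the buffer.
inplaced = permute_2d(src, 2, 3)
inplaced = [math.cos(v) for v in inplaced]
inplaced = [math.exp(v) for v in inplaced]

# Inlining: one pass, one write per element.
inlined = permute_2d_inlined(src, 2, 3, [math.cos, math.exp])
```

Both paths produce the same values, but the inlined version touches the output buffer once per element rather than once per operation, which is the saving that collapsing to a single node is after.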