Can FlexAttention Optimize Masks for Large Table Constraints? #106

RaphaelMouravieff · 2025-01-15T13:47:36Z

Hello,

I’m currently working on fitting large tables into encoder models. To do this, I flatten the table into text and compute self-attention across the entire sequence. However, for very large tables, this approach becomes infeasible due to memory and compute constraints.

To address this, I aim to constrain attention to specific parts of the table, such as tokens within the same row or column. My idea is to use a custom attention mask that limits interactions to these regions, reducing the cost of attention.
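A minimal sketch of the row-or-column constraint described above, in plain Python for illustration only. It does not call FlexAttention; the names `row_col_allowed`, `dense_mask`, `row_ids`, and `col_ids` are hypothetical, and the example assumes one token per cell with the table flattened row by row. The predicate takes a query index and a key/value index and returns a boolean, which is the same shape of rule that a FlexAttention `mask_mod` expresses.

```python
def row_col_allowed(row_ids, col_ids, q_idx, kv_idx):
    """Return True if token q_idx may attend to token kv_idx.

    row_ids / col_ids are hypothetical per-token lookups giving the
    table row and column each flattened token came from.
    """
    same_row = row_ids[q_idx] == row_ids[kv_idx]
    same_col = col_ids[q_idx] == col_ids[kv_idx]
    return same_row or same_col


def dense_mask(row_ids, col_ids):
    """Materialize the full boolean mask (for inspection only)."""
    n = len(row_ids)
    return [
        [row_col_allowed(row_ids, col_ids, q, k) for k in range(n)]
        for q in range(n)
    ]


# Assumed toy input: a 2x3 table, one token per cell, flattened row by row.
row_ids = [0, 0, 0, 1, 1, 1]
col_ids = [0, 1, 2, 0, 1, 2]

mask = dense_mask(row_ids, col_ids)
allowed = sum(sum(row) for row in mask)
print(allowed, "of", len(mask) ** 2, "pairs allowed")  # prints: 24 of 36 pairs allowed
```

Under the one-token-per-cell assumption, a table with R rows and C columns lets each token attend to R + C - 1 of the R * C tokens, so the allowed fraction shrinks as the table grows. Because the rule depends only on `row_ids` and `col_ids`, a different table just means different lookup lists passed to the same predicate.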

Given that Attention Gym supports optimized attention implementations with efficient memory usage and time-saving mechanisms, I was wondering:
• Do you think a mask of this nature (row- or column-constrained attention) could benefit from your implementation?
• Is it a problem if the mask is different for every table?

Thank you in advance!

Chillee · 2025-01-18T01:21:02Z

@RaphaelMouravieff Yes, that sounds like it should work well!
