ring_flash_attn forward compatible with FA>=2.7.0 #364

Merged: 1 commit into xdit-project:main on Nov 26, 2024

Conversation

@DefTruth (Contributor) commented on Nov 26, 2024:

Make ring_flash_attn's forward pass compatible with FA >= 2.7.0; the _flash_attn_forward interface changed in v2.7.0.

  • Before (FA < 2.7.0):

https://github.com/Dao-AILab/flash-attention/blob/418d677192b483dfc1decfdf9aadca40b402485d/flash_attn/flash_attn_interface.py#L48

def _flash_attn_forward(
    q, k, v, dropout_p, softmax_scale, causal, window_size, softcap, alibi_slopes, return_softmax
):
    q, k, v = [maybe_contiguous(x) for x in (q, k, v)]
    out, q, k, v, out_padded, softmax_lse, S_dmask, rng_state = flash_attn_cuda.fwd(
        q,
        k,
        v,
        None,
        alibi_slopes,
        dropout_p,
        softmax_scale,
        causal,
        window_size[0],
        window_size[1],
        softcap,
        return_softmax,
        None,
    )
    return out, q, k, v, out_padded, softmax_lse, S_dmask, rng_state
  • After (FA >= 2.7.0):

https://github.com/Dao-AILab/flash-attention/blob/c555642172e281cae6da8a6cff4dfd9ff678ae85/flash_attn/flash_attn_interface.py#L77

@_torch_custom_op_wrapper("flash_attn::_flash_attn_forward", mutates_args=(), device_types="cuda")
def _flash_attn_forward(
    q: torch.Tensor,
    k: torch.Tensor,
    v: torch.Tensor,
    dropout_p: float,
    softmax_scale: float,
    causal: bool,
    window_size_left: int,
    window_size_right: int,
    softcap: float,
    alibi_slopes: Optional[torch.Tensor],
    return_softmax: bool
) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]:
    q, k, v = [maybe_contiguous(x) for x in (q, k, v)]
    out, softmax_lse, S_dmask, rng_state = flash_attn_cuda.fwd(
        q,
        k,
        v,
        None,
        alibi_slopes,
        dropout_p,
        softmax_scale,
        causal,
        window_size_left,
        window_size_right,
        softcap,
        return_softmax,
        None,
    )
    return out, softmax_lse, S_dmask, rng_state
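
For downstream code such as ring_flash_attn, a small version dispatch keeps a single call site working across both signatures. The sketch below is illustrative only and is not the code merged in this PR; the wrapper name flash_attn_forward_compat and the tuple-based version parsing are assumptions.

import flash_attn
from flash_attn.flash_attn_interface import _flash_attn_forward

# Illustrative compatibility shim (assumed, not from this PR): normalize both
# interfaces to the 4-tuple returned by FA >= 2.7.0.
_FA_VERSION = tuple(int(v) for v in flash_attn.__version__.split(".")[:3])

def flash_attn_forward_compat(
    q, k, v, dropout_p, softmax_scale, causal,
    window_size=(-1, -1), softcap=0.0, alibi_slopes=None, return_softmax=False,
):
    if _FA_VERSION >= (2, 7, 0):
        # New interface: window_size split into two ints, 4 return values.
        out, softmax_lse, S_dmask, rng_state = _flash_attn_forward(
            q, k, v, dropout_p, softmax_scale, causal=causal,
            window_size_left=window_size[0], window_size_right=window_size[1],
            softcap=softcap, alibi_slopes=alibi_slopes, return_softmax=return_softmax,
        )
    else:
        # Old interface: window_size passed as a tuple, 8 return values
        # (the padded q/k/v/out copies are dropped here).
        out, _, _, _, _, softmax_lse, S_dmask, rng_state = _flash_attn_forward(
            q, k, v, dropout_p, softmax_scale, causal=causal,
            window_size=window_size, softcap=softcap,
            alibi_slopes=alibi_slopes, return_softmax=return_softmax,
        )
    return out, softmax_lse, S_dmask, rng_state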

@feifeibear (Collaborator) left a comment:

Thanks! I also noticed the bug.

@feifeibear merged commit a7bd749 into xdit-project:main on Nov 26, 2024
2 of 3 checks passed
@DefTruth (Contributor, Author) commented:

This looks like a compromise so that FA can be compiled by torch.compile into a full graph.
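
For reference, a minimal sketch of that mechanism, using the public torch.library.custom_op API (PyTorch >= 2.4) rather than FA's internal _torch_custom_op_wrapper; the op name demo::scale_add and the whole example are made up for illustration. An op registered this way is opaque to torch.compile, so a fullgraph compile can capture the call without tracing into the CUDA extension.

import torch

# Hypothetical analogue of what @_torch_custom_op_wrapper provides in FA:
# register an opaque custom op so torch.compile(fullgraph=True) does not
# graph-break when it reaches the call.
@torch.library.custom_op("demo::scale_add", mutates_args=())
def scale_add(x: torch.Tensor, scale: float) -> torch.Tensor:
    return x * scale + 1.0

@scale_add.register_fake
def _(x: torch.Tensor, scale: float) -> torch.Tensor:
    # Shape/dtype-only "fake" implementation used during tracing.
    return torch.empty_like(x)

@torch.compile(fullgraph=True)
def fn(x: torch.Tensor) -> torch.Tensor:
    return scale_add(x, 2.0)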

@DefTruth deleted the forward-compat-fa branch on November 28, 2024, 01:34.