
Improve audio-to-text pipeline by enabling flash-attention [$750] #71

Closed

rickstaa opened this issue Dec 18, 2024 · 1 comment

@rickstaa

Overview

We have identified an opportunity to improve the current [audio-to-text](https://github.com/livepeer/go-livepeer/pull/3078/) pipeline in the Livepeer AI Network by enabling [flash-attention](https://arxiv.org/abs/2307.08691/), which will speed up the pipeline significantly and allow near-real-time operation. We are seeking the community's and bounty hunters' support to implement this optimisation quickly so it becomes available to developers working with Livepeer.

Problem

The audio-to-text models in the Livepeer AI Network do not yet use flash attention, so inference is slower than it could be.

Desired Solution

Faster model execution for the audio-to-text pipeline.

Bounty Requirements

  1. Enable memory-efficient flash attention in the [existing pipeline](https://github.com/livepeer/ai-worker/blob/main/runner/app/pipelines/audio_to_text.py/).
  2. Ensure that devices that don't yet support the optimisation safely fall back to the working Scaled Dot-Product Attention ([SDPA](https://pytorch.org/docs/main/generated/torch.nn.functional.scaled_dot_product_attention/)) implementation (see the sketch after this list).
  3. Create a separate Docker container image, similar to PR ai-runner#185 (Segment anything 2 pipeline image), to avoid dependency issues with other pipelines.
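
As a starting point, the snippet below sketches one way the fallback in requirement 2 might look, assuming the pipeline loads a Whisper-style model through Hugging Face transformers (as the existing `audio_to_text.py` does). `MODEL_ID` and `pick_attention_impl` are illustrative placeholders, not part of the existing pipeline code:

```python
# A minimal sketch, assuming the model is loaded via Hugging Face
# transformers. MODEL_ID and pick_attention_impl() are illustrative
# placeholders, not names from the existing pipeline.
import torch
from transformers import AutoModelForSpeechSeq2Seq
from transformers.utils import is_flash_attn_2_available

MODEL_ID = "openai/whisper-large-v3"  # placeholder model id


def pick_attention_impl() -> str:
    """Prefer flash_attention_2 where supported, else fall back to SDPA."""
    if (
        torch.cuda.is_available()
        # flash-attn 2 requires an Ampere (SM 8.0) or newer GPU
        and torch.cuda.get_device_capability(0)[0] >= 8
        # checks that the flash-attn package is installed and importable
        and is_flash_attn_2_available()
    ):
        return "flash_attention_2"
    return "sdpa"


model = AutoModelForSpeechSeq2Seq.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,  # flash attention only runs in fp16/bf16
    attn_implementation=pick_attention_impl(),
)
```

Whether `flash_attention_2` actually helps on a given device should still be confirmed with a benchmark, per the implementation tips below.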

Applicant Requirements

  • Proven experience working with deep learning frameworks such as PyTorch, particularly in implementing attention mechanisms and optimising model performance.
  • Strong experience with [Python](https://www.python.org/).

Scope Exclusions

  • None. All areas related to the issue are within scope.

Implementation Tips

  1. Consult the PyTorch documentation on [scaled dot-product attention](https://pytorch.org/docs/main/generated/torch.nn.functional.scaled_dot_product_attention/) to better understand how to enable flash attention in the audio-to-text pipeline.
  2. Validate the performance improvements of the flash-attention-enabled pipeline and ensure proper fallback behaviour on unsupported devices (see the snippet after this list).
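
For tip 2, one way to smoke-test flash-attention support is to force PyTorch to dispatch `scaled_dot_product_attention` through its flash backend only and see whether it runs. `flash_sdpa_works` below is a hypothetical helper, assuming PyTorch ≥ 2.3 (which provides `torch.nn.attention.sdpa_kernel`):

```python
# Hypothetical helper: returns True if this device can run PyTorch's
# flash-attention SDPA backend, False if the pipeline must fall back.
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel


def flash_sdpa_works(device: str = "cuda") -> bool:
    if not torch.cuda.is_available():
        return False
    # Dummy (batch, heads, seq_len, head_dim) tensors in fp16, since the
    # flash backend requires half precision on CUDA.
    q = k = v = torch.randn(1, 8, 128, 64, dtype=torch.float16, device=device)
    try:
        # Restrict dispatch to the flash backend; PyTorch raises a
        # RuntimeError ("No available kernel") when it is unsupported.
        with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
            F.scaled_dot_product_attention(q, k, v)
        return True
    except RuntimeError:
        return False
```

On a pre-Ampere GPU this returns False, in which case the pipeline should stay on the default SDPA path; timing transcriptions with and without flash attention on a supported GPU then covers the performance half of the tip.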

How to Apply

  1. Express Your Interest: Fill out [this form](https://www.notion.so/13f0a34856878045ba5be0218bc28d3f?pvs=21), making sure to specify the bounty you are interested in.
  2. Wait for Review: Our team will review expressions of interest and select the best candidate.
  3. Get Assigned: If selected, we'll contact you and assign the bounty to you.
  4. Start Working: Dive into your task! If you need assistance or guidance, join the discussions in the #developer-lounge channel on our [Discord server](https://discord.gg/livepeer).
  5. Submit Your Work: Create a pull request in the relevant repository and request a review.
  6. Notify Us: Ping us on Discord when your pull request is ready for review.
  7. Receive Your Bounty: We'll arrange the bounty payment once your pull request is approved.
  8. Gain Recognition: Your valuable contributions will be showcased in our project's [changelog](https://livepeer-ai.productlane.com/changelog).

Contact Information

For questions or clarifications, please contact: [hans@livepeer.org](mailto:hans@livepeer.org)

@rickstaa

Completed and paid out.
