🤗 Optimum ExecuTorch

Optimize and deploy Hugging Face models with ExecuTorch

Documentation | ExecuTorch | Hugging Face

🚀 Overview

Optimum ExecuTorch enables efficient deployment of transformer models using Meta's ExecuTorch framework. It provides:

  • 🔄 Easy conversion of Hugging Face models to ExecuTorch format
  • ⚡ Efficient inference with hardware-specific optimizations
  • 🤝 Seamless integration with Hugging Face Transformers
  • 📱 Efficient deployment on various devices

⚡ Quick Installation

Install from source:

git clone https://github.com/huggingface/optimum-executorch.git
cd optimum-executorch
pip install .
  • 🔜 Install from PyPI coming soon...
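
After installing, you can sanity-check the setup by importing the main class used in the examples below (this only verifies the import, not a full export):

python -c "from optimum.executorch import ExecuTorchModelForCausalLM"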

🎯 Quick Start

There are two ways to use Optimum ExecuTorch:
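Both options below download meta-llama/Llama-3.2-1B from the Hugging Face Hub. This checkpoint is gated, so you may need to accept the license on the model page and authenticate first (a standard Hub login, not specific to Optimum ExecuTorch):

huggingface-cli login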

Option 1: Export and Load Separately

Step 1: Export your model

Use the CLI tool to convert your model to ExecuTorch format:

optimum-cli export executorch \
    --model "meta-llama/Llama-3.2-1B" \
    --task "text-generation" \
    --recipe "xnnpack" \
    --output_dir="meta_llama3_2_1b"
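
A successful export writes the serialized ExecuTorch program into the output directory. The exact file layout can vary by version, but you should find a compiled .pte program, for example:

ls meta_llama3_2_1b
# model.pte  (the compiled ExecuTorch program; the filename may vary by version)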

Step 2: Load and run inference

Use the exported model for text generation:

from optimum.executorch import ExecuTorchModelForCausalLM
from transformers import AutoTokenizer

# Load the exported model
model = ExecuTorchModelForCausalLM.from_pretrained("./meta_llama3_2_1b")

# Initialize tokenizer and generate text
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
generated_text = model.text_generation(
    tokenizer=tokenizer,
    prompt="Simply put, the theory of relativity states that",
    max_seq_len=128,
)
print(generated_text)

Option 2: Python API

from optimum.executorch import ExecuTorchModelForCausalLM
from transformers import AutoTokenizer

# Load and export the model on-the-fly
model_id = "meta-llama/Llama-3.2-1B"
model = ExecuTorchModelForCausalLM.from_pretrained(model_id, recipe="xnnpack")

# Generate text right away
tokenizer = AutoTokenizer.from_pretrained(model_id)
generated_text = model.text_generation(
    tokenizer=tokenizer,
    prompt="Simply put, the theory of relativity states that",
    max_seq_len=128,
)
print(generated_text)
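
Note that Option 2 re-runs the export every time the model is loaded, which is convenient for experimentation. For repeated use or deployment, Option 1 lets you export once and reload the saved program from disk.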

🛠️ Advanced Usage

See the ExecuTorch GitHub repository for:

  • Custom model export configurations
  • Performance optimization guides
  • Deployment guides for Android, iOS, and embedded devices
  • Additional examples

🤝 Contributing

We love your input! We want to make contributing to Optimum ExecuTorch as easy and transparent as possible.

📝 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

📫 Get in Touch