LLaVA/README.md at main · haotian-liu/LLaVA #628

Open

irthomasthomas opened this issue Feb 27, 2024 · 1 comment
Labels
Algorithms (Sorting, Learning or Classifying. All algorithms go here.) · MachineLearning (ML Models, Training and Inference) · Models (LLM and ML model repos and links) · Papers (Research papers) · Research (personal research notes for a topic)

Comments

@irthomasthomas (Owner)

LLaVA/README.md at main · haotian-liu/LLaVA

🌋 LLaVA: Large Language and Vision Assistant

Visual instruction tuning towards large language and vision models with GPT-4 level capabilities.

📢 LLaVA-NeXT Blog Project Page Demo Data Model Zoo

🤝Community Contributions: llama.cpp Colab 🤗Space Replicate AutoGen BakLLaVA

Improved Baselines with Visual Instruction Tuning Paper HF

Haotian Liu, Chunyuan Li, Yuheng Li, Yong Jae Lee

Visual Instruction Tuning (NeurIPS 2023, Oral) Paper HF

Haotian Liu*, Chunyuan Li*, Qingyang Wu, Yong Jae Lee (*Equal Contribution)

Release

  • [1/30] 🔥 LLaVA-NeXT (LLaVA-1.6) is out! With additional scaling to LLaVA-1.5, LLaVA-NeXT-34B outperforms Gemini Pro on some benchmarks. It can now process 4x more pixels and perform more tasks/applications than before. Check out the blog post, and explore the demo! Models are available in Model Zoo. Training/eval data and scripts coming soon.
  • [11/10] LLaVA-Plus is released: Learning to Use Tools for Creating Multimodal Agents, with LLaVA-Plus (LLaVA that Plug and Learn to Use Skills). Project Page Demo Code Paper
  • [11/2] LLaVA-Interactive is released: Experience the future of human-AI multimodal interaction with an all-in-one demo for Image Chat, Segmentation, Generation and Editing. Project Page Demo Code Paper
  • [10/26] 🔥 LLaVA-1.5 with LoRA achieves comparable performance as full-model finetuning, with a reduced GPU RAM requirement (ckpts) (script). We also provide a doc on how to finetune LLaVA-1.5 on your own dataset with LoRA.
  • [10/12] Check out the Korean LLaVA (Ko-LLaVA), created by ETRI, who has generously supported our research! 🤗 Demo
  • [10/5] 🔥 LLaVA-1.5 is out! It achieves SoTA on 11 benchmarks with just simple modifications to the original LLaVA, uses only public data, completes training in ~1 day on a single 8-A100 node, and surpasses methods like Qwen-VL-Chat that use billion-scale data. Check out the technical report, and explore the demo! Models are available in Model Zoo. The training data and scripts of LLaVA-1.5 are released here, and evaluation scripts are released here.
  • [9/26] LLaVA is improved with reinforcement learning from human feedback (RLHF) to strengthen fact grounding and reduce hallucination. Check out the new SFT and RLHF checkpoints at the LLaVA-RLHF project.
  • [9/22] LLaVA is accepted by NeurIPS 2023 as an oral presentation, and LLaVA-Med is accepted by the NeurIPS 2023 Datasets and Benchmarks Track as a spotlight presentation.
More
  • [11/6] Support Intel dGPU and CPU platforms. More details here.
  • [10/12] LLaVA is now supported in llama.cpp with 4-bit / 5-bit quantization support!
  • [10/11] The training data and scripts of LLaVA-1.5 are released here, and evaluation scripts are released here!
  • [10/10] Roboflow Deep Dive: First Impressions with LLaVA-1.5.
  • [9/20] We summarize our empirical study of training 33B and 65B LLaVA models in a note. Further, if you are interested in the comprehensive review, evolution and trend of multimodal foundation models, please check out our recent survey paper "Multimodal Foundation Models: From Specialists to General-Purpose Assistants".

  • [7/19] We release a major upgrade, including support for LLaMA-2, LoRA training, 4-/8-bit inference, higher resolution (336x336), and a lot more. We release LLaVA Bench for benchmarking open-ended visual chat with results from Bard and Bing-Chat. We also support and verify training with RTX 3090 and RTX A6000. Check out LLaVA-from-LLaMA-2, and our model zoo!
  • [6/26] CVPR 2023 Tutorial on Large Multimodal Models: Towards Building and Surpassing Multimodal GPT-4! Please check out the Slides, Notes, YouTube and Bilibili links.
  • [6/11] We released the preview for the most requested feature: DeepSpeed and LoRA support! Please see the documentation here.
  • [6/1] We released LLaVA-Med: Large Language and Vision Assistant for Biomedicine, a step towards building biomedical-domain large language and vision models with GPT-4 level capabilities. Check out the paper and page.
  • [5/6] We are releasing LLaVA-Lightning-MPT-7B-preview, based on MPT-7B-Chat! See here for more details.
  • [5/2] We are releasing LLaVA-Lightning! Train a lite, multimodal GPT-4 with just $40 in 3 hours! See here for more details.
  • [4/27] Thanks to the community effort, LLaVA-13B with 4-bit quantization allows you to run on a GPU with as few as 12GB VRAM! Try it out here.
  • [4/17] We released LLaVA: Large Language and Vision Assistant. We propose visual instruction tuning, towards building large language and vision models with GPT-4 level capabilities. Check out the paper and demo.

Code License

Usage and License Notices: This project utilizes certain datasets and checkpoints that are subject to their respective original licenses. Users must comply with all terms and conditions of these original licenses, including but not limited to the OpenAI Terms of Use for the dataset and the specific licenses for base language models for checkpoints trained using the dataset (e.g. Llama community license for LLaMA-2 and Vicuna-v1.5). This project does not impose any additional constraints beyond those stipulated in the original licenses. Furthermore, users are reminded to ensure that their use of the dataset and checkpoints is in compliance with all applicable laws and regulations.

Contents

Suggested labels

@irthomasthomas added the Algorithms, MachineLearning, Models, Papers and Research labels on Feb 27, 2024
irthomasthomas commented Feb 27, 2024

### Related issues

#184: Robin: Multimodal (Visual-Language) Models.  - CERC-AAI Lab - Robin v1.0

### Details

Similarity score: 0.89

- [ ] [CERC-AAI Lab - Robin v1.0](https://sites.google.com/view/irinalab/blog/robin-v1-0)
The Robin team is proud to present Robin, a suite of  Multimodal (Visual-Language) Models. 

These models outperform, or perform on par with, the state of the art models of similar scale. 
In the ever-evolving realm of artificial intelligence, the intersection of language understanding and visual perception has paved the way for groundbreaking multimodal models. We study different components and methods for merging pretrained vision and language models, with the goal of building better visual language models.
As part of this first milestone, we release this LLaVA-fork enabling the Mistral-7B & Open-Hermes-2.5 language models to process images. We combine the pretrained LLMs (Vicuna, Mistral and OpenHermes 2.5) and Vision models (CLIP and SigLIP), and further enhance capabilities by finetuning the vision encoder.

Models detailed below are available here: https://huggingface.co/agi-collective
The code used is available here: https://github.com/AGI-Collective/Robin/releases/tag/v1.0.0
Also, some related work by our team on aligning multimodal models: https://arxiv.org/abs/2304.13765
LLaVA Architecture Overview
The LLaVA architecture, an acronym for Large Language and Vision Assistant, represents a multimodal Visual Language Model (VLM). At its core, LLaVA integrates a pretrained language model with a pretrained vision encoder, connected through a projection layer. In its original incarnation, the Vicuna model served as the language foundation, while the CLIP ViT-Large from OpenAI assumes the role of the vision encoder.
Building upon this foundation, as part of the first milestone we study the impact of different language models, vision encoders and the effect of finetuning the vision encoder on the performance of our multimodal model. Notably, our journey led us to experiment with the fusion of various versions of the Mistral AI LLM model and the DeepMind SigLip visual encoder.
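
To make the wiring concrete, here is a minimal sketch (not the authors' code) of the LLaVA-style connector described above: a frozen vision encoder's patch features are projected into the language model's token-embedding space and prepended to the text embeddings. Module names and dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class VisionLanguageConnector(nn.Module):
    """Toy LLaVA-style connector: project vision-encoder patch features
    into the LLM's embedding space so they can be prepended to text tokens."""

    def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        # LLaVA-1.5 uses a small MLP here; the original LLaVA used a single linear layer.
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, patch_features: torch.Tensor) -> torch.Tensor:
        # patch_features: (batch, num_patches, vision_dim) from CLIP/SigLIP
        return self.proj(patch_features)  # (batch, num_patches, llm_dim)

# Usage: concatenate projected image tokens with the embedded text prompt.
connector = VisionLanguageConnector()
image_feats = torch.randn(1, 576, 1024)   # e.g. CLIP ViT-L/14 at 336px -> 576 patches
text_embeds = torch.randn(1, 32, 4096)    # embedded prompt tokens
inputs_embeds = torch.cat([connector(image_feats), text_embeds], dim=1)
# inputs_embeds is then fed to the LLM's forward pass in place of plain token embeddings.
```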
Architecture Variations
Our model variations are best encapsulated in the table below, outlining the diverse combinations of language models, vision encoders and the fine-tuning strategy.

#459: llama2

### Details

Similarity score: 0.89

- [ ] [llama2](https://ollama.ai/library/llama2)

Llama 2

The most popular model for general use.

265.8K Pulls
Updated 4 weeks ago

Overview

Llama 2 is released by Meta Platforms, Inc. This model is trained on 2 trillion tokens, and by default supports a context length of 4096. Llama 2 Chat models are fine-tuned on over 1 million human annotations, and are made for chat.

CLI

Open the terminal and run

ollama run llama2

API

Example using curl:

curl -X POST http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?"
}'
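
For reference, the same request can be made from Python. This is a minimal sketch using the requests library against the endpoint shown in the curl example above; by default Ollama streams one JSON object per line, each carrying a "response" fragment.

```python
import json
import requests

# Same endpoint and payload as the curl example; Ollama streams the reply line by line.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama2", "prompt": "Why is the sky blue?"},
    stream=True,
)
for line in resp.iter_lines():
    if line:
        chunk = json.loads(line)
        print(chunk.get("response", ""), end="", flush=True)
```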

API documentation

Memory requirements

  • 7b models generally require at least 8GB of RAM
  • 13b models generally require at least 16GB of RAM
  • 70b models generally require at least 64GB of RAM

If you run into issues with higher quantization levels, try using the q4 model or shut down any other programs that are using a lot of memory.

Model variants

  • Chat: fine-tuned for chat/dialogue use cases. These are the default in Ollama, and for models tagged with -chat in the tags tab.

    Example: ollama run llama2

  • Pre-trained: without the chat fine-tuning. This is tagged as -text in the tags tab.

    Example: ollama run llama2:text

By default, Ollama uses 4-bit quantization. To try other quantization levels, please use the other tags. The number after the q represents the number of bits used for quantization (i.e. q4 means 4-bit quantization). The higher the number, the more accurate the model is, but the slower it runs, and the more memory it requires.
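
As a rough back-of-envelope check of why higher-bit tags need more memory, the weights alone scale linearly with the bit width; this sketch ignores the KV cache and runtime overhead, which is why the guidance above recommends more RAM than the raw weight size.

```python
def approx_weight_gb(num_params_billion: float, bits: int) -> float:
    """Approximate size of the quantized weights alone, ignoring KV cache and overhead."""
    return num_params_billion * 1e9 * bits / 8 / 1024**3

for bits in (4, 8, 16):
    print(f"7b at {bits}-bit: ~{approx_weight_gb(7, bits):.1f} GB of weights")
# ~3.3 GB at 4-bit, ~6.5 GB at 8-bit, ~13.0 GB at 16-bit, which lines up with the
# "at least 8GB of RAM for 7b models" guidance once overhead is added.
```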

References

Suggested labels

{ "label-name": "llama2-model", "description": "A powerful text model for chat, dialogue, and general use.", "repo": "ollama.ai/library/llama2", "confidence": 91.74 }

#625: unsloth/README.md at main · unslothai/unsloth

### Details

Similarity score: 0.88

- [ ] [unsloth/README.md at main · unslothai/unsloth](https://github.com/unslothai/unsloth/blob/main/README.md?plain=1)

unsloth/README.md at main · unslothai/unsloth

Finetune Mistral, Gemma, Llama 2-5x faster with 70% less memory!

✨ Finetune for Free

All notebooks are beginner friendly! Add your dataset, click "Run All", and you'll get a 2x faster finetuned model which can be exported to GGUF, vLLM or uploaded to Hugging Face.

| Unsloth supports | Free Notebooks | Performance | Memory use |
| --- | --- | --- | --- |
| Gemma 7b | ▶️ Start on Colab | 2.4x faster | 58% less |
| Mistral 7b | ▶️ Start on Colab | 2.2x faster | 62% less |
| Llama-2 7b | ▶️ Start on Colab | 2.2x faster | 43% less |
| TinyLlama | ▶️ Start on Colab | 3.9x faster | 74% less |
| CodeLlama 34b A100 | ▶️ Start on Colab | 1.9x faster | 27% less |
| Mistral 7b 1xT4 | ▶️ Start on Kaggle | 5x faster* | 62% less |
| DPO - Zephyr | ▶️ Start on Colab | 1.9x faster | 19% less |

🦥 Unsloth.ai News

🔗 Links and Resources

| Type | Links |
| --- | --- |
| 📚 Wiki & FAQ | Read Our Wiki |
| 📜 Documentation | Read The Doc |
| 💾 Installation | unsloth/README.md |
| Twitter (aka X) | Follow us on X |
| 🥇 Benchmarking | Performance Tables |
| 🌐 Released Models | Unsloth Releases |
| ✍️ Blog | Read our Blogs |

⭐ Key Features

  • All kernels written in OpenAI's Triton language. Manual backprop engine.
  • 0% loss in accuracy - no approximation methods - all exact.
  • No change of hardware. Supports NVIDIA GPUs since 2018+. Minimum CUDA Capability 7.0 (V100, T4, Titan V, RTX 20, 30, 40x, A100, H100, L40, etc.). Check your GPU! GTX 1070 and 1080 work, but are slow.
  • Works on Linux and Windows via WSL.
  • Supports 4bit and 16bit QLoRA / LoRA finetuning via bitsandbytes.
  • Open source trains 5x faster - see Unsloth Pro for 30x faster training!
  • If you trained a model with 🦥Unsloth, you can use this cool sticker!  
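
To give a feel for what those notebooks do, here is a minimal QLoRA finetuning sketch with Unsloth. It assumes the FastLanguageModel API and a 4-bit model name along the lines of Unsloth's own examples, plus TRL's SFTTrainer; the model name, dataset, and hyperparameters are illustrative, not a recommendation.

```python
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

# Load a 4-bit base model (QLoRA via bitsandbytes); the model name is an example.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/mistral-7b-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters; only these small matrices are trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# Tiny dataset slice purely for illustration.
dataset = load_dataset("yahma/alpaca-cleaned", split="train[:1%]")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="output",  # pick/format the text field to suit your own data
    max_seq_length=2048,
    args=TrainingArguments(
        output_dir="outputs",
        per_device_train_batch_size=2,
        max_steps=60,
        learning_rate=2e-4,
    ),
)
trainer.train()
# The finetuned adapters/model can then be exported to GGUF, vLLM, or pushed to Hugging Face.
```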

🥇 Performance Benchmarking

| 1 A100 40GB | 🤗Hugging Face | Flash Attention | 🦥Unsloth Open Source | 🦥Unsloth Pro |
| --- | --- | --- | --- | --- |
| Alpaca | 1x | 1.04x | 1.98x | 15.64x |
| LAION Chip2 | 1x | 0.92x | 1.61x | 20.73x |
| OASST | 1x | 1.19x | 2.17x | 14.83x |
| Slim Orca | 1x | 1.18x | 2.22x | 14.82x |

| Free Colab T4 | Dataset | 🤗Hugging Face | Pytorch 2.1.1 | 🦥Unsloth | 🦥 VRAM reduction |
| --- | --- | --- | --- | --- | --- |
| Llama-2 7b | OASST | 1x | 1.19x | 1.95x | -43.3% |
| Mistral 7b | Alpaca | 1x | 1.07x | 1.56x | -13.7% |
| Tiny Llama 1.1b | Alpaca | 1x | 2.06x | 3.87x | -73.8% |
| DPO with Zephyr | Ultra Chat | 1x | 1.09x | 1.55x | -18.6% |

View on GitHub

Suggested labels

#494: Awesome-Efficient-LLM: A curated list for Efficient Large Language Models

### Details

Similarity score: 0.88

- [ ] [horseee/Awesome-Efficient-LLM: A curated list for Efficient Large Language Models](https://github.com/horseee/Awesome-Efficient-LLM#inference-acceleration)

Awesome-Efficient-LLM

A curated list for Efficient Large Language Models:


Inference Acceleration


Updates

  • Sep 27, 2023: Add tag for papers accepted at NeurIPS'23.
  • Sep 6, 2023: Add a new subdirectory project/ to organize those projects designed for developing a lightweight LLM.
  • July 11, 2023: Create a new subdirectory efficient_plm/ for papers that apply to PLMs (such as BERT, BART) but have yet to be verified for their effectiveness on LLMs.

Contributing

If you'd like to include your paper or need to update any details, please feel free to submit a pull request. You can generate the required markdown format for each paper by filling in the information in generate_item.py and executing python generate_item.py. We warmly appreciate your contributions to this list. Alternatively, you can email me the links to your paper and code, and I will add your paper to the list at my earliest convenience.

Suggested labels

{ "label-name": "efficient-llm-acceleration", "description": "Inference acceleration techniques for efficient large language models.", "repo": "horseee/Awesome-Efficient-LLM", "confidence": 70.8 }

#317: streaming-llm: Efficient Streaming Language Models with Attention Sinks

### Details

Similarity score: 0.88

- [ ] [mit-han-lab/streaming-llm: Efficient Streaming Language Models with Attention Sinks](https://github.com/mit-han-lab/streaming-llm)

Usage

Environment Setup

conda create -yn streaming python=3.8
conda activate streaming

pip install torch torchvision torchaudio
pip install transformers==4.33.0 accelerate datasets evaluate wandb scikit-learn scipy sentencepiece

python setup.py develop

Run Streaming Llama Chatbot

CUDA_VISIBLE_DEVICES=0 python examples/run_streaming_llama.py --enable_streaming

FAQ

What does "working on infinite-length inputs" imply for LLMs?

Handling infinite-length text with LLMs presents challenges. Notably, storing all previous Key and Value (KV) states demands significant memory, and models might struggle to generate text beyond their training sequence length. StreamingLLM addresses this by retaining only the most recent tokens and attention sinks, discarding intermediate tokens. This enables the model to generate coherent text from recent tokens without a cache reset — a capability not seen in earlier methods.
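
A minimal sketch (not the repository's implementation) of the eviction policy described above: keep a handful of initial "attention sink" positions plus a rolling window of the most recent KV entries, and drop everything in between.

```python
from collections import deque

class SinkCache:
    """Toy StreamingLLM-style KV-cache policy: retain `num_sinks` initial
    positions plus the `window` most recent ones; evict the middle."""

    def __init__(self, num_sinks: int = 4, window: int = 1020):
        self.num_sinks = num_sinks
        self.sinks = []                      # KV entries for the first few tokens
        self.recent = deque(maxlen=window)   # rolling window of recent KV entries

    def append(self, kv_entry):
        if len(self.sinks) < self.num_sinks:
            self.sinks.append(kv_entry)
        else:
            self.recent.append(kv_entry)     # deque evicts the oldest "middle" entry

    def current(self):
        # What the model attends over at the current decoding step.
        return self.sinks + list(self.recent)

cache = SinkCache(num_sinks=4, window=8)
for t in range(20):
    cache.append(f"kv_{t}")
print(cache.current())
# ['kv_0', ..., 'kv_3', 'kv_12', ..., 'kv_19'] -- the cache size stays bounded at 12.
```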

Is the context window of LLMs expanded?

No. The context window remains unchanged. Only the most recent tokens and attention sinks are retained, discarding middle tokens. This means the model can only process the latest tokens. The context window remains constrained by its initial pre-training. For instance, if Llama-2 is pre-trained with a context window of 4096 tokens, then the maximum cache size for StreamingLLM on Llama-2 remains 4096.

Can I input an extensive text, like a book, into StreamingLLM for summarization?

While you can input a lengthy text, the model will only recognize the latest tokens. Thus, if a book is an input, StreamingLLM might only summarize the concluding paragraphs, which might not be very insightful. As emphasized earlier, we neither expand the LLMs' context window nor enhance their long-term memory. StreamingLLM's strength lies in generating fluent text from recent tokens without needing a cache refresh.

What is the ideal use case for StreamingLLM?

StreamingLLM is optimized for streaming applications, such as multi-round dialogues. It's ideal for scenarios where a model needs to operate continually without requiring extensive memory or dependency on past data. An example is a daily assistant based on LLMs. StreamingLLM would let the model function continuously, basing its responses on recent conversations without needing to refresh its cache. Earlier methods would either need a cache reset when the conversation length exceeded the training length (losing recent context) or recompute KV states from recent text history, which can be time-consuming.

@ShellLM mentioned this issue Nov 13, 2024