LLaVA/README.md at main · haotian-liu/LLaVA #628
### Related issues

**#184: Robin: Multimodal (Visual-Language) Models - CERC-AAI Lab - Robin v1.0** (similarity score: 0.89)

- [ ] [CERC-AAI Lab - Robin v1.0](https://sites.google.com/view/irinalab/blog/robin-v1-0)

These models outperform, or perform on par with, state-of-the-art models of similar scale. The models detailed below are available here: https://huggingface.co/agi-collective

**#459: llama2** (similarity score: 0.89)

- [ ] [llama2](https://ollama.ai/library/llama2)

Llama 2 is the most popular model for general use (265.8K pulls). Released by Meta Platforms, Inc., it is trained on 2 trillion tokens and by default supports a context length of 4096. The Llama 2 Chat models are fine-tuned on over 1 million human annotations and are made for chat.

CLI: open the terminal and run `ollama run llama2`.

API: example using curl (see the API documentation for details):

```bash
curl -X POST http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?"
}'
```

Memory requirements: if you run into issues with higher quantization levels, try using the q4 model or shut down any other programs that are using a lot of memory.
Model variants: by default, Ollama uses 4-bit quantization. To try other quantization levels, use the other tags; the number after the `q` indicates the number of bits used for quantization (e.g. `q4` means 4-bit).

Suggested labels: `{ "label-name": "llama2-model", "description": "A powerful text model for chat, dialogue, and general use.", "repo": "ollama.ai/library/llama2", "confidence": 91.74 }`
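For completeness, here is a minimal Python sketch of calling the same `/api/generate` endpoint, which streams back newline-delimited JSON. It assumes a local Ollama server on the default port; swapping in a quantization tag such as `llama2:7b-q4_0` for the model name is illustrative and should be checked against the tags listed on the library page.

```python
import json
import requests

# Stream a completion from a locally running Ollama server.
# /api/generate returns one JSON object per line until "done" is true.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama2", "prompt": "Why is the sky blue?"},
    stream=True,
)
for line in resp.iter_lines():
    if not line:
        continue
    chunk = json.loads(line)
    print(chunk.get("response", ""), end="", flush=True)
    if chunk.get("done"):
        break
```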
**#625: unsloth/README.md at main · unslothai/unsloth** (similarity score: 0.88)

- [ ] [unsloth/README.md at main · unslothai/unsloth](https://github.com/unslothai/unsloth/blob/main/README.md?plain=1)

✨ Finetune for Free: all notebooks are beginner friendly. Add your dataset, click "Run All", and you'll get a 2x faster finetuned model which can be exported to GGUF or vLLM, or uploaded to Hugging Face (see the sketch after this list). The README also covers:

- 🦥 Unsloth.ai News
- 🔗 Links and Resources
- ⭐ Key Features
- 🥇 Performance Benchmarking

Suggested labels
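As a rough illustration of what those notebooks wrap, here is a hedged QLoRA-style finetuning sketch using the unsloth API. The model id, dataset, and hyperparameters are placeholders, and exact argument names vary across unsloth/trl versions.

```python
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

# Load a 4-bit quantized base model with unsloth's patched kernels.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-2-7b-bnb-4bit",  # placeholder model id
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters so only a small fraction of weights is trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_alpha=16,
)

dataset = load_dataset("imdb", split="train")  # stand-in dataset

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        output_dir="outputs",
        per_device_train_batch_size=2,
        max_steps=60,
    ),
)
trainer.train()
# The trained model can then be saved, merged, or exported (e.g. to GGUF).
```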
**#494: Awesome-Efficient-LLM: a curated list for Efficient Large Language Models** (similarity score: 0.88)

- [ ] [horseee/Awesome-Efficient-LLM: A curated list for Efficient Large Language Models](https://github.com/horseee/Awesome-Efficient-LLM#inference-acceleration)

Awesome-Efficient-LLM is a curated list for Efficient Large Language Models; its sections include Inference Acceleration and Updates.

Contributing: if you'd like to include your paper or need to update any details, please feel free to submit a pull request. You can generate the required markdown format for each paper by filling in the information in …

Suggested labels: `{ "label-name": "efficient-llm-acceleration", "description": "Inference acceleration techniques for efficient large language models.", "repo": "horseee/Awesome-Efficient-LLM", "confidence": 70.8 }`

**#317: streaming-llm: Efficient Streaming Language Models with Attention Sinks** (similarity score: 0.88)

- [ ] [mit-han-lab/streaming-llm: Efficient Streaming Language Models with Attention Sinks](https://github.com/mit-han-lab/streaming-llm)

Usage, environment setup:

```bash
conda create -yn streaming python=3.8
pip install torch torchvision torchaudio
python setup.py develop
CUDA_VISIBLE_DEVICES=0 python examples/run_streaming_llama.py --enable_streaming
```

**What does "working on infinite-length inputs" imply for LLMs?** Handling infinite-length text with LLMs presents challenges: storing all previous Key and Value (KV) states demands significant memory, and models may struggle to generate text beyond their training sequence length. StreamingLLM addresses this by retaining only the most recent tokens and the attention sinks, discarding intermediate tokens. This enables the model to generate coherent text from recent tokens without a cache reset, a capability not seen in earlier methods.

**Is the context window of LLMs expanded?** No. The context window remains unchanged; only the most recent tokens and attention sinks are retained, and middle tokens are discarded. The model can therefore only process the latest tokens, and the context window remains constrained by its initial pre-training. For instance, if Llama-2 is pre-trained with a context window of 4096 tokens, the maximum cache size for StreamingLLM on Llama-2 remains 4096.

**Can I input an extensive text, like a book, into StreamingLLM for summarization?** You can input a lengthy text, but the model will only recognize the latest tokens. Thus, if a book is the input, StreamingLLM might only summarize the concluding paragraphs, which may not be very insightful. As emphasized above, StreamingLLM neither expands the context window nor enhances long-term memory; its strength lies in generating fluent text from recent tokens without needing a cache refresh.

**What is the ideal use case for StreamingLLM?** StreamingLLM is optimized for streaming applications such as multi-round dialogues, where a model needs to operate continually without requiring extensive memory or depending on past data; an example is a daily assistant based on LLMs. StreamingLLM lets the model function continuously, basing its responses on recent conversations without needing to refresh its cache. Earlier methods would either need a cache reset when the conversation length exceeded the training length (losing recent context) or recompute KV states from recent text history, which can be time-consuming.
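A minimal sketch (not the repository's actual implementation) of the eviction policy described above: keep the first few "attention sink" tokens plus a rolling window of the most recent tokens, and evict everything in between. The class and parameter names here are hypothetical.

```python
from collections import deque

class SinkKVCache:
    """Toy KV-cache eviction policy in the spirit of StreamingLLM:
    always keep the first n_sink tokens (attention sinks) and the most
    recent `window` tokens; drop everything in between."""

    def __init__(self, n_sink: int = 4, window: int = 1020):
        self.n_sink = n_sink
        self.sinks = []                     # KV entries for the first tokens
        self.recent = deque(maxlen=window)  # rolling window of recent entries

    def append(self, kv_entry):
        if len(self.sinks) < self.n_sink:
            self.sinks.append(kv_entry)
        else:
            self.recent.append(kv_entry)    # deque silently evicts the oldest

    def cache(self):
        # The context the model attends over never exceeds n_sink + window,
        # no matter how long the stream runs.
        return self.sinks + list(self.recent)
```

With `n_sink + window` set to the pre-training context length (e.g. 4096 for Llama-2), this reproduces the cache bound described above.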
LLaVA/README.md at main · haotian-liu/LLaVA
🌋 LLaVA: Large Language and Vision Assistant
Visual instruction tuning towards large language and vision models with GPT-4 level capabilities.
📢 [LLaVA-NeXT Blog] [Project Page] [Demo] [Data] [Model Zoo]

🤝 Community Contributions: [llama.cpp] [Colab] [🤗 Space] [Replicate] [AutoGen] [BakLLaVA]

Improved Baselines with Visual Instruction Tuning [Paper] [HF]
Haotian Liu, Chunyuan Li, Yuheng Li, Yong Jae Lee

Visual Instruction Tuning (NeurIPS 2023, Oral) [Paper] [HF]
Haotian Liu*, Chunyuan Li*, Qingyang Wu, Yong Jae Lee (*Equal Contribution)
Release
Code License
Usage and License Notices: This project utilizes certain datasets and checkpoints that are subject to their respective original licenses. Users must comply with all terms and conditions of these original licenses, including but not limited to the OpenAI Terms of Use for the dataset and the specific licenses for base language models for checkpoints trained using the dataset (e.g. Llama community license for LLaMA-2 and Vicuna-v1.5). This project does not impose any additional constraints beyond those stipulated in the original licenses. Furthermore, users are reminded to ensure that their use of the dataset and checkpoints is in compliance with all applicable laws and regulations.
Contents
Suggested labels