Model Wishlist #156

EricLBuehler · 2024-04-16T13:37:38Z

NiuBlibing · 2024-04-23T03:54:54Z

qwen1.5-72B-Chat

NiuBlibing · 2024-04-23T03:55:05Z

llama3

EricLBuehler · 2024-04-23T21:58:53Z

@NiuBlibing, we have llama3 support ready: the README has a few examples. I will add Qwen support shortly.

EricLBuehler · 2024-04-25T23:08:41Z

@NiuBlibing, I just added Qwen2 support. Quantized Qwen2 support will be added in the next few days.

cargecla1 · 2024-04-26T11:26:16Z

Can you add https://huggingface.co/Snowflake/snowflake-arctic-instruct?

francis2tm · 2024-04-28T21:19:36Z

Hello!
Any plans for adding multimodal (e.g. llava) and embedding models?

EricLBuehler · 2024-04-28T21:24:38Z

> Can you add https://huggingface.co/Snowflake/snowflake-arctic-instruct?

@cargecla1, yes! It will be a great use case for ISQ.

EricLBuehler · 2024-04-28T21:26:31Z

> Hello!
> Any plans for adding multimodal (e.g. llava) and embedding models?

@francis2tm, yes. I plan on supporting Llava and embedding models this week.

EricLBuehler · 2024-04-28T21:36:19Z

@NiuBlibing, you can run Qwen now with ISQ, which will quantize it.

kir-gadjello · 2024-04-29T01:59:23Z

Would be nice to support at least one strong vision-language model: https://huggingface.co/openbmb/MiniCPM-V-2 https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5 with an option to compute visual frontend model on CPU. You might find it easier to ship visual transformer part via onnx.

chelbos · 2024-04-29T02:57:16Z

Would love to see some DeepSeek-VL, this model is better than Llava and supports multiple images per prompt
https://huggingface.co/collections/deepseek-ai/deepseek-vl-65f295948133d9cf92b706d3

chelbos · 2024-04-29T03:00:06Z

Also, outside the LLM world, would love to see support for https://github.com/cvg/LightGlue :) but not sure if that's possible ...

jett06 · 2024-04-29T03:10:00Z

Could you add support for GGUF-quantized Phi-3-Mini to the wishlist? Currently, this fails (built from master):

Running `./mistralrs-server gguf -m PrunaAI/Phi-3-mini-128k-instruct-GGUF-Imatrix-smashed -t microsoft/Phi-3-mini-128k-instruct -f /home/jett/Downloads/llms/Phi-3-mini-128k-instruct-q3_K_S.gguf`
2024-04-29T03:08:35.180939Z  INFO mistralrs_server: avx: true, neon: false, simd128: false, f16c: false
2024-04-29T03:08:35.180975Z  INFO mistralrs_server: Sampling method: penalties -> temperature -> topk -> topp -> multinomial
2024-04-29T03:08:35.180982Z  INFO mistralrs_server: Loading model `microsoft/Phi-3-mini-128k-instruct` on Cpu...
2024-04-29T03:08:35.180989Z  INFO mistralrs_server: Model kind is: quantized from gguf (no adapters)
2024-04-29T03:08:35.181017Z  INFO hf_hub: Token file not found "/home/jett/.cache/huggingface/token"    
2024-04-29T03:08:35.181048Z  INFO mistralrs_core::utils::tokens: Could not load token at "/home/jett/.cache/huggingface/token", using no HF token.
2024-04-29T03:08:35.181122Z  INFO hf_hub: Token file not found "/home/jett/.cache/huggingface/token"    
2024-04-29T03:08:35.181133Z  INFO mistralrs_core::utils::tokens: Could not load token at "/home/jett/.cache/huggingface/token", using no HF token.
Error: Unknown GGUF architecture `phi3`

rodion-m · 2024-04-29T06:47:44Z

It'll be great to see WizardLM-2 and suzume. And thanks for a great tool!

W4G1 · 2024-04-29T11:53:18Z

Command-R and Command-R+ from Cohere would be amazing 🙏

yongkangzhao · 2024-04-29T17:32:02Z

T5
LLAVA

EricLBuehler · 2024-04-29T17:41:19Z

@kir-gadjello

> Would be nice to support at least one strong vision-language model: https://huggingface.co/openbmb/MiniCPM-V-2 https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5 with an option to compute visual frontend model on CPU. You might find it easier to ship visual transformer part via onnx.

Supporting a vision+language or multimodal model is very high priority right now.

@chelbos

> Would love to see some DeepSeek-VL, this model is better than Llava and supports multiple images per prompt
> https://huggingface.co/collections/deepseek-ai/deepseek-vl-65f295948133d9cf92b706d3

I'll add this one too.

> Also, outside the LLM world, would love to see support for https://github.com/cvg/LightGlue :) but not sure if that's possible ...

I will look into it!

@jett06

> Could you add support for GGUF-quantized Phi-3-Mini to the wishlist?

Yes, absolutely, I think it should be easy. In the meantime, you can use ISQ to get the same speed.

@rodion-m

> It'll be great to see WizardLM-2 and suzume. And thanks for a great tool!

Thanks! I think suzume is just finetuned Llama so that can be used already. I'll add WizardLM.

@W4G1

> Command-R and Command-R+ from Cohere would be amazing 🙏

Yes, I'll add those.

@yongkangzhao

> T5 and LLaVA

Yes, I'll add those. T5 will be a nice smaller model.

jett06 · 2024-04-29T19:22:32Z

@EricLBuehler Thanks for your reply, for adding my suggestion to the model wishlist, and for developing such an awesome project! It's very appreciated :)

ldt · 2024-04-30T13:50:09Z

Congrats for your great work!
+1 for vision models like Idefics2-8b or better would be awesome

maximus2600 · 2024-05-01T03:03:21Z

it would be nice to add some embedding models like nomic-text-embed.

progressionnetwork · 2024-05-04T07:06:39Z

Hello, first of all, I want to express my appreciation for the excellent work your team has accomplished on the mistral.rs engine. It's a great project.

I am currently developing a personal AI assistant using Rust, and I believe integrating additional features into your engine could significantly enhance its utility and appeal. Specifically, adding support for Whisper and incorporating Text-to-Speech (TTS) functionalities, such as StyleTTS or similar technologies, would be incredibly beneficial. This would enable the engine to handle LLM inference, speech-to-text, and text-to-speech in one unified system at near-real-time speed.

Implementing these features could transform the engine into a more versatile tool for developers like myself, who are keen on building more integrated and efficient AI applications.

EricLBuehler · 2024-05-09T15:25:01Z

@jett06, I just added quantized GGUF Phi-3 support in #276! That is without LongRope support currently, but you can use a plain model with ISQ.

jett06 · 2024-05-09T19:48:53Z

@EricLBuehler Woah, thank you so much! This will be lovely for us folks with less powerful computers or size constraints, you're awesome :)

EricLBuehler · 2024-05-09T21:20:37Z

@jett06, my pleasure! I just fixed a small bug (in case you saw the strange behavior), so it should be all ready to go now!

NeroHin · 2024-05-10T01:37:37Z

IBM's Granite series Code Models.

Granite Code Models

LLukas22 · 2024-05-11T16:50:01Z

@NeroHin

> IBM's Granite series Code Models.
>
> Granite Code Models

The 3b and 8b variants should already be supported as they are just based on the llama architecture.

The 20b and 34b variants are based on the GPTBigCode architecture which currently isn't implemented in mistral.rs.
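One way to tell which family a checkpoint belongs to is to read the `architectures` field of its Hugging Face `config.json`. Below is a minimal sketch of that check. The helper name `is_probably_supported` is hypothetical, and the `ASSUMED_SUPPORTED` set is an assumption that only mirrors what is said in this thread, not mistral.rs's actual list of implemented architectures.

```python
import json

# Assumption: illustrative only, based on this thread (llama-based models load,
# GPTBigCode was not implemented at the time). Not mistral.rs's real list.
ASSUMED_SUPPORTED = {"LlamaForCausalLM"}


def is_probably_supported(config_text: str) -> bool:
    """Return True if every architecture listed in a config.json is in the assumed set."""
    config = json.loads(config_text)
    architectures = config.get("architectures", [])
    # An empty or missing list gives us nothing to go on, so treat it as unsupported.
    return bool(architectures) and all(a in ASSUMED_SUPPORTED for a in architectures)


# Hand-written config snippets standing in for real config.json files.
llama_based = '{"architectures": ["LlamaForCausalLM"]}'
gpt_bigcode_based = '{"architectures": ["GPTBigCodeForCausalLM"]}'

print(is_probably_supported(llama_based))        # llama-based variant
print(is_probably_supported(gpt_bigcode_based))  # GPTBigCode-based variant
```

In practice you would pass in the contents of the model's real `config.json` rather than the stand-in strings used here.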

chenwanqq · 2024-05-23T08:41:13Z

> Hello! Any plans for adding multimodal (e.g. llava) and embedding models?

I'm working on it now: chenwanqq/candle-llava
It's not easy dude, tons of image preprocessing and tensor concatenation.

dancixx · 2024-08-18T21:33:44Z

Hi guys, thanks for the awesome work. Is there any plan to support Idefics3 and InternVL2?

bhupesh-sf · 2024-08-20T06:55:15Z

hey, thanks for this awesome work as it allows people with fewer resources to run LLMs and VLMs on their machines.

Are we planning to support TTS, STT and image generation models as well? There is a lot of buzz around Flux.1 these days. There are also some good open-source models out there for voice cloning etc.

But once again I must appreciate projects like these to help out the community. 🥇

EricLBuehler · 2024-08-20T10:31:05Z

@bhupesh-sf, yes, I'm planning to expand into the multimodal space with a broad variety of models. As you suggested, TTS, STT, and image generation are all on the table as well as embedding models.

@dancixx, yes, I plan to add Idefics 3 at least!

jasinco · 2024-08-22T13:13:02Z

So does it support Deepseek Coder yet?

pigfoot · 2024-09-03T01:27:22Z

I'd appreciate it if Qwen2-VL could be considered for addition: https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct

ariqpradipa · 2024-09-17T02:50:56Z

I suggest considering the addition of https://huggingface.co/openbmb/MiniCPM3-4B as well.

youcefs21 · 2024-09-17T19:46:37Z

Pixtral! https://mistral.ai/news/pixtral-12b/

ethanc8 · 2024-09-26T16:44:44Z

Here are two open image+text to image+text models (these are the only ones I know of):

Anole - Chameleon finetune with image output
Code | arXiv | 7B weights | Website

Lumina-mGPT - an independent project by Chinese researchers
training/inference code | paper | hf

The Lumina project also made many text-to-other-modes models, in their Lumina-T2X subproject.

oldgithubman · 2024-09-29T21:41:41Z

Qwen-2.5

EricLBuehler · 2024-09-30T03:32:55Z

Quick status update:

We have added the FLUX and Llama 3.2 Vision models recently, with Parler TTS support coming in #791.

If anyone would be able to attempt an implementation of any of the requested models, that would be incredible!

My idea of the priority of adding models is:

Parler TTS
Idefics 3
Pixtral
QwenVL-2
Anole/Chameleon

Please feel free to revise this order!

@youcefs21 I think Pixtral would certainly be an interesting addition! I will take a look at adding that.

@ethanc8 thanks for linking the models! I think Chameleon would be a cool add, the Anole model seems very interesting.

@pigfoot We could add QwenVL-2 :)

@dancixx Idefics 3 support was recently merged into transformers, so work can begin on that too.

@bhupesh-sf I have added FLUX, and Parler TTS is being implemented in #791!

EricLBuehler · 2024-09-30T03:33:18Z

@oldgithubman I just merged support for Qwen 2.5 in #805!

ChristianWeyer · 2024-10-02T17:42:45Z

> My idea of the priority of adding models is:
>
> Parler TTS
> Idefics 3
> Pixtral
> QwenVL-2
> Anole/Chameleon

Move Qwen2 VL up the list @EricLBuehler. It is super strong and super important - especially with the HTTP API server (and improved Metal support ;-)).

EricLBuehler · 2024-10-03T02:52:11Z

@ChristianWeyer that sounds good :)

jac-cbi · 2024-10-03T21:43:56Z

@EricLBuehler Could you look at adding support for Aryn/deformable-detr-DocLayNet? I'd like to be able to segment PDFs, screenshots, and output of headless chrome for further processing and storing in a vector DB.

Ideally, I would love to integrate mistral.rs into swiftide to pull the whole thing together locally and offline.

I've opened up an issue on swiftide (356) to add support for the same workflow.

EDIT: There's a parked swiftide issue to add support for mistralrs (56)

bhupesh-sf · 2024-10-11T17:43:08Z

Anole seems quite interesting. I would prioritize it over Pixtral, as it can generate images as well.

ethanc8 · 2024-10-12T00:39:18Z

Another interesting highly multimodal model is Emu3-Gen, which can take in images, video, and text, and output images, video, and text -- its video generation is slightly better than OpenSora, and it's also able to extend existing videos. You can see example generations on the website.

xydz · 2024-10-14T02:41:12Z

Please support the glm-4-9b-chat model. The model address link is https://huggingface.co/THUDM/glm-4-9b-chat.

mseri · 2024-11-02T17:02:51Z

What about Zamba2 2.7B and 7B (https://huggingface.co/Zyphra/Zamba2-7B) by Zyphra? They should be quite fast small models.

cjs-axsh · 2024-11-08T06:20:56Z

https://huggingface.co/jinaai/jina-embeddings-v3

$ mistralrs-server --isq Q4K --interactive-mode plain --model-id jinaai/jina-embeddings-v3
Error: Unsupported Huggging Face Transformers -CausalLM model class `XLMRobertaModel`. Please raise an issue.

czonios · 2024-12-02T10:19:16Z

Is there any way this library would support Stable Diffusion in the near future (as I saw that FLUX is already supported) with quantization and LoRA adapter capabilities?

GraphicalDot · 2024-12-02T15:26:58Z

Support for black-forest-labs/FLUX.1-Fill-dev in the .uqff format.

hak8or · 2024-12-07T02:20:57Z

The new QwQ model that came out a week or so ago. It's very well regarded on the LocalLLaMA subreddit (seems to be the community's darling at the moment), and tinkering with it myself has also yielded positive results. To be able to use it in mistral.rs would be fantastic!

https://huggingface.co/Qwen/QwQ-32B-Preview

dancixx · 2024-12-15T13:13:41Z

Hi guys,

is there a plan to support llama 3.3?

EricLBuehler · 2024-12-15T13:18:59Z

Hi @dancixx, Llama 3.3 is already supported as it uses the same architecture as 3.1/3.2.

test3211234 · 2024-12-25T04:27:58Z

@EricLBuehler Hi, this isn't so much a wish, but can you recommend me a model for video text detection? For accurately detecting text in any language, like the text in this video. I then want to accurately translate it considering the context of the video, even multiline text like shown, so can you recommend me something for that too? Would this be too slow? Thanks! Also, if I wanted to share this, how would I go about it? Do I just put this in the Cargo.toml:

mistral = { version = "*", git = "linktothisrepo.com" }

Or ask everyone to build it on their own system?

franklucky001 · 2024-12-27T10:04:24Z

Please support the DeepSeek language models V2 and V3. https://huggingface.co/deepseek-ai/DeepSeek-V3

EricLBuehler · 2025-01-05T04:28:28Z

@franklucky001 I just merged #1010 which adds Deepseek v2!

EricLBuehler · 2025-01-05T04:32:06Z

@test3211234

> Also, if I wanted to share this, how would I go about it? Do I just put this in the Cargo.toml:

Yes, you would add mistralrs = { git = "https://github.com/EricLBuehler/mistral.rs.git" }.

> can you recommend me a model for video text detection?

I'd check out something like this LLaVA model or Qwen2-VL if you want to strictly take video input, otherwise, using any vision model frame-by-frame (or skipping some using some heuristic) might work too!

test3211234 · 2025-01-06T04:40:13Z

Thanks for the response. I think the only way to get the bounding boxes/specific data and stuff is a frame by frame approach. This would take really long and I don't know how to make it fast (probably like 2 months of straight runtime for one video haha). AI text detection, AI translation, AI font detection, AI inpainting, and then the other computer vision stuff. Any ideas?

youcefs21 · 2025-01-06T12:07:02Z

@test3211234 if you just need text, why not process frame by frame with something like https://huggingface.co/stepfun-ai/GOT-OCR2_0

It should be relatively fast on a decent GPU, around 1-2 seconds per frame. Skipping every other frame, you can probably do 45 minutes of video text extraction per day with a naive approach. You can probably optimize that more with some sort of parallelism.
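The estimate can be checked with a quick back-of-the-envelope calculation. This is only a sketch: the 30 fps source frame rate is an assumption (the comment does not state one), and `video_minutes_per_day` is a hypothetical helper name.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def video_minutes_per_day(seconds_per_frame: float, fps: float, frame_stride: int) -> float:
    """Minutes of video covered in 24 hours of sequential, single-GPU processing."""
    frames_processed = SECONDS_PER_DAY / seconds_per_frame
    # Each processed frame stands in for `frame_stride` frames of video.
    frames_covered = frames_processed * frame_stride
    return frames_covered / fps / 60


# Assumptions: 30 fps source video, every other frame processed (stride 2).
slow = video_minutes_per_day(seconds_per_frame=2.0, fps=30.0, frame_stride=2)
fast = video_minutes_per_day(seconds_per_frame=1.0, fps=30.0, frame_stride=2)
print(f"{slow:.0f} to {fast:.0f} minutes of video per day")
```

Under these assumptions, 2 seconds per frame works out to about 48 minutes of video per day, which is in line with the rough 45-minute figure. A larger stride or several workers in parallel would scale the result proportionally.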

EricLBuehler added the models Additions to model or architectures label Apr 16, 2024

EricLBuehler mentioned this issue Apr 16, 2024

Model wishlist #49

Closed

14 tasks

EricLBuehler pinned this issue Apr 16, 2024
