llama : add Phi-4-mini support (supersede #12099) #12108

ngxson · 2025-02-28T09:52:35Z

Supersedes #12099, with these changes:

- No need to check for longrope when loading tensors (same as with LLM_ARCH_LLAMA)
- Fix convert_hf_to_gguf_update.py
- Add missing tokenizer .inp/.out files

Test with llama-cli:

You are a helpful assistant

> hi
Hello! How can I assist you today?

> write some unicode
Sure! Unicode characters can represent a wide array of symbols from various languages, mathematical operators, and much more. Here are some examples:

- Smile face: 😊
- Heart: ❤️
- Star: ⭐
- Music notes: 🎵
- Infinity: ∞
- Copyright: ©
- Mathematical pi: π
- Mathematical integral: ∫
- Mathematical infinity: ∞
- Euro sign: €
- Square root: √
- Copyright symbol: ℧
- Greek letter alpha: α

Would you like to see more examples, or is there a specific character or symbol you're interested in?

> write chinese characters
Certainly! Chinese characters, also known as Hanzi in Mandarin, are logograms used in the Chinese writing system. Here are a few examples of Chinese characters, along with their pinyin (Romanization) and English meanings:

1. 爱 (Ài) - Love
2. 和 (Hé) - Harmony, and
3. 人 (Rén) - Person
4. 学 (Xué) - Study, learn, or education
5. 水 (Shū) - Water
6. 火 (Huǒ) - Fire
7. 木 (Mù) - Wood
8. 金 (Jīn) - Gold
9. 土 (Tǔ) - Soil
10. 气 (Qì) - Air, atmosphere

Each Chinese character can have a specific meaning and sometimes multiple meanings depending on the context. Learning Chinese characters can be quite rewarding, as it allows for a deeper understanding of Chinese culture and language nuances. Would you like more examples or help with something else?

> 
llama_perf_sampler_print:    sampling time =      34.91 ms /   219 runs   (    0.16 ms per token,  6272.38 tokens per second)
llama_perf_context_print:        load time =    2003.88 ms
llama_perf_context_print: prompt eval time =    6230.83 ms /    42 tokens (  148.35 ms per token,     6.74 tokens per second)
llama_perf_context_print:        eval time =    6517.55 ms /   343 runs   (   19.00 ms per token,    52.63 tokens per second)
llama_perf_context_print:       total time =   21252.44 ms /   385 tokens
Interrupted by user

bartowski1182 · 2025-02-28T14:46:16Z

Converting Phi-4-mini-instruct with the latest llama.cpp release gave me this error:

kv_bytes += self._pack_val(val.value, val.type, add_vtype=True)
  File "/llama.cpp/gguf-py/gguf/gguf_writer.py", line 945, in _pack_val
    raise ValueError("All items in a GGUF array should be of the same type")
ValueError: All items in a GGUF array should be of the same type
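
For readers hitting the same message: it says that a list-valued metadata entry mixed element types. The snippet below is a minimal sketch of that kind of check, not the actual gguf-py code. The function name `check_homogeneous` and the sample values are hypothetical, and which metadata key triggered the error in this conversion is not stated in the traceback.

```python
from typing import Any, Sequence


def check_homogeneous(items: Sequence[Any]) -> type:
    """Return the common element type, or raise if the items mix types.

    Hypothetical helper for illustration; it only mirrors the error
    message shown in the traceback above.
    """
    if not items:
        raise ValueError("Cannot infer the element type of an empty array")
    first_type = type(items[0])
    for item in items[1:]:
        # Exact type comparison: an int next to a float counts as a mismatch.
        if type(item) is not first_type:
            raise ValueError("All items in a GGUF array should be of the same type")
    return first_type


# Made-up values: a list of floats passes, a list mixing int and float fails.
check_homogeneous([1.0, 1.05, 1.1])
try:
    check_homogeneous([1, 1.05, 1.1])
except ValueError as err:
    print(err)
```

Under this assumption, one possible workaround is to coerce such a list to a single type (for example `[float(x) for x in values]`) before it is written.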

…2108)

* Added Phi-4-mini-instruct support
* Update regex per ngxson
* Change the vocab base to Xenova/gpt-4o
* fix conversion update script
* no need to check longrope
* minor style fix
* fix python style

Co-authored-by: Nicholas Sparks <nisparks@microsoft.com>

nisparks and others added 6 commits February 27, 2025 17:19

Added Phi-4-mini-instruct support

1837951

Update regex per ngxson

3968c5a

Change the vocab base to Xenova/gpt-4o

958c7ca

fix conversion update script

4a33410

no need to check longrope

f08c231

minor style fix

46d2d1a

ngxson requested a review from ggerganov February 28, 2025 09:52

ngxson mentioned this pull request Feb 28, 2025

Add Phi-4-mini-instruct support #12099

Closed

github-actions bot added the "python" (python script changes) label Feb 28, 2025

fix python style

c375e96

ggerganov approved these changes Feb 28, 2025


ggerganov mentioned this pull request Feb 28, 2025

llama : refactor llama_kv_cache, llama_context and llm_build_context #11213

Draft

21 tasks

ericcurtin approved these changes Feb 28, 2025


ngxson merged commit c43a3e7 into master Feb 28, 2025
52 checks passed

ericcurtin deleted the xsn/phi-4 branch February 28, 2025 12:43

temsa mentioned this pull request Mar 1, 2025

phi4 multimodal and mini instruct support ollama/ollama#9387

Open

Animaxx added a commit to Animaxx/llama.cpp that referenced this pull request Mar 2, 2025

https://github.com/ggml-org/llama.cpp/pull/12108/

ede4e0f
