
Bug: llama-minicpmv-cli does not accept "mmproj" or "image" arguments when compiled for Android #9420

Closed
theoctopusride opened this issue Sep 10, 2024 · 0 comments · Fixed by #9429
Labels: bug-unconfirmed, medium severity

theoctopusride commented Sep 10, 2024

What happened?

The "--mmproj" and "--image" arguments are rejected as invalid by the llama-minicpmv-cli binary, yet both appear in the "example usage" line that the binary itself prints. Neither argument is listed in the output of "llama-minicpmv-cli --help". I am attempting to run on an Android phone (Qualcomm 8650) through adb shell.
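For reference, an invocation of roughly this shape fails on the device (the model and mmproj file names below are placeholders modeled on the example usage the binary prints, not my exact files):

oriole:/data/local/tmp $ ./llama-minicpmv-cli -m <model.gguf> --mmproj <mmproj-model-f16.gguf> --image <image.jpg> -p "describe the image in detail."

Both --mmproj and --image are reported as invalid arguments.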

Name and Version

version: 3721 (49006c6) built with Android (11349228, +pgo, +bolt, +lto, -mlgo, based on r487747e) clang version 17.0.2 (https://android.googlesource.com/toolchain/llvm-project d9f89f4d16663d5012e5c09495f3b30ece3d2362) for aarch64-unknown-linux-android34

What operating system are you seeing the problem on?

No response

Relevant log output

no "mmproj" or "image" listed in help. When either of these arguments are used (as in the example usage at the bottom of this text), both error out as "invalid arguments"

oriole:/data/local/tmp $ ./llama-minicpmv-cli --help                                                                                                                                                                           
----- common params -----                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
----- sampling params -----                                                                                                                                                                                                                                                                                                                                                                                                                                   
--samplers SAMPLERS                     samplers that will be used for generation in the order, separated by ';'
                                        (default: top_k;tfs_z;typ_p;top_p;min_p;temperature)
-s,    --seed SEED                      RNG seed (default: 4294967295, use random seed for 4294967295)
--sampling-seq SEQUENCE                 simplified sequence for samplers that will be used (default: kfypmt)
--ignore-eos                            ignore end of stream token and continue generating (implies --logit-bias EOS-inf)
--penalize-nl                           penalize newline tokens (default: false)
--temp N                                temperature (default: 0.8)
--top-k N                               top-k sampling (default: 40, 0 = disabled)
--top-p N                               top-p sampling (default: 0.9, 1.0 = disabled)
--min-p N                               min-p sampling (default: 0.1, 0.0 = disabled)
--tfs N                                 tail free sampling, parameter z (default: 1.0, 1.0 = disabled)
--typical N                             locally typical sampling, parameter p (default: 1.0, 1.0 = disabled)
--repeat-last-n N                       last n tokens to consider for penalize (default: 64, 0 = disabled, -1 = ctx_size)
--repeat-penalty N                      penalize repeat sequence of tokens (default: 1.0, 1.0 = disabled)
--presence-penalty N                    repeat alpha presence penalty (default: 0.0, 0.0 = disabled)
--frequency-penalty N                   repeat alpha frequency penalty (default: 0.0, 0.0 = disabled)
--dynatemp-range N                      dynamic temperature range (default: 0.0, 0.0 = disabled)
--dynatemp-exp N                        dynamic temperature exponent (default: 1.0)
--mirostat N                            use Mirostat sampling.
                                        Top K, Nucleus, Tail Free and Locally Typical samplers are ignored if used.
                                        (default: 0, 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0)
--mirostat-lr N                         Mirostat learning rate, parameter eta (default: 0.1)
--mirostat-ent N                        Mirostat target entropy, parameter tau (default: 5.0)
-l,    --logit-bias TOKEN_ID(+/-)BIAS   modifies the likelihood of token appearing in the completion,
                                        i.e. `--logit-bias 15043+1` to increase likelihood of token ' Hello',
                                        or `--logit-bias 15043-1` to decrease likelihood of token ' Hello'
--grammar GRAMMAR                       BNF-like grammar to constrain generations (see samples in grammars/ dir) (default: '')
--grammar-file FNAME                    file to read grammar from
-j,    --json-schema SCHEMA             JSON schema to constrain generations (https://json-schema.org/), e.g. `{}` for any JSON object
                                        For schemas w/ external $refs, use --grammar + example/json_schema_to_grammar.py instead

----- example-specific params -----

-h,    --help, --usage                  print usage and exit
--version                               show version and build info
-v,    --verbose                        print verbose information
--verbosity N                           set specific verbosity level (default: 0)
-t,    --threads N                      number of threads to use during generation (default: -1)
                                        (env: LLAMA_ARG_THREADS)
-tb,   --threads-batch N                number of threads to use during batch and prompt processing (default: same as --threads)
-C,    --cpu-mask M                     CPU affinity mask: arbitrarily long hex. Complements cpu-range (default: "")
-Cr,   --cpu-range lo-hi                range of CPUs for affinity. Complements --cpu-mask
--cpu-strict <0|1>                      use strict CPU placement (default: 0)
--prio N                                set process/thread priority : 0-normal, 1-medium, 2-high, 3-realtime (default: 0)
--poll <0...100>                        use polling level to wait for work (0 - no polling, default: 50)
-Cb,   --cpu-mask-batch M               CPU affinity mask: arbitrarily long hex. Complements cpu-range-batch (default: same as --cpu-mask)
-Crb,  --cpu-range-batch lo-hi          ranges of CPUs for affinity. Complements --cpu-mask-batch
--cpu-strict-batch <0|1>                use strict CPU placement (default: same as --cpu-strict)
--prio-batch N                          set process/thread priority : 0-normal, 1-medium, 2-high, 3-realtime (default: 0)
--poll-batch <0|1>                      use polling to wait for work (default: same as --poll)
-c,    --ctx-size N                     size of the prompt context (default: 0, 0 = loaded from model)
                                        (env: LLAMA_ARG_CTX_SIZE)
-n,    --predict, --n-predict N         number of tokens to predict (default: -1, -1 = infinity, -2 = until context filled)
                                        (env: LLAMA_ARG_N_PREDICT)
-b,    --batch-size N                   logical maximum batch size (default: 2048)
                                        (env: LLAMA_ARG_BATCH)
-ub,   --ubatch-size N                  physical maximum batch size (default: 512)
                                        (env: LLAMA_ARG_UBATCH)
--keep N                                number of tokens to keep from the initial prompt (default: 0, -1 = all)
-fa,   --flash-attn                     enable Flash Attention (default: disabled)
                                        (env: LLAMA_ARG_FLASH_ATTN)
-p,    --prompt PROMPT                  prompt to start generation with
-f,    --file FNAME                     a file containing the prompt (default: none)
-bf,   --binary-file FNAME              binary file containing the prompt (default: none)
-e,    --escape                         process escapes sequences (\n, \r, \t, \', \", \\) (default: true)
--no-escape                             do not process escape sequences
--rope-scaling {none,linear,yarn}       RoPE frequency scaling method, defaults to linear unless specified by the model
--rope-scale N                          RoPE context scaling factor, expands context by a factor of N
--rope-freq-base N                      RoPE base frequency, used by NTK-aware scaling (default: loaded from model)
--rope-freq-scale N                     RoPE frequency scaling factor, expands context by a factor of 1/N
--yarn-orig-ctx N                       YaRN: original context size of model (default: 0 = model training context size)
--yarn-ext-factor N                     YaRN: extrapolation mix factor (default: -1.0, 0.0 = full interpolation)
--yarn-attn-factor N                    YaRN: scale sqrt(t) or attention magnitude (default: 1.0)
--yarn-beta-slow N                      YaRN: high correction dim or alpha (default: 1.0)
--yarn-beta-fast N                      YaRN: low correction dim or beta (default: 32.0)
-gan,  --grp-attn-n N                   group-attention factor (default: 1)
-gaw,  --grp-attn-w N                   group-attention width (default: 512.0)
-dkvc, --dump-kv-cache                  verbose print of the KV cache
-nkvo, --no-kv-offload                  disable KV offload
-ctk,  --cache-type-k TYPE              KV cache data type for K (default: f16)
-ctv,  --cache-type-v TYPE              KV cache data type for V (default: f16)
-dt,   --defrag-thold N                 KV cache defragmentation threshold (default: -1.0, < 0 - disabled)
                                        (env: LLAMA_ARG_DEFRAG_THOLD)
-np,   --parallel N                     number of parallel sequences to decode (default: 1)
--mlock                                 force system to keep model in RAM rather than swapping or compressing
--no-mmap                               do not memory-map model (slower load but may reduce pageouts if not using mlock)
--numa TYPE                             attempt optimizations that help on some NUMA systems
                                        - distribute: spread execution evenly over all nodes
                                        - isolate: only spawn threads on CPUs on the node that execution started on
                                        - numactl: use the CPU map provided by numactl
                                        if run without this previously, it is recommended to drop the system page cache before using this
                                        see https://github.com/ggerganov/llama.cpp/issues/1437
-ngl,  --gpu-layers, --n-gpu-layers N   number of layers to store in VRAM
                                        (env: LLAMA_ARG_N_GPU_LAYERS)
-sm,   --split-mode {none,layer,row}    how to split the model across multiple GPUs, one of:
                                        - none: use one GPU only
                                        - layer (default): split layers and KV across GPUs
                                        - row: split rows across GPUs
-ts,   --tensor-split N0,N1,N2,...      fraction of the model to offload to each GPU, comma-separated list of proportions, e.g. 3,1
-mg,   --main-gpu INDEX                 the GPU to use for the model (with split-mode = none), or for intermediate results and KV (with split-mode = row) (default: 0)
--check-tensors                         check model tensor data for invalid values (default: false)
--override-kv KEY=TYPE:VALUE            advanced option to override model metadata by key. may be specified multiple times.
                                        types: int, float, bool, str. example: --override-kv tokenizer.ggml.add_bos_token=bool:false
--lora FNAME                            path to LoRA adapter (can be repeated to use multiple adapters)
--lora-scaled FNAME SCALE               path to LoRA adapter with user defined scaling (can be repeated to use multiple adapters)
--control-vector FNAME                  add a control vector
                                        note: this argument can be repeated to add multiple control vectors
--control-vector-scaled FNAME SCALE     add a control vector with user defined scaling SCALE
                                        note: this argument can be repeated to add multiple scaled control vectors
--control-vector-layer-range START END  layer range to apply the control vector(s) to, start and end inclusive
-m,    --model FNAME                    model path (default: `models/$filename` with filename from `--hf-file` or `--model-url` if set, otherwise models/7B/ggml-model-f16.gguf)
                                        (env: LLAMA_ARG_MODEL)
-mu,   --model-url MODEL_URL            model download url (default: unused)
                                        (env: LLAMA_ARG_MODEL_URL)
-hfr,  --hf-repo REPO                   Hugging Face model repository (default: unused)
                                        (env: LLAMA_ARG_HF_REPO)
-hff,  --hf-file FILE                   Hugging Face model file (default: unused)
                                        (env: LLAMA_ARG_HF_FILE)
-hft,  --hf-token TOKEN                 Hugging Face access token (default: value from HF_TOKEN environment variable)
                                        (env: HF_TOKEN)
-ld,   --logdir LOGDIR                  path under which to save YAML logs (no logging if unset)
--log-test                              Log test
--log-disable                           Log disable
--log-enable                            Log enable
--log-new                               Log new
--log-append                            Log append
--log-file FNAME                        Log file

example usage: ./llama-minicpmv-cli -m <llava-v1.5-7b/ggml-model-q5_k.gguf> --mmproj <llava-v1.5-7b/mmproj-model-f16.gguf> --image <path/to/an/image.jpg> --image <path/to/another/image.jpg> [--temp 0.1] [-p "describe the image in detail."]
note: a lower temperature value like 0.1 is recommended for better quality.
theoctopusride added the bug-unconfirmed and medium severity labels on Sep 10, 2024