We can fine-tune with LoRA from the command line using the Apple MLX framework. (To learn more about how the MLX framework works, see Inference Phi-3 with Apple MLX Framework.)
By default, the MLX framework expects the training, validation, and test data as JSONL files (train.jsonl, valid.jsonl, and test.jsonl), and uses them together with LoRA to run the fine-tuning job.
- JSONL data format:
{"text": "<|user|>\nWhen were iron maidens commonly used? <|end|>\n<|assistant|> \nIron maidens were never commonly used <|end|>"}
{"text": "<|user|>\nWhat did humans evolve from? <|end|>\n<|assistant|> \nHumans and apes evolved from a common ancestor <|end|>"}
{"text": "<|user|>\nIs 91 a prime number? <|end|>\n<|assistant|> \nNo, 91 is not a prime number <|end|>"}
....
- Our example uses TruthfulQA data, but the dataset is relatively small, so the fine-tuning results are not necessarily the best. We recommend that learners use better data suited to their own scenarios.
- The data format follows the Phi-3 chat template.
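As an illustration of the format above, here is a minimal sketch that wraps question/answer pairs in the same Phi-3 template and writes them as JSONL. The helper names `to_phi3_record` and `write_jsonl` are hypothetical and not part of MLX; the spacing around the tags simply mirrors the sample lines shown above.

```python
import json
import os
import tempfile


def to_phi3_record(question: str, answer: str) -> dict:
    """Wrap one question/answer pair in the Phi-3 template used in the samples above."""
    text = f"<|user|>\n{question} <|end|>\n<|assistant|> \n{answer} <|end|>"
    return {"text": text}


def write_jsonl(path: str, pairs: list[tuple[str, str]]) -> None:
    """Write one JSON object per line, which is what the JSONL format requires."""
    with open(path, "w", encoding="utf-8") as f:
        for question, answer in pairs:
            record = to_phi3_record(question, answer)
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


if __name__ == "__main__":
    # Demo only: write to a temporary folder so no real data is overwritten.
    pairs = [("Is 91 a prime number?", "No, 91 is not a prime number")]
    with tempfile.TemporaryDirectory() as tmp:
        out_path = os.path.join(tmp, "train.jsonl")
        write_jsonl(out_path, pairs)
        with open(out_path, encoding="utf-8") as f:
            print(f.read(), end="")
```

In practice you would call `write_jsonl` once per split with your own data.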
Download the data from this link and place all the .jsonl files in the data folder.
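Before starting training, it can help to confirm that the data folder is laid out as expected. The following is a rough sketch, not part of MLX: `check_data_dir` is a hypothetical helper, and it assumes that all three files (train.jsonl, valid.jsonl, test.jsonl) should be present and that every record is an object with a string `text` field, as in the samples above.

```python
import json
from pathlib import Path

# Assumed file names, taken from the "{train, valid, test}.jsonl" convention.
REQUIRED_FILES = ("train.jsonl", "valid.jsonl", "test.jsonl")


def check_data_dir(data_dir: str) -> list[str]:
    """Return a list of problems found; an empty list means the folder looks usable."""
    problems = []
    for name in REQUIRED_FILES:
        path = Path(data_dir) / name
        if not path.is_file():
            problems.append(f"missing file: {name}")
            continue
        with path.open(encoding="utf-8") as f:
            for line_no, line in enumerate(f, start=1):
                if not line.strip():
                    continue  # tolerate blank lines
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    problems.append(f"{name}:{line_no}: not valid JSON")
                    continue
                if not isinstance(record, dict) or not isinstance(record.get("text"), str):
                    problems.append(f"{name}:{line_no}: expected an object with a string 'text' field")
    return problems


if __name__ == "__main__":
    for problem in check_data_dir("./data"):
        print(problem)
```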
Run this command in the terminal:
python -m mlx_lm.lora --model microsoft/Phi-3-mini-4k-instruct --train --data ./data --iters 1000
- This is LoRA fine-tuning; the MLX framework has not published QLoRA.
- You can change some arguments in a YAML config file (lora_config.yaml in this example), such as:
# The path to the local model directory or Hugging Face repo.
model: "microsoft/Phi-3-mini-4k-instruct"
# Whether or not to train (boolean)
train: true
# Directory with {train, valid, test}.jsonl files
data: "data"
# The PRNG seed
seed: 0
# Number of layers to fine-tune
lora_layers: 32
# Minibatch size.
batch_size: 1
# Iterations to train for.
iters: 1000
# Number of validation batches, -1 uses the entire validation set.
val_batches: 25
# Adam learning rate.
learning_rate: 1e-6
# Number of training steps between loss reporting.
steps_per_report: 10
# Number of training steps between validations.
steps_per_eval: 200
# Load path to resume training with the given adapter weights.
resume_adapter_file: null
# Save/load path for the trained adapter weights.
adapter_path: "adapters"
# Save the model every N iterations.
save_every: 1000
# Evaluate on the test set after training
test: false
# Number of test set batches, -1 uses the entire test set.
test_batches: 100
# Maximum sequence length.
max_seq_length: 2048
# Use gradient checkpointing to reduce memory use.
grad_checkpoint: true
# LoRA parameters can only be specified in a config file
lora_parameters:
  # The layer keys to apply LoRA to.
  # These will be applied for the last lora_layers
  keys: ["o_proj","qkv_proj"]
  rank: 64
  scale: 1
  dropout: 0.1
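To get a feel for what `rank` and `lora_layers` mean for training cost, here is a back-of-the-envelope sketch of how many parameters LoRA would train with the settings above. The layer shapes are assumptions that this document does not state (hidden size 3072 and a fused `qkv_proj` with output size 3 × 3072 for Phi-3-mini), so treat the result as an estimate rather than a figure reported by MLX.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """LoRA adds two low-rank matrices per adapted layer: A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


# Assumed shapes for Phi-3-mini (not taken from this document).
HIDDEN = 3072
SHAPES = {
    "qkv_proj": (HIDDEN, 3 * HIDDEN),  # fused query/key/value projection
    "o_proj": (HIDDEN, HIDDEN),        # attention output projection
}


def total_trainable(rank: int, lora_layers: int) -> int:
    """Sum the LoRA parameters over the adapted keys, repeated for each adapted block."""
    per_block = sum(lora_param_count(d_in, d_out, rank) for d_in, d_out in SHAPES.values())
    return per_block * lora_layers


if __name__ == "__main__":
    print(total_trainable(rank=64, lora_layers=32))  # 37748736 under the assumed shapes
```

Under these assumptions, halving `rank` halves the number of trainable parameters, which is one of the simpler knobs to turn if memory is tight.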
Run this command in the terminal:
python -m mlx_lm.lora --config lora_config.yaml
You can run the model with the fine-tuned adapter in the terminal, like this:
python -m mlx_lm.generate --model microsoft/Phi-3-mini-4k-instruct --adapter-path ./adapters --max-tokens 2048 --prompt "Why do chameleons change colors? " --eos-token "<|end|>"
Then run the original model to compare the results:
python -m mlx_lm.generate --model microsoft/Phi-3-mini-4k-instruct --max-tokens 2048 --prompt "Why do chameleons change colors? " --eos-token "<|end|>"
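If you want to run the two generations back to back, the commands can be wrapped in a small script. This is only a sketch: `build_generate_cmd` and `compare` are hypothetical helpers that shell out to the same `mlx_lm.generate` command line, and they assume `mlx_lm` is installed in the current Python environment.

```python
import subprocess
import sys
from typing import Optional

MODEL = "microsoft/Phi-3-mini-4k-instruct"


def build_generate_cmd(prompt: str, adapter_path: Optional[str] = None,
                       max_tokens: int = 2048) -> list[str]:
    """Build the mlx_lm.generate command, adding the adapter only when one is given."""
    cmd = [
        sys.executable, "-m", "mlx_lm.generate",
        "--model", MODEL,
        "--max-tokens", str(max_tokens),
        "--prompt", prompt,
        "--eos-token", "<|end|>",
    ]
    if adapter_path is not None:
        cmd += ["--adapter-path", adapter_path]
    return cmd


def compare(prompt: str, adapter_path: str = "./adapters") -> None:
    """Run the fine-tuned model first, then the original model, on the same prompt."""
    for label, path in (("fine-tuned", adapter_path), ("original", None)):
        print(f"=== {label} ===")
        subprocess.run(build_generate_cmd(prompt, path), check=True)


# Example (requires mlx_lm and the trained adapters):
# compare("Why do chameleons change colors? ")
```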
Compare the fine-tuned model's output with the original model's output. Then merge the adapter into the original model:
python -m mlx_lm.fuse --model microsoft/Phi-3-mini-4k-instruct
Before converting the merged model to GGUF, set up your llama.cpp environment:
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
pip install -r requirements.txt
python convert.py 'Your merged model path' --outfile phi-3-mini-ft.gguf --outtype f16
Note:
- Conversion currently supports fp32, fp16, and INT8 output types.
- The merged model is missing tokenizer.model; download it from https://huggingface.co/microsoft/Phi-3-mini-4k-instruct
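After conversion, a quick way to check that the output is a GGUF file is to read its header. GGUF files begin with the four magic bytes `GGUF`, followed by a little-endian 32-bit format version. The sketch below only inspects those first eight bytes; `read_gguf_version` is a hypothetical helper and does not validate the rest of the file.

```python
import struct


def read_gguf_version(path: str) -> int:
    """Return the GGUF format version, or raise ValueError if the magic bytes are wrong."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != b"GGUF":
        raise ValueError(f"{path} does not look like a GGUF file")
    (version,) = struct.unpack("<I", header[4:8])
    return version


# Example (after running the conversion above):
# print(read_gguf_version("phi-3-mini-ft.gguf"))
```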
Create an Ollama Modelfile (if you have not installed Ollama, see [Ollama QuickStart](../02.QuickStart/Ollama_QuickStart.md)):
FROM ./phi-3-mini-ft.gguf
PARAMETER stop "<|end|>"
Run these commands in the terminal:
ollama create phi3ft -f Modelfile
ollama run phi3ft "Why do chameleons change colors?"
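Once the model is created, it can also be queried from code through Ollama's local REST API. The sketch below assumes the Ollama server is running at its default local address (`http://localhost:11434`) and that the model was created under the name `phi3ft` as above; `build_payload` and `ask` are hypothetical helper names.

```python
import json
import urllib.request

# Assumed default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(prompt: str, model: str = "phi3ft") -> bytes:
    """Encode a non-streaming generate request as JSON bytes."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")


def ask(prompt: str, model: str = "phi3ft") -> str:
    """Send the prompt to the local Ollama server and return the generated text."""
    request = urllib.request.Request(
        OLLAMA_URL,
        data=build_payload(prompt, model),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


# Example (requires a running Ollama server with the phi3ft model):
# print(ask("Why do chameleons change colors?"))
```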
Congratulations! You have mastered fine-tuning with the MLX framework.