fix: remove extra char from configmap (#526)
This PR removes an extra newline/space character (probably introduced by a different encoding) from the YAML. That extra character causes a display issue: the whole config file is serialized as a quoted string with "\n" escapes for newlines, so the resulting ConfigMap is not readable.
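
For context, a YAML emitter can only use the readable `|` block style when every line of the string survives a round trip; a space sitting right before a newline (or an oddly encoded whitespace character) rules that out and forces a fallback to a double-quoted string full of `\n` escapes. A minimal sketch of that rule, using PyYAML as a stand-in for the Go YAML serializer kubectl actually goes through (the key and the abbreviated config below are purely illustrative):

```python
import yaml

# A trimmed-down stand-in for the training config stored in the ConfigMap.
clean = "training_config:\n  ModelConfig:\n    torch_dtype: bfloat16\n"

# Same content, but with a stray space before the first newline --
# the kind of extra character this PR removes.
dirty = "training_config: \n  ModelConfig:\n    torch_dtype: bfloat16\n"

# Ask the emitter to prefer literal block style ('|'); it only honours the
# request when the string can legally be written that way.
print(yaml.dump({"training_config.yaml": clean}, default_style="|"))
# -> emitted as a block scalar, e.g.
#    "training_config.yaml": |
#      training_config:
#        ModelConfig:
#          torch_dtype: bfloat16

print(yaml.dump({"training_config.yaml": dirty}, default_style="|"))
# -> the stray space disqualifies block style, so the value falls back to a
#    double-quoted string with "\n" escapes, like the unreadable output below.
```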

Before the change:
```
 k get configmap/lora-params-template -n workspace -o yaml
apiVersion: v1
data:
  training_config.yaml: "training_config:\n  ModelConfig: # Configurable Parameters:
    https://huggingface.co/docs/transformers/v4.40.2/en/model_doc/auto#transformers.AutoModelForCausalLM.from_pretrained\n
    \   torch_dtype: \"bfloat16\"\n    local_files_only: true\n    device_map: \"auto\"\n\n
    \ QuantizationConfig: # Configurable Parameters: https://huggingface.co/docs/transformers/v4.40.2/en/main_classes/quantization#transformers.BitsAndBytesConfig\n
    \   load_in_4bit: false\n\n  LoraConfig: # Configurable Parameters: https://huggingface.co/docs/peft/v0.8.2/en/package_reference/lora#peft.LoraConfig\n
    \   r: 8\n    lora_alpha: 8\n    lora_dropout: 0.0\n\n  TrainingArguments: # Configurable
    Parameters: https://huggingface.co/docs/transformers/v4.40.2/en/main_classes/trainer#transformers.TrainingArguments\n
    \   output_dir: \"/mnt/results\"\n    # num_train_epochs: <Defaults to 3, adjustable>\n
    \   ddp_find_unused_parameters: false # Default to false to prevent errors during
    distributed training.\n    save_strategy: \"epoch\" # Default to save at end of
    each epoch\n    per_device_train_batch_size: 1\n\n  DataCollator: # Configurable
    Parameters: https://huggingface.co/docs/transformers/v4.40.2/en/main_classes/data_collator#transformers.DataCollatorForLanguageModeling\n
    \   mlm: true # Default setting; included to show DataCollator can be updated.\n\n
    \ DatasetConfig: # Configurable Parameters: https://github.com/Azure/kaito/blob/main/presets/tuning/text-generation/cli.py#L44\n
    \   shuffle_dataset: true\n    train_test_split: 1 # Default to using all data
    for fine-tuning due to strong pre-trained baseline and typically limited fine-tuning
    data.\n  # Expected Dataset format: \n"
```

After the change: 
```
k get configmap/lora-params-template -n workspace -o yaml
apiVersion: v1
data:
  training_config.yaml: |
    training_config:
      ModelConfig: # Configurable Parameters: https://huggingface.co/docs/transformers/v4.40.2/en/model_doc/auto#transformers.AutoModelForCausalLM.from_pretrained
        torch_dtype: "bfloat16"
        local_files_only: true
        device_map: "auto"

      QuantizationConfig: # Configurable Parameters: https://huggingface.co/docs/transformers/v4.40.2/en/main_classes/quantization#transformers.BitsAndBytesConfig
        load_in_4bit: false

      LoraConfig: # Configurable Parameters: https://huggingface.co/docs/peft/v0.8.2/en/package_reference/lora#peft.LoraConfig
        r: 8
        lora_alpha: 8
        lora_dropout: 0.0

      TrainingArguments: # Configurable Parameters: https://huggingface.co/docs/transformers/v4.40.2/en/main_classes/trainer#transformers.TrainingArguments
        output_dir: "/mnt/results"
        # num_train_epochs: <Defaults to 3, adjustable>
        ddp_find_unused_parameters: false # Default to false to prevent errors during distributed training.
        save_strategy: "epoch" # Default to save at end of each epoch
        per_device_train_batch_size: 1

      DataCollator: # Configurable Parameters: https://huggingface.co/docs/transformers/v4.40.2/en/main_classes/data_collator#transformers.DataCollatorForLanguageModeling
        mlm: true # Default setting; included to show DataCollator can be updated.

      DatasetConfig: # Configurable Parameters: https://github.com/Azure/kaito/blob/main/presets/tuning/text-generation/cli.py#L44
        shuffle_dataset: true
        train_test_split: 1 # Default to using all data for fine-tuning due to strong pre-trained baseline and typically limited fine-tuning data.
        # {"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What's the capital of France?"}, {"role": "assistant", "content": "Paris, as if everyone doesn't know that already."}]}
        # e.g. https://huggingface.co/datasets/philschmid/dolly-15k-oai-style
```
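
As an aside, one way this kind of invisible character could be caught before it lands in a chart template is a quick scan for trailing whitespace and non-ASCII bytes. A rough, hypothetical check (the paths are the two templates touched by this PR; the helper itself is not part of the repo):

```python
import pathlib
import re

def find_stray_chars(path: str) -> None:
    """Flag lines with trailing whitespace or non-ASCII characters --
    the two likely shapes of the 'extra char' this PR removes."""
    text = pathlib.Path(path).read_text(encoding="utf-8")
    for lineno, line in enumerate(text.splitlines(), start=1):
        if re.search(r"[ \t]+$", line):
            print(f"{path}:{lineno}: trailing whitespace: {line!r}")
        if re.search(r"[^\x00-\x7f]", line):
            print(f"{path}:{lineno}: non-ASCII character: {line!r}")

find_stray_chars("charts/kaito/workspace/templates/lora-params.yaml")
find_stray_chars("charts/kaito/workspace/templates/qlora-params.yaml")
```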
Fei-Guo authored Jul 18, 2024
1 parent 0cbb06f commit cefdab9
Showing 2 changed files with 6 additions and 7 deletions.
7 changes: 3 additions & 4 deletions charts/kaito/workspace/templates/lora-params.yaml
```
@@ -22,7 +22,7 @@ data:
 TrainingArguments: # Configurable Parameters: https://huggingface.co/docs/transformers/v4.40.2/en/main_classes/trainer#transformers.TrainingArguments
 output_dir: "/mnt/results"
 # num_train_epochs: <Defaults to 3, adjustable>
-ddp_find_unused_parameters: false # Default to false to prevent errors during distributed training.
+ddp_find_unused_parameters: false # Default to false to prevent errors during distributed training
 save_strategy: "epoch" # Default to save at end of each epoch
 per_device_train_batch_size: 1
@@ -31,8 +31,7 @@ data:
 DatasetConfig: # Configurable Parameters: https://github.com/Azure/kaito/blob/main/presets/tuning/text-generation/cli.py#L44
 shuffle_dataset: true
-train_test_split: 1 # Default to using all data for fine-tuning due to strong pre-trained baseline and typically limited fine-tuning data.
-# Expected Dataset format:
+train_test_split: 1 # Default to using all data for fine-tuning due to strong pre-trained baseline and typically limited fine-tuning data
+# Expected Dataset format:
 # {"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What's the capital of France?"}, {"role": "assistant", "content": "Paris, as if everyone doesn't know that already."}]}
 # e.g. https://huggingface.co/datasets/philschmid/dolly-15k-oai-style
```
6 changes: 3 additions & 3 deletions charts/kaito/workspace/templates/qlora-params.yaml
```
@@ -25,7 +25,7 @@ data:
 TrainingArguments: # Configurable Parameters: https://huggingface.co/docs/transformers/v4.40.2/en/main_classes/trainer#transformers.TrainingArguments
 output_dir: "/mnt/results"
 # num_train_epochs: <Defaults to 3, adjustable>
-ddp_find_unused_parameters: false # Default to false to prevent errors during distributed training.
+ddp_find_unused_parameters: false # Default to false to prevent errors during distributed training
 save_strategy: "epoch" # Default to save at end of each epoch
 per_device_train_batch_size: 1
@@ -34,7 +34,7 @@ data:
 DatasetConfig: # Configurable Parameters: https://github.com/Azure/kaito/blob/main/presets/tuning/text-generation/cli.py#L44
 shuffle_dataset: true
-train_test_split: 1 # Default to using all data for fine-tuning due to strong pre-trained baseline and typically limited fine-tuning data.
-# Expected Dataset format:
+train_test_split: 1 # Default to using all data for fine-tuning due to strong pre-trained baseline and typically limited fine-tuning data
+# Expected Dataset format:
 # {"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What's the capital of France?"}, {"role": "assistant", "content": "Paris, as if everyone doesn't know that already."}]}
 # e.g. https://huggingface.co/datasets/philschmid/dolly-15k-oai-style
```
