passing past_key_values as a tuple is deprecated, but unclear how to resolve #33489

Closed
RonanKMcGovern opened this issue Sep 14, 2024 · 9 comments · Fixed by #33541

@RonanKMcGovern

System Info


  • transformers version: 4.44.2
  • Platform: Linux-5.4.0-167-generic-x86_64-with-glibc2.35
  • Python version: 3.10.12
  • Huggingface_hub version: 0.24.7
  • Safetensors version: 0.4.5
  • Accelerate version: 0.34.2
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.1.1+cu121 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?: NA
  • Using GPU in script?: yes
  • GPU type: NVIDIA A40

Who can help?

@ArthurZucker

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer, SFTConfig
from accelerate import Accelerator
from peft import LoraConfig
import math, os, random
from datetime import datetime

# Select rows to train on
initial_rows = 50000
annealing_rows = 10000
eval_rows = 10000  # Only 10000 rows for evaluation

batch_size = 8
ga = 4

learning_rate=1e-3

def setup_environment():
    os.environ['WANDB_DISABLED'] = 'true'
    return Accelerator()

def load_model_and_tokenizer():
    model_name = "Trelis/80M-0.0090-cosmopedia"
    model_kwargs = {
        "torch_dtype": torch.bfloat16,
    }
    tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM-360M-Instruct")
    model = AutoModelForCausalLM.from_pretrained(model_name, **model_kwargs)
    return model, tokenizer

def load_and_preprocess_train_dataset(start_idx, num_rows):
    dataset = load_dataset("TIGER-Lab/WebInstructSub", split="train",
                           streaming=True
                          )
    dataset = dataset.skip(start_idx).take(num_rows)
    
    def format_instruction(example):
        return {
            "messages": [
                {"role": "user", "content": example["question"]},
                {"role": "assistant", "content": example["answer"]}
            ]
        }
    
    formatted_dataset = dataset.map(format_instruction)
    return formatted_dataset

def format_instruction_for_trainer(example):
    tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM-360M-Instruct")
    
    return tokenizer.apply_chat_template(
        example["messages"],
        truncation=True,
        padding="max_length",
        max_length=2048,
        tokenize=False,
    )

def load_and_preprocess_eval_dataset():
    dataset = load_dataset("TIGER-Lab/WebInstructSub", split="train")
    
    # Get the total number of rows in the dataset
    total_rows = len(dataset)
    
    # Generate a list of random indices
    random_indices = random.sample(range(total_rows), eval_rows)
    
    # Select the random rows
    dataset = dataset.select(random_indices)
    
    def format_instruction(example):
        return {
            "messages": [
                {"role": "user", "content": example["question"]},
                {"role": "assistant", "content": example["answer"]}
            ]
        }
    
    formatted_dataset = dataset.map(format_instruction, remove_columns=dataset.column_names)
    return formatted_dataset

def main():
    accelerator = setup_environment()
    
    model, tokenizer = load_model_and_tokenizer()
    print(model.device)
    
    # Combined training dataset (streaming)
    total_rows = initial_rows + annealing_rows
    train_dataset = load_and_preprocess_train_dataset(0, total_rows)
    
    # Evaluation dataset (non-streaming, random sample of eval_rows rows)
    eval_dataset = load_and_preprocess_eval_dataset()
    
    # Calculate steps
    num_epochs = 1
    total_steps = (total_rows * num_epochs) // (batch_size * ga)
    initial_steps = (initial_rows * num_epochs) // (batch_size * ga)
    
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    run_name = f"SFT-{total_rows}rows-lr{learning_rate}-{timestamp}"
    
    training_args = SFTConfig(
        output_dir=f"./Trelis_local/80M-0.015-cosmopedia-SFT-{run_name}",
        run_name=run_name,
        logging_dir=f"./logs/{run_name}",
        eval_strategy="steps",
        save_strategy="steps",
        report_to="tensorboard",
        num_train_epochs=num_epochs,
        per_device_train_batch_size=batch_size,
        per_device_eval_batch_size=batch_size,
        warmup_steps=20,
        logging_steps=int(total_steps * 0.1),
        eval_steps=int(total_steps * 0.1),
        save_steps=int(total_steps * 0.1),
        learning_rate=learning_rate,
        bf16=True,
        max_steps=total_steps,
        gradient_accumulation_steps=ga,
    )
    
    # Custom optimizer and scheduler passed to the trainer below (AdamW + cosine annealing)
    optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate)
    lr_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=total_steps)

    # Trainer
    trainer = SFTTrainer(
        model=model,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        args=training_args,
        tokenizer=tokenizer,
        max_seq_length=2048,
        formatting_func=format_instruction_for_trainer,
        optimizers=(optimizer, lr_scheduler),  # use the custom optimizer and scheduler defined above
    )
    
    trainer = accelerator.prepare(trainer)
    
    print(f"Starting instruction fine-tuning on {total_rows} rows of data (streaming)...")
    trainer.train()
    print("Instruction fine-tuning completed. Saving model...")
    
    trainer.save_model("./finetuned_model_small_messages")

if __name__ == "__main__":
    main()

Expected behavior

Getting this error:

We detected that you are passing `past_key_values` as a tuple and this is deprecated and will be removed in v4.43. Please use an appropriate `Cache` class (https://huggingface.co/docs/transformers/v4.41.3/en/internal/generation_utils#transformers.Cache)

I don't expect an error here, and it's unclear what I need to update if I'm to use an appropriate Cache class.

@ArthurZucker
Collaborator

hey! That's really not on your side, thanks for reporting!
cc @zucchini-nlp or @gante, probably our job to make sure that this does not appear when training, and, if SFTTrainer does use generate, that we pass DynamicCache instead! 🤗
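
For context, a minimal sketch of what "pass DynamicCache instead" looks like at the generate level (an illustration of the API, not the actual SFTTrainer change; the checkpoint name is just a placeholder reused from the script above):

from transformers import AutoModelForCausalLM, AutoTokenizer, DynamicCache

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM-360M-Instruct")
model = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM-360M-Instruct")

inputs = tokenizer("Hello, world", return_tensors="pt")
# Pass a Cache instance instead of the legacy tuple of tuples; generate fills it in place.
out = model.generate(**inputs, past_key_values=DynamicCache(), max_new_tokens=10)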

@zucchini-nlp
Member

Maybe we can start removing old cache support from a bunch of models like Llama and clean up for the v4.46 release. I didn't dig into when exactly the warning appears, but I guess it's the eval stage: since the model is no longer in self.training, the warning is raised.

A workaround for @RonanKMcGovern is to set use_cache=False after loading the model, because SFT doesn't really generate except for a small sample if needed. That is, model.generation_config.use_cache = False.
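
A minimal sketch of that workaround applied in load_model_and_tokenizer from the script above (the config flag is also switched off here as an extra precaution for the eval forward pass; that part is an assumption, not from the suggestion):

model = AutoModelForCausalLM.from_pretrained(model_name, **model_kwargs)
# As suggested: turn caching off so the deprecated tuple path is never hit.
model.generation_config.use_cache = False
# Extra precaution (assumption): also disable the flag used by the eval forward pass.
model.config.use_cache = False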

@RonanKMcGovern
Author

RonanKMcGovern commented Sep 17, 2024 via email

@gante
Member

gante commented Sep 17, 2024

@ArthurZucker @zucchini-nlp

The warning only pops up when a) past_key_values is passed; b) past_key_values is in the tuple of tuples format. This means the trainer is relying on the (deprecated) format internally, and thus we need to update it.

@zucchini-nlp do you have the bandwidth to update the trainer regarding this warning? 🤗 (since you'll be touching trainer to allow generation to happen)
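
(For anyone hitting this from their own code rather than via the trainer: the two formats interconvert with DynamicCache. A sketch, where legacy_past is a placeholder for the deprecated tuple of tuples:)

from transformers import DynamicCache

cache = DynamicCache.from_legacy_cache(legacy_past)  # tuple of tuples -> Cache object
legacy_past = cache.to_legacy_cache()                # Cache object -> tuple of tuples, if still needed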

@zucchini-nlp
Member

@gante actually we don't check for None-ness, which makes sense because if one wants to use the cache but hasn't passed any, we init it from scratch 😅

if (
    use_cache and not isinstance(past_key_values, Cache) and not self.training
):  # kept for BC (non `Cache` `past_key_values` inputs)

I think we can force use_cache=False within Trainer ourselves in that case

@gante
Member

gante commented Sep 17, 2024

actually we don't check for Noneness

@zucchini-nlp doh 🤦

I'm opening a PR asap to only throw this warning in the presence of non-None past_key_values
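
Roughly, the guard quoted above would gain a None check, something like this (a sketch of the intent, not the merged diff):

if (
    use_cache
    and past_key_values is not None            # only warn when a legacy cache was actually passed
    and not isinstance(past_key_values, Cache)
    and not self.training
):  # kept for BC (non `Cache` `past_key_values` inputs)
    # ... emit the deprecation warning and wrap the tuple in a DynamicCache ...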

@zucchini-nlp
Member

zucchini-nlp commented Sep 17, 2024

@gante isn't it BC-breaking, since previously we init an empty tuple when use_cache=True and never expected users to pass anything? At least for models that started supporting the cache class recently; Llama and the others can probably be fixed.

EDIT: my bad, warning only when None, hehe, gotcha

@thistlillo
Copy link

I am just an ordinary user of the HF classes, and I have absolutely no idea what this warning message, which I'm also getting, is about.

@ArthurZucker
Collaborator

Yeah, sorry, it should be gone now in most cases.
