
GPTBigCodeForCausalLM, TP >= 2, output is broken. Is this a BUG? #2417

Closed
s-natsubori opened this issue Jan 11, 2024 · 2 comments


s-natsubori commented Jan 11, 2024

I updated the vLLM version from 0.2.1.post1 to 0.2.7.
Model generation is broken when tensor_parallel_size >= 2.
(tensor_parallel_size=1 is NOT broken.)

First I found it with my Starchat-β + AWQ model,
and it reproduces with the non-AWQ models HuggingFaceH4/starchat-beta
and bigcode/starcoderbase-1b too.
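A minimal offline-inference script along these lines reproduces it (a sketch, not my exact code; it assumes vLLM's standard LLM/SamplingParams API, and greedy sampling is only there to make runs comparable):

# minimal_repro.py -- illustrative sketch using vLLM's offline LLM API
from vllm import LLM, SamplingParams

prompt = (
    "import numpy as np\n"
    "import scipy as sp\n"
    "\n"
    "def hello_world():\n"
)

# tensor_parallel_size=1 generates sensible code on 0.2.7;
# tensor_parallel_size=2 produces the degenerate output shown below.
llm = LLM(model="bigcode/starcoderbase-1b", tensor_parallel_size=2)
params = SamplingParams(temperature=0.0, max_tokens=64)  # greedy decoding
outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)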

Base Env

  • Docker image based on nvcr.io/nvidia/pytorch:23.08-py3
  • NVIDIA Driver Version: 525.85.12
  • CUDA Version: 12.0
  • GPU: Tesla T4
  • Model: bigcode/starcoderbase-1b

Old Env

  • transformers==4.35.0
  • vllm==0.2.1.post1
  • xformers==0.0.22

Engine args

INFO 01-11 07:12:45 llm_engine.py:72] Initializing an LLM engine with config: model='/usr/local/model/llm', tokenizer='/usr/local/model/llm', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=8192, download_dir=None, load_format=auto, tensor_parallel_size=2, quantization=None, seed=0)

Input

import numpy as np
import scipy as sp

def hello_world():

Generated

    print('hello world')

def show_me_a_table(table):
    for row in table:
        print(row)

New Env

  • transformers==4.36.0
  • autoawq==0.1.7
  • vllm==0.2.7
  • xformers==0.0.23.post1

Engine args

INFO 01-11 07:14:24 llm_engine.py:73] Initializing an LLM engine with config: model='/usr/local/model/llm', tokenizer='/usr/local/model/llm', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.float16, max_seq_len=8192, download_dir=None, load_format=auto, tensor_parallel_size=2, quantization=None, enforce_eager=False, seed=0)

Input

import numpy as np
import scipy as sp

def hello_world():

Generated

return "Hello world"

def Hello_world():
    return "Hello"
    def Hello":
    return "Hello"
    def():
        def
        def(def):
            def
        def def():
            def
        def
        def def: def def defDEFdefDEFDEFDEFDEFDEFdefDEFDEFDEFdefdefdefdefDEFDEFdefdefdefdefdefdefdefdefdefdefdefdefdefdefDEFDEFdefdefdefdefdefdefdefdefdefDefDefdefdef

Or starchat-beta generates:

What
¿

Cu

What
¿

¿

With the old env, updating only vllm==0.2.2 is enough to break generation.
So I guess it is a bug in tensor parallelism.
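Since the TP size is the only switch that flips the behavior, a quick check is to run the same greedy prompt at both sizes and diff the outputs. A sketch (the script name and --tp flag are illustrative; each run should be a fresh process, since vLLM sets up its distributed workers once per process):

# repro_tp.py -- illustrative diagnostic; run once per tensor-parallel size:
#   python repro_tp.py --tp 1   # sensible completion on 0.2.7
#   python repro_tp.py --tp 2   # degenerate "def def def..." output on 0.2.7
import argparse

from vllm import LLM, SamplingParams

parser = argparse.ArgumentParser()
parser.add_argument("--tp", type=int, default=1)
args = parser.parse_args()

llm = LLM(model="bigcode/starcoderbase-1b", tensor_parallel_size=args.tp)
params = SamplingParams(temperature=0.0, max_tokens=64)  # greedy, so runs are comparable
out = llm.generate(["def hello_world():\n"], params)
print(out[0].outputs[0].text)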

esmeetu (Collaborator) commented Jan 16, 2024

Hi, @s-natsubori. Could you try the latest main branch? I think #2379 can solve this problem.

s-natsubori (Author) commented

@esmeetu
Thanks for following up.
It works perfectly. Amazing!!
