
GPTBigCodeForCausalLM, TP >= 2, output is broken. Is this a BUG? #2417

Closed
s-natsubori opened this issue Jan 11, 2024 · 2 comments


s-natsubori commented Jan 11, 2024

I updated the vLLM version from 0.2.1.post1 to 0.2.7.
Model generation is broken when tensor_parallel_size >= 2.
(tensor_parallel_size=1 is NOT broken.)

First I found it with my Starchat-β + AWQ model,
and it reproduces with the non-AWQ models HuggingFaceH4/starchat-beta
and bigcode/starcoderbase-1b too.
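A minimal offline-inference script along these lines reproduces it (a sketch, not my exact code; it assumes vLLM's standard LLM/SamplingParams API, and greedy sampling is only there to make runs comparable):

# minimal_repro.py -- illustrative sketch using vLLM's offline LLM API
from vllm import LLM, SamplingParams

prompt = (
    "import numpy as np\n"
    "import scipy as sp\n"
    "\n"
    "def hello_world():\n"
)

# tensor_parallel_size=1 generates sensible code on 0.2.7;
# tensor_parallel_size=2 produces the degenerate output shown below.
llm = LLM(model="bigcode/starcoderbase-1b", tensor_parallel_size=2)
params = SamplingParams(temperature=0.0, max_tokens=64)  # greedy decoding
outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)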

Base Env

  • Docker image based on nvcr.io/nvidia/pytorch:23.08-py3
  • NVIDIA Driver Version: 525.85.12
  • CUDA Version: 12.0
  • GPU: Tesla T4
  • Model: bigcode/starcoderbase-1b

Old Env

  • transformers==4.35.0
  • vllm==0.2.1.post1
  • xformers==0.0.22

Engine args

INFO 01-11 07:12:45 llm_engine.py:72] Initializing an LLM engine with config: model='/usr/local/model/llm', tokenizer='/usr/local/model/llm', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=8192, download_dir=None, load_format=auto, tensor_parallel_size=2, quantization=None, seed=0)

Input

import numpy as np
import scipy as sp

def hello_world():

Generated

    print('hello world')

def show_me_a_table(table):
    for row in table:
        print(row)

New Env

  • transformers==4.36.0
  • autoawq==0.1.7
  • vllm==0.2.7
  • xformers==0.0.23.post1

Engine args

INFO 01-11 07:14:24 llm_engine.py:73] Initializing an LLM engine with config: model='/usr/local/model/llm', tokenizer='/usr/local/model/llm', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.float16, max_seq_len=8192, download_dir=None, load_format=auto, tensor_parallel_size=2, quantization=None, enforce_eager=False, seed=0)

Input

import numpy as np
import scipy as sp

def hello_world():

Generated

return "Hello world"

def Hello_world():
    return "Hello"
    def Hello":
    return "Hello"
    def():
        def
        def(def):
            def
        def def():
            def
        def
        def def: def def defDEFdefDEFDEFDEFDEFDEFdefDEFDEFDEFdefdefdefdefDEFDEFdefdefdefdefdefdefdefdefdefdefdefdefdefdefDEFDEFdefdefdefdefdefdefdefdefdefDefDefdefdef

Or starchat-beta generates:

What
¿

Cu

What
¿

¿

With the old env, updating only vllm==0.2.2 is enough to break generation.
So I guess it is a bug in tensor parallelism.
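Since the TP size is the only switch that flips the behavior, a quick check is to run the same greedy prompt at both sizes and diff the outputs. A sketch (the script name and --tp flag are illustrative; each run should be a fresh process, since vLLM sets up its distributed workers once per process):

# repro_tp.py -- illustrative diagnostic; run once per tensor-parallel size:
#   python repro_tp.py --tp 1   # sensible completion on 0.2.7
#   python repro_tp.py --tp 2   # degenerate "def def def..." output on 0.2.7
import argparse

from vllm import LLM, SamplingParams

parser = argparse.ArgumentParser()
parser.add_argument("--tp", type=int, default=1)
args = parser.parse_args()

llm = LLM(model="bigcode/starcoderbase-1b", tensor_parallel_size=args.tp)
params = SamplingParams(temperature=0.0, max_tokens=64)  # greedy, so runs are comparable
out = llm.generate(["def hello_world():\n"], params)
print(out[0].outputs[0].text)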

esmeetu (Collaborator) commented Jan 16, 2024

Hi, @s-natsubori. Could you try the latest main branch? I think #2379 can solve this problem.

s-natsubori (Author) commented

@esmeetu
Thanks for following up.
It works perfectly. Amazing!!
