Adding GPT-NeoX #164

aflah02 · 2024-02-07T21:08:11Z

I followed the instructions here to add GPT-NeoX support, which would bring support for the Pythia model family and other models with a similar architecture.

Reference: #157 (comment)

FIXED (Keeping Logs for Future Reference):
I was able to debug most errors, but I'm stuck on this one, which occurs once I start sending requests to the endpoint (so I assume the model loads correctly):

INFO:     Started server process [1344199]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:30013 (Press CTRL+C to quit)
INFO:     127.0.0.1:37998 - "GET /get_model_info HTTP/1.1" 200 OK
new fill batch. #seq: 1. #cached_token: 0. #new_token: 17. #remaining_req: 0. #running_req: 0
Exception in ModelRpcClient:
Traceback (most recent call last):
  File "/NS/llm-1/work/afkhan/sglang/python/sglang/srt/managers/router/model_rpc.py", line 165, in exposed_step
    self.forward_step()
  File "/NS/llm-1/nobackup/afkhan/anaconda3/envs/sglangfact/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/NS/llm-1/work/afkhan/sglang/python/sglang/srt/managers/router/model_rpc.py", line 180, in forward_step
    self.forward_fill_batch(new_batch)
  File "/NS/llm-1/work/afkhan/sglang/python/sglang/srt/managers/router/model_rpc.py", line 369, in forward_fill_batch
    logits, (logprobs, normalized_logprobs) = self.model_runner.forward(
  File "/NS/llm-1/work/afkhan/sglang/python/sglang/srt/managers/router/model_runner.py", line 486, in forward
    return self.forward_extend(**kwargs)
  File "/NS/llm-1/nobackup/afkhan/anaconda3/envs/sglangfact/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/NS/llm-1/work/afkhan/sglang/python/sglang/srt/managers/router/model_runner.py", line 391, in forward_extend
    return self.model.forward(input_ids, input_metadata.positions, input_metadata)
  File "/NS/llm-1/work/afkhan/sglang/python/sglang/srt/models/gpt_neox.py", line 236, in forward
    return self.logits_processor(
  File "/NS/llm-1/nobackup/afkhan/anaconda3/envs/sglangfact/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/NS/llm-1/nobackup/afkhan/anaconda3/envs/sglangfact/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/NS/llm-1/work/afkhan/sglang/python/sglang/srt/layers/logits_processor.py", line 32, in forward
    last_logits = torch.matmul(last_hidden, weight.T)
  File "/NS/llm-1/nobackup/afkhan/anaconda3/envs/sglangfact/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1695, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'ParallelLMHead' object has no attribute 'T'

Exception in ModelRpcClient:
Traceback (most recent call last):
  File "/NS/llm-1/work/afkhan/sglang/python/sglang/srt/managers/router/model_rpc.py", line 165, in exposed_step
    self.forward_step()
  File "/NS/llm-1/nobackup/afkhan/anaconda3/envs/sglangfact/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/NS/llm-1/work/afkhan/sglang/python/sglang/srt/managers/router/model_rpc.py", line 180, in forward_step
    self.forward_fill_batch(new_batch)
  File "/NS/llm-1/work/afkhan/sglang/python/sglang/srt/managers/router/model_rpc.py", line 369, in forward_fill_batch
    logits, (logprobs, normalized_logprobs) = self.model_runner.forward(
  File "/NS/llm-1/work/afkhan/sglang/python/sglang/srt/managers/router/model_runner.py", line 486, in forward
    return self.forward_extend(**kwargs)
  File "/NS/llm-1/nobackup/afkhan/anaconda3/envs/sglangfact/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/NS/llm-1/work/afkhan/sglang/python/sglang/srt/managers/router/model_runner.py", line 391, in forward_extend
    return self.model.forward(input_ids, input_metadata.positions, input_metadata)
  File "/NS/llm-1/work/afkhan/sglang/python/sglang/srt/models/gpt_neox.py", line 236, in forward
    return self.logits_processor(
  File "/NS/llm-1/nobackup/afkhan/anaconda3/envs/sglangfact/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/NS/llm-1/nobackup/afkhan/anaconda3/envs/sglangfact/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/NS/llm-1/work/afkhan/sglang/python/sglang/srt/layers/logits_processor.py", line 32, in forward
    last_logits = torch.matmul(last_hidden, weight.T)
  File "/NS/llm-1/nobackup/afkhan/anaconda3/envs/sglangfact/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1695, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'ParallelLMHead' object has no attribute 'T'

/NS/llm-1/work/afkhan/sglang/python/sglang/srt/managers/router/model_rpc.py:204: UserWarning: Warning: available_size=714944, max_total_num_token=714961
KV cache pool leak detected!
  warnings.warn(
/NS/llm-1/work/afkhan/sglang/python/sglang/srt/managers/router/model_rpc.py:204: UserWarning: Warning: available_size=714944, max_total_num_token=714961
KV cache pool leak detected!
  warnings.warn(

Any idea what might be going wrong? The error seems to be related to the logits processor (`logits_processor.py`), which I'm not very familiar with. I tried to copy the corresponding logic from the llama implementation.
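A note on the traceback above: the failing expression is `weight.T`, and the object it is evaluated on is a `ParallelLMHead`, so a module object appears to have been passed where its weight tensor was expected. The sketch below reproduces that pattern with plain-Python stand-ins. `StubMatrix`, `StubHead` and `transposed_weight` are hypothetical names, not the real PyTorch, vLLM or sglang classes, and the sketch assumes the head keeps its parameters in a `weight` attribute, as PyTorch modules usually do.

```python
class StubMatrix:
    """Hypothetical stand-in for a 2-D tensor; it only supports `.T`."""

    def __init__(self, rows):
        self.rows = rows

    @property
    def T(self):
        # Transpose: columns of the original become rows of the result.
        return StubMatrix([list(col) for col in zip(*self.rows)])


class StubHead:
    """Hypothetical stand-in for an LM head module that owns a weight matrix."""

    def __init__(self, weight):
        self.weight = weight


def transposed_weight(weight):
    # Mirrors the `weight.T` access shown in the traceback.
    return weight.T


head = StubHead(StubMatrix([[1, 2, 3], [4, 5, 6]]))

try:
    transposed_weight(head)  # passing the module itself fails
    error = None
except AttributeError as exc:
    error = str(exc)

# Passing the weight held by the module works.
result = transposed_weight(head.weight)
```

If the real cause matches this pattern, the change on the model side would be to pass the head's weight rather than the head itself. That is an inference from the traceback, not something the logs state directly.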

aflah02 · 2024-02-07T21:15:13Z

Update: I just noticed the missing part and fixed it, which resolves the old issue, but now I get a new error:

Exception in ModelRpcClient:
Traceback (most recent call last):
  File "/NS/llm-1/work/afkhan/sglang/python/sglang/srt/managers/router/model_rpc.py", line 165, in exposed_step
    self.forward_step()
  File "/NS/llm-1/nobackup/afkhan/anaconda3/envs/sglangfact/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/NS/llm-1/work/afkhan/sglang/python/sglang/srt/managers/router/model_rpc.py", line 192, in forward_step
    self.forward_decode_batch(self.running_batch)
  File "/NS/llm-1/work/afkhan/sglang/python/sglang/srt/managers/router/model_rpc.py", line 429, in forward_decode_batch
    next_token_ids, next_token_probs = batch.sample(logits)
  File "/NS/llm-1/work/afkhan/sglang/python/sglang/srt/managers/router/infer_batch.py", line 452, in sample
    sampled_index = torch.multinomial(probs_sort, num_samples=1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
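The `RuntimeError` above says that the probabilities handed to `torch.multinomial` were not a valid distribution. One way this can happen (an assumption for illustration, not a diagnosis of this PR) is that a single NaN in the logits spreads to every entry once softmax normalizes by the sum. The pure-Python sketch below shows that effect; `softmax` and `is_valid_distribution` are hypothetical helpers written for this example, not sglang functions.

```python
import math


def softmax(logits):
    # Numerically stable softmax over a plain list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def is_valid_distribution(probs):
    # The same three conditions the error message names: inf, nan, or < 0.
    return all(math.isfinite(p) and p >= 0 for p in probs)


clean = softmax([1.0, 2.0, 3.0])
poisoned = softmax([1.0, float("nan"), 3.0])

clean_ok = is_valid_distribution(clean)        # True
poisoned_ok = is_valid_distribution(poisoned)  # False: every entry is NaN
```

A check like `is_valid_distribution` placed just before sampling would confirm whether the logits are already bad when they leave the model.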

aflah02 · 2024-02-07T21:24:45Z

I did some further testing. It runs perfectly for https://github.com/aflah02/sglang/blob/main/examples/usage/choices_logprob.py
but fails for https://github.com/aflah02/sglang/blob/main/examples/quick_start/srt_example_chat.py with the error above.
It seems the issue might be elsewhere.

aflah02 · 2024-02-09T20:00:03Z

@merrymercy Any thoughts? I'm not sure why one tutorial works while the other doesn't.

merrymercy · 2024-02-11T14:14:18Z

@aflah02

Can you try this tutorial? https://github.com/sgl-project/sglang/blob/main/examples/quick_start/srt_example_complete.py
The chat example may not work properly because of the chat template: the default chat template is vicuna, but GPT-NeoX has not been tuned on that template.
Can you add more print statements to see where the NaN comes from? Does it occur in the early transformer layers, or only in the last layer?
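The layer-by-layer check suggested here can be sketched as follows. The helper name `first_non_finite` and the `(name, values)` layout are assumptions for illustration. In practice the values would be hidden states captured after each layer and flattened to Python lists, and the layer names below are made up.

```python
import math


def first_non_finite(layer_outputs):
    """Return (layer_name, index) of the first NaN/inf value, or None.

    `layer_outputs` is an ordered list of (name, values) pairs, where
    `values` is a flat list of floats captured after that layer.
    """
    for name, values in layer_outputs:
        for i, v in enumerate(values):
            if not math.isfinite(v):
                return name, i
    return None


# Made-up activations: layer 1 overflows, layer 2 is NaN as a consequence.
captured = [
    ("layers.0", [0.1, -0.3, 2.0]),
    ("layers.1", [0.5, float("inf"), 1.2]),
    ("layers.2", [float("nan"), float("nan"), float("nan")]),
]

hit = first_non_finite(captured)
```

Reporting the first offending layer rather than every NaN matters because a non-finite value usually contaminates every later layer, so only the first occurrence points at the cause.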

aflah02 · 2024-02-20T18:38:32Z

@merrymercy
Regarding the second question: the error seems to occur mainly in the last layer or the last few layers. Some of the logs for the chat example are here: logs.txt
The first tutorial also gives a similar error.

aflah02 · 2024-03-01T18:11:23Z

@merrymercy Any thoughts on what might be going wrong here? I'm not sure whether a chat template alone can cause a breaking issue like this.

merrymercy · 2024-03-11T02:19:36Z

@aflah02 I have no idea. I typically debug these kinds of weird bugs by comparing intermediate tensors layer by layer between the sglang and huggingface/vllm implementations, similar to your print statements.
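A minimal sketch of that comparison is shown below, assuming the intermediate tensors from both implementations have already been dumped to flat lists keyed by layer name. The function `first_divergence`, the tolerances and the layer names are all hypothetical choices for this example.

```python
import math


def first_divergence(reference, candidate, rel_tol=1e-3, abs_tol=1e-5):
    """Return the name of the first layer whose values differ, or None.

    `reference` and `candidate` map layer names to flat lists of floats,
    inserted in forward order (dicts keep insertion order).
    """
    for name, ref_values in reference.items():
        cand_values = candidate.get(name)
        if cand_values is None or len(cand_values) != len(ref_values):
            return name  # missing layer or shape mismatch
        for r, c in zip(ref_values, cand_values):
            # math.isclose is False for NaN, so NaNs count as divergence.
            if not math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol):
                return name
    return None


# Made-up dumps: small float noise in layers.0, a NaN in layers.1.
reference = {
    "embed": [0.10, 0.20],
    "layers.0": [1.00, -2.00],
    "layers.1": [0.50, 0.25],
}
candidate = {
    "embed": [0.10, 0.20],
    "layers.0": [1.0004, -2.00],
    "layers.1": [0.50, float("nan")],
}

diverged_at = first_divergence(reference, candidate)
```

The tolerances are deliberately loose, since two implementations rarely match bit for bit, especially in half precision. The point is to find the first layer where the difference is clearly larger than numerical noise.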

aflah02 · 2024-06-12T14:30:28Z

@merrymercy Sorry for being inactive; life got really busy over the past few months. I don't have the bandwidth to take this on right now, so if you'd like to, feel free to work on it.

merrymercy · 2024-06-12T22:39:49Z

I will close this for now

Added GPT-NeoX

73845f5

aflah02 marked this pull request as draft February 7, 2024 21:08

Fix Error due to missing attribute access

b3373fd

aflah02 marked this pull request as ready for review February 7, 2024 21:26

aflah02 changed the title from "Adding GPT-NeoX [WIP]" to "Adding GPT-NeoX" Feb 7, 2024

merrymercy self-assigned this Mar 11, 2024

merrymercy force-pushed the main branch from 36daf8e to adc9742 May 28, 2024 05:51

merrymercy closed this Jun 12, 2024
