
Inference error: RuntimeError: expected scalar type Half but found Float #210

Open
zhouchangju opened this issue Jun 3, 2023 · 2 comments


zhouchangju commented Jun 3, 2023

Before asking, please be sure to read this notice first!!!
I have read it and did not find the same issue.

(Screenshot attached: Screenshot0603-2013)

If you run into a problem and need our help, please describe it from the following angles so that we can understand or reproduce your error (learning how to ask a good question not only helps us understand you, it is also a self-check process):
1. Which script did you use, and with what command?
bash generate.sh

2. What were your parameters (script parameters, command-line parameters)?

TOT_CUDA="0,1,2,3" #Upgrade bitsandbytes to the latest version to enable balanced loading of multiple GPUs, for example: pip install bitsandbytes==0.39.0
BASE_MODEL="../models/llama-7B" #"decapoda-research/llama-13b-hf"
LORA_PATH="../output/lora-Vicuna-output-instruction" #"./lora-Vicuna/checkpoint-final"
USE_LOCAL=0 # 1: use local model, 0: use huggingface model
TYPE_WRITER=1 # whether to stream the output
if [[ USE_LOCAL -eq 1 ]]
then
cp sample/instruct/adapter_config.json $LORA_PATH
fi

#Upgrade bitsandbytes to the latest version to enable balanced loading of multiple GPUs
CUDA_VISIBLE_DEVICES=0  python ../generate.py \
    --model_path $BASE_MODEL \
    --lora_path $LORA_PATH \
    --use_local $USE_LOCAL \
    --use_typewriter $TYPE_WRITER

3. Did you modify our code?
I only changed the parameters in generate.sh; I did not modify the code itself.

4. Which dataset did you use?
About 5,700 code snippets I scraped from the web myself, used to train code generation.

Next you can describe your problem from the environment angle; some of these issues may already be covered in the readme's related problems and solutions:
1. Which operating system?
Ubuntu 20.04.4 LTS

2. Which GPU, and how many cards?
A single RTX 3090

3. Python version?
3.8.10

4. Versions of the Python libraries?

transformers @ git+https://ghproxy.com/https://github.com/huggingface/transformers.git
trlx @ git+https://ghproxy.com/https://github.com/CarperAI/trlx.git@b91da7b03d8e9fa0c0d6dce10a8f2611aca3013f
peft @ git+https://ghproxy.com/https://github.com/huggingface/peft.git@13e53fc7ee5d89d59b16523051006dddf0fb7a49

wandb==0.13.10
triton==2.0.0

accelerate==0.15.0
appdirs==1.4.4
bitsandbytes==0.39.0
datasets==2.8.0
deepspeed==0.8.3
evaluate==0.4.0
fairscale==0.4.13
torch==1.13.1
torchvision==0.14.1
gradio==3.20.0
huggingface-hub==0.13.3
loralib==0.1.1
nvitop==1.0.0

sentencepiece==0.1.96
tensorboard==2.12.0
texttable==1.6.7
tokenizers==0.13.2
tqdm==4.65.0

You can also describe your problem from the runtime angle:
1. What is the error message, and which code raised it (you can send us the complete error output)?
Via the Gradio web page, clicking the Submit button to call the predict endpoint produces this error:

Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.8/site-packages/gradio/routes.py", line 384, in run_predict
    output = await app.get_blocks().process_api(
  File "/root/miniconda3/lib/python3.8/site-packages/gradio/blocks.py", line 1032, in process_api
    result = await self.call_function(
  File "/root/miniconda3/lib/python3.8/site-packages/gradio/blocks.py", line 858, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/root/miniconda3/lib/python3.8/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/root/miniconda3/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/root/miniconda3/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/root/miniconda3/lib/python3.8/site-packages/gradio/utils.py", line 448, in async_iteration
    return next(iterator)
  File "/root/miniconda3/lib/python3.8/site-packages/gradio/interface.py", line 647, in fn
    for output in self.fn(*args):
  File "../generate.py", line 152, in evaluate
    for generation_output in model.stream_generate(
  File "/root/Chinese-Vicuna/utils.py", line 657, in stream_beam_search
    outputs = self(
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/accelerate/hooks.py", line 156, in new_forward
    output = old_forward(*args, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/peft/peft_model.py", line 529, in forward
    return self.base_model(
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 688, in forward
    outputs = self.model(
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 578, in forward
    layer_outputs = decoder_layer(
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/accelerate/hooks.py", line 156, in new_forward
    output = old_forward(*args, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 292, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/accelerate/hooks.py", line 156, in new_forward
    output = old_forward(*args, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 194, in forward
    query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/accelerate/hooks.py", line 156, in new_forward
    output = old_forward(*args, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/peft/tuners/lora.py", line 358, in forward
    result += self.lora_B(self.lora_A(self.lora_dropout(x))) * self.scaling
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/accelerate/hooks.py", line 156, in new_forward
    output = old_forward(*args, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: expected scalar type Half but found Float
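
For reference, the traceback shows the mismatch happening inside the injected LoRA linear layer (peft/tuners/lora.py, line 358): the base LLaMA model runs in float16 (Half) while the adapter weights are float32 (Float), so F.linear refuses to mix the two dtypes. Below is a minimal workaround sketch, not code from the repository; the paths are the ones from my generate.sh, and the cast loop is only an illustration of forcing the adapter to fp16:

# Minimal sketch (assumption, not the repository's generate.py): load the fp16
# base model plus the LoRA adapter, then cast the adapter weights to float16
# so F.linear no longer sees mixed Half/Float inputs.
import torch
from transformers import LlamaForCausalLM
from peft import PeftModel

BASE_MODEL = "../models/llama-7B"                        # from generate.sh
LORA_PATH = "../output/lora-Vicuna-output-instruction"   # from generate.sh

model = LlamaForCausalLM.from_pretrained(
    BASE_MODEL, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(model, LORA_PATH)

# Force every LoRA submodule (lora_A / lora_B) to fp16 to match the base model.
for name, module in model.named_modules():
    if "lora_" in name:
        module.to(torch.float16)

model.eval()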

2. Are the GPU and CPU working normally?
Yes, both are working normally.

Here is my training information:
Training script: bash finetune.sh

# instruction-style fine-tuning
TOT_CUDA="2,3"
CUDAs=(${TOT_CUDA//,/ })
CUDA_NUM=${#CUDAs[@]}
PORT="12345"

DATA_PATH="../data/0603.1/train.json" #"../dataset/instruction/guanaco_non_chat_mini_52K-utf8.json" #"./sample/merge_sample.json"
OUTPUT_PATH="../output/lora-Vicuna-output-instruction"
MODEL_PATH="../models/llama-7B"
TEST_SIZE=700

CUDA_VISIBLE_DEVICES=0 python ../finetune.py \
--data_path $DATA_PATH \
--output_path $OUTPUT_PATH \
--model_path $MODEL_PATH \
--eval_steps 200 \
--save_steps 200 \
--test_size $TEST_SIZE

I also changed the following settings in finetune.py:

# changed from 128 to 64
BATCH_SIZE = 64
# changed from 256 to 1024, because code generation needs longer output tokens
CUTOFF_LEN = 1024

igorwang commented Jun 5, 2023

It is probably a dependency compatibility issue. I ran into the same problem with my training; after rolling the repository back to commit 944365eaae5c676a32428532299024a4b8a7fc4a, inference worked fine.

Facico (Owner) commented Jun 29, 2023

The 8-bit option was probably commented out in the inference script at that time; removing the "#" that comments it out should fix it.
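
Concretely, the loading section with 8-bit enabled typically looks like the sketch below. This assumes generate.py follows the usual transformers + peft loading pattern; the exact variable names and the line that was commented out in the repository may differ:

# Sketch only (assumption: generate.py uses the standard transformers/peft
# loading pattern; the actual code in the repository may differ slightly).
import torch
from transformers import LlamaForCausalLM
from peft import PeftModel

model = LlamaForCausalLM.from_pretrained(
    "../models/llama-7B",
    load_in_8bit=True,          # the option referred to above; remove the "#" that disabled it
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(
    model,
    "../output/lora-Vicuna-output-instruction",
    torch_dtype=torch.float16,
)
model.eval()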
