LoRA weights trained with code adapted to torch-npu cannot be used for inference #631
Comments
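Hi @ChjxL, this error is usually caused either by corrupted weights or by a weight file that does not match the MindSpore checkpoint format. In your case the latter is more likely. Weights trained with torch + NPU are not in the same format as MindSpore weights, so the torch ckpt must first be converted to the format of the corresponding MindSpore model:
python tools/model_conversion/convert_weights.py \
--source PATH_TO_TORCH.pth \
--target PATH_TO_SAVE_MS.ckpt \
--model sdv1 \
--source_version pt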
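To illustrate why a conversion step is needed at all, here is a minimal sketch of the kind of parameter renaming a torch-to-MindSpore converter performs. It is not the logic of convert_weights.py; the function names `rename_key` and `rename_state_dict` and the layer-name markers are hypothetical. The only fact it relies on is that PyTorch normalization layers name their parameters `weight`/`bias`, while MindSpore normalization layers name them `gamma`/`beta`.

```python
# Hypothetical illustration only: the real mapping lives in
# tools/model_conversion/convert_weights.py and may differ.

# PyTorch normalization layers store "weight"/"bias";
# MindSpore normalization layers store "gamma"/"beta".
NORM_SUFFIXES = {"weight": "gamma", "bias": "beta"}


def rename_key(torch_key, norm_layer_markers=("norm", "ln_")):
    """Map one PyTorch-style parameter name to a MindSpore-style name.

    A layer is treated as a normalization layer when its own name
    (the path component just before the parameter name) contains one
    of the markers. The markers are an assumption for this sketch.
    """
    prefix, _, suffix = torch_key.rpartition(".")
    layer_name = prefix.rpartition(".")[2]
    is_norm = any(marker in layer_name for marker in norm_layer_markers)
    if is_norm and suffix in NORM_SUFFIXES:
        return f"{prefix}.{NORM_SUFFIXES[suffix]}"
    return torch_key


def rename_state_dict(state_dict):
    """Apply rename_key to every entry; values are passed through untouched."""
    return {rename_key(key): value for key, value in state_dict.items()}


if __name__ == "__main__":
    demo = {
        "model.block.norm1.weight": 1,
        "model.block.norm1.bias": 2,
        "model.block.attn.to_q.weight": 3,
    }
    for name in rename_state_dict(demo):
        print(name)
```

A checkpoint whose names have not been remapped in this way will not match the parameter names the MindSpore network expects, which is consistent with the format-mismatch explanation above.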
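Hello, thank you very much for your reply. I tried the approach you suggested: I converted the trained weights and then ran inference, but it still fails. The error message is as follows: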
Fixed in #644
What is your scenario? Training with torch npu and then converting to MindSpore for inference?
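Hello, I mainly use this for training diffusion models. I tried training directly with MindSpore, but the results did not seem good enough, so I tried training with torch npu to see whether it would give better results, with the final inference done in MindSpore. That did not work. I also tried training on NVIDIA hardware, but the resulting LoRA could not be converted to the MS format with the script.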
Hardware Environment
Ascend 910B1
Software Environment
Related log / screenshot
Describe the expected behavior
The model can run inference normally.
Steps to reproduce the issue
1. We used the training code from https://github.com/kohya-ss/sd-scripts and imported the torch_npu script migration library in train_network.py: from torch_npu.contrib import transfer_to_npu
2. Run the following command to train the LoRA:
accelerate launch --num_cpu_threads_per_process 1 train_network.py --pretrained_model_name_or_path=/home/ma-user/work/train_lora/ckpt/majicMIXV7.safetensors --dataset_config=/home/ma-user/work/train_lora/dataset/10_wgdz/data.toml --output_dir=/home/ma-user/work/train_lora/dataset/10_wgdz/out --output_name=wgdz --save_model_as=ckpt --learning_rate=1e-4 --optimizer_type="Lion" --lr_scheduler=cosine_with_restarts --network_module=networks.lora --network_dim=64 --network_alpha=64 --max_train_steps=1200
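3. Training completed without problems and produced a ckpt weight file.
4. Use this weight file for inference by running the following command under mindone/examples/stable_diffusion_v2:
python text_to_image.py \
--prompt "wgdz" \
--n_iter 1 \
--n_samples 2 \
--use_lora True \
--lora_ckpt_path /home/ma-user/work/train_lora/ckpt/wgdz.ckpt \
-v 1.5
This failed with an error. The specific error message is as follows: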
Special notes for this issue
The model file permissions are normal; this is not a permissions problem.
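Before passing a LoRA file to the inference command in step 4, it can help to look at which parameter names the file actually contains. The sketch below groups key names by prefix. It assumes the sd-scripts naming convention, in which LoRA keys for the UNet begin with `lora_unet_` and keys for the text encoder begin with `lora_te_`; the helper name `summarize_lora_keys` and the example keys are hypothetical. For a torch-saved ckpt the key list could be obtained with something like `torch.load(path, map_location="cpu").keys()`, which is left out here so the example runs without torch.

```python
from collections import Counter


def summarize_lora_keys(keys):
    """Count LoRA parameter names by the sub-model they appear to target.

    The prefixes are an assumption based on the sd-scripts naming
    convention; names matching neither prefix are counted as "other".
    """
    counts = Counter()
    for key in keys:
        if key.startswith("lora_unet_"):
            counts["unet"] += 1
        elif key.startswith("lora_te_"):
            counts["text_encoder"] += 1
        else:
            counts["other"] += 1
    return dict(counts)


if __name__ == "__main__":
    # Hypothetical key names in the sd-scripts style.
    example_keys = [
        "lora_unet_down_blocks_0_attentions_0_proj_in.lora_down.weight",
        "lora_unet_down_blocks_0_attentions_0_proj_in.lora_up.weight",
        "lora_unet_down_blocks_0_attentions_0_proj_in.alpha",
        "lora_te_text_model_encoder_layers_0_mlp_fc1.lora_down.weight",
    ]
    print(summarize_lora_keys(example_keys))
```

If every key falls under the torch-style prefixes, the file is still in the training framework's layout and has not been mapped to the names a MindSpore network would use, which would point to a format mismatch rather than a permissions or corruption problem.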