
[Chatllama] error when load dataset when use deepspeed #229

Open
bino282 opened this issue Mar 9, 2023 · 5 comments
bino282 commented Mar 9, 2023

Hi, when I use DeepSpeed, I encounter this error:
```
[2023-03-09 10:46:33,647] [INFO] [logging.py:77:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
Traceback (most recent call last):
  File "/datahdd/nhanv/Projects/NLP/chatllama/artifacts/main.py", line 50, in <module>
    actor_trainer = ActorTrainer(config.actor)
  File "/home/ntq/miniconda3/envs/textgen/lib/python3.10/site-packages/chatllama/rlhf/actor.py", line 324, in __init__
    ) = deepspeed.initialize(
  File "/home/ntq/miniconda3/envs/textgen/lib/python3.10/site-packages/deepspeed/__init__.py", line 125, in initialize
    engine = DeepSpeedEngine(args=args,
  File "/home/ntq/miniconda3/envs/textgen/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 319, in __init__
    self.training_dataloader = self.deepspeed_io(training_data)
  File "/home/ntq/miniconda3/envs/textgen/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1674, in deepspeed_io
    raise ValueError("Training data must be a torch Dataset")
ValueError: Training data must be a torch Dataset
```
How can I fix it?
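For context, the check that raises this error (in DeepSpeed's `deepspeed_io`) only accepts objects that subclass `torch.utils.data.Dataset`. A minimal sketch of such a dataset; the class name and sample contents here are made up for illustration:

```python
from torch.utils.data import Dataset


class ExampleDataset(Dataset):
    """Minimal torch Dataset wrapping an in-memory list of samples."""

    def __init__(self, samples):
        self.samples = samples

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        return self.samples[idx]


ds = ExampleDataset(["sample_0", "sample_1", "sample_2"])
print(isinstance(ds, Dataset), len(ds))  # True 3
```

Anything passed as `training_data` to `deepspeed.initialize` has to pass this `isinstance(..., Dataset)` check; a plain list, or a wrapper class that does not inherit from `Dataset`, triggers the `ValueError` above.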


Xuan-ZW commented Mar 9, 2023

I hit this bug too. Has anyone debugged it?

PierpaoloSorbellini (Collaborator) commented

@bino282 thank you for reaching out. We know we currently have some issues with DeepSpeed and are already working to fix them. Could you please share your current setup with us?


Xuan-ZW commented Mar 9, 2023

@PierpaoloSorbellini The setup is as follows:
```python
from pathlib import Path
from setuptools import setup, find_packages

REQUIREMENTS = [
    "beartype",
    "deepspeed",
    "einops",
    "fairscale",
    "langchain>=0.0.103",
    "torch",
    "tqdm",
    "transformers",
    "datasets",
    "openai",
]

this_directory = Path(__file__).parent
long_description = (this_directory / "README.md").read_text(encoding="utf8")

setup(
    name="chatllama-py",
    version="0.0.2",
    packages=find_packages(),
    install_requires=REQUIREMENTS,
    long_description=long_description,
    include_package_data=True,
    long_description_content_type="text/markdown",
)
```


phste commented Mar 10, 2023

I was able to fix the "Training data must be a torch Dataset" error. The `training_data` parameter of `deepspeed.initialize` must be changed to `training_data=self.train_dataset,`. I changed it in actor.py and reward.py, and DeepSpeed then worked for me. Hopefully this information helps.
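Concretely, that change would look roughly like the fragment below inside the trainer's `__init__` in actor.py and reward.py. This is an illustrative sketch, not the exact chatllama code: the argument names other than `training_data`, and the surrounding attribute names, are assumptions based on the traceback and the comment above.

```python
# Illustrative fragment (not self-contained): pass the torch Dataset
# itself to deepspeed.initialize so deepspeed_io accepts it.
(
    self.model_engine,
    self.optimizer,
    self.training_dataloader,
    _,
) = deepspeed.initialize(
    model=self.model,
    model_parameters=self.model.parameters(),
    training_data=self.train_dataset,  # must be a torch.utils.data.Dataset
    config=deepspeed_config,
)
```

`deepspeed.initialize` returns the engine, optimizer, dataloader, and LR scheduler; the dataloader is only built when `training_data` is a real `torch.utils.data.Dataset`.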

PierpaoloSorbellini (Collaborator) commented

Hi @phste @Xuan-ZW @bino282,
with PR #306 soon to be merged, most of the DeepSpeed problems should be addressed!
