
[Chatllama] error when load dataset when use deepspeed #229

Open
bino282 opened this issue Mar 9, 2023 · 5 comments
bino282 commented Mar 9, 2023

Hi, when I use DeepSpeed, I encounter this error:
```
[2023-03-09 10:46:33,647] [INFO] [logging.py:77:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
Traceback (most recent call last):
  File "/datahdd/nhanv/Projects/NLP/chatllama/artifacts/main.py", line 50, in <module>
    actor_trainer = ActorTrainer(config.actor)
  File "/home/ntq/miniconda3/envs/textgen/lib/python3.10/site-packages/chatllama/rlhf/actor.py", line 324, in __init__
    ) = deepspeed.initialize(
  File "/home/ntq/miniconda3/envs/textgen/lib/python3.10/site-packages/deepspeed/__init__.py", line 125, in initialize
    engine = DeepSpeedEngine(args=args,
  File "/home/ntq/miniconda3/envs/textgen/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 319, in __init__
    self.training_dataloader = self.deepspeed_io(training_data)
  File "/home/ntq/miniconda3/envs/textgen/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1674, in deepspeed_io
    raise ValueError("Training data must be a torch Dataset")
ValueError: Training data must be a torch Dataset
```
How can I fix it?
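For context, the check that raises this error (in DeepSpeed's `deepspeed_io`) only accepts objects that subclass `torch.utils.data.Dataset`. A minimal sketch of such a dataset; the class name and sample contents here are made up for illustration:

```python
from torch.utils.data import Dataset


class ExampleDataset(Dataset):
    """Minimal torch Dataset wrapping an in-memory list of samples."""

    def __init__(self, samples):
        self.samples = samples

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        return self.samples[idx]


ds = ExampleDataset(["sample_0", "sample_1", "sample_2"])
print(isinstance(ds, Dataset), len(ds))  # True 3
```

Anything passed as `training_data` to `deepspeed.initialize` has to pass this `isinstance(..., Dataset)` check; a plain list, or a wrapper class that does not inherit from `Dataset`, triggers the `ValueError` above.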


Xuan-ZW commented Mar 9, 2023

I hit this bug too. Has anyone debugged it?

PierpaoloSorbellini (Collaborator) commented

@bino282 thank you for reaching out. We know we currently have some issues with DeepSpeed and are already working to fix them. Could you please share your current setup with us?


Xuan-ZW commented Mar 9, 2023

@PierpaoloSorbellini The setup is as follows:
```python
from pathlib import Path
from setuptools import setup, find_packages

REQUIREMENTS = [
    "beartype",
    "deepspeed",
    "einops",
    "fairscale",
    "langchain>=0.0.103",
    "torch",
    "tqdm",
    "transformers",
    "datasets",
    "openai",
]

this_directory = Path(__file__).parent
long_description = (this_directory / "README.md").read_text(encoding="utf8")

setup(
    name="chatllama-py",
    version="0.0.2",
    packages=find_packages(),
    install_requires=REQUIREMENTS,
    long_description=long_description,
    include_package_data=True,
    long_description_content_type="text/markdown",
)
```


phste commented Mar 10, 2023

I was able to fix the "Training data must be a torch Dataset" error. The `training_data` parameter of `deepspeed.initialize` must be changed to `training_data=self.train_dataset,`. I changed it in actor.py and reward.py, and DeepSpeed then worked for me. Hopefully this information helps.
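Concretely, that change would look roughly like the fragment below inside the trainer's `__init__` in actor.py and reward.py. This is an illustrative sketch, not the exact chatllama code: the argument names other than `training_data`, and the surrounding attribute names, are assumptions based on the traceback and the comment above.

```python
# Illustrative fragment (not self-contained): pass the torch Dataset
# itself to deepspeed.initialize so deepspeed_io accepts it.
(
    self.model_engine,
    self.optimizer,
    self.training_dataloader,
    _,
) = deepspeed.initialize(
    model=self.model,
    model_parameters=self.model.parameters(),
    training_data=self.train_dataset,  # must be a torch.utils.data.Dataset
    config=deepspeed_config,
)
```

`deepspeed.initialize` returns the engine, optimizer, dataloader, and LR scheduler; the dataloader is only built when `training_data` is a real `torch.utils.data.Dataset`.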

PierpaoloSorbellini (Collaborator) commented

Hi @phste @Xuan-ZW @bino282,
with PR #306 soon to be merged, most of the DeepSpeed problems should be addressed!
