Flan-T5 quality decreases with bigger models when using fastertransformer #95
Comments
Running with |
How about using bf16? |
bf16 works, but it is slower than fp32. |
I have observed the same.
But I have seen slightly better |
That's interesting. |
Hi, I took a look at this issue, and it does not seem to be a bug in FT. For some inputs, FP32 produces large activations after FC2 (~120K). These values exceed the FP16 range (maximum ~65.5K), so FC2 produces NaNs in FP16, which hurts accuracy. There seem to be other instances of this for flan-t5, like here and here. Also, from my experiments (using the 22.09 container, at least), BF16 seems to give a speedup over FP32. I used an 80GB A100 PCIe. Results & command for FP32:
Results & command for BF16:
HF gives good results in FP16 because they clamp the outputs of self attn, cross attn and ffn to be within the FP16 range. We don't currently do this in FT. |
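To make the overflow and the clamping idea above concrete, here is a minimal sketch in NumPy. It is not the HF or FT code; the helper name cast_to_fp16 and the sample activation values are assumptions chosen for illustration, with ~120K taken from the activation magnitude reported above. It shows that a value of that size becomes inf when cast to FP16, and that clamping to the FP16 range before the cast keeps it finite.

```python
import numpy as np

# Largest finite FP16 value (65504.0).
FP16_MAX = float(np.finfo(np.float16).max)


def cast_to_fp16(x, clamp=False):
    """Cast FP32 values to FP16, optionally clamping to the FP16 range first.

    Hypothetical helper for illustration only; not part of HF or FT.
    """
    x = np.asarray(x, dtype=np.float32)
    if clamp:
        x = np.clip(x, -FP16_MAX, FP16_MAX)
    # Suppress the overflow warning NumPy may emit for out-of-range casts.
    with np.errstate(over="ignore"):
        return x.astype(np.float16)


# Illustrative activations: one normal value and two of magnitude ~120K.
activations = np.array([1.5, -120000.0, 120000.0], dtype=np.float32)

unclamped = cast_to_fp16(activations)            # out-of-range values become +/-inf
clamped = cast_to_fp16(activations, clamp=True)  # all values stay finite

# Once infs are present, later arithmetic (here inf + -inf) yields NaN.
with np.errstate(invalid="ignore"):
    unclamped_sum = unclamped.sum()

print("unclamped:", unclamped, "sum:", unclamped_sum)
print("clamped:  ", clamped)
```

Clamping only prevents inf/NaN from propagating; the clamped values are still saturated, so this does not by itself guarantee FP16 accuracy that matches FP32.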
@lakshaykc did you really test HF in FP16? I tried HF flan-t5-xl in FP32 and FP16 and observed accuracies similar to FT.
HF FP32:
bhsueh@9cc4f0c2782c:/home/scratch.bhsueh_sw/FasterTransformer_new/build$ python3 ../examples/pytorch/t5/summarization.py --ft_model_location /home/scratch.bhsueh_gpu_2/models/flan-t5/flan-t5-xl/c-models --hf_model_location /home/scratch.bhsueh_gpu_2/models/flan-t5/flan-t5-xl/ --test_hf --cache_path /data/gpt_dataset/ccdv/ --data_type fp32
Reusing dataset cnn_dailymail (/data/gpt_dataset/ccdv/ccdv___cnn_dailymail/3.0.0/3.0.0/0107f7388b5c6fae455a5661bcd134fc22da53ea75852027040d8d1e997f101f)
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 237.81it/s]
[INFO] load HF model spend 39.482971 sec
Token indices sequence length is longer than the specified maximum sequence length for this model (744 > 512). Running this sequence through the model will result in indexing errors
---------------------------------------------------------
HF Generated :
Article : (CNN)James Best, best known for his portrayal of bumbling sheriff Rosco P. Coltrane on TV's "The Dukes of Hazzard," died Monday after a brief illness. He was 88. Best died in hospice in Hickory, North Carolina, of complications from pneumonia, said Steve Latshaw, a longtime friend and Hollywood colleague. Although he'd been a busy actor for decades in theater and in Hollywood, Best didn't become famous until 1979, when "The Dukes of Hazzard's" cornpone charms began beaming into millions of American homes almost every Friday night. For seven seasons, Best's Rosco P. Coltrane chased the moonshine-running Duke boys back and forth across the back roads of fictitious Hazzard County, Georgia, although his "hot pursuit" usually ended with him crashing his patrol car. Although Rosco was slow-witted and corrupt, Best gave him a childlike enthusiasm that got laughs and made him endearing. His character became known for his distinctive "kew-kew-kew" chuckle and for goofy catchphrases such as "cuff 'em and stuff 'em!" upon making an arrest. Among the most popular shows on TV in the early '80s, "The Dukes of Hazzard" ran until 1985 and spawned TV movies, an animated series and video games. Several of Best's "Hazzard" co-stars paid tribute to the late actor on social media. "I laughed and learned more from Jimmie in one hour than from anyone else in a whole year," co-star John Schneider, who played Bo Duke, said on Twitter. "Give Uncle Jesse my love when you see him dear friend." "Jimmy Best was the most constantly creative person I have ever known," said Ben Jones, who played mechanic Cooter on the show, in a Facebook post. "Every minute of his long life was spent acting, writing, producing, painting, teaching, fishing, or involved in another of his life's many passions." Born Jewel Guy on July 26, 1926, in Powderly, Kentucky, Best was orphaned at 3 and adopted by Armen and Essa Best, who renamed him James and raised him in rural Indiana. 
Best served in the Army during World War II before launching his acting career. In the 1950s and 1960s, he accumulated scores of credits, playing a range of colorful supporting characters in such TV shows as "The Twilight Zone," "Bonanza," "The Andy Griffith Show" and "Gunsmoke." He later appeared in a handful of Burt Reynolds' movies, including "Hooper" and "The End." But Best will always be best known for his "Hazzard" role, which lives on in reruns. "Jimmie was my teacher, mentor, close friend and collaborator for 26 years," Latshaw said. "I directed two of his feature films, including the recent 'Return of the Killer Shrews,' a sequel he co-wrote and was quite proud of as he had made the first one more than 50 years earlier." People we've lost in 2015 . CNN's Stella Chan contributed to this story.
Highlights : James Best, who played the sheriff on "The Dukes of Hazzard," died Monday at 88 .
"Hazzard" ran from 1979 to 1985 and was among the most popular shows on TV .
Summary : ['James Best, best known for his portrayal of bumbling sheriff Rosco P. Coltrane on TV\'s "The Dukes of Hazzard," died Monday after a brief illness..']
---------------------------------------------------------
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 21/21 [00:41<00:00, 1.96s/it]
Hugging Face (total latency: 41.06286600000001 sec)
beam_id: 0
rouge1 : 35.156759711225185
rouge2 : 14.88218931836181
rougeL : 24.515717871560494
rougeLsum : 24.731586345417877
HF FP16:
bhsueh@9cc4f0c2782c:/home/scratch.bhsueh_sw/FasterTransformer_new/build$ python3 ../examples/pytorch/t5/summarization.py --ft_model_location /home/scratch.bhsueh_gpu_2/models/flan-t5/flan-t5-xl/c-models --hf_model_location /home/scratch.bhsueh_gpu_2/models/flan-t5/flan-t5-xl/ --test_hf --cache_path /data/gpt_dataset/ccdv/ --data_type fp16
Reusing dataset cnn_dailymail (/data/gpt_dataset/ccdv/ccdv___cnn_dailymail/3.0.0/3.0.0/0107f7388b5c6fae455a5661bcd134fc22da53ea75852027040d8d1e997f101f)
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 248.45it/s]
[INFO] load HF model spend 70.423421 sec
Token indices sequence length is longer than the specified maximum sequence length for this model (744 > 512). Running this sequence through the model will result in indexing errors
---------------------------------------------------------
HF Generated :
Article : (identical to the article in the HF FP32 run above)
Highlights : James Best, who played the sheriff on "The Dukes of Hazzard," died Monday at 88 .
"Hazzard" ran from 1979 to 1985 and was among the most popular shows on TV .
Summary : ['James Best, best known for his portrayal of bumbling Sheriff Rosco P. Coltrane on "The Dukes of Hazzard," died Monday at his home in Hickory, North Carolina, of complications from pneumonia..']
---------------------------------------------------------
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 21/21 [00:26<00:00, 1.25s/it]
Hugging Face (total latency: 26.188205000000004 sec)
beam_id: 0
rouge1 : 21.601152079417847
rouge2 : 6.627345555624651
rougeL : 16.39804173113905
rougeLsum : 16.578730031985682 |
Hey @byshiue, I'm really sorry I missed this message. It got buried in my emails. I just tested again and you are right: the HF fp16 models perform similarly to the FT fp16 models. Something in my original setup was probably misconfigured, so that FT ran in fp16 while HF ran in fp32. You can close this issue. |
There is a very high chance it is related to this problem: huggingface/transformers#20287 (comment) The |
That would explain what I'm seeing. @byshiue Would we need to update huggingface_t5_ckpt_convert.py to keep |
Description
Reproduced Steps
I'm following the instructions by @byshiue to test Flan-T5 with faster transformer from here.
Please try the following scripts on latest main branch. You don't need to do any modification on converter.
Set model_checkpoint_path in the config.pbtxt of t5 to be flan-t5-small/c-models/1-gpu/, and data_type to be fp16. Then we can run the test.
The Issue Is Described Below
When I increase the model size (flan-t5-small, flan-t5-base, flan-t5-large, flan-t5-xl, flan-t5-xxl), the quality of the summarization drops as measured by rouge scores, especially for flan-t5-xl and flan-t5-xxl. In fact, the flan-t5-xxl output is meaningless. Any ideas why this could be happening?
Below are the outputs for each of the models. I've omitted the article itself in the interest of space.
flan-t5-small
flan-t5-base
flan-t5-large
flan-t5-xl
flan-t5-xxl