[`from_pretrained`] Make from_pretrained fast again #27709

ArthurZucker · 2023-11-26T16:23:17Z

What does this PR do?

Skips all layer initialization when loading from pretrained without accelerate.

Loading time drops from ~20 seconds to 5 seconds for a 7B model like Llama.

The weights are effectively initialized in the `_init_weights` method of the pretrained model. All internal initialization calls are skipped.

Check that a linear layer missing from the checkpoint still gets initialized (e.g. loading `AutoModelForCausalLM` from an `AutoModel` checkpoint).
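The missing-layer case can be illustrated with plain PyTorch. The sketch below is an assumption-laden toy, not how `from_pretrained` is implemented: `Backbone` and `WithHead` are hypothetical classes, and the checkpoint deliberately lacks the `head` weights, which are then initialized explicitly.

```python
import torch
from torch import nn


class Backbone(nn.Module):
    """Hypothetical base model: the checkpoint is saved from this class."""

    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(4, 4)


class WithHead(Backbone):
    """Hypothetical model with an extra layer the checkpoint does not contain."""

    def __init__(self):
        super().__init__()
        self.head = nn.Linear(4, 2)


checkpoint = Backbone().state_dict()  # no "head.*" entries
model = WithHead()

# strict=False reports missing keys instead of raising an error.
result = model.load_state_dict(checkpoint, strict=False)

# Map parameter names such as "head.weight" to their owning module ("head")
# and initialize only those modules explicitly.
missing_modules = {key.rsplit(".", 1)[0] for key in result.missing_keys}
for name in missing_modules:
    model.get_submodule(name).reset_parameters()
```

If the model had been constructed with initialization skipped, this explicit step would be the only thing standing between `head` and uninitialized memory.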

fixes #26258 and fixes #18505

`model = XXXX.from_pretrained(model_id, torch_dtype=torch.float16, low_cpu_mem_usage=True)` might fail.

ArthurZucker · 2023-12-11T07:14:09Z

Explicit overwrite breaks fx and is not much faster.
Non-initialized weights are not always zeros (hence the failing tests); just make sure they are not initialized.

LysandreJik

Smart, albeit a bit manual 😄

LGTM


HuggingFaceDocBuilderDev · 2023-12-11T10:32:26Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

* Skip nn.Module.reset_parameters
* Actually skip
* Check quality
* Maybe change all inits
* Fix init issues: only modify public functions
* Add a small test for now
* Style
* test updates
* style
* nice tes
* style
* make it even faster
* one more second
* remove fx icompatible
* Update tests/test_modeling_common.py

  Co-authored-by: Lysandre Debut <hi@lysand.re>
* Update tests/test_modeling_common.py

  Co-authored-by: Lysandre Debut <hi@lysand.re>
* skip
* fix quality
* protect the import

---------

Co-authored-by: Lysandre Debut <hi@lysand.re>

ArthurZucker added 14 commits November 26, 2023 17:22

* Skip nn.Module.reset_parameters (aacad2b)
* Actually skip (f340239)
* Check quality (9ef7a90)
* Maybe change all inits (193ede0)
* Fix init issues: only modify public functions (c649db4)
* Add a small test for now (1d845e8)
* Style (6ad429e)
* merge (13fc67c)
* test updates (7672309)
* style (a004fdf)
* nice tes (880d905)
* style (010e5d7)
* make it even faster (524da03)
* one more second (5e099d6)

ArthurZucker marked this pull request as ready for review December 11, 2023 07:04

ArthurZucker requested a review from LysandreJik December 11, 2023 07:17

LysandreJik approved these changes Dec 11, 2023


ArthurZucker and others added 7 commits December 11, 2023 10:38

* remove fx icompatible (d3099b8)
* Update tests/test_modeling_common.py (18e1c01), Co-authored-by: Lysandre Debut <hi@lysand.re>
* Update tests/test_modeling_common.py (6b32ed2), Co-authored-by: Lysandre Debut <hi@lysand.re>
* skip (db41d38)
* Merge branch 'fast-init' of github.com:huggingface/transformers into …fast-init (ff6e26b)
* fix quality (14f78dd)
* protect the import (79d95cc)

ArthurZucker merged commit 0676d99 into main Dec 11, 2023
21 checks passed

ArthurZucker deleted the fast-init branch December 11, 2023 11:38

ArthurZucker mentioned this pull request Dec 11, 2023

Suppress reset_parameters of torch.nn.Linear,Conv2d... inside no_init_weights #18505

Closed

pacman100 mentioned this pull request Jan 8, 2024

[BUG] Very high loss when using DeepSpeed with CPU offloading for versions>=4.36.0. #28391

Closed
