
Load dynamic module (remote code) only once if code isn't change #33162

Merged
merged 6 commits into from
Sep 6, 2024

Conversation

XuehaiPan
Contributor

@XuehaiPan XuehaiPan commented Aug 28, 2024

What does this PR do?

Fixes #30370 (comment)

Add an indicator __transformers_module_hash__ to the remote code module.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@ydshieh

@ydshieh
Collaborator

ydshieh commented Aug 28, 2024

Hi @XuehaiPan, thank you for taking action on this.

My first question is: what if the target (remote code) has changed since the last time it was loaded?

I don't know very well the context mentioned in #30370 (comment), but IIRC, it happens within a single Python process (i.e. multiple loads of the same module). And if that is the case, my question at the top of this comment has to be considered, right?

Maybe @tmm1 could explain the situation a bit more, ideally with a code snippet to demonstrate the issue.

@XuehaiPan
Contributor Author

My first question is: what if the target (remote code) is changed since the last time it has been loaded.

@ydshieh Thanks for raising this. I changed the indicator to the hash of the source code.

@ydshieh
Collaborator

ydshieh commented Aug 28, 2024

Nice. I would still prefer that @tmm1 elaborate on the issue a bit more, though.

But in the meantime, could you tell me more about the usage of threading.Lock() in this PR 🙏 ?

@XuehaiPan
Contributor Author

But in the meantime, could you tell me more about the usage of threading.Lock() in this PR 🙏 ?

It's just for thread safety. We are modifying the global variable sys.modules.
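As an illustration of why the lock matters when mutating sys.modules, here is a minimal sketch (the names register_module_once and _modules_lock are hypothetical, not the PR's code): without the lock, two threads could both see a cache miss and run the expensive import twice, racing on the sys.modules entry.

```python
import sys
import threading

_modules_lock = threading.Lock()  # hypothetical name; serializes updates to sys.modules

def register_module_once(name, factory):
    # Check-then-insert on sys.modules must be atomic: with the lock,
    # only the first caller runs `factory`; later callers get the cache.
    with _modules_lock:
        if name not in sys.modules:
            sys.modules[name] = factory()
        return sys.modules[name]
```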

@tmm1
Contributor

tmm1 commented Aug 28, 2024

prefer for @tmm1 to elaborate the issue a bit more though

sure, here is a simple example:

import sys
from accelerate import init_empty_weights
from transformers import AutoModelForCausalLM

def load_modeling_code(model_name):
    with init_empty_weights():
        model = AutoModelForCausalLM.from_pretrained(
            model_name, trust_remote_code=True
        )
        return sys.modules[model.__class__.__module__]

model_name = "deepseek-ai/DeepSeek-Coder-V2-Lite-Base"

# 1. patch modeling code
mod = load_modeling_code(model_name)
import liger_kernel
mod.CrossEntropyLoss = liger_kernel.transformers.cross_entropy.LigerCrossEntropyLoss
mod.DeepseekV2RMSNorm = liger_kernel.transformers.rms_norm.LigerRMSNorm
mod.DeepseekV2MLP.forward = liger_kernel.transformers.swiglu.LigerSwiGLUMLP.forward

# 2. check patch
mod = load_modeling_code(model_name)
print(mod.CrossEntropyLoss == liger_kernel.transformers.cross_entropy.LigerCrossEntropyLoss)

# 3. create and use model
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

Currently on the main branch, the patch is missing in steps (2) and (3).

@XuehaiPan
Contributor Author

XuehaiPan commented Aug 29, 2024

It looks reasonable to assume that the import statement returns the cached module object if it has already been imported.

Ideally, we may need to simulate this:

sys.path.insert(0, HF_MODULES_CACHE)
module = importlib.import_module(name)  # returns the same module object if it has already been imported (i.e. return sys.modules[name])
assert sys.path[0] == HF_MODULES_CACHE
del sys.path[0]
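That simulation could be wrapped in a context manager so sys.path is restored even if the import raises; a hedged sketch (prepend_sys_path is a hypothetical helper, not part of transformers):

```python
import contextlib
import sys

@contextlib.contextmanager
def prepend_sys_path(path):
    # Temporarily put `path` first on sys.path so importlib resolves the
    # dynamic module from the cache directory, then restore sys.path.
    sys.path.insert(0, path)
    try:
        yield
    finally:
        try:
            sys.path.remove(path)
        except ValueError:
            pass  # someone else already removed it

# Usage (names as in the snippet above):
# with prepend_sys_path(HF_MODULES_CACHE):
#     module = importlib.import_module(name)
```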

@ydshieh ydshieh self-assigned this Aug 29, 2024
@ydshieh
Collaborator

ydshieh commented Aug 29, 2024

Hi @tmm1 Thank you for providing the detailed information.

I am wondering if it would make (more) sense for the patching code to have a factory method like

def load_modeling_code(model_name):
    with init_empty_weights():
        model = AutoModelForCausalLM.from_pretrained(
            model_name, trust_remote_code=True
        )
        return sys.modules[model.__class__.__module__]

def patch_modeling_code(model_name):

    # 1. patch modeling code
    mod = load_modeling_code(model_name)
    import liger_kernel
    mod.CrossEntropyLoss = liger_kernel.transformers.cross_entropy.LigerCrossEntropyLoss
    mod.DeepseekV2RMSNorm = liger_kernel.transformers.rms_norm.LigerRMSNorm
    mod.DeepseekV2MLP.forward = liger_kernel.transformers.swiglu.LigerSwiGLUMLP.forward

    return mod

model_name = "deepseek-ai/DeepSeek-Coder-V2-Lite-Base"
mod = patch_modeling_code(model_name)

But I am open to what @XuehaiPan has done so far.

@LysandreJik
Member

cc @Rocketknight1 as well :)

@tmm1
Contributor

tmm1 commented Aug 30, 2024

I am wondering if it would make (more) sense for the patching code to have a factory method like

It could work, but often the user wants to apply some patch and then invoke a well-known trainer framework. Such a framework would not accept a factory callback.

More importantly, the factory would still need to call AutoModelForCausalLM.from_pretrained twice:

  1. to download and load the remote code, so it is available inside sys.modules for patching
  2. without init_empty_weights(), and to actually use the patched code

Without this PR, the second invocation will remove the patch, so it would be impossible to get a model object that is actually patched.

@ydshieh
Collaborator

ydshieh commented Aug 30, 2024

OK, I didn't realize that

# 3. create and use model
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

in your previous comment is actually needed. It makes sense now.

I will let @Rocketknight1 have a look at the changes in this PR too.

@Rocketknight1
Member

It seems clean to me! I can think of two failure cases:

  1. This will not detect changes in other files (for example, if the model also has utils/functions in a separate file from the main modeling file, the hash of the main modeling file will not change and so the module will not be reimported).
  2. I think this could cause a regression if users load a remote_code model, monkey-patch methods, and then reload the model to clear their changes, since now their changes will persist. However, I suspect there are no users depending on this weird behaviour, so it's not a serious problem.

Both of these are very minor issues that cannot really be fixed in this PR, and I don't think they should block it, so I'm happy with it!
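For what it's worth, failure case (1) could in principle be addressed later by hashing every file in the dynamic-module directory together, so a change in any file invalidates the cache. A rough sketch (combined_hash is hypothetical, not part of this PR):

```python
import hashlib
from pathlib import Path

def combined_hash(module_dir):
    # Hash every .py file in a dynamic-module directory in a stable order,
    # including filenames, so that a change in any file (not just the main
    # modeling file) produces a different digest.
    h = hashlib.sha256()
    for path in sorted(Path(module_dir).glob("*.py")):
        h.update(path.name.encode())
        h.update(path.read_bytes())
    return h.hexdigest()
```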

@ydshieh
Collaborator

ydshieh commented Aug 30, 2024

Hi @XuehaiPan It would be great if you could add a corresponding test, e.g. checking that a (simple version of a) patched module (like what @tmm1 has) remains the same across loads.

@XuehaiPan XuehaiPan force-pushed the remote-code-once branch 10 times, most recently from b7c4b72 to 742fda2 Compare August 31, 2024 19:09
@XuehaiPan
Contributor Author

It would be great if you can add a corresponding test

Done.

@XuehaiPan XuehaiPan changed the title from "Load remote code only once" to "Load dynamic module (remote code) only once if code isn't change" Sep 1, 2024
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Collaborator

@ydshieh ydshieh left a comment


Very nice. A few nit comments.

@ydshieh
Collaborator

ydshieh commented Sep 3, 2024

In my previous comment

        # The configuration file is cached in the snapshot directory. So the module file is not changed after dumping

in other files (other than tests/models/auto/test_configuration_auto.py), the comment is not about configuration.

I can go with the current version, just let me know if you want to add it back (with the correct comments).

What I mean is that we have to change the comment "The configuration file" to the actual file name (like the image processor file). Otherwise, that part of the test is actually OK.

@XuehaiPan
Contributor Author

I can go with the current version, just let me know if you want to add it back (with the correct comments).

@ydshieh I added a commit to address this.

@ydshieh
Collaborator

ydshieh commented Sep 3, 2024

@Rocketknight1 in case you want to take a final look. (And/or feel free to ping a core maintainer 🙏)

@Rocketknight1
Member

I'm happy with it, I think! The test failure seems unrelated, and I like the core goal of allowing the same model to be imported twice without wasting time, and without getting two different output classes.

cc @XuehaiPan maybe rebase onto main and see if that fixes the test?

cc @LysandreJik for core maintainer approval!

Member

@LysandreJik LysandreJik left a comment


Thanks! This looks good to me if approved by the esteemed @Rocketknight1

@Rocketknight1
Member

@XuehaiPan let me know if you're happy to merge, or if there's anything else you want to tweak first!

@XuehaiPan
Contributor Author

@XuehaiPan let me know if you're happy to merge, or if there's anything else you want to tweak first!

@Rocketknight1 I think this is the final version of the PR.

@Rocketknight1
Member

Merging, in that case. Thank you for the PR!

@Rocketknight1 Rocketknight1 merged commit e1c2b69 into huggingface:main Sep 6, 2024
23 checks passed
BernardZach pushed a commit to BernardZach/transformers that referenced this pull request Dec 5, 2024
…gingface#33162)

* Load remote code only once

* Use hash as load indicator

* Add a new option `force_reload` for old behavior (i.e. always reload)

* Add test for dynamic module is cached

* Add more type annotations to improve code readability

* Address comments from code review