
[Usage]: How to access mlp layer using the current version vllm(0.4.0) #8278

Closed
waterluck opened this issue Sep 9, 2024 · 1 comment
Labels
usage How to use vllm

Comments

@waterluck
Your current environment


Description:

I am updating code that was written against an older version of vLLM (0.2.7). In the previous implementation, I accessed the mlp layer with the following snippet:

obj = model.llm_engine.driver_worker.model_runner.model.model.layers[i].mlp

However, after updating to the latest version of vllm, this line now raises the following error:

AttributeError: 'LLMEngine' object has no attribute 'driver_worker'

It seems that the internal architecture of vLLM has changed in the newer version, and I am unsure how to access the mlp layer now.
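The attribute path to the underlying model is internal, not a stable public API, and has moved between vLLM releases (in some 0.4.x versions the worker appears to sit under an intermediate `model_executor` attribute, but verify against your installed version). One way to make code resilient to such moves is a small helper that resolves a dotted path string, so candidate paths can be tried without rewriting attribute chains. This is a hypothetical sketch using stand-in objects, not real vLLM classes:

```python
def get_submodule(root, path):
    """Resolve a dotted path like 'model.layers.0.mlp' against `root`.
    Numeric components are treated as indices (e.g. into a ModuleList)."""
    obj = root
    for part in path.split("."):
        obj = obj[int(part)] if part.isdigit() else getattr(obj, part)
    return obj

# Stand-in objects that mimic the nesting, purely for illustration.
class NS:
    def __init__(self, **kw):
        self.__dict__.update(kw)

engine = NS(model_executor=NS(driver_worker=NS(model_runner=NS(
    model=NS(model=NS(layers=[NS(mlp="mlp-0"), NS(mlp="mlp-1")]))))))

print(get_submodule(
    engine,
    "model_executor.driver_worker.model_runner.model.model.layers.1.mlp"))
# -> mlp-1
```

With a helper like this, switching vLLM versions only requires changing one path string instead of every attribute-access expression.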

Below is the relevant part of the code where I use this method:

from types import MethodType

import torch
import torch.nn.functional as F
from vllm import LLM, SamplingParams

model = LLM(model=args.model, tensor_parallel_size=torch.cuda.device_count(), enforce_eager=True)

if args.activation_mask:
    activation_masks = torch.load(args.activation_mask)

for activation_mask, mask_lang in zip(activation_masks, mask_langs):
    if activation_mask:
        def factory(mask):
            def llama_forward(self, x):
                # Fused gate/up projection; the first half is the gate.
                gate_up, _ = self.gate_up_proj(x)
                i = gate_up.size(-1)
                activation = F.silu(gate_up[:, :, : i // 2])
                # Zero out the masked activation channels.
                activation.index_fill_(2, mask, 0)
                x = activation * gate_up[:, :, i // 2 :]
                x, _ = self.down_proj(x)
                return x

            def bloom_forward(self, x: torch.Tensor):
                x, _ = self.dense_h_to_4h(x)
                x = self.gelu_impl(x)
                x.index_fill_(2, mask, 0)
                x, _ = self.dense_4h_to_h(x)
                return x

            return llama_forward if is_llama else bloom_forward

        for i, layer_mask in enumerate(activation_mask):
            if is_llama:
                obj = model.llm_engine.driver_worker.model_runner.model.model.layers[i].mlp
            else:
                obj = model.llm_engine.driver_worker.model_runner.model.transformer.h[i].mlp
            # Bind the replacement forward to this layer instance only.
            obj.forward = MethodType(factory(layer_mask.to('cuda')), obj)

    for lang in langs:
        texts, sampling_params = load_dataset(lang, sampling_params)
        outputs = model.generate(texts, sampling_params)
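The per-layer patching above relies on `types.MethodType` to bind a new `forward` to a single instance, leaving the class (and every other layer) untouched. A minimal, self-contained illustration of that pattern, with a toy class standing in for a real MLP module (the names here are illustrative, not vLLM APIs):

```python
from types import MethodType

class ToyMLP:
    def forward(self, x):
        return x * 2

def factory(scale):
    # Build a replacement forward that closes over `scale`,
    # mirroring how the snippet above closes over `mask`.
    def patched_forward(self, x):
        return x * scale
    return patched_forward

a, b = ToyMLP(), ToyMLP()
a.forward = MethodType(factory(10), a)  # patch only instance `a`

print(a.forward(3))  # -> 30 (patched)
print(b.forward(3))  # -> 6  (class method untouched)
```

Because the binding is per-instance, each layer can receive a forward that closes over its own mask, exactly as the `factory(layer_mask.to('cuda'))` call does above.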

Questions:

  1. What is the correct method to access the mlp layer in the new version of vllm?
  2. Has there been a change in how the model architecture is structured in the new versions? If so, could you please guide me on how to adjust the above code to work with the updated architecture?

Any guidance would be appreciated. Thanks!

How would you like to use vllm

I don't know how to adapt this code to the new version of vLLM.

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
@waterluck waterluck added the usage How to use vllm label Sep 9, 2024
@DarkLight1337 (Member)

Sorry for missing this, see #9318
