Trainable parameters in each stage #15

Open
jujeongho0 opened this issue Feb 3, 2025 · 2 comments
Labels
good first issue Good for newcomers

Comments

@jujeongho0

Thanks for the great work!

I have a question about trainable parameters in each stage.

Here is my understanding. Is it correct?

  • Stage 1: Vision Encoder + Projector
  • Stage 2: Vision Encoder + Projector + LLM
  • Stage 3: Vision Encoder + Video Compressor + Projector + LLM (I'm curious about this part.)
  • Stage 4: Vision Encoder + Video Compressor + Projector + LLM

Thank you.

@lixin4ever
Collaborator

Roughly correct except for the video compressor.

The video compressor is not a trainable component; it is a deterministic, rule-based function applied to video input only. It computes the difference between consecutive patches along the temporal dimension and discards any patch that is very similar to the previous one (i.e., the patch difference is very small).
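
To make the rule concrete, here is a minimal sketch of that kind of difference-based pruning. It is not the repository's implementation: the distance metric (mean absolute difference), the `threshold` value, the choice to compare against the immediately preceding frame, and all function names are assumptions made for illustration.

```python
from typing import List, Tuple

Patch = List[float]   # flattened pixel values of one patch
Frame = List[Patch]   # all patches of one frame, in a fixed spatial order


def mean_abs_diff(a: Patch, b: Patch) -> float:
    """Mean absolute difference between two equally sized patches."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def prune_similar_patches(
    frames: List[Frame], threshold: float = 0.1
) -> List[Tuple[int, int]]:
    """Return (frame_index, patch_index) pairs for the patches that are kept.

    Assumption: a patch is compared with the patch at the same spatial
    position in the immediately preceding frame, and dropped when the
    difference does not exceed `threshold`. The first frame is always kept.
    """
    kept = []
    for t, frame in enumerate(frames):
        for p, patch in enumerate(frame):
            if t == 0 or mean_abs_diff(patch, frames[t - 1][p]) > threshold:
                kept.append((t, p))
    return kept


# Toy video: 3 frames, 2 patches per frame, 2 values per patch.
video = [
    [[0.0, 0.0], [1.0, 1.0]],  # frame 0: always kept
    [[0.0, 0.0], [0.5, 0.5]],  # frame 1: patch 0 unchanged, patch 1 changed
    [[0.9, 0.9], [0.5, 0.5]],  # frame 2: patch 0 changed, patch 1 unchanged
]
print(prune_similar_patches(video))
```

Because the function has no learnable weights, there is nothing in it to train or freeze at any stage, which is the point made above.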

@lixin4ever added the "good first issue" (Good for newcomers) label on Feb 4, 2025
@dominickrei

A related question: how are these training stages integrated into the training scripts?

From what I understand, the training arguments for module-specific learning rates control which modules are trainable (train.py L608-622), but all scripts in scripts/train/ pass all 3 learning rates. I would appreciate any clarification if I am misunderstanding the code!
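
To illustrate the pattern I have in mind, here is a small sketch. It is not the code in train.py: the module names (`vision_encoder`, `projector`, `llm`) and the rule "trainable only if a positive learning rate is given" are my assumptions about how such arguments might work.

```python
from typing import Dict, Optional


def trainable_modules(lrs: Dict[str, Optional[float]]) -> Dict[str, bool]:
    """Mark a module trainable only when its learning rate is set and positive.

    Hypothetical rule for illustration; the real training script may differ.
    """
    return {name: lr is not None and lr > 0 for name, lr in lrs.items()}


# Hypothetical configuration in which only the projector is trained.
projector_only = trainable_modules(
    {"vision_encoder": None, "projector": 1e-3, "llm": None}
)
print(projector_only)

# Passing all three learning rates would mark every module trainable.
all_modules = trainable_modules(
    {"vision_encoder": 2e-6, "projector": 1e-5, "llm": 1e-5}
)
print(all_modules)
```

Under a rule like this, a script that passes all three learning rates would train all three modules, which is why it is unclear to me how the per-stage freezing is expressed.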


3 participants