[Feature] Add Model Hooks for Accessing and Customizing Model Activations #3266

Open
2 tasks done
shuyhere opened this issue Feb 3, 2025 · 0 comments


shuyhere commented Feb 3, 2025

Checklist

Motivation

Description

It would be beneficial to introduce model hooks that allow users to access and modify model activations. This feature would enable greater flexibility for tasks such as visualization, debugging, and custom processing of intermediate representations.

Use case

  • Extract intermediate outputs for interpretability analysis, such as LogitLens-style investigations.
  • Expose internal activations so that users can cache them and write functions that edit, remove, or replace them dynamically during inference, as in representation engineering.

While this may introduce some performance overhead, it would enhance interpretability research and enable efficient model editing.
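To make the request concrete, here is a minimal, dependency-free sketch of the kind of hook interface I have in mind. Everything in it (`HookedModel`, `register_hook`, the layer names, and the list-of-floats activations) is hypothetical and only illustrates the shape of the API; it is not an existing interface of this project. In PyTorch the analogous mechanism is `torch.nn.Module.register_forward_hook`. A hook that returns `None` only observes the activation (the caching use case), while a hook that returns a value replaces it (the editing use case).

```python
from typing import Callable, Dict, List, Optional

Activation = List[float]
# A hook receives the layer name and that layer's output.
# Returning None leaves the output unchanged; returning a list replaces it.
Hook = Callable[[str, Activation], Optional[Activation]]


class HookedModel:
    """Toy stand-in for a model: an ordered mapping of named layer functions."""

    def __init__(self, layers: Dict[str, Callable[[Activation], Activation]]):
        self.layers = layers
        self.hooks: Dict[str, List[Hook]] = {}

    def register_hook(self, layer_name: str, hook: Hook) -> Callable[[], None]:
        """Attach a hook to a layer and return a function that detaches it."""
        if layer_name not in self.layers:
            raise KeyError(f"unknown layer: {layer_name}")
        self.hooks.setdefault(layer_name, []).append(hook)

        def remove() -> None:
            self.hooks[layer_name].remove(hook)

        return remove

    def forward(self, x: Activation) -> Activation:
        for name, layer in self.layers.items():
            x = layer(x)
            # Hooks run in registration order; each may replace the activation.
            for hook in list(self.hooks.get(name, [])):
                result = hook(name, x)
                if result is not None:
                    x = result
        return x


model = HookedModel({
    "layer0": lambda x: [v * 2.0 for v in x],
    "layer1": lambda x: [v + 1.0 for v in x],
})

cache: Dict[str, Activation] = {}


def cache_hook(name: str, act: Activation) -> None:
    # Observe only: store a copy of the activation and leave it unchanged.
    cache[name] = list(act)


def zero_first(name: str, act: Activation) -> Activation:
    # Edit: overwrite the first component before it reaches the next layer.
    return [0.0] + act[1:]


remove_cache = model.register_hook("layer0", cache_hook)
remove_edit = model.register_hook("layer0", zero_first)
edited = model.forward([1.0, 2.0])

# Detach both hooks and rerun to get the unmodified output.
remove_cache()
remove_edit()
baseline = model.forward([1.0, 2.0])
```

In this sketch a layer with no registered hooks pays only a dictionary lookup per forward pass. Whether a real implementation could keep the overhead that small would depend on how the inference engine batches and fuses layers.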

Related resources

model hook resources

related issues and use case
