
PAIR Interpretability

This repo contains code and articles on PAIR interpretability projects.

Scalable Influence and Fact Tracing for Large Language Model Pretraining (ICLR'25)

See the blog post for a light introduction to the paper. There is also a public demo and a dedicated GitHub repo. The full paper is Scalable Influence and Fact Tracing for Large Language Model Pretraining -- Tyler Chang, Dheeraj Rajagopal, Tolga Bolukbasi, Lucas Dixon, Ian Tenney (RH)

Racing Thoughts: Explaining Large Language Model Contextualization Errors (NAACL'25)

Racing Thoughts: Explaining Contextualization Errors Within Large Language Models -- Michael A. Lepori, Mike Mozer, Asma Ghandeharioun (RH)

Who's asking? User personas and the mechanics of latent misalignment (NeurIPS'24)

Who's asking? User personas and the mechanics of latent misalignment -- Asma Ghandeharioun, Ann Yuan, Marius Guerard, Emily Reif, Michael A. Lepori, Lucas Dixon, at NeurIPS'24.

Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models (ICML'24)

The Patchscopes mini-site and the interactive explorable give a brief introduction to the full paper (ICML'24) by Asma Ghandeharioun, Ann Yuan, Marius Guerard, Emily Reif, Michael A. Lepori, Lucas Dixon.

Visualizing and Measuring the Geometry of BERT

bert-tree and context-atlas are the repos for two interactive blog posts/visualizations accompanying the paper Visualizing and Measuring the Geometry of BERT:

  1. Language, trees, and geometry in neural networks explores the geometry of syntactic information in BERT (bert-tree)

  2. Language, Context, and Geometry in Neural Networks explores semantics and context in BERT. See the accompanying tool, Context Atlas, for more details (context-atlas)

Deep dreaming on text

text-dream contains experiments and tools for applying deep dreaming to text.

LinguisticLens

data-synth-syntax contains LinguisticLens, a tool for visualizing generated text data.