This repo contains code and articles on PAIR interpretability projects.
Scalable Influence and Fact Tracing for Large Language Model Pretraining -- Tyler Chang, Dheeraj Rajagopal, Tolga Bolukbasi, Lucas Dixon, Ian Tenney (RH). See the blog post for a light introduction to the paper; there is also a public demo and a dedicated GitHub repo.
Racing Thoughts: Explaining Contextualization Errors Within Large Language Models -- Michael A. Lepori, Mike Mozer, Asma Ghandeharioun (RH)
Who's asking? User personas and the mechanics of latent misalignment -- Asma Ghandeharioun, Ann Yuan, Marius Guerard, Emily Reif, Michael A. Lepori, Lucas Dixon (NeurIPS'24).
Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models (ICML'24)
The Patchscopes mini-site & the interactive explorable contain a brief introduction to the longer paper (ICML'24) by Asma Ghandeharioun, Ann Yuan, Marius Guerard, Emily Reif, Michael A. Lepori, Lucas Dixon.
bert-tree and context-atlas are repos for two interactive blog posts/visualizations for the paper Visualizing and Measuring the Geometry of BERT:
- Language, trees, and geometry in neural networks explores the geometry of syntactic information in BERT (bert-tree)
- Language, Context, and Geometry in Neural Networks explores semantics and context in BERT. See the accompanying tool, Context Atlas, for more details (context-atlas)
text-dream contains experiments and tools for deep dreaming with text.
data-synth-syntax contains LinguisticLens, a tool for visualizing generated text data.