This repo contains code and articles on PAIR interpretability projects.
Scalable Influence and Fact Tracing for Large Language Model Pretraining -- Tyler Chang, Dheeraj Rajagopal, Tolga Bolukbasi, Lucas Dixon, Ian Tenney (RH). See the blog post for a light introduction to the paper; there is also a public demo and a dedicated GitHub repo.
Racing Thoughts: Explaining Contextualization Errors Within Large Language Models -- Michael A. Lepori, Mike Mozer, Asma Ghandeharioun (RH)
Who's asking? User personas and the mechanics of latent misalignment -- Asma Ghandeharioun, Ann Yuan, Marius Guerard, Emily Reif, Michael A. Lepori, Lucas Dixon (NeurIPS'24).
Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models (ICML'24)
The Patchscopes mini-site & the interactive explorable contain a brief introduction to the longer paper (ICML'24) by Asma Ghandeharioun, Ann Yuan, Marius Guerard, Emily Reif, Michael A. Lepori, Lucas Dixon.
bert-tree and context-atlas are repos for two interactive blog posts/visualizations for the paper Visualizing and Measuring the Geometry of BERT:
- Language, trees, and geometry in neural networks explores the geometry of syntactic information in BERT (bert-tree)
- Language, Context, and Geometry in Neural Networks explores semantics and context in BERT. See the accompanying tool, Context Atlas, for more details (context-atlas)
text-dream contains experiments and tools for deep dreaming with text.
data-synth-syntax contains LinguisticLens, a tool for visualizing generated text data.