polishing writing a little in index.html
A few language polishings to the website.
iislucas authored Jan 14, 2024
1 parent 8e281d4 commit 1012d70
Showing 1 changed file with 16 additions and 16 deletions.
32 changes: 16 additions & 16 deletions patchscopes/index.html
@@ -124,16 +124,16 @@ <h1 class="title is-1 publication-title">🩺 <samp>Patchscopes</samp>: A Unifyi
<br>
</p>
<h3 class="subtitle is-size-5-tablet has-text-left pb-5" style="font-weight: normal">
- Given a representation, we propose to decode specific information from it by “patching” it into a separate inference pass that encourages the extraction of that information, independently of the original context. <span>
+ We propose a framework that decodes specific information from a representation within an LLM by “patching” it into the inference pass on a different prompt that has been designed to encourage the extraction of that information. <span>
A "<samp>Patchscope</samp>" is a configuration of our framework that can be viewed as an inspection tool geared towards a particular objective. <span><br><br>

- For example, this figure shows a <samp>Patchscope</samp> for decoding what is encoded in the representation of <i>"CEO"</i> in the source prompt (left). <span>
- We use a target prompt (right) comprised of few-shot demonstrations of token repetitions, which encourages decoding the token identity given a hidden representation.<span><br><span><br>
+ For example, this figure shows a simple <samp>Patchscope</samp> for decoding what is encoded in the representation of <i>"CEO"</i> in the source prompt (left). <span>
+ We patch it into a target prompt (right) comprised of few-shot demonstrations of token repetitions, which encourages decoding the token identity given a hidden representation.<span><br><span><br>

<u>Step 1:</u> Run the forward computation on the source prompt in the source model. <span><br>
- <u>Step 2:</u> Apply an optional transformation to the source hidden state at source layer. <span><br>
+ <u>Step 2:</u> Apply an optional transformation to the source layer's representation. <span><br>
<u>Step 3:</u> Run the forward computation on the target prompt up to the target layer in the target model. <span><br>
- <u>Step 4:</u> Patch the target representation of <i>"?"</i> at the target layer with the transformed representation (from step 2), and continue the forward computation from that layer onward. <span><br>
+ <u>Step 4:</u> Patch the target representation of <i>"?"</i> at the target layer, replacing it with the transformed representation (from step 2), and continue the forward computation from that layer onward. <span><br>
</h3>
</div>
</div>
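
The four numbered steps in the hunk above describe the whole patching procedure. As a rough, unofficial sketch (not the authors' code), the loop can be written with standard HuggingFace/PyTorch hooks as follows; the model (gpt2), the prompts, the layer indices, and the identity transformation used for Step 2 are all illustrative assumptions, not taken from the commit or the paper:

    # Minimal sketch of the four-step Patchscope procedure (not the authors' code).
    # Model choice (gpt2), prompts, layer indices, and the identity transform are
    # illustrative assumptions.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

    source_prompt = "Jeff Bezos is Amazon's former CEO"            # inspect the last token's representation
    target_prompt = "cat -> cat; 1135 -> 1135; hello -> hello; ?"  # few-shot token-repetition prompt
    source_layer = 6                                               # illustrative layer choices
    target_layer = 6

    def transform(h):
        # Step 2: optional transformation; identity here.
        return h

    # Step 1: forward pass on the source prompt, keeping all hidden states.
    src = tokenizer(source_prompt, return_tensors="pt")
    with torch.no_grad():
        src_out = model(**src, output_hidden_states=True)
    # hidden_states[0] is the embedding output; hidden_states[i] is the output of block i-1.
    h_source = src_out.hidden_states[source_layer + 1][0, -1]      # last source token

    h_patched = transform(h_source)

    # Steps 3-4: forward pass on the target prompt, overwriting the residual-stream
    # state of the final "?" token at the output of the target block.
    tgt = tokenizer(target_prompt, return_tensors="pt")
    target_pos = tgt["input_ids"].shape[1] - 1

    def patch_hook(module, inputs, output):
        hidden = output[0]
        hidden[:, target_pos] = h_patched
        return (hidden,) + output[1:]

    handle = model.transformer.h[target_layer].register_forward_hook(patch_hook)
    with torch.no_grad():
        tgt_out = model(**tgt)
    handle.remove()

    # What the patched representation decodes to under the token-identity prompt.
    pred_id = tgt_out.logits[0, target_pos].argmax().item()
    print(tokenizer.decode([pred_id]))

Here the hook overwrites the residual-stream state at the output of the target block, which is one common way to realize Step 4; the source and target model, prompt, layer, and position can all be chosen independently, which is what makes a Patchscope a configurable inspection tool rather than a single method.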
@@ -155,11 +155,11 @@ <h2 class="subtitle is-size-3-tablet has-text-weight-bold has-text-centered has-
</h2>
<h3 class="subtitle is-size-4-tablet has-text-left has-background-info-light pr-4 pl-4 pt-3 pb-3">
<p>
- Inspecting the information encoded in hidden representations of large language models (LLMs) can explain models' behavior and verify their alignment with human values.
+ Inspecting the information encoded in hidden representations of a large language model (LLM) can help explain the model's behavior and verify its alignment with human values.
Given the capabilities of LLMs in generating human-understandable text, we propose leveraging the model itself to explain its internal representations in natural language. <br><br>
We introduce a framework called <samp>Patchscopes</samp> and show how it can be used to answer a wide range of questions about an LLM's computation.
- We show that prior interpretability methods based on projecting representations into the vocabulary space and intervening on the LLM computation can be viewed as instances of this framework.
- Moreover, several of their shortcomings such as failure in inspecting early layers or lack of expressivity can be mitigated by <samp>Patchscopes</samp>. <br><br>
+ We show that many prior interpretability methods based on projecting representations into the vocabulary space and intervening on the LLM computation can be viewed as instances of this framework.
+ Moreover, several shortcomings of prior methods, such as failure in inspecting early layers or lack of expressivity, can be mitigated by <samp>Patchscopes</samp>. <br><br>
Beyond unifying prior inspection techniques, <samp>Patchscopes</samp> also opens up <em>new</em> possibilities such as using a more capable model to explain the representations of a smaller model, and unlocks new applications such as self-correction in multi-hop reasoning.
</p>
</h3>
@@ -171,8 +171,8 @@ <h2 class="subtitle is-size-3-tablet has-text-weight-bold has-text-centered has-
</h2>
<h3 class="subtitle is-size-4-tablet has-text-left pr-4 pl-4 pt-3 pb-3">
<p>
- <samp>Patchscopes</samp> can be configured to answer a wide range of questions about an LLM's computation. Many prominent interpretability methods can be cast as its special instances, and several of their limitations such as failure in inspecting early layers or lack of expressivity can be mitigated with a new <samp>Patchscope</samp>. <br><br>
- Additionally, its generality enables novel inspection possibilities and helps address questions that are hard to answer with existing methods. For example how do LLMs contextualize input entity names in early layers? This is where vocabulary projections mostly fail and other methods only provide a binary signal of whether the entity has been resolved, at best. We present a <samp>Patchscope</samp> that verbalizes the gradual entity resolution process.
+ <samp>Patchscopes</samp> can be configured to answer a wide range of questions about an LLM's computation. Many prominent interpretability methods can be cast as special instances, and several of their limitations, such as failure in inspecting early layers or lack of expressivity, can be mitigated with a new <samp>Patchscope</samp>. <br><br>
+ Additionally, the generality of <samp>Patchscopes</samp> enables novel inspection possibilities and helps address questions that are hard to answer with existing methods. For example, how do LLMs contextualize input entity names in early layers? This is where vocabulary projections mostly fail and other methods only provide a binary signal of whether the entity has been resolved; but a <samp>Patchscope</samp> can easily be created to verbalize the gradual entity resolution process, and it works even at early layers.
</p>
<p style="text-align:center;">
<br>
@@ -187,7 +187,7 @@ <h2 class="subtitle is-size-3-tablet has-text-weight-bold has-text-centered has-
</h2>
<h3 class="subtitle is-size-4-tablet has-text-left pr-4 pl-4 pt-3 pb-3">
<p>
- We show that a simple few-shot token identity <samp>Patchscope</samp> works very well, significantly better than mainstream vocab projection methods across multiple LLMs, from layer 10 onwards.
+ A simple few-shot token identity <samp>Patchscope</samp> works very well from layer 10 onwards, significantly better than mainstream vocab projection methods across multiple LLMs.
</p>
<p style="text-align:center;">
<br>
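
The "vocab projection" baselines mentioned here read a hidden state out by pushing it directly through the model's final layer norm and unembedding matrix (logit-lens style). As an illustrative reading of those methods, not the paper's exact evaluation setup, this can be done by continuing the earlier sketch (reusing model, tokenizer, and h_source):

    # Logit-lens-style vocabulary projection of the same source hidden state,
    # reusing model, tokenizer, and h_source from the sketch above.
    with torch.no_grad():
        vocab_logits = model.lm_head(model.transformer.ln_f(h_source))
    print(tokenizer.decode([vocab_logits.argmax().item()]))

The token-identity Patchscope instead routes the same h_source through Steps 3-4 of the earlier sketch; that is the comparison this section reports.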
@@ -219,7 +219,7 @@ <h2 class="subtitle is-size-3-tablet has-text-weight-bold has-text-centered has-
<h3 class="subtitle is-size-4-tablet has-text-left pr-4 pl-4 pt-3 pb-3">
<p>
How LLMs contextualize input entity names in early layers is hard to answer with existing methods. This is where vocab projection methods mostly fail and other methods only provide a binary signal of whether the entity has been resolved.
- However, a few-shot entity description <samp>Patchscopes</samp> can verbalize the gradual entity resolution process in the very early layers.
+ However, a few-shot entity description <samp>Patchscope</samp> can verbalize the gradual entity resolution process in the very early layers.
</p>
<p style="text-align:center;">
<br>
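
As a concrete illustration of such a Patchscope (the demonstrations below are invented for this example, not taken from the paper), the token-repetition prompt from the first sketch is simply swapped for a few-shot prompt of short entity descriptions ending in a placeholder token:

    # Illustrative few-shot entity-description target prompt (demonstrations made up).
    entity_description_prompt = (
        "Syria: a country in the Middle East\n"
        "Leonardo DiCaprio: an American actor\n"
        "Samsung: a South Korean electronics company\n"
        "x"
    )
    # Patch the entity's hidden state from an early source layer into the position of
    # the final placeholder "x" (Steps 3-4 of the first sketch), then generate; the
    # continuation reads as a natural-language description of how far the entity has
    # been resolved at that layer.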
@@ -234,7 +234,7 @@ <h2 class="subtitle is-size-3-tablet has-text-weight-bold has-text-centered has-
</h2>
<h3 class="subtitle is-size-4-tablet has-text-left pr-4 pl-4 pt-3 pb-3">
<p>
- We show that you can even get more expressive descriptions using a more capable model of the same family to explain the entity resolution process of a smaller model, e.g., using Vicuna 13B to explain Vicuna 7B.
+ You can even get more expressive descriptions using a more capable model of the same family to explain the entity resolution process of a smaller model, e.g., using Vicuna 13B to explain Vicuna 7B.
</p>
<p style="text-align:center;">
<br>
@@ -277,9 +277,9 @@ <h2 class="subtitle is-size-3-tablet has-text-weight-bold has-text-centered mr-0
</h2>
</p>
<h3 class="subtitle is-size-5-tablet has-text-left pb-5" style="font-weight: normal">
- We present <samp>Patchscopes</samp>, a simple and effective framework that leverages the ability of LLMs to generate human-like text for decoding information from intermediate LLM representations. <br><br>
- We show that existing interpretability methods can be cast as specific instances of <samp>Patchscopes</samp>, which cover only a small portion of all the possible configurations of the framework. Moreover, using new underexplored <samp>Patchscopes</samp> substantially improves our ability to decode various types of information from the model's internal computation, such as the output prediction and knowledge attributes, typically outperforming prominent methods that rely on projection to the vocabulary and probing. <br><br>
- Our framework enables new capabilities, such as analyzing the contextualization process of input tokens in the very early layers of the model, and is beneficial for practical applications, such as multi-hop reasoning correction. </h3>
+ <samp>Patchscopes</samp> is a simple and effective framework that leverages the ability of LLMs to generate human-like text to decode information from intermediate LLM representations. <br><br>
+ We show that many existing interpretability methods can be cast as specific configurations of the more general <samp>Patchscopes</samp> framework. Moreover, using new, underexplored <samp>Patchscopes</samp> substantially improves our ability to decode various types of information from a model's internal computation, such as output prediction and knowledge attributes, typically outperforming prominent methods that rely on projection to the vocabulary and probing. <br><br>
+ Our framework also enables new forms of interpretability, such as analyzing the contextualization process of input tokens in the very early layers of the model, and is beneficial for practical applications, such as multi-hop reasoning correction. </h3>
</div>
</div>
</div>
