From dfc31ab07376f9abacc1b4297aae589f83d7680a Mon Sep 17 00:00:00 2001
From: Sean Rosario
Date: Mon, 13 Nov 2017 22:53:30 -0500
Subject: [PATCH] Added more key points to InferSent notes

---
 ...-learning-of-universal-sentence-representations.md | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/notes/supervised-learning-of-universal-sentence-representations.md b/notes/supervised-learning-of-universal-sentence-representations.md
index 84ecab5..18b4faf 100644
--- a/notes/supervised-learning-of-universal-sentence-representations.md
+++ b/notes/supervised-learning-of-universal-sentence-representations.md
@@ -7,8 +7,13 @@ TLDR; The authors show that supervised training on the NLI task can produce high
 
 - The 4 sentence encoding architectures used are:
   - LSTM/GRU: Essentially the encoder of a seq2seq model
-  - BiLSTM: Bi-directional LSTM where each dim of the two (forwards and backwards) encoding are either summed or max-pooled
+  - BiLSTM: Bi-directional LSTM where each dim of the two (forward and backward) encodings is either summed or max-pooled.
   - Self-attentive network: Weighted linear combination (Attention) over each hidden state vectors of a BiLSTM
-  - Hierarchical ConvNet: The authors introduce a variation of the AdaSent model, where at each layer of the CNN, a max pool is taken over the feature maps. Each of these max pooled vectors are concatenated to obtain the final sentence encoding.
+  - Hierarchical ConvNet: The authors introduce a variation of the AdaSent model, where a max pool is taken over the feature maps at each layer of the CNN. These max pooled vectors are concatenated to obtain the final sentence encoding.
+
+- The BiLSTM-Max with a 4096-dim encoding performs best of all the models, on the SNLI task as well as on the transfer tasks.
+
+- Some models are prone to over-specialization on the SNLI training task: they can perform better on SNLI itself but don't transfer as well as other models.
+
+- The trained models are used to get sentence representations, whose performance is tested on 12 different transfer tasks such as classification (e.g. sentiment analysis, subjectivity/objectivity), entailment (e.g. the SICK dataset), caption-image retrieval, and a few others.
 
-- The trained models are used to get sentence representations for different tasks such as classification (eg: sentiment analysis, Subj/obj), entailment (eg: SICK dataset), caption-image retrieval and a few other tasks.
\ No newline at end of file
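
For reference, a minimal PyTorch sketch of the BiLSTM-Max encoder these notes describe (not the authors' released code; the 300-dim embedding input and the 2048-per-direction hidden size yielding the 4096-dim encoding are assumptions based on the notes above):

```python
import torch
import torch.nn as nn

class BiLSTMMaxEncoder(nn.Module):
    """Sketch of a BiLSTM-Max sentence encoder: run a bidirectional LSTM
    over word embeddings, then max-pool each dimension across time steps."""

    def __init__(self, embed_dim=300, hidden_dim=2048):
        super().__init__()
        # Bidirectional LSTM: forward and backward states are concatenated,
        # so the output dim is 2 * hidden_dim (2 * 2048 = 4096 here).
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            bidirectional=True, batch_first=True)

    def forward(self, embedded):
        # embedded: (batch, seq_len, embed_dim)
        outputs, _ = self.lstm(embedded)  # (batch, seq_len, 2 * hidden_dim)
        # Max-pool over the time dimension: for each of the 4096 dims,
        # keep the maximum value across all hidden states.
        sentence_repr, _ = outputs.max(dim=1)
        return sentence_repr              # (batch, 2 * hidden_dim)

# Usage: encode a dummy batch of 8 sentences, 20 tokens each.
encoder = BiLSTMMaxEncoder()
dummy = torch.randn(8, 20, 300)
print(encoder(dummy).shape)  # torch.Size([8, 4096])
```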