# docs: add papers from discord and blogs #6

Merged on Sep 20, 2024 (4 commits).

**README.md**: 28 additions, 0 deletions

Check out [Lightly**SSL**](https://github.com/lightly-ai/lightly), a computer vision framework for self-supervised learning.

## 2024

| Paper | Links |
|---|---|
| [Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach](https://arxiv.org/abs/2405.15613) | [![arXiv](https://img.shields.io/badge/arXiv-2405.15613-b31b1b.svg)](https://arxiv.org/abs/2405.15613) |
| [GLID: Pre-training a Generalist Encoder-Decoder Vision Model](https://arxiv.org/abs/2404.07603) | [![arXiv](https://img.shields.io/badge/arXiv-2404.07603-b31b1b.svg)](https://arxiv.org/abs/2404.07603) [![Google Drive](https://img.shields.io/badge/Lightly_Reading_Group-4285F4?logo=googledrive&logoColor=white)](https://drive.google.com/file/d/1CEaZ00z-0hqGKp5cTN8fxP6tsHiHkFye/view?usp=sharing) |
| [Rethinking Patch Dependence for Masked Autoencoders](https://arxiv.org/abs/2401.14391) | [![arXiv](https://img.shields.io/badge/arXiv-2401.14391-b31b1b.svg)](https://arxiv.org/abs/2401.14391) [![Google Drive](https://img.shields.io/badge/Lightly_Reading_Group-4285F4?logo=googledrive&logoColor=white)](https://drive.google.com/file/d/1LtIPoes3y1ZOHD-UBeKgj9AYBoQ-nO5A/view?usp=sharing) [![GitHub](https://img.shields.io/badge/GitHub-100000?&logo=github&logoColor=white)](https://github.com/TonyLianLong/CrossMAE) |
| [You Don't Need Data-Augmentation in Self-Supervised Learning](https://arxiv.org/abs/2406.09294) | [![arXiv](https://img.shields.io/badge/arXiv-2406.09294-b31b1b.svg)](https://arxiv.org/abs/2406.09294) |
| [Occam's Razor for Self Supervised Learning: What is Sufficient to Learn Good Representations?](https://arxiv.org/abs/2406.10743) | [![arXiv](https://img.shields.io/badge/arXiv-2406.10743-b31b1b.svg)](https://arxiv.org/abs/2406.10743) |
| [Asymmetric Masked Distillation for Pre-Training Small Foundation Models](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhao_Asymmetric_Masked_Distillation_for_Pre-Training_Small_Foundation_Models_CVPR_2024_paper.pdf) | [![CVPR](https://img.shields.io/badge/CVPR-2024-b31b1b.svg)](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhao_Asymmetric_Masked_Distillation_for_Pre-Training_Small_Foundation_Models_CVPR_2024_paper.pdf) [![GitHub](https://img.shields.io/badge/GitHub-100000?&logo=github&logoColor=white)](https://github.com/MCG-NJU/AMD) |
| [Revisiting Feature Prediction for Learning Visual Representations from Video](https://arxiv.org/abs/2404.08471) | [![arXiv](https://img.shields.io/badge/arXiv-2404.08471-b31b1b.svg)](https://arxiv.org/abs/2404.08471) [![GitHub](https://img.shields.io/badge/GitHub-100000?&logo=github&logoColor=white)](https://github.com/facebookresearch/jepa) |
| [ARVideo: Autoregressive Pretraining for Self-Supervised Video Representation Learning](https://arxiv.org/abs/2405.15160) | [![arXiv](https://img.shields.io/badge/arXiv-2405.15160-b31b1b.svg)](https://arxiv.org/abs/2405.15160) |

## 2023

| Paper | Links |
|---|---|
| [DINOv2: Learning Robust Visual Features without Supervision](https://arxiv.org/abs/2304.07193) | [![arXiv](https://img.shields.io/badge/arXiv-2304.07193-b31b1b.svg)](https://arxiv.org/abs/2304.07193) [![Google Drive](https://img.shields.io/badge/Lightly_Reading_Group-4285F4?logo=googledrive&logoColor=white)](https://drive.google.com/file/d/11szszgtsYESO3QF8jkFsLFTVtN797uH2/view?usp=sharing) |
| [Segment Anything](https://arxiv.org/abs/2304.02643) | [![arXiv](https://img.shields.io/badge/arXiv-2304.02643-b31b1b.svg)](https://arxiv.org/abs/2304.02643) [![Google Drive](https://img.shields.io/badge/Lightly_Reading_Group-4285F4?logo=googledrive&logoColor=white)](https://drive.google.com/file/d/18yPuL8J6boi5pB1NRO6VAUbYEwmI3tFo/view?usp=sharing) |
| [Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture](https://arxiv.org/abs/2301.08243) | [![arXiv](https://img.shields.io/badge/arXiv-2301.08243-b31b1b.svg)](https://arxiv.org/abs/2301.08243) [![Google Drive](https://img.shields.io/badge/Lightly_Reading_Group-4285F4?logo=googledrive&logoColor=white)](https://drive.google.com/file/d/1l5nHxqqbv7o3ESw3DLBqgJyXILJ0FgH6/view?usp=sharing) |
| [Self-supervised Object-Centric Learning for Videos](https://arxiv.org/abs/2310.06907) | [![NeurIPS](https://img.shields.io/badge/NeurIPS_2023-2310.06907-b31b1b.svg)](https://arxiv.org/abs/2310.06907) |
| [Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution](https://proceedings.neurips.cc/paper_files/paper/2023/file/06ea400b9b7cfce6428ec27a371632eb-Paper-Conference.pdf) | [![NeurIPS](https://img.shields.io/badge/NeurIPS_2023-b31b1b.svg)](https://proceedings.neurips.cc/paper_files/paper/2023/file/06ea400b9b7cfce6428ec27a371632eb-Paper-Conference.pdf) |
| [An Information-Theoretic Perspective on Variance-Invariance-Covariance Regularization](https://proceedings.neurips.cc/paper_files/paper/2023/file/6b1d4c03391b0aa6ddde0b807a78c950-Paper-Conference.pdf) | [![NeurIPS](https://img.shields.io/badge/NeurIPS_2023-b31b1b.svg)](https://proceedings.neurips.cc/paper_files/paper/2023/file/6b1d4c03391b0aa6ddde0b807a78c950-Paper-Conference.pdf) |
| [The Role of Entropy and Reconstruction in Multi-View Self-Supervised Learning](https://arxiv.org/abs/2307.10907) | [![arXiv](https://img.shields.io/badge/arXiv-2307.10907-b31b1b.svg)](https://arxiv.org/abs/2307.10907) [![GitHub](https://img.shields.io/badge/GitHub-100000?&logo=github&logoColor=white)](https://github.com/apple/ml-entropy-reconstruction) |
| [Fast Segment Anything](https://arxiv.org/abs/2306.12156) | [![arXiv](https://img.shields.io/badge/arXiv-2306.12156-b31b1b.svg)](https://arxiv.org/abs/2306.12156) [![GitHub](https://img.shields.io/badge/GitHub-100000?&logo=github&logoColor=white)](https://github.com/CASIA-IVA-Lab/FastSAM) |
| [Faster Segment Anything: Towards Lightweight SAM for Mobile Applications](https://arxiv.org/abs/2306.14289) | [![arXiv](https://img.shields.io/badge/arXiv-2306.14289-b31b1b.svg)](https://arxiv.org/abs/2306.14289) [![GitHub](https://img.shields.io/badge/GitHub-100000?&logo=github&logoColor=white)](https://github.com/ChaoningZhang/MobileSAM) |
| [What Do Self-Supervised Vision Transformers Learn?](https://arxiv.org/abs/2305.00729) | [![arXiv](https://img.shields.io/badge/ICLR_2023-2305.00729-b31b1b.svg)](https://arxiv.org/abs/2305.00729) [![GitHub](https://img.shields.io/badge/GitHub-100000?&logo=github&logoColor=white)](https://github.com/naver-ai/cl-vs-mim) |
| [Improved baselines for vision-language pre-training](https://arxiv.org/abs/2305.08675) | [![arXiv](https://img.shields.io/badge/arXiv-2305.08675-b31b1b.svg)](https://arxiv.org/abs/2305.08675) [![GitHub](https://img.shields.io/badge/GitHub-100000?&logo=github&logoColor=white)](https://github.com/facebookresearch/clip-rocket) |
| [Active Self-Supervised Learning: A Few Low-Cost Relationships Are All You Need](https://arxiv.org/abs/2303.15256) | [![arXiv](https://img.shields.io/badge/arXiv-2303.15256-b31b1b.svg)](https://arxiv.org/abs/2303.15256) [![GitHub](https://img.shields.io/badge/GitHub-100000?&logo=github&logoColor=white)](https://github.com/VivienCabannes/rates) |
| [EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything](https://arxiv.org/abs/2312.00863) | [![arXiv](https://img.shields.io/badge/arXiv-2312.00863-b31b1b.svg)](https://arxiv.org/abs/2312.00863) [![GitHub](https://img.shields.io/badge/GitHub-100000?&logo=github&logoColor=white)](https://github.com/yformer/EfficientSAM) |
| [DropPos: Pre-Training Vision Transformers by Reconstructing Dropped Positions](https://arxiv.org/abs/2309.03576) | [![arXiv](https://img.shields.io/badge/arXiv-2309.03576-b31b1b.svg)](https://arxiv.org/abs/2309.03576) [![GitHub](https://img.shields.io/badge/GitHub-100000?&logo=github&logoColor=white)](https://github.com/Haochen-Wang409/DropPos) |
| [VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking](https://openaccess.thecvf.com/content/CVPR2023/papers/Wang_VideoMAE_V2_Scaling_Video_Masked_Autoencoders_With_Dual_Masking_CVPR_2023_paper.pdf) | [![CVPR](https://img.shields.io/badge/CVPR-2023-b31b1b.svg)](https://openaccess.thecvf.com/content/CVPR2023/papers/Wang_VideoMAE_V2_Scaling_Video_Masked_Autoencoders_With_Dual_Masking_CVPR_2023_paper.pdf) |
| [MGMAE: Motion Guided Masking for Video Masked Autoencoding](https://openaccess.thecvf.com/content/ICCV2023/papers/Huang_MGMAE_Motion_Guided_Masking_for_Video_Masked_Autoencoding_ICCV_2023_paper.pdf) | [![ICCV](https://img.shields.io/badge/ICCV-2023-b31b1b.svg)](https://openaccess.thecvf.com/content/ICCV2023/papers/Huang_MGMAE_Motion_Guided_Masking_for_Video_Masked_Autoencoding_ICCV_2023_paper.pdf) [![GitHub](https://img.shields.io/badge/GitHub-100000?&logo=github&logoColor=white)](https://github.com/MCG-NJU/MGMAE) |

## 2022

| Paper | Links |
|---|---|
| [VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training](https://arxiv.org/abs/2203.12602) | [![arXiv](https://img.shields.io/badge/arXiv-2203.12602-b31b1b.svg)](https://arxiv.org/abs/2203.12602) [![Google Drive](https://img.shields.io/badge/Lightly_Reading_Group-4285F4?logo=googledrive&logoColor=white)](https://drive.google.com/file/d/1F0oyiyyxCKzWS9Gv8TssHxaCMFnAoxfb/view?usp=sharing) |
| [Improving Visual Representation Learning through Perceptual Understanding](https://arxiv.org/abs/2212.14504) | [![arXiv](https://img.shields.io/badge/arXiv-2212.14504-b31b1b.svg)](https://arxiv.org/abs/2212.14504) [![Google Drive](https://img.shields.io/badge/Lightly_Reading_Group-4285F4?logo=googledrive&logoColor=white)](https://drive.google.com/file/d/1n4Y0iiM368RaPxPg6qvsfACguaolFnhf/view?usp=sharing) |
| [RankMe: Assessing the downstream performance of pretrained self-supervised representations by their rank](https://arxiv.org/abs/2210.02885) | [![arXiv](https://img.shields.io/badge/arXiv-2210.02885-b31b1b.svg)](https://arxiv.org/abs/2210.02885) [![Google Drive](https://img.shields.io/badge/Lightly_Reading_Group-4285F4?logo=googledrive&logoColor=white)](https://drive.google.com/file/d/1cEP1_G2wMM3-AMMrdntGN6Fq1E5qwPi1/view?usp=sharing) |
| [A Closer Look at Self-Supervised Lightweight Vision Transformers](https://arxiv.org/abs/2205.14443) | [![arXiv](https://img.shields.io/badge/arXiv-2205.14443-b31b1b.svg)](https://arxiv.org/abs/2205.14443) [![GitHub](https://img.shields.io/badge/GitHub-100000?&logo=github&logoColor=white)](https://github.com/wangsr126/mae-lite) |
| [Beyond neural scaling laws: beating power law scaling via data pruning](https://arxiv.org/abs/2206.14486) | [![arXiv](https://img.shields.io/badge/NeurIPS_2022-2206.14486-b31b1b.svg)](https://arxiv.org/abs/2206.14486) [![GitHub](https://img.shields.io/badge/GitHub-100000?&logo=github&logoColor=white)](https://github.com/rgeirhos/dataset-pruning-metrics) |
| [A simple, efficient and scalable contrastive masked autoencoder for learning visual representations](https://arxiv.org/abs/2210.16870) | [![arXiv](https://img.shields.io/badge/arXiv-2210.16870-b31b1b.svg)](https://arxiv.org/abs/2210.16870) |
| [Masked Autoencoders are Robust Data Augmentors](https://arxiv.org/abs/2206.04846) | [![arXiv](https://img.shields.io/badge/arXiv-2206.04846-b31b1b.svg)](https://arxiv.org/abs/2206.04846) |
| [Is Self-Supervised Learning More Robust Than Supervised Learning?](https://arxiv.org/abs/2206.05259) | [![arXiv](https://img.shields.io/badge/arXiv-2206.05259-b31b1b.svg)](https://arxiv.org/abs/2206.05259) |
| [Can CNNs Be More Robust Than Transformers?](https://arxiv.org/abs/2206.03452) | [![arXiv](https://img.shields.io/badge/arXiv-2206.03452-b31b1b.svg)](https://arxiv.org/abs/2206.03452) [![GitHub](https://img.shields.io/badge/GitHub-100000?&logo=github&logoColor=white)](https://github.com/UCSC-VLAA/RobustCNN) |
| [Patch-level Representation Learning for Self-supervised Vision Transformers](https://arxiv.org/abs/2206.07990) | [![arXiv](https://img.shields.io/badge/arXiv-2206.07990-b31b1b.svg)](https://arxiv.org/abs/2206.07990) [![GitHub](https://img.shields.io/badge/GitHub-100000?&logo=github&logoColor=white)](https://github.com/alinlab/selfpatch) |

## 2021

| Paper | Links |
|---|---|
| [With a Little Help from My Friends: Nearest-Neighbor Contrastive Learning of Visual Representations](https://arxiv.org/abs/2104.14548) | [![arXiv](https://img.shields.io/badge/arXiv-2104.14548-b31b1b.svg)](https://arxiv.org/abs/2104.14548) [![Open In Colab](https://img.shields.io/badge/Colab-PyTorch-blue?logo=googlecolab)](https://colab.research.google.com/github/lightly-ai/lightly/blob/master/examples/notebooks/pytorch/nnclr.ipynb) |
| [SimMIM: A Simple Framework for Masked Image Modeling](https://arxiv.org/abs/2111.09886) | [![arXiv](https://img.shields.io/badge/arXiv-2111.09886-b31b1b.svg)](https://arxiv.org/abs/2111.09886) [![Open In Colab](https://img.shields.io/badge/Colab-PyTorch-blue?logo=googlecolab)](https://colab.research.google.com/github/lightly-ai/lightly/blob/master/examples/notebooks/pytorch/simmim.ipynb) |
| [Exploring Simple Siamese Representation Learning](https://arxiv.org/abs/2011.10566) | [![arXiv](https://img.shields.io/badge/arXiv-2011.10566-b31b1b.svg)](https://arxiv.org/abs/2011.10566) [![Open In Colab](https://img.shields.io/badge/Colab-PyTorch-blue?logo=googlecolab)](https://colab.research.google.com/github/lightly-ai/lightly/blob/master/examples/notebooks/pytorch/simsiam.ipynb) |
| [When Does Contrastive Visual Representation Learning Work?](https://arxiv.org/abs/2105.05837) | [![arXiv](https://img.shields.io/badge/arXiv-2105.05837-b31b1b.svg)](https://arxiv.org/abs/2105.05837) |
| [Efficient Visual Pretraining with Contrastive Detection](https://arxiv.org/abs/2103.10957) | [![arXiv](https://img.shields.io/badge/arXiv-2103.10957-b31b1b.svg)](https://arxiv.org/abs/2103.10957) |

## 2020
