ECCV2022-Paper-List

A collection of ECCV 2022 papers. Detailed analyses of some of the papers are available on the FightingCV WeChat official account.

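Technical Exchange

Feel free to follow our WeChat official account: FightingCV

FightingCV official account | Assistant's WeChat (include a note: [company/school + research area + ID])
  • The official account shares practical content on papers, algorithms, and code every day.

  • The discussion group shares the latest papers and analyses every day; everyone is welcome to join and learn together. (If you cannot join, add WeChat: 775629340, and remember to include a note: [company/school + research area + ID].)

  • We strongly recommend following the Zhihu account and the FightingCV official account to keep up with the latest high-quality resources.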

Dataset

COO: Comic Onomatopoeia Dataset for Recognizing Arbitrary or Truncated Texts

Exploring Fine-Grained Audiovisual Categorization with the SSW60 Dataset

BRACE: The Breakdancing Competition Dataset for Dance Motion Synthesis

CelebV-HQ: A Large-Scale Video Facial Attributes Dataset

Ithaca365: Dataset and Driving Perception under Repeated and Challenging Weather Conditions

Image Classification

Tree Structure-Aware Few-Shot Image Classification via Hierarchical Aggregation

Bagging Regional Classification Activation Maps for Weakly Supervised Object Localization

Tip-Adapter: Training-free Adaption of CLIP for Few-shot Classification

Invariant Feature Learning for Generalized Long-Tailed Classification

RealFlow: EM-based Realistic Optical Flow Dataset Generation from Videos

GAN

Ultra-high-resolution unpaired stain transformation via Kernelized Instance Normalization

Accelerating Score-based Generative Models with Preconditioned Diffusion Sampling

CCPL: Contrastive Coherence Preserving Loss for Versatile Style Transfer

Fast-Vid2Vid: Spatial-Temporal Compression for Video-to-Video Synthesis

RepMix: Representation Mixing for Robust Attribution of Synthesized Images

VecGAN: Image-to-Image Translation with Interpretable Latent Directions

Context-Consistent Semantic Image Editing with Style-Preserved Modulation

DynaST: Dynamic Sparse Transformer for Exemplar-Guided Image Generation

Supervised Attribute Information Removal and Reconstruction for Image Manipulation

Adaptive Feature Interpolation for Low-Shot Image Generation

WaveGAN: Frequency-aware GAN for High-Fidelity Few-shot Image Generation

FakeCLR: Exploring Contrastive Learning for Solving Latent Discontinuity in Data-Efficient GANs

Outpainting by Queries

Single Stage Virtual Try-on via Deformable Attention Flows

Structure-aware Editable Morphable Model for 3D Facial Detail Animation and Manipulation

Monocular 3D Object Reconstruction with GAN Inversion

Generative Multiplane Images: Making a 2D GAN 3D-Aware

DeltaGAN: Towards Diverse Few-shot Image Generation with Sample-Specific Delta

Injecting 3D Perception of Controllable NeRF-GAN into StyleGAN for Editable Portrait Image Synthesis

SGBANet: Semantic GAN and Balanced Attention Network for Arbitrarily Oriented Scene Text Recognition

2D GANs Meet Unsupervised Single-view 3D Reconstruction

InfiniteNature-Zero: Learning Perpetual View Generation of Natural Scenes from Single Images

Auto-regressive Image Synthesis with Integrated Quantization

Compositional Human-Scene Interaction Synthesis with Semantic Control

Generator Knows What Discriminator Should Learn in Unconditional GANs

StyleLight: HDR Panorama Generation for Lighting Estimation and Editing

Cross Attention Based Style Distribution for Controllable Person Image Synthesis

NeRF

Streamable Neural Fields

Injecting 3D Perception of Controllable NeRF-GAN into StyleGAN for Editable Portrait Image Synthesis

AdaNeRF: Adaptive Sampling for Real-time Rendering of Neural Radiance Fields

PS-NeRF: Neural Inverse Rendering for Multi-view Photometric Stereo

Neural-Sim: Learning to Generate Training Data with NeRF

Neural Density-Distance Fields

Visual Transformer

k-means Mask Transformer

Weakly Supervised Grounding for VQA in Vision-Language Transformers

Wave-ViT: Unifying Wavelet and Transformers for Visual Representation Learning

CoMER: Modeling Coverage for Transformer-based Handwritten Mathematical Expression Recognition

Towards Hard-Positive Query Mining for DETR-based Human-Object Interaction Detection

Hunting Group Clues with Transformers for Social Group Activity Recognition

Entry-Flipped Transformer for Inference and Prediction of Participant Behavior

DynaST: Dynamic Sparse Transformer for Exemplar-Guided Image Generation

Global-local Motion Transformer for Unsupervised Skeleton-based Action Learning

TokenMix: Rethinking Image Mixing for Data Augmentation in Vision Transformers

TS2-Net: Token Shift and Selection Transformer for Text-Video Retrieval

Action Quality Assessment with Temporal Parsing Transformer

GRIT: Faster and Better Image captioning Transformer Using Dual Visual Features

Hierarchically Self-Supervised Transformer for Human Skeleton Representation Learning

AiATrack: Attention in Attention for Transformer Visual Tracking

Single Frame Atmospheric Turbulence Mitigation: A Benchmark Study and A New Physics-Inspired Transformer Model

TinyViT: Fast Pretraining Distillation for Small Vision Transformers

An Efficient Spatio-Temporal Pyramid Transformer for Action Detection

Weakly Supervised Object Localization via Transformer with Implicit Spatial Calibration

SeedFormer: Patch Seeds based Point Cloud Completion with Upsample Transformer

Cost Aggregation with 4D Convolutional Swin Transformer for Few-Shot Segmentation

IGFormer: Interaction Graph Transformer for Skeleton-based Human Interaction Recognition

3D Siamese Transformer Network for Single Object Tracking on Point Clouds

Reference-based Image Super-Resolution with Deformable Attention Transformer

SiRi: A Simple Selective Retraining Mechanism for Transformer-based Visual Grounding

Online Continual Learning with Contrastive Vision Transformer

Cross-Attention of Disentangled Modalities for 3D Human Mesh Recovery with Transformers

Toward Understanding WordArt: Corner-Guided Transformer for Scene Text Recognition

Multimodal

Audio-Visual Segmentation

Cross-modal Prototype Driven Network for Radiology Report Generation

Hierarchical Latent Structure for Multi-Modal Vehicle Trajectory Forecasting

UniNet: Unified Architecture Search with Convolution, Transformer, and MLP

Video Graph Transformer for Video Question Answering

Bootstrapped Masked Autoencoders for Vision BERT Pretraining

Learning Mutual Modulation for Self-Supervised Cross-Modal Super-Resolution

Exploiting Unlabeled Data with Vision and Language Models for Object Detection

LocVTP: Video-Text Pre-training for Temporal Localization

Inductive and Transductive Few-Shot Video Classification via Appearance and Temporal Alignments

Cross-Modal 3D Shape Generation and Manipulation

Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training

Contrastive Learning

Network Binarization via Contrastive Learning

Contrastive Deep Supervision

ConCL: Concept Contrastive Learning for Dense Prediction Pre-training in Pathology Images

Action-based Contrastive Learning for Trajectory Prediction

FakeCLR: Exploring Contrastive Learning for Solving Latent Discontinuity in Data-Efficient GANs

Adversarial Contrastive Learning via Asymmetric InfoNCE

Fast-MoCo: Boost Momentum-based Contrastive Learning with Combinatorial Patches

Decoupled Adversarial Contrastive Learning for Self-supervised Adversarial Robustness

Bi-directional Contrastive Learning for Domain Adaptive Semantic Segmentation

Object Detection

Dense Teacher: Dense Pseudo-Labels for Semi-supervised Object Detection

Should All Proposals be Treated Equally in Object Detection?

HEAD: HEtero-Assists Distillation for Heterogeneous Object Detectors

Adversarially-Aware Robust Object Detector

ObjectBox: From Centers to Boxes for Anchor-Free Object Detection

Point-to-Box Network for Accurate Object Detection via Single Point Supervision

DID-M3D: Decoupling Instance Depth for Monocular 3D Object Detection

SPSN: Superpixel Prototype Sampling Network for RGB-D Salient Object Detection

Rethinking IoU-based Optimization for Single-stage 3D Object Detection

Densely Constrained Depth Estimator for Monocular 3D Object Detection

Robust Object Detection With Inaccurate Bounding Boxes

Unsupervised Domain Adaptation for One-stage Object Detector using Offsets to Bounding Box

AutoAlignV2: Deformable Feature Aggregation for Dynamic Multi-Modal 3D Object Detection

Rethinking Few-Shot Object Detection on a Multi-Domain Benchmark

DEVIANT: Depth EquiVarIAnt NeTwork for Monocular 3D Object Detection

Active Learning Strategies for Weakly-supervised Object Detection

W2N: Switching From Weak Supervision to Noisy Supervision for Object Detection

Salient Object Detection for Point Clouds

UC-OWOD: Unknown-Classified Open World Object Detection

Monocular 3D Object Detection with Depth from Motion

Object Tracking

Tracking Objects as Pixel-wise Distributions

Towards Grand Unification of Object Tracking

The Caltech Fish Counting Dataset: A Benchmark for Multiple-Object Tracking and Counting

MOTCOM: The Multi-Object Tracking Dataset Complexity Metric

Robust Landmark-based Stent Tracking in X-ray Fluoroscopy

AiATrack: Attention in Attention for Transformer Visual Tracking

3D Siamese Transformer Network for Single Object Tracking on Point Clouds

Tracking Every Thing in the Wild

AvatarPoser: Articulated Full-Body Pose Tracking from Sparse Motion Sensing

Segmentation

Domain Adaptive Video Segmentation via Temporal Pseudo Supervision

OSFormer: One-Stage Camouflaged Instance Segmentation with Transformers

PseudoClick: Interactive Image Segmentation with Click Imitation

XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model

Tackling Background Distraction in Video Object Segmentation

Dense Cross-Query-and-Support Attention Weighted Mask Aggregation for Few-Shot Segmentation

Hierarchical Feature Alignment Network for Unsupervised Video Object Segmentation

Open-world Semantic Segmentation via Contrasting and Clustering Vision-Language Embedding

Learning Quality-aware Dynamic Memory for Video Object Segmentation

Box-supervised Instance Segmentation with Level Set Evolution

ML-BPM: Multi-teacher Learning with Bidirectional Photometric Mixing for Open Compound Domain Adaptation in Semantic Segmentation

Self-Supervised Interactive Object Segmentation Through a Singulation-and-Grasping Approach

DecoupleNet: Decoupled Network for Domain Adaptive Semantic Segmentation

CoSMix: Compositional Semantic Mix for Domain Adaptation in 3D LiDAR Segmentation

GIPSO: Geometrically Informed Propagation for Online Adaptation in 3D LiDAR Segmentation

Online Domain Adaptation for Semantic Segmentation in Ever-Changing Conditions

In Defense of Online Models for Video Instance Segmentation

Mining Relations among Cross-Frame Affinities for Video Semantic Segmentation

Long-tailed Instance Segmentation using Gumbel Optimized Loss

Bi-directional Contrastive Learning for Domain Adaptive Semantic Segmentation

Cost Aggregation with 4D Convolutional Swin Transformer for Few-Shot Segmentation

Self-Support Few-Shot Semantic Segmentation

Active Pointly-Supervised Instance Segmentation

Video Mask Transfiner for High-Quality Video Instance Segmentation

Doubly Deformable Aggregation of Covariance Matrices for Few-shot Segmentation

Per-Clip Video Object Segmentation

Cluster-to-adapt: Few Shot Domain Adaptation for Semantic Segmentation across Disjoint Labels

Medical Image Segmentation

Personalizing Federated Medical Image Segmentation via Local Calibration

Learning Topological Interactions for Multi-Class Medical Image Segmentation

Knowledge Distillation

Knowledge Condensation Distillation

FedX: Unsupervised Federated Learning with Cross Knowledge Distillation

Action Detection

ReAct: Temporal Action Detection with Relational Queries

Semi-Supervised Temporal Action Detection with Proposal-Free Masking

Temporal Action Detection with Global Segmentation Mask Learning

Weakly-Supervised Temporal Action Detection for Fine-Grained Videos with Hierarchical Atomic Actions

Action Recognition

Compound Prototype Matching for Few-shot Action Recognition

Collaborating Domain-shared and Target-specific Feature Clustering for Cross-domain 3D Action Recognition

Combined CNN Transformer Encoder for Enhanced Fine-grained Human Action Recognition

Anomaly Detection

Registration based Few-Shot Anomaly Detection

Look at Adjacent Frames: Video Anomaly Detection without Offline Training

Face Recognition

Controllable and Guided Face Synthesis for Unconstrained Face Recognition

Human Pose Estimation

Self-Constrained Inference Optimization on Structural Groups for Human Pose Estimation

Category-Level 6D Object Pose and Size Estimation using Self-Supervised Deep Prior Deformation Networks

Global-local Motion Transformer for Unsupervised Skeleton-based Action Learning

TransGrasp: Grasp Pose Estimation of a Category of Objects by Transferring Grasps from Only One Labeled Instance

Pose for Everything: Towards Category-Agnostic Pose Estimation

C3P: Cross-domain Pose Prior Propagation for Weakly Supervised 3D Human Pose Estimation

3D Interacting Hand Pose Estimation by Hand De-occlusion and Removal

Faster VoxelPose: Real-time 3D Human Pose Estimation by Orthographic Projection

ShAPO: Implicit Representations for Multi-Object Shape, Appearance, and Pose Optimization

RBP-Pose: Residual Bounding Box Projection for Category-Level Pose Estimation

Neural Correspondence Field for Object Pose Estimation

Explicit Occlusion Reasoning for Multi-person 3D Human Pose Estimation

CLIFF: Carrying Location Information in Full Frames into Human Pose and Shape Estimation

Face Anti-Spoofing

Generative Domain Adaptation for Face Anti-Spoofing

Facial Attribute Recognition

FairGRAPE: Fairness-aware GRAdient Pruning mEthod for Face Attribute Classification

Face

On Mitigating Hard Clusters for Face Clustering

Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head Synthesis

Human Reconstruction

3D Clothed Human Reconstruction in the Wild

UNIF: United Neural Implicit Functions for Clothed Human Reconstruction and Animation

The One Where They Reconstructed 3D Humans and Environments in TV Shows

Relighting

Geometry-aware Single-image Full-body Human Relighting

Relighting4D: Neural Relightable Human from Videos

DeepFake

Detecting and Recovering Sequential DeepFake Manipulation

An Efficient Method for Face Quality Assessment on the Edge

Text Recognition

Scene Text Recognition with Permuted Autoregressive Sequence Models

Dynamic Low-Resolution Distillation for Cost-Efficient End-to-End Text Spotting

Contextual Text Block Detection towards Scene Text Understanding

Point Cloud

Open-world Semantic Segmentation for LIDAR Point Clouds

2DPASS: 2D Priors Assisted Semantic Segmentation on LiDAR Point Clouds

CPO: Change Robust Panorama to Point Cloud Localization

diffConv: Analyzing Irregular Point Clouds with an Irregular View

CATRE: Iterative Point Clouds Alignment for Category-level Object Pose Refinement

Dual Adaptive Transformations for Weakly Supervised Point Cloud Segmentation

SeedFormer: Patch Seeds based Point Cloud Completion with Upsample Transformer

Dynamic 3D Scene Analysis by Point Cloud Accumulation

3D Siamese Transformer Network for Single Object Tracking on Point Clouds

Salient Object Detection for Point Clouds

MonteBoxFinder: Detecting and Filtering Primitives to Fit a Noisy Point Cloud

Flow Estimation

Bi-PointFlowNet: Bidirectional Learning for Point Cloud Based Scene Flow Estimation

What Matters for 3D Scene Flow Network

Deep 360$^\circ$ Optical Flow Estimation Based on Multi-Projection Fusion

Depth Estimation

Physical Attack on Monocular Depth Estimation with Optimal Adversarial Patches

Towards Scale-Aware, Robust, and Generalizable Unsupervised Monocular Depth Estimation by Integrating IMU Motion Dynamics

RA-Depth: Resolution Adaptive Self-Supervised Monocular Depth Estimation

Lane Detection

RCLane: Relay Chain Prediction for Lane Detection

Trajectory Prediction

Action-based Contrastive Learning for Trajectory Prediction

Learning Pedestrian Group Representations for Multi-modal Trajectory Prediction

Aware of the History: Trajectory Forecasting with the Local Behavior Data

Human Trajectory Prediction via Neural Social Physics

D2-TPred: Discontinuous Dependency for Trajectory Prediction under Traffic Lights

Super-Resolution

Image Super-Resolution with Deep Dictionary

Learning Mutual Modulation for Self-Supervised Cross-Modal Super-Resolution

CADyQ: Content-Aware Dynamic Quantization for Image Super-Resolution

Towards Interpretable Video Super-Resolution via Alternating Optimization

Reference-based Image Super-Resolution with Deformable Attention Transformer

Image Denoising

Optimizing Image Compression via Joint Learning with Denoising

Image Deblurring

Spatio-Temporal Deformable Attention Network for Video Deblurring

Efficient Video Deblurring Guided by Motion Magnitude

Image Restoration

D2HNet: Joint Denoising and Deblurring with Hierarchical Network for Robust Night Image Restoration

Image Enhancement

Unsupervised Night Image Enhancement: When Layer Decomposition Meets Light-Effects Suppression

Image Retrieval

Feature Representation Learning for Unsupervised Cross-domain Image Retrieval

2D Object Detection

[4] Multimodal Object Detection via Probabilistic Ensembling (Oral)

paper | code

[3] Point-to-Box Network for Accurate Object Detection via Single Point Supervision
paper | code

[2] You Should Look at All Objects
paper | code

[1] Adversarially-Aware Robust Object Detector (Oral)
paper | code

3D Object Detection

[2] Densely Constrained Depth Estimator for Monocular 3D Object Detection
paper | code

[1] Rethinking IoU-based Optimization for Single-stage 3D Object Detection
paper

Human-Object Interaction (HOI) Detection

[2] Discovering Human-Object Interaction Concepts via Self-Compositional Learning

paper | code (https://github.com/zhihou7/scl; https://github.com/zhihou7/HOI-CL)

[1] Towards Hard-Positive Query Mining for DETR-based Human-Object Interaction Detection
paper | code

Salient Object Detection

[1] KD-SCFNet: Towards More Accurate and Efficient Salient Object Detection via Knowledge Distillation

paper | code

Image Anomaly Detection/Surface Defect Detection

[2] DSR -- A dual subspace re-projection network for surface anomaly detection

paper | code

[1] DICE: Leveraging Sparsification for Out-of-Distribution Detection
paper | code


[3] In Defense of Online Models for Video Instance Segmentation (Oral)
paper | code

[2] Box-supervised Instance Segmentation with Level Set Evolution
paper

[1] OSFormer: One-Stage Camouflaged Instance Segmentation with Transformers
paper | code

Semantic Segmentation

[1] 2DPASS: 2D Priors Assisted Semantic Segmentation on LiDAR Point Clouds
paper | code

Video Object Segmentation

[1] Learning Quality-aware Dynamic Memory for Video Object Segmentation
paper | code

Super Resolution

[3] Learning Series-Parallel Lookup Tables for Efficient Image Super-Resolution

paper | code

[2] Efficient Meta-Tuning for Content-aware Neural Video Delivery
paper | code

[1] Dynamic Dual Trainable Bounds for Ultra-low Precision Super-Resolution Networks
paper | code

Image Restoration/Image Enhancement/Image Reconstruction

[8] Bringing Rolling Shutter Images Alive with Dual Reversed Distortion (Oral)
paper | code

[7] Unsupervised Night Image Enhancement: When Layer Decomposition Meets Light-Effects Suppression
paper | code

[6] Semantic-Sparse Colorization Network for Deep Exemplar-based Colorization
paper

[5] Geometry-aware Single-image Full-body Human Relighting
paper

[4] Multi-Modal Masked Pre-Training for Monocular Panoramic Depth Completion
paper

[3] PanoFormer: Panorama Transformer for Indoor 360 Depth Estimation
paper

[2] SESS: Saliency Enhancing with Scaling and Sliding
paper

[1] RigNet: Repetitive Image Guided Network for Depth Completion
paper


[1] Deep Portrait Delighting

paper


[3] Perceiving and Modeling Density is All You Need for Image Dehazing (Oral)
paper | code

[2] Animation from Blur: Multi-modal Blur Decomposition with Motion Guidance
paper | code

[1] Deep Semantic Statistics Matching (D2SM) Denoising Network
paper


[1] Outpainting by Queries
paper | code


[1] CCPL: Contrastive Coherence Preserving Loss for Versatile Style Transfer (Oral)
paper | code



Video Editing

[3] AlphaVC: High-Performance and Efficient Learned Video Compression

paper


[2] Improving the Perceptual Quality of 2D Animation Interpolation
paper | code

[1] Real-Time Intermediate Flow Estimation for Video Frame Interpolation
paper | code


[1] Error Compensation Framework for Flow-Guided Video Inpainting
paper


[2] Event-guided Deblurring of Unknown Exposure Time Videos (Oral)

paper

[1] Efficient Video Deblurring Guided by Motion Magnitude

paper | code



Action/Activity Recognition

[4] GaitEdge: Beyond Plain End-to-end Gait Recognition for Better Practicality
paper | code

[3] Collaborating Domain-shared and Target-specific Feature Clustering for Cross-domain 3D Action Recognition
paper | code

[2] ReAct: Temporal Action Detection with Relational Queries
paper | code

[1] Hunting Group Clues with Transformers for Social Group Activity Recognition
paper


[1] PASS: Part-Aware Self-Supervised Pre-Training for Person Re-Identification
paper | code

Video Understanding

[1] GraphVid: It Only Takes a Few Nodes to Understand a Video (Oral)
paper


[6] Can Shuffling Video Benefit Temporal Bias Problem: A Novel Training Framework for Temporal Grounding

paper | code

[5] Feature Representation Learning for Unsupervised Cross-domain Image Retrieval
paper | code

[4] LocVTP: Video-Text Pre-training for Temporal Localization
paper | code

[3] Deep Hash Distillation for Image Retrieval
paper | code

[2] TS2-Net: Token Shift and Selection Transformer for Text-Video Retrieval
paper | code

[1] Lightweight Attentional Feature Fusion: A New Baseline for Text-to-Video Retrieval
paper



Flow/Motion Estimation

[1] Deep 360° Optical Flow Estimation Based on Multi-Projection Fusion

paper


[4] Overlooked Poses Actually Make Sense: Distilling Privileged Knowledge for Human Motion Prediction

paper

[3] 3D Interacting Hand Pose Estimation by Hand De-occlusion and Removal

paper | code

[2] Weakly Supervised Object Localization via Transformer with Implicit Spatial Calibration
[paper](https://arxiv.org/abs/2207.10447) | code

[1] Category-Level 6D Object Pose and Size Estimation using Self-Supervised Deep Prior Deformation Networks
paper | code


[1] Physical Attack on Monocular Depth Estimation with Optimal Adversarial Patches
paper



Facial Recognition/Detection

[1] Towards Racially Unbiased Skin Tone Estimation via Scene Disambiguation

paper | code


[1] MoFaNeRF: Morphable Facial Neural Radiance Field

paper | code

3D Reconstruction

[1] DiffuStereo: High Quality Human Reconstruction via Diffusion-based Stereo Using Sparse Cameras
paper


[1] Sem2NeRF: Converting Single-View Semantic Masks to Neural Radiance Fields
paper | code



Text Detection/Recognition/Understanding

[5] Toward Understanding WordArt: Corner-Guided Transformer for Scene Text Recognition (Oral)

paper | code

[4] Contextual Text Block Detection towards Scene Text Understanding

paper

[3] PromptDet: Towards Open-vocabulary Detection using Uncurated Images
paper | code

[2] End-to-End Video Text Spotting with Transformer (Oral)
paper | code

[1] Dynamic Low-Resolution Distillation for Cost-Efficient End-to-End Text Spotting
paper | code

GAN/Generative/Adversarial

[7] Learning Energy-Based Models With Adversarial Training

paper | code

[6] Adaptive Image Transformations for Transfer-based Adversarial Attack
paper

[5] Generative Multiplane Images: Making a 2D GAN 3D-Aware
paper | code

[4] Eliminating Gradient Conflict in Reference-based Line-Art Colorization
paper | code

[3] WaveGAN: Frequency-aware GAN for High-Fidelity Few-shot Image Generation
paper | code

[2] FakeCLR: Exploring Contrastive Learning for Solving Latent Discontinuity in Data-Efficient GANs
paper | code

[1] UniCR: Universally Approximated Certified Robustness via Randomized Smoothing
paper

Image Generation/Image Synthesis

[1] PixelFolder: An Efficient Progressive Pixel Synthesis Network for Image Generation

paper | code

Vision-based Prediction

[1] D2-TPred: Discontinuous Dependency for Trajectory Prediction under Traffic Lights
paper | code



Transformer

[5] Point Primitive Transformer for Long-Term 4D Point Cloud Video Understanding

paper

[4] Improving Vision Transformers by Revisiting High-frequency Components

paper | code

[3] Transformer with Implicit Edges for Particle-based Physics Simulation

paper | code

[2] ScalableViT: Rethinking the Context-oriented Generalization of Vision Transformer
paper | code

[1] Visual Prompt Tuning
paper | code

Neural Architecture Search (NAS)

[3] ScaleNet: Searching for the Model to Scale
paper | code

[2] Ensemble Knowledge Guided Sub-network Search and Fine-tuning for Filter Pruning
paper | code

[1] EAGAN: Efficient Two-stage Evolutionary Architecture Search for GANs
paper | code

Normalization/Regularization (Batch Normalization)

[1] Fine-grained Data Distribution Alignment for Post-Training Quantization (Oral)
paper | code

Image Feature Extraction and Matching

[1] Unsupervised Deep Multi-Shape Matching
paper

Noisy Label

[1] Learning with Noisy Labels by Efficient Transition Matrix Estimation to Combat Label Miscorrection
paper

Long-Tailed Distribution

[2] Long-tailed Instance Segmentation using Gumbel Optimized Loss

paper | code

[1] Identifying Hard Noise in Long-Tailed Sample Distribution (Oral)

paper | code



Knowledge Distillation

[3] Prune Your Model Before Distill It

paper | code

[2] Efficient One Pass Self-distillation with Zipf's Label Smoothing

paper | code

[1] Knowledge Condensation Distillation
paper | code

Semi-supervised/Weakly Supervised/Unsupervised/Self-supervised Learning

[8] Acknowledging the Unknown for Multi-label Learning with Single Positive Labels

paper | code

[7] W2N: Switching From Weak Supervision to Noisy Supervision for Object Detection

paper | code

[6] CA-SSL: Class-Agnostic Semi-Supervised Learning for Detection and Segmentation
paper | code

[5] FedX: Unsupervised Federated Learning with Cross Knowledge Distillation
paper

[4] Synergistic Self-supervised and Quantization Learning
paper | code

[3] Contrastive Deep Supervision
paper | code

[2] Dense Teacher: Dense Pseudo-Labels for Semi-supervised Object Detection
paper

[1] Image Coding for Machines with Omnipotent Feature Learning
paper

Vision-language

[2] Language Matters: A Weakly Supervised Vision-Language Pre-training Approach for Scene Text Detection and Spotting (Oral)

paper

[1] Contrastive Vision-Language Pre-training with Limited Resources
paper | code

Others

Distribution-Balanced Loss for Multi-Label Classification in Long-Tailed Datasets

A Generic Visualization Approach for Convolutional Neural Networks

Deep Plastic Surgery: Robust and Controllable Image Editing with Human-Drawn Sketches

GIQA: Generated Image Quality Assessment

Structured3D: A Large Photo-realistic Dataset for Structured 3D Modeling

AiR: Attention with Reasoning Capability

Embedding contrastive unsupervised features to cluster in- and out-of-distribution noise in corrupted image datasets

GraphVid: It Only Takes a Few Nodes to Understand a Video

Target-absent Human Attention

Lottery Ticket Hypothesis for Spiking Neural Networks

Improving Covariance Conditioning of the SVD Meta-layer by Orthogonality

AvatarCap: Animatable Avatar Conditioned Monocular Human Volumetric Capture

DeepPS2: Revisiting Photometric Stereo Using Two Differently Illuminated Images

Learning Local Implicit Fourier Representation for Image Warping

SESS: Saliency Enhancing with Scaling and Sliding

TM2T: Stochastic and Tokenized Modeling for the Reciprocal Generation of 3D Human Motions and Texts

DenseHybrid: Hybrid Anomaly Detection for Dense Open-set Recognition

FAST-VQA: Efficient End-to-end Video Quality Assessment with Fragment Sampling

Towards Realistic Semi-Supervised Learning

OpenLDN: Learning to Discover Novel Classes for Open-World Semi-Supervised Learning

Predicting is not Understanding: Recognizing and Addressing Underspecification in Machine Learning

Factorizing Knowledge in Neural Networks

SuperTickets: Drawing Task-Agnostic Lottery Tickets from Supernets via Jointly Architecture Searching and Parameter Pruning

Video Dialog as Conversation about Objects Living in Space-Time

Demystifying Unsupervised Semantic Correspondence Estimation

A Closer Look at Invariances in Self-supervised Pre-training for 3D Vision

DCCF: Deep Comprehensible Color Filter Learning Framework for High-Resolution Image Harmonization

Batch-efficient EigenDecomposition for Small and Medium Matrices

Few 'Zero Level Set'-Shot Learning of Shape Signed Distance Functions in Feature Space

Camera Pose Auto-Encoders for Improving Pose Regression

Synergistic Self-supervised and Quantization Learning

Frequency Domain Model Augmentation for Adversarial Attack

Organic Priors in Non-Rigid Structure from Motion

Unsupervised Visual Representation Learning by Synchronous Momentum Grouping

Learning Implicit Templates for Point-Based Clothed Human Modeling

BayesCap: Bayesian Identity Cap for Calibrated Uncertainty in Frozen Neural Networks

Lipschitz Continuity Retained Binary Neural Network

3D Instances as 1D Kernels

ScaleNet: Searching for the Model to Scale

Rethinking Data Augmentation for Robust Visual Question Answering

Semantic Novelty Detection via Relational Reasoning

Label2Label: A Language Modeling Framework for Multi-Attribute Learning

Towards High-Fidelity Single-view Holistic Reconstruction of Indoor Scenes

Class-incremental Novel Class Discovery

MPIB: An MPI-Based Bokeh Rendering Framework for Realistic Partial Occlusion Effects

SepLUT: Separable Image-adaptive Lookup Tables for Real-time Image Enhancement

Learning with Recoverable Forgetting

Zero-Shot Temporal Action Detection via Vision-Language Prompting

Watermark Vaccine: Adversarial Attacks to Prevent Watermark Removal

FashionViL: Fashion-Focused Vision-and-Language Representation Learning

E-NeRV: Expedite Neural Video Representation with Disentangled Spatial-Temporal Context

Neural Color Operators for Sequential Image Retouching

Semi-Supervised Keypoint Detector and Descriptor for Retinal Image Matching

JPerceiver: Joint Perception Network for Depth, Pose and Layout Estimation in Driving Scenes

You Should Look at All Objects

NeFSAC: Neurally Filtered Minimal Samples

CLOSE: Curriculum Learning On the Sharing Extent Towards Better One-shot NAS

Cross-Domain Cross-Set Few-Shot Learning via Learning Compact and Aligned Representations

Self-calibrating Photometric Stereo by Neural Inverse Rendering

Learning Long-Term Spatial-Temporal Graphs for Active Speaker Detection

Towards Understanding The Semidefinite Relaxations of Truncated Least-Squares in Robust Rotation Search

PoserNet: Refining Relative Camera Poses Exploiting Object Detections

Geometric Features Informed Multi-person Human-object Interaction Recognition in Videos

Deep Semantic Statistics Matching (D2SM) Denoising Network

3D Room Layout Estimation from a Cubemap of Panorama Image via Deep Manhattan Hough Transform

NDF: Neural Deformable Fields for Dynamic Human Modelling

Self-Supervision Can Be a Good Few-Shot Learner

ParticleSfM: Exploiting Dense Point Trajectories for Localizing Moving Cameras in the Wild

MHR-Net: Multiple-Hypothesis Reconstruction of Non-Rigid Shapes from 2D Views

SelectionConv: Convolutional Neural Networks for Non-rectilinear Image Data

Prior-Guided Adversarial Initialization for Fast Adversarial Training

Prior Knowledge Guided Unsupervised Domain Adaptation

Discover and Mitigate Unknown Biases with Debiasing Alternate Networks

Difficulty-Aware Simulator for Open Set Recognition

Tailoring Self-Supervision for Supervised Learning

Overcoming Shortcut Learning in a Target Domain by Generalizing Basic Visual Factors from a Source Domain

Temporal and cross-modal attention for audio-visual zero-shot learning

Telepresence Video Quality Assessment

Towards Efficient and Scale-Robust Ultra-High-Definition Image Demoireing

Negative Samples are at Large: Leveraging Hard-distance Elastic Loss for Re-identification

Discrete-Constrained Regression for Local Counting Models

Resolving Copycat Problems in Visual Imitation Learning via Residual Action Prediction

Efficient Meta-Tuning for Content-aware Neural Video Delivery

Object-Compositional Neural Implicit Surfaces

Explaining Deepfake Detection by Analysing Image Matching

ERA: Expert Retrieval and Assembly for Early Action Prediction

Perspective Phase Angle Model for Polarimetric 3D Reconstruction

Explicit Image Caption Editing

Unsupervised Deep Multi-Shape Matching

Contributions of Shape, Texture, and Color in Visual Recognition

Novel Class Discovery without Forgetting

Approximate Differentiable Rendering with Algebraic Surfaces

FADE: Fusing the Assets of Decoder and Encoder for Task-Agnostic Upsampling

Error Compensation Framework for Flow-Guided Video Inpainting

NSNet: Non-saliency Suppression Sampler for Efficient Video Recognition

Temporal Saliency Query Network for Efficient Video Recognition

UFO: Unified Feature Optimization

OIMNet++: Prototypical Normalization and Localization-aware Learning for Person Search

Towards Accurate Open-Set Recognition via Background-Class Regularization

Grounding Visual Representations with Texts for Domain Generalization

SPIN: An Empirical Evaluation on Sharing Parameters of Isotropic Networks

MeshMAE: Masked Autoencoders for 3D Mesh Data Analysis

On Label Granularity and Object Localization

Spotting Temporally Precise, Fine-Grained Events in Video

Video Anomaly Detection by Solving Decoupled Spatio-Temporal Jigsaw Puzzles

GOCA: Guided Online Cluster Assignment for Self-Supervised Video Representation Learning

Visual Knowledge Tracing

Tackling Long-Tailed Category Distribution Under Domain Shifts

Latent Discriminant deterministic Uncertainty

Animation from Blur: Multi-modal Blur Decomposition with Motion Guidance

Bitwidth-Adaptive Quantization-Aware Neural Network Training: A Meta-Learning Approach

Structural Causal 3D Reconstruction

AudioScopeV2: Audio-Visual Attention Architectures for Calibrated Open-Domain On-Screen Sound Separation

Continual Variational Autoencoder Learning via Online Cooperative Memorization

Panoptic Scene Graph Generation

Few-Shot Class-Incremental Learning via Entropy-Regularized Data-Free Replay

POP: Mining POtential Performance of new fashion products via webly cross-modal query expansion

Few-shot Object Counting and Detection

Dynamic Local Aggregation Network with Adaptive Clusterer for Anomaly Detection

My View is the Best View: Procedure Learning from Egocentric Videos

Prototype-Guided Continual Adaptation for Class-Incremental Unsupervised Domain Adaptation

MeshLoc: Mesh-Based Visual Localization

MemSAC: Memory Augmented Sample Consistency for Large Scale Domain Adaptation

Deforming Radiance Fields with Cages

Equivariance and Invariance Inductive Bias for Learning from Insufficient Data

Black-box Few-shot Knowledge Distillation

Balancing Stability and Plasticity through Advanced Null Space in Continual Learning

Optimal Boxes: Boosting End-to-End Scene Text Recognition by Adjusting Annotated Bounding Boxes via Reinforcement Learning

NeuMesh: Learning Disentangled Neural Mesh-based Implicit Field for Geometry and Texture Editing

Domain Adaptive Person Search

VizWiz-FewShot: Locating Objects in Images Taken by People With Visual Impairments

Label-Guided Auxiliary Training Improves 3D Object Detector

Combining Internal and External Constraints for Unrolling Shutter in Videos

TIPS: Text-Induced Pose Synthesis

Improving Test-Time Adaptation via Shift-agnostic Weight Regularization and Nearest Source Prototypes

Learning Graph Neural Networks for Image Style Transfer

Contrastive Monotonic Pixel-Level Modulation

CompNVS: Novel View Synthesis with Scene Completion

When Counting Meets HMER: Counting-Aware Network for Handwritten Mathematical Expression Recognition

Meta Spatio-Temporal Debiasing for Video Scene Graph Generation

3D Shape Sequence of Human Comparison and Classification using Current and Varifolds

NewsStories: Illustrating articles with visual summaries

Efficient One Pass Self-distillation with Zipf's Label Smoothing

AlignSDF: Pose-Aligned Signed Distance Fields for Hand-Object Reconstruction

Static and Dynamic Concepts for Self-supervised Video Representation Learning

Learning Hierarchy Aware Features for Reducing Mistake Severity

Translating a Visual LEGO Manual to a Machine-Executable Plan

Semi-Leak: Membership Inference Attacks Against Semi-supervised Learning

Trainability Preserving Neural Structured Pruning

Shift-tolerant Perceptual Similarity Metric

Abstracting Sketches through Simple Primitives

AutoTransition: Learning to Recommend Video Transition Effects

Hardly Perceptible Trojan Attack against Neural Networks with Bit Flips

Identifying Hard Noise in Long-Tailed Sample Distribution

One-Trimap Video Matting

PointFix: Learning to Fix Domain Bias for Robust Online Stereo Adaptation

End-to-end Graph-constrained Vectorized Floorplan Generation with Panoptic Refinement

Spatiotemporal Self-attention Modeling with Temporal Patch Shift for Action Recognition

Concurrent Subsidiary Supervision for Unsupervised Source-Free Domain Adaptation

LGV: Boosting Adversarial Example Transferability from Large Geometric Vicinity

Initialization and Alignment for Adversarial Texture Optimization

Depth Field Networks for Generalizable Multi-view Scene Representation

Mining Cross-Person Cues for Body-Part Interactiveness Learning in HOI Detection

Neural Strands: Learning Hair Geometry and Appearance from Multi-View Images

Break and Make: Interactive Structural Understanding Using LEGO Bricks

A Repulsive Force Unit for Garment Collision Handling in Neural Networks

Minimal Neural Atlas: Parameterizing Complex Surfaces with Minimal Charts and Distortion

Can Shuffling Video Benefit Temporal Bias Problem: A Novel Training Framework for Temporal Grounding

AlphaVC: High-Performance and Efficient Learned Video Compression

WISE: Whitebox Image Stylization by Example-based Learning

Centrality and Consistency: Two-Stage Clean Samples Identification for Learning with Instance-Dependent Noisy Labels

Video Question Answering with Iterative Video-Text Co-Tokenization

S²Contact: Graph-based Network for 3D Hand-Object Contact Estimation with Semi-Supervised Learning

Skeleton-free Pose Transfer for Stylized 3D Characters

Improving Fine-Grained Visual Recognition in Low Data Regimes via Self-Boosting Attention Mechanism

SdAE: Self-distillated Masked Autoencoder

Out-of-Distribution Detection with Semantic Mismatch under Masking

Skeleton-Parted Graph Scattering Networks for 3D Human Motion Prediction

Revisiting the Critical Factors of Augmentation-Invariant Representation Learning

Few-shot Single-view 3D Reconstruction with Memory Prior Contrastive Network

Few-Shot Class-Incremental Learning from an Open-Set Perspective

DAS: Densely-Anchored Sampling for Deep Metric Learning

Fast Two-step Blind Optical Aberration Correction

Negative Frames Matter in Egocentric Visual Query 2D Localization

Sources:

https://github.com/DWCTOD/ECCV2022-Papers-with-Code-Demo

https://github.com/extreme-assistant/ECCV2022-Paper-Code-Interpretation