- ModelScope Text to video synthesis
- zeroscope v2 xl watermark-free, ModelScope-based video model generating high-quality 1024x576 (16:9) video, to be used with the text2video extension for AUTOMATIC1111
- Nvidia VideoLDM: Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models
- Potat1, colab
- Phenaki multi-minute text-to-video prompts with scene changes, project page
- StableVideo Text-driven Consistency-aware Diffusion Video Editing, code, paper
- Rerender A Video Zero-Shot Text-Guided Video-to-Video Translation, paper
- VideoCrafter1 Open Diffusion Models for High-Quality Video Generation
- i2vgen-xl a holistic video-generation ecosystem built on diffusion models
- pixeldance High-Dynamic Video Generation
- Open-Sora-Plan aims to reproduce Sora
- StoryDiffusion Consistent Long-Range Image and Video Generation
- Open-Sora Open implementation approach for video generation
- CogVideo SOTA video generation with strong consistency, producing 6 seconds of video at 8 fps and 720x480 using 18-36 GB of VRAM
- Pyramid-Flow a highly efficient autoregressive video-generation method that leverages flow matching for improved computational efficiency, generating high-quality 10-second videos at 768p and 24 FPS and supporting image-to-video generation
- HunyuanVideo Tencent's open-weight video-generation model
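Pyramid-Flow above leans on flow matching for its sampling efficiency. As a minimal, self-contained sketch of that idea (not the model itself — the oracle velocity field below stands in for the learned network, and all names are hypothetical), Euler-integrating the straight-line conditional velocity transports a "noise" sample onto a "data" sample by t=1:

```python
import numpy as np

def euler_sample(x0, x1, n_steps=100):
    """Integrate dx/dt = v(x, t) with Euler steps from t=0 to t=1.

    v(x, t) = (x1 - x) / (1 - t) is the conditional velocity of the
    straight-line path x_t = (1 - t) * x0 + t * x1 used in flow matching;
    a real model predicts v with a neural network from (x, t) alone.
    """
    x = np.asarray(x0, dtype=float).copy()
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        v = (x1 - x) / (1.0 - t)  # toy "oracle" velocity toward the target
        x = x + dt * v
    return x

x0 = np.array([5.0, -3.0])   # "noise" sample
x1 = np.array([1.0, 2.0])    # "data" sample
print(euler_sample(x0, x1))  # lands on x1 (the error telescopes to zero)
```

With this exact velocity the Euler error telescopes away entirely; with a learned, imperfect velocity field the step count trades speed against sample quality, which is the knob such models tune.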
- https://github.com/google-research/frame-interpolation FILM: Frame Interpolation for Large Motion
- https://github.com/ltkong218/ifrnet IFRNet: intermediate feature refine network for efficient frame interpolation
- https://github.com/megvii-research/ECCV2022-RIFE RIFE: Real-Time Intermediate Flow Estimation for video frame interpolation
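To make concrete what these frame interpolators improve on, here is the trivial baseline they are compared against — a per-pixel cross-fade (a conceptual sketch, not any of the models above; function name is made up):

```python
import numpy as np

def blend_midframe(frame_a, frame_b, t=0.5):
    """Naive frame interpolation: per-pixel linear blend.

    FILM, IFRNet, and RIFE instead estimate optical flow and warp pixels
    along motion before blending, avoiding the ghosting a plain
    cross-fade produces on moving objects.
    """
    a = frame_a.astype(np.float32)
    b = frame_b.astype(np.float32)
    mid = (1.0 - t) * a + t * b
    return np.clip(mid, 0, 255).astype(np.uint8)

# two 2x2 grayscale "frames"
f0 = np.zeros((2, 2), dtype=np.uint8)
f1 = np.full((2, 2), 200, dtype=np.uint8)
print(blend_midframe(f0, f1))  # every pixel is 100
```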
- Segment and Track Anything, code. An innovative framework combining the Segment Anything Model (SAM) and the DeAOT tracking model; enables precise, multimodal object tracking in video, demonstrating superior performance in benchmarks
- Track Anything, code. Extends the Segment Anything Model (SAM) to achieve high-performance, interactive tracking and segmentation in videos with minimal human intervention, addressing SAM's limitations in consistent video segmentation
- MAGVIT a single model for multiple video synthesis tasks, outperforming existing methods in quality and inference time, code and models, paper
- FastSAM Fast Segment Anything, a CNN-based model achieving performance comparable to SAM at 50× higher run-time speed
- SAM-PT Extending SAM to zero-shot video segmentation with point-based tracking, paper
- DEVA Tracking Anything with Decoupled Video Segmentation, paper
- Cutie Putting the Object Back into Video Object Segmentation, paper
- YOLOv10 Real-Time End-to-End Object Detection
- SAM2 enables fast, precise selection of any object in any video or image
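A recurring building block behind video trackers like those above is associating per-frame segmentation masks across time by overlap. A minimal sketch of that idea (mask-IoU greedy matching — an illustration of the general technique, not the matching logic of any specific tool listed; names are hypothetical):

```python
import numpy as np

def mask_iou(m1, m2):
    """Intersection-over-union of two boolean masks."""
    inter = np.logical_and(m1, m2).sum()
    union = np.logical_or(m1, m2).sum()
    return inter / union if union else 0.0

def associate(prev_masks, curr_masks, thresh=0.5):
    """Greedily match each current-frame mask to the previous-frame mask
    with the highest IoU; returns {curr_index: prev_index} for matches
    above the threshold. Unmatched masks are treated as new objects."""
    matches = {}
    for i, cm in enumerate(curr_masks):
        scores = [mask_iou(pm, cm) for pm in prev_masks]
        if scores and max(scores) >= thresh:
            matches[i] = int(np.argmax(scores))
    return matches

# a 3x3 square that shifts by one pixel between frames
prev = np.zeros((6, 6), dtype=bool); prev[1:4, 1:4] = True
curr = np.zeros((6, 6), dtype=bool); curr[2:5, 2:5] = True
print(mask_iou(prev, curr))              # 4/14 ≈ 0.286
print(associate([prev], [curr], 0.25))   # {0: 0}
```

Production trackers (DeAOT, Cutie, SAM 2's memory attention) replace this overlap heuristic with learned appearance features, which survive fast motion and occlusion where raw IoU fails.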
- https://github.com/researchmm/FTVSR FTVSR: frequency-transformer for compressed video super-resolution
- https://github.com/picsart-ai-research/videoinr-continuous-space-time-super-resolution VideoINR: continuous space-time video super-resolution via implicit neural representations
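For context, the trivial upscaling baseline that learned video super-resolution models are measured against can be written in two lines (a conceptual sketch; function name is made up):

```python
import numpy as np

def upscale_nearest(frame, scale=2):
    """Nearest-neighbor upscaling by index repetition.

    Repeating rows/columns adds no new detail; learned VSR models instead
    synthesize plausible high-frequency content, typically aggregating
    information across neighboring frames for temporal consistency.
    """
    return np.repeat(np.repeat(frame, scale, axis=0), scale, axis=1)

f = np.array([[0, 255], [255, 0]], dtype=np.uint8)
print(upscale_nearest(f).shape)  # (4, 4)
```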
- Instant-ngp train NeRFs in under 5 seconds, with GPU support on Windows/Linux
- NeRFstudio a collaboration-friendly studio for NeRFs that simplifies creating, training, and testing NeRFs, with a web-based visualizer, benchmarks, and pipeline support
- Threestudio a framework for 3D content creation from text prompts, single images (including text2image-generated ones), and few-shot images
- Zero-1-to-3 Zero-shot One Image to 3D Object for novel view synthesis and 3D reconstruction
- localrf NeRFs for reconstructing large-scale stabilized scenes from shaky videos, paper, project page
- gaussian-splatting reference implementation of "3D Gaussian Splatting for Real-Time Radiance Field Rendering", paper
- 4d-gaussian-splatting Real-time Photorealistic Dynamic Scene Representation and Rendering with 4D Gaussian Splatting, paper
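Both NeRF volume rendering and Gaussian-splatting rasterization reduce, per pixel, to the same front-to-back alpha compositing sum C = Σᵢ cᵢ·αᵢ·Π_{j<i}(1−αⱼ). A minimal sketch of that accumulation (the reference implementations fuse this into CUDA kernels; the function name here is made up):

```python
import numpy as np

def composite(colors, alphas):
    """Front-to-back alpha compositing along a ray:
        C = sum_i c_i * a_i * prod_{j<i} (1 - a_j)
    In NeRF the alphas come from density samples along the ray; in
    Gaussian splatting they come from depth-sorted, projected 2D
    Gaussians — but the accumulation is identical.
    """
    colors = np.asarray(colors, dtype=float)
    alphas = np.asarray(alphas, dtype=float)
    # transmittance: how much light survives the samples in front
    transmittance = np.concatenate(([1.0], np.cumprod(1.0 - alphas)[:-1]))
    weights = alphas * transmittance
    return (weights[:, None] * colors).sum(axis=0)

# semi-transparent red (alpha 0.5) in front of opaque green
print(composite([[1, 0, 0], [0, 1, 0]], [0.5, 1.0]))  # [0.5, 0.5, 0.0]
```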
- roop one-click deepfake (face swap)
- rope GUI-focused roop
- streamv2v official PyTorch implementation of StreamV2V, streaming video-to-video translation
- MusePose Pose Driven Image 2 Video framework to generate Virtual Humans
- V-Express generate a talking-head video under the control of a reference image, an audio clip, and a sequence of V-Kps images
- Deep-Live-Cam real time face swap and one-click video deepfake with only a single image
- MSU Benchmarks a collection of video processing benchmarks developed by the Video Processing Group at Moscow State University
- Video Super Resolution Benchmarks
- Video Generation Benchmarks
- Video Frame Interpolation Benchmarks
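Benchmarks like these typically report PSNR as the basic fidelity metric (usually alongside SSIM and, increasingly, perceptual metrics such as LPIPS). A minimal implementation of the standard formula, 10·log₁₀(MAX²/MSE):

```python
import numpy as np

def psnr(ref, test, max_val=255.0):
    """Peak signal-to-noise ratio in dB between a reference frame and a
    reconstructed frame. Higher is better; identical frames give +inf."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

a = np.zeros((4, 4))
b = np.full((4, 4), 10.0)    # uniform error of 10 -> MSE = 100
print(round(psnr(a, b), 2))  # 10*log10(65025/100) ≈ 28.13
```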
- ProPainter Improving Propagation and Transformer for Video Inpainting, paper