DSV2 is a lossy video codec using wavelets and block-based motion compensation. It is designed for medium-low to medium-high bitrates. Comparable to MPEG-4 Part 2 and Part 10 (H.264) (P frames only) in terms of efficiency and quality.
minih264:
minih264_tt.mp4
DSV2:
dsv2_212315.mp4
Note PDFs are likely to be out of date. They reflect the codec as it was when it was first uploaded here.
- compression using multiresolution subband analysis instead of DCT
- also known as a wavelet transform
- up to quarter-pixel motion compensation
- 4:1:1, 4:2:0, 4:2:2 (+ UYVY) and 4:4:4 chroma subsampling formats
- adaptive quantization
- in-loop filtering
- intra and inter frames with variable length closed GOP
- no bidirectional prediction (also known as B-frames). Only forward prediction with previous picture as reference
- only form of entropy coding is interleaved exponential-golomb coding for simplicity
- in-loop filtering after motion compensation
- more adaptive quantization potential
- skip blocks for temporal stability
- new subband filters + support for adaptive subband filtering
- better motion compensation through Expanded Prediction Range Mode (EPRM)
- quarter pixel compensation
- psychovisual optimizations in the codec and encoder design
--- for more detailed information please refer to the informal specification document (DSV2_spec.pdf) in the repository.
- single pass average bitrate (ABR) or constant rate factor (CRF) rate control
- more advanced Human Visual System (HVS) based intra block mode determination
- new Human Visual System (HVS) based intra frame adaptive quantization
- more complex scene change detection
- hierarchical motion estimation
- better temporal adaptive quantization
- written to be compatible with C89
--- for more detailed information please refer to the encoder information document (DSV2_encoder.pdf) in the repository.
- no built-in interlacing support
- only 8 bits of depth per component supported
- frame sizes must be divisible by two
This code follows my self-imposed restrictions:
- Everything must be done in software, no explicit usage of hardware acceleration.
- No floating point types or literals, everything must be integer only.
- No 3rd party libraries, only C standard library and OS libraries for window, input, etc.
- No languages used besides C.
- No compiler specific features and no SIMD.
- Single threaded.
All you need is a C compiler.
In the root directory of the project (with all the .h and .c files):
cc -O3 -o dsv2 *.c
If you have Zig installed, you can use the build.zig file to compile the project. Building requires Zig version ≥0.13.0
.
-
Ensure you have Zig & git installed
-
Clone this repo & enter the cloned directory:
git clone https://github.com/LMP88959/Digital-Subband-Video-2
cd Digital-Subband-Video-2
- Build the binary with Zig:
zig build
Note: If you'd like to specify a different build target from your host OS/architecture, simply supply the target flag. Example:
zig build -Dtarget=x86_64-linux-gnu
- Find the build binary in
zig-out/bin
. You can install it like so:
sudo cp zig-out/bin/dsv2 /usr/local/bin
Now, you should be all set to use the compiled dsv2
binary.
Sample output:
Envel Graphics DSV v2.0 compliant codec by EMMIR 2024
usage: ./dsv2 e [options]
sample usage: ./dsv2 e -inp=video.yuv -out=compressed.dsv -w=352 -h=288 -fps_num=24 -fps_den=1 -qp=85 -gop=15
------------------------------------------------------------
-qp : quality percent. If -1 and ABR mode, estimate good qp for desired bitrate. If -1 and CRF mode, default to 85. -1 = default
[min = -1, max = 100]
-effort : encoder effort. 0 = least effort, 10 = most effort. higher value -> better video, slower encoding. default = 10
[min = 0, max = 10]
-w : width of input video. 352 = default
[min = 16, max = 16777216]
-h : height of input video. 288 = default
[min = 16, max = 16777216]
-gop : Group Of Pictures length. 0 = intra frames only, 12 = default
[min = 0, max = 2147483647]
-fmt : chroma subsampling format of input video. 0 = 4:4:4, 1 = 4:2:2, 2 = 4:2:0, 3 = 4:1:1, 4 = 4:2:2 UYVY, 2 = default
[min = 0, max = 4]
-nfr : number of frames to compress. -1 means as many as possible. -1 = default
[min = -1, max = 2147483647]
-sfr : frame number to start compressing at. 0 = default
[min = 0, max = 2147483647]
-fps_num : fps numerator of input video. 30 = default
[min = 1, max = 16777216]
-fps_den : fps denominator of input video. 1 = default
[min = 1, max = 16777216]
-aspect_num : aspect ratio numerator of input video. 1 = default
[min = 1, max = 16777216]
-aspect_den : aspect ratio denominator of input video. 1 = default
[min = 1, max = 16777216]
-ipct : percentage threshold of intra blocks in an inter frame after which it is simply made into an intra frame. 50 = default
[min = 0, max = 100]
-pyrlevels : number of pyramid levels to use in hierarchical motion estimation. 0 means auto-determine. 0 = default
[min = 0, max = 5]
-rc_mode : rate control mode. 0 = single pass average bitrate (ABR), 1 = constant rate factor (CRF). 0 = default
[min = 1, max = 0]
-kbps : ONLY FOR ABR RATE CONTROL: bitrate in kilobits per second. 0 = auto-estimate needed bitrate for desired qp. 0 = default
[min = 0, max = 2147483647]
-maxqstep : max quality step for ABR, absolute quant amount in range [1, 400]. 1 = default (0.25%)
[min = 1, max = 400]
-minqp : minimum quality. 0 = default
[min = 0, max = 100]
-maxqp : maximum quality. 100 = default
[min = 0, max = 100]
-iminqp : minimum quality for intra frames. 20 = default
[min = 0, max = 100]
-stabref : period (in # of frames) to refresh the stability block tracking. 0 = auto-determine. 0 = default
[min = 0, max = 2147483647]
-scd : do scene change detection. 1 = default
[min = 0, max = 1]
-tempaq : do temporal adaptive quantization. If disabled, spatial methods will be used instead. 1 = default
[min = 0, max = 1]
-bszx : override block sizes in the x (horizontal) direction. -1 = auto-determine. -1 = default. 0 = 16, 1 = 32
[min = -1, max = 1]
-bszy : override block sizes in the y (vertical) direction. -1 = auto-determine. -1 = default. 0 = 16, 1 = 32
[min = -1, max = 1]
-scpct : scene change percentage. 55 = default
[min = 0, max = 100]
-skipthresh : skip block threshold. -1 = disable. 0 = default, larger value means more likely to mark a block as skipped.
[min = -1, max = 2147483647]
-varint : intra frames that are created outside of the normal GOP cycle reset the GOP cycle if 1. 1 = default
[min = 0, max = 1]
-psy : enable/disable psychovisual optimizations. 1 = default
[min = 0, max = 1]
-y4m : set to 1 if input is in Y4M format, 0 if raw YUV. 0 = default
[min = 0, max = 1]
-inp= : input file. NOTE: if not specified, defaults to stdin
-out= : output file. NOTE: if not specified, defaults to stdout
-y : do not prompt for confirmation when potentially overwriting an existing file
-l<n> : set logging level to n (0 = none, 1 = error, 2 = warning, 3 = info, 4 = debug/all)
-v : set verbose
Sample output:
Envel Graphics DSV v2.0 compliant codec by EMMIR 2024
usage: ./dsv2 d [options]
sample usage: ./dsv2 d -inp=video.dsv -out=decompressed.yuv -out420p=1
------------------------------------------------------------
-out420p : convert video to 4:2:0 chroma subsampling before saving output. 0 = default
[min = 0, max = 1]
-drawinfo : draw debugging information on the decoded frames (bit OR together to get multiple at the same time):
1 = draw stability info
2 = draw motion vectors
4 = draw intra subblocks. 0 = default
[min = 0, max = 7]
-inp= : input file. NOTE: if not specified, defaults to stdin
-out= : output file. NOTE: if not specified, defaults to stdout
-y : do not prompt for confirmation when potentially overwriting an existing file
-l<n> : set logging level to n (0 = none, 1 = error, 2 = warning, 3 = info, 4 = debug/all)
-v : set verbose
NOTE: if -inp= and -out= are not specified, it will default to standard in / out (stdin/stdout). Only .yuv (one file containing all the frames) and .y4m files are supported as inputs to the encoder.
This codec is by no means fully optimized, so there is a lot of room for performance gains. It performs quite well for what it is though.
If you have any questions feel free to leave a comment on YouTube OR join the King's Crook Discord server :)
YouTube: https://www.youtube.com/@EMMIR_KC/videos
Discord: https://discord.gg/hdYctSmyQJ
itch.io: https://kingscrook.itch.io/kings-crook
All videos are encoded at 30fps with a GOP length of 12. The H.264 file sizes were within a few kilobytes of their respective DSV2 file size. H.264 examples were encoded using https://github.com/lieff/minih264 using -speed0 (best quality) DSV2 examples were encoded with -effort=10 (best quality)
minih264:
minih264_mob.mp4
DSV2:
dsv2_944306.mp4
minih264:
minih264_bus.mp4
DSV2: