added documentation for LDM + dictionary compatibility #3553

Cyan4973 · 2023-03-15T01:11:34Z

As mentioned by @reyqn in #2835 (comment), Long Distance Mode (LDM) works best with dictionaries loaded with ZSTD_CCtx_refPrefix().
LDM is effectively incompatible with ZSTD_CDict, and by extension with ZSTD_CCtx_loadDictionary(), so results are disappointing when trying to combine them.

Added documentation to nudge users towards ZSTD_CCtx_refPrefix() when they want to use a dictionary as a large "reference image", which requires LDM for proper indexing.

ghost · 2023-03-18T04:44:49Z

May I ask why a prefix is only used once?
If there were an API that allowed a cctx to reuse a prefix indefinitely, what negative effects would it have?

From ZSTD_CCtx_refPrefix() doc:

Reference a prefix (single-usage dictionary) for next compressed frame.

A prefix is only used once. Tables are discarded at end of frame (ZSTD_e_end).

Cyan4973 · 2023-03-18T07:53:12Z

A prefix must be loaded into the match finder tables.
The match finder tables are then mutated during the rest of the compression process,
so the initial "state" of the match finder tables, after loading the prefix, is effectively lost.

Therefore, using a prefix a second time requires loading its content into the match finder tables again. This is a non-trivial cost.

This contrasts with a CDict, which is created once and is then immutable,
allowing it to be used by any number of CCtx afterwards,
without any initialization cost.

Employing a prefix rather than a full-feature CDict makes sense when it's only going to be used once.

ghost · 2023-03-24T03:15:48Z

Thanks for your explanation.

This sentence seems a bit misleading; maybe it should say "regarded as a prefix" rather than "called a prefix".

zstd/lib/zstd.h

Line 904 in 3e0550e

* A dictionary can be any arbitrary data segment (also called a prefix),

I have another question, if a prefix is loaded as ZSTD_dct_fullDict, how is it different from a dictionary?

zstd/lib/zstd.h

Lines 1895 to 1898 in 3e0550e

    
    /*! ZSTD_CCtx_refPrefix_advanced() :
     *  Same as ZSTD_CCtx_refPrefix(), but gives finer control over
     *  how to interpret prefix content (automatic ? force raw mode (default) ? full mode only ?) */
    ZSTDLIB_STATIC_API size_t ZSTD_CCtx_refPrefix_advanced(ZSTD_CCtx* cctx, const void* prefix, size_t prefixSize, ZSTD_dictContentType_e dictContentType);

Cyan4973 · 2023-03-24T04:30:49Z

I have another question, if a prefix is loaded as ZSTD_dct_fullDict, how is it different from a dictionary?

It will still be loaded directly into the match search tables, and therefore the initial state will be lost during the compression process. So it's only suitable when used once.

The point of ZSTD_dct_fullDict is to explicitly tell the compressor to expect only a well-formed trained dictionary, with a conformant header. If not, it will error out.

The difference from the default auto mode is that, in auto mode, if the dictionary doesn't contain a well-formed header, it defaults to raw mode, where the entire input is considered "raw content" and no statistics (and no dictionary ID) are present. This could be a missed opportunity to detect an error scenario earlier.

added documentation for LDM + dictionary compatibility

f4563d8

facebook-github-bot added the CLA Signed label Mar 15, 2023

Cyan4973 self-assigned this Mar 15, 2023

terrelln approved these changes Mar 16, 2023

Cyan4973 merged commit e220824 into dev Mar 16, 2023

Cyan4973 deleted the ldm_dict branch March 29, 2023 23:44
