IPFS has the ability to deduplicate blocks between different types of files. This functionality is based on a rolling hash algorithm.
In IPFS you can select either rabin or buzzhash for this task. Rabin is kind of slow, but buzzhash is quite fast.
The rolling hash would make it possible to 'prescan' both files, find some cut marks, and run a fast cryptographic hash algorithm, such as blake2b, over the resulting chunks.
I think both operations are much cheaper than pattern matching. This way you can skip every pattern matching attempt that falls inside blocks already known to be equal on both sides (A and B).
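A minimal sketch of that prescan, assuming a plain polynomial rolling hash as a stand-in for IPFS's rabin or buzzhash chunkers (it is not either of those implementations). `WINDOW`, `MASK`, `BASE`, `MOD` and both function names are made up for illustration, and a real chunker would also enforce minimum and maximum chunk sizes.

```python
import hashlib

WINDOW = 16          # bytes covered by the rolling hash (assumed value)
MASK = (1 << 6) - 1  # cut when the low 6 bits are zero (assumed value)
BASE = 257
MOD = (1 << 31) - 1


def cut_points(data: bytes) -> list:
    """Return chunk end offsets chosen by content rather than by position."""
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    cuts, h = [], 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            # Drop the oldest byte so h only covers the last WINDOW bytes.
            h = (h - data[i - WINDOW] * top) % MOD
        h = (h * BASE + byte) % MOD
        if i + 1 >= WINDOW and (h & MASK) == 0:
            cuts.append(i + 1)
    if data and (not cuts or cuts[-1] != len(data)):
        cuts.append(len(data))  # the last chunk always ends at end of input
    return cuts


def chunk_digests(data: bytes) -> list:
    """Return (offset, length, blake2b digest) for every chunk of data."""
    out, start = [], 0
    for end in cut_points(data):
        digest = hashlib.blake2b(data[start:end], digest_size=16).digest()
        out.append((start, end - start, digest))
        start = end
    return out
```

Because a cut depends only on the last `WINDOW` bytes, inserting a few bytes at the front of a file shifts the later cut marks but leaves the chunk contents, and so their digests, unchanged. Comparing the digest sets of A and B then identifies the equal blocks without any pattern matching.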
The first layer of patching would just generate length+offset+move triples, which copy the blocks from the original file into a sparse file as the first patching operation.
The pattern matching rules could then be used on top of that to fill in the gaps of the output file.
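Roughly what that first layer could look like, as a sketch only. I'm reading the triple as (length, source offset, destination offset), which is my interpretation, and all function names here are hypothetical. A zero-filled `bytearray` stands in for the sparse file, and the chunk lists are assumed to come from whatever rolling hash chunker is used.

```python
import hashlib


def digest(block: bytes) -> bytes:
    return hashlib.blake2b(block, digest_size=16).digest()


def plan_copies(old: bytes, new: bytes, old_chunks, new_chunks):
    """Chunks are (offset, length) pairs. Returns (copies, gaps)."""
    index = {}
    for off, length in old_chunks:
        # Keep the first occurrence of each distinct block in the old file.
        index.setdefault(digest(old[off:off + length]), (off, length))
    copies, gaps = [], []
    for off, length in new_chunks:
        hit = index.get(digest(new[off:off + length]))
        if hit is not None and hit[1] == length:
            copies.append((length, hit[0], off))  # (length, src, dst)
        else:
            gaps.append((off, length))  # left for the pattern matcher
    return copies, gaps


def apply_copies(old: bytes, new_size: int, copies) -> bytearray:
    """Copy the known-equal blocks; the gaps stay zero-filled."""
    out = bytearray(new_size)
    for length, src, dst in copies:
        out[dst:dst + length] = old[src:src + length]
    return out
```

The `gaps` list is what the regular pattern matching would then have to fill, so it only runs over the regions that the block hashes could not account for.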
However, we also use our regular match finders, so we end up inserting the whole file anyway.
One idea would be to insert only the "end" of the dictionary into our normal match finders: basically just the portion that we expect our normal hash tables to index reasonably well (e.g. 4 * (1 << max(hashLog, chainLog))). Then let our LDM mode handle the rest.
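To make the arithmetic concrete, here is a small sketch of how such a split could be computed. `split_dictionary` is a hypothetical helper, not an existing function in the library, and the factor of 4 is only the example value from the expression above.

```python
def split_dictionary(dict_size: int, hash_log: int, chain_log: int):
    """Return ((ldm_start, ldm_end), (tail_start, tail_end)) byte ranges.

    The tail is the part handed to the normal match finders; the rest
    would be left to the long-distance matcher.
    """
    tail = 4 * (1 << max(hash_log, chain_log))
    tail = min(tail, dict_size)  # a small dictionary is all tail
    boundary = dict_size - tail
    return (0, boundary), (boundary, dict_size)


# Example: hashLog = 17, chainLog = 16, 10 MiB dictionary.
ldm_part, tail_part = split_dictionary(10 * 1024 * 1024, 17, 16)
```

With these example values the tail is 4 * (1 << 17) = 524288 bytes, so only the last 512 KiB would go through the regular match finders.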
Originally posted by @RubenKelevra in #2063 (comment)