Contract Spec: Experiment with compressed or optimized form of XDR #415

Closed
leighmcculloch opened this issue Aug 11, 2022 · 5 comments

@leighmcculloch
Member

leighmcculloch commented Aug 11, 2022

We use XDR for the contract spec, but it has the downsides that all XDR has: lots of unused space due to small integers, small-valued enums, and padding.

Take the following contract spec as an example:

[0, 0, 0, 0, 0, 0, 0, 3, 97, 100, 100, 0, 0, 0, 0, 2, 0, 0, 0, 2, 0, 0, 0, 2, 0, 0, 0, 1, 0, 0, 0, 2]

Horizon uses a simplistic compressed form of XDR to reduce its otherwise significant memory usage. It shrinks some values, such as integers and enums, to 1 byte instead of 4 on the assumption that they will never exceed the range of 1 byte. The above example, employing the same or a similar approach, would become:

[3, 97, 100, 100, 2, 2, 2, 1, 2]
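
The transformation above can be sketched in a few lines of Python. This is only an illustration, not Horizon's implementation: the field layout (a leading `u32`, the string `add`, then five more `u32` values) is an assumption inferred from the byte listing, and all helper names are hypothetical. Note that the sketch keeps the leading zero word as a single byte, so its compact output is one byte longer than the listing above.

```python
import struct

def xdr_u32(n):
    """Standard XDR: 4-byte big-endian unsigned integer."""
    return struct.pack(">I", n)

def xdr_string(s):
    """Standard XDR: 4-byte length, the bytes, zero padding to a multiple of 4."""
    data = s.encode("ascii")
    return xdr_u32(len(data)) + data + b"\x00" * (-len(data) % 4)

def compact_u32(n):
    """Hypothetical compact form: 1 byte, assuming the value fits in 0..255."""
    if not 0 <= n <= 0xFF:
        raise ValueError("value does not fit in 1 byte")
    return bytes([n])

def compact_string(s):
    """Hypothetical compact form: 1-byte length, the bytes, no padding."""
    data = s.encode("ascii")
    return compact_u32(len(data)) + data

# Assumed field layout, inferred from the example bytes (not taken from a schema).
FIELDS = [("u32", 0), ("string", "add"),
          ("u32", 2), ("u32", 2), ("u32", 2), ("u32", 1), ("u32", 2)]

def encode(fields, u32, string):
    """Encode each field with the given integer and string encoders."""
    return b"".join(u32(v) if kind == "u32" else string(v) for kind, v in fields)

standard = encode(FIELDS, xdr_u32, xdr_string)
compact = encode(FIELDS, compact_u32, compact_string)
print(len(standard), list(standard))
print(len(compact), list(compact))
```

The 1-byte assumption is the weak point: any value above 255 makes `compact_u32` fail, so the encoder and decoder both have to agree that such values never occur.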

There are other schemes for variable-length integer encoding that would likely be better, since they would let us make fewer assumptions about the contents and simply encode them using an alternative scheme.
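
One such scheme is unsigned LEB128, the variable-length integer encoding that WASM itself uses. The thread does not name a specific scheme, so this choice is an assumption made for illustration, and the function names are hypothetical. Small values still take 1 byte, but larger values remain representable because each byte carries 7 bits of payload plus a continuation flag.

```python
from typing import Tuple

def encode_uleb128(n: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    if n < 0:
        raise ValueError("unsigned LEB128 cannot encode negative values")
    out = bytearray()
    while True:
        byte = n & 0x7F          # low 7 bits of the remaining value
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # final byte, continuation bit clear
            return bytes(out)

def decode_uleb128(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode one value starting at offset; return (value, next_offset)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7

for value in (2, 127, 128, 300):
    print(value, list(encode_uleb128(value)))
```

Values up to 127 cost 1 byte, values up to 16383 cost 2 bytes, and so on, so no assumption about a maximum value is needed.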

The major drawback is that we'd be generating non-standard XDR, or rather something that isn't XDR at all, but the benefits probably outweigh that downside.

An alternative to this is for us to find an entirely different format for the contract spec that is optimized for size better than XDR is, although that would likely require us to add an entirely new toolset, rather than modifying our existing toolset. Modifying our existing XDR toolset with our own extension would probably make it easier to apply the same pattern to Stellar XDR elsewhere as well.

@jonjove
Contributor

jonjove commented Aug 11, 2022

An alternative would be to compress the XDR with a known and relatively simple compression algorithm like LZ77. We've also discussed compressing WASM blobs (e.g. when creating a contract) on the wire with LZ77, although I can't find that discussion right now. I'm going to open an issue to track that too, and reference this one.
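
As a rough sketch of this approach, Python's standard `zlib` module can stand in for the compressor. It implements DEFLATE, which combines LZ77 with Huffman coding, so it is not plain LZ77. The function names are hypothetical, and the input is the 32-byte example from the issue description.

```python
import zlib

def compress_spec(spec_xdr: bytes) -> bytes:
    """Compress standard XDR bytes with DEFLATE at the highest level."""
    return zlib.compress(spec_xdr, 9)

def decompress_spec(blob: bytes) -> bytes:
    """Recover the original, standard XDR bytes."""
    return zlib.decompress(blob)

spec = bytes([0, 0, 0, 0, 0, 0, 0, 3, 97, 100, 100, 0, 0, 0, 0, 2,
              0, 0, 0, 2, 0, 0, 0, 2, 0, 0, 0, 1, 0, 0, 0, 2])
blob = compress_spec(spec)
assert decompress_spec(blob) == spec  # the XDR itself stays standard
print(len(spec), "bytes before,", len(blob), "bytes after")
```

Unlike a custom compact encoding, the decompressed bytes are still ordinary XDR. For inputs this small, the compressor's framing overhead can cancel out much of the gain.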

@leighmcculloch
Member Author

@graydon It'd be good if we could still do this. Simply gzipping the contract spec would probably reduce its size. Wdyt?

@graydon
Contributor

graydon commented May 1, 2023

I don't think it's worth bothering at this level. One contract wasm is shared among N contract instances, and each instance stores M contract-data entries; M*N is way larger than the one wasm contract. Moreover, there's just not that much to win on a single wasm: if I look at our example contracts, our largest example is soroban_liquidity_pool_contract.wasm, which is 11422 bytes, of which 560 bytes are the meta section. lz4 takes the meta section down to 333 bytes and deflate/gzip takes it to 223 bytes, but either way we're talking about maybe winning 2-3% on the wasm blob, which is itself the smallest piece of the storage puzzle.
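
The 2-3% figure follows from the sizes quoted above, reading the 333 and 223 byte figures as the compressed size of the 560-byte meta section. A short calculation, with hypothetical variable names:

```python
WASM_BYTES = 11422   # soroban_liquidity_pool_contract.wasm
META_BYTES = 560     # meta section, uncompressed
COMPRESSED_META = {"lz4": 333, "deflate/gzip": 223}

def saving_fraction(compressed_meta_bytes: int) -> float:
    """Bytes saved on the meta section, as a fraction of the whole wasm."""
    return (META_BYTES - compressed_meta_bytes) / WASM_BYTES

for name, size in COMPRESSED_META.items():
    saved = META_BYTES - size
    print(f"{name}: saves {saved} bytes, {saving_fraction(size):.1%} of the wasm")
# lz4: saves 227 bytes, 2.0% of the wasm
# deflate/gzip: saves 337 bytes, 3.0% of the wasm
```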

I'd be modestly supportive of, say, applying lz4 to all ledger entries, just to minimize load on the BL, but I think it's too intrusive to specify this at a protocol level: it'd break a ton of stuff and would also be double-compressing when storing long term (archives already gzip buckets). I think the way we'd do this (if we did) would be to teach @SirTyson's new BucketDB layer to store a new flavor of buckets as indexed lz4 frames rather than indexed bucket entries. That would basically imitate what we'd get if we were on a filesystem with transparent lz4 compression (several such filesystems exist, but they're not what you'll get on the stock Linux VM that typically runs core).
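
The idea of indexed compressed frames can be illustrated with a toy sketch. It is not how BucketDB works, and every name in it is hypothetical. It also uses `zlib` from the Python standard library as a stand-in for lz4, which the standard library does not provide. Each entry is compressed as its own frame, and an index of offsets lets a reader decompress a single entry without touching the rest.

```python
import zlib
from typing import Dict, Tuple

Index = Dict[bytes, Tuple[int, int]]  # key -> (offset, length) of its frame

def build_indexed_frames(entries: Dict[bytes, bytes]) -> Tuple[bytes, Index]:
    """Compress each entry independently and record where its frame lives."""
    blob = bytearray()
    index: Index = {}
    for key, value in entries.items():
        frame = zlib.compress(value)  # stand-in for an lz4 frame
        index[key] = (len(blob), len(frame))
        blob += frame
    return bytes(blob), index

def read_entry(blob: bytes, index: Index, key: bytes) -> bytes:
    """Decompress only the frame that holds the requested entry."""
    offset, length = index[key]
    return zlib.decompress(blob[offset:offset + length])

entries = {b"entry-a": b"\x00" * 64, b"entry-b": b"add" * 20}
blob, index = build_indexed_frames(entries)
assert read_entry(blob, index, b"entry-b") == entries[b"entry-b"]
```

Because compression happens below the protocol, the stored entries keep their existing format and nothing on the wire changes.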

@leighmcculloch
Member Author

leighmcculloch commented May 2, 2023

👍🏻 All great points, I agree, not worth doing.

This is also not something that is urgent to look into before the first release. It's easy for us to add new spec formats; the way meta/specs are stored in the WASM is designed to support that.

Closing.

@leighmcculloch closed this as not planned May 2, 2023
@leighmcculloch reopened this Dec 1, 2023
@leighmcculloch self-assigned this Dec 1, 2023
@leighmcculloch closed this as not planned Dec 1, 2023
@leighmcculloch removed their assignment Dec 1, 2023
@leighmcculloch reopened this Dec 1, 2023
@leighmcculloch self-assigned this Dec 1, 2023
@leighmcculloch changed the title from "Contract Spec: Use compressed optimized form of XDR" to "Contract Spec: Experiment with compressed or optimized form of XDR" Dec 1, 2023
@leighmcculloch
Member Author

leighmcculloch commented Dec 1, 2023

I did two quick hacks to see if some of the ideas I'd had about this were worth slipping into the release.

For both I used the token interface as a guide, which has a contract spec that is over 5336 bytes today with doc comments, and 1004 bytes without doc comments.

Both were underwhelming. The token contract has a large number of functions. For contracts with fewer functions, the results are even more underwhelming.

If writing long docs on functions were desirable, gzipping would be worth it, but I also expect folks will write brief docs, or no docs and document elsewhere. Of course, doing nothing here might cause people to never write docs, but I don't think 3000 vs 5000 will make a difference to that decision.

I don't think it's worth bothering at this level.

@graydon you are still right 😄.
