Skip to content
This repository was archived by the owner on Jul 14, 2023. It is now read-only.

Statefull hashing host function #12

Closed
wants to merge 4 commits into from
Closed

Statefull hashing host function #12

wants to merge 4 commits into from

Conversation

cheme
Copy link

@cheme cheme commented Apr 11, 2023

This PPP complement the blob storage proposal of #11 with a way to avoid some memory copies when hashing.

The idea is to keep state of hasher in the host, and only send chunks of bytes to it from the runtime.

@tomaka
Copy link
Contributor

tomaka commented Apr 12, 2023

Why this stack-based design? Can we not just have a list of hashers, rather than a stack?

@cheme
Copy link
Author

cheme commented Apr 12, 2023

It is an open question.
My opinion is that the goals should be to have a a handle allocation as simple as it can be. Using a stack we make sure the a new handle is always the biggest handle.
So the point would be that there is no search for free handle and simple reuse of handle (since it need to be deterministic and simple to implement).
Tbh, I feel like an alternative could be to just never use again any handle so next handle id is always previous handle id + 1 up to some limit (u32 max may be too big a limit I think).
Would using a list makes things simpler here (I am not exactly sure how it would work)?

@tomaka
Copy link
Contributor

tomaka commented Apr 12, 2023

I was initially thinking that this is implementation-specific, and that we could for example use a https://docs.rs/slab/latest/slab/ to allocate the hashers.

However, if there are multiple different allocation strategies then a malicious runtime/PVF could do for example if ext_hashing_get_hasher() == 5 { ... } to introduce some non-determinism, so maybe it's best to precisely explain how handles must be allocated.

@cheme cheme mentioned this pull request May 8, 2023
@burdges
Copy link

burdges commented May 12, 2023

As a rule, hashers have well specified internal states, so morally a hasher host call takes the form:

impl Hasher {
    fn update(&mut self, bytes &[u8]);
}

In other words, the hasher should take (usize,usize,usize) with the first being a raw pointer to the fixed size internal hasher state in wasm memory, the second being a raw pointer to the bytes to be hashed in wasm memory, and the third being the number of bytes to be hashed

@cheme
Copy link
Author

cheme commented May 16, 2023

I did use the formultation: "a pointer-size containing a buffer of byte content to send in the hasher." to try to stick with polkadot spec. But it really is just https://github.com/cheme/substrate/blob/9415e8f9ddac9b4ba26e903e103143607d0bdd7b/primitives/io/src/lib.rs#L1462 .

This stick to existing slice passing for host function, I am not sure if it does a memory copy (I fear it does), even if I would really not think it need copy in this case, but I would put this consideration out of scope of the proposal (I mean it applies to all host function taking &[u8] as parameter).
Edit: actually that seems perfectly fine : from runtime interface docs: //! | &[u8]|u64| <code>v.len() 32bit << 32 &#124; v.as_ptr() 32bit</code> |

@burdges
Copy link

burdges commented May 17, 2023

@Noc2
Copy link
Contributor

Noc2 commented Jul 14, 2023

We will archive this repository in favor of the new RFCs repository by the Fellowship. So, I'm closing the issue here. I recommend opening a PR at the new repository.

@Noc2 Noc2 closed this Jul 14, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants