Python and JS wrappers #2

Open
DonaldTsang opened this issue Jul 25, 2019 · 9 comments

@DonaldTsang

It would be useful to have wrappers that make such usage easy.
Standard formatting: https://docs.python.org/3/library/hashlib.html
x = k12(param="number")
x.update(b"binary data")
y = x.digest()

@mumbleskates
Contributor

There is no standard API for XOFs in Python, so this would entail inventing one at the very minimum.

@DonaldTsang
Author

@mumbleskates thanks for the response. What do you mean by "standard API"? We can simply copy the hashlib API for BLAKE2, which accepts digest size, key, and salt parameters.
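
For reference, here is a minimal sketch of the BLAKE2 interface in Python's standard hashlib that this comment refers to. The key and salt values are placeholders chosen only for illustration.

```python
import hashlib

# BLAKE2b takes its parameters as keyword arguments at construction time.
h = hashlib.blake2b(
    digest_size=32,        # output length in bytes, fixed up front (1 to 64)
    key=b"example key",    # optional key for keyed hashing (placeholder value)
    salt=b"example salt",  # optional salt, at most 16 bytes (placeholder value)
)
h.update(b"binary data")
digest = h.digest()
print(len(digest))
```

Note that the digest size has to be chosen when the object is created, which is the main difference from an XOF.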

@mumbleskates
Contributor

Cool, that's a fair observation! I forgot they added those in 3.6.

The C API of libk12.so is pretty straightforward, so calling into it with ctypes is not overly complex. I would expect the work to be 30% plumbing, 60% packaging, and 10% left over for nonsense.

@gvanas
Contributor

gvanas commented May 16, 2020

Creating bindings of K12 for other languages would of course be of great interest for the community. See for instance the bindings for Rust written by Jack O'Connor.

However, to be honest, I don't think I will work on this any time soon. For the time being, I stay focused on cryptography and core cryptographic implementations. Also, although I think (or hope) that I have the abilities to write such wrappers, I do not have any experience.

So, if anyone wants to take on this task it would certainly be very useful.

@mumbleskates
Contributor

Yes, I believe this would definitely belong in a separate repository.

@DonaldTsang
Author

It should be in a separate repo, I agree, but it is good to have it on record that this should be done.

@mumbleskates
Contributor

Regardless of the precedent for variable output length with blake2, here are some immediate challenges I can see:

  • hashlib objects are designed to be initialized with optional data, optionally updated zero or more times with more data, and then produce a digest. They can then be updated again, and produce another digest of the longer data without re-hashing the earlier data.
  • Because of this multi-step nature, there's no clear way to use the KangarooTwelve(...) all-in-one function, and any wrapper library needs to know how to allocate and manage K12 states if it is to conform to the existing API standard.
  • Managing allocated structs like this is a non-trivial affair that probably requires additional C code unless public APIs are added to return both sizeof(KangarooTwelve_Instance) and the required alignment (or we assume some janky worst-case and use that instead and hope it keeps working).
  • blake2 is still not an XOF; it only has a variable-size digest. There's no provision for squeezing from it multiple times, or producing a number of bytes from it unknown at initialization time. None of the hashlib functions have stateful modes, so there is no existing function with a concept of "sponge now in squeezing mode, you can't update with more data now".
  • There is no clear resolution here that solves all these problems, except perhaps to provide an alternate API for absorb/squeeze usage from a module outside hashlib.
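
To make the first bullet concrete, the hashlib lifecycle described above can be sketched with a standard hash. SHA-256 is used here purely as a stand-in; nothing in this example touches K12.

```python
import hashlib

h = hashlib.sha256(b"hello ")  # optional initial data
h.update(b"world")             # zero or more updates
first = h.digest()             # digest of b"hello world"

h.update(b"!")                 # the object keeps absorbing after a digest
second = h.digest()            # digest of b"hello world!"

# Each digest matches a one-shot hash of everything absorbed so far.
assert first == hashlib.sha256(b"hello world").digest()
assert second == hashlib.sha256(b"hello world!").digest()
```

A hashlib-conforming wrapper would need to keep a live K12 state around to support this pattern, which is why the all-in-one function alone is not enough.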

So from all of the above, there are still non-trivial amounts of work to be done to 1) figure out the desired Python API, and either 2a) add a couple helpful bits to K12 or 2b) create a new library with its own .so to use.
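
One possible shape for the absorb/squeeze API mentioned above is sketched below. Everything here is hypothetical: the class name `SpongeSketch` and its methods are invented for illustration, and SHAKE128 from hashlib stands in for K12 so that the example runs without libk12.

```python
import hashlib


class SpongeSketch:
    """Hypothetical absorb/squeeze wrapper; SHAKE128 stands in for K12."""

    def __init__(self, data=b""):
        self._xof = hashlib.shake_128(data)
        self._squeezing = False  # flips to True on the first squeeze
        self._offset = 0         # output bytes already handed out

    def absorb(self, data):
        if self._squeezing:
            raise ValueError("already squeezing; cannot absorb more data")
        self._xof.update(data)

    def squeeze(self, n):
        self._squeezing = True
        start = self._offset
        self._offset += n
        # hashlib restarts the output stream on every digest() call, so this
        # stand-in recomputes the whole prefix and slices off the new bytes.
        return self._xof.digest(self._offset)[start:]


s = SpongeSketch(b"binary ")
s.absorb(b"data")
out = s.squeeze(16) + s.squeeze(16)  # two squeezes continue one stream
assert out == hashlib.shake_128(b"binary data").digest(32)
```

A real binding would call into the C state instead of recomputing the prefix; the point of the sketch is only the one-way switch from absorbing to squeezing.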

I would prefer adding public APIs to return the size and alignment of the struct, since managing a mutable buffer with a bytearray and ctypes is pretty doable and managing lifetimes of memory allocated by a C extension sounds pretty lame. Having these functions seems potentially useful for other language wrappers as well.

@gvanas
Contributor

gvanas commented May 18, 2020

> There's no provision for squeezing from it multiple times, or producing a number of bytes from it unknown at initialization time. None of them have stateful modes, so there is no existing function with a concept of "sponge now in squeezing mode, you can't update with more data now".

The API for SHAKE128 and SHAKE256 in Python solves part of the problem. The digest(length) method can produce a number of bytes unknown at initialization time, but it does not really enter the "squeezing phase", since requesting more bytes restarts from the beginning of the output stream.
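
This behaviour can be checked directly with the standard library. A short sketch:

```python
import hashlib

x = hashlib.shake_128(b"binary data")
short = x.digest(16)   # length is chosen at call time, not at construction
longer = x.digest(32)  # a second call starts over from the first output byte

# The longer output begins with the shorter one instead of continuing after it.
assert longer[:16] == short

# The object never enters a squeezing phase: it still accepts more input.
x.update(b" and more")
assert x.digest(16) != short
```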

> I would prefer adding public APIs to return the size and alignment of the struct, since managing a mutable buffer with a bytearray and ctypes is pretty doable and managing lifetimes of memory allocated by a C extension sounds pretty lame. Having these functions seems potentially useful for other language wrappers as well.

A possible fallback solution would be to have a canonical struct to store the state (with fixed size and alignment) and functions that import/export from/to the KangarooTwelve_Instance struct.

@mumbleskates
Contributor

mumbleskates commented May 19, 2020

> A possible fallback solution would be to have a canonical struct to store the state (with fixed size and alignment) and functions that import/export from/to the KangarooTwelve_Instance struct.

We could, but this still punts on the idea of knowing how much space to allocate.

However, looking at it a little more, I think it might be quite safe to take the sizeof() of the current 64-byte-aligned struct and always use that. I believe that currently the most-aligned layout has a sizeof() of 512, and looks like this:

[  0, 200): queueNode.state       -- 200 bytes
 200      : queueNode.byteIOIndex -- 1 byte
 201      : queueNode.squeezing   -- 1 byte
[202, 256):   <padding>           -- 54 bytes
[256, 456): finalNode.state       -- 200 bytes
 456      : finalNode.byteIOIndex -- 1 byte
 457      : finalNode.squeezing   -- 1 byte
[458, 464):   <padding>           -- 6 bytes
[464, 472): fixedOutputLength     -- 8 bytes
[472, 480): blockNumber           -- 8 bytes
[480, 484): queueAbsorbedLen      -- 4 bytes
[484, 488): phase                 -- 4 bytes
[488, 512):   <padding>           -- 24 bytes
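
The offsets in this listing can be reproduced with ordinary C layout arithmetic. The sketch below assumes that each node is a 202-byte member (200 + 1 + 1) placed on a 64-byte boundary, that the two length fields are 8 bytes with 8-byte alignment, and that the last two fields are 4 bytes with 4-byte alignment. These sizes are read off the listing rather than taken from the header, so other platforms may differ.

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


# (name, size in bytes, alignment in bytes); values assumed from the listing.
FIELDS = [
    ("queueNode", 202, 64),
    ("finalNode", 202, 64),
    ("fixedOutputLength", 8, 8),
    ("blockNumber", 8, 8),
    ("queueAbsorbedLen", 4, 4),
    ("phase", 4, 4),
]


def layout(fields):
    """Return ({name: offset}, total size) using the usual C struct rules."""
    offsets = {}
    offset = 0
    struct_alignment = 1
    for name, size, alignment in fields:
        offset = align_up(offset, alignment)  # padding before the member
        offsets[name] = offset
        offset += size
        struct_alignment = max(struct_alignment, alignment)
    # Trailing padding rounds the struct up to its own alignment.
    return offsets, align_up(offset, struct_alignment)


offsets, total = layout(FIELDS)
print(offsets)
print(total)
```

Under these assumptions the computed offsets are 0, 256, 464, 472, 480 and 484, and the total is 512, matching the listing.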

This actually seems like ample overhead and it could be very reasonable to just assume that 512 bytes, 64-byte aligned, will probably always be enough. It seems kind of unlikely that we would need to support a build whose padding overhead is somehow even worse than this.
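
A minimal sketch of how a wrapper might reserve such a buffer from Python with a bytearray and ctypes, as suggested earlier in the thread. The constants come from the estimate above; the helper name `alloc_aligned` is hypothetical, and no libk12 function is called here.

```python
import ctypes

STATE_SIZE = 512   # assumed upper bound on sizeof(KangarooTwelve_Instance)
STATE_ALIGN = 64   # assumed worst-case alignment


def alloc_aligned(size=STATE_SIZE, alignment=STATE_ALIGN):
    """Return (backing bytearray, ctypes array view aligned to `alignment`)."""
    # Over-allocate so an aligned window of `size` bytes always fits.
    backing = bytearray(size + alignment - 1)
    base = ctypes.addressof(ctypes.c_char.from_buffer(backing))
    padding = -base % alignment  # bytes to skip to reach the boundary
    view = (ctypes.c_ubyte * size).from_buffer(backing, padding)
    return backing, view


backing, state = alloc_aligned()
assert ctypes.addressof(state) % STATE_ALIGN == 0
assert ctypes.sizeof(state) == STATE_SIZE
```

The view keeps the bytearray alive and pinned, so Python's garbage collector manages the memory and no C-side allocation is needed. Passing `state` to the library would then be a matter of `ctypes.byref(state)`, assuming the C functions accept a pointer to the instance.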
