Use Base58 instead of Base32 and Remove SCID shortening #79

Merged: 6 commits, Jul 26, 2024

Contributor

@bj-ms bj-ms commented Jul 23, 2024

Following the discussion and security concerns in https://github.com/bcgov/trustdidweb/issues/75, I propose changing Base32 to Base58 and removing the shortening of the SCID during its generation.

I also changed the version, changelog and added myself as an author (feel free to comment on a possible rollback on that part).

@swcurran
Collaborator

@bj-ms -- thanks! Please add DCO sign-off to the PR. See DCO - Developer Certificate of Origin - https://github.com/apps/dco. Instructions for fixing the commit are on the "Details" beside the failed check.

@swcurran swcurran requested review from swcurran, brianorwhatever and andrewwhitehead and removed request for swcurran July 23, 2024 13:45
@bj-ms bj-ms force-pushed the main branch 3 times, most recently from bd9f54c to b86c422 Compare July 23, 2024 14:59
…ring the scid generation; added changelog; added author

Signed-off-by: Michel Sahli <michel.sahli@bj.admin.ch>
@@ -620,7 +620,7 @@ first [[ref: DID log entry]] and is a portion of the hash of the DID's inception
To generate the required [[ref: SCID]] for a `did:tdw` DID, the DID Controller
**MUST** execute the following function:

-`left(base32_lower(hash(JCS(preliminary log entry with placeholders))), <length>)`
+`base58(hash(JCS(preliminary log entry with placeholders)))`
Collaborator

@swcurran swcurran Jul 24, 2024

Given that we want the flexibility/future-proofing to support different hash algorithms with backwards compatibility, I think we should use multihash (at least) and perhaps multibase here, so that a verifier can detect the hash (and encoding) algorithm(s) used. The spec. will dictate the allowed hashes to generate, but a verifier should be flexible in receiving hashes.

Do we need multibase as well as multihash?

If we use multihash, the length of the (untruncated) SCID (and entry hashes) will be 46 characters -- reference. So the SCID is up 18 from the previous 28, and we eliminate messing with the truncation. I think that is fine -- especially once we get into PQ public keys :-)
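
As a rough check of the 46-character figure, the sketch below builds a sha2-256 multihash (the header bytes `0x12 0x20` followed by the 32-byte digest, 34 bytes in total) and encodes it with a hand-rolled base58btc encoder. The helper names and the sample input are illustrative assumptions, not taken from the spec.

```python
import hashlib

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58btc(data: bytes) -> str:
    """Encode bytes with the Bitcoin base58 alphabet (no checksum, no prefix)."""
    n = int.from_bytes(data, "big")
    out = ""
    while n > 0:
        n, rem = divmod(n, 58)
        out = B58_ALPHABET[rem] + out
    # Each leading zero byte is represented by a leading '1'.
    zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * zeros + out


def multihash_sha256(data: bytes) -> bytes:
    """Prefix a sha2-256 digest with its multihash code (0x12) and length (0x20)."""
    digest = hashlib.sha256(data).digest()
    return bytes([0x12, len(digest)]) + digest


encoded = base58btc(multihash_sha256(b"example input"))
print(len(encoded))  # 46
```

Because the two header bytes are fixed, the encoded length is the same for every sha2-256 multihash, which is why truncation rules are no longer needed.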

Contributor

There's a minor security benefit to adding the hash algorithm, as it then could not be swapped in a replacement log file. That would still require producing a hash collision though, which is very unlikely.

Another option would be to add a version byte to the hash, fixing the generation algorithm version that was used. We're not currently using multihash so it would be a new dependency.

Collaborator

I’m less worried about security than about interop: defensive coding and future-proofing.

Contributor

I think we should use multibase and multihash and remove the hash parameter. Write normative statements saying which hash and base algs are allowed.

Collaborator

I’m good with that. We definitely want to constrain what a DID Controller is allowed to use (hence the normative statements) and allow a resolver to discover the DID Controller's choices. I do think we keep the spec version and use that as an indicator to the resolver. The spec. version dictates the options.

Collaborator

So SCID calculation becomes:

SCID = multibase(<base>,multihash(<hash>, JCS(input with placeholders)))
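
A minimal sketch of the formula as proposed at this point in the thread, assuming sha2-256 as `<hash>` and base58btc as `<base>` (multibase prefix `z`). The `jcs_approx` helper only approximates JCS (RFC 8785) with sorted keys and compact separators, which is adequate for simple ASCII string values only; the sample entry and all function names are hypothetical.

```python
import hashlib
import json

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58btc(data: bytes) -> str:
    n = int.from_bytes(data, "big")
    out = ""
    while n > 0:
        n, rem = divmod(n, 58)
        out = B58_ALPHABET[rem] + out
    zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * zeros + out


def jcs_approx(obj) -> bytes:
    # Assumption: stands in for a real JCS implementation; it does not
    # apply the RFC 8785 rules for numbers or non-ASCII key ordering.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def compute_scid(entry_with_placeholders: dict) -> str:
    digest = hashlib.sha256(jcs_approx(entry_with_placeholders)).digest()
    multihash = bytes([0x12, len(digest)]) + digest  # sha2-256 header + digest
    return "z" + base58btc(multihash)                # 'z' = multibase base58btc


# Hypothetical preliminary entry; the real log entry structure is defined by the spec.
entry = {"id": "did:tdw:{SCID}:example.com", "note": "illustrative only"}
print(compute_scid(entry))
```

Dropping the multibase layer, as discussed further down the thread, would amount to returning `base58btc(multihash)` without the `z` prefix.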

Contributor

Yep, version is fine. I just want to remove hash from the parameters, since it does the same job as multihash, other than the minor security benefit Andrew mentions above, which I don't see as a concern.

Contributor

I can see multihash, but I'd rather not use multibase. We don't have a need to support multiple encodings and it's more work for verifiers.

Contributor

OK, I'm fine with that. I'm also fine with not making the multihash change and just changing to base58btc in this PR.

Collaborator

I’d like to see multihash. I can see a time where we add, for example, sha3-256 (as we briefly did), while still allowing sha-256 (since it is needed for long-lasting DIDs). Multihash would be important.

Multibase less so. There are not going to be massive breakthroughs in efficiency or security related to encoding schemes :-).
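
To illustrate why multihash helps in that scenario, here is a hedged sketch of a verifier that reads the multihash header to decide which algorithm to recompute. The multihash codes `0x12` (sha2-256) and `0x16` (sha3-256) come from the multiformats table; the allow-list and the function names are assumptions, and the single-byte header parsing only works because both the code and the length are below 128.

```python
import hashlib

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

# Multihash code -> hashlib name. Which hashes are allowed would be set by the spec version.
ALLOWED_HASHES = {0x12: "sha256", 0x16: "sha3_256"}


def b58encode(data: bytes) -> str:
    n = int.from_bytes(data, "big")
    out = ""
    while n > 0:
        n, rem = divmod(n, 58)
        out = B58_ALPHABET[rem] + out
    zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * zeros + out


def b58decode(text: str) -> bytes:
    n = 0
    for ch in text:
        n = n * 58 + B58_ALPHABET.index(ch)  # raises ValueError on a non-base58 character
    zeros = len(text) - len(text.lstrip("1"))
    return b"\x00" * zeros + n.to_bytes((n.bit_length() + 7) // 8, "big")


def make_hash(code: int, data: bytes) -> str:
    """Controller side: hash with the chosen algorithm and tag it."""
    digest = hashlib.new(ALLOWED_HASHES[code], data).digest()
    return b58encode(bytes([code, len(digest)]) + digest)


def verify_hash(encoded: str, data: bytes) -> bool:
    """Verifier side: detect the algorithm from the header, then recompute."""
    raw = b58decode(encoded)
    if len(raw) < 2:
        return False
    code, length, digest = raw[0], raw[1], raw[2:]
    if code not in ALLOWED_HASHES or length != len(digest):
        return False
    return hashlib.new(ALLOWED_HASHES[code], data).digest() == digest
```

With this shape, a long-lived DID hashed with sha-256 and a newer one hashed with sha3-256 can both be checked by the same verifier without a separate `hash` parameter.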

-Its output is the lower case of the Base32 encoded string of its input.
-5. `left` extracts the `<length>` number of characters from the string input.
-1. `<length>` **MUST** be at least 28 characters.
+4. `base58` is an implementation of the [[ref: base58]] function.
Collaborator

If we use multihash, I think we wind up using base58btc? Or is it still base58?

Contributor

The (expired) draft that is referenced is the Bitcoin version of base58, aka base58-btc.

Collaborator

@swcurran swcurran Jul 24, 2024

Is there another spec we should be referencing?

Contributor

Collaborator

Side question, @brianorwhatever: what is a “Controller Document”? There is no description of what it is or what it is for. AFAICS, it is much like DIDs, and I heard Manu say it started with the DID spec.

Contributor

It's essentially where the normative statements for multibase/multihash had to be moved within the VCWG, as the group couldn't reach consensus on having them in the VCDM and the requirement is shared between DI and JOSE. I think a DID Document is basically a controller document.

-scid = 28+( lower-base32 )
-lower-base32 = [2-7a-z]
+scid = ( base58 )
+base58 = [1-9A-Za-z]
Contributor

We could be a bit more specific with [1-9A-HJ-NP-Za-km-z] !
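
A small sketch of the difference between the two character classes, assuming a verifier wants to reject strings outside the base58btc alphabet before decoding. The pattern names and sample strings are illustrative.

```python
import re

# Looser class from the diff: also admits 'I', 'O' and 'l', which base58btc excludes.
LOOSE_RE = re.compile(r"[1-9A-Za-z]+")
# Tighter class suggested in the comment above: exactly the base58btc alphabet.
BASE58BTC_RE = re.compile(r"[1-9A-HJ-NP-Za-km-z]+")


def is_base58btc(text: str) -> bool:
    return BASE58BTC_RE.fullmatch(text) is not None


for candidate in ["QmYwAPJzv5CZsnA", "QmOops", "Qml33t", "Qm0"]:
    loose = LOOSE_RE.fullmatch(candidate) is not None
    print(candidate, "loose:", loose, "strict:", is_base58btc(candidate))
```

Both classes reject `0`, but only the stricter one rejects the look-alike letters that base58 deliberately omits.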

Collaborator

Or leave it off and leave it to the spec.

@swcurran
Collaborator

@bj-ms — are you good with updating your PR to reflect the feedback? I think yours is close, but there will be a few more places to address. Happy to do that as a follow up PR. Let me know how you want to proceed.

@bj-ms
Contributor Author

bj-ms commented Jul 25, 2024

> @bj-ms -- are you good with updating your PR to reflect the feedback? I think yours is close, but there will be a few more places to address. Happy to do that as a follow up PR. Let me know how you want to proceed.

I'm good with that, I will make the following modifications:

  • Use multihash in SCID and mention the Controller document multihash table.
  • Use base58btc in SCID
  • Remove hash property from the did log
  • Remove the definition of SCID in the chapter Method-Specific Identifier

bj-ms added 4 commits July 25, 2024 09:12
…n identifier and their sizes.

Signed-off-by: Michel Sahli <michel.sahli@bj.admin.ch>
Signed-off-by: Michel Sahli <michel.sahli@bj.admin.ch>
Signed-off-by: Michel Sahli <michel.sahli@bj.admin.ch>
Signed-off-by: Michel Sahli <michel.sahli@bj.admin.ch>
digits 2-7.
~ Applies [[spec:draft-msporny-base58-03]] to convert
data to a `base58` encoding. Data encoded as
base58 consists of a string of characters containing only the letters A-Z, a-z and
Contributor

base58 consists of `123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz`. This purposely leaves out some characters that humans might misread.

Collaborator

Suggest we don’t repeat the data here — just reference the spec.

Signed-off-by: Michel Sahli <michel.sahli@bj.admin.ch>
@martipos
Contributor

I am happy with the changes to base58 and skipping the trimming. Just two additional inputs.

  1. It seems base58 does not perform very well (in terms of speed) with large byte values. Given that the SCID is generated from the log entry, this should not be an issue, as I assume the log entry will not reach a size of concern (or is anything planned on that front?).
  2. Out of curiosity, would double hashing be of interest in generating the SCID?

@swcurran
Collaborator

The first issue is not a concern, as the input will always be a hash, which will be (for now) 32 bytes, perhaps longer if/when other hash algorithms are needed. Never more than 100 characters.
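
A back-of-the-envelope sketch of the sizes involved, assuming the input to base58 is a bare or multihash-prefixed digest of 256 or 512 bits. Each input byte needs at most log(256)/log(58), roughly 1.37, base58 characters; `max_base58_length` is an illustrative helper name.

```python
import math


def max_base58_length(n_bytes: int) -> int:
    """Upper bound on the base58 characters needed to encode n_bytes."""
    return math.ceil(n_bytes * math.log(256) / math.log(58))


sizes = [
    ("256-bit digest", 32),
    ("256-bit digest with 2-byte multihash header", 34),
    ("512-bit digest", 64),
    ("512-bit digest with 2-byte multihash header", 66),
]
for label, n_bytes in sizes:
    print(f"{label}: {n_bytes} bytes -> at most {max_base58_length(n_bytes)} characters")
```

All four cases stay under 100 characters, so the quadratic cost of naive base58 encoding is negligible at these sizes.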

I’m not sure about a double hash as I don’t know how to calculate collision probability and how that translates into “time to calculate” in typical conditions (including Quantum conditions). AFAIK, the attacker must generate a new log entry with a key they control and take over the web server publishing the file. The hashing controls the probability of the first part of that.

@swcurran
Collaborator

There are a couple of very minor clarifications I'd like to make to this (most of which are a bit tangential), but not worth holding up the merge. I think we have all agreed on the content. Let's get this merged.

@andrewwhitehead
Contributor

The double hashing could be worth exploring if it increases the preimage resistance, although the difference would be pretty minor. Bitcoin uses it but without any real explanation of the purpose, IIRC. It would be easy to support without adding additional hash functions, but I'm not sure how it would compare to sha512/256 in strength.
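
For reference, double hashing in the Bitcoin style is simply the hash function applied to its own digest. A minimal sketch, with `double_sha256` as an illustrative name; whether it would add anything for SCID generation is left open in this thread.

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    """Apply SHA-256 twice: hash the input, then hash the resulting digest."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


single = hashlib.sha256(b"log entry bytes").digest()
double = double_sha256(b"log entry bytes")
print(single.hex())
print(double.hex())
```

The output is still 32 bytes, so the encoded SCID length would not change, and no additional hash function would be needed.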

@swcurran swcurran merged commit e3dfa22 into decentralized-identity:main Jul 26, 2024
1 check passed