-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve consistancy of some base codes #114
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say base2 encoding is more of a theoretical exercise. I'd be OK with changing it. I don't think it's widely used. I also don't know if there's any good or historic reason to use 0
. @Stebalien do you know?
I don't think there's a good reason one way or the other, no. And honestly, nobody should use these and I'd prefer to just remove them. But I'd have no objection to changing it. |
I find them all useful, certainly because binary and octal are also used in programming. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the last IPLD community/sync call we talked about this issue. The folks in the call see Base2 useful for educational purpose and agree that it should be made consistent with other base encodings.
@bumblefudge do you want this to get in before or after #109?
Did I saw some IANA considerations? That would indeed be nice. |
Yes. Though only the ones we currently have labeled as "default" will be moved over there (if it is accepted). |
https://github.com/multiformats/go-multibase/blob/master/base2.go. So this is used by anyone that currently imports the base encodings from go-multibase. Does this mean there are tons of users of ipfs.io, kubo, or others use base2? Are there lots of educational materials showing people the base2 multibase? Probably not, but I don't know not sure if anyone's done much investigating here. For example: https://cloudflare-ipfs.com/ipfs/00000000101010101000000000000000101000001 Out of curiosity though. I feel like most times I've seen unary represented it's with a |
Agreed. Hopefully it will be accepted. The others could follow if becoming |
I didn't investigate anything at the moment. I wasn't even aware of this Cloudflare URL either. If the amount of usage is very little, or above URL is the only real usage (which is just decoding, because last byte
Many cultures would agree, because If we all disagree on unary, we also can give prefix |
I mostly don't understand the why this is worth doing. Yeah, it arguably should've been
If people really want to break things because it's not nice and no one is using base2 in practice (I hope not), and it's not in use in tutorials/educational materials (I don't know, but could certainly see it happen) then 🤷, ok I guess. It seems bad practice to break things because they seem slightly off (e.g. https://en.wikipedia.org/wiki/HTTP_referer) especially if the alternative is potentially just worse (to introduce unary we now have to use That link was mostly just evidence that code that does base2 multibase decoding is/has been deployed. You can take any CID and chuck it in there and it'll work https://cloudflare-ipfs.com/ipfs/0000000010111000000010010001000001111111000011100101100010110011010100000000001101111001010101000000111110000100010000100110001011000010010111101010010001111011101011110010100111101101111001110111011010001100001111011001110010001101101101001101011010001111110001111011101111101110111100000. |
I don't have a strong opinion on it, but I think consistency within specs is always great. There are sometimes those little small details that don't match up in multiformats/ipld/ipfs specs that are only there for historic reasons as someone once made some decision before they had the full picture. I think we should be free to break those things more often. For me this is such a case. |
It might, or it might not, depending on how important to you is the convention that each code is the "highest codepoint" allowed in that encoding alphabet, a convention that seems to have been a guiding principle at some point but never really enforced or codified. In the current spec, it's barely mentioned, and I've tried to make it more explicit but still nonbinding in my big PR that makes everything I can find more explicit to force differences of interpretation to the fore before taking things to IETF.
This is actually a valid argument, IMHO, for not being consistent about the "highest codepoint" rule, since it seems to reflect ergonomic preferences of people who wade around in lower-level programming and raw bytes all day (this is not my personal experience so I defer to the experts here!). I think "whatever will be intuitive and ergonomic for the developers who will use that codec in that use case" definitely trumps the "highest codepoint" rule, if you think it's the case with binary and unary!
Wow, this research is turning up so many implementation details I never knew about! It's super cool that Cloudflare's gateway already has this multibase at scale, I think asking them to break it in prod would be a huge ask. I'm calling this "strong argument #2" for leaving binary and unary as they are :D |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's another rabbit hole for you: https://github.com/libp2p/specs/blob/master/peer-ids/peer-ids.md#string-representation
I don't believe we can do this, and the README even states as much under "Reserved":
1
- Base58 encoded identity multihashes used by libp2p peer IDs.
Even in current libp2p code it makes these peer IDs: https://github.com/libp2p/go-libp2p/blob/37319a699f336e3e062c7894f44d47db3fea4dc2/core/peer/peer.go#L179-L190
If the key is <42 bytes it "inlines" it using identity multicodec (0x00
) and then just does a straight base58btc encode of that string, which turns 0x00
into 1
...
These strings are everywhere and it means there's code like this: https://github.com/libp2p/go-libp2p/blob/37319a699f336e3e062c7894f44d47db3fea4dc2/core/peer/peer.go#L132
And in the spec linked above it has to have this line:
If it starts with 1 or Qm, it's a bare base58btc encoded multihash. Decode it according to the base58btc algorithm.
So I think that's a decisive no for this PR, and the reservation of 1
has to stay alive for "historical reasons", just like the reservation of the null byte.
Thanks everyone who dug deeper into the matter. I only had a quick look. I'm now convinced that we just keep the base 2 encoding as it is. |
I completely missed the reserved |
Yeah, I think it should stay |
I think so @ben221199; but good news! With #109 merged, all 3 are now listed in the table as "reserved" and there's a section describing why. |
I see a extra column with codepoint information. That is nice. The status |
Also, |
I know that many of you will go wild when seeing this pull request, but I think it could be useful. Also, I dare to do it, because
base2
is acandidate
and not adefault
.The reason for this change is simple:
0
and1
. Taking the highest to use as code,1
. This is not the case, so lets change it.0
,1
,2
,3
,4
,5
,6
and7
. Taking the higest,7
to use as code. This is already the case.0
,1
,2
,3
,4
,5
,6
,7
,8
and9
. Taking the higest,9
to use as code. This is already the case.0
,1
,2
,3
,4
,5
,6
,7
,8
,9
,A
,B
,C
,D
,E
andF
. Taking the higest,F
to use as code. This is already the case. Same for lowercase.If people disagree on changing it, we can also add a new
base2
record with1
as code. Eventually, we later decide if we drop code0
or give it another purpose. (For example, unary. I like that one.)