-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
docs(textual): Require signing over raw bytes #12910
Conversation
Relevant comments:
tl;dr: signatures verify the integrity of the signed bytes, not a semantic transformation of those bytes |
Recall that the transaction bytes merklelized on chain are the Protobuf binary serialization of [TxRaw](https://github.com/cosmos/cosmos-sdk/blob/v0.46.0/proto/cosmos/tx/v1beta1/tx.proto#L33), which contains the `body_bytes` and `auth_info_bytes`. Moreover, the transaction hash is defined as the SHA256 hash of the `TxRaw` bytes. We require that the user signs over these canonical bytes in SIGN_MODE_TEXTUAL, more specifically over the following string: | ||
|
||
``` | ||
*Hash of raw bytes: <base64(sha256(<body_bytes> ++ " " ++ <auth_info_bytes>))> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason base64 is used as the representation here instead of hex bytes? There could result in ambiguity depending on how the transaction can be tampered with. the digest should be represented in a simpler byte representation, e.g. hex-encoded with expected length validation on read
e.g. here are two base64-encoded payloads which differ in serialized representation but the same raw bytes:
payload1="QzNWwq=="
payload2="QzNWwr=="
% echo -n $payload1 | shasum -a256
b679c1a6f416eb2eb18af4dce3347c3aa74e559aa88a8ef0ef4d6eefb0939c32 -
% echo -n $payload2 | shasum -a256
a7e3cd272e296994116cdf160221a26e71ca6b800abb5c62e90086c245fa115e -
% echo -n $payload1 | base64 -d | xxd
00000000: 4333 56c2 C3V.
% echo -n $payload2 | base64 -d | xxd
00000000: 4333 56c2 C3V.
Additionally, while it doesn't appear to immediately be exploitable in the current form, would it be possible to utilize a scheme involving prepending the length of each payload before each, and/or adopt ASN.1/DER representation of the bytes here in the event " "
becomes significant?
e.g. H(len(param1)||param1||separator||len(param2)||param2)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, but more specifically...
We require that the user signs over these canonical bytes in SIGN_MODE_TEXTUAL, more specifically over the following string:
The string is a rendering of the canonical bytes, and not the canonical bytes themselves. Whatever the on-chain representation is, that's what should be signed, directly. Signatures assert integrity only for the literal msg bytes provided.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure we need a separator at all, but length prefixing each part makes sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added length prefix in b15409d.
@arirubinstein Thanks for flagging the non-uniqueness of base64 encoding, I actually wasn't aware. We also use base64 in value renderers. This means that 2 different SignDocTextuals (i.e. 2 different string arrays) can represent the same proto Tx, which breaks the spec on bijectivity. Maybe it's worth indeed considering using hex everywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to use HEX for everything personally. Base64 is such a mess.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switched to capital hex
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note1: there is also z-base-32
which is more compact, readable and doesn't have the padding bytes issue.
Note2: we don't need to length prefix the last part. So <len(body_bytes) ++ <body_bytes> ++ <auth_info_bytes>
is good enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For z-base-32
support, here is the spec (good read!): https://philzimmermann.com/docs/human-oriented-base-32-encoding.txt
BTW, Algorand is using base32
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Linking: #13141
42f2671
to
b15409d
Compare
Co-authored-by: Aaron Craelius <aaron@regen.network>
One additional thing we should do is make sure that TxRaw bytes are encoded according to ADR 027 in the tx decoder. This is actually an existing malleability issue for SIGN_MODE_DIRECT that we should address. |
Sorry I forgot! |
This is R4R again. We sign over a hex string now. Edit: about using Hex instead of base64 in other parts of the spec, I did another PR: #12954 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Added a suggestion to consider z-base-32
## Description ref: #12910 (comment) --- ### Author Checklist *All items are required. Please add a note to the item if the item is not applicable and please add links to any relevant follow up issues.* I have... - [ ] included the correct [type prefix](https://github.com/commitizen/conventional-commit-types/blob/v3.0.0/index.json) in the PR title - [ ] added `!` to the type prefix if API or client breaking change - [ ] targeted the correct branch (see [PR Targeting](https://github.com/cosmos/cosmos-sdk/blob/main/CONTRIBUTING.md#pr-targeting)) - [ ] provided a link to the relevant issue or specification - [ ] followed the guidelines for [building modules](https://github.com/cosmos/cosmos-sdk/blob/main/docs/building-modules) - [ ] included the necessary unit and integration [tests](https://github.com/cosmos/cosmos-sdk/blob/main/CONTRIBUTING.md#testing) - [ ] added a changelog entry to `CHANGELOG.md` - [ ] included comments for [documenting Go code](https://blog.golang.org/godoc) - [ ] updated the relevant documentation or specification - [ ] reviewed "Files changed" and left comments if necessary - [ ] confirmed all CI checks have passed ### Reviewers Checklist *All items are required. Please add a note if the item is not applicable and please add your handle next to the items reviewed if you only reviewed selected items.* I have... - [ ] confirmed the correct [type prefix](https://github.com/commitizen/conventional-commit-types/blob/v3.0.0/index.json) in the PR title - [ ] confirmed `!` in the type prefix if API or client breaking change - [ ] confirmed all author checklist items have been addressed - [ ] reviewed state machine logic - [ ] reviewed API design and naming - [ ] reviewed documentation is accurate - [ ] reviewed tests and test coverage - [ ] manually tested (if applicable)
Recall that the transaction bytes merklelized on chain are the Protobuf binary serialization of [TxRaw](https://github.com/cosmos/cosmos-sdk/blob/v0.46.0/proto/cosmos/tx/v1beta1/tx.proto#L33), which contains the `body_bytes` and `auth_info_bytes`. Moreover, the transaction hash is defined as the SHA256 hash of the `TxRaw` bytes. We require that the user signs over these bytes in SIGN_MODE_TEXTUAL, more specifically over the following string: | ||
|
||
``` | ||
*Hash of raw bytes: <HEX(sha256(<len(body_bytes) ++ <body_bytes> ++ len(auth_info_bytes) ++ <auth_info_bytes>))> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we use <foo>
to mark the insertion of a variable in the document, can we just use a single set of brackets to include the calculation like this?
*Hash of raw bytes: <HEX(sha256(<len(body_bytes) ++ <body_bytes> ++ len(auth_info_bytes) ++ <auth_info_bytes>))> | |
*Hash of raw bytes: <HEX(sha256(len(body_bytes) ++ <body_bytes> ++ len(auth_info_bytes) ++ auth_info_bytes))> |
Also we need to specify how len is encoded to bytes. I suggest 4 byte big endian uint32.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I had varint in mind, just to accomodate for arbitrarily long txs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, varint is complex and not unique. Also we don't need to optimize for output length here. What about 8 byte big endian uint64 instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. uint64 should be plenty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wanted to point out that ASN.1/DER encoding does this natively and has a go lib as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, but pulling in a DER dependency for each client comes at a cost, especially when not used consistently thoughout the stack. For Ethereum signatures I had to write DER encoding/decoding by hand in TypeScript, which is not fun. But then again 50 lines of code also don't justify pulling in the whole fully featured ASN.1/DER dependency (if it exists in reasonable quality).
Co-authored-by: Simon Warta <2603011+webmaster128@users.noreply.github.com>
…cosmos-sdk into am/textual-sign-over-bytes
Co-authored-by: Simon Warta <2603011+webmaster128@users.noreply.github.com>
Putting automerge on. |
Description
Security concerns around what's signed, pulled out from #12785 (comment).
cc @peterbourgon
Author Checklist
All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.
I have...
!
to the type prefix if API or client breaking changeCHANGELOG.md
Reviewers Checklist
All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.
I have...
!
in the type prefix if API or client breaking change