Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs(textual): Require signing over raw bytes #12910

Merged
merged 10 commits into from
Sep 5, 2022
Merged

Conversation

amaury1093
Copy link
Contributor

@amaury1093 amaury1093 commented Aug 11, 2022

Description

Security concerns around what's signed, pulled out from #12785 (comment).

cc @peterbourgon


Author Checklist

All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.

I have...

  • included the correct type prefix in the PR title
  • added ! to the type prefix if API or client breaking change
  • targeted the correct branch (see PR Targeting)
  • provided a link to the relevant issue or specification
  • followed the guidelines for building modules
  • included the necessary unit and integration tests
  • added a changelog entry to CHANGELOG.md
  • included comments for documenting Go code
  • updated the relevant documentation or specification
  • reviewed "Files changed" and left comments if necessary
  • confirmed all CI checks have passed

Reviewers Checklist

All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.

I have...

  • confirmed the correct type prefix in the PR title
  • confirmed ! in the type prefix if API or client breaking change
  • confirmed all author checklist items have been addressed
  • reviewed state machine logic
  • reviewed API design and naming
  • reviewed documentation is accurate
  • reviewed tests and test coverage
  • manually tested (if applicable)

@amaury1093 amaury1093 requested a review from a team as a code owner August 11, 2022 15:47
@peterbourgon
Copy link

Relevant comments:

tl;dr: signatures verify the integrity of the signed bytes, not a semantic transformation of those bytes

Recall that the transaction bytes merklelized on chain are the Protobuf binary serialization of [TxRaw](https://github.com/cosmos/cosmos-sdk/blob/v0.46.0/proto/cosmos/tx/v1beta1/tx.proto#L33), which contains the `body_bytes` and `auth_info_bytes`. Moreover, the transaction hash is defined as the SHA256 hash of the `TxRaw` bytes. We require that the user signs over these canonical bytes in SIGN_MODE_TEXTUAL, more specifically over the following string:

```
*Hash of raw bytes: <base64(sha256(<body_bytes> ++ " " ++ <auth_info_bytes>))>
Copy link
Contributor

@arirubinstein arirubinstein Aug 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason base64 is used as the representation here instead of hex bytes? There could result in ambiguity depending on how the transaction can be tampered with. the digest should be represented in a simpler byte representation, e.g. hex-encoded with expected length validation on read

e.g. here are two base64-encoded payloads which differ in serialized representation but the same raw bytes:

payload1="QzNWwq=="
payload2="QzNWwr=="

% echo -n $payload1 | shasum -a256
b679c1a6f416eb2eb18af4dce3347c3aa74e559aa88a8ef0ef4d6eefb0939c32  -
% echo -n $payload2 | shasum -a256
a7e3cd272e296994116cdf160221a26e71ca6b800abb5c62e90086c245fa115e  -

% echo -n $payload1 | base64 -d | xxd
00000000: 4333 56c2                                C3V.
% echo -n $payload2 | base64 -d | xxd
00000000: 4333 56c2                                C3V.

Additionally, while it doesn't appear to immediately be exploitable in the current form, would it be possible to utilize a scheme involving prepending the length of each payload before each, and/or adopt ASN.1/DER representation of the bytes here in the event " " becomes significant?
e.g. H(len(param1)||param1||separator||len(param2)||param2)

Copy link

@peterbourgon peterbourgon Aug 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, but more specifically...

We require that the user signs over these canonical bytes in SIGN_MODE_TEXTUAL, more specifically over the following string:

The string is a rendering of the canonical bytes, and not the canonical bytes themselves. Whatever the on-chain representation is, that's what should be signed, directly. Signatures assert integrity only for the literal msg bytes provided.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure we need a separator at all, but length prefixing each part makes sense

Copy link
Contributor Author

@amaury1093 amaury1093 Aug 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added length prefix in b15409d.

@arirubinstein Thanks for flagging the non-uniqueness of base64 encoding, I actually wasn't aware. We also use base64 in value renderers. This means that 2 different SignDocTextuals (i.e. 2 different string arrays) can represent the same proto Tx, which breaks the spec on bijectivity. Maybe it's worth indeed considering using hex everywhere.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to use HEX for everything personally. Base64 is such a mess.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree!

Copy link
Contributor Author

@amaury1093 amaury1093 Aug 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to capital hex

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note1: there is also z-base-32 which is more compact, readable and doesn't have the padding bytes issue.

Note2: we don't need to length prefix the last part. So <len(body_bytes) ++ <body_bytes> ++ <auth_info_bytes> is good enough.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For z-base-32 support, here is the spec (good read!): https://philzimmermann.com/docs/human-oriented-base-32-encoding.txt

BTW, Algorand is using base32.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Linking: #13141

@amaury1093 amaury1093 force-pushed the am/textual-sign-over-bytes branch from 42f2671 to b15409d Compare August 16, 2022 11:28
Co-authored-by: Aaron Craelius <aaron@regen.network>
@aaronc
Copy link
Member

aaronc commented Aug 16, 2022

One additional thing we should do is make sure that TxRaw bytes are encoded according to ADR 027 in the tx decoder. This is actually an existing malleability issue for SIGN_MODE_DIRECT that we should address.

@amaury1093
Copy link
Contributor Author

@aaronc That has already been addressed in #9743.

@aaronc
Copy link
Member

aaronc commented Aug 17, 2022

Sorry I forgot!

@amaury1093
Copy link
Contributor Author

amaury1093 commented Aug 18, 2022

This is R4R again. We sign over a hex string now.

Edit: about using Hex instead of base64 in other parts of the spec, I did another PR: #12954

Copy link
Collaborator

@robert-zaremba robert-zaremba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Added a suggestion to consider z-base-32

mergify bot pushed a commit that referenced this pull request Aug 22, 2022
## Description

ref: #12910 (comment)



---

### Author Checklist

*All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.*

I have...

- [ ] included the correct [type prefix](https://github.com/commitizen/conventional-commit-types/blob/v3.0.0/index.json) in the PR title
- [ ] added `!` to the type prefix if API or client breaking change
- [ ] targeted the correct branch (see [PR Targeting](https://github.com/cosmos/cosmos-sdk/blob/main/CONTRIBUTING.md#pr-targeting))
- [ ] provided a link to the relevant issue or specification
- [ ] followed the guidelines for [building modules](https://github.com/cosmos/cosmos-sdk/blob/main/docs/building-modules)
- [ ] included the necessary unit and integration [tests](https://github.com/cosmos/cosmos-sdk/blob/main/CONTRIBUTING.md#testing)
- [ ] added a changelog entry to `CHANGELOG.md`
- [ ] included comments for [documenting Go code](https://blog.golang.org/godoc)
- [ ] updated the relevant documentation or specification
- [ ] reviewed "Files changed" and left comments if necessary
- [ ] confirmed all CI checks have passed

### Reviewers Checklist

*All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.*

I have...

- [ ] confirmed the correct [type prefix](https://github.com/commitizen/conventional-commit-types/blob/v3.0.0/index.json) in the PR title
- [ ] confirmed `!` in the type prefix if API or client breaking change
- [ ] confirmed all author checklist items have been addressed 
- [ ] reviewed state machine logic
- [ ] reviewed API design and naming
- [ ] reviewed documentation is accurate
- [ ] reviewed tests and test coverage
- [ ] manually tested (if applicable)
Recall that the transaction bytes merklelized on chain are the Protobuf binary serialization of [TxRaw](https://github.com/cosmos/cosmos-sdk/blob/v0.46.0/proto/cosmos/tx/v1beta1/tx.proto#L33), which contains the `body_bytes` and `auth_info_bytes`. Moreover, the transaction hash is defined as the SHA256 hash of the `TxRaw` bytes. We require that the user signs over these bytes in SIGN_MODE_TEXTUAL, more specifically over the following string:

```
*Hash of raw bytes: <HEX(sha256(<len(body_bytes) ++ <body_bytes> ++ len(auth_info_bytes) ++ <auth_info_bytes>))>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we use <foo> to mark the insertion of a variable in the document, can we just use a single set of brackets to include the calculation like this?

Suggested change
*Hash of raw bytes: <HEX(sha256(<len(body_bytes) ++ <body_bytes> ++ len(auth_info_bytes) ++ <auth_info_bytes>))>
*Hash of raw bytes: <HEX(sha256(len(body_bytes) ++ <body_bytes> ++ len(auth_info_bytes) ++ auth_info_bytes))>

Also we need to specify how len is encoded to bytes. I suggest 4 byte big endian uint32.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I had varint in mind, just to accomodate for arbitrarily long txs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, varint is complex and not unique. Also we don't need to optimize for output length here. What about 8 byte big endian uint64 instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. uint64 should be plenty.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wanted to point out that ASN.1/DER encoding does this natively and has a go lib as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but pulling in a DER dependency for each client comes at a cost, especially when not used consistently thoughout the stack. For Ethereum signatures I had to write DER encoding/decoding by hand in TypeScript, which is not fun. But then again 50 lines of code also don't justify pulling in the whole fully featured ASN.1/DER dependency (if it exists in reasonable quality).

Co-authored-by: Simon Warta <2603011+webmaster128@users.noreply.github.com>
Co-authored-by: Simon Warta <2603011+webmaster128@users.noreply.github.com>
@amaury1093 amaury1093 added the A:automerge Automatically merge PR once all prerequisites pass. label Sep 5, 2022
@amaury1093
Copy link
Contributor Author

Putting automerge on.

@mergify mergify bot merged commit fe90139 into main Sep 5, 2022
@mergify mergify bot deleted the am/textual-sign-over-bytes branch September 5, 2022 15:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A:automerge Automatically merge PR once all prerequisites pass. Type: ADR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants