Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs(textual): Require signing over raw bytes #12910

Merged
merged 10 commits into from
Sep 5, 2022
23 changes: 22 additions & 1 deletion docs/architecture/adr-050-sign-mode-textual.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
- Dec 06, 2021: Initial Draft.
- Feb 07, 2022: Draft read and concept-ACKed by the Ledger team.
- May 16, 2022: Change status to Accepted.
- Aug 11, 2022: Require signing over tx raw bytes.

## Status

Expand Down Expand Up @@ -115,6 +116,7 @@ Tip: <string>
*Public Key: <base64_string>
*Sequence: <uint64>
*End of other signers
*Hash of raw bytes: <base64_string> // Base64 encoding of bytes defined in #10, to prevent tx hash malleability.
```

### 8. Encoding of the Transaction Body
Expand Down Expand Up @@ -158,7 +160,23 @@ Grantee: cosmos1ghi...jkl
End of transaction messages
```

### 9. Signing Payload and Wire Format
### 10. Require signing over the `TxBody` and `AuthInfo` raw bytes

Recall that the transaction bytes merklelized on chain are the Protobuf binary serialization of [TxRaw](https://github.com/cosmos/cosmos-sdk/blob/v0.46.0/proto/cosmos/tx/v1beta1/tx.proto#L33), which contains the `body_bytes` and `auth_info_bytes`. Moreover, the transaction hash is defined as the SHA256 hash of the `TxRaw` bytes. We require that the user signs over these canonical bytes in SIGN_MODE_TEXTUAL, more specifically over the following string:
amaury1093 marked this conversation as resolved.
Show resolved Hide resolved

```
*Hash of raw bytes: <base64(sha256(<body_bytes> ++ " " ++ <auth_info_bytes>))>
Copy link
Contributor

@arirubinstein arirubinstein Aug 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason base64 is used as the representation here instead of hex bytes? There could result in ambiguity depending on how the transaction can be tampered with. the digest should be represented in a simpler byte representation, e.g. hex-encoded with expected length validation on read

e.g. here are two base64-encoded payloads which differ in serialized representation but the same raw bytes:

payload1="QzNWwq=="
payload2="QzNWwr=="

% echo -n $payload1 | shasum -a256
b679c1a6f416eb2eb18af4dce3347c3aa74e559aa88a8ef0ef4d6eefb0939c32  -
% echo -n $payload2 | shasum -a256
a7e3cd272e296994116cdf160221a26e71ca6b800abb5c62e90086c245fa115e  -

% echo -n $payload1 | base64 -d | xxd
00000000: 4333 56c2                                C3V.
% echo -n $payload2 | base64 -d | xxd
00000000: 4333 56c2                                C3V.

Additionally, while it doesn't appear to immediately be exploitable in the current form, would it be possible to utilize a scheme involving prepending the length of each payload before each, and/or adopt ASN.1/DER representation of the bytes here in the event " " becomes significant?
e.g. H(len(param1)||param1||separator||len(param2)||param2)

Copy link

@peterbourgon peterbourgon Aug 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, but more specifically...

We require that the user signs over these canonical bytes in SIGN_MODE_TEXTUAL, more specifically over the following string:

The string is a rendering of the canonical bytes, and not the canonical bytes themselves. Whatever the on-chain representation is, that's what should be signed, directly. Signatures assert integrity only for the literal msg bytes provided.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure we need a separator at all, but length prefixing each part makes sense

Copy link
Contributor Author

@amaury1093 amaury1093 Aug 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added length prefix in b15409d.

@arirubinstein Thanks for flagging the non-uniqueness of base64 encoding, I actually wasn't aware. We also use base64 in value renderers. This means that 2 different SignDocTextuals (i.e. 2 different string arrays) can represent the same proto Tx, which breaks the spec on bijectivity. Maybe it's worth indeed considering using hex everywhere.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to use HEX for everything personally. Base64 is such a mess.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree!

Copy link
Contributor Author

@amaury1093 amaury1093 Aug 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to capital hex

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note1: there is also z-base-32 which is more compact, readable and doesn't have the padding bytes issue.

Note2: we don't need to length prefix the last part. So <len(body_bytes) ++ <body_bytes> ++ <auth_info_bytes> is good enough.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For z-base-32 support, here is the spec (good read!): https://philzimmermann.com/docs/human-oriented-base-32-encoding.txt

BTW, Algorand is using base32.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Linking: #13141

```

where `++` denotes concatenation.

This is to prevent transaction hash malleability. The point #1 about bijectivity assures that transaction `body` and `auth_info` values are not malleable, but the transaction hash still might be malleable with point #1 only, because the SIGN_MODE_TEXTUAL strings don't follow the byte ordering defined in `body_bytes` and `auth_info_bytes`. Without this hash, a malicious validator or exchange could intercept a transaction, modify its transaction hash _after_ the user signed it using SIGN_MODE_TEXTUAL (by tweaking the byte ordering inside `body_bytes` or `auth_info_bytes`), and then submit it to Tendermint.
amaury1093 marked this conversation as resolved.
Show resolved Hide resolved

By including this hash in the SIGN_MODE_TEXTUAL signing payload, we keep the same level of guarantees as [SIGN_MODE_DIRECT](./adr-020-protobuf-transaction-encoding.md).

These bytes are only shown in expert mode, hence the leading `*`.

### 11. Signing Payload and Wire Format

This string array is encoded as a single `\n`-delimited string before transmitted to the hardware device, and this long string is the signing payload signed by the hardware wallet.

Expand Down Expand Up @@ -232,6 +250,7 @@ Amount: 10 atom // Conversion from uatom to atom using value renderer
End of transaction messages
Fee: 0.002 atom
*Gas: 100'000
*Hash of raw bytes: <base64_string>
```

#### Example 2: Multi-Msg Transaction with 3 signers
Expand Down Expand Up @@ -320,6 +339,7 @@ Tip: 200 ibc/CDC4587874B85BEA4FCEC3CEA5A1195139799A1FEE711A07D972537E18FDA39D
*Sign mode: Direct Aux
*Sequence: 42
*End of other signers
*Hash of raw bytes: <base64_string>
```

#### Example 5: Complex Transaction with Nested Messages
Expand Down Expand Up @@ -466,6 +486,7 @@ Fee: 0.002 atom
*Sign mode: Direct
*Sequence: 42
*End of other signers
*Hash of raw bytes: <base64_string>
```

## Consequences
Expand Down