fix: nonce implementation #1754

GentikSolm · 2024-03-21T17:10:06Z


Fix nonce implementation to guard against replay attacks

Nonce is shorthand for "number used once". We use it to protect
against replay attacks. Since receivers in the bittensor network are
decentralized, requiring a domain name for every receiver would be
self-defeating. This forces us to depend on plain HTTP, which is exposed to
many more attacks than HTTPS. One of these is the replay attack: a malicious
agent intercepts a message being sent to a miner and sends it again,
"replaying" the request.

A nonce is only used once, so a second request carrying the same nonce must
fail. To accomplish this the server holds a dictionary of sender
identifiers -> last nonce, and makes sure each new nonce is strictly greater
than the previous one.

# bittensor/axon.py
endpoint_key = f"{synapse.dendrite.hotkey}:{synapse.dendrite.uuid}"

# Check the nonce from the endpoint key.
if (
    endpoint_key in self.nonces.keys()
    and self.nonces[endpoint_key] is not None
    and synapse.dendrite.nonce is not None
    and synapse.dendrite.nonce <= self.nonces[endpoint_key]
):
    raise Exception("Nonce is too small")

The problem here is that nonces are held only in memory. If the server
restarts, the stored nonces are lost, so a malicious user can freely replay a
previously sent request.

To solve this, receivers should both keep the last nonce in memory
and require nonces to be UNIX timestamps within a pre-determined delta of the
current time. A delta of 4 seconds was chosen for two reasons: miners generally
take a few seconds to restart, and a request sent from a dendrite should be able
to reach an axon and start verification within 4 seconds, network latency
included. This way, if an attacker replays a message after the receiver
restarts, the replayed nonce timestamp falls more than the delta behind the
current time and is rejected.
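
The combined check described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `NonceChecker` class, the `verify` method, the `ALLOWED_DELTA` constant, and the error messages other than "Nonce is too small" are hypothetical names, and nonces are assumed to be UNIX timestamps in seconds. Rejecting nonces from the future is not covered here.

```python
import time
from typing import Dict, Optional

# Assumption: nonces are UNIX timestamps in seconds, and 4 s is the delta
# described in the PR text.
ALLOWED_DELTA = 4.0


class NonceChecker:
    """Hypothetical helper illustrating the idea; not the bittensor axon code."""

    def __init__(self, allowed_delta: float = ALLOWED_DELTA) -> None:
        self.allowed_delta = allowed_delta
        # sender identifier -> last accepted nonce (lost when the process restarts)
        self.nonces: Dict[str, float] = {}

    def verify(
        self,
        endpoint_key: str,
        nonce: Optional[float],
        now: Optional[float] = None,
    ) -> None:
        """Raise ValueError if the nonce is missing, stale, or not increasing."""
        if now is None:
            now = time.time()
        if nonce is None:
            raise ValueError("Missing nonce")
        # Delta check: still effective after a restart, because it needs no state.
        if nonce <= now - self.allowed_delta:
            raise ValueError("Nonce is too old")
        # In-memory check: catches replays that arrive inside the delta window.
        last = self.nonces.get(endpoint_key)
        if last is not None and nonce <= last:
            raise ValueError("Nonce is too small")
        self.nonces[endpoint_key] = nonce
```

The two checks cover different windows: the in-memory comparison stops a replay that arrives within the delta, while the timestamp comparison stops one that arrives after the stored nonces have been lost.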

Example Flow

Delta = 4
Request comes at timestamp 10
Received at timestamp 11
  |  Nonce > now - delta
  |  10    > 11  - 4
  |  Passes delta check. Nonce is within delta
Container restarts
Malicious user sends duplicate request at timestamp 15
  |  Nonce < now - delta
  |  10    < 15  - 4
  |  Fails delta check, nonce is too old

Testing

There are 5 cases that need to be tested before this PR is merged.

First good Request
Second good Request
Second Request replayed
Second Request replayed after axon restarts
Third Request missing nonce entirely

ifrit98

LGTM, thanks!

GentikSolm and others added 3 commits March 21, 2024 11:46

fix: nonce implementation

d5348b9

Merge branch 'staging' into fix/nonce

954709e

Merge branch 'staging' into fix/nonce

1c2970a

ifrit98 self-requested a review March 25, 2024 19:58

ifrit98 approved these changes Mar 25, 2024


ifrit98 merged commit de0bd31 into opentensor:staging Mar 25, 2024
12 checks passed

ifrit98 mentioned this pull request Mar 25, 2024

Release/6.10.0 #1757

Merged

thewhaleking mentioned this pull request May 29, 2024

Include PR #1754 in 7.1 #1953

Closed
