Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Account utilities for normalization and heuristic based correction of mnemonic phrases #8034

Merged
merged 21 commits into from
Jun 17, 2021

Conversation

nategraf
Copy link
Contributor

@nategraf nategraf commented Jun 1, 2021

Description

Adds improved normalization for BIP-39 mnemonic phrases, as well as a new heuristic-based method for generating suggested corrections to a malformed mnemonic phrase.

Autocorrect suggestions are designed to correct words that may have been copied incorrectly, or mistyped when entered to recover an account. It suggests similar phrases, ordered by estimated probability that they are the source phrase of the observed phrase after being passed through a "noisy channel". Ordering is determined by edit total edit distance of the observed (incorrect) phrase to the suggestion.

Other changes

  • Major refactoring of accounts.test.ts
  • Refactoring of accounts.ts

Tested

Added unit tests which include a number of new phrases for normalization and correction testing.

Related issues

@nategraf nategraf requested a review from jmrossy June 1, 2021 21:48
@nategraf nategraf requested review from barbaraliau, medhakothari and a team as code owners June 1, 2021 21:48
@nategraf
Copy link
Contributor Author

nategraf commented Jun 1, 2021

Looks like formatNonAccentedCharacters broke the @celo/celocli build. I'll got ahead and update usage of that function to use normalizeMnemonic instead.

Copy link
Contributor

@jmrossy jmrossy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall lgtm, just a few questions below

for (const language of languages) {
if (bip39ToUse.validateMnemonic(mnemonic, getWordList(language))) {
for (const guessedLanguage of languages) {
if (bip39ToUse.validateMnemonic(mnemonic, getWordList(guessedLanguage))) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe validation is super fast so it's irrelevant but I wonder if comparing the words to the language dictionaries would be quicker?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or actually I see you have a detectMnemonicLanguage function below. Why not use that here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its probably slightly faster to use bip39.validateMnemonic because that function will bail on the first invalid word instead of comparing the phrase to all languages. Mostly though, this way of doing it feel more direct and makes sense to me personally.

@@ -2,7 +2,8 @@
"extends": "@celo/typescript/tsconfig.library.json",
"compilerOptions": {
"rootDir": "src",
"outDir": "lib"
"outDir": "lib",
"downlevelIteration": true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this for handling accented chars better? I wish json allowed comments

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's to allow for iterating over generator functions

Copy link
Contributor

@barbaraliau barbaraliau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Couple questions about test cases, but no real concerns.

publicKey: '0361d8adcac067bb2927d625e642af5f1f53914b102d0740ad97d103ea079a6ce4',
{
mnemonic:
'declarer effrayer estime carbone bebe danger déphaser junior buisson ériger morceau cintrer',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand this test (probably because I don't know French or Spanish very well). The mnemonic is in French but it is expected to be in Spanish, so it adds the accents?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment to explain what this is testing

expect(normalizeMnemonic(mnemonic)).toEqual(expected)
})

it('should normalize extra and non-standard whitespace', async () => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any weird edge cases for whitespace with other languages like Korean or Japanese?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed! Only Japanese, but I've added a test case specifically for it now.

@nategraf nategraf added the automerge Have PR merge automatically when checks pass label Jun 17, 2021
@nategraf nategraf removed the automerge Have PR merge automatically when checks pass label Jun 17, 2021
@nategraf nategraf added the automerge Have PR merge automatically when checks pass label Jun 17, 2021
@mergify mergify bot merged commit c7b05ae into master Jun 17, 2021
@mergify mergify bot deleted the victor/mnemonic-autocorrect branch June 17, 2021 23:29
tkporter pushed a commit that referenced this pull request Jul 8, 2021
… mnemonic phrases (#8034)

### Description

Adds improved normalization for BIP-39 mnemonic phrases, as well as a new heuristic-based method for generating suggested corrections to a malformed mnemonic phrase.

Autocorrect suggestions are designed to correct words that may have been copied incorrectly, or mistyped when entered to recover an account. It suggests similar phrases, ordered by estimated probability that they are the source phrase of the observed phrase after being passed through a "noisy channel". Ordering is determined by edit total edit distance of the observed (incorrect) phrase to the suggestion.

### Other changes

* Major refactoring of `accounts.test.ts`
* Refactoring of `accounts.ts`

### Tested

Added unit tests which include a number of new phrases for normalization and correction testing.

### Related issues

- Related #7060
mergify bot pushed a commit to valora-inc/wallet that referenced this pull request Jul 19, 2021
### Description

Utilizing the functions included in `@celo/utils` in
celo-org/celo-monorepo#8034, this PR adds support for the wallet to
automatically correct minor mnemonic phrase errors, such as typos or replacement of simmilar
words, during the restore/import wallet flow.

When given an invalid mnemonic, the applications will spend up to 5 seconds searching for an
simmilar corrected mnemonic phrase. It tries suggestions by order of edit distance from the given
phrase and checks the balance of each valid mnemonic phrase it derives. If one of the phrases has a
balance, it is almost surely the intended account, so the wallet uses that phrase instead of the
invalid user given phrase. If no phrase can be found with a balance, then an error is displayed to
the user as before.

### Other changes

* Modified the error text upon input of an incorrect phrase.
* Added comments to various React compenents.
* Include cEUR where needed to allow the code to compile.
* Remove `celotool` and `celocli` commands from `package.json`
* Updates translation mocks and tests to use the parameters passed in

### Tested

* Added unit tests to ensure the new functionality works in the import saga.
* Manually tested with various phrases.


### How others should test

Using a wallet that has a balance, restore the wallet from the mnemonic phrase. (With the mnemonic
phrase, reset the application and upon relaunching it, enter the restore wallet flow) When entering
the phrase, make sure to add some errors (if you don't naturally make errors when typing). Press
submit and it the mnemonic phrase is accepted, then the feature worked.

### Related issues

- Fixes celo-org/celo-monorepo#7060
- Requires celo-org/celo-monorepo#8034
- Requires celo-org/celo-monorepo#8146

### Backwards compatibility

No concerns
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
automerge Have PR merge automatically when checks pass
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants