Account utilities for normalization and heuristic based correction of mnemonic phrases #8034

nategraf · 2021-06-01T21:48:54Z

Description

Adds improved normalization for BIP-39 mnemonic phrases, as well as a new heuristic-based method for generating suggested corrections to a malformed mnemonic phrase.

Autocorrect suggestions are designed to correct words that may have been copied incorrectly or mistyped when entered to recover an account. The method suggests similar phrases, ordered by the estimated probability that each is the source phrase of the observed phrase after being passed through a "noisy channel". Ordering is determined by the total edit distance from the observed (incorrect) phrase to the suggestion.
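As a rough illustration of ordering by edit distance (not the code in this PR), the sketch below ranks word-list entries by Levenshtein distance to an observed word and sums per-word distances for a whole phrase. The names `levenshtein`, `rankCandidates`, `phraseDistance` and the five-word `toyWordList` are hypothetical; a real BIP-39 list has 2048 words.

```typescript
// Classic dynamic-programming Levenshtein distance between two strings.
function levenshtein(a: string, b: string): number {
  let prev: number[] = []
  for (let j = 0; j <= b.length; j++) prev.push(j)
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i]
    for (let j = 1; j <= b.length; j++) {
      const cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
    }
    prev = curr
  }
  return prev[b.length]
}

// Rank word-list entries by distance to the observed word, closest first.
// Ties are broken alphabetically so the order is deterministic.
function rankCandidates(observed: string, wordList: string[], maxDistance: number): string[] {
  return wordList
    .map((word) => ({ word, distance: levenshtein(observed, word) }))
    .filter((c) => c.distance <= maxDistance)
    .sort((x, y) => x.distance - y.distance || x.word.localeCompare(y.word))
    .map((c) => c.word)
}

// Total edit distance of a phrase: the sum of per-word distances.
// Assumes both phrases have the same number of words.
function phraseDistance(observed: string[], suggestion: string[]): number {
  let total = 0
  for (let i = 0; i < observed.length; i++) {
    total += levenshtein(observed[i], suggestion[i])
  }
  return total
}

// Hypothetical miniature word list for illustration only.
const toyWordList = ['abandon', 'ability', 'able', 'about', 'above']
const ranked = rankCandidates('abuot', toyWordList, 2)
```

Summing per-word distances is only one way to score a phrase; the PR text does not say how individual edits are weighted.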

Other changes

Major refactoring of accounts.test.ts
Refactoring of accounts.ts

Tested

Added unit tests which include a number of new phrases for normalization and correction testing.

Related issues

Related Implement hueristic-based account key error correction #7060


nategraf · 2021-06-01T23:52:28Z

Looks like formatNonAccentedCharacters broke the @celo/celocli build. I'll go ahead and update usages of that function to use normalizeMnemonic instead.


jmrossy

Overall lgtm, just a few questions below

jmrossy · 2021-06-08T15:58:24Z

packages/sdk/utils/src/account.ts

-  for (const language of languages) {
-    if (bip39ToUse.validateMnemonic(mnemonic, getWordList(language))) {
+  for (const guessedLanguage of languages) {
+    if (bip39ToUse.validateMnemonic(mnemonic, getWordList(guessedLanguage))) {


Maybe validation is super fast so it's irrelevant but I wonder if comparing the words to the language dictionaries would be quicker?

Or actually I see you have a detectMnemonicLanguage function below. Why not use that here?

nategraf · 2021-06-09

It's probably slightly faster to use bip39.validateMnemonic, because that function will bail on the first invalid word instead of comparing the phrase to all languages. Mostly, though, this way of doing it feels more direct and makes sense to me personally.
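For comparison, the dictionary-based alternative raised in this thread could look roughly like the sketch below. It is not the PR's `detectMnemonicLanguage`; `detectLanguageByMembership` and `toyWordLists` are hypothetical, and the lists are truncated stand-ins for the 2048-word BIP-39 lists.

```typescript
// Hypothetical miniature word lists; real BIP-39 lists have 2048 words each.
const toyWordLists: { [language: string]: string[] } = {
  english: ['abandon', 'ability', 'able'],
  spanish: ['ábaco', 'abdomen', 'abeja'],
}

// Return the first language whose list contains every word of the phrase,
// or undefined if no list does. `every` stops at the first missing word,
// so a phrase in the wrong language is rejected quickly.
function detectLanguageByMembership(
  phrase: string,
  lists: { [language: string]: string[] }
): string | undefined {
  const words = phrase.trim().split(/\s+/)
  for (const language of Object.keys(lists)) {
    const list = lists[language]
    if (words.every((w) => list.indexOf(w) !== -1)) {
      return language
    }
  }
  return undefined
}
```

Membership alone does not verify the checksum, so a phrase can pass this check and still be rejected by `validateMnemonic`.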

jmrossy · 2021-06-08T16:03:04Z

packages/sdk/utils/tsconfig.json

@@ -2,7 +2,8 @@
  "extends": "@celo/typescript/tsconfig.library.json",
  "compilerOptions": {
    "rootDir": "src",
-    "outDir": "lib"
+    "outDir": "lib",
+    "downlevelIteration": true


Is this for handling accented chars better? I wish json allowed comments

nategraf · 2021-06-09

It's to allow iterating over the values produced by generator functions.
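A small sketch of what the flag enables, assuming the library compiles to an ES5 target (the target is not shown in this diff). Without `downlevelIteration`, TypeScript only accepts arrays and strings in `for...of` and spread when targeting ES5; with it, the compiler emits code that follows the iterator protocol. The `countdown` generator is a hypothetical stand-in for a suggestion generator.

```typescript
// A generator that lazily yields values one at a time.
function* countdown(from: number) {
  for (let n = from; n > 0; n--) {
    yield n
  }
}

// Both for...of and spread consume the generator through the iterator protocol.
const collected: number[] = []
for (const n of countdown(3)) {
  collected.push(n)
}
const spread = [...countdown(2)]
```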


barbaraliau

LGTM. Couple questions about test cases, but no real concerns.

barbaraliau · 2021-06-15T20:37:30Z

packages/sdk/utils/src/account.test.ts

-          publicKey: '0361d8adcac067bb2927d625e642af5f1f53914b102d0740ad97d103ea079a6ce4',
+        {
+          mnemonic:
+            'declarer effrayer estime carbone bebe danger déphaser junior buisson ériger morceau cintrer',


I'm not sure I understand this test (probably because I don't know French or Spanish very well). The mnemonic is in French but it is expected to be in Spanish, so it adds the accents?

nategraf · 2021-06-17

Added a comment to explain what this is testing.
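One way a normalizer might recover accents that were dropped on input is to compare words with their diacritics stripped. The sketch below is an assumption about the general technique, not a description of what `normalizeMnemonic` does; `stripAccents`, `restoreAccents` and `toyFrenchWords` are hypothetical.

```typescript
// Remove combining diacritical marks after canonical decomposition.
function stripAccents(s: string): string {
  return s.normalize('NFD').replace(/[\u0300-\u036f]/g, '')
}

// Find the word-list entry that matches once accents are ignored.
// Returns undefined when nothing in the list matches.
function restoreAccents(word: string, wordList: string[]): string | undefined {
  const bare = stripAccents(word)
  for (const candidate of wordList) {
    if (stripAccents(candidate) === bare) {
      return candidate
    }
  }
  return undefined
}

// Hypothetical miniature list using words from the test phrase above.
const toyFrenchWords = ['déphaser', 'ériger', 'junior']
const restored = restoreAccents('dephaser', toyFrenchWords)
```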

barbaraliau · 2021-06-15T20:45:31Z

packages/sdk/utils/src/account.test.ts

+      expect(normalizeMnemonic(mnemonic)).toEqual(expected)
+    })
+
+    it('should normalize extra and non-standard whitespace', async () => {


Are there any weird edge cases for whitespace with other languages like Korean or Japanese?

nategraf · 2021-06-17

Indeed! Only Japanese, but I've added a test case specifically for it now.
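The Japanese edge case most likely concerns the ideographic space (U+3000), which BIP-39 Japanese phrases conventionally use as the word separator. A minimal whitespace normalizer that handles it could look like the sketch below; `normalizeWhitespace` and its `separator` parameter are hypothetical and not the signature of `normalizeMnemonic`.

```typescript
// Collapse runs of any Unicode whitespace and trim the ends.
// NFKD maps the ideographic space (U+3000) to an ordinary space, and the
// JavaScript \s class also matches U+3000, so both steps cover it.
function normalizeWhitespace(phrase: string, separator: string = ' '): string {
  return phrase
    .normalize('NFKD')
    .trim()
    .split(/\s+/)
    .join(separator)
}

const cleaned = normalizeWhitespace('  abandon\t ability\n able ')
const japaneseStyle = normalizeWhitespace('abandon\u3000ability')
```

Passing `'\u3000'` as `separator` would rejoin the words with ideographic spaces when a phrase needs to be displayed in the Japanese convention.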



### Description

Utilizing the functions added to `@celo/utils` in celo-org/celo-monorepo#8034, this PR adds support for the wallet to automatically correct minor mnemonic phrase errors, such as typos or replacement of similar words, during the restore/import wallet flow.

When given an invalid mnemonic, the application will spend up to 5 seconds searching for a similar, corrected mnemonic phrase. It tries suggestions in order of edit distance from the given phrase and checks the balance of each valid mnemonic phrase it derives. If one of the phrases has a balance, it is almost surely the intended account, so the wallet uses that phrase instead of the invalid user-given phrase. If no phrase with a balance can be found, an error is displayed to the user as before.

### Other changes

* Modified the error text shown upon input of an incorrect phrase.
* Added comments to various React components.
* Include cEUR where needed to allow the code to compile.
* Remove `celotool` and `celocli` commands from `package.json`
* Updates translation mocks and tests to use the parameters passed in

### Tested

* Added unit tests to ensure the new functionality works in the import saga.
* Manually tested with various phrases.

### How others should test

Using a wallet that has a balance, restore the wallet from the mnemonic phrase. (With the mnemonic phrase in hand, reset the application and, upon relaunching it, enter the restore wallet flow.) When entering the phrase, make sure to add some errors (if you don't naturally make errors when typing). Press submit; if the mnemonic phrase is accepted, the feature worked.

### Related issues

- Fixes celo-org/celo-monorepo#7060
- Requires celo-org/celo-monorepo#8034
- Requires celo-org/celo-monorepo#8146

### Backwards compatibility

No concerns
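The restore flow described in this wallet PR (try suggestions in order, stop after a time budget, accept the first phrase with a balance) can be sketched as follows. This is an illustration under stated assumptions, not the wallet's import saga: `findFundedSuggestion`, `BalanceCheck` and the injectable `now` clock are hypothetical, and the suggestions are assumed to arrive already ordered by edit distance.

```typescript
// Hypothetical balance lookup; a real wallet would query the network.
type BalanceCheck = (mnemonic: string) => Promise<boolean>

// Try candidates in the given order until one has a balance or the
// time budget runs out. Returns undefined when nothing is found in time.
async function findFundedSuggestion(
  suggestions: string[],
  hasBalance: BalanceCheck,
  budgetMs: number,
  now: () => number = Date.now
): Promise<string | undefined> {
  const deadline = now() + budgetMs
  for (const candidate of suggestions) {
    if (now() >= deadline) {
      return undefined
    }
    if (await hasBalance(candidate)) {
      return candidate
    }
  }
  return undefined
}
```

A caller would pass `5000` as `budgetMs` to match the 5-second limit and fall back to the existing error message when the result is `undefined`. The deadline is only checked between candidates, so a slow balance lookup can overrun the budget.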

Victor Graf added 13 commits May 25, 2021 15:14

add normaliMnemonic and refactor account utilities and tests

a247a5e

partially working solution. doesn't cover some cases.

4eac41f

working correction with suggestions by edit distance

b168bc4

hack to make it work for typos that result in valid BIP-39 words. ver…

520bc75

…y slow.

refactor suggestion list generator to be _much_ faster

8e0cb47

improved corrections and a few more text cases

320a266

[failed] attempt to allow for non-integer word weights

b1d07df

Revert "[failed] attempt to allow for non-integer word weights"

f9af838

This reverts commit b1d07df.

remove typos

10ff421

add test for suggestion validity and duplication

7b21945

simplify code and remove DO NOT MERGE tags

738112f

Merge branch 'master' of github.com:celo-org/celo-monorepo into victo…

fb316c1

…r/mnemonic-autocorrect

address linter errors

7245800

nategraf requested a review from jmrossy June 1, 2021 21:48

nategraf requested review from barbaraliau, medhakothari and a team as code owners June 1, 2021 21:48

rename and export detect language function

00470b6

Victor Graf added 2 commits June 7, 2021 13:55

Merge branch 'master' of github.com:celo-org/celo-monorepo into victo…

54ee893

…r/mnemonic-autocorrect

migrate to ussage of normalizeMnemonic in the CLI

0e7d587

nategraf requested review from gastonponti and mcortesi as code owners June 7, 2021 21:03

jmrossy reviewed Jun 8, 2021


Victor Graf added 2 commits June 10, 2021 14:49

Merge branch 'master' of github.com:celo-org/celo-monorepo into victo…

4f427f7

…r/mnemonic-autocorrect

add an invalid mnemonic words function

542c48a

barbaraliau approved these changes Jun 15, 2021


Victor Graf added 2 commits June 17, 2021 14:52

Merge branch 'master' of github.com:celo-org/celo-monorepo into victo…

1ef45fc

…r/mnemonic-autocorrect

address review comments

0c59ff1

nategraf added the automerge label ("Have PR merge automatically when checks pass") Jun 17, 2021

nategraf removed the automerge label ("Have PR merge automatically when checks pass") Jun 17, 2021

add back formatNonAccentedCharacters to fix back compat

4ba2442

nategraf added the automerge label ("Have PR merge automatically when checks pass") Jun 17, 2021

nategraf mentioned this pull request Jun 17, 2021

Automatically correct minor mnemonic phrase errors on import valora-inc/wallet#508

Merged

mergify bot merged commit c7b05ae into master Jun 17, 2021

mergify bot deleted the victor/mnemonic-autocorrect branch June 17, 2021 23:29
