Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FormAnalyzer] scoring password hints #720

Merged
merged 21 commits into from
Jan 22, 2025
Merged
Show file tree
Hide file tree
Changes from 14 commits
Commits
Show all changes
21 commits
Select commit Hold shift + click to select a range
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
41 changes: 32 additions & 9 deletions dist/autofill-debug.js

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

41 changes: 32 additions & 9 deletions dist/autofill.js

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

34 changes: 27 additions & 7 deletions src/Form/FormAnalyzer.js
Original file line number Diff line number Diff line change
Expand Up @@ -40,9 +40,6 @@ class FormAnalyzer {
*/
this.signals = [];

// Analyse the input that was passed. This is pretty arbitrary, but historically it's been working nicely.
this.evaluateElAttributes(input, 1, true);

// If we have a meaningful container (a form), check that, otherwise check the whole page
if (form !== input) {
this.evaluateForm();
Expand All @@ -53,15 +50,18 @@ class FormAnalyzer {
return this;
}

areLoginOrSignupSignalsWeak() {
return Math.abs(this.autofillSignal) < 10;
}

/**
* Hybrid forms can be used for both login and signup
* @returns {boolean}
*/
get isHybrid() {
// When marking for hybrid we also want to ensure other signals are weak
const areOtherSignalsWeak = Math.abs(this.autofillSignal) < 10;

return this.hybridSignal > 0 && areOtherSignalsWeak;
return this.hybridSignal > 0 && this.areLoginOrSignupSignalsWeak();
}

get isLogin() {
Expand Down Expand Up @@ -146,7 +146,7 @@ class FormAnalyzer {
}

const signupRegexToUse = this.matching.getDDGMatcherRegex(shouldBeConservative ? 'conservativeSignupRegex' : 'signupRegex');
const matchesSignup = safeRegexTest(/new.?password/i, string) || safeRegexTest(signupRegexToUse, string);
const matchesSignup = safeRegexTest(/new.?(password|username)/i, string) || safeRegexTest(signupRegexToUse, string);

// In some cases a login match means the login is somewhere else, i.e. when a link points outside
if (shouldFlip) {
Expand Down Expand Up @@ -230,6 +230,16 @@ class FormAnalyzer {
});
}

evaluatePasswordHints() {
const textContent = this.form.textContent?.replace(/\s+/g, ' ');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any risk here that this causes a performance issue when text content is very large?

Copy link
Collaborator Author

@dbajpeyi dbajpeyi Jan 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potentially! Actually using this existing utility:

const removeExcessWhitespace = (string = '') => {

I've added a 1000 cutoff limit, had to modify the utility a bit for custom cutoff limit.

66bfaad#diff-29e05a441db72f7ca04df5e51bdccfcdebc988497045a25fd89fc587247c4ed2R836

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dbajpeyi Do we have a sense if 1000 is a solid number to rely on? (too big, too small, just right?)

In #720 (comment) it looks like the Apple form has 1800+ characters (mostly whitespace), so 1000 might be too conservative given it won't be run on that form. However, this question comes out of not having a good sense of where that balance lies between performance and usefulness -- do we have data we can lean on? (e.g. we've a bunch of forms in this repo -- what character size are they generally? Or do we have a rule of thumb on the largest size of string we want to run a regex on based on how often this code executes?)

Copy link
Collaborator Author

@dbajpeyi dbajpeyi Jan 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removeExcessWhitespaces call from the test run:

With 1000 limit, it is being called ~196/555 (35%) of the forms.
With 1800 limit, it is is being called ~233/555 (41%) of our forms in the suite, this is only just a bit more.

I a tiny bit difference in test suite run time (~10.5 -> 11.4s). I won't say it's that bad though.

The hints are not scored with 1000 limit for the apple form but that one is scored fairly strongly already (score: 9), and the other forms have still improved with 1000 (1 new and 2 existing forms). This for me seems like a good enough improvement, and I rather want to keep on the lower side. If we see more examples, we can increase the threshold to accommodate more. But I think 1000 is a good threshold for now, as it covers most forms that can be improved, without much overhead.

We generally cut texts to TEXT_LENGTH_CUTOFF currently which is set to 100. But in this case, I think we want to go longer. I don't see a very specific reason for the 100 number, I think it was arbitrarily set.

Copy link
Collaborator Author

@dbajpeyi dbajpeyi Jan 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gave it another thought, and some more test run and I noticed a tiny bit slower tests vs main. I am sticking to 750 now - which still works for the 3 forms I mentioned, 642d018.

if (textContent) {
const hasPasswordHints = safeRegexTest(this.matching.getDDGMatcherRegex('passwordHintsRegex'), textContent, 500);
if (hasPasswordHints) {
this.increaseSignalBy(5, 'Password hints');
}
}
}

/**
* Function that checks if the element is an external link or a custom web element that
* encapsulates a link.
Expand Down Expand Up @@ -313,6 +323,11 @@ class FormAnalyzer {
// Check page title
this.evaluatePageTitle();

// Evaluate attributes of form's input elements
this.form.querySelectorAll(this.matching.cssSelector('formInputsSelector')).forEach((input) => {
this.evaluateElAttributes(input, 1, true);
});

// Check form attributes
this.evaluateElAttributes(this.form);

Expand All @@ -332,11 +347,16 @@ class FormAnalyzer {
}

// A form with many fields is unlikely to be a login form
const relevantFields = this.form.querySelectorAll(this.matching.cssSelector('genericTextField'));
const relevantFields = this.form.querySelectorAll(this.matching.cssSelector('genericTextInputField'));
if (relevantFields.length >= 4) {
this.increaseSignalBy(relevantFields.length * 1.5, 'many fields: it is probably not a login');
}

// If we can't decide at this point, try reading password hints
if (this.areLoginOrSignupSignalsWeak()) {
this.evaluatePasswordHints();
}

// If we can't decide at this point, try reading page headings
if (this.autofillSignal === 0) {
this.evaluatePageHeadings();
Expand Down
Loading
Loading