Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hunspell (minor): reduce allocations when processing compound rules #12316

Merged
merged 1 commit into from
May 19, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -155,7 +155,7 @@ public class Dictionary {
boolean checkCompoundCase, checkCompoundDup, checkCompoundRep;
boolean checkCompoundTriple, simplifiedTriple;
int compoundMin = 3, compoundMax = Integer.MAX_VALUE;
List<CompoundRule> compoundRules; // nullable
CompoundRule[] compoundRules; // nullable
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
Would it be any different if you passed the size to ArrayList's constructor? I'm guessing streams have an associated cost but if it's nearly the same then I'd go with less verbose version (collections).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the list, the question arose whether to iterate over it with foreach or with indexed access. The latter definitely has no allocations but results in wordier code (including an IDE suppression comment). So I chose an array to get the best of both worlds :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, sure. Do those allocations really show up in benchmarks? I'd think it to be almost non-detectable as long as it doesn't go beyond local thread allocation buffers. Just wondering, I'm fine with the patch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In an allocation snapshot, yes. I suspect that might not always be real allocations, but since it's there and easy to remove, why not

List<CheckCompoundPattern> checkCompoundPatterns = new ArrayList<>();

// ignored characters (dictionary, affix, inputs)
Expand Down Expand Up @@ -601,11 +601,11 @@ private String[] splitBySpace(LineNumberReader reader, String line, int minParts
return parts;
}

private List<CompoundRule> parseCompoundRules(LineNumberReader reader, int num)
private CompoundRule[] parseCompoundRules(LineNumberReader reader, int num)
throws IOException, ParseException {
List<CompoundRule> compoundRules = new ArrayList<>();
CompoundRule[] compoundRules = new CompoundRule[num];
for (int i = 0; i < num; i++) {
compoundRules.add(new CompoundRule(singleArgument(reader, reader.readLine()), this));
compoundRules[i] = new CompoundRule(singleArgument(reader, reader.readLine()), this);
}
return compoundRules;
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -450,7 +450,7 @@ private boolean checkCompoundRules(
if (forms != null) {
words.add(forms);

if (dictionary.compoundRules.stream().anyMatch(r -> r.mayMatch(words))) {
if (mayHaveCompoundRule(words)) {
if (checkLastCompoundPart(wordChars, offset + breakPos, length - breakPos, words)) {
return true;
}
Expand All @@ -467,6 +467,15 @@ private boolean checkCompoundRules(
return false;
}

private boolean mayHaveCompoundRule(List<IntsRef> words) {
for (CompoundRule rule : dictionary.compoundRules) {
if (rule.mayMatch(words)) {
return true;
}
}
return false;
}

private boolean checkLastCompoundPart(
char[] wordChars, int start, int length, List<IntsRef> words) {
IntsRef ref = new IntsRef(new int[1], 0, 1);
Expand All @@ -475,7 +484,12 @@ private boolean checkLastCompoundPart(
Stemmer.RootProcessor stopOnMatching =
(stem, formID, morphDataId, outerPrefix, innerPrefix, outerSuffix, innerSuffix) -> {
ref.ints[0] = formID;
return dictionary.compoundRules.stream().noneMatch(r -> r.fullyMatches(words));
for (CompoundRule r : dictionary.compoundRules) {
if (r.fullyMatches(words)) {
return false;
}
}
return true;
};
boolean found = !stemmer.doStem(wordChars, start, length, COMPOUND_RULE_END, stopOnMatching);
words.remove(words.size() - 1);
Expand Down