Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce class/member visibility of all normalizer and stemmer classes [LUCENE-10561] #11597

Closed
asfimport opened this issue May 7, 2022 · 12 comments

Comments

@asfimport
Copy link

asfimport commented May 7, 2022

This is a spin-off of #11348.

Constants and methods in those classes are exposed to the outside packages; we should be able to limit the visibility to private or, at least to package private.

This change breaks backward compatibility so should be applied to the main branch (10.0) only, and a MIGRATE entry may be needed.

Also, they seem unchanged since 2008, we could refactor them to embrace newer Java APIs as we did in #540.


Migrated from LUCENE-10561 by Tomoko Uchida (@mocobeta), resolved May 14 2022
Linked issues:

Pull requests: #883

@asfimport
Copy link
Author

Robert Muir (@rmuir) (migrated from JIRA)

you can just make the entire classes package private. there are probably a ton of stemmer classes in the same situation. only the tokenstream(factory) need be public.

@asfimport
Copy link
Author

Rushabh Shah (@shahrs87) (migrated from JIRA)

Hi,
This is my first PR so apologies if I ask dumb questions.
I created PR for this change. When I am running ./gradlew check, it is throwing error below.

apache-lucene % ./gradlew check
> Task :lucene:analysis:common:generateClassicTokenizerChecksumCheck FAILED

FAILURE: Build failed with an exception.

* Where:
Script '/Users/rushabh.shah/apache-lucene/gradle/generation/regenerate.gradle' line: 184

* What went wrong:
Execution failed for task ':lucene:analysis:common:generateClassicTokenizerChecksumCheck'.
> Checksums mismatch for derived resources; you might have modified a generated resource (regenerate task: generateClassicTokenizer):
  Current:
    lucene/analysis/common/src/java/org/apache/lucene/analysis/classic/ClassicTokenizerImpl.java=c825a8b8d0d0d893b4914e7161bcd119e7b07b40
  
  Expected:
    lucene/analysis/common/src/java/org/apache/lucene/analysis/classic/ClassicTokenizerImpl.java=381a9627fd7da6402216e3279cf81a09af222aaf

I understand that I need to update checksum somewhere but don't know whether we have some script that does that or we have to compute manually. Please advise.
@rmuir @mocobeta

@asfimport
Copy link
Author

Robert Muir (@rmuir) (migrated from JIRA)

Some of the tokenizers are auto-generated. For example ClassicTokenizerImpl is generated by "jflex" tool from ClassicTokenizer.jflex sources.
The build is just failing because the wrong file was changed.
Personally I recommend avoiding ClasicTokenizerImpl or similar for this issue, because these particular generated tokenizers are very complicated. It is hard to reduce the member/class visibility due to the way the code generation works.

@asfimport
Copy link
Author

Rushabh Shah (@shahrs87) (migrated from JIRA)

Thank you @rmuir  for the comment. I have updated the PR as per your suggestions. Please review.

@asfimport
Copy link
Author

Rushabh Shah (@shahrs87) (migrated from JIRA)

@rmuir  Also can you please assign the Jira to me.

@asfimport
Copy link
Author

Robert Muir (@rmuir) (migrated from JIRA)

I think i didn't explain my suggestion well enough. When i said "make entire classes package private", I mean that, for example, ArabicNormalizer does not need to be public class anymore.

I think this is easier than modifying many individual constants. Some of the constants modified in the PR are not related to stemmers/normalizers and may cause problems.

Generally speaking, public constants are harmless, so I don't think there is much benefit in hiding them.

But entire classes that need not be public, that is worth fixing because it clutters up javadoc and API for no good reason.

@asfimport
Copy link
Author

Rushabh Shah (@shahrs87) (migrated from JIRA)

Thank you @rmuir  for the clarification. One followup question. Should I change only the classes that are Normalizer or Stemmer in this PR and not touch other classes ?

@asfimport
Copy link
Author

Tomoko Uchida (@mocobeta) (migrated from JIRA)

Also can you please assign the Jira to me.

I don't think we can assign external contributors. Don't worry about it - it's not so important thing; A CHANGE entry is sufficient for us to track who changes it.

 

@asfimport
Copy link
Author

Tomoko Uchida (@mocobeta) (migrated from JIRA)

Generally speaking, public constants are harmless, so I don't think there is much benefit in hiding them.

There are some public static constants their type is Array - char[][]. I thought it'd be safe to make them private but is making the entire class package-private sufficient?

@asfimport
Copy link
Author

Tomoko Uchida (@mocobeta) (migrated from JIRA)

One followup question. Should I change only the classes that are Normalizer or Stemmer in this PR and not touch other classes ?

I'd agree with that. It'd be better not to touch Tokenizer/TokenFilters in this issue. It's another issue and changing them will break users' code; we'd need to be careful about it.

@asfimport
Copy link
Author

asfimport commented May 12, 2022

Tomoko Uchida (@mocobeta) (migrated from JIRA)

(So... we need to switch Jira/GitHub even such a relatively small issue, and conversation is scattered over both of them. This is a good motivation for #11593 to me.)

@asfimport
Copy link
Author

ASF subversion and git services (migrated from JIRA)

Commit 694d797 in lucene's branch refs/heads/main from Rushabh Shah
https://gitbox.apache.org/repos/asf?p=lucene.git;h=694d797526a

LUCENE-10561 Reduce class/member visibility of all normalizer and stemmer classes (#883)

Co-authored-by: Rushabh Shah <shahrs87@apache.org>
Co-authored-by: Tomoko Uchida <tomoko.uchida.1111@gmail.com>

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants