Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Monitor Improvements LUCENE-10422 #679

Merged
merged 18 commits into from
Mar 21, 2022
Merged

Conversation

mogui
Copy link
Contributor

@mogui mogui commented Feb 14, 2022

Description

Change how Monitor manages it's directory, IndexWriter and IndexReader, giving the possibility to

  • use a custom Directory implementation
  • use a readonly QueryIndex in order to have more Monitor instance on different server reading from the same index (now the index reader is created from the index writer so it is impossible to make a readonly Monitor)

Solution

I've extended MonitorConfiguration, with a readonly boolean property that governs ho the QueryIndex is created, and I also added a functional property that acts as a Directory Provider

Tests

I've added tests to confirm two readonly monitor could work as expected.

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Lucene maintainers access to contribute to my PR branch. (optional but recommended)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.

Copy link
Contributor

@romseygeek romseygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for opening this @mogui! I left some comments - I think we can break things up into read-only and writeable classes which should help simplify things a bit.

@mogui
Copy link
Contributor Author

mogui commented Feb 16, 2022

@romseygeek I've updated with the requested changes

Copy link
Contributor

@romseygeek romseygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @mogui, thanks for your updates. I left a couple of comments.

moved common logic to QueryIndex now an abstract class
Removed Lazy Parsing of queries, readonly monitor uses in memory cache just like Writable, but without locking and purge executor.
@mogui
Copy link
Contributor Author

mogui commented Feb 18, 2022

@romseygeek OK I've moved things around, making QueryIndex an abstract class sharing common code for both implementations
ReadonlyQueryIndex have the same logic of in-memory laoding of parsed queries, but without locking handling and without purge executor.
I've made it usable by enabling purging cache so it can be done externally.

Copy link
Contributor

@romseygeek romseygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mogui! I left some more comments.

@mogui
Copy link
Contributor Author

mogui commented Feb 22, 2022

@romseygeek I should have fixed everything, also added few lines of docs to explain read-only behaviour.


@Override
void purgeCache(CachePopulator populator) throws IOException {
manager.maybeRefresh();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a race here, I think. Once you've called manager.maybeRefresh() then a subsequent call to search will return the new searcher, but you may not have updated this.queries yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should I call maybeRefreshBlocking instead?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think actually the best solution is to remove the query cache entirely for this impl, which is where you started out - sorry for all the back and forth here. We can have a background thread that calls maybeRefresh() on the manager to keep up with updates, but all the queries will be read directly from the searcher and parsed as they are executed. The in-memory cache works when the Monitor in question is handling updates as well, but trying to do that when you have no idea what the changes are between IndexReaders is going to be nasty.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I think it was the best solution too, I'll work getting back to that solution.
Don't worry, all the back and forth got me to understand everything better!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@romseygeek I just realized that main branch from which I started the fork requires java 17, is there a way to make a build that targets Java 11 or do I have to remake the pull request on topo of 9_0_0 branch ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@romseygeek in your opinion it makes sense to reuse purgeFrequency and purgeFrequencyUnits from MonitorConfiguration to configure the refreshing thread or is better a new pairof configurations?

Copy link
Contributor Author

@mogui mogui Mar 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@romseygeek did you had a chanche to look out the new commits?

live query parsing
added executor to refresh index
Copy link
Contributor

@romseygeek romseygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delays @mogui. This is pretty close I think? I asked for a few changes in ReadOnlyQueryCache but the rest is looking good.

You should target main, so we need Java 17, yes.

I wonder if we should rename purgeCache to something like refreshCache, as cacheing only really applies to the writeable Monitor. Maybe in a followup though, as that's an API change and I'd need to think about deprecations etc.

@mogui
Copy link
Contributor Author

mogui commented Mar 14, 2022

@romseygeek fixed !

Copy link
Contributor

@romseygeek romseygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a couple more comments. Can you merge in the latest master branch as well, as github reckons there are conflicts now?

niko.usai added 3 commits March 15, 2022 09:44
Moved listeners to abstract class QueryIndex to give listener logic to Readonly too and fix an ugly test
@mogui
Copy link
Contributor Author

mogui commented Mar 15, 2022

@romseygeek I've merged main and restored listeners to the abstarct class, You were right the sleep was very ugly

@mogui
Copy link
Contributor Author

mogui commented Mar 21, 2022

@romseygeek sorry to bother you, just a reminder :D

@romseygeek
Copy link
Contributor

One last thing, can you add a CHANGES.txt entry in the Lucene 9.2.0 New Features section? And then I think we're good to merge! Thanks for all your patience on this.

niko.usai added 2 commits March 21, 2022 15:33
* main:
  LUCENE-10473: Make tests a bit faster when running nightly. (apache#754)
  LUCENE-9905: Fix check in TestPerFieldKnnVectorsFormat#testMergeUsesNewFormat
  LUCENE-9614: Fix rare TestKnnVectorQuery failures
  LUCENE-10472: Fix TestMatchAllDocsQuery#testEarlyTermination (apache#753)
  LUCENE-10418: Move CHANGES to the correct section.
  LUCENE-10418: Optimize `Query#rewrite` in the non-scoring case. (apache#672)
  LUCENE-10469: Fix score mode propagation in ConstantScoreQuery. (apache#750)
  LUCENE-10452, LUCENE-10451: mention hunspell changes in CHANGES.txt
  LUCENE-10452: Hunspell: call checkCanceled less frequently to reduce the overhead (apache#723)
  Add 9.2.0 section to release notes
  LUCENE-10451 Hunspell: don't perform potentially expensive spellchecking after timeout (apache#721)
  LUCENE-10463: increment java version to 17 in smoke tester (apache#748)
@mogui
Copy link
Contributor Author

mogui commented Mar 21, 2022

Done!
Thanks for Your patience and support @romseygeek !

Copy link
Contributor

@romseygeek romseygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small nit, we wrap lines in CHANGES to make it easier to read in text editors

Co-authored-by: Alan Woodward <romseygeek@gmail.com>
@mogui
Copy link
Contributor Author

mogui commented Mar 21, 2022

Ok!

@romseygeek romseygeek merged commit be99178 into apache:main Mar 21, 2022
asfgit pushed a commit that referenced this pull request Mar 21, 2022
This commit adds a read-only monitor implementation that can
search the QueryIndex of another monitor without supporting adding
new queries.
@mogui
Copy link
Contributor Author

mogui commented Mar 23, 2022

@romseygeek There is something else in roadmap or future ideas regarding this package that I can contribute? Or where I can look to contribute more to the project. I'd like to remain involved if I can, it was nice to wotk with.

@romseygeek
Copy link
Contributor

@mogui there isn't a roadmap for the monitor itself, but please do open issues and/or PRs if you have ideas for other improvements!

wjp719 added a commit to wjp719/lucene that referenced this pull request Mar 28, 2022
* main: (38 commits)
  remove obsolete image/description from luke/README.md
  Upgrade to forbiddenapis 3.3 (apache#768)
  LUCENE-10393: Unify binary dictionary and dictionary writer in kuromoji and nori (apache#740)
  LUCENE-9651 Update benchmark module docs (apache#759)
  LUCENE-10458: BoundedDocSetIdIterator may supply error count in Weigth#count(LeafReaderContext) when missingValue enables (apache#736)
  LUCENE-10481: FacetsCollector will not request scores if it does not use them (apache#760)
  LUCENE-10477: mention 'call multiple times' in Query.rewrite javadoc (apache#758)
  Add back-compat indices for 9.1.0.
  Synchronize CHANGES.
  LUCENE-10464, LUCENE-10477: WeightedSpanTermExtractor.extractWeightedSpanTerms to rewrite sufficiently (apache#737)
  Add version 9.1.0.
  DOAP changes for release 9.1.0
  LUCENE-10422: Make errorprone happy
  LUCENE-10478: mark Test4GBStoredFields as @monster (apache#757)
  LUCENE-10422: Read-only monitor implementation (apache#679)
  LUCENE-10473: Make tests a bit faster when running nightly. (apache#754)
  LUCENE-9905: Fix check in TestPerFieldKnnVectorsFormat#testMergeUsesNewFormat
  LUCENE-9614: Fix rare TestKnnVectorQuery failures
  LUCENE-10472: Fix TestMatchAllDocsQuery#testEarlyTermination (apache#753)
  LUCENE-10418: Move CHANGES to the correct section.
  ...
wjp719 added a commit to wjp719/lucene that referenced this pull request Mar 28, 2022
* main: (52 commits)
  remove obsolete image/description from luke/README.md
  Upgrade to forbiddenapis 3.3 (apache#768)
  LUCENE-10393: Unify binary dictionary and dictionary writer in kuromoji and nori (apache#740)
  LUCENE-9651 Update benchmark module docs (apache#759)
  LUCENE-10458: BoundedDocSetIdIterator may supply error count in Weigth#count(LeafReaderContext) when missingValue enables (apache#736)
  LUCENE-10481: FacetsCollector will not request scores if it does not use them (apache#760)
  LUCENE-10477: mention 'call multiple times' in Query.rewrite javadoc (apache#758)
  Add back-compat indices for 9.1.0.
  Synchronize CHANGES.
  LUCENE-10464, LUCENE-10477: WeightedSpanTermExtractor.extractWeightedSpanTerms to rewrite sufficiently (apache#737)
  Add version 9.1.0.
  DOAP changes for release 9.1.0
  LUCENE-10422: Make errorprone happy
  LUCENE-10478: mark Test4GBStoredFields as @monster (apache#757)
  LUCENE-10422: Read-only monitor implementation (apache#679)
  LUCENE-10473: Make tests a bit faster when running nightly. (apache#754)
  LUCENE-9905: Fix check in TestPerFieldKnnVectorsFormat#testMergeUsesNewFormat
  LUCENE-9614: Fix rare TestKnnVectorQuery failures
  LUCENE-10472: Fix TestMatchAllDocsQuery#testEarlyTermination (apache#753)
  LUCENE-10418: Move CHANGES to the correct section.
  ...

# Conflicts:
#	lucene/sandbox/src/java/org/apache/lucene/sandbox/search/IndexSortSortedNumericDocValuesRangeQuery.java
#	lucene/sandbox/src/test/org/apache/lucene/sandbox/search/TestIndexSortSortedNumericDocValuesRangeQuery.java
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants