-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
IndexReader.close should forcefully evict entries from FieldCache [LUCENE-2135] #3211
Comments
Michael McCandless (@mikemccand) (migrated from JIRA) This is not unlike what we had to do in creating CloseableThreadLocal... that was another case where the underlying impl failed to free things as immediately as we'd like. |
Earwin Burrfoot (migrated from JIRA) A better approach is to don IR-keyed weakHashMaps completely and bind everything you need onto IR itself. That's how I do it and it works like a charm. |
Uwe Schindler (@uschindler) (migrated from JIRA) #1906... |
Yonik Seeley (@yonik) (migrated from JIRA)
That would be nice... I'd love to see a |
Earwin Burrfoot (migrated from JIRA)
I'm currently using - <T> T IndexReader.component(Class<T> key) I can try to conjure a patch this weekend. |
Uwe Schindler (@uschindler) (migrated from JIRA)
Thats much better than untyped. Same like AttributeSource. +1 for a patch. |
Christian Kohlschütter (migrated from JIRA) Please see #3209 for a refactoring of FieldCache, which also addresses these problems. |
Earwin Burrfoot (migrated from JIRA)
It doesn't address a problem of adding custom components to IR. It also does complicate IR beyond that unholy mess it already is. I think it's better to have an ability to add 'any' kind of component to IR, and then implement whateverCaches over it. |
Christian Kohlschütter (migrated from JIRA) I haven't followed the aforementioned discussion on the mailing list, but I think this issue covers a few things that are not mentioned explicitly here. Maybe it is a good idea to summarize the actual problems/challenges/benefits in a few sentences? What I understand is that you plan to add arbitrary (cacheable) attributes to IndexReader. I suggest to move these features to the IndexCache proposed in #3209. Especially when using decorating IndexReaders (things like "ReadOnlyIndexReader") you would not want to store attributes separately from the decorated IndexReader. The same probably applies to SegmentReader with all its clones. IndexCache would provide a common base for the extensions you mentioned. (i.e. you are welcome to apply your patches on top of LUCENE-2133). |
Michael McCandless (@mikemccand) (migrated from JIRA) I would love to see a bigger solution here, but in the interim, I The patch adds FieldCache.purge to the interface. This is technically A few tests (incl back-compat) needed fixing, because they were |
Uwe Schindler (@uschindler) (migrated from JIRA) In 2.9. we wrote in the BW section, that FieldCache interface is no BW problem as nobody ever can implement it (because the FileCacheImpl singleton is the only used one). Ok, you can implement it without any use. :-) |
Uwe Schindler (@uschindler) (migrated from JIRA) In my opinion, all IndexReaders should call purge, mabe put it on toplevel IR.close default impl? Because if you request FieldCache from Top-level (which you should not do, but you can), it should also be purged. |
Michael McCandless (@mikemccand) (migrated from JIRA)
Hmm... this actually gets tricky to get right, because of the FieldCacheKey. EG on closing a SegmentReader that's a clone of another, you don't want to evict it from the FieldCache. I guess I could fix each of the IndexReader subclasses to evict themselves from the cache. Let me look into that... |
Yonik Seeley (@yonik) (migrated from JIRA)
It's almost like we want two caches... one with entries that are independent of any changes in deleted docs (like the current FieldCache), and one that isn't. |
Michael McCandless (@mikemccand) (migrated from JIRA) New patch, also evicts the other subclasses of IR from FieldCache. |
Uwe Schindler (@uschindler) (migrated from JIRA) +1 I just noticed, it is even possible to retrieve a field cache from the FilterIndexReader and that would be a duplicate of the dlegate's cache. Very ugly. |
Michael McCandless (@mikemccand) (migrated from JIRA)
Yeah, not good. Should we default getFieldCacheKey to delegate? A subclass of FIR would presumably need to then override if their filtering altered what's in the field cache. |
Uwe Schindler (@uschindler) (migrated from JIRA) +1 Good idea, just add a note to the method javadocs, that you have to override this, if you change the contents by the filter. |
Christian Kohlschütter (migrated from JIRA) Honestly, please have a look at #3209, I really think it is a good starting point to solve all these problems. Could we perhaps merge the two issues (LUCENE-2133 and LUCENE-2135)? A quick summary of #3209: The patch allows one or more IndexReaders to share common cache information (whatever this is), stored in the same "IndexCache" instance. The IndexCache is designed to contain any cacheable/volatile information that can be regenerated from the IndexReader. For example: all clones of SegmentReader share the same SegmentReaderIndexCache with the original instance, containing the ThreadLocals of the "core reader". To provide arbitrary cacheable objects we could now extend IndexCache by a simple HashMap (it does not need to be a WeakHashMap, since the IndexCache is closed and purged as soon as the original IndexReader is closed). If you wish so, with the help of IndexCache we might even easily implement two different field caches for the same IndexCache instance, one that changes with deleted docs and another one that does not. Basically we may add any other kind of cache at a later point without touching IndexReader again. To re-use Earwin Burrfoot statement from above, that would then not "complicate IndexReader beyond that unholy mess it already is." |
Michael McCandless (@mikemccand) (migrated from JIRA) I definitely plan to have a look at #3209, but that's a rather large (and, good, on first read!) change to Lucene. I just don't think it should hold this small change up. |
Michael McCandless (@mikemccand) (migrated from JIRA) New patch, adds override to FIR.getFieldCacheKey |
Earwin Burrfoot (migrated from JIRA)
If I'm reading your patch right, to add something as a user to DirectoryReader+SegmentReader, I have to extend SegmentReaderIndexCache, IndexCache, then extend SegmentReader and DirectoryReader, and override all methods in DirectoryReader that create SegmentReader. My aim is to be able to bind stuff to readers without overriding them, delegating, or touching in any manner except providing certain factories on creation. |
Christian Kohlschütter (migrated from JIRA)
No, the functionality you intend to have is just not there at the moment. But it could be added directly to IndexCache (and thus, to all subclasses of IndexCache automatically).
We could add getProperty/setProperty methods to IndexCache. You could then bind/get arbitrary objects as follows: IndexReader ir = // somehow create your new reader (Personally, I prefer standardized string keys to avoid collisions, just like in the Servlet API, for example) Again, this is not yet implemented, but could be done easily, without affecting any existing IndexReader or the other changes on FieldCache etc. |
Christian Kohlschütter (migrated from JIRA) There is some discussion in #3209 where we need a decision that also affects this issue. Could you please check and comment? Thanks! |
Michael McCandless (@mikemccand) (migrated from JIRA) backport |
Spinoff of java-user thread "heap memory issues when sorting by a string field".
We rely on WeakHashMap to hold our FieldCache, keyed by reader. But this lacks immediacy on releasing the reference, after a reader is closed.
WeakHashMap can't free the key until the reader is no longer referenced by the app. And, apparently, WeakHashMap has a further impl detail that requires invoking one of its methods for it to notice that a key has just become only weakly reachable.
To fix this, I think on IR.close we should evict entries from the FieldCache, as long as the sub-readers are truly closed (refCount dropped to 0).
Migrated from LUCENE-2135 by Michael McCandless (@mikemccand), resolved Jun 02 2010
Attachments: LUCENE-2135.patch (versions: 3)
Linked issues:
The text was updated successfully, but these errors were encountered: