Format #94
Expose StreamState
@fulmicoton You seem to keep submitting PRs by mistake. Is there a way to prevent that from happening? Also, this has caused me to take a look at your fork, and to be honest, this seems like a pretty poor situation to be in. Would it be possible for you to elaborate on the motivation for some of these changes? In particular, I notice that many of them are cosmetic, which is going to make it quite difficult to bring in any new changes to I also notice that you removed my name from the license copyright and also the authors list in the
I sincerely apologize for all the noise. I'll try to find out whether it is possible to change the default destination branch when creating a PR, but I think this is hardcoded in GitHub.
The original motivating changes were:
Post fork, a recent change makes it possible to stream the fst backward.
I was under the impression that you were not welcoming the two motivating changes anyway, so
Your name is in a comment right next to the authors section.
Yeah, I'd rather have my name added back to both places. I appreciate you being conscientious about bug reports. :-) If that becomes a serious issue, then we can certainly re-evaluate. But I think it should be fine. In general, it would be nice to find a way to re-align here and share one crate. I'm not sure when I'll have time to think about that more deeply, but for when that time comes, to be clear, the important things blocking that for you are the generics over
Oh yeah, that would be great to get back into
I put your name back in the Cargo.toml. What is the second place? If it is the license, your name is in it, of course.
That would be great!
To be entirely honest, AFAIK no one is relying on streaming the state with the automaton (I should double-check), and I am more and more convinced that it is better to use automata that match the exact distance. ( streaming backward, and Also, just to put that on your radar: there is a bit of a push for moving tantivy IO to something that is not married to mmap. The motivation here is having tantivy work on WASI, having a distant index (on S3, for instance), and, more recently, matrix wants an encrypted directory. I think this is not for tomorrow, but the tantivy dictionary may move to something that can work over an
(I asked on the GitHub community forums whether it is possible to set the default PR branch.)
Oh, I must have misread the diff. All is good then. Thanks!
Yeah, I think it'd be pretty challenging to use FSTs without memory maps.
@fulmicoton Is there any particular reason that you use I'm going to try exploring adding your two proposed features along with polishing up a bunch of stuff. I'd really like to get you back on |
I'm also planning on dropping the
This change was not nearly as bad as I thought it would be. Instead of trying to provide all the different possible ways to store some bytes, we now make FSTs maximally flexible by accepting any type that can cheaply provide a byte slice. This should resolve a number of issues with folks constructing FSTs in ways that weren't supported by the old constructors. As a bonus, we no longer need to depend directly on a specific implementation of memory maps. Conveniently, the `Mmap` type in the `memmap` crate already implements `AsRef<[u8]>`, so using memory maps is as simple as:

```rust
use fst::raw::Fst;
use std::fs::File;

// Mapping a file is `unsafe` in memmap 0.7 because the file could be
// modified by another process while it is mapped.
let mmap = unsafe { memmap::Mmap::map(&File::open("something.fst").unwrap()).unwrap() };
let fst = Fst::new(mmap).unwrap();
```

Fixes #92, Fixes #94, Fixes #97
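To illustrate the design choice of accepting "any type that can cheaply provide a byte slice", here is a minimal sketch of a handle that is generic over `D: AsRef<[u8]>`. The names `FstLike` and `len_bytes` are hypothetical and do not come from the fst crate; the point is only that owned vectors, borrowed slices, and `Cow` buffers (and, per the note above, `memmap::Mmap`) all satisfy the same bound, so the handle never has to know how the bytes are stored.

```rust
use std::borrow::Cow;

/// Hypothetical, heavily simplified stand-in for an FST handle that is
/// generic over its backing storage (not the fst crate's real type).
struct FstLike<D> {
    data: D,
}

impl<D: AsRef<[u8]>> FstLike<D> {
    /// Hypothetical constructor: only rejects an empty buffer.
    fn new(data: D) -> Result<FstLike<D>, &'static str> {
        if data.as_ref().is_empty() {
            return Err("empty FST buffer");
        }
        Ok(FstLike { data })
    }

    /// Every read goes through `as_ref()`, so the storage type never
    /// leaks into the rest of the implementation.
    fn len_bytes(&self) -> usize {
        self.data.as_ref().len()
    }
}

fn main() {
    // Owned storage.
    let owned = FstLike::new(vec![1u8, 2, 3]).unwrap();
    // Borrowed storage.
    let borrowed = FstLike::new(&b"abcd"[..]).unwrap();
    // Copy-on-write storage.
    let cow = FstLike::new(Cow::Borrowed(&b"xy"[..])).unwrap();
    println!("{} {} {}", owned.len_bytes(), borrowed.len_bytes(), cow.len_bytes());
}
```

A memory map type that implements `AsRef<[u8]>` would slot into `FstLike::new` the same way, which is why the crate no longer needs to depend on one particular memory map implementation.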
If by upstreaming the implementation you mean sending a PR with all of the levenshtein_automata code, I don't think this is a good idea. I can add levenshtein_automata as a dependency in the
That would be awesome.
@fulmicoton Ah yeah, keeping it as a separate crate makes sense then! You can make support for
@BurntSushi I believe this is already the case today. Meilisearch, for instance, uses your fst crate and the levenshtein_automata crate.
Apologies, I forgot to answer this question. I am not sure which one is right for this job. If you tell me, I think I picked
@fulmicoton Gotcha, great. The next release of Another question: how big a deal is it to have the FST format break on you? That is, if
Users are expected to keep the means to reindex everything. In the future, if you update the fst crate often and break the index format, I will simply delay updating to once every year or every 6 months or so.
@fulmicoton Aye. I do not anticipate breaking the index format often. If I do a 1.0 release, then I'll commit to index format stability for 1.x.y. (My plan is to add optional CRC32 checking on the FST. Building an FST will always write a checksum, but opening an FST will not check it by default. It will be opt-in.)
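The plan above (the builder always writes a checksum, opening only verifies it on request) can be sketched as follows. This is a minimal illustration, not the fst crate's implementation: it assumes the standard IEEE CRC-32 polynomial (`0xEDB88320`, as used by zlib) and a 4-byte little-endian checksum appended at the end of the buffer; the real crate may choose a different CRC variant and layout. The functions `crc32`, `finish_with_checksum`, and `open` are hypothetical names.

```rust
/// Bitwise CRC-32 with the reflected IEEE polynomial 0xEDB88320.
/// Assumption: the actual crate may use another CRC variant.
fn crc32(bytes: &[u8]) -> u32 {
    let mut crc: u32 = 0xFFFF_FFFF;
    for &b in bytes {
        crc ^= b as u32;
        for _ in 0..8 {
            // All ones if the low bit is set, otherwise zero.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Hypothetical build step: the checksum is always appended.
fn finish_with_checksum(mut body: Vec<u8>) -> Vec<u8> {
    let sum = crc32(&body);
    body.extend_from_slice(&sum.to_le_bytes());
    body
}

/// Hypothetical open step: verification only runs when `verify` is true.
fn open(data: &[u8], verify: bool) -> Result<&[u8], String> {
    if data.len() < 4 {
        return Err("buffer too short for checksum".to_string());
    }
    let (body, tail) = data.split_at(data.len() - 4);
    if verify {
        let stored = u32::from_le_bytes([tail[0], tail[1], tail[2], tail[3]]);
        let actual = crc32(body);
        if stored != actual {
            return Err(format!(
                "checksum mismatch: stored {:08x}, computed {:08x}",
                stored, actual
            ));
        }
    }
    Ok(body)
}

fn main() {
    let file = finish_with_checksum(b"fst bytes".to_vec());
    assert!(open(&file, true).is_ok());

    // Simulate a single flipped bit, e.g. from faulty hardware.
    let mut corrupted = file.clone();
    corrupted[0] ^= 0x01;
    // Default path: no check, so the corruption goes unnoticed.
    assert!(open(&corrupted, false).is_ok());
    // Opt-in path: the checksum catches it.
    assert!(open(&corrupted, true).is_err());
}
```

Because the checksum is a single linear pass at build time, writing it costs little, while keeping verification opt-in means readers that do not need it (such as tantivy, per the next comment) pay nothing when opening an FST.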
Understood. FYI, the CRC32 check is a bit of an anti-feature for tantivy.
Yeah. If computing the CRC32 leads to measurable overhead during construction, then I'll make it optional to compute as well. Otherwise, it should be practically free. But other people have experienced index corruption, probably due to hardware failures, so this is an important diagnostic tool.
@BurntSushi I would be surprised if you saw any overhead. No need to make it optional. I just wanted to let you know this is not something that is useful for tantivy...