-
-
Notifications
You must be signed in to change notification settings - Fork 130
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Request: looser lifetime for loading from byte slice #92
Comments
Right. This is basically why it's not provided. That lifetime parameter will then infect everything in the public APIs of this crate. IIRC, when I was i initially designing the API, I could not get it to work. This was years ago, so I don't remember precisely why, but I wouldn't be surprised if there was some poor interaction between said lifetime parameter and streaming iterators. On the other hand, if you could get it to work, then I might be willing to entertain that, since I recognize that the current API is a bit too restrictive in that sense. With that said, if that extra lifetime parameter leads to a lot of additional API complexity, then I'm not sure if it's worth it. Another, more lofty, idea I've had for a while is to perhaps abandon the formulation of a streaming iterator in this crate as it exists now, and perhaps use something more like a cursor API. But I haven't tried this yet, so I don't know how well it would work. And it's a much bigger change. |
Could you say more about why? Certainly, you could "just" clone the bytes and give |
@BurntSushi I have many fsts and they are short-lived. Their lifecycles are short enough that we would find it onerous to allocate a copy of the fst data each time we want to read them. They aren't stored in files. |
I'm just trying to brainstorm work arounds for you. Doing that requires understanding more about what you're doing. Because even if there is a fix for this, I don't see it happening any time soon. How big are your fsts? |
@BurntSushi Thanks for trying to help me find a quick solution! My fsts have single-digit millions of elements, and the size of the underlying data is between 10MiB and 1GiB (with a median around 100MiB). I haven't benchmarked how much compression I get from fst generation. I see two paths to an immediate solution, neither of which require changes to the fst library:
I think I like (2), because it's simple and high-performance. WDYT? |
Yeah, I would probably go with (2) if trying (1) did indeed produce a suboptimal result. |
Thanks! |
Just to help a little this conversation to move forward, I have tried to clone an FST of 400MiB each time I needed to search in it and it was around 200-400ms. I then made an unsafe transmute to make my slice appear like a static one and it was 50us to load it from LMDB. This is unsafe but it work great. |
I think for now, the best solution to this problem is probably to unfortunately use I'll continue to brainstorm about this, but I think both of the solutions discussed above are major changes to the library that are unlikely to happened, especially given that I'm not sure if they would work or not. In other words, I think something this is only going to get fixed if someone builds a new library or if i happen to get motivated to do some lengthy experimentation. So I'm going to close this for now. |
OK, I'm going to try fixing this by making FSTs generic over |
This change was not nearly as bad as I thought it would be. Instead of trying to provide all the different possible ways to store some bytes, we instead make FSTs maximally flexible by accepting any type that can cheaply provide byte slice. This should resolve a number of issues with folks constructing FSTs in ways that weren't supported by the old constructors. As a bonus, we no longer need to directly depend on a specific implementation of memory maps. Conveniently, the `Mmap` type in the `memmap` crate already implements `AsRef<[u8]>`, so using memory maps is as simple as let mmap = memmap::Mmap::map(&File::open("something.fst").unwrap()); let fst = Fst::new(mmap).unwrap(); Fixes #92, Fixes #94, Fixes #97
This change was not nearly as bad as I thought it would be. Instead of trying to provide all the different possible ways to store some bytes, we instead make FSTs maximally flexible by accepting any type that can cheaply provide byte slice. This should resolve a number of issues with folks constructing FSTs in ways that weren't supported by the old constructors. As a bonus, we no longer need to directly depend on a specific implementation of memory maps. Conveniently, the `Mmap` type in the `memmap` crate already implements `AsRef<[u8]>`, so using memory maps is as simple as let mmap = memmap::Mmap::map(&File::open("something.fst").unwrap()); let fst = Fst::new(mmap).unwrap(); Fixes #92, Fixes #94, Fixes #97
This change was not nearly as bad as I thought it would be. Instead of trying to provide all the different possible ways to store some bytes, we instead make FSTs maximally flexible by accepting any type that can cheaply provide byte slice. This should resolve a number of issues with folks constructing FSTs in ways that weren't supported by the old constructors. As a bonus, we no longer need to directly depend on a specific implementation of memory maps. Conveniently, the `Mmap` type in the `memmap` crate already implements `AsRef<[u8]>`, so using memory maps is as simple as let mmap = memmap::Mmap::map(&File::open("something.fst").unwrap()); let fst = Fst::new(mmap).unwrap(); Fixes #92, Fixes #94, Fixes #97
This change was not nearly as bad as I thought it would be. Instead of trying to provide all the different possible ways to store some bytes, we instead make FSTs maximally flexible by accepting any type that can cheaply provide byte slice. This should resolve a number of issues with folks constructing FSTs in ways that weren't supported by the old constructors. As a bonus, we no longer need to directly depend on a specific implementation of memory maps. Conveniently, the `Mmap` type in the `memmap` crate already implements `AsRef<[u8]>`, so using memory maps is as simple as let mmap = memmap::Mmap::map(&File::open("something.fst").unwrap()); let fst = Fst::new(mmap).unwrap(); Fixes #92, Fixes #94, Fixes #97
This change was not nearly as bad as I thought it would be. Instead of trying to provide all the different possible ways to store some bytes, we instead make FSTs maximally flexible by accepting any type that can cheaply provide byte slice. This should resolve a number of issues with folks constructing FSTs in ways that weren't supported by the old constructors. As a bonus, we no longer need to directly depend on a specific implementation of memory maps. Conveniently, the `Mmap` type in the `memmap` crate already implements `AsRef<[u8]>`, so using memory maps is as simple as let mmap = memmap::Mmap::map(&File::open("something.fst").unwrap()); let fst = Fst::new(mmap).unwrap(); Fixes #92, Fixes #94, Fixes #97
This change was not nearly as bad as I thought it would be. Instead of trying to provide all the different possible ways to store some bytes, we instead make FSTs maximally flexible by accepting any type that can cheaply provide byte slice. This should resolve a number of issues with folks constructing FSTs in ways that weren't supported by the old constructors. As a bonus, we no longer need to directly depend on a specific implementation of memory maps. Conveniently, the `Mmap` type in the `memmap` crate already implements `AsRef<[u8]>`, so using memory maps is as simple as let mmap = memmap::Mmap::map(&File::open("something.fst").unwrap()); let fst = Fst::new(mmap).unwrap(); Fixes #92, Fixes #94, Fixes #97
This change was not nearly as bad as I thought it would be. Instead of trying to provide all the different possible ways to store some bytes, we instead make FSTs maximally flexible by accepting any type that can cheaply provide byte slice. This should resolve a number of issues with folks constructing FSTs in ways that weren't supported by the old constructors. As a bonus, we no longer need to directly depend on a specific implementation of memory maps. Conveniently, the `Mmap` type in the `memmap` crate already implements `AsRef<[u8]>`, so using memory maps is as simple as let mmap = memmap::Mmap::map(&File::open("something.fst").unwrap()); let fst = Fst::new(mmap).unwrap(); Fixes #92, Fixes #94, Fixes #97
Currently, the primary way to load an fst object (Map/Set/Raw) from a byte slice is to use
from_static_slice
. While that solves the use case of loading data that lives for the lifetime of the program, it appears to be overly-restrictive for other use cases.For example, I have fst data that I've written into a network buffer:
from_bytes
won't work.'static
, sofrom_static_slice
won't work.from_shared_bytes
I would need to change my internal APIs to useArc<Vec<u8>>
instead of&[u8]
(which may add bugs and performance regressions) .While I could try to use the existing mmap API, I believe I would need to create a file-like object that wraps my byte slice, and then memory map it, which is unergonomic and error-prone.
My recommendation is to add a
from_slice(bytes: &[u8])
constructor to Map, Set, and Raw. My understanding is that this is unfortunately not trivial, because theFstData
type does not take a lifetime parameter, which would be required while it holds a reference to the byte slice.Nevertheless, is there interest in adding this feature?
Thanks!
The text was updated successfully, but these errors were encountered: