Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prefer using an Arc<[u8]> instead of an Arc<Vec<u8>> #97

Closed
Kerollmops opened this issue Feb 21, 2020 · 4 comments
Closed

Prefer using an Arc<[u8]> instead of an Arc<Vec<u8>> #97

Kerollmops opened this issue Feb 21, 2020 · 4 comments

Comments

@Kerollmops
Copy link
Contributor

Kerollmops commented Feb 21, 2020

In a previous comment it seemed you were ready to probably introduce breaking changes to the library.

I found out that an fst (and thus a set and a map) can be created from shared bytes, those are represented by an Arc of Vec<u8>, I was wondering why you chose to use this type instead of an Arc<[u8]>.

One of the advantage of an Arc<[u8]> is that it can be used every where AsRef<[u8]> is needed, an Arc<Vec<u8>> doesn't implement this trait.

The only advantage I see in this choice is for performance reason that will be rarely triggered: constructing an Arc<Vec<u8>> from an existing vec is cheaper as it will not copy the entirety of the bytes, Arc<[u8]> does.

I will stop bothering you, and let you work on more interresting features now :)
I just wanted to make sure to break things when possible and clean the library a little bit more if this makes any sense, of course.

@BurntSushi
Copy link
Owner

This is a great issue, thanks. If you have other suggestions for breaking changes, please share them. Now is a good time. Unfortunately, the way I do development in FOSS isn't terribly conducive to collaboration because I have a terrible time context switching. So I end up doing things in short spurts.

As for why I chose Arc<Vec<u8>>, I don't know. I added it about 3.5 years ago. Did Arc even supports DSTs back then? Ah okay... It looks like they were added in the Rust 1.2 release. So by mid 2016 (when Arc<Vec<u8>> was added), Arc<[u8]> had been available for a year. It's possible that I just didn't consider it at all, or didn't even know about it.

... but looking at std, I actually don't see an easy (and safe) way to build an Arc<[u8]>. Do you? Ah, I see now, there is a impl From<Vec<T>> for Arc<[T]>. But that wasn't introduced until Rust 1.21, which came out in mid 2017, well after I added the Arc<Vec<u8>> API. So that probably explains it (combined with me probably not even considering it at all).

The extra copy does indeed seem unfortunate, particularly given that FSTs tend to be quite large.

With all that said, my plan at the moment is to adopt what tantivy's fork of fst did: add a type parameter that satisfies AsRef<[u8]>. That should let you use whatever type you want. I have resisted doing this in the past because I really do not want to thread a type parameter everywhere. But tantivy apparently needs it and this has been a pain point for others, e.g., in #92.

Does this assuage your concern?

@Kerollmops
Copy link
Contributor Author

No worries, that was just the only reason (IIRC) I forked the library and step back because it is not pleasant to work with a fork.

You said that there were no safe way to construct an Arc<[u8]> from a Vec<u8> before 1.21, but From<&[u8]> exists since 1.21.

Anyway, you will change the API to make it better, so don’t take my comment into account.

Thank you for your work :)

PS: will you fix #92 to make it safe to read an FST from any slice lifetime?

@Kerollmops Kerollmops changed the title Prefer ussing an Arc<[u8]> instead of an Arc<Vec<u8>> Prefer using an Arc<[u8]> instead of an Arc<Vec<u8>> Feb 22, 2020
@BurntSushi
Copy link
Owner

You said that there were no safe way to construct an Arc<[u8]> from a Vec before 1.21, but From<&[u8]> exists since 1.21.

Yes? That's also when the From impl for Vec was added. Both were added a year after I added the Arc<Vec> APIs.

PS: will you fix #92 to make it safe to read an FST from any slice lifetime?

Hmm. Maybe I wasn't being clear. Making Fst generic over AsRef<[u8]> will fix that, no?

@Kerollmops
Copy link
Contributor Author

Kerollmops commented Feb 22, 2020

Yeah, sorry, I am not an early morning guy :)

I think that the parametric type with Arc<[u8]> would fix that!

BurntSushi added a commit that referenced this issue Feb 24, 2020
This change was not nearly as bad as I thought it would be. Instead of
trying to provide all the different possible ways to store some bytes,
we instead make FSTs maximally flexible by accepting any type that can
cheaply provide byte slice. This should resolve a number of issues with
folks constructing FSTs in ways that weren't supported by the old
constructors.

As a bonus, we no longer need to directly depend on a specific
implementation of memory maps. Conveniently, the `Mmap` type in the
`memmap` crate already implements `AsRef<[u8]>`, so using memory maps is
as simple as

    let mmap = memmap::Mmap::map(&File::open("something.fst").unwrap());
    let fst = Fst::new(mmap).unwrap();

Fixes #92, Fixes #94, Fixes #97
BurntSushi added a commit that referenced this issue Feb 25, 2020
This change was not nearly as bad as I thought it would be. Instead of
trying to provide all the different possible ways to store some bytes,
we instead make FSTs maximally flexible by accepting any type that can
cheaply provide byte slice. This should resolve a number of issues with
folks constructing FSTs in ways that weren't supported by the old
constructors.

As a bonus, we no longer need to directly depend on a specific
implementation of memory maps. Conveniently, the `Mmap` type in the
`memmap` crate already implements `AsRef<[u8]>`, so using memory maps is
as simple as

    let mmap = memmap::Mmap::map(&File::open("something.fst").unwrap());
    let fst = Fst::new(mmap).unwrap();

Fixes #92, Fixes #94, Fixes #97
BurntSushi added a commit that referenced this issue Mar 6, 2020
This change was not nearly as bad as I thought it would be. Instead of
trying to provide all the different possible ways to store some bytes,
we instead make FSTs maximally flexible by accepting any type that can
cheaply provide byte slice. This should resolve a number of issues with
folks constructing FSTs in ways that weren't supported by the old
constructors.

As a bonus, we no longer need to directly depend on a specific
implementation of memory maps. Conveniently, the `Mmap` type in the
`memmap` crate already implements `AsRef<[u8]>`, so using memory maps is
as simple as

    let mmap = memmap::Mmap::map(&File::open("something.fst").unwrap());
    let fst = Fst::new(mmap).unwrap();

Fixes #92, Fixes #94, Fixes #97
BurntSushi added a commit that referenced this issue Mar 6, 2020
This change was not nearly as bad as I thought it would be. Instead of
trying to provide all the different possible ways to store some bytes,
we instead make FSTs maximally flexible by accepting any type that can
cheaply provide byte slice. This should resolve a number of issues with
folks constructing FSTs in ways that weren't supported by the old
constructors.

As a bonus, we no longer need to directly depend on a specific
implementation of memory maps. Conveniently, the `Mmap` type in the
`memmap` crate already implements `AsRef<[u8]>`, so using memory maps is
as simple as

    let mmap = memmap::Mmap::map(&File::open("something.fst").unwrap());
    let fst = Fst::new(mmap).unwrap();

Fixes #92, Fixes #94, Fixes #97
BurntSushi added a commit that referenced this issue Mar 7, 2020
This change was not nearly as bad as I thought it would be. Instead of
trying to provide all the different possible ways to store some bytes,
we instead make FSTs maximally flexible by accepting any type that can
cheaply provide byte slice. This should resolve a number of issues with
folks constructing FSTs in ways that weren't supported by the old
constructors.

As a bonus, we no longer need to directly depend on a specific
implementation of memory maps. Conveniently, the `Mmap` type in the
`memmap` crate already implements `AsRef<[u8]>`, so using memory maps is
as simple as

    let mmap = memmap::Mmap::map(&File::open("something.fst").unwrap());
    let fst = Fst::new(mmap).unwrap();

Fixes #92, Fixes #94, Fixes #97
BurntSushi added a commit that referenced this issue Mar 7, 2020
This change was not nearly as bad as I thought it would be. Instead of
trying to provide all the different possible ways to store some bytes,
we instead make FSTs maximally flexible by accepting any type that can
cheaply provide byte slice. This should resolve a number of issues with
folks constructing FSTs in ways that weren't supported by the old
constructors.

As a bonus, we no longer need to directly depend on a specific
implementation of memory maps. Conveniently, the `Mmap` type in the
`memmap` crate already implements `AsRef<[u8]>`, so using memory maps is
as simple as

    let mmap = memmap::Mmap::map(&File::open("something.fst").unwrap());
    let fst = Fst::new(mmap).unwrap();

Fixes #92, Fixes #94, Fixes #97
BurntSushi added a commit that referenced this issue Mar 7, 2020
This change was not nearly as bad as I thought it would be. Instead of
trying to provide all the different possible ways to store some bytes,
we instead make FSTs maximally flexible by accepting any type that can
cheaply provide byte slice. This should resolve a number of issues with
folks constructing FSTs in ways that weren't supported by the old
constructors.

As a bonus, we no longer need to directly depend on a specific
implementation of memory maps. Conveniently, the `Mmap` type in the
`memmap` crate already implements `AsRef<[u8]>`, so using memory maps is
as simple as

    let mmap = memmap::Mmap::map(&File::open("something.fst").unwrap());
    let fst = Fst::new(mmap).unwrap();

Fixes #92, Fixes #94, Fixes #97
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants