Rust has many good things going for it, but its ownership-based resource management does not make "data layout" one of them.
If you'd like to represent a `Vec<String>`, a variable-length list of variable-length lists of characters, you are going to need a bunch of allocations.
And this makes sense: you are able to `remove()` any element of that outer list, push characters onto any of the constituent strings, and generally mutate things in ways that tie your hands and (seem to) force you to use Rust's built-in types and their pre-existing layouts.
The problem here, and there may not be one for you, is that lots of little allocations are generally worse than fewer larger allocations. Each little allocation needs to get allocated in the first place, which takes work. Accessing these allocations involves darting around in your computer's physical memory, at locations that needn't be near one another. Finally, de-allocating these allocations is surprisingly exhausting: someone has to put the pieces back together so others can use them. Your computer could do less work if it had fewer, larger allocations.
This tension, between ownership-based resource management and the attendant work of actually managing the owned resources, prompts us to look for other ways to reach the same goals. What are those goals? I have a few, and there may be more, but I would love the data layout to be:
- Flat: arbitrarily many instances of a type should be able to fit in a small number of allocations, which may be arbitrarily large.
- Sequential: a sequential traversal of the many instances should result in a sequential traversal of the few allocations.
- Economical: the representation should be no larger than the input, and ideally smaller.
In addition, it would be great for implementations to be ergonomic, in that they either expose references to Rust types or allow navigation with Rust idioms, and to be safe, in that they don't use the `unsafe` keyword.
I haven't yet figured out how to do all of these at the same time.
What do we plan to give up to get these?
I'm not aiming for a drop-in replacement for `Vec<T>`.
I'm personally interested in a read-only or read-mainly type, where one can land a bunch of data that is immediately immutable.
In Rust terms, the type would support `Index` but not `IndexMut`: you can look, but you can't touch.
Here are some examples I've done:
- Abomonation is a crate that just `memcpy`s types down as bytes, and then `memcpy`s any memory the type references, recursively. You then correct some pointers, make sure that no one actually de-allocates the bytes directly, and you have a pretty janky region allocator. Rust's lifetimes seem to do a pretty good job of ensuring that one only uses the lens of references to types, which allows them to masquerade as Rust types without risking de-allocation. However, the crate is wildly `unsafe`, and it's entirely unclear whether it can be held in a way that satisfies Rust's safety requirements. I recommend you do not use it! However, the ergonomics were great! You push in `&T` references and get back `&T` references in the future, all backed by one allocation. And it was quite fast. Still is, but I recommend you do not use it.
- Columnation is follow-on work that attempts to avoid the highlighted Abomonation issues around .. alignment, mostly. It used typed allocations in the form of `Vec<T>` under the hood, rather than `Vec<u8>`, which meant that someone else needs to hold the bag around the unsafety of moving between binary and typed allocations. At the same time, it gave the same ergonomics of accepting and providing `&T` references. It's still `unsafe`, because returning a `&Vec<Foo>` when you don't really have a `Vec<Foo>` around to reference is not obviously something you can do in safe Rust. It's not obviously unsound to use it in otherwise safe code, but Rust's safety guarantees are still evolving, and as a rule the occurrence of `unsafe {` in your code (or code you use) means you are relying on some human both to have been smart in the past and to be vigilant in the future. It is also pretty fast. I don't recommend against using it, but you've seen my opinion on `unsafe` code just above. I'd prefer to use something that was built on safe Rust.
In this post we'll talk about how to do something similar to columnation in safe Rust. A lot of the thinking comes out of collaboration with Moritz Hoffmann, who has similar goals but even harder constraints to work with.
The concession we'll make in this post is to exchange ergonomics for safety.
We'll give up the property of providing access to your data through a `&T` reference, and instead use Generic Associated Types (GATs).
For example, rather than provide a `&Vec<T>`, we might return a `&[T]`, which is similar but not the same.
Rather than provide a `&(S, T)` we might provide a `(&S, &T)`.
Rather than an `&Option<T>`, an `Option<&T>`.
Needless to say, your code that expected to see a `&T` is likely to break, and you'll have to fix it.
Unfortunately, container traits don't appear to be nearly as prevalent as container types.
I don't know of a `VecLike<T>` trait describing the methods an analogue of `Vec<T>` would need to implement.
There are some traits, like `Index` and `IndexMut`, that capture "random access", but they have their own issues (they predate GATs).
For the moment, we'll see how bad the ergonomic hit feels, and perhaps it isn't all that bad!
Let's start with the trait we are going to implement, and then we'll build up some implementations. I'm going to start with an intentionally limited trait, and we'll add methods to it in the future, but I don't want to overwhelm you.
```rust
/// A stand-in for `Vec<T>` with a different layout.
pub trait Columnar<T> {
    /// Pushes an owned item onto `self`.
    fn push(&mut self, item: T);
    /// The number of contained elements.
    fn len(&self) -> usize;
    /// Type returned by the indexing operation.
    /// Meant to be similar to `T`, but not the same.
    type Index<'a> where Self: 'a;
    /// Access the `index`th element.
    fn index(&self, index: usize) -> Self::Index<'_>;
}
```
Two of these methods might look familiar.
Both `push` and `len` are methods on `Vec` you pretty commonly use to populate the vector and to see how many things it contains.
That `index` function looks a little different, and that `Index<'a>` may be entirely baffling (it is a GAT).
The easiest way I have to think of GATs is as generalized references.
You wanted to return a `&T`, but that doesn't work out for some reason.
Instead, you describe an alternate type, similar to `&T` in spirit, but one whose implementation you control.
We'll build up what I think are well-motivated cases for them as we go, and perhaps that will be helpful!
With this trait, and a deferred understanding of GATs, you should be able to repeatedly push into and read out of our types.
In that limited sense they are like a `Vec<_>`, and we aren't going to make them much more like one.
What we are going to spend more time unpacking is the `Index<'a>` type, and how closely it can resemble a `&T`.
The easiest implementation we'll use is for primitive types like `bool`, `i32`, and `std::time::Duration`.
For these types, we will replace `Vec<T>` with .. wait for it .. `Vec<T>`.
We don't need anything more complicated than a `Vec`, so no need to invent something new.
```rust
impl Columnar<i32> for Vec<i32> {
    fn push(&mut self, item: i32) {
        self.push(item);
    }
    fn len(&self) -> usize {
        self.len()
    }
    type Index<'a> = &'a i32;
    fn index(&self, index: usize) -> Self::Index<'_> {
        &self[index]
    }
}
```
In this case, we do just use a reference as the `Index<'a>` associated type!
It turns out this is just fine, and how nice that we can match user expectations.
It won't always be the case, but take the wins when you can.
We'll do the same implementation for each of the primitive types we can think of. In fact, you could use this implementation even for complicated types that you don't know what else to do with. It's a pretty handy pressure release valve, but we'll want to improve on it for other types of data.
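Writing that implementation out once per primitive type gets repetitive, so one might reach for a small macro. Here is a sketch of how I'd do it; the macro name is mine, and it would replace the hand-written `i32` implementation above rather than coexist with it:

```rust
/// Implement `Columnar<T>` for `Vec<T>`, for each listed type.
macro_rules! implement_columnar {
    ($($index_type:ty),*) => { $(
        impl Columnar<$index_type> for Vec<$index_type> {
            fn push(&mut self, item: $index_type) { self.push(item); }
            fn len(&self) -> usize { self.len() }
            type Index<'a> = &'a $index_type;
            fn index(&self, index: usize) -> Self::Index<'_> { &self[index] }
        }
    )* }
}

implement_columnar!(bool, u8, u16, u32, u64, i8, i16, i32, i64, f32, f64, std::time::Duration);
```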
Strings dip our toes into the water of variable-length allocations, without entirely wetting ourselves yet.
Our goal is to represent something like a `Vec<String>` without needing an allocation for each element of the vector.
The approach is not too complicated: use one allocation for all the string content, and delimiters to figure out where one string starts and another ends.
```rust
/// A stand-in for `Vec<String>`.
#[derive(Default)]
pub struct ColumnString {
    /// A sequence of indexes delimiting inserted strings.
    ///
    /// Element `index` corresponds to the slice of bytes found at
    /// `self.values[self.bounds[index] .. self.bounds[index+1]]`.
    bounds: Vec<usize>,
    /// String content, delimited by `self.bounds`.
    values: Vec<u8>,
}
```
Notice that, even if it doesn't make sense yet, there are just two `Vec` members here, each of which contains primitive types.
There will not be a large number of allocations behind this type.
```rust
impl Columnar<String> for ColumnString {
    fn push(&mut self, item: String) {
        // Ensure the leading delimiter exists, so that element `index`
        // is always at `self.bounds[index] .. self.bounds[index+1]`.
        if self.bounds.is_empty() { self.bounds.push(0); }
        self.values.extend_from_slice(item.as_bytes());
        self.bounds.push(self.values.len());
    }
    fn len(&self) -> usize { self.bounds.len().saturating_sub(1) }
    type Index<'a> = &'a [u8];
    fn index(&self, index: usize) -> Self::Index<'_> {
        let lower = self.bounds[index];
        let upper = self.bounds[index + 1];
        &self.values[lower .. upper]
    }
}
```
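As a quick sanity check, a round trip through the type might look like this (a sketch, relying on the `Default` derive above):

```rust
let mut store = ColumnString::default();
store.push("hello".to_string());
store.push("world".to_string());
assert_eq!(store.len(), 2);
// We get back bytes, not a `&str`; more on that in a moment.
assert_eq!(store.index(1), b"world".as_slice());
```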
This all checks out to me, except for the whole `type Index<'a> = &'a [u8];` thing.
Ideally that would be a `&String` or maybe a `&str`, or .. really anything involving strings, rather than bytes?
It turns out that not every sequence of bytes is valid UTF-8 text, and Rust's memory safety goes sideways if you treat them as such.
There are methods, like `std::str::from_utf8`, that allow you to go from `&[u8]` to a `Result<&str, Utf8Error>`, which is Rust's way of saying "maybe a `&str`, but maybe an error instead if you gave me bad data".
The method has some overhead though, so we'll let you call it if you like.
If you only need to test for equality, perhaps sort by something, or write the data as binary (tada!), then you can skip the overhead.
We do this because although Rust has a way to promise itself that e.g. `String` data is always valid UTF-8 and stays that way, we do not have the ability to maintain that invariant using safe Rust outside of using the `String` type itself.
We could just wrap some `unsafe` around this, but we are trying not to do that.
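To make that concrete, here is a sketch of recovering text on demand, paying the validation cost only at the moment a `&str` is actually needed:

```rust
let mut store = ColumnString::default();
store.push("grawlix".to_string());
// Equality tests and binary output can use the raw bytes directly;
// validation happens only when we insist on a `&str`.
let text = std::str::from_utf8(store.index(0)).expect("we only pushed valid strings");
assert_eq!(text, "grawlix");
```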
Our next tranche of implementations builds up combinators of columnar implementations.
Rather than work with specific types, like `i32` or `String`, we'll work with abstract types like `(S, T)`, `Result<S, T>`, and `Vec<T>`.
For each of these, we'll see how to build a `Columnar` implementation, using techniques that generalize across multiple `S` and `T`.
The Rust tuple `(S, T)` is fairly pervasive, but it also stands in for the `struct`: Rust's product type.
Everything we're about to do would work equally well for any `struct`, and could be automatic once I learn how to write `derive` macros.
For now, just pairs.
```rust
/// A pair of Columnar types can host pairs of owned data.
impl<S, SC: Columnar<S>, T, TC: Columnar<T>> Columnar<(S, T)> for (SC, TC) {
    fn push(&mut self, item: (S, T)) {
        self.0.push(item.0);
        self.1.push(item.1);
    }
    fn len(&self) -> usize { self.0.len() }
    type Index<'a> = (SC::Index<'a>, TC::Index<'a>) where SC: 'a, TC: 'a;
    fn index(&self, index: usize) -> Self::Index<'_> {
        (self.0.index(index), self.1.index(index))
    }
}
```
Perhaps the most intimidating line here is the first one: the `impl` line.
We are describing a relationship between four types, `S, SC, T, TC`, where two pairs have a `Columnar` relationship.
With those types named, and the required relationship stated, we are providing the implementation of `Columnar<(S, T)>` for `(SC, TC)`.
That is, if you show up with pairs `(S, T)`, we can house them in a pair `(SC, TC)`.
Informally, we'll stash the `S` parts in `SC`, and the `T` parts in `TC`.
The other confusing line (for me) is

```rust
type Index<'a> = (SC::Index<'a>, TC::Index<'a>) where SC: 'a, TC: 'a;
```

This is where we name the type that you'll get back when you call `index`.
We won't give you back a `&(S, T)`; that ship sailed a long time back.
Instead, we'll give you a pair of things.
Not a pair `(&S, &T)`, though; we don't know for sure that `SC` provides a `&S` nor that `TC` provides a `&T`.
Instead, we use whatever `SC` and `TC` provide, presented in a pair.
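As a sketch of the pieces snapping together (again leaning on the `Default` derives), a store for `(i32, String)` pairs is just a pair of stores:

```rust
// `Vec<i32>` hosts the `i32`s; `ColumnString` hosts the `String`s.
let mut store: (Vec<i32>, ColumnString) = Default::default();
store.push((7, "seven".to_string()));
let (number, bytes) = store.index(0);
assert_eq!(*number, 7);
assert_eq!(bytes, b"seven".as_slice());
```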
It turns out that this implementation may be the controversial one. It's certainly the one that loses out to columnation in performance. The good news is that it can use less memory than columnation. Let's hold off on why that is until later, though.
Rust's sum type is captured by `enum`, a type that can be one of several variants, each of which may house different types.
The archetypical sum type in Rust is `Result<S, T>`, each instance of which is either an `Ok(S)` or an `Err(T)`.
There doesn't seem to be an anonymous sum type, the way tuples are anonymous product types, so we'll just do `Result`.
Our type will store `S` and `T` values, but it will need some help, and we'll make it a `struct` rather than just a pair.
```rust
#[derive(Default)]
pub struct ColumnResult<SC, TC> {
    /// This could be substantially more efficient as e.g. a `Vec<(u64, u64)>`,
    /// with one entry for each 64 items pushed, describing the cumulative sum
    /// of `Ok` variants (say) and a bitfield of the associated variants.
    indexes: Vec<Result<usize, usize>>,
    s_store: SC,
    t_store: TC,
}
```
The rough idea is that each time one inserts a `Result<S, T>`, we'll figure out which flavor it is and insert the payload into the right store.
We'll also leave ourselves a note in `self.indexes`, either `Ok(index)` or `Err(index)`, telling us which store to look into and at what position.
```rust
impl<S, SC: Columnar<S>, T, TC: Columnar<T>> Columnar<Result<S, T>> for ColumnResult<SC, TC> {
    fn push(&mut self, item: Result<S, T>) {
        match item {
            Ok(item) => {
                self.indexes.push(Ok(self.s_store.len()));
                self.s_store.push(item);
            }
            Err(item) => {
                self.indexes.push(Err(self.t_store.len()));
                self.t_store.push(item);
            }
        }
    }
    fn len(&self) -> usize { self.indexes.len() }
    type Index<'a> = Result<SC::Index<'a>, TC::Index<'a>> where SC: 'a, TC: 'a;
    fn index(&self, index: usize) -> Self::Index<'_> {
        match self.indexes[index] {
            Ok(i) => Ok(self.s_store.index(i)),
            Err(i) => Err(self.t_store.index(i)),
        }
    }
}
```
The implementation should be as advertised.
On inserts we stash the variant in the corresponding columnar store, and record where to find it.
When accessing by index, we grab the notes we left ourselves and use those to find and build the right variant to return.
As before, we can't know these are `&S` or `&T` results, and we need to rely on the `Index<'a>` type of each of the columnar stores.
The main scandalous moment here is just how wasteful `Vec<Result<usize, usize>>` is.
There is at most one bit of information per element, yet we use 128 bits.
The `usize` values we store just increment for each of the variants, and are not as arbitrary as they could be.
You could draw this down to two bits per element, using a `Vec<(u64, u64)>` for each block of 64 elements: a count of the `Ok` variants so far, and a bit field describing the `Ok`/`Err` variants for the block of 64.
The problem is a fundamental one in succinct data structures, but I haven't been brave enough to see if anything is easy to implement.
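To illustrate, here is a minimal (untested) sketch of the two-bits-per-element idea; the names are mine, not from any crate:

```rust
/// A compact stand-in for `Vec<Result<usize, usize>>` when the stored values
/// just count occurrences of each variant: one `(u64, u64)` per block of 64
/// entries, holding the number of `Ok`s before the block and a bitfield with
/// one bit per entry (1 = `Ok`, 0 = `Err`).
#[derive(Default)]
struct CompactIndexes {
    blocks: Vec<(u64, u64)>,
    len: usize,
    oks: u64,
}

impl CompactIndexes {
    fn push(&mut self, is_ok: bool) {
        if self.len % 64 == 0 {
            // Start a new block, recording the cumulative `Ok` count.
            self.blocks.push((self.oks, 0));
        }
        if is_ok {
            self.blocks.last_mut().unwrap().1 |= 1u64 << (self.len % 64);
            self.oks += 1;
        }
        self.len += 1;
    }
    /// Reports the variant at `index`, and its position within its store.
    fn index(&self, index: usize) -> Result<usize, usize> {
        let (oks_before, bits) = self.blocks[index / 64];
        let bit = 1u64 << (index % 64);
        // `Ok`s before `index`: those before the block, plus those within it.
        let oks = (oks_before + (bits & (bit - 1)).count_ones() as u64) as usize;
        if bits & bit != 0 { Ok(oks) } else { Err(index - oks) }
    }
}
```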
The `Option<T>` type is another common sum type in Rust.
You can think of it as `Result<T, ()>`, and get an implementation by copying the above.
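Copying the above might come out something like this sketch (the name `ColumnOption` is mine):

```rust
/// A stand-in for `Vec<Option<T>>`, along the lines of `ColumnResult`.
#[derive(Default)]
pub struct ColumnOption<TC> {
    /// Either `Some(index)` into `t_store`, or `None`.
    indexes: Vec<Option<usize>>,
    t_store: TC,
}

impl<T, TC: Columnar<T>> Columnar<Option<T>> for ColumnOption<TC> {
    fn push(&mut self, item: Option<T>) {
        match item {
            Some(item) => {
                self.indexes.push(Some(self.t_store.len()));
                self.t_store.push(item);
            }
            None => self.indexes.push(None),
        }
    }
    fn len(&self) -> usize { self.indexes.len() }
    type Index<'a> = Option<TC::Index<'a>> where TC: 'a;
    fn index(&self, index: usize) -> Self::Index<'_> {
        self.indexes[index].map(|i| self.t_store.index(i))
    }
}
```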
This is the main event. The reason things are as tricky as they are is because of nested variable length sequences of data.
We got our toes wet with `String`, but we need to generalize it to arbitrary `Vec<T>` types, where `T` is more complicated than `u8`.
That being said, we are going to lean on the ideas from our columnar string implementation.
```rust
/// A stand-in for `Vec<Vec<T>>`.
#[derive(Default)]
pub struct ColumnVec<TC> {
    bounds: Vec<usize>,
    values: TC,
}
```
Like before, we'll have `bounds` that delimit the stored sequences, where the actual data live in `values`.
Unlike before, `values` is a `TC`, whatever that is.
We'll need to work around that abstraction.
First, let's describe our `Index<'a>`.
Ideally this would have been a `&[T]`, corresponding to the `Vec<T>` that we inserted.
Instead, it will be a wrapper around our `TC` with enough information to support random access.
Not quite a `&[T]`, but with similar random-access properties.
```rust
/// The result of indexing into a `ColumnVec`.
///
/// The result represents a `&[T]`, described by `slice[lower .. upper]`.
/// This may not be a slice of anything, but we can randomly access it.
#[derive(Debug)]
pub struct ColumnVecRef<'a, T, TC> {
    lower: usize,
    upper: usize,
    slice: &'a TC,
    phant: std::marker::PhantomData<T>,
}
```
The only thing this type lets you do is index into it.
```rust
impl<'a, T, TC: Columnar<T>> ColumnVecRef<'a, T, TC> {
    /// Retrieve an element at position `index` in the "slice".
    pub fn index(&self, index: usize) -> TC::Index<'_> {
        assert!(index < (self.upper - self.lower));
        self.slice.index(self.lower + index)
    }
}
```
With these two types in hand, let's look at the implementation.
```rust
impl<T, TC: Columnar<T>> Columnar<Vec<T>> for ColumnVec<TC> {
    fn push(&mut self, item: Vec<T>) {
        // As in `ColumnString`, ensure the leading delimiter exists.
        if self.bounds.is_empty() { self.bounds.push(0); }
        for x in item {
            self.values.push(x);
        }
        self.bounds.push(self.values.len());
    }
    fn len(&self) -> usize { self.bounds.len().saturating_sub(1) }
    type Index<'a> = ColumnVecRef<'a, T, TC> where TC: 'a;
    fn index(&self, index: usize) -> Self::Index<'_> {
        ColumnVecRef {
            lower: self.bounds[index],
            upper: self.bounds[index + 1],
            slice: &self.values,
            phant: std::marker::PhantomData,
        }
    }
}
```
The implementation tracks the `String` implementation pretty well.
We push on the elements in the input sequence, then capture a new delimiter.
The number of elements is `bounds.len() - 1`.
And we have a weird `Index<'a>` type that .. lets you look at other elements.
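As a sketch of it in action (again relying on the `Default` derives):

```rust
let mut store: ColumnVec<Vec<i32>> = Default::default();
store.push(vec![1, 2, 3]);
store.push(vec![4, 5]);
// `store.index(1)` is a `ColumnVecRef` standing in for `&[4, 5]`.
assert_eq!(*store.index(1).index(0), 4);
```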
It's probably appropriate to flesh out the implementation of `ColumnVecRef` to have a `len(&self) -> usize` method itself.
We could allow subslicing it (which would modify `lower` or `upper`), iterating over it, things like that; a sketch follows just below.
These are moments where we could recover some ergonomics, but it's important to accept that our `ColumnVecRef` may not represent contiguous memory, even though every `&[T]` does.
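Here is one way that fleshing-out might look, with `len` and an iterator (subslicing would be similar, nudging `lower` and `upper`); a sketch, not settled API:

```rust
impl<'a, T, TC: Columnar<T>> ColumnVecRef<'a, T, TC> {
    /// The number of elements in the "slice".
    pub fn len(&self) -> usize {
        self.upper - self.lower
    }
    /// Iterate over the elements, producing `TC`'s index type for each.
    pub fn iter(&self) -> impl Iterator<Item = TC::Index<'_>> + '_ {
        (self.lower .. self.upper).map(move |i| self.slice.index(i))
    }
}
```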
Those are all the combinators that I've implemented. You might also want an associative array (aka "map" or "dictionary"). These aren't too different from sorted lists (use binary search!) other than performance, and we're not presently here to engineer a performant hash map (or b-tree map, or any map).
How the hell are you supposed to remember which `Columnar<T>` implementor you need to name to store some type?
Like, if I show up with a `Vec<(String, Option<Vec<i32>>)>`, what was the name again of the type I need to be able to call `push`?
Let's make a quick trait that provides handholds from the type back to the columnar store.
```rust
/// A type with opinions about how to represent it columnarly.
pub trait Columnable: Sized {
    /// A type that can represent `Self`.
    type Columns: Columnar<Self> + Default;
}
```
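For the types we've seen so far, those implementations might look like the following sketch (an `Option` case would use the `ColumnOption` sketched earlier):

```rust
impl Columnable for i32 {
    type Columns = Vec<i32>;
}
impl Columnable for String {
    type Columns = ColumnString;
}
impl<S: Columnable, T: Columnable> Columnable for (S, T) {
    type Columns = (S::Columns, T::Columns);
}
impl<T: Columnable> Columnable for Vec<T> {
    type Columns = ColumnVec<T::Columns>;
}
```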
This handy trait, which we implement along with all the combinators, allows one to go from a type to its columnar store. It means you can write things like:
```rust
/// Convert items to a columnar representation.
fn as_columns<T: Columnable>(items: Vec<T>) -> T::Columns {
    let mut store: T::Columns = Default::default();
    for item in items {
        store.push(item);
    }
    store
}
```
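With the sketched implementations above, type inference can then figure out the store for us:

```rust
// The store here is a `ColumnVec<Vec<i32>>`, but we never had to say so.
let store = as_columns(vec![vec![1, 2, 3], vec![4, 5]]);
assert_eq!(store.len(), 2);
```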
There are several other random ergonomic bits that are likely to evolve as I play with the code a bit more.
For example, stealing from Moritz, it makes some sense to remove the `push` function and use instead a `Push<T>` trait:
```rust
/// Types that can accept items of type `T`.
trait Push<T> {
    fn push(&mut self, item: T);
}
```
This allows us to implement each of `Push<String>` and `Push<&String>` and `Push<&str>` and `Push<&[u8]>` for our `ColumnString` type.
This means you can store heterogeneous types that might be owned, or borrowed, or not even valid strings.
You only end up getting `&[u8]` back out the other end of course, and it's less locked down in terms of certainly containing valid UTF-8.
But who hasn't pulled their hair out when they have a `&str` and needed a `&String`?
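A sketch of two of those implementations, under the same assumptions as the `ColumnString` above:

```rust
impl<'a> Push<&'a str> for ColumnString {
    fn push(&mut self, item: &str) {
        if self.bounds.is_empty() { self.bounds.push(0); }
        self.values.extend_from_slice(item.as_bytes());
        self.bounds.push(self.values.len());
    }
}

impl Push<String> for ColumnString {
    fn push(&mut self, item: String) {
        // Defer to the borrowed implementation; ownership buys us nothing here.
        Push::<&str>::push(self, item.as_str());
    }
}
```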
This leads us to another point: you only rarely need to `push` owned data, and can instead `copy` in references.
If you look at all of our implementations, they have vectors of primitive types at their roots.
Whenever we `push` or `extend` them, we are just copying memory rather than transferring ownership.
When presented with a `Vec<(String, Option<Vec<i32>>)>` we end up copying and then deallocating the owned memory.
In all of these cases, we could non-destructively copy the data, leaving you with ownership of your resources.
What should you do with those owned resources?
One thing would be to recycle them when you need owned instances from the columnar store.
The `Index<'a>` type we return could implement a hypothetical `CloneInto<T>` trait, both to get an owned instance and also to populate the internals of a mutably borrowed instance.
Rust has the `Clone` and `ToOwned` traits, but unfortunately neither of these would work here: both require that `T` implement a relationship with the index type (`Clone` and `Borrow`, respectively), and `Vec<String>` doesn't know or care about our `Index<'a>`.
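The hypothetical trait might look something like this sketch:

```rust
/// A hypothetical trait: types that can clone themselves into a `T`.
trait CloneInto<T> {
    /// Allocate and return a new owned instance.
    fn clone_into_owned(&self) -> T;
    /// Repopulate an existing instance, reusing its allocations.
    fn clone_into(&self, target: &mut T);
}

/// For example, our string "references" could repopulate a `String`.
/// This sketch panics on invalid UTF-8, one of several possible choices.
impl CloneInto<String> for &[u8] {
    fn clone_into_owned(&self) -> String {
        String::from_utf8(self.to_vec()).expect("invalid UTF-8")
    }
    fn clone_into(&self, target: &mut String) {
        target.clear();
        target.push_str(std::str::from_utf8(self).expect("invalid UTF-8"));
    }
}
```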
Of course, this is all fascinating, but is it any good?
Columnation has some benchmarks, copied from abomonation, that test the throughput of moving back and forth between representations.
For each of various types, they pack up 1024 copies of some instance of that type, often enough to get a sense for how long it takes.
We use `cargo bench` for this, and it reports:
```
test empty_clone ... bench: 608.11 ns/iter (+/- 134.01)
test empty_copy ... bench: 493.76 ns/iter (+/- 28.16)
test string10_clone ... bench: 53,079,975.00 ns/iter (+/- 514,232.91)
test string10_copy ... bench: 2,799,108.40 ns/iter (+/- 54,098.75)
test string20_clone ... bench: 29,095,600.10 ns/iter (+/- 2,969,723.78)
test string20_copy ... bench: 2,762,900.10 ns/iter (+/- 20,577.88)
test u32x2_clone ... bench: 662,151.56 ns/iter (+/- 289,850.37)
test u32x2_copy ... bench: 1,066,198.95 ns/iter (+/- 30,819.53)
test u64_clone ... bench: 678,821.36 ns/iter (+/- 92,635.69)
test u64_copy ... bench: 160,963.55 ns/iter (+/- 106,155.31)
test u8_u64_clone ... bench: 692,565.62 ns/iter (+/- 132,223.48)
test u8_u64_copy ... bench: 542,332.81 ns/iter (+/- 19,635.83)
test vec_u_s_clone ... bench: 57,517,145.80 ns/iter (+/- 772,137.41)
test vec_u_s_copy ... bench: 3,295,237.50 ns/iter (+/- 30,713.72)
test vec_u_vn_s_clone ... bench: 61,586,579.20 ns/iter (+/- 2,733,198.28)
test vec_u_vn_s_copy ... bench: 3,754,679.10 ns/iter (+/- 50,596.25)
```
Each couplet here is some item, for example `vec![(); 1024]` in the first one, repeatedly either cloned and inserted into a `Vec`, or pushed into our columnar store.
The numbers are mostly better, except for two that are worse.
```
test u32x2_clone ... bench: 662,151.56 ns/iter (+/- 289,850.37)
test u32x2_copy ... bench: 1,066,198.95 ns/iter (+/- 30,819.53)
test u8_u64_clone ... bench: 692,565.62 ns/iter (+/- 132,223.48)
test u8_u64_copy ... bench: 542,332.81 ns/iter (+/- 19,635.83)
```
The second one doesn't look worse, but it is worse. At least, both are doing the same "bad" thing: de-interleaving data that was happily just sitting there paired up together.
The first couplet takes a `vec![(0u32, 0u32); 1024]` and turns it into two `Vec<u32>`s, one for the first tuple element and one for the second.
That's a lot of work, when we could have just `memcpy`d the whole thing somewhere.
And generally, yes: you don't necessarily want the tuple implementation that I showed you.
You could equally well have a `Columnar<(S, T)>` implementor that is just a `Vec<(S, T)>`, for simple enough `S` and `T`.
It's hard to know exactly which one you want, as e.g. summing the first column would go faster once the data are all columned-up, especially if you end up engaging SIMD, but it takes longer to get there.
Adding a new column in (!!!) is much easier if the columns are sharded up, though that's a more advanced topic!
The second couplet is just the same as the first, but the overhead is less dramatic.
What is neat about the second case is that by using a `(Vec<u8>, Vec<u64>)` we are likely using less memory.
A bit more than one half the memory; nine-sixteenths, to be precise.
This is because a `(u8, u64)` takes up sixteen bytes in Rust, even though there are only nine bytes of actual data.
You would see the same benefits with a `Result<u8, u64>`, were the implementation better, and even more so with an `Option<u64>`.
Splitting the data apart provides more opportunity for compact representation, because we remove the potential waste Rust introduces for product types.
You can also compare the results to columnation.
Here each couplet is first our `Columnar` implementation, and second the analogous columnation implementation.
```
test empty_copy ... bench: 493.76 ns/iter (+/- 28.16)
test empty_copy ... bench: 924.28 ns/iter (+/- 49.76)
test string10_copy ... bench: 2,799,108.40 ns/iter (+/- 54,098.75)
test string10_copy ... bench: 4,109,616.70 ns/iter (+/- 25,751.48)
test string20_copy ... bench: 2,762,900.10 ns/iter (+/- 20,577.88)
test string20_copy ... bench: 3,025,491.60 ns/iter (+/- 44,797.96)
test u32x2_copy ... bench: 1,066,198.95 ns/iter (+/- 30,819.53)
test u32x2_copy ... bench: 146,793.75 ns/iter (+/- 115,612.27)
test u64_copy ... bench: 160,963.55 ns/iter (+/- 106,155.31)
test u64_copy ... bench: 127,308.59 ns/iter (+/- 24,375.83)
test u8_u64_copy ... bench: 542,332.81 ns/iter (+/- 19,635.83)
test u8_u64_copy ... bench: 217,216.65 ns/iter (+/- 6,887.76)
test vec_u_s_copy ... bench: 3,295,237.50 ns/iter (+/- 30,713.72)
test vec_u_s_copy ... bench: 4,110,174.90 ns/iter (+/- 58,917.53)
test vec_u_vn_s_copy ... bench: 3,754,679.10 ns/iter (+/- 50,596.25)
test vec_u_vn_s_copy ... bench: 4,484,491.60 ns/iter (+/- 585,434.97)
```
The numbers are generally better, except where they are markedly worse for the de-interleaving reasons. I should also admit that columnation is a bit more robust in terms of memory use: it avoids repeated doubling and copying of allocations as it goes, but with the fixed sized benchmarking workload we are not exercising that.
It's not all that bad writing a safe columnar store for the sorts of data you often have in Rust.
It remains to be seen what the ergonomic burden is, and how wide the uncanny valley of not having implemented all the functions, traits, and general support that `&[T]` comes with.
There's certainly more to do, and I'm especially interested in attempting a `CloneInto<T>` next, as well as making the "array of structs" and "struct of arrays" relationship more convenient.