Replies: 3 comments 9 replies
-
Rather than changing the signature of Then users can choose which of the two traits they wish to implement; copse would always call the new "with order" trait (whereupon the blanket impl will delegate to the existing trait if that is the one that has been user implemented). In any event, I've done that in 18e5122, which you can use right away with the following added to your [patch.crates-io]
copse = { git = "https://github.com/eggyal/copse" } It really just needs some tests and documentation (PRs welcome!) and then it can be released, I think just a patch (because we're pre-1.0). |
Beta Was this translation helpful? Give feedback.
-
I just got around to writing a test for the new code. I ran into a lifetime issue with the signature of sort_key_with_order. The lifetime doesn't allow borrowing from O which is what I'm trying to do. I used an unsafe hack to bypass for testing and it seems to work otherwise. here's my code [Click to expand]fn copse_test() {
use common::rand;
use copse::{SortableByWithOrder, TotalOrder};
use std::cmp::Ordering;
use std::sync::atomic::{AtomicUsize, Ordering as MemOrdering};
use std::sync::Arc;
static COMPARISON_COUNT: AtomicUsize = AtomicUsize::new(0);
// define a total order
struct OrderByIndexedKeys {
key_size: usize,
keys: Arc<Vec<u8>>,
}
fn extend_lifetime_hack<'b>(slice: &'b [u8]) -> &'static [u8] {
unsafe { std::mem::transmute::<&'b [u8], &'static [u8]>(slice) }
}
fn extract_key(bytes: &[u8], key_size: usize, idx: usize) -> &[u8] {
&bytes[idx * key_size..(idx + 1) * key_size]
}
impl OrderByIndexedKeys {
fn get_key(&self, idx: usize) -> &'static [u8] {
extend_lifetime_hack(extract_key(&self.keys.as_slice(), self.key_size, idx))
}
}
impl TotalOrder for OrderByIndexedKeys {
type OrderedType = [u8];
fn cmp(&self, left: &Self::OrderedType, right: &Self::OrderedType) -> Ordering {
COMPARISON_COUNT.fetch_add(1, MemOrdering::SeqCst);
left.cmp(right)
}
}
// define lookup key types for collections sorted by our total order
impl SortableByWithOrder<OrderByIndexedKeys> for usize {
fn sort_key_with_order(&self, order: &OrderByIndexedKeys) -> &[u8] {
order.get_key(*self)
}
}
impl SortableByWithOrder<OrderByIndexedKeys> for [u8] {
fn sort_key_with_order(&self, _order: &OrderByIndexedKeys) -> &[u8] {
self
}
}
impl SortableByWithOrder<OrderByIndexedKeys> for str {
fn sort_key_with_order(&self, _order: &OrderByIndexedKeys) -> &[u8] {
self.as_bytes()
}
}
impl SortableByWithOrder<OrderByIndexedKeys> for String {
fn sort_key_with_order(&self, _order: &OrderByIndexedKeys) -> &[u8] {
self.as_bytes()
}
}
use rand::RngCore;
let mut rng = rand::thread_rng();
let mut v = Vec::new();
let key_count = 40;
let key_size = 128;
v.resize(key_count * key_size, 0);
for x in v.iter_mut() {
*x = (rng.next_u32() % 26) as u8 + b'A';
}
let v = Arc::new(v);
let order = OrderByIndexedKeys { key_size, keys: v.clone() };
let mut set = copse::btree_set::BTreeSet::new(order);
let key_count = v.len() / key_size;
for i in 0..key_count {
if rng.next_u32() & 1 == 1 {
set.insert(i);
}
}
let n = set.len();
let cmp_count = COMPARISON_COUNT.load(MemOrdering::SeqCst);
for i in 0..key_count {
assert_eq!(set.contains(&i), set.contains(extract_key(v.as_slice(), key_size, i)));
}
for &i in set.iter() {
let key = std::str::from_utf8(extract_key(v.as_slice(), key_size, i)).unwrap();
println!("{}->{}", i, key);
}
println!(
"sorted {} items with {} comparisons ({:.2}x of n*log(n))",
n,
cmp_count,
(cmp_count as f32) / ((n as f32) * (n as f32).log2()),
);
} (EDIT: made long code collapsible) |
Beta Was this translation helpful? Give feedback.
-
Trying to use it in practice now and I'm running into the issue that I would like to update the order after creation. The only other solution I can think of would be that the data used for sorting is not owned exclusively by the order, meaning Rc<RefCell<>> or Mutex etc. |
Beta Was this translation helpful? Give feedback.
-
This is the current trait:
If instead it was:
Then sort_key() could lookup the key based on anything accessible through O. This would allow the actual keys to not be owned by the collection, for example they could be dynamically loaded thru a cache or a memory mapped file. The "key" type stored in the collection can just be some kind of indexing data which may be very small compared to the real keys.
This is actually what I was hoping this crate did when I searched for it. Anyone know if something similar exists? Otherwise, perhaps the idea could be useful here. I have not spent any time looking at how feasible this is, but I thought it could be worth mentioning.
Beta Was this translation helpful? Give feedback.
All reactions