feat: support fastlanes bitpacking #2886

Merged
merged 35 commits into from
Sep 27, 2024
c089644
feature: support fastlanes bitpacking for uint8 type
broccoliSpicy Sep 16, 2024
a1e3cdf
minor fix
broccoliSpicy Sep 16, 2024
3f340a3
fix a bug, add self.buffer_offset in byte range
broccoliSpicy Sep 17, 2024
9a6c489
minor fix 2
broccoliSpicy Sep 17, 2024
f55a445
feat: add fastlanes bitpacking for other types
broccoliSpicy Sep 18, 2024
7c21438
address initial PR comments
broccoliSpicy Sep 18, 2024
3b82ec5
Merge branch 'main' into fastlanes
broccoliSpicy Sep 18, 2024
f0bd3a8
fix lint
broccoliSpicy Sep 18, 2024
ce0f798
return a slice of LanceBuffer in `decode`
broccoliSpicy Sep 18, 2024
fb9ede2
use `elems_per_chunk` constant to represent 1024, delete
broccoliSpicy Sep 19, 2024
3ad773c
use macro in encode method
broccoliSpicy Sep 19, 2024
403e89d
Don't pass strings to the choose_array_encoder method when choosing a…
westonpace Sep 18, 2024
0ae8362
fix a bug in `bitpacked_for_non_neg_decode`
broccoliSpicy Sep 20, 2024
23e261c
add stable rust fastlanes
broccoliSpicy Sep 20, 2024
3f92fcd
Merge remote-tracking branch 'refs/remotes/origin/fastlanes' into fas…
broccoliSpicy Sep 20, 2024
dba9a48
fix lint
broccoliSpicy Sep 20, 2024
1eb75e2
remove external fastlanes crate
broccoliSpicy Sep 21, 2024
1485759
license header
broccoliSpicy Sep 21, 2024
8543f54
fix lint
broccoliSpicy Sep 21, 2024
ee78fc6
delete an unnecessary file rust/lance-encoding/compression-algo/mod.rs
broccoliSpicy Sep 21, 2024
fe3fda8
delete two redundant files
broccoliSpicy Sep 21, 2024
697af4a
handle nullable and all-null data blocks in `encode`.
broccoliSpicy Sep 23, 2024
922c2fe
fix `choose_array_encoder` issue when enable V2.1
broccoliSpicy Sep 24, 2024
f09cad7
fix lint
broccoliSpicy Sep 24, 2024
fc89bf4
fix a bug scheduling ranges for data types other than 32-bit width
broccoliSpicy Sep 24, 2024
d5b9201
Make sure to use version 2.1 in tests for bitpacking
westonpace Sep 24, 2024
13b757a
make `locate_chunk_start` and `locate_chunk_end` a method
broccoliSpicy Sep 24, 2024
ca4dba3
Merge branch 'main' into fastlanes
broccoliSpicy Sep 24, 2024
f42af4c
add test_pack
broccoliSpicy Sep 25, 2024
1c2878b
add test_unchecked_pack
broccoliSpicy Sep 25, 2024
688bb1f
address PR comments
broccoliSpicy Sep 25, 2024
4d7557f
Update rust/lance-encoding/src/buffer.rs
broccoliSpicy Sep 25, 2024
c7ecb08
Merge branch 'fix/use-v2-1-on-bitpack-tests' of https://github.com/we…
broccoliSpicy Sep 25, 2024
dedb306
fix fastlanes original code link
broccoliSpicy Sep 27, 2024
655a063
lint
broccoliSpicy Sep 27, 2024
Don't pass strings to the choose_array_encoder method when choosing an encoder for dict indices
westonpace authored and broccoliSpicy committed Sep 19, 2024
commit 403e89df1efd76f08225270a194612b4fbf5686a
8 changes: 6 additions & 2 deletions rust/lance-encoding/src/encoder.rs
@@ -3,7 +3,7 @@
 use std::{collections::HashMap, env, sync::Arc};

 use arrow::array::AsArray;
-use arrow_array::{Array, ArrayRef, RecordBatch};
+use arrow_array::{Array, ArrayRef, RecordBatch, UInt8Array};
 use arrow_schema::DataType;
 use bytes::{Bytes, BytesMut};
 use futures::future::BoxFuture;
@@ -269,7 +269,11 @@ impl CoreArrayEncodingStrategy {
 DataType::Utf8 | DataType::LargeUtf8 | DataType::Binary | DataType::LargeBinary => {
 if use_dict_encoding {
 let dict_indices_encoder = Self::choose_array_encoder(
-arrays,
+// We need to pass arrays to this method to figure out what kind of compression to
+// use but we haven't actually calculated the indices yet. For now, we just assume
+// worst case and use the full range. In the future maybe we can pass in statistics
+// instead of the actual data
+&[Arc::new(UInt8Array::from_iter_values(0_u8..255_u8))],
 &DataType::UInt8,
 data_size,
 false,
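
The comment in this hunk says the encoder is chosen from a placeholder array that stands in for the worst-case index range. A minimal sketch of why such a placeholder leads to the widest packing, assuming the encoder picks its bit width from the largest value it sees (that selection rule is an assumption here, not something this diff confirms). `bits_needed` and `worst_case_indices` are hypothetical names and are not part of lance-encoding.

```rust
/// Hypothetical helper: bits needed to represent the largest value in `values`.
/// An empty or all-zero input is given a width of 1 bit in this sketch.
fn bits_needed(values: &[u8]) -> u32 {
    let max = values.iter().copied().max().unwrap_or(0);
    (u8::BITS - max.leading_zeros()).max(1)
}

/// Hypothetical helper: the same placeholder range the diff passes in.
/// `0_u8..255_u8` is half-open, so it yields 0 through 254 (255 values).
fn worst_case_indices() -> Vec<u8> {
    (0_u8..255_u8).collect()
}
```

Under that assumption the placeholder gives `bits_needed(&worst_case_indices()) == 8`, so the dictionary indices would not be narrowed at all, whereas real indices such as `[0, 1, 2, 3]` would need only 2 bits. This is the gap the comment's "pass in statistics" idea would close.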

Unchanged files with check annotations Beta

macro_rules! encode_fixed_width {
($self:expr, $unpacked:expr, $data_type:ty, $buffer_index:expr) => {{
let num_chunks = ($unpacked.num_values + ELEMS_PER_CHUNK - 1) / ELEMS_PER_CHUNK;

Check warning on line 199 in rust/lance-encoding/src/encodings/physical/bitpack_fastlanes.rs

GitHub Actions / linux-arm

Diff in /runner/_work/lance/lance/rust/lance-encoding/src/encodings/physical/bitpack_fastlanes.rs
let num_full_chunks = $unpacked.num_values / ELEMS_PER_CHUNK;
let uncompressed_bit_width = std::mem::size_of::<$data_type>() as u64 * 8;
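
The chunk arithmetic in the macro fragment above, together with the `packed_chunk_size` expression in the decode fragment further down, can be sketched as standalone functions. This is an illustration only: the function names are hypothetical, and `ELEMS_PER_CHUNK = 1024` is taken from the commit message "use `elems_per_chunk` constant to represent 1024".

```rust
/// Values per chunk; 1024 according to the commit list in this PR.
const ELEMS_PER_CHUNK: u64 = 1024;

/// Chunks needed for `num_values`, rounded up so a partial tail gets its own chunk.
fn num_chunks(num_values: u64) -> u64 {
    (num_values + ELEMS_PER_CHUNK - 1) / ELEMS_PER_CHUNK
}

/// Chunks that are completely filled with values.
fn num_full_chunks(num_values: u64) -> u64 {
    num_values / ELEMS_PER_CHUNK
}

/// Bytes occupied by one packed chunk at the given compressed bit width.
fn packed_chunk_size_bytes(compressed_bit_width: u64) -> u64 {
    ELEMS_PER_CHUNK * compressed_bit_width / 8
}
```

For 2500 values this gives 3 chunks, 2 of them full, and at a compressed width of 3 bits each packed chunk takes 1024 * 3 / 8 = 384 bytes. Because 1024 is a multiple of 8, the byte size is exact for every bit width.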
let DataBlock::FixedWidth(mut unpacked) = data else {
return Err(Error::InvalidInput {
source: "Bitpacking only supports fixed width data blocks".into(),
location: location!(),

Check warning on line 290 in rust/lance-encoding/src/encodings/physical/bitpack_fastlanes.rs

GitHub Actions / linux-arm

Diff in /runner/_work/lance/lance/rust/lance-encoding/src/encodings/physical/bitpack_fastlanes.rs
});
};
bytes_idx_to_range_indices: &[Vec<std::ops::Range<u64>>],
num_rows: u64,
) -> LanceBuffer {
match uncompressed_bits_per_value {

Check warning on line 444 in rust/lance-encoding/src/encodings/physical/bitpack_fastlanes.rs

GitHub Actions / linux-arm

Diff in /runner/_work/lance/lance/rust/lance-encoding/src/encodings/physical/bitpack_fastlanes.rs
8 => {
let mut decompressed: Vec<u8> = Vec::with_capacity(num_rows as usize);
let packed_chunk_size: usize = ELEMS_PER_CHUNK as usize * compressed_bit_width as usize / 8;
compressed_bit_width as usize,
chunk,
&mut decompress_chunk_buf[..ELEMS_PER_CHUNK as usize],
);

Check warning on line 460 in rust/lance-encoding/src/encodings/physical/bitpack_fastlanes.rs

GitHub Actions / linux-arm

Diff in /runner/_work/lance/lance/rust/lance-encoding/src/encodings/physical/bitpack_fastlanes.rs
}
loop {
if curr_range_start + ELEMS_PER_CHUNK < bytes_idx_to_range_indices[i][ranges_idx].end {
curr_range_start += this_part_len;
break;
} else {
let this_part_len =

Check warning on line 471 in rust/lance-encoding/src/encodings/physical/bitpack_fastlanes.rs

GitHub Actions / linux-arm

Diff in /runner/_work/lance/lance/rust/lance-encoding/src/encodings/physical/bitpack_fastlanes.rs
bytes_idx_to_range_indices[i][ranges_idx].end - curr_range_start;
decompressed.extend_from_slice(
&decompress_chunk_buf[(curr_range_start % ELEMS_PER_CHUNK) as usize..]
}
LanceBuffer::Owned(decompressed)
}

Check warning on line 489 in rust/lance-encoding/src/encodings/physical/bitpack_fastlanes.rs

GitHub Actions / linux-arm

Diff in /runner/_work/lance/lance/rust/lance-encoding/src/encodings/physical/bitpack_fastlanes.rs
16 => {
let mut decompressed: Vec<u16> = Vec::with_capacity(num_rows as usize);
let packed_chunk_size_in_byte: usize = (ELEMS_PER_CHUNK * compressed_bit_width) as usize / 8;
compressed_bit_width as usize,
chunk,
&mut decompress_chunk_buf,
);

Check warning on line 507 in rust/lance-encoding/src/encodings/physical/bitpack_fastlanes.rs

GitHub Actions / linux-arm

Diff in /runner/_work/lance/lance/rust/lance-encoding/src/encodings/physical/bitpack_fastlanes.rs
}
loop {
if curr_range_start + ELEMS_PER_CHUNK < bytes_idx_to_range_indices[i][ranges_idx].end {
// we know this chunk has only data of this range
break;
} else {
let this_part_len =

Check warning on line 521 in rust/lance-encoding/src/encodings/physical/bitpack_fastlanes.rs

GitHub Actions / linux-arm

Diff in /runner/_work/lance/lance/rust/lance-encoding/src/encodings/physical/bitpack_fastlanes.rs
bytes_idx_to_range_indices[i][ranges_idx].end - curr_range_start;
decompressed.extend_from_slice(
&decompress_chunk_buf[(curr_range_start % ELEMS_PER_CHUNK) as usize..]
}
LanceBuffer::reinterpret_vec(decompressed).to_owned()
}

Check warning on line 539 in rust/lance-encoding/src/encodings/physical/bitpack_fastlanes.rs

GitHub Actions / linux-arm

Diff in /runner/_work/lance/lance/rust/lance-encoding/src/encodings/physical/bitpack_fastlanes.rs
32 => {
let mut decompressed: Vec<u32> = Vec::with_capacity(num_rows as usize);
let packed_chunk_size_in_byte: usize = (ELEMS_PER_CHUNK * compressed_bit_width) as usize / 8;
compressed_bit_width as usize,
chunk,
&mut decompress_chunk_buf,
);

Check warning on line 557 in rust/lance-encoding/src/encodings/physical/bitpack_fastlanes.rs

GitHub Actions / linux-arm

Diff in /runner/_work/lance/lance/rust/lance-encoding/src/encodings/physical/bitpack_fastlanes.rs
}
loop {
if curr_range_start + ELEMS_PER_CHUNK < bytes_idx_to_range_indices[i][ranges_idx].end {