Performance between 'Values API' and 'Serde Compatible API'. #370

Open
ZhaiMo15 opened this issue Feb 23, 2024 · 3 comments

Comments

@ZhaiMo15

I noticed that simd-json offers two main entry points: the 'Values API' and the 'Serde Compatible API'.
I ran benches/parse.rs to compare their performance, adding the code below to benchmark simd_json::serde::from_slice:

fn simd_from_slice(data: &mut [u8]) {
    let _: serde_json::Value = simd_json::serde::from_slice(data).unwrap();
}

group.bench_with_input("simd_json::serde::from_slice", &vec, |b, data| {
    b.iter_batched(
        || data.clone(),
        |mut bytes| simd_from_slice(&mut bytes),
        BatchSize::SmallInput,
    )
});

Here's the result:

| Throughput (MiB/s) | simd_json::to_borrowed_value | simd_json::to_borrowed_value_with_buffers | simd_json::to_owned_value | simd_json::serde::from_slice | serde_json::from_slice |
| --- | --- | --- | --- | --- | --- |
| apache_builds | 378.61 | 323.86 | 170.99 | 164.81 | 140.08 |
| event_stacktrace_10kb | 983.51 | 1070.9 | 762.83 | 709.25 | 501.10 |
| github_events | 455.06 | 442.80 | 226.45 | 178.68 | 141.91 |
| canada | 142.38 | 164.17 | 146.48 | 114.10 | 152.69 |
| citm_catalog | 312.69 | 340.59 | 234.10 | 237.55 | 225.40 |
| log | 403.55 | 467.40 | 196.92 | 155.88 | 119.01 |
| twitter | 367.24 | 372.41 | 207.64 | 156.72 | 120.17 |

The 'Values API' is mostly faster than serde_json; the exception is canada, where simd_json::to_owned_value trails serde_json::from_slice (146.48 vs 152.69). The 'Serde Compatible API' does not perform as well, and its canada result is not acceptable (114.10 vs 152.69). I'd like to use simd-json to improve performance, so is it better to use the 'Values API'? And if my data is similar to canada, is it better not to use simd-json at all?
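To make the comparison easier to read, the reported throughput can be expressed relative to serde_json::from_slice. The following is a minimal sketch using only the standard library; the `Row` struct and the `relative` helper are names made up for this illustration, and the figures are copied from the results above rather than re-measured.

```rust
/// One row of the throughput figures reported above, in MiB/s.
struct Row {
    dataset: &'static str,
    owned_value: f64, // simd_json::to_owned_value
    simd_serde: f64,  // simd_json::serde::from_slice
    serde_json: f64,  // serde_json::from_slice (baseline)
}

/// Numbers copied from the results in this post; nothing here is re-measured.
static ROWS: [Row; 7] = [
    Row { dataset: "apache_builds", owned_value: 170.99, simd_serde: 164.81, serde_json: 140.08 },
    Row { dataset: "event_stacktrace_10kb", owned_value: 762.83, simd_serde: 709.25, serde_json: 501.10 },
    Row { dataset: "github_events", owned_value: 226.45, simd_serde: 178.68, serde_json: 141.91 },
    Row { dataset: "canada", owned_value: 146.48, simd_serde: 114.10, serde_json: 152.69 },
    Row { dataset: "citm_catalog", owned_value: 234.10, simd_serde: 237.55, serde_json: 225.40 },
    Row { dataset: "log", owned_value: 196.92, simd_serde: 155.88, serde_json: 119.01 },
    Row { dataset: "twitter", owned_value: 207.64, simd_serde: 156.72, serde_json: 120.17 },
];

/// Throughput relative to a baseline; values above 1.0 are faster than the baseline.
fn relative(throughput: f64, baseline: f64) -> f64 {
    throughput / baseline
}

fn main() {
    for row in ROWS.iter() {
        println!(
            "{:<22} to_owned_value {:.2}x   serde::from_slice {:.2}x",
            row.dataset,
            relative(row.owned_value, row.serde_json),
            relative(row.simd_serde, row.serde_json),
        );
    }
}
```

Read this way, canada is the only dataset where either simd-json column falls below 1.0, with simd_json::serde::from_slice at roughly three quarters of the serde_json throughput.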

@Licenser
Member

Hi,

Sorry for the late reply; I'm just finishing a vacation and was mostly away from the computer.

That makes a good bit of sense. The serde mapping code is quite complex and therefore expensive to execute, irrespective of the decoder (serde-json, simd-json, serde-yaml, etc.).

By contrast, the value API is kept as simple as possible, so its overhead is lower. In some cases the extra cost of allowing arbitrary values is less than the extra cost of serde.

There is a third API, simd-json-derive, which lets you decode directly into structs without going through the value API. That will be faster still, but it is less flexible.

Last but not least, I want to point out that this isn't serde being bad or poorly written. serde is made to allow nearly arbitrary format translations and is extremely powerful that way. It does this extremely well, but such power usually comes at a cost. The simd-json-derive macros are extremely specific, so they can take a lot of shortcuts that serde simply can't; that's where the performance benefit comes from.

@PSeitz
Contributor

PSeitz commented May 21, 2024

I implemented the ValueBuilder trait for serde_json_borrow::Value.
It is slightly slower than the simd_json serde variant in most of the benchmarks below.
I think it should be slightly faster than BorrowedValue, as the data structures are simpler. I didn't profile, but the gap could be related to missing inlining, since the calls cross crate boundaries.

There's also a dependency on halfbrown::HashMap in the simd_json::value::deserialize API. It would be good to re-export it or switch to something more generic, like an Iterator.
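To illustrate the "something more generic" idea, here is a minimal sketch of a builder interface that accepts any iterator of key/value pairs instead of a concrete map type. Everything in it is hypothetical: `FromEntries` and `parsed_members` are made-up names, the value type is reduced to `i64`, and this is not how simd_json's ValueBuilder is actually defined.

```rust
use std::collections::HashMap;

/// Hypothetical trait (not part of simd_json): a value type builds its object
/// representation from any iterator of key/value pairs, so the caller never
/// has to name a concrete map type.
trait FromEntries<'a>: Sized {
    fn from_entries<I>(entries: I) -> Self
    where
        I: IntoIterator<Item = (&'a str, i64)>;
}

/// A hash-map-backed object.
impl<'a> FromEntries<'a> for HashMap<&'a str, i64> {
    fn from_entries<I>(entries: I) -> Self
    where
        I: IntoIterator<Item = (&'a str, i64)>,
    {
        entries.into_iter().collect()
    }
}

/// A simpler object representation that keeps insertion order and skips hashing.
impl<'a> FromEntries<'a> for Vec<(&'a str, i64)> {
    fn from_entries<I>(entries: I) -> Self
    where
        I: IntoIterator<Item = (&'a str, i64)>,
    {
        entries.into_iter().collect()
    }
}

/// Stand-in for a parser that yields the members of one JSON object.
fn parsed_members() -> impl Iterator<Item = (&'static str, i64)> {
    vec![("id", 1), ("size", 42)].into_iter()
}

fn main() {
    // The same producer feeds two different object representations.
    let as_map: HashMap<&str, i64> = FromEntries::from_entries(parsed_members());
    let as_vec: Vec<(&str, i64)> = FromEntries::from_entries(parsed_members());
    println!("{:?} {}", as_map.get("size"), as_vec.len());
}
```

With an interface of this shape, a downstream crate could plug in its own object type without depending on the map implementation the parser uses internally.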

The branch is here https://github.com/PSeitz/serde_json_borrow/tree/simd_json_value_builder.

| Library | Dataset | Avg Speed |
| --- | --- | --- |
| serde_json | flat_json | 137.96 MiB/s |
| serde_json_borrow | flat_json | 208.32 MiB/s |
| simd_serde_json_borrow | flat_json | 122.62 MiB/s |
| simd_serde_json_borrow_value_builder | flat_json | 118.30 MiB/s |
| simd_json_BorrowedValue | flat_json | 134.90 MiB/s |
| serde_json | hdfs | 287.32 MiB/s |
| serde_json_borrow | hdfs | 389.00 MiB/s |
| simd_serde_json_borrow | hdfs | 263.32 MiB/s |
| simd_serde_json_borrow_value_builder | hdfs | 253.54 MiB/s |
| simd_json_BorrowedValue | hdfs | 288.18 MiB/s |
| serde_json | hdfs_with_array | 200.19 MiB/s |
| serde_json_borrow | hdfs_with_array | 283.97 MiB/s |
| simd_serde_json_borrow | hdfs_with_array | 163.14 MiB/s |
| simd_serde_json_borrow_value_builder | hdfs_with_array | 192.23 MiB/s |
| simd_json_BorrowedValue | hdfs_with_array | 209.96 MiB/s |
| serde_json | wiki | 446.33 MiB/s |
| serde_json_borrow | wiki | 488.47 MiB/s |
| simd_serde_json_borrow | wiki | 555.05 MiB/s |
| simd_serde_json_borrow_value_builder | wiki | 544.57 MiB/s |
| simd_json_BorrowedValue | wiki | 582.90 MiB/s |
| serde_json | gh-archive | 175.92 MiB/s |
| serde_json_borrow | gh-archive | 362.67 MiB/s |
| simd_serde_json_borrow | gh-archive | 343.65 MiB/s |
| simd_serde_json_borrow_value_builder | gh-archive | 328.93 MiB/s |
| simd_json_BorrowedValue | gh-archive | 397.70 MiB/s |

@Licenser
Member

This is really awesome, @PSeitz!
