Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

moka is too polymorphic (cargo-llvm-lines) #203

Open
Swatinem opened this issue Nov 19, 2022 · 4 comments
Open

moka is too polymorphic (cargo-llvm-lines) #203

Swatinem opened this issue Nov 19, 2022 · 4 comments

Comments

@Swatinem
Copy link
Contributor

I was playing around with https://github.com/dtolnay/cargo-llvm-lines recently, and found out that moka does generate a ton of llvm IR due to too much internal generics.

This is suboptimal for a couple of reasons:

  • it generates multiple copied of functions that have to be compiled, hurting compile times.
  • may lead to binary bloat
  • may even hurt performance as the instruction cache is not used optimally

Running cargo-llvm-lines on a small test program:

use moka::future::Cache;

pub struct Foo {
    cache: Cache<usize, usize>,
}

impl Foo {
    pub async fn bar(&self) -> usize {
        self.cache
            .entry(1)
            .or_insert_with_if(async { 1 }, |_| true)
            .await
            .into_value()
    }

    pub fn run(&self, runtime: tokio::runtime::Handle) -> usize {
        runtime.block_on(self.bar())
    }
}

gives me this these functions on the top:

  Lines                 Copies              Function name
  -----                 ------              -------------
  116847                3467                (TOTAL)
    6175 (5.3%,  5.3%)    57 (1.6%,  1.6%)  alloc::alloc::box_free
    3012 (2.6%,  7.9%)    21 (0.6%,  2.2%)  moka::cht::map::bucket::BucketArray<K,V>::probe_loop
    1988 (1.7%,  9.6%)     9 (0.3%,  2.5%)  moka::cht::map::bucket_array_ref::BucketArrayRef<K,V,S>::remove_entry_if_and
    1918 (1.6%, 11.2%)    48 (1.4%,  3.9%)  core::option::Option<T>::map
    1680 (1.4%, 12.6%)     5 (0.1%,  4.0%)  alloc::raw_vec::RawVec<T,A>::grow_amortized
    1417 (1.2%, 13.9%)    34 (1.0%,  5.0%)  core::result::Result<T,E>::map_err
    1215 (1.0%, 14.9%)     9 (0.3%,  5.3%)  moka::cht::map::bucket::BucketArray<K,V>::remove_if::{{closure}}
    1056 (0.9%, 15.8%)    18 (0.5%,  5.8%)  alloc::boxed::Box<T,A>::into_unique
    1008 (0.9%, 16.7%)     9 (0.3%,  6.1%)  <core::slice::iter::Iter<T> as core::iter::traits::iterator::Iterator>::next
    1002 (0.9%, 17.5%)     3 (0.1%,  6.1%)  moka::cht::map::bucket::BucketArray<K,V>::rehash
     996 (0.9%, 18.4%)    13 (0.4%,  6.5%)  <alloc::sync::Weak<T> as core::ops::drop::Drop>::drop
     978 (0.8%, 19.2%)     6 (0.2%,  6.7%)  alloc::raw_vec::RawVec<T,A>::allocate_in
     969 (0.8%, 20.0%)    12 (0.3%,  7.0%)  crossbeam_epoch::deferred::Deferred::new
     964 (0.8%, 20.9%)     1 (0.0%,  7.1%)  moka::future::value_initializer::ValueInitializer<K,V,S>::do_try_init::{{closure}}
     888 (0.8%, 21.6%)    18 (0.5%,  7.6%)  core::result::Result<T,E>::map
     886 (0.8%, 22.4%)     6 (0.2%,  7.8%)  moka::cht::map::bucket_array_ref::BucketArrayRef<K,V,S>::get_key_value_and_then
     873 (0.7%, 23.1%)     9 (0.3%,  8.0%)  core::slice::iter::Iter<T>::new
     826 (0.7%, 23.8%)    13 (0.4%,  8.4%)  core::alloc::layout::Layout::for_value_raw
     825 (0.7%, 24.5%)     3 (0.1%,  8.5%)  crossbeam_channel::flavors::list::Channel<T>::start_send
     817 (0.7%, 25.2%)    19 (0.5%,  9.0%)  core::ptr::mut_ptr::<impl *mut T>::is_null
     777 (0.7%, 25.9%)     3 (0.1%,  9.1%)  core::sync::atomic::atomic_compare_exchange
     770 (0.7%, 26.6%)    10 (0.3%,  9.4%)  alloc::raw_vec::RawVec<T,A>::current_memory
     756 (0.6%, 27.2%)    21 (0.6%, 10.0%)  moka::cht::map::bucket::BucketArray<K,V>::probe_loop::{{closure}}

You can see that some functions are duplicated as often as 21 times.

I do have some ideas to reduce this bloat, so watch out for some patches ;-)

@Swatinem
Copy link
Contributor Author

looks like I was a bit overeager. Apart from the one I found, most of the things are either due explicit #[inline] annotations (if I understand things correctly), or justified differences because of generic functions that do different things.

@Swatinem
Copy link
Contributor Author

I’m been playing a bit with this, trying to create a stresstest aimed at getting a hold of the compile times.

use moka::future::Cache;

macro_rules! generate {
    ($($n:ident: $a:ty => $b: ty,)+) => {
        struct State {
            $($n: Cache<$a, $b>,)+
        }

        impl State {
            fn new() -> Self {
                Self {
                    $($n: Cache::builder().build(),)+
                }
            }
            $(async fn $n(&self, k: $a) -> $b {
                self.$n.get_with(k, async { 1 }).await
            })+
        }

        fn main() {
            let state = State::new();

            let sum = futures_executor::block_on(async {
                let mut sum = 0;
                $(sum += state.$n(0).await as u64;)+
                sum
            });

            println!("{sum}");
        }

    };
}

generate!(
    cache_u8_u8: u8 => u8,
    cache_u8_u16: u8 => u16,
    cache_u8_u32: u8 => u32,
    cache_u8_u64: u8 => u64,

    cache_u16_u8: u16 => u8,
    cache_u16_u16: u16 => u16,
    cache_u16_u32: u16 => u32,
    cache_u16_u64: u16 => u64,

    cache_u32_u8: u32 => u8,
    cache_u32_u16: u32 => u16,
    cache_u32_u32: u32 => u32,
    cache_u32_u64: u32 => u64,

    cache_u64_u8: u64 => u8,
    cache_u64_u16: u64 => u16,
    cache_u64_u32: u64 => u32,
    cache_u64_u64: u64 => u64,
);

This testcase will generate up to 16 different instantiations of future::Cache with different generic parameters.

On my Windows system, I see a compile-time (CARGO_INCREMENTAL=0 cargo build --release) of roughly ~6s with 4 instances, ~11.5s with 8, and ~22s with 16; for a debug build: ~3s with 4, ~5.5s with 8, ~10s with 16.
(fun fact: it looks like we use a total of 17 different moka instances in symbolicator, yes we indeed love this crate very much :-))

A run of CARGO_PROFILE_RELEASE_LTO=fat cargo llvm-lines --release finishes in 50 and gives me these top moka functions (plus a ton of crossbeam and std that I removed):

  Lines                  Copies               Function name
  -----                  ------               -------------
  1486211                43080                (TOTAL)
    52136 (3.5%,  3.5%)    316 (0.7%,  0.7%)  moka::cht::map::bucket::BucketArray<K,V>::probe_loop
    29148 (2.0%, 10.8%)    148 (0.3%,  3.3%)  moka::cht::map::bucket_array_ref::BucketArrayRef<K,V,S>::remove_entry_if_and
    17680 (1.2%, 18.0%)     16 (0.0%,  7.2%)  moka::future::value_initializer::ValueInitializer<K,V,S>::try_init_or_read::{{closure}}
    14312 (1.0%, 21.1%)     36 (0.1%,  8.3%)  moka::cht::map::bucket::BucketArray<K,V>::rehash
    13912 (0.9%, 22.0%)    148 (0.3%,  8.7%)  moka::cht::map::bucket::BucketArray<K,V>::remove_if::{{closure}}
    12480 (0.8%, 22.8%)     96 (0.2%,  8.9%)  moka::cht::map::bucket_array_ref::BucketArrayRef<K,V,S>::get_key_value_and_then
    11232 (0.8%, 25.1%)     16 (0.0%,  9.5%)  moka::sync_base::base_cache::BaseCache<K,V,S>::do_insert_with_hash
    10736 (0.7%, 25.9%)     64 (0.1%,  9.6%)  moka::sync_base::base_cache::BaseCache<K,V,S>::expire_after_read_or_update
    10112 (0.7%, 28.7%)     16 (0.0%, 10.6%)  moka::sync_base::base_cache::Inner<K,V,S>::new
     9544 (0.6%, 30.0%)     32 (0.1%, 11.3%)  moka::sync_base::base_cache::BaseCache<K,V,S>::do_get_with_hash
     8480 (0.6%, 32.9%)     16 (0.0%, 12.5%)  moka::sync_base::base_cache::Inner<K,V,S>::handle_upsert
     8384 (0.6%, 33.5%)    256 (0.6%, 13.1%)  moka::sync_base::base_cache::BaseCache<K,V,S>::expire_after_read_or_update::{{closure}}
     8164 (0.5%, 34.6%)    148 (0.3%, 13.6%)  moka::cht::map::bucket::BucketArray<K,V>::remove_if

Using cargo bloat I end up with these related to moka:

 File  .text     Size    Crate Name
11.6%  14.1% 404.5KiB    bloat bloat::main::async_block$0
 5.0%   6.1% 173.9KiB    bloat bloat::main
 1.8%   2.2%  64.2KiB     moka moka::sync_base::base_cache::impl$15::sync<u64,u16,std::collections::hash::map::RandomState>
^ 16x
 0.7%   0.8%  23.4KiB     moka moka::future::cache::impl$9::insert_with_hash::async_fn$0<u32,u64,std::collections::hash::map::RandomState>
^ 16x
 0.4%   0.5%  13.5KiB async_io async_io::driver::main_loop
 0.3%   0.4%  11.8KiB async_io async_io::reactor::Reactor::get
 0.2%   0.3%   8.3KiB      std core::ops::function::FnOnce::call_once<moka::sync_base::invalidator::impl$4::submit_task::closure_env$0<u8,u16,std::collections::hash::map::RandomState>,tuple$<> >
^ 16x
...
82.0% 100.0%   2.8MiB          .text section size, the file size is 3.4MiB

I think most of the methods actually do need to have 16 different variants, though internally it might be a good idea to deduplicate some things.

@Swatinem
Copy link
Contributor Author

A quick check with my #265 PR:

A debug build with 16 versions take ~9s, and a release build ~21s, so not that much of a difference.

cargo llvm-lines looks like this:

  Lines                  Copies               Function name
  -----                  ------               -------------
  1410440                41147                (TOTAL)
    29148 (2.1%,  7.7%)    148 (0.4%,  2.7%)  moka::cht::map::bucket_array_ref::BucketArrayRef<K,V,S>::remove_entry_if_and
    21336 (1.5%, 10.9%)    148 (0.4%,  3.4%)  moka::cht::map::bucket::BucketArray<K,V>::remove_if
    17680 (1.3%, 14.7%)     16 (0.0%,  5.2%)  moka::future::value_initializer::ValueInitializer<K,V,S>::try_init_or_read::{{closure}}
    14312 (1.0%, 19.2%)     36 (0.1%,  7.5%)  moka::cht::map::bucket::BucketArray<K,V>::rehash
    12480 (0.9%, 20.0%)     96 (0.2%,  7.8%)  moka::cht::map::bucket_array_ref::BucketArrayRef<K,V,S>::get_key_value_and_then
    11232 (0.8%, 22.5%)     16 (0.0%,  8.4%)  moka::sync_base::base_cache::BaseCache<K,V,S>::do_insert_with_hash
    10736 (0.8%, 23.2%)     64 (0.2%,  8.5%)  moka::sync_base::base_cache::BaseCache<K,V,S>::expire_after_read_or_update
    10656 (0.8%, 24.0%)     96 (0.2%,  8.8%)  moka::cht::map::bucket::BucketArray<K,V>::get
    10112 (0.7%, 26.9%)     16 (0.0%,  9.8%)  moka::sync_base::base_cache::Inner<K,V,S>::new
     9544 (0.7%, 28.3%)     32 (0.1%, 10.5%)  moka::sync_base::base_cache::BaseCache<K,V,S>::do_get_with_hash
     8480 (0.6%, 31.4%)     16 (0.0%, 11.8%)  moka::sync_base::base_cache::Inner<K,V,S>::handle_upsert
     8384 (0.6%, 32.0%)    256 (0.6%, 12.4%)  moka::sync_base::base_cache::BaseCache<K,V,S>::expire_after_read_or_update::{{closure}}

Interestingly, cargo bloat shows both better and worse sizes, the total is about the same:

 File  .text     Size    Crate Name
12.2%  14.9% 425.5KiB    bloat bloat::main::async_block$0
 5.0%   6.1% 173.9KiB    bloat bloat::main
 1.8%   2.3%  64.5KiB     moka moka::sync_base::base_cache::impl$15::sync<u64,u16,std::collections::hash::map::RandomState>
 0.6%   0.8%  21.7KiB     moka moka::future::cache::impl$9::insert_with_hash::async_fn$0<u32,u64,std::collections::hash::map::RandomState>
...
81.9% 100.0%   2.8MiB          .text section size, the file size is 3.4MiB

@darren-fu
Copy link

great work

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants