Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Panic causes "thread panicked at 'internal error: entered unreachable code'" #43

Closed
nickguletskii opened this issue Sep 17, 2021 · 3 comments · Fixed by #46
Closed

Panic causes "thread panicked at 'internal error: entered unreachable code'" #43

nickguletskii opened this issue Sep 17, 2021 · 3 comments · Fixed by #46
Assignees
Labels
bug Something isn't working
Milestone

Comments

@nickguletskii
Copy link

nickguletskii commented Sep 17, 2021

Hi, thank you for this very useful library! However, I've found a way to enter a supposedly unreachable state.

Here's the relevant part of the backtrace:

thread 'main' panicked at 'internal error: entered unreachable code', /home/user/.cargo/git/checkouts/moka-6ea430727379b61e/19b6925/src/future/value_initializer.rs:101:29
stack backtrace:
   0: rust_begin_unwind
             at /rustc/a178d0322ce20e33eac124758e837cbd80a6f633/library/std/src/panicking.rs:515:5
   1: core::panicking::panic_fmt
             at /rustc/a178d0322ce20e33eac124758e837cbd80a6f633/library/core/src/panicking.rs:92:14
   2: core::panicking::panic
             at /rustc/a178d0322ce20e33eac124758e837cbd80a6f633/library/core/src/panicking.rs:50:5
   3: moka::future::value_initializer::ValueInitializer<K,V,S>::try_init_or_read::{{closure}}
             at /home/user/.cargo/git/checkouts/moka-6ea430727379b61e/19b6925/src/future/value_initializer.rs:101:29
   4: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/a178d0322ce20e33eac124758e837cbd80a6f633/library/core/src/future/mod.rs:80:19
   5: moka::future::cache::Cache<K,V,S>::get_or_try_insert_with_hash_and_fun::{{closure}}
             at /home/user/.cargo/git/checkouts/moka-6ea430727379b61e/19b6925/src/future/cache.rs:640:15
   6: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/a178d0322ce20e33eac124758e837cbd80a6f633/library/core/src/future/mod.rs:80:19
   7: moka::future::cache::Cache<K,V,S>::get_or_try_insert_with::{{closure}}
             at /home/user/.cargo/git/checkouts/moka-6ea430727379b61e/19b6925/src/future/cache.rs:445:9

This issue seems to occur when two threads attempt to fetch the same key and the thread on which the value initialization was started panics. I've built a simple program to demonstrate this issue:

use tokio;
use std::sync::Arc;

#[tokio::main]
async fn main() {
    let cache: Arc<moka::future::Cache<i32, i32>> = Arc::new(moka::future::Cache::new(16));
    let semaphore = Arc::new(tokio::sync::Semaphore::new(0));
    {
        let cache_ref = cache.clone();
        let semaphore_ref = semaphore.clone();
        tokio::task::spawn(async move {
            cache_ref.get_or_try_insert_with::<_, ()>(1, async move {
                semaphore_ref.add_permits(1);
                tokio::time::sleep(tokio::time::Duration::from_millis(1000));
                panic!("Panic during get_or_try_insert_with");
                Ok(10)
            }).await;
        });
    }
    semaphore.acquire().await;
    cache.get_or_try_insert_with::<_, ()>(1, async move {
        println!("Async block in second get_or_try_insert_with called");
        Ok(5)
    }).await;
}

I think the unreachable code is being hit because the write lock obtained at line 71 of value_initializer.rs gets dropped without any value being written (here or here).

I see two possible options for handling this issue:

  1. unreachable() can be replaced with a panic with a more helpful message.
  2. Perhaps the existing waiter should be removed and a new waiter should be created and started using the future passed in the second call?
@tatsuya6502 tatsuya6502 self-assigned this Sep 17, 2021
@tatsuya6502 tatsuya6502 added the bug Something isn't working label Sep 17, 2021
@tatsuya6502 tatsuya6502 added this to the v0.5.4 milestone Sep 17, 2021
@tatsuya6502
Copy link
Member

Hi. Thank you for the detailed report! I totally agree; we should take one of the options that you proposed.

  1. Perhaps the existing waiter should be removed and a new waiter should be created and started using the future passed in the second call?

I think we can take this option by using the catch_unwind method from the futures crate. (doc). I will give it a try.

@nickguletskii
Copy link
Author

I think we can take this option by using the catch_unwind method from the futures crate.

Unless I am missing something, using catch_unwind would impose a lot of restrictions of what can happen inside the future. For instance, if I modify my example to use catch_unwind, it won't compile because of internal mutability in the Semaphore:

use futures::future::FutureExt;
use tokio;
use std::sync::Arc;
use std::any::Any;
use futures::TryStreamExt;
use tokio::sync::Mutex;
use futures::TryFutureExt;

#[tokio::main]
async fn main() {
    let cache: Arc<moka::future::Cache<i32, i32>> = Arc::new(moka::future::Cache::new(16));
    let semaphore = Arc::new(tokio::sync::Semaphore::new(0));
    {
        let cache_ref = cache.clone();
        let semaphore_ref = semaphore.clone();
        tokio::task::spawn(async move {
            cache_ref.get_or_try_insert_with::<_, tokio::sync::Mutex<Box<(dyn Any + std::marker::Send + 'static)>>>(1, async move {
                semaphore_ref.add_permits(1);
                tokio::time::sleep(tokio::time::Duration::from_millis(1000));
                panic!("Panic during get_or_try_insert_with");
                10
            }.catch_unwind().map_err(|pn| Mutex::new(pn))).await;
        });
    }
    semaphore.acquire().await;
    cache.get_or_try_insert_with::<_, ()>(1, async move {
        println!("Async block in second get_or_try_insert_with called");
        Ok(5)
    }).await;
}

And here's the compiler error:

error[E0277]: the type `UnsafeCell<tokio::sync::batch_semaphore::Waitlist>` may contain interior mutability and a reference may not be safely transferrable across a catch_unwind boundary
  --> src/main.rs:22:15
   |
22 |             }.catch_unwind().map_err(|pn| Mutex::new(pn))).await;
   |               ^^^^^^^^^^^^ `UnsafeCell<tokio::sync::batch_semaphore::Waitlist>` may contain interior mutability and a reference may not be safely transferrable across a catch_unwind boundary
   |
   = help: within `Semaphore`, the trait `RefUnwindSafe` is not implemented for `UnsafeCell<tokio::sync::batch_semaphore::Waitlist>`
note: captured value does not implement `UnwindSafe`
  --> src/main.rs:18:17
   |
18 |                 semaphore_ref.add_permits(1);
   |                 ^^^^^^^^^^^^^ has type `Arc<Semaphore>` which does not implement `UnwindSafe`

However, it should be possible to detect the panic on the other thread without directly catching it by handling the case currently marked as unreachable in try_init_or_read, since None should only occur in cases when the initialization future has terminated before writing anything. A hacky solution would be to remove the waiter and call try_init_or_read recursively, but that could result in stack overflows.

@tatsuya6502
Copy link
Member

Unless I am missing something, using catch_unwind would impose a lot of restrictions of what can happen inside the future. For instance, if I modify my example to use catch_unwind, it won't compile because of internal mutability in the Semaphore:

Thanks for checking. I opened a pull request #46, which uses catch_unwind. It wraps the future with AssertUnwindSafe to avoid the restrictions. (doc).

// Our waiter was inserted. Let's resolve the init future.
// Catching panic is safe here as we do not try to resolve the future again.
match AssertUnwindSafe(init).catch_unwind().await {
// Resolved.
Ok(value) => return post_init(key, value, &mut lock),
// Panicked.
Err(payload) => {
*lock = None;
// Remove the waiter so that others can retry.
self.remove_waiter(key, type_id);
resume_unwind(payload);
} // The lock will be unlocked here.
}

I verified that your example compiles and now works as expected:

moka/src/future/cache.rs

Lines 1345 to 1370 in af44363

#[tokio::test]
// https://github.com/moka-rs/moka/issues/43
async fn handle_panic_in_get_or_try_insert_with() {
use tokio::time::{sleep, Duration};
let cache = Cache::new(16);
let semaphore = Arc::new(tokio::sync::Semaphore::new(0));
{
let cache_ref = cache.clone();
let semaphore_ref = semaphore.clone();
tokio::task::spawn(async move {
let _ = cache_ref
.get_or_try_insert_with(1, async move {
semaphore_ref.add_permits(1);
sleep(Duration::from_millis(50)).await;
panic!("Panic during get_or_try_insert_with");
})
.await as Result<_, Arc<Infallible>>;
});
}
let _ = semaphore.acquire().await.expect("semaphore acquire failed");
assert_eq!(
cache.get_or_try_insert_with(1, async { Ok(5) }).await as Result<_, Arc<Infallible>>,
Ok(5)
);
}

It should be safe to use AssertUnwindSafe for our use case because get_or_try_insert_with will not touch the panicked future and will do resume_unwind after removing the waiter. I checked Tokio and scheduled-thread-pool crates and found that Tokio uses catch_unwind and AssertUnwindSafe to ignore panics, and scheduled-thread-pool crate is doing a similar thing.

A hacky solution would be to remove the waiter and call try_init_or_read recursively, but that could result in stack overflows.

I would avoid to remove the waiter from the other threads because it will get messy if two or more threads are trying to do so at the same time (removing the old waiter and inserting its own one).

As for stack overflows, I used loop {} rather than recursively calling try_init_or_read so it will not use extra stack frames.

PoOnesNerfect added a commit to thuo-io/moka that referenced this issue Dec 21, 2023
* updates

- fix titan preset token
- update web-server image
- add gcloud config

* reorganize rust packages

* git subrepo clone https://github.com/thuo-io/openapi-rust.git libs/openapi

subrepo:
  subdir:   "libs/openapi"
  merged:   "ae8de950"
upstream:
  origin:   "https://github.com/thuo-io/openapi-rust.git"
  branch:   "main"
  commit:   "ae8de950"
git-subrepo:
  version:  "0.4.6"
  origin:   "https://github.com/Homebrew/brew"
  commit:   "146b987fa"

* remove js packages

* move openapi dir

* remove landing page ingress

* try using tosserror

* reorganize typesense error

* update tosserror

* git subrepo clone (merge) https://github.com/thuo-io/js-packages.git frontend

subrepo:
  subdir:   "frontend"
  merged:   "61f63a88"
upstream:
  origin:   "https://github.com/thuo-io/js-packages.git"
  branch:   "main"
  commit:   "61f63a88"
git-subrepo:
  version:  "0.4.6"
  origin:   "https://github.com/Homebrew/brew"
  commit:   "146b987fa"

* remove js packages

* git subrepo clone https://github.com/thuo-io/bundle.git services/bundle

subrepo:
  subdir:   "services/bundle"
  merged:   "867c88e7"
upstream:
  origin:   "https://github.com/thuo-io/bundle.git"
  branch:   "main"
  commit:   "867c88e7"
git-subrepo:
  version:  "0.4.6"
  origin:   "https://github.com/Homebrew/brew"
  commit:   "146b987fa"

* add bundle

* remove subscribe handler and increase resources for sidecar

* git subrepo pull (merge) --branch=minikube services/bundle

subrepo:
  subdir:   "services/bundle"
  merged:   "ac09ed72"
upstream:
  origin:   "https://github.com/thuo-io/bundle.git"
  branch:   "minikube"
  commit:   "575cb812"
git-subrepo:
  version:  "0.4.6"
  origin:   "https://github.com/Homebrew/brew"
  commit:   "146b987fa"

* minor change

* move bundle to inside web-server

* add cors for products api

* remove bundle

* git subrepo clone --branch=minikube https://github.com/thuo-io/bundle.git services/web-server/bundle

subrepo:
  subdir:   "services/web-server/bundle"
  merged:   "fc8a39a1"
upstream:
  origin:   "https://github.com/thuo-io/bundle.git"
  branch:   "minikube"
  commit:   "fc8a39a1"
git-subrepo:
  version:  "0.4.6"
  origin:   "https://github.com/Homebrew/brew"
  commit:   "146b987fa"

* lfs

* remove js ci
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
2 participants