Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(Typed)ThreadSafeFunction.Release() crashes renderer process on reload (& abort?) #1109

Closed
robinchrist opened this issue Dec 11, 2021 · 22 comments
Labels

Comments

@robinchrist
Copy link

robinchrist commented Dec 11, 2021

We are using a native addon with Electron (Both v14 and v16 show the same behaviour)

There is a class, which roughly looks like this:

class Foo : public Napi::ObjectWrap<Foo> {
public:
    Foo(
        const Napi::CallbackInfo& info
    ):
        Napi::ObjectWrap<Foo>(info)
    {
        tsfn1_ = ...
        tsfn2_ = ...
    }

    ~Foo() {
        tsfn1_.Release();
        tsfn2_.Release();
        ...
    }
private:
    TSFNT1 tsfn1_;
    TSFNT2 tsfn2_;
    ...
}

Everything works fine - Except when Electron reloads.
There are some situation when we need to reload the Electron application via getCurrentWebContents().reloadIgnoringCache() - Which fails spectacularly with a renderer crash.
This is an issue both for our users AND when in development when using hot reload.

I have tracked down the issue to the .Release() calls
If I remove the .Release() calls, everything seems to work fine, including reload.

I have tried to debug the issue further with a debug build of Electron, but at least v14 (have not tested v16) triggers another assertion when reloading which can't be fixed (I stripped the addon to not contain any exports and just the register call and it still caused a renderer crash on reload with Debug Electron)

My theory: Node destroys the TSFNs on Node's side before it frees / destroys the objects of the addon. I can't say whether this is just a wild theory or a real explanation, but I think a couple of Node devs here could know?

Our fix for now: Don't free the TSNFs in the destructor. The Foo object is long-living and usually only created once and destroyed once in the lifecycle of the addon.

FWIW, that's the stacktrace:

 thread #1, name = 'electron', stop reason = signal SIGABRT
  * frame #0: 0x00007f70aad7218b libc.so.6`__GI_raise(sig=<unavailable>) at raise.c:51:1
    frame #1: 0x00007f70aad51859 libc.so.6`__GI_abort at abort.c:79:7
    frame #2: 0x00005565c007d794 electron`uv_mutex_lock + 20
    frame #3: 0x00005565c63f7266 electron`napi_release_threadsafe_function + 38
    frame #4: 0x00007f709c498c9a addon.node`Napi::TypedThreadSafeFunction<std::nullptr_t, std::variant<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::unique_ptr<SecretType, std::default_delete<SecretType> > >, &(Foo::SecretJSGlueFunction(Napi::Env, Napi::Function, std::nullptr_t*, std::variant<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::unique_ptr<SecretType, std::default_delete<SecretType> > >*))>::Release(this=0x000013900794e100) at napi-inl.h:5163:10
    frame #5: 0x00007f709c4905d8 addon.node`Foo::~Foo(this=0x000013900794e000) at Foo.cc:166:25
    frame #6: 0x00007f709c490839 addon.node`Foo::~Foo(this=0x000013900794e000) at Foo.cc:164:48
    frame #7: 0x00007f709c4a2632 addon.node`Napi::ObjectWrap<Foo>::FinalizeCallback(env=0x000013900078ae20, data=0x000013900794e000, (null)=0x0000000000000000) at napi-inl.h:4387:3
    frame #8: 0x00005565c63f765f electron
    frame #9: 0x00005565c63f8adf electron
    frame #10: 0x00005565c63f74a6 electron
    frame #11: 0x00005565c63f7ea5 electron
    frame #12: 0x00005565c63f7f0e electron
    frame #13: 0x00005565c63f8218 electron
    frame #14: 0x00005565c0072729 electron`uv_run + 521
    frame #15: 0x00005565c63d00e1 electron
    frame #16: 0x00005565c63d0638 electron
    frame #17: 0x00005565c6398264 electron`node::FreeEnvironment(node::Environment*) + 180
    frame #18: 0x00005565c01d6698 electron
    frame #19: 0x00005565c56662d5 electron
    frame #20: 0x00005565c43085a5 electron
    frame #21: 0x00005565c43072a6 electron
    frame #22: 0x00005565c4684c0f electron
    frame #23: 0x00005565c4ba1468 electron
    frame #24: 0x00005565c4ba24f4 electron
    frame #25: 0x00005565c4bb69e2 electron
    frame #26: 0x00005565c4bb97bd electron
    frame #27: 0x00005565c470b503 electron
    frame #28: 0x00005565c5657761 electron
    frame #29: 0x00005565c5671db3 electron
    frame #30: 0x00005565c5671bfb electron
    frame #31: 0x00005565c56561cc electron
    frame #32: 0x00005565c5ae4805 electron
    frame #33: 0x00005565c0f493c9 electron
    frame #34: 0x00005565c5ae4d2b electron
    frame #35: 0x00005565c2fa29e8 electron
    frame #36: 0x00005565c2fa771d electron
    frame #37: 0x00005565c2fa3f4a electron
    frame #38: 0x00005565c31c0b29 electron
    frame #39: 0x00005565c2fa505f electron
    frame #40: 0x00005565c2c6e2f5 electron
    frame #41: 0x00005565c2c84316 electron
    frame #42: 0x00005565c2c854af electron
    frame #43: 0x00005565c2c3901e electron
    frame #44: 0x00005565c2c85914 electron
    frame #45: 0x00005565c2c575f2 electron
    frame #46: 0x00005565c61e682b electron
    frame #47: 0x00005565c104fe53 electron
    frame #48: 0x00005565c104d77f electron
    frame #49: 0x00005565c104e169 electron
    frame #50: 0x00005565c008335c electron
    frame #51: 0x00007f70aad530b3 libc.so.6`__libc_start_main(main=0x00005565c0083150, argc=15, argv=0x00007fff22fedd28, init=<unavailable>, fini=<unavailable>, rtld_fini=<unavailable>, stack_end=0x00007fff22fedd18) at libc-start.c:308:16
    frame #52: 0x00005565bfd18daa electron`_start + 42

Electron versions: v14 & v16
Compiler: Clang 10+
OS: definitely Linux, but seems to happen on MacOS and Windows
Arch: AMD64, but seems to happen on ARM64 too

Any ideas?

@robinchrist robinchrist changed the title (Typed)ThreadSafeFunction.Release() segfaults renderer process on reload (Typed)ThreadSafeFunction.Release() crashes renderer process on reload Dec 11, 2021
@robinchrist robinchrist changed the title (Typed)ThreadSafeFunction.Release() crashes renderer process on reload (Typed)ThreadSafeFunction.Release() crashes renderer process on reload (& abort?) Dec 11, 2021
@gabrielschulhof
Copy link
Contributor

@robinchrist if you remove the Release(), and since it now works, do you get memory leaks as you reload electron many times?

@mmomtchev
Copy link

@gabrielschulhof I am debugging what seems to be the same issue and I did asan it and here is what happens:

  • There is some class which contains a TSF
  • At some point its destructor calls TSF::Release() before deleting the object that contains the TSF
  • This triggers a call to details::ThreadSafeFinalize::FinalizeWrapperWithContext at some later moment
  • This function fails badly because the TSF object has already been deleted

If I remove the call to Release() - yes, then it works, but my Node process is left hanging at the end since there are open file handles

Currently the only way to make this work is to always have a finalizer and defer the deleting of the containing object. It is quite cumbersome, isn't it possible to skip FinalizeWrapperWithContext when there is no finalizer?

@github-actions
Copy link
Contributor

This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.

@github-actions github-actions bot added the stale label Mar 28, 2022
@robinchrist
Copy link
Author

@gabrielschulhof What would be the best way to test this?

@github-actions github-actions bot removed the stale label Mar 29, 2022
@BB-fat
Copy link

BB-fat commented Mar 31, 2022

I have the same problem on mac, but I can't trigger the bug stably.

  • electron: 15.1.0
  • node-addon-api: 4.2.0
  • macos: 12.3 x64

I wrote something like this. See the full version here.

MDQuery::~MDQuery() {
    if (_queryRef) {
        auto localCenter = CFNotificationCenterGetLocalCenter();
        CFNotificationCenterRemoveObserver(localCenter, this, kMDQueryDidFinishNotification, _queryRef);
        CFNotificationCenterRemoveObserver(localCenter, this, kMDQueryDidUpdateNotification, _queryRef);
    }
    
    this->StopQuery();
    if (_queryRef) {
        CFRelease(_queryRef);
        _queryRef = NULL;
    }
    
    if (_queryResultCallback) {
        _queryResultCallback.Release();
        _queryResultCallback = NULL;
    }
    if (_updateCallback) {
        _updateCallback.Release();
        _updateCallback = NULL;
    }
}

Here is my crash log.

Crashed Thread:        0  CrBrowserMain  Dispatch queue: com.apple.main-thread

Exception Type:        EXC_CRASH (SIGABRT)
Exception Codes:       0x0000000000000000, 0x0000000000000000
Exception Note:        EXC_CORPSE_NOTIFY

Application Specific Information:
dyld: in dlopen_preflight()
abort() called

Thread 0 Crashed:: CrBrowserMain  Dispatch queue: com.apple.main-thread
0   libsystem_kernel.dylib              0x00007fff70ec449a __pthread_kill + 10
1   libsystem_pthread.dylib             0x00007fff70f816cb pthread_kill + 384
2   libsystem_c.dylib                   0x00007fff70e4ca1c abort + 120
3   com.github.Electron.framework       0x0000000103dd68c4 uv_mutex_lock + 20
4   com.github.Electron.framework       0x0000000107439696 napi_release_threadsafe_function + 38
5   node.napi.node                      0x000000011037d67e MDQuery::~MDQuery() + 174
6   node.napi.node                      0x000000011037d6de MDQuery::~MDQuery() + 14
7   node.napi.node                      0x000000011037f287 Napi::ObjectWrap<MDQuery>::FinalizeCallback(napi_env__*, void*, void*) + 55
8   com.github.Electron.framework       0x0000000107439a9f node_api_get_module_file_name + 767
9   com.github.Electron.framework       0x00000001074218eb node::EmitAsyncDestroy(node::Environment*, node::async_context) + 382219
10  com.github.Electron.framework       0x00000001074398d6 node_api_get_module_file_name + 310
11  com.github.Electron.framework       0x000000010743a465 node_api_get_module_file_name + 3269
12  com.github.Electron.framework       0x000000010743a4ce node_api_get_module_file_name + 3374
13  com.github.Electron.framework       0x000000010743a7db node_api_get_module_file_name + 4155
14  com.github.Electron.framework       0x0000000103dca977 uv_run + 535
15  com.github.Electron.framework       0x000000010740f2a4 node::EmitAsyncDestroy(node::Environment*, node::async_context) + 306884
16  com.github.Electron.framework       0x000000010740f745 node::EmitAsyncDestroy(node::Environment*, node::async_context) + 308069
17  com.github.Electron.framework       0x00000001073c1244 node::FreeEnvironment(node::Environment*) + 164
18  com.github.Electron.framework       0x0000000103ebca95 ElectronInitializeICUandStartNode + 918277
19  com.github.Electron.framework       0x0000000103eab3d9 ElectronInitializeICUandStartNode + 846921
20  com.github.Electron.framework       0x0000000105231e12 v8::internal::SetupIsolateDelegate::SetupHeap(v8::internal::Heap*) + 3294482
21  com.github.Electron.framework       0x000000010523341a v8::internal::SetupIsolateDelegate::SetupHeap(v8::internal::Heap*) + 3300122
22  com.github.Electron.framework       0x000000010522f298 v8::internal::SetupIsolateDelegate::SetupHeap(v8::internal::Heap*) + 3283352
23  com.github.Electron.framework       0x0000000104600ab3 electron::fuses::IsOnlyLoadAppFromAsarEnabled() + 6674531
24  com.github.Electron.framework       0x000000010460054b electron::fuses::IsOnlyLoadAppFromAsarEnabled() + 6673147
25  com.github.Electron.framework       0x00000001045fef7b electron::fuses::IsOnlyLoadAppFromAsarEnabled() + 6667563
26  com.github.Electron.framework       0x00000001045ff819 electron::fuses::IsOnlyLoadAppFromAsarEnabled() + 6669769
27  com.github.Electron.framework       0x0000000103ddc756 ElectronMain + 134
28  org.tencent.xiaowei-desktop         0x0000000103d655e6 0x103d64000 + 5606
29  libdyld.dylib                       0x00007fff70d752e5 start + 1

@github-actions
Copy link
Contributor

This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.

@github-actions github-actions bot added the stale label Jun 30, 2022
@KevinEady
Copy link
Contributor

Hi @robinchrist / @mmomtchev / @BB-fat ,

We discussed this in the 1 July Node API meeting. Do you have a repository that reproduces this issue? Since this is using Electron, it is a little cumbersome to try to create a repro on our own.

Thanks, Kevin

@mmomtchev
Copy link

@KevinEady I don't have a repro, but this is not limited to Electron and it is not a bug but a design oversight.

From memory (this is the reason I don't use the C++ ThreadSafeFunction in exprtk.js - but a napi_threadsafe_function):
an object containing a ThreadSafeFunction cannot be destroyed. Calling Release will trigger an async callback on the ThreadSafeFunction object. You can't even use a dynamic object and delete it in the finalizer, since there is a wrapper that runs after your finalizer.

@KevinEady
Copy link
Contributor

Hi @mmomtchev ,

That logic does follow. However from what I'm seeing in the code for the finalize wrappers, these just call the finalizer with the data and context as decided in the TSFN::New call. The node-addon-api finalizer wrappers are static that take as its data this details::ThreadSafeFinalize type, which encapsulates the user-provided finalizer and data passed that are passed into New(). They do not access the ThreadSafeFunctions members themselves.

Are you using a finalizer callback or data that is held by the C++ object you are Release()ing from the destructor? If so, this makes sense to me that it will crash, as this memory is no longer valid when the finalizer wrapper gets called asynchronously after release. The TSFN finalizer callback and data must be alive throughout the existence of the TSFN.

Unless I am misunderstanding something...?

@mmomtchev
Copy link

Yes indeed they are static, it might have been the data then, I will try to revert to ThreadSafeFunction to check what was the problem with using a dynamic object - maybe I missed something. But you can't call Release in the destructor for sure - as it triggers an async callback - as the code in this issue does and this is not really a bug, it is the way it works.

@robinchrist
Copy link
Author

So what is the right way to release a TypedThreadSafeFunction that is held by an object that is destroyed then?

@KevinEady
Copy link
Contributor

Hi @robinchrist ,

In my threadsafe-function AsyncIterator example, I call Release() at the end of the std::thread's entry point.

@robinchrist
Copy link
Author

Yes, but that's not really a solution. The TSFN is held for a longer time and the objects which contain the TSFN are created and destroyed on the fly.

There must be a way to properly release a TSFN that is the member of an object inside the object's destructor?

@github-actions
Copy link
Contributor

This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.

@github-actions github-actions bot added the stale label Sep 30, 2022
@KevinEady KevinEady removed the stale label Sep 30, 2022
@KevinEady
Copy link
Contributor

We discussed this in the 7 Oct Node API meeting.

My theory: Node destroys the TSFNs on Node's side before it frees / destroys the objects of the addon.

Both of the stacktraces posted above have a call to node::FreeEnvironment. We will need to look into node core to verify the above assumption. We will need to look at the process of how/when entities are cleaned up during environment destruction.

@mhdawson
Copy link
Member

@robinchrist is there any chance you could run under valgrind? That might give us more info on specifically what is leading to the crash. There are some instructions in https://github.com/nodejs/node/blob/main/doc/contributing/investigating-native-memory-leaks.md

@gregcotten
Copy link

Chiming in here to say I have this issue!

@legendecas
Copy link
Member

I've submitted an explicit document on the order of invocation of napi_finalize callbacks and the cleanup hooks at nodejs/node#45903. I'll compose an example of how to release nested resources properly at https://github.com/nodejs/node-addon-examples.

@KevinEady
Copy link
Contributor

We discussed this issue in the 24 Feb Node API team meeting.

There is a backing PR (nodejs/node#46692) in core to verify the order as described in nodejs/node#45903. Once the test PR has been merged, we can merge the documentation PR.

@gabrielschulhof
Copy link
Contributor

I think we need to thoroughly examine all the places where we NAPI_FATAL_IF_FAILED and decide if we need to add a NODE_API_SWALLOW_UNTHROWABLE_EXCEPTIONS section. For example, at

NAPI_FATAL_IF_FAILED(status, "Error::Error", "napi_define_properties");
, we raise a fatal error if napi_define_properties() fails.

@BB-fat
Copy link

BB-fat commented May 25, 2023

It seems feasible to call TSFN.Release() within ObjectWrap::Finalize.

// jsCallback is an Napi::ThreadSafeFunction
void MyObject::Finalize(const Napi::Env env) {
    if (jsCallback) {
        jsCallback.Release();
        jsCallback = nil;
    }
}

@github-actions
Copy link
Contributor

This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.

@github-actions github-actions bot added the stale label Aug 24, 2023
@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Sep 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

8 participants