Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AsyncProgressWorker::OnProgress is invoked without arg sometimes #821

Closed
zbjornson opened this issue Oct 12, 2020 · 6 comments
Closed

AsyncProgressWorker::OnProgress is invoked without arg sometimes #821

zbjornson opened this issue Oct 12, 2020 · 6 comments
Labels

Comments

@zbjornson
Copy link

zbjornson commented Oct 12, 2020

I've got a basic AsyncProgressWorker that's very similar to the example. I'm invoking progress.Send(thing, 1) a few hundred times per second, and seeing OnProgress invoked sometimes with (info=nullptr, count=0):

Thread 1 "node" received signal SIGSEGV, Segmentation fault.
0x00007ffff523ae45 in UmapWorker::OnProgress (this=0x486e420, info=0x0, count=<optimized out>) at /home/zbjornson/node-cuml/node_modules/node-addon-api/napi-inl.h:4248
4248      return Napi::Env(_env);
(gdb) bt
#0  0x00007ffff523ae45 in UmapWorker::OnProgress (this=0x486e420, info=0x0, count=<optimized out>) at /home/zbjornson/node-cuml/node_modules/node-addon-api/napi-inl.h:4248
#1  0x00007ffff52354ef in Napi::AsyncProgressWorker<ProgressInfo>::OnWorkProgress (this=0x486e420) at /usr/include/x86_64-linux-gnu/c++/8/bits/gthr-default.h:778
#2  Napi::AsyncProgressWorkerBase<void>::OnAsyncWorkProgress (data=<optimized out>) at /home/zbjornson/node-cuml/node_modules/node-addon-api/napi-inl.h:4732
#3  0x00007ffff5237083 in std::function<void (Napi::Env, Napi::Function)>::operator()(Napi::Env, Napi::Function) const (__args#1=..., __args#0=..., this=0x7fff466e62f0) at /usr/include/c++/8/bits/std_function.h:682
#4  Napi::ThreadSafeFunction::CallJS (env=0x47e10c0, jsCallback=0x45443b8, data=0x7fff466e62f0) at /home/zbjornson/node-cuml/node_modules/node-addon-api/napi-inl.h:4667
#5  0x0000000000a0d738 in v8impl::(anonymous namespace)::ThreadSafeFunction::IdleCb(uv_idle_s*) ()
#6  0x00000000013772e9 in uv__run_idle (loop=loop@entry=0x4444220 <default_loop_struct>) at ../deps/uv/src/unix/loop-watcher.c:68
#7  0x0000000001370110 in uv_run (loop=0x4444220 <default_loop_struct>, mode=UV_RUN_DEFAULT) at ../deps/uv/src/unix/core.c:374
#8  0x0000000000a7ab94 in node::NodeMainInstance::Run() ()
#9  0x0000000000a0a861 in node::Start(int, char**) ()
#10 0x00007ffff7a780b3 in __libc_start_main (main=0x9a3cf0 <main>, argc=3, argv=0x7fffffffdfa8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffdf98) at ../csu/libc-start.c:308
#11 0x00000000009a526c in _start ()

It happens with less frequent invocations also, but far less frequently.

That's easy enough to guard for, but is that expected? I don't see it mentioned in the docs.

Edit 13-Oct I can't find a way to reproduce this by modifying the example. In my real scenario, the async work is CUDA code. Not sure if that's different in some critical way from sleeping the thread. Here's a pair of strace captures from when count=0 (top) and count=1 (bottom):

getpid()                                = 5091
futex(0x45bd1a4, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x45bd150, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x45bd150, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x45bd1a0, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x45bd150, FUTEX_WAKE_PRIVATE, 1) = 1
getpid()                                = 5091
futex(0x44441ec, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x4444180, FUTEX_WAKE_PRIVATE, 1) = 1
epoll_wait(13, [{EPOLLIN, {u32=16, u64=16}}], 1024, 2886) = 1
read(16, "\2\0\0\0\0\0\0\0", 1024)      = 8
epoll_wait(13, [{EPOLLIN, {u32=16, u64=16}}], 1024, 2886) = 1
read(16, "\1\0\0\0\0\0\0\0", 1024)      = 8
futex(0x45e2b48, FUTEX_WAKE_PRIVATE, 1) = 0
epoll_wait(13, [{EPOLLIN, {u32=16, u64=16}}], 1024, 2183) = 1
read(16, "\1\0\0\0\0\0\0\0", 1024)      = 8
epoll_wait(13, [{EPOLLIN, {u32=16, u64=16}}], 1024, 2182) = 1
read(16, "\1\0\0\0\0\0\0\0", 1024)      = 8
futex(0x45e2b48, FUTEX_WAKE_PRIVATE, 1) = 0
epoll_wait(13, [{EPOLLIN, {u32=16, u64=16}}], 1024, 2182) = 1
read(16, "\1\0\0\0\0\0\0\0", 1024)      = 8
write(1, "count=0\n", 8)                = 8
rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [], 8) = 0
getpid()                                = 5091
gettid()                                = 5091
tgkill(5091, 5091, SIGABRT)             = 0
getpid()                                = 5005
futex(0x5c421a0, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x5c421a0, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x5c421a0, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x5c42150, FUTEX_WAKE_PRIVATE, 1) = 1
getpid()                                = 5005
futex(0x44441ec, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x4444180, FUTEX_WAKE_PRIVATE, 1) = 1
epoll_wait(13, [{EPOLLIN, {u32=16, u64=16}}], 1024, 5917) = 1
read(16, "\2\0\0\0\0\0\0\0", 1024)      = 8
write(1, "count=1\n", 8)                = 8
epoll_wait(13, strace: Process 5005 detached
 <detached ...>
@mhdawson
Copy link
Member

mhdawson commented Nov 2, 2020

@zbjornson can you share the code? Since you can't recreate with the sample, its probably something in the code that you've not shared in the issue.

@zbjornson
Copy link
Author

@mhdawson here's a repro minimally modified from the example (only Execute and OnProgress modified)

#include <chrono>
#include <thread>
#include <string>

using namespace Napi;

struct ProgressInfo {
  float progress;
  const char* status;
};

class EchoWorker : public AsyncProgressWorker<ProgressInfo> {
    public:
        EchoWorker(Function& callback, std::string& echo)
        : AsyncProgressWorker(callback), echo(echo) {}

        ~EchoWorker() {}

    void Execute(const ExecutionProgress& progress) {
        for (uint32_t i = 0; i < 500; ++i) {
          for (uint32_t j = 0; j < 5; ++j) {
            ProgressInfo pi{i * j, "running"};
            printf("sending %f\n", 1.0f * i * j);
            progress.Send(&pi, 1);
          }
          std::this_thread::sleep_for(std::chrono::milliseconds(2));
        }
    }

    void OnOK() {
        HandleScope scope(Env());
        Callback().Call({Env().Null(), String::New(Env(), echo)});
    }

    void OnProgress(const ProgressInfo* data, size_t count) {
        if (count == 1) {
          printf("progress %f\n", data->progress);
        } else {
          printf("count is 0!\n"); abort();
        }
        HandleScope scope(Env());
        Callback().Call({Env().Null(), Env().Null(), Number::New(Env(), data->progress)});
    }

    private:
        std::string echo;
};


Value Echo(const CallbackInfo& info) {
    // We need to validate the arguments here
    Function cb = info[1].As<Function>();
    std::string in = info[0].As<String>();
    EchoWorker* wk = new EchoWorker(cb, in);
    wk->Queue();
    return info.Env().Undefined();
}

Napi::Object Init(Napi::Env env, Napi::Object exports) {
  exports.Set(Napi::String::New(env, "Echo"), Napi::Function::New(env, Echo));
  return exports;
}
mymodule.Echo("a", console.log);

This prints something like:

...
null null 892
sending 0.000000
sending 224.000000
sending 448.000000
sending 672.000000
sending 896.000000
progress 896.000000
null null 896
count is 0!
Aborted (core dumped)

@mhdawson
Copy link
Member

mhdawson commented Nov 4, 2020

Can you give me all the files for the cut down example needed to build/run?

Possibly not relevant, I'm wondering if there is something going on with status part in

struct ProgressInfo {
  float progress;
  const char* status;
};

progress.Send(&pi, 1) is going to make a copy of pi, but not a deep copy of the status field (that might be just fine but just not 100% of the lifetime for the string defined in the initializer). I see that is not referred elsewhere in the code, do you still see the problem if you remove that field from ProgressInfo?

@zbjornson
Copy link
Author

Sure, full repo here: https://gist.github.com/zbjornson/5edc22bb7381e4c2a1f00aee44abe4b7 (or git clone git@gist.github.com:5edc22bb7381e4c2a1f00aee44abe4b7.git).

I pared it down further so it's just AsyncProgressWorker<float>.

@legendecas
Copy link
Member

legendecas commented Nov 8, 2020

@zbjornson thanks for your detailed re-produce and reporting here!

I investigated the re-produce and found out the issue may be related to the timing of SendProgress been called during two async callback of thread-safe function:

javascript thread worker thread
- send progress (A), activate uv_async_t
uv_async_t async callback before calling into module -
- send progress (B), override (A) before being read, re-activate uv_async_t
calling into module, AsyncProgressWorker read data (B) and clear out -
uv_async_t async callback before calling into module

-
calling into module, AsyncProgressWorker reads nothing -

As far as I can tell, the AsyncProgressWorker guarantees at least once of the progress callback. However, in the issue, the callback was called with unexpected data (data with nullptr and count with 0 - never been sent with that progress), so this is a confirmed bug.

I'll file a fix.

@legendecas legendecas added the bug label Nov 8, 2020
@zbjornson
Copy link
Author

Thank you!

kevindavies8 added a commit to kevindavies8/node-addon-api-Develop that referenced this issue Aug 24, 2022
Fixes: nodejs/node-addon-api#821
PR-URL: nodejs/node-addon-api#853
Reviewed-By: Gabriel Schulhof <gabriel.schulhof@intel.com>
Reviewed-By: Michael Dawson <midawson@redhat.com>
Marlyfleitas added a commit to Marlyfleitas/node-api-addon-Development that referenced this issue Aug 26, 2022
Fixes: nodejs/node-addon-api#821
PR-URL: nodejs/node-addon-api#853
Reviewed-By: Gabriel Schulhof <gabriel.schulhof@intel.com>
Reviewed-By: Michael Dawson <midawson@redhat.com>
wroy7860 added a commit to wroy7860/addon-api-benchmark-node that referenced this issue Sep 19, 2022
Fixes: nodejs/node-addon-api#821
PR-URL: nodejs/node-addon-api#853
Reviewed-By: Gabriel Schulhof <gabriel.schulhof@intel.com>
Reviewed-By: Michael Dawson <midawson@redhat.com>
johnfrench3 pushed a commit to johnfrench3/node-addon-api-git that referenced this issue Aug 11, 2023
Fixes: nodejs/node-addon-api#821
PR-URL: nodejs/node-addon-api#853
Reviewed-By: Gabriel Schulhof <gabriel.schulhof@intel.com>
Reviewed-By: Michael Dawson <midawson@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants