Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segfaults due to executor generation in examples #594

Closed
upsj opened this issue Jul 17, 2020 · 4 comments · Fixed by #602
Closed

Segfaults due to executor generation in examples #594

upsj opened this issue Jul 17, 2020 · 4 comments · Fixed by #602
Labels
is:bug Something looks wrong. is:good-first-issue Good for newcomers. reg:example This is related to the examples.

Comments

@upsj
Copy link
Member

upsj commented Jul 17, 2020

Some of our examples use an executor map

std::map<std::string, std::shared_ptr<gko::Executor>> exec_map{
    {"omp", omp},
    {"cuda", gko::CudaExecutor::create(0, omp, true)},
    {"hip", gko::HipExecutor::create(0, omp, true)},
    {"reference", gko::ReferenceExecutor::create()}};

Executing these tests with HIP and CUDA compiled causes a segmentation fault in my tests, probably due to the CUDA and HIP executor destroying the same device and/or library handles (hipSPARSE etc.)

This should be fixed by only creating the executor on demand, like in the benchmark examples (replacing std::shared_ptr<...> by std::function<std::shared_ptr<...>()> and using lambdas)

@upsj upsj added is:bug Something looks wrong. is:good-first-issue Good for newcomers. reg:example This is related to the examples. labels Jul 17, 2020
@upsj upsj changed the title Fix Segfaults due to executor generation in examples Segfaults due to executor generation in examples Jul 17, 2020
@yhmtsai
Copy link
Member

yhmtsai commented Jul 17, 2020

The issue should be caused by device_reset if run test with cuda/hip on the same device.
When the first one of cuda/hip is destroyed, it call the device_reset because it own num_execs is zero.
Thus, the running stuff on the second one will run on the empty.

I discussed it with @tcojean before, and he has some idea to fix it.

@upsj
Copy link
Member Author

upsj commented Jul 17, 2020

I vaguely recall us also having some issues with cuSPARSE handles being destroyed twice or something similar? Or was that also caused by the device reset?

@greole
Copy link
Collaborator

greole commented Jul 28, 2020

I created a PR. But I was wondering, since creating executors by its name is such a common operation, if it should be moved to a dedicated function?

@upsj
Copy link
Member Author

upsj commented Jul 29, 2020

I'm not entirely sure about this. Since these are examples, they should be as self-contained as possible, and I'm not sure whether users would need this functionality in the library.

tcojean added a commit that referenced this issue Aug 7, 2020
Create executor only if requested using lambda functions inside the `exec_map`.

Closes: #594
Related PR: #602
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
is:bug Something looks wrong. is:good-first-issue Good for newcomers. reg:example This is related to the examples.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants