Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Hexagon][Runtime] Add QuRT thread pool backend #11018

Merged
merged 31 commits into from
May 3, 2022

Conversation

supersat
Copy link
Contributor

@supersat supersat commented Apr 14, 2022

pthreads (and thus std::thread) does not work correctly on older versions of QuRT. This PR modifies threading_backend.cc to use the QuRT thread APIs directly when building for Hexagon devices.

@kparzysz-quic @areusch @csullivan

cc @mehrdadh

@github-actions github-actions bot requested a review from mehrdadh April 18, 2022 23:03
Copy link
Contributor

@huajsj huajsj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

Comment on lines 87 to 89
qurt_thread_t thread;
Callback f;
void* stack;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

qurt_thread_t thread_;
Callback f_;
void* stack_;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stack_ = nullptr

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thread_, f_, and stack_ have been renamed.

Now that the return value of posix_memalign is checked, stack_ is guaranteed to be initialized.

src/runtime/threading_backend.cc Outdated Show resolved Hide resolved
src/runtime/threading_backend.cc Outdated Show resolved Hide resolved
src/runtime/threading_backend.cc Outdated Show resolved Hide resolved
src/runtime/threading_backend.cc Outdated Show resolved Hide resolved
src/runtime/threading_backend.cc Show resolved Hide resolved
src/runtime/threading_backend.cc Outdated Show resolved Hide resolved
src/runtime/threading_backend.cc Show resolved Hide resolved
Copy link
Contributor

@areusch areusch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @supersat, one question for you

#endif
#include <algorithm>
#include <thread>
#define CURRENT_THREAD_HANDLE (static_cast<std::thread::native_handle_type>(0))
namespace tvm {
namespace runtime {
namespace threading {
#ifdef __hexagon__
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wondering if it's possible to do this with templating instead of with #ifdef __hexagon__?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To a certain extent, although it's not clear it's worth the effort. As far as I can tell are a couple options:

  • Have multiple classes that implement the ThreadGroup::Impl interface -- one for pthreads, and one for QuRT threads. @csullivan was concerned that this might lead to duplicated code, making it more fragile to maintain.
  • Parameterizing ThreadPool::Impl on the underlying thread type. However, we'd still need #ifdefs to avoid calling SetAffinity, which isn't available on Hexagon.

Yield() also doesn't work as-is on Hexagon, so we'd need a fix for that as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One approach would be to introduce

template <typename ThreadType>
class ThreadGroup::Impl;

and then wrap both std::thread and QuRT thread into types with a common interface that ThreadGroup::Impl calls into.

The thing I got hung up on there is the runtime dispatch. Right now it should be doable from the device type -- when it is kDLHexagon dispatch to ThreadGroup::Impl<QuRTThreadInterface> when it's kDLCPU dispatch to TheadGroup::Impl<StdThreadInterface>. However there is impetus to move Hexagon fully over to kDLCPU -- wherein we could no longer do runtime dispatch based on the device type.

Copy link
Contributor

@csullivan csullivan Apr 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then the only other options for dispatch I see are

(1) TVM compile-time dispatch -- during codegen, do something specific based on the target
(2) Build time dispatch via either preprocessor or through changes to the build system to conditionally include one translation unit over another.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that abstracting thread control would be the best approach here. The runtime built for a particular target could then contain the implementation provided by that target (e.g. via template specialization), and there wouldn't be any run-time dependency on device kind.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've refactored this PR to split up pthread and qurt threading implementations. This introduces the following changes:

  • ThreadGroup::Impl is now an abstract class.
  • A ThreadGroupImplTemplate class is introduced with common code between pthread and qurt implementations. It is a subclass of ThreadGroup::Impl. (AFAICT, you can't have a pointer to an unspecialized template class. We need a pointer to a concrete type for ThreadGroup to call.)
  • ThreadGroupPosixImpl now contains the bulk of the code from ThreadGroup::Impl. It inherits from ThreadGroupImplTemplatestd::thread. This is in a new file src/runtime/posix/threading_posix.cc.
  • ThreadGroupHexagonImpl inherits from ThreadGroupImplTemplate, which are both defined in a new file, src/runtime/hexagon/threading_hexagon.cc.
  • There are now two different versions of Yield()
  • CMakeLists.txt has been modified to either include src/runtime/posix/threading_posix.cc or src/runtime/hexagon/threading_hexagon.cc, depending on whether we're building for Hexagon.

Honestly, this seems like a pretty ugly solution, especially when threading_backend.cc is already littered with #ifdefs for various platforms. Thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@areusch @kparzysz-quic PTAL. I'm inclined to take @supersat's recommendation if this approach is too ugly. But we'd like to keep forward progress on this one. Thanks!

src/runtime/threading_backend.cc Show resolved Hide resolved
CHECK_EQ(ret, QURT_EOK);
}
QuRTThread(QuRTThread&& other) : thread_(other.thread_), f_(other.f_), stack_(other.stack_) {
other.thread_ = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

although line 73 already have the logic to avoid double free, after the construction from 'Rvalue' logically still need to clear member value(stack_, f_) of 'other', because these value already not owned by "other". doing this can help the future maintains.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

free(NULL) is defined to be a no-op, but I added an explicit check anyway.

src/runtime/threading_backend.cc Outdated Show resolved Hide resolved
src/runtime/threading_backend.cc Outdated Show resolved Hide resolved
src/runtime/threading_backend.cc Show resolved Hide resolved
src/runtime/threading_backend.cc Outdated Show resolved Hide resolved
Copy link
Contributor

@csullivan csullivan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, @areusch @kparzysz-quic please follow up on your above change requests. We'd like to converge on this early next week.

@@ -110,6 +110,37 @@ class ThreadGroup {
Impl* impl_;
};

class ThreadGroup::Impl {
public:
virtual void Join() {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"virtual void Join() = 0" or {assert(0);} to make sure inherit class have a dedicated "Join" and get called.

virtual void Join() {}
virtual int Configure(AffinityMode mode, int nthreads, bool exclude_worker0,
std::vector<unsigned int> cpus) = 0;
virtual ~Impl() { Join(); }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like here is to expect ThreadGroupImplTemplate:Join get triggered, but that may not happen, because the calling is in a deconstruct function.

if (t.joinable()) t.join();
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

~ThreadGroupImplTemplate() { Join();}

@supersat supersat force-pushed the qurt-thread-pool branch from b4c4277 to 24f14f7 Compare May 2, 2022 04:17
@supersat
Copy link
Contributor Author

supersat commented May 2, 2022

After discussing with @csullivan, I've reverted the refactoring changes. Those changes relied on separate source files being built for separate targets, but PR #11090 introduces changes that build Hexagon sources for all targets, necessitating more #ifdefs.

Eventually the build system should only build target-specific sources when building for that target. There's a lot of cruft already in TVM because this isn't the case (e.g., #ifdef WIN32 in hexagon_buffer.cc and others). However, fixing that is a non-trivial task, so we plan to leave that to a separate PR.

Copy link
Contributor

@huajsj huajsj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code LGTM, about the refactor issue I agree with @supersat about that using a separate PR should can make the feature/commit be more clear, Thanks @supersat.

@csullivan csullivan requested review from areusch and kparzysz-quic May 2, 2022 18:20
@csullivan csullivan merged commit 530091a into apache:main May 3, 2022
@csullivan
Copy link
Contributor

Thanks @supersat @huajsj @areusch @kparzysz-quic, this PR is merged.

@supersat supersat deleted the qurt-thread-pool branch May 11, 2022 20:18
shtinsa pushed a commit to Deelvin/tvm that referenced this pull request May 17, 2022
* Initial take on adding QuRT thread support to TVM's thread pool. WIP; crashes

* Allocate QuRT thread stacks automatically

* Remove duplicate stack in QuRTThread

* Add more logging to QuRTThread

* Use QuRT mutexes and condition variables

* Get QuRT thread pools working perhaps

* Sleep for a little bit to let race condition bugs shine through

* ayeee it works!

* Remove custom hexagon implementations of std::mutex and std::condition_variable

* threading_backend.cc code cleanup

* Formatting changes

* remove hexagon debugging

* Initial take on adding QuRT thread support to TVM's thread pool. WIP; crashes

* Allocate QuRT thread stacks automatically

* Remove duplicate stack in QuRTThread

* Add more logging to QuRTThread

* Use QuRT mutexes and condition variables

* Get QuRT thread pools working perhaps

* Sleep for a little bit to let race condition bugs shine through

* ayeee it works!

* Remove custom hexagon implementations of std::mutex and std::condition_variable

* threading_backend.cc code cleanup

* Formatting changes

* remove hexagon debugging

* Add hexagon thread pool test

* style fixes for tests/python/contrib/test_hexagon/test_thread_pool.py

* Fix some style issues

* Address some reviewer comments
SebastianBoblest pushed a commit to SebastianBoblest/tvm that referenced this pull request May 27, 2022
* Initial take on adding QuRT thread support to TVM's thread pool. WIP; crashes

* Allocate QuRT thread stacks automatically

* Remove duplicate stack in QuRTThread

* Add more logging to QuRTThread

* Use QuRT mutexes and condition variables

* Get QuRT thread pools working perhaps

* Sleep for a little bit to let race condition bugs shine through

* ayeee it works!

* Remove custom hexagon implementations of std::mutex and std::condition_variable

* threading_backend.cc code cleanup

* Formatting changes

* remove hexagon debugging

* Initial take on adding QuRT thread support to TVM's thread pool. WIP; crashes

* Allocate QuRT thread stacks automatically

* Remove duplicate stack in QuRTThread

* Add more logging to QuRTThread

* Use QuRT mutexes and condition variables

* Get QuRT thread pools working perhaps

* Sleep for a little bit to let race condition bugs shine through

* ayeee it works!

* Remove custom hexagon implementations of std::mutex and std::condition_variable

* threading_backend.cc code cleanup

* Formatting changes

* remove hexagon debugging

* Add hexagon thread pool test

* style fixes for tests/python/contrib/test_hexagon/test_thread_pool.py

* Fix some style issues

* Address some reviewer comments
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants