
rpc : resource management rework #7562

Merged (2 commits into ggerganov:master, May 28, 2024)

Conversation

rgerganov (Collaborator)

This patch tries to address the concerns raised in PR #7435. We track how many times an RPC backend is referenced and deallocate its resources when this count reaches 0. The reference count is increased when a new RPC buffer is allocated or ggml_backend_rpc_init() is called, and decreased when an RPC buffer is freed or ggml_backend_rpc_free() is called.

The implementation is not thread-safe. I will address thread-safety in a follow-up patch.
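
A minimal sketch of the reference-counting approach described above; the struct fields and the helper names (rpc_backend_ref/unref) are illustrative assumptions rather than the exact patch, and, as noted, no locking is done:

#include <map>
#include <string>

// hypothetical backend state shared by buffers and backend instances
struct rpc_backend {
    int         ref_count; // number of live buffers/backend instances using this endpoint
    std::string endpoint;
    // ... socket handle, cached alignment/max size, etc.
};

// one cached rpc_backend per endpoint (not thread-safe)
static std::map<std::string, rpc_backend *> instances;

static void rpc_backend_ref(rpc_backend * ctx) {
    ctx->ref_count++; // when a buffer is allocated or ggml_backend_rpc_init() is called
}

static void rpc_backend_unref(rpc_backend * ctx) {
    // when a buffer is freed or ggml_backend_rpc_free() is called
    if (--ctx->ref_count == 0) {
        instances.erase(ctx->endpoint); // drop the cached entry
        delete ctx;                     // close the connection and release resources
    }
}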

mofosyne added the refactoring, Review Complexity: Low, and Review Complexity: Medium labels and removed the Review Complexity: Low label on May 27, 2024.
ggml-rpc.cpp (outdated)
delete rpc_ctx->buft;
delete rpc_ctx;
delete backend;
instances.erase(endpoint);

Should we lock instances here? Will we access it from different threads simultaneously?

rgerganov (Collaborator, Author)

As I said in the description, this implementation is not thread-safe. I'd like to get some feedback on the reference-count approach first, and then I will add thread-safety.

ggml-rpc.cpp (outdated)
@@ -96,27 +96,37 @@ static ggml_guid_t ggml_backend_rpc_guid() {
return &guid;
}

struct ggml_backend_rpc_buffer_type_context {
struct rpc_backend {
int ref_count;

It looks like we have two reference counts for the same object: (1) rpc_backend.ref_count and (2) the one inside rpc_backend_ptr.
IMO, it would be better to merge them.


Maybe we can try something like an intrusive shared pointer: std::enable_shared_from_this.
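
For reference, a minimal illustration of the std::enable_shared_from_this idea; the rpc_backend type and its ref() helper here are hypothetical, not the actual ggml-rpc.cpp code:

#include <memory>

// inheriting enable_shared_from_this lets an object that is already owned by a
// shared_ptr hand out additional owning references to itself
struct rpc_backend : std::enable_shared_from_this<rpc_backend> {
    std::shared_ptr<rpc_backend> ref() { return shared_from_this(); }
};

int main() {
    auto backend = std::make_shared<rpc_backend>(); // must already be managed by a shared_ptr
    auto another = backend->ref();                  // use_count() is now 2
    return 0;
}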

Collaborator

Yes, I think that the correct way to do this would be to change the pointers in instances to weak_ptr, then the backend will be automatically freed by shared_ptr on the last instance. create_rpc_backend would need to check if the weak_ptr in instances is still alive.


Yeah, that could be best: inside instances we maintain a weak_ptr<rpc_backend> without increasing the reference count, and then in create_rpc_backend we obtain a shared_ptr from the weak_ptr.
This also saves the erase call in free_rpc_backend.

rgerganov (Collaborator, Author)

Unfortunately it is not that simple. We keep a backend reference in ggml_backend_rpc_buffer_type_context and use it when allocating new buffers. If we free this reference in ggml_backend_rpc_free(), then we won't be able to allocate new buffers.

chraac (May 27, 2024)

IIRC, we have several structures holding a reference to rpc_backend here:

  1. ggml_backend_rpc_buffer_type_context
  2. ggml_backend_rpc_buffer_context
  3. ggml_backend_rpc_context
  4. instances

Structures 1-3 should hold a strong reference so that they can still allocate buffers after ggml_backend_rpc_free() is called explicitly; that's the case you mention.
As for 4, I think we can make it a weak_ptr, since the life span of ggml_backend_rpc_context is expected to be longer than that of rpc_backend?
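
A hedged sketch of the ownership split proposed in this comment, with heavily simplified struct definitions (the real contexts in ggml-rpc.cpp have more fields): the three contexts hold strong references, while the per-endpoint cache holds only weak ones.

#include <map>
#include <memory>
#include <string>

struct rpc_backend {
    std::string endpoint; // plus socket, cached alignment/max size, ...
};

// 1-3: strong references keep the connection alive while buffers/backends exist
struct ggml_backend_rpc_buffer_type_context { std::shared_ptr<rpc_backend> backend; };
struct ggml_backend_rpc_buffer_context      { std::shared_ptr<rpc_backend> backend; };
struct ggml_backend_rpc_context             { std::shared_ptr<rpc_backend> backend; };

// 4: the endpoint cache holds weak references only, so it never keeps a backend
// alive by itself and no explicit erase is needed when the last owner goes away
static std::map<std::string, std::weak_ptr<rpc_backend>> instances;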

rgerganov (Collaborator, Author)

When do we free the reference kept in ggml_backend_rpc_buffer_type_context? If you free this reference, then you can no longer allocate buffers. If you don't free it, then the backend will never be freed because of this reference.

I don't see how to solve this without an explicit reference count. I also don't see how weak_ptr can help here.

Collaborator

Buffer types are never freed, I think it would complicate the user code too much to require buffer types to be freed, for no real benefit. In the RPC backend, buffer types only need a connection during initialization and during buffer allocation. You can make a connection during initialization to obtain the alignment and max size, drop it, and only open it again once a buffer is allocated, and then rely on the shared_ptr to keep it alive until all buffers and backend instances have been freed.

This is probably complicated because the RPC backend uses a ggml_backend instance to represent a connection, but connections and ggml_backend backend instances should probably be different concepts.

rgerganov (Collaborator, Author)

OK, let's say the connection is not tied to the backend instance and we have a separate entity (e.g. rpc_connection) that is referenced with a shared_ptr. How do we obtain a shared_ptr to the connection when a new buffer is allocated?

Collaborator

I think something like this should work:

std::shared_ptr<connection_t> get_connection(const std::string & endpoint) {
    static std::map<std::string, std::weak_ptr<connection_t>> connections;

    auto it = connections.find(endpoint);
    if (it != connections.end()) {
        if (auto connection = it->second.lock()) {
            return connection;
        }
    }

    auto connection = std::make_shared<connection_t>(endpoint);
    connections[endpoint] = connection;
    return connection;
}

Use this function when you need to create a new buffer or backend instance. You can also use it during initialization of the buffer type; just use a local shared_ptr within the function and don't keep the instance.
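
A hedged usage sketch of that suggestion, reusing the get_connection() above; connection_t, the query_* helpers, and the context struct are placeholders standing in for the real socket type and RPC requests in ggml-rpc.cpp:

#include <cstddef>
#include <memory>
#include <string>

// placeholder connection type (stands in for the real socket wrapper)
struct connection_t {
    explicit connection_t(const std::string & endpoint) : endpoint(endpoint) {}
    std::string endpoint;
};

// defined in the snippet above
std::shared_ptr<connection_t> get_connection(const std::string & endpoint);

// placeholder RPC requests for the values the buffer type needs at init time
static size_t query_alignment(const std::shared_ptr<connection_t> &) { return 32; }
static size_t query_max_size (const std::shared_ptr<connection_t> &) { return size_t(1) << 30; }

struct rpc_buffer_type_context {
    std::string endpoint;
    size_t      alignment;
    size_t      max_size;
    // note: no connection member - the buffer type does not keep the connection alive
};

static rpc_buffer_type_context make_buffer_type_context(const std::string & endpoint) {
    auto connection = get_connection(endpoint); // local strong reference only
    return { endpoint, query_alignment(connection), query_max_size(connection) };
    // the connection drops here; buffer allocation calls get_connection(endpoint) again
}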

rgerganov (Collaborator, Author)

I have decoupled the backend and its corresponding socket. Sockets are cached and shared between RPC buffers.

ggml-rpc.cpp (outdated, resolved)
chraac (May 28, 2024)

Nice work!

rgerganov merged commit 2b737ca into ggerganov:master on May 28, 2024 (65 checks passed).
chraac (May 28, 2024)

I drew a picture of the backend objects here, for better understanding:

[diagram: relationships between the RPC backend objects, rendered from the Graphviz source below]

digraph rpc_objects {
    graph [splines=ortho];
    node [shape=record];

    ggml_backend_rpc_buffer_context [label="{ggml_backend_rpc_buffer_context|sock: std::shared_ptr\<socket_t\>\lbase_cache: std::unordered_map\<ggml_backend_buffer_t, void *\>\l|ggml_backend_rpc_buffer_type_alloc_buffer()\l}"];
    ggml_backend_rpc_context [label="{ggml_backend_rpc_context|endpoint: std::string\lname: std::string\l|ggml_backend_rpc_graph_compute()\l}"];
    ggml_backend_rpc_buffer_type_context [label="{ggml_backend_rpc_buffer_type_context|endpoint: std::string\lname: std::string\lalignment: size_t\lmax_size: size_t\l|ggml_backend_rpc_buffer_type()\l}"];
    socket_t [label="{socket_t|+ fd: sockfd_t\l|}"];

    ggml_backend_rpc_buffer_context -> socket_t [label="strong reference" style=solid];
    ggml_backend_rpc_context -> "get_socket(const std::string &)::sockets" [label="get 'socket_t' by 'endpoint'" style=dashed];
    ggml_backend_rpc_buffer_type_context -> "get_socket(const std::string &)::sockets" [label="get 'socket_t' by 'endpoint'" style=dashed];
    "get_socket(const std::string &)::sockets" -> socket_t [label="weak reference" style=dashed];
}
