
rpc : resource management rework #7562

Merged (2 commits into ggerganov:master, May 28, 2024)

Conversation

rgerganov (Collaborator)

This patch tries to address the concerns raised in PR #7435. We track how many times an RPC backend is referenced and deallocate its resources when this count reaches 0. The reference count is increased when a new RPC buffer is allocated or ggml_backend_rpc_init() is called, and decreased when an RPC buffer is freed or ggml_backend_rpc_free() is called.

The implementation is not thread-safe. I will address thread-safety in a follow-up patch.
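
A minimal sketch of the reference-counting approach described above; the struct fields and the helper names (rpc_backend_ref/unref) are illustrative assumptions rather than the exact patch, and, as noted, no locking is done:

#include <map>
#include <string>

// hypothetical backend state shared by buffers and backend instances
struct rpc_backend {
    int         ref_count; // number of live buffers/backend instances using this endpoint
    std::string endpoint;
    // ... socket handle, cached alignment/max size, etc.
};

// one cached rpc_backend per endpoint (not thread-safe)
static std::map<std::string, rpc_backend *> instances;

static void rpc_backend_ref(rpc_backend * ctx) {
    ctx->ref_count++; // when a buffer is allocated or ggml_backend_rpc_init() is called
}

static void rpc_backend_unref(rpc_backend * ctx) {
    // when a buffer is freed or ggml_backend_rpc_free() is called
    if (--ctx->ref_count == 0) {
        instances.erase(ctx->endpoint); // drop the cached entry
        delete ctx;                     // close the connection and release resources
    }
}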

mofosyne added the refactoring, Review Complexity: Low, and Review Complexity: Medium labels and removed the Review Complexity: Low label on May 27, 2024.
ggml-rpc.cpp (outdated)
delete rpc_ctx->buft;
delete rpc_ctx;
delete backend;
instances.erase(endpoint);

Should we lock instances here? Will we access it from different threads simultaneously?

rgerganov (Collaborator, Author)

As I said in the description, this implementation is not thread-safe. I'd like to get some feedback on the reference-count approach first, and then I will add thread-safety.

ggml-rpc.cpp (outdated)
@@ -96,27 +96,37 @@ static ggml_guid_t ggml_backend_rpc_guid() {
return &guid;
}

struct ggml_backend_rpc_buffer_type_context {
struct rpc_backend {
int ref_count;

It looks like we have two reference counts for the same object: (1) rpc_backend.ref_count and (2) the one inside rpc_backend_ptr.
IMO, it would be better to merge them.


Maybe we can try something like an intrusive shared pointer: std::enable_shared_from_this.
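
For reference, a minimal illustration of the std::enable_shared_from_this idea; the rpc_backend type and its ref() helper here are hypothetical, not the actual ggml-rpc.cpp code:

#include <memory>

// inheriting enable_shared_from_this lets an object that is already owned by a
// shared_ptr hand out additional owning references to itself
struct rpc_backend : std::enable_shared_from_this<rpc_backend> {
    std::shared_ptr<rpc_backend> ref() { return shared_from_this(); }
};

int main() {
    auto backend = std::make_shared<rpc_backend>(); // must already be managed by a shared_ptr
    auto another = backend->ref();                  // use_count() is now 2
    return 0;
}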

Collaborator

Yes, I think that the correct way to do this would be to change the pointers in instances to weak_ptr, then the backend will be automatically freed by shared_ptr on the last instance. create_rpc_backend would need to check if the weak_ptr in instances is still alive.


Yeah, that could be best: inside instances we maintain a weak_ptr<rpc_backend> without increasing the reference count, and then in create_rpc_backend we obtain a shared_ptr from the weak_ptr.
This also saves the erase call in free_rpc_backend.

rgerganov (Collaborator, Author)

Unfortunately it is not that simple. We keep a backend reference in ggml_backend_rpc_buffer_type_context and use it when allocating new buffers. If we free this reference in ggml_backend_rpc_free(), then we won't be able to allocate new buffers.

chraac (May 27, 2024)

IIRC, we have several structures holding a reference to rpc_backend here:

  1. ggml_backend_rpc_buffer_type_context
  2. ggml_backend_rpc_buffer_context
  3. ggml_backend_rpc_context
  4. instances

Structures 1-3 should hold a strong reference so that they can still allocate buffers after ggml_backend_rpc_free() is called explicitly; that's the case you mention.
As for 4, I think we can make it a weak_ptr, since the life span of ggml_backend_rpc_context is expected to be longer than that of rpc_backend?
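
A hedged sketch of the ownership split proposed in this comment, with heavily simplified struct definitions (the real contexts in ggml-rpc.cpp have more fields): the three contexts hold strong references, while the per-endpoint cache holds only weak ones.

#include <map>
#include <memory>
#include <string>

struct rpc_backend {
    std::string endpoint; // plus socket, cached alignment/max size, ...
};

// 1-3: strong references keep the connection alive while buffers/backends exist
struct ggml_backend_rpc_buffer_type_context { std::shared_ptr<rpc_backend> backend; };
struct ggml_backend_rpc_buffer_context      { std::shared_ptr<rpc_backend> backend; };
struct ggml_backend_rpc_context             { std::shared_ptr<rpc_backend> backend; };

// 4: the endpoint cache holds weak references only, so it never keeps a backend
// alive by itself and no explicit erase is needed when the last owner goes away
static std::map<std::string, std::weak_ptr<rpc_backend>> instances;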

rgerganov (Collaborator, Author)

When do we free the reference kept in ggml_backend_rpc_buffer_type_context? If you free this reference, then you can no longer allocate buffers. If you don't free it, then the backend will never be freed because of this reference.

I don't see how to solve this without an explicit reference count. I also don't see how weak_ptr can help here.

Collaborator

Buffer types are never freed, I think it would complicate the user code too much to require buffer types to be freed, for no real benefit. In the RPC backend, buffer types only need a connection during initialization and during buffer allocation. You can make a connection during initialization to obtain the alignment and max size, drop it, and only open it again once a buffer is allocated, and then rely on the shared_ptr to keep it alive until all buffers and backend instances have been freed.

This is probably complicated because the RPC backend uses a ggml_backend instance to represent a connection, but connections and ggml_backend backend instances should probably be different concepts.

rgerganov (Collaborator, Author)

OK, let's say the connection is not tied to the backend instance and we have a separate entity (e.g. rpc_connection) that is referenced with a shared_ptr. How do we obtain a shared_ptr to the connection when a new buffer is allocated?

Collaborator

I think something like this should work:

std::shared_ptr<connection_t> get_connection(const std::string & endpoint) {
    static std::map<std::string, std::weak_ptr<connection_t>> connections;

    auto it = connections.find(endpoint);
    if (it != connections.end()) {
        if (auto connection = it->second.lock()) {
            return connection;
        }
    }

    auto connection = std::make_shared<connection_t>(endpoint);
    connections[endpoint] = connection;
    return connection;
}

Use this function when you need to create a new buffer or backend instance. You can also use it during initialization of the buffer type; just use a local shared_ptr within the function and don't keep the instance.
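
A hedged usage sketch of that suggestion, reusing the get_connection() above; connection_t, the query_* helpers, and the context struct are placeholders standing in for the real socket type and RPC requests in ggml-rpc.cpp:

#include <cstddef>
#include <memory>
#include <string>

// placeholder connection type (stands in for the real socket wrapper)
struct connection_t {
    explicit connection_t(const std::string & endpoint) : endpoint(endpoint) {}
    std::string endpoint;
};

// defined in the snippet above
std::shared_ptr<connection_t> get_connection(const std::string & endpoint);

// placeholder RPC requests for the values the buffer type needs at init time
static size_t query_alignment(const std::shared_ptr<connection_t> &) { return 32; }
static size_t query_max_size (const std::shared_ptr<connection_t> &) { return size_t(1) << 30; }

struct rpc_buffer_type_context {
    std::string endpoint;
    size_t      alignment;
    size_t      max_size;
    // note: no connection member - the buffer type does not keep the connection alive
};

static rpc_buffer_type_context make_buffer_type_context(const std::string & endpoint) {
    auto connection = get_connection(endpoint); // local strong reference only
    return { endpoint, query_alignment(connection), query_max_size(connection) };
    // the connection drops here; buffer allocation calls get_connection(endpoint) again
}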

rgerganov (Collaborator, Author)

I have decoupled the backend and its corresponding socket. Sockets are cached and shared between RPC buffers.

ggml-rpc.cpp (outdated, resolved)
chraac (May 28, 2024)

Nice work!

rgerganov merged commit 2b737ca into ggerganov:master on May 28, 2024 (65 checks passed).
chraac (May 28, 2024)

I drew a picture of the backend objects here, for better understanding:

[diagram: relationships between the RPC backend objects, rendered from the Graphviz source below]

digraph rpc_objects {
    graph [splines=ortho];
    node [shape=record];

    ggml_backend_rpc_buffer_context [label="{ggml_backend_rpc_buffer_context|sock: std::shared_ptr\<socket_t\>\lbase_cache: std::unordered_map\<ggml_backend_buffer_t, void *\>\l|ggml_backend_rpc_buffer_type_alloc_buffer()\l}"];
    ggml_backend_rpc_context [label="{ggml_backend_rpc_context|endpoint: std::string\lname: std::string\l|ggml_backend_rpc_graph_compute()\l}"];
    ggml_backend_rpc_buffer_type_context [label="{ggml_backend_rpc_buffer_type_context|endpoint: std::string\lname: std::string\lalignment: size_t\lmax_size: size_t\l|ggml_backend_rpc_buffer_type()\l}"];
    socket_t [label="{socket_t|+ fd: sockfd_t\l|}"];

    ggml_backend_rpc_buffer_context -> socket_t [label="strong reference" style=solid];
    ggml_backend_rpc_context -> "get_socket(const std::string &)::sockets" [label="get 'socket_t' by 'endpoint'" style=dashed];
    ggml_backend_rpc_buffer_type_context -> "get_socket(const std::string &)::sockets" [label="get 'socket_t' by 'endpoint'" style=dashed];
    "get_socket(const std::string &)::sockets" -> socket_t [label="weak reference" style=dashed];
}
