
MultiplayerSynchronizer get_cached_object miss returning ERR_UNAUTHORIZED on server relay #58193

Closed
MitchMakesThings opened this issue Feb 16, 2022 · 7 comments · Fixed by #58400

Comments

@MitchMakesThings

Godot version

Godot Engine v4.0.alpha2.official.79077e6c1

System information

Windows 10, Vulkan API 1.2.0

Issue description

Using the new MultiplayerSynchronizer node throws errors when clients have authority and the server is relaying from one client to another. This isn't an issue when the server has authority over the MultiplayerSynchronizer.

On clients, I get:
E 0:00:13:0487  get_cached_object: ID 2 not found in cache of peer 1719405227.
  <C++ Error>   Condition "!F" is true. Returning: nullptr
  <C++ Source>  scene/multiplayer/scene_cache_interface.cpp:235 @ get_cached_object()
E 0:00:13:0487  on_sync_receive: Condition "!sync || sync->get_multiplayer_authority() != p_from" is true. Returning: ERR_UNAUTHORIZED
  <C++ Source>  scene/multiplayer/scene_replication_interface.cpp:381 @ on_sync_receive()
The latter is caused by the former returning a nullptr. I started looking into the cached object stuff, but I'm not really familiar with the Godot codebase.
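For illustration, here is a tiny standalone model of why the second error follows from the first (not the actual Godot source, just mirroring the condition from the log above, with MultiplayerSynchronizer as a stand-in type): the cache miss hands back a null synchronizer, so the authority check can never pass and the packet is rejected as unauthorized.

#include <cstdio>

// Stand-in type for the sketch; not the engine class.
struct MultiplayerSynchronizer {
    int authority = 0;
    int get_multiplayer_authority() const { return authority; }
};

int main() {
    const MultiplayerSynchronizer *sync = nullptr; // what get_cached_object() yields on a miss
    const int p_from = 1719405227;                 // sending peer id taken from the log above
    if (!sync || sync->get_multiplayer_authority() != p_from) {
        std::printf("ERR_UNAUTHORIZED\n");         // mirrors the logged condition
    }
    return 0;
}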

Unrelated, but the server also logs an error when closing one of the clients:
E 0:00:30:0819  _relay: Condition "!peers.has(p_to)" is true.
  <C++ Source>  modules/enet/enet_multiplayer_peer.cpp:601 @ _relay()
There might not be anything different to do here - client A has asked the server to relay a message to client B, but B has disconnected. Logging and forgetting about the message seems sensible.

Steps to reproduce

Use the multiplayer demo project @Faless uploaded on #55950

  1. Open 3 instances
  2. Host from 1, join from the other 2.
  3. Start the game from the server instance
  4. Watch as errors start flooding in
  5. Close one client instance and note the server error

Minimal reproduction project

multiplayer_bomber_spawn_node.zip

@Calinou Calinou added this to the 4.0 milestone Feb 16, 2022
@Faless
Collaborator

Faless commented Feb 16, 2022

Interestingly, this only happens when there is more than one client connected to the server, so my guess is a bug in the way path-only syncs (i.e. synchronizers whose root object was not spawned directly) are exchanged. Will need some further investigation, but confirmed.

@MitchMakesThings
Author

I've found another slightly different use case that seems to do the same thing, and is the smallest example I can think of.
SyncBug.zip

Here we've got a node that is part of the default scene in the game, so it's loaded on clients by default (no MultiplayerSpawner spawning it across the network!).
No errors with just server & 1 client, but once you have more than one client all the clients start reporting the error.

Reproduction steps:
  1. Run 1 instance as the server by passing the --server command line argument (i.e. C:\Users\Mitch\Documents\godot\Godot_v4.0-alpha2_win64.exe --path . --server - there's also a batch file in the zip)
  2. Run 2 instances as clients (i.e. using the cool new Debug > Run multiple instances option :) )

Errors should start piling up on both client instances.

@Faless Faless self-assigned this Feb 16, 2022
@Faless Faless moved this to To Assess in 4.x Priority Issues Feb 16, 2022
@Faless Faless moved this from To Assess to Todo in 4.x Priority Issues Feb 16, 2022
@MitchMakesThings
Author

I tried looking into this a little further.
I want to blame the if (net_id == 0) { check in SceneReplicationInterface::_send_sync (scene/multiplayer/scene_replication_interface.cpp).

If I'm understanding correctly, for path-only nodes the net_id gets populated when we first sync to the first client, so we won't call send_object_cache for subsequent clients.
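As a rough standalone sketch of that suspicion (a model of the control flow, not the engine code): because net_id is stored per object rather than per peer, only the first peer ever receives the path packet, and every later peer misses the cache.

#include <cstdint>
#include <cstdio>
#include <initializer_list>
#include <set>

int main() {
    uint32_t net_id = 0;            // stored once per object, not per peer
    std::set<int> peers_with_path;  // peers that actually received the path packet

    for (int peer : {2, 3}) {       // sync the same path-only object to two peers
        if (net_id == 0) {          // the check being blamed above
            net_id = 1;                   // "send_object_cache" assigns an id...
            peers_with_path.insert(peer); // ...but only this first peer gets the packet
        }
        bool cache_hit = peers_with_path.count(peer) > 0;
        std::printf("peer %d: cache %s\n", peer, cache_hit ? "hit" : "MISS");
    }
    return 0;
}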

I don't fully have my head around the net_id stuff. My first instinct was to replace the if (net_id == 0) { check with if (!multiplayer->is_cache_confirmed(rel_path, p_peer)) {.
This "works" - using the bomberman demo the movements of all clients & the server are synced as expected. And looking at send_object_cache we're getting the net_id from path_send_cache, so I don't think this change would break the net_id.

However, it does raise a warning (ERR_FAIL_COND_V(!F, false); // Should never happen.) and I'm not confident enough to say the warning could be removed.

Am I heading in the right direction?

MitchMakesThings added a commit to MitchMakesThings/Godot-Things that referenced this issue Feb 20, 2022
… network authority, per discussion with Faless.

Note: This relies on a custom Godot build, due to path-only sync bug with Alpha 2 (godotengine/godot#58193)
@Faless
Collaborator

Faless commented Feb 21, 2022

@MitchMakesThings thanks a lot for looking into this!
You're definitely heading in the right direction, but always setting the net_id based on the cache ID will cause problems with spawned nodes.
The cache system was developed separately for RPCs (it dates back to 3.0), and replication now uses it in conjunction with its own net_id assigned during spawn (which is delivered along with the spawn state to minimize delay).
At some point, the two should be unified, but as a stop-gap to fix this issue we can try this:

diff --git a/scene/multiplayer/scene_replication_interface.cpp b/scene/multiplayer/scene_replication_interface.cpp
index 25a704b37e..0764f136e4 100644
--- a/scene/multiplayer/scene_replication_interface.cpp
+++ b/scene/multiplayer/scene_replication_interface.cpp
@@ -350,11 +350,12 @@ void SceneReplicationInterface::_send_sync(int p_peer, uint64_t p_msec) {
                }
                if (size) {
                        uint32_t net_id = rep_state->get_net_id(oid);
-                       if (net_id == 0) {
+                       if (net_id == 0 || (net_id & 0x80000000)) {
                                // First time path based ID.
                                NodePath rel_path = multiplayer->get_root_path().rel_path_to(sync->get_path());
                                int path_id = 0;
                                multiplayer->send_object_cache(sync, rel_path, p_peer, path_id);
+                               ERR_CONTINUE_MSG(net_id && net_id != (uint32_t(path_id) | 0x80000000), "This should never happen!");
                                net_id = path_id;
                                rep_state->set_net_id(oid, net_id | 0x80000000);
                        }

@MitchMakesThings
Author

Man, I definitely need to brush up on my C++!
So 0x80000000 is just an arbitrary bit to flag whether something has been processed by scene replication or not?

Doesn't that mean that at the bottom of the if statement, when we call rep_state->set_net_id(oid, net_id | 0x80000000);, we're setting the processed flag?
So the if statement might as well be if (true), since the first client will have net_id == 0 and subsequent clients will have (net_id & 0x80000000) set?

Wouldn't we need to know whether it's been cached for that particular p_peer?
Or is that what you mean about needing to unify the cache system & net_id system in future, and as a stop gap we'll just do the "First time" logic every time?

Is the idea of net_id that it's the unique identifier for a node across all peers? That way we have a nice small number as an ID, to keep packet sizes down?
Sorry for the barrage of questions, it's taking me a while to wrap my head around everything :)

@Faless
Collaborator

Faless commented Feb 21, 2022

So 0x80000000 is just an arbitrary bit to flag whether something has been processed by scene replication or not?

0x80000000 is a bit flag used internally by the SceneReplicationInterface to distinguish net_ids assigned during spawn from those assigned via the cache interface (aka path_only sync, i.e. those that have a MultiplayerSynchronizer whose root object is not spawned by a MultiplayerSpawner).

What the if is doing is checking whether the net_id is empty or an already-assigned path_only ID. In that case it must be a path_only sync, so we ensure the object cache has been sent; send_object_cache only resends when needed, reusing the same id across multiple peers for the given node (although that check is very unoptimized for now).
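A standalone sketch of that convention (a model, not the engine source): spawn-assigned ids are plain values, path_only ids carry the high bit, and the patched condition re-enters the "first time" branch for both the empty and the path_only case.

#include <cstdint>
#include <cstdio>

// Mirrors the patched condition: net_id == 0 || (net_id & 0x80000000).
static bool needs_path_cache(uint32_t net_id) {
    return net_id == 0 || (net_id & 0x80000000);
}

int main() {
    uint32_t unassigned = 0;                 // never synced yet
    uint32_t spawned    = 42;                // assigned during spawn
    uint32_t path_only  = 7u | 0x80000000u;  // assigned via send_object_cache()

    std::printf("unassigned: %d\n", needs_path_cache(unassigned)); // 1 -> send the path cache
    std::printf("spawned:    %d\n", needs_path_cache(spawned));    // 0 -> skip, id came from spawn
    std::printf("path_only:  %d\n", needs_path_cache(path_only));  // 1 -> re-check per peer
    return 0;
}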

@MitchMakesThings
Author

Ah, that's the bit I was missing!
So we'll be calling send_object_cache all the time for path-only nodes, but it only sends the simplify path packet to clients that haven't confirmed yet.
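Something like this standalone model captures that behaviour (hypothetical names, not the engine implementation): the same id is reused for every peer, and a packet only goes out to peers that haven't been sent the path yet.

#include <cstdint>
#include <cstdio>
#include <map>
#include <set>
#include <string>

struct PathCacheModel {
    uint32_t next_id = 1;
    std::map<std::string, uint32_t> ids;             // path -> id, shared by every peer
    std::map<std::string, std::set<int>> confirmed;  // path -> peers that received the packet

    // Returns the id for the path; `sent` reports whether a packet went to this peer.
    uint32_t send_object_cache(const std::string &path, int peer, bool &sent) {
        if (ids.find(path) == ids.end()) {
            ids[path] = next_id++;
        }
        sent = confirmed[path].insert(peer).second;  // true only the first time per peer
        return ids[path];
    }
};

int main() {
    PathCacheModel cache;
    bool sent = false;
    cache.send_object_cache("Player1/Sync", 2, sent); std::printf("peer 2: sent=%d\n", sent);
    cache.send_object_cache("Player1/Sync", 3, sent); std::printf("peer 3: sent=%d\n", sent);
    cache.send_object_cache("Player1/Sync", 2, sent); std::printf("peer 2 again: sent=%d\n", sent);
    return 0;
}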

Thanks for the explanation!

I've tested your change against the two example projects in this thread, and it seems to be working perfectly :)

@Faless Faless moved this from Todo to Done in 4.x Priority Issues Feb 21, 2022
@Faless Faless moved this from Done to In Progress in 4.x Priority Issues Feb 21, 2022
Repository owner moved this from In Progress to Done in 4.x Priority Issues Feb 21, 2022