Networked Replication #19

Closed
wants to merge 43 commits into from

maniwani

RENDERED

Proposes an implementation of engine features for developing networked games. Main focus is replication with key interest in providing it transparently (i.e. minimal, if any, networking boilerplate).

implementation_details.md (outdated)
@vladbat00

vladbat00 commented Apr 27, 2021

Can't add anything useful at the moment, but just want to thank you for the effort you've put into this RFC. I think this might turn into something quite unique among general-purpose engines and extremely valuable for multiplayer game developers.

I've been developing my own prototype of a multiplayer game in Bevy. I'm by no means an expert in writing netcode, but so far this is my most successful attempt. I thought I'd share it; perhaps it will give some insight into what a multiplayer game might need, which patterns we could incorporate into the engine (and make easier to use), or which patterns to avoid. :)

vladbat00/muddle-run#7 (still work in progress, but it already works for desktop and is possible to play around with)

cargo run -p mr_server --features "use-udp"
cargo run -p mr_desktop_client

I've implemented the following features:

  • Entity interpolation
  • Storing framebuffers of the components that are synced
  • Lag compensation
  • Rewinding world state
  • Adjustable client speed (inspired by the Overwatch GDC presentation, to mitigate latency fluctuations)

I don't claim they're perfectly executed or bug-free, but at least you can control the spawned cubes with WASD, although the movement is a bit jittery.

## "Clock" Synchronization
Ideally, clients predict ahead by just enough to have their inputs reach the server right before they're needed. People often try to have clients estimate the clock time on the server (with some SNTP handshake) and use that to schedule the next simulation step, but that's overly complex.

What we really care about is: How much time passes between when the server receives my input and when that input is consumed? If the server simply tells clients how long their inputs are waiting in its buffer, the clients can use that information to converge on the correct lead.

Can't clients just estimate how far ahead they need to be based on RTT? They can usually guess the current frame number on the server (assuming it comes with updates), they know how long on average it takes for their packets to be acknowledged, and they can also estimate jitter and packet loss.

Or is it a good practice for servers to tell by how much a client has to be ahead explicitly?

Author

maniwani commented Apr 27, 2021

IMO guessing is more work, not less.

What we're trying to control is the predict-ahead amount. It's simpler and more accurate to just do that directly.

RTT can only be approximated, whereas the server can measure exactly how long inputs spend in its buffer (which is more informative, too, since it "contains" both latency and jitter). Loss can be interpreted as a negative duration, so it doesn't need to be estimated either.

Or is it a good practice for servers to tell by how much a client has to be ahead explicitly?

This can be an entirely client-side process. In my example, the server only tells clients how far ahead they are (sent with the updates). Clients can decide for themselves how far ahead to be.
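The client-side half of this can be sketched as a small controller, assuming the server attaches to each update how long (in seconds) the client's latest input waited in its buffer, with a late input reported as a negative wait as described above. `LeadController`, `target_wait` and `gain` are hypothetical names, and the proportional correction is just one simple way to converge, not necessarily what either prototype does:

```rust
/// Hypothetical client-side controller for the predict-ahead amount ("lead").
/// The server reports how long the client's most recent input waited in its
/// buffer before being consumed; a negative value means it arrived too late.
pub struct LeadController {
    /// Current predict-ahead amount, in seconds.
    pub lead: f64,
    /// How long we want inputs to wait on the server (safety margin), in seconds.
    pub target_wait: f64,
    /// Fraction of the error corrected per report, in 0.0..=1.0.
    pub gain: f64,
    /// Bounds that keep the lead sane.
    pub min_lead: f64,
    pub max_lead: f64,
}

impl LeadController {
    pub fn new(initial_lead: f64, target_wait: f64, gain: f64) -> Self {
        LeadController { lead: initial_lead, target_wait, gain, min_lead: 0.0, max_lead: 1.0 }
    }

    /// Feed one server report and return the adjusted lead.
    pub fn on_reported_wait(&mut self, reported_wait: f64) -> f64 {
        // Waiting longer than the target means we are too far ahead, so shrink
        // the lead; waiting less (or arriving late, i.e. negative) means grow it.
        let error = reported_wait - self.target_wait;
        self.lead = (self.lead - self.gain * error).clamp(self.min_lead, self.max_lead);
        self.lead
    }
}
```

A client could then approach the new lead gradually, for example by running fixed steps slightly faster or slower, much like the adjustable client speed mentioned earlier in the thread.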


3. Always roll back and re-simulate.

Now, if you're thinking that's wasteful, the "if mispredicted" gives you a false sense of security. If I make a game and claim it can roll back 250ms, that should mean *any* 250ms, with no stuttering. If clients *always* roll back and re-sim, it'll be easier to profile and optimize for that. As a bonus, clients never need to store old predicted states.

vladbat00 commented Apr 27, 2021

Good point.

I just wanted to clarify this bit:

As a bonus, clients never need to store old predicted states.

It means that we still store the history of authoritative updates, but clients can simply avoid adding predicted states on top of those, correct? (I.e. they can store just the latest one.) And we still need to store a buffer of local players' commands, to be able to re-sim the predicted states.

Author

Yes, that's exactly it.
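A minimal sketch of the "always roll back and re-simulate" loop agreed on here, assuming a deterministic fixed-step `step` function and a toy one-dimensional state: the client keeps only the latest authoritative state plus a buffer of unconfirmed local inputs, and every prediction starts over from that state. `State`, `Input`, `step` and `Predictor` are hypothetical stand-ins for the networked world and its simulation schedule:

```rust
use std::collections::VecDeque;

/// Toy networked state: a tick number and one player's 1D position.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct State {
    pub tick: u32,
    pub x: i32,
}

/// A local input, tagged with the tick it should be applied on.
#[derive(Clone, Copy, Debug)]
pub struct Input {
    pub tick: u32,
    pub dx: i32,
}

/// Deterministic fixed step: advance one tick, applying the input (if any).
pub fn step(state: State, input: Option<Input>) -> State {
    State { tick: state.tick + 1, x: state.x + input.map_or(0, |i| i.dx) }
}

pub struct Predictor {
    /// Only the latest authoritative state is kept; no old predicted states.
    pub authoritative: State,
    /// Local inputs the server has not yet confirmed, oldest first.
    pub pending: VecDeque<Input>,
}

impl Predictor {
    pub fn push_input(&mut self, input: Input) {
        self.pending.push_back(input);
    }

    pub fn on_server_update(&mut self, state: State) {
        self.authoritative = state;
        // Inputs for ticks before the authoritative tick were already consumed.
        while self.pending.front().map_or(false, |i| i.tick < state.tick) {
            self.pending.pop_front();
        }
    }

    /// Always roll back to the authoritative state and re-simulate up to `target_tick`.
    pub fn predict(&self, target_tick: u32) -> State {
        let mut s = self.authoritative;
        let mut inputs = self.pending.iter().peekable();
        while s.tick < target_tick {
            // Skip any input older than the current tick.
            while inputs.peek().map_or(false, |i| i.tick < s.tick) {
                inputs.next();
            }
            let input = if inputs.peek().map_or(false, |i| i.tick == s.tick) {
                inputs.next().copied()
            } else {
                None
            };
            s = step(s, input);
        }
        s
    }
}
```

Because the re-simulation runs every frame rather than only after a misprediction, its worst-case cost is also its everyday cost, which is what makes it easy to profile.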


## Unresolved Questions

- Can we provide lints for undefined behavior like mutating networked state outside of `NetworkFixedUpdate`?
- Do rollbacks break change detection or events?
- ~~Do rollbacks break change detection or events?~~ As long as we're careful to update the appropriate change ticks, it should be okay.
Member

Events don't work off of change ticks at the moment, sadly :( This will need to be explicitly tested, or we can try to work out an events design that uses change ticks.

Author

Ah, good point. I was tunnel-visioned on the change detection when I edited that.

I haven't invested much thought into events yet, but they will be tricky, I think. The ones contained inside the simulation loop might actually be the most straightforward.
Events that are sent out to be consumed by "non-networked" systems, UI, etc. will have weird requirements because of mispredictions.

maniwani added 5 commits May 29, 2021 08:26
Sorry to all my English teachers. Also moved the player vs. client thing back to the top. The distinction is important for interest management.
trying to be exact with terminology
maniwani added 5 commits July 8, 2021 11:54
Been working on a proof of concept for this and found some mistakes in the rushed pseudocode I wrote in the time sync section. Made some revisions, but realized some other stuff was out of date too.
@recatek

recatek commented Oct 4, 2021

bevy::net::replication
  • save and restore
  • prediction
  • serialization
  • delta compression
  • interest management
  • visual error correction
  • lag compensation
  • statistics (high-level)

Would it be worth separating this into two modules? The serialization aspect seems distinct from (but a dependency of) the replication aspect. Splitting off bevy::net::serialization would more cleanly allow a user to take advantage of the nice serialization macros/attributes (with configurable precision, etc.) but employ their own replication strategy for games that don't necessarily need clientside prediction or want to have more strict control over how various entities are synced.

@maniwani
Author

maniwani commented Oct 4, 2021

Would it be worth separating this into two modules? The serialization aspect seems distinct from (but a dependency of) the replication aspect. Splitting off bevy::net::serialization would more cleanly allow a user to take advantage of the nice serialization macros/attributes (with configurable precision, etc.) but employ their own replication strategy for games that don't necessarily need clientside prediction or want to have more strict control over how various entities are synced.

So this doc is a bit outdated and missing details, now that I've had some time to work on a plugin implementation. Listing "serialization" here was honestly a mistake, since my snapshot memory arena strategy does not (de)serialize anything (it's just a copy). But you still make a good point, and I will most likely put the underlying allocator in its own crate.
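To illustrate "it's just a copy": if networked component values live in contiguous, `Copy` storage, saving and restoring a snapshot is a bulk `copy_from_slice` into a preallocated slot, with no per-field encoding. This sketch assumes a fixed number of entities per column and a ring buffer indexed by tick; `Position` and `SnapshotRing` are hypothetical, and this is not the author's arena allocator:

```rust
/// Hypothetical networked component; `Copy` so a snapshot is a plain memory copy.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct Position {
    pub x: f32,
    pub y: f32,
}

/// Ring buffer of preallocated snapshots for one component column.
pub struct SnapshotRing<T: Copy + Default> {
    slots: Vec<Vec<T>>,
    /// Which tick each slot currently holds, if any.
    ticks: Vec<Option<u32>>,
}

impl<T: Copy + Default> SnapshotRing<T> {
    pub fn new(capacity: usize, entities: usize) -> Self {
        SnapshotRing {
            slots: vec![vec![T::default(); entities]; capacity],
            ticks: vec![None; capacity],
        }
    }

    fn index(&self, tick: u32) -> usize {
        tick as usize % self.slots.len()
    }

    /// Save: a straight copy of the live column into the slot for `tick`.
    /// Panics if `live` doesn't match the preallocated entity count.
    pub fn save(&mut self, tick: u32, live: &[T]) {
        let i = self.index(tick);
        self.slots[i].copy_from_slice(live);
        self.ticks[i] = Some(tick);
    }

    /// Restore: copy back, but only if the slot still holds `tick`.
    pub fn restore(&self, tick: u32, live: &mut [T]) -> bool {
        let i = self.index(tick);
        if self.ticks[i] == Some(tick) {
            live.copy_from_slice(&self.slots[i]);
            true
        } else {
            false
        }
    }
}
```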

I wanna clarify some stuff, though. Prediction comes for free with my strategy. Nothing is blocked upfront, so users will actually have to explicitly opt out using a new Predicted query filter and conditional compilation depending on the role (client, server, host, etc.).

How state is synced is where my impl gets very opinionated. My versions of full and interest-managed state transfer both depend on the contiguous memory arena (mostly for compression). I still need to flesh out an API for users to change the relevance of component values for each client at run-time. I'm open to suggestions there.
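One common reason a contiguous snapshot layout compresses well (an assumption here; the comment doesn't spell out the scheme): XOR-ing the current snapshot bytes against a baseline the client has already acknowledged turns every unchanged byte into zero, and long zero runs shrink to almost nothing. The zero-run encoding below is a deliberately simple, hypothetical stand-in for a real compressor:

```rust
/// XOR the current snapshot bytes against a baseline the client already has.
/// Unchanged bytes become 0; applying the same XOR again restores `current`.
pub fn xor_delta(baseline: &[u8], current: &[u8]) -> Vec<u8> {
    assert_eq!(baseline.len(), current.len());
    baseline.iter().zip(current).map(|(b, c)| b ^ c).collect()
}

/// Hypothetical zero-run encoding: a 0x00 byte is followed by a run length
/// (1..=255); any other byte is emitted literally.
pub fn rle_zeros(data: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < data.len() {
        if data[i] == 0 {
            let mut run = 0usize;
            while i < data.len() && data[i] == 0 && run < 255 {
                run += 1;
                i += 1;
            }
            out.push(0);
            out.push(run as u8);
        } else {
            out.push(data[i]);
            i += 1;
        }
    }
    out
}

/// Inverse of `rle_zeros` (assumes well-formed input).
pub fn unrle_zeros(data: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < data.len() {
        if data[i] == 0 {
            out.extend(std::iter::repeat(0u8).take(data[i + 1] as usize));
            i += 2;
        } else {
            out.push(data[i]);
            i += 1;
        }
    }
    out
}
```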

My approach to prioritization does not involve user-defined priorities and changing update frequencies, but those can maybe be added as an option later. My current plan is:

  • Users can set the quantization precision of each networked float field (compile-time).
  • Users can add and remove volumes of interest for clients (run-time).
  • Users can set which clients are privy to what values (run-time).
  • Users can mark values as "always relevant" to specific clients (run-time).
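Compile-time quantization precision could look something like this sketch, where a const generic fixes the number of decimal places and the value is stored as a scaled, rounded integer. `Quantized` and `DIGITS` are hypothetical names, not part of Bevy or the author's plugin:

```rust
/// Hypothetical float field quantized to `DIGITS` decimal places.
/// The precision is part of the type, so it is fixed at compile time.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Quantized<const DIGITS: u32>(pub i32);

impl<const DIGITS: u32> Quantized<DIGITS> {
    /// 10^DIGITS, evaluated at compile time for each precision used.
    const SCALE: i64 = 10i64.pow(DIGITS);

    /// Scale and round to the nearest representable step (saturates at i32 bounds).
    pub fn from_f32(v: f32) -> Self {
        Quantized((f64::from(v) * Self::SCALE as f64).round() as i32)
    }

    pub fn to_f32(self) -> f32 {
        (f64::from(self.0) / Self::SCALE as f64) as f32
    }
}
```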

@recatek

recatek commented Oct 4, 2021

I think those are good models for many game types. I do believe that in any situation where the networking system has to be opinionated, there should be good API slice points for games that don't fit the typical mold to insert their own logic instead. It would be a shame to have things coupled in a way where you can't use something like the helpful serialization attributes for byte-packing component fields because your game's networking model doesn't match up with how this system does its sync/rollback and interest decisions. Networking model has a lot of influence on project cost, so if a custom implementation of part of the stack can save bandwidth by sacrificing features they don't need, all the more power to them and it would be nice to make that easy to insert as an alternative.

@maniwani
Author

maniwani commented Oct 4, 2021

I think those are good models for many game types. I do believe that in any situation where the networking system has to be opinionated, there should be good API slice points for games that don't fit the typical mold to insert their own logic instead.

I understand where you're coming from, but you're making a lot of assumptions.

Let me start by saying that my priorities are, in order:

  1. Ergonomics (be as close to local multiplayer in vanilla Bevy as possible)
  2. Speed (use as little CPU time as possible)
  3. Bandwidth (transmit information in as few bytes as possible)

I'm confident that the architecture I ended up with is minimally invasive and puts the fewest restrictions on what users can do, without sacrificing any performance. I do not see a mold being enforced here. The only hard requirement is something every game can do: the game physics have to be a discrete simulation.

Users are not going to have to fight against it to "get back" anything because there's nothing happening behind the scenes that incurs extra cost. Rollback will be as fast as it could ever be and there's nothing to turn off to make it faster. The performance of a client-side simulation step (and subsequent resims) will be solely up to the user.

It would be a shame to have things coupled in a way where you can't use something like the helpful serialization attributes for byte-packing component fields because your game's networking model doesn't match up with how this system does its sync/rollback and interest decisions. Networking model has a lot of influence on project cost, so if a custom implementation of part of the stack can save bandwidth by sacrificing features they don't need, all the more power to them and it would be nice to make that easy to insert as an alternative.

I'm open to attribute suggestions here, but there are no other "serialization" attributes except decimal precision, because nothing is serialized. If you know that some integer field is always between two values, store it as an offset from the minimum and add the minimum back in your systems. I only have a special thing for quaternions, just so that decimal precision corresponds to angle precision.
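The "store the offset, add the minimum back" advice, sketched for an integer known to stay within `MIN..=MIN + 255` so it fits in a single byte. `Bounded` and the example range are hypothetical:

```rust
/// Hypothetical integer field known to lie in MIN..=MIN + 255, stored as a
/// one-byte offset from MIN; systems add MIN back when reading it.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Bounded<const MIN: i32>(u8);

impl<const MIN: i32> Bounded<MIN> {
    /// Returns None if `value` falls outside the representable range.
    pub fn new(value: i32) -> Option<Self> {
        // Compute in i64 so the subtraction can't overflow.
        let offset = i64::from(value) - i64::from(MIN);
        if (0..=255).contains(&offset) {
            Some(Bounded(offset as u8))
        } else {
            None
        }
    }

    /// Cannot overflow: only in-range values are ever constructed.
    pub fn get(self) -> i32 {
        MIN + i32::from(self.0)
    }
}
```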

There is still compression on top of this, for both full and interest-managed state transfer. And I'm confident the default will pack a lot more data than most strategies users would think of. The only things that may add waste are the padding bytes in a user's structs (depending on alignment). Storing packed versions of structs is likely worth exploring later, but I'm shelving that until someone demonstrates a need for it after I release my plugin.

I am open to exposing more control over interest management. I already want to leave room for users to swap in their own form of AOI (e.g. potentially visible sets). I just think the default prioritization will work very well.

My main goal is a high-performing plugin that everyone can use (without grokking networking to the degree I think I do). That means competitive defaults. Situations where users need to DIY things to fit their game design or meet their performance goals should be rare, and I'd consider it a massive failure if that were common. From our past discussions, I don't think me talking will convince you that one plugin and some default strategies can serve most games well, but I'm leaning into it.

@recatek

recatek commented Oct 4, 2021

I understand where you're coming from, but you're making a lot of assumptions.

I don't think saying that there are many idiosyncratic game netcode requirements, and that no single library can adequately service them all, is much of an assumption. It's no coincidence that no single gamedev networking paradigm has really emerged victorious. There are divides between genres, at different degrees of scale, between platforms, and there's a rather significant divide between AAA and indie. Unreal works similarly to this on the surface, but there are good reasons to make an Unreal game without using Unreal networking, and in that situation Unreal networking can actually be a hurdle to get over since it gets in your way.

That said, this seems like a good baseline that will serve many projects with its defaults, and bevy's structure makes everything flexible to begin with. There are also parts of this that could be expanded upon to service more paradigms than the ones presented directly here. I do think it would be useful to have different parts of the stack be as individually accessible as possible, so that the process of using something that isn't necessarily this rollback/prediction model can still be served by this plugin (at least, if it's intended to be the bevy networking solution). At the very least, a division between the things bevy can do to make networking generally easier (fixed timesteps for network processing, serialization/memory arrangement of components, etc.) and the actual networking solution itself would be immensely useful.

@maniwani
Author

maniwani commented Oct 4, 2021

There are many idiosyncratic game netcode requirements and no single library can adequately service them all.

If nothing else, I hope I conveyed that I think these are pretty big assumptions, and that challenging them is what led me to start this. I'll admit I don't plan on supporting everything (e.g. no peer-to-peer with more than 2 players). I'm not going to opine on it, but now that I know what I know, I don't think there's much natural division in multiplayer netcode at all. Deterministic vs. authoritative is the big one, but that doesn't translate to much under this architecture.

It's no coincidence that no single gamedev networking paradigm has really emerged victorious.

I mean, not to knock anyone, but it certainly feels like a coincidence to me. Developers have time and budget constraints and they usually aren't designing general-purpose engines. Even in the engine space, Unreal is the only one trying.

IMO multiplayer programming just lacks the abundance of learning material that e.g. rendering has. That is changing, but there's still just not much concise, accurate information available (lots of misleading and incorrect information though). I know of only one multiplayer programming textbook. Ideally, some blog posts would expose the fact that multiplayer games are just reinventing distributed systems and databases with worse terminology.

I do think it would be useful to have different parts of the stack be as individually accessible as possible.

I'll try to keep things as decoupled as I can.

snapshot time is the time stored in the snapshot, not the time when it was received
@reneeichhorn

First of all, thanks for the write-up! I agree that having replication "early" on would be a big plus for Bevy. Not sure if this PR is still being worked on, but here are my two cents.

For one of my projects in Bevy, I've implemented a really rough network replication feature that was meant to be more of a working prototype / example than a production-ready thing, so there are tons of missing features and often the shortest path was taken. But the most important lesson in the end was that the concept of replication works great, and building your server and client with it is as easy as it can get, so we are on a good path there.

But I think mixing replication and networking could be a bad idea. In the end the goal should be to have a replication feature that allows you to sync up two or more Worlds. Whether that happens via network on a server, peer to peer to another client, the same machine or even the same process should not matter too much. Another use case for replication would be simulations (physics, AI, life simulations, ...) where your simulation runs in its own world at its own frame/tick rate. For simulations, you might even want to "record" all replications (think of it as a diff every tick) so that you can play back the simulation world in a visualizer at any time.

The networking side is another huge part, and I feel like Bevy should care as little as possible about it. There are already amazing projects like quinn (https://github.com/quinn-rs/quinn) that implement the QUIC protocol (UDP) with both reliable and unreliable messages (through a QUIC extension). On top of the server/client itself, there are a lot of other networking features that you also mentioned: encryption, statistics, safety, reliability, scalability and many more. It's not too uncommon to have rather "simple and unsafe" server implementations with proxies in front of them. That applies both to large-scale HTTPS infrastructure and to game servers. We now even have Quilkin, a UDP proxy (https://github.com/googleforgames/quilkin), in the Rust ecosystem. No need to reinvent the wheel here. Looking even further at typical infrastructures, with Kubernetes for example, there is even more work needed.

tl;dr: Keep networking and replication as separate concerns (while not forgetting that the two can be, but don't have to be, closely related to each other), and let Bevy focus more on the replication part rather than networking.

@maniwani
Author

Thank you for the feedback. I want to say upfront that I agree with a lot of what you've said. This RFC already advocates for keeping things modular. I do not want any coupling between the replication code (the save and restore) and whatever plumbing sends and receives the data.

@cart has said on Discord before that built-in multiplayer engine features would most likely come from adopting or forking a plugin that's proven itself, so plugins are the route I'll be going with this. I think that's a good call. User first impressions and Bevy's general reputation are important, so this is something we'd definitely want to get mostly right before bringing it to the engine.

In the end the goal should be to have a replication feature that allows you to sync up two or more Worlds. Whether that happens via network on a server, peer to peer to another client, the same machine or even the same process should not matter too much.

A generalized API for saving and restoring the state of a world (or copying data from one world to another) is outside the scope of this RFC. I think a better place to discuss that would be the "multiple worlds" RFC #43.

While I agree that their APIs should be modular, solving replication while ignoring the constraints imposed by networking won't produce a good solution for online multiplayer. That's why this is an RFC for "networked replication" and not "cloning worlds." We could definitely extend World with new replication methods and then just incorporate them here, but that doesn't help address the unique challenges of sending data over the Internet. Offline contexts don't have loss or harsh size limits (MTU).

Those unique challenges have to be addressed and they heavily motivate storing data in ways that can be compressed really well, really quickly, and I described the strategy I think solves those challenges the best. Said strategy would work in offline contexts too (where compression is unnecessary), but those aren't my focus.

The networking side is another huge part and I feel like bevy should only care as little as possible about it.

Sockets, messaging protocols, etc. are beyond the scope of the RFC.

I know how Steam, Epic, Microsoft, Amazon, etc. all have their own socket and authentication libraries that you must use to access their infrastructure, so I agree to the extent that I think you mean keep things modular.

If you're saying that Bevy shouldn't have some UDP thing of its own, I'm inclined to disagree. I think if we're gonna advertise "Bevy can do online multiplayer" at some point, there should at least be something simple built-in for testing purposes. It just needs to be modular.

There are already amazing projects like quinn (https://github.com/quinn-rs/quinn) that implements the QUIC protocol (UDP) with both reliable and unreliable messages (through a QUIC extension).

QUIC would be nice, but I'm not sure if its unreliable extension is enough for games.

QUIC datagrams are exclusively unreliable and unordered, while most realtime multiplayer games are unreliable but sequenced—where the receiving side rejects messages older than the latest received (stale data is useless). We'd have to implement this on top of QUIC. Likewise, QUIC multiplexing is only supported for its reliable streams. There is no equivalent channel abstraction for datagrams, so we'd have to implement that too.
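The "unreliable but sequenced" filtering, plus the per-channel bookkeeping that would have to be layered on top of QUIC datagrams, can be sketched on the receiving side like this. It assumes each datagram carries a hypothetical one-byte channel id and a 16-bit wrapping sequence number; this is not a quinn or quiche API:

```rust
use std::collections::HashMap;

/// Wrapping comparison for 16-bit sequence numbers: true if `a` is newer than `b`
/// (i.e. `a` is less than half the sequence space ahead of `b`).
pub fn seq_newer(a: u16, b: u16) -> bool {
    a != b && a.wrapping_sub(b) < 0x8000
}

/// Hypothetical receiver-side filter for "unreliable but sequenced" delivery,
/// tracked independently per channel id (a minimal datagram multiplexing scheme).
#[derive(Default)]
pub struct SequencedReceiver {
    latest: HashMap<u8, u16>,
}

impl SequencedReceiver {
    /// Returns true if the datagram should be delivered; false if it is
    /// a duplicate or older than the newest one already seen on its channel.
    pub fn accept(&mut self, channel: u8, seq: u16) -> bool {
        let stale = matches!(self.latest.get(&channel), Some(&last) if !seq_newer(seq, last));
        if stale {
            return false;
        }
        self.latest.insert(channel, seq);
        true
    }
}
```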

Implementing those on top of quinn-rs or quiche might be possible, even simple, I just haven't tried it. Definitely an appealing idea, but again just not in scope.

@heavyrain266

Which networking model should be default in your eyes? Peer-to-peer or client<->server?

From what I can see, Bevy is used by indie developers with either no real budget at the time of writing or a really low one, which is a huge barrier to maintaining a client <-> server architecture for their game. That's where Steam Network comes in: a peer-to-peer abstraction that best fits sole developers or small studios with a low budget. Steam handles lobbies and matchmaking for the game through its APIs, which is a huge help and offloads a lot of costs.

If a studio doesn't have its own server infrastructure, renting many external servers is very costly compared to the pretty much FREE peer-to-peer model, which sometimes requires a small relay server to reduce hiccups when the original host leaves the session. Azure PlayFab is there to help sole devs and small/medium studios with networking by providing a FREE client <-> server abstraction for up to 100k unique players, which is fine for the beginning; after that, costs start growing, but they're still much smaller than external servers or your own infrastructure. PlayFab provides an SDK which can be wrapped as an external or internal plugin.

Have you considered adding LAN option as bevy::net::lan::*?

In the early 2000s, many games had that option because it was surprisingly cheap to implement. That gave rise to LAN parties, which still exist today; there are public places like "Gamer Pubs" that organize such events for players.

@maniwani
Author

maniwani commented Oct 2, 2022

Which networking model should be default in your eyes? Peer-to-peer or client-server?

In my eyes, client-server and peer-to-peer aren't models, they're topologies. And neither one forces anyone to pay for hosting.

[Diagrams: client-server topology; peer-to-peer topology]

I know some places say "P2P == not using paid hosting", but that way of categorizing things is meaningless here. We only care what the client and server roles do, not who's doing them.

If everyone connects to a specific player who's hosting the session, that's still a client-server topology. That player just happens to be running a server and client on the same device.

Does that address your main concern(s)? (Edit: Maybe it didn't come through in the RFC, but I do want to keep things backend-agnostic.)

Topology-wise, I think client-server should be the only option. The only advantage of a mesh network is that a player disconnecting can't end the game. But it turns too many other things into consensus problems and boosts the risk of NAT issues, and nobody wants to deal with those.

Well, 1v1 can stick around as a special case.

Steam handles lobbies and matchmaking for the game through their APIs which is huge help and offloads a lot of costs.

Almost everyone is behind NAT, so you always need a middleman (machine with a public IP address) for players to find each other. But yeah, Steam, Epic, Microsoft, etc. offer free access to services that handle that for you.

[Sometimes you do need a simple] relay server to reduce hiccups when original host leaves session.

Also true, you'd have to bring your own machines to handle host migrations unless the provider gives you a way to use theirs.

Have you considered adding LAN option as bevy::net::lan::*?

No, I haven't. That isn't necessary. We'd just bind sockets to LAN IP addresses instead of public ones. Nothing else would change (except your ping).
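As a sketch of "just bind to a LAN address": with `std::net::UdpSocket`, the only thing that changes is the address passed to `bind`. `server_bind_addr` and `bind_server` are hypothetical helpers; `None` binds all interfaces, and the example LAN IP and port are made up:

```rust
use std::io;
use std::net::{IpAddr, Ipv4Addr, SocketAddr, UdpSocket};

/// Pick the local address for the server socket: a specific LAN interface,
/// or every interface (0.0.0.0) when none is given.
pub fn server_bind_addr(lan_ip: Option<Ipv4Addr>, port: u16) -> SocketAddr {
    SocketAddr::new(IpAddr::V4(lan_ip.unwrap_or(Ipv4Addr::UNSPECIFIED)), port)
}

/// Bind a non-blocking UDP socket, since a game loop would typically poll it.
pub fn bind_server(lan_ip: Option<Ipv4Addr>, port: u16) -> io::Result<UdpSocket> {
    let socket = UdpSocket::bind(server_bind_addr(lan_ip, port))?;
    socket.set_nonblocking(true)?;
    Ok(socket)
}
```

Clients on the same network would then send to the server's LAN address (e.g. `192.168.1.20:7777` in this made-up example) instead of a public one; nothing else in the protocol changes.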

@heavyrain266

Does that address your main concern(s)?

Yes, thanks for the explanation. That was important for some future work😊. I don't really know much about networking beyond some backend dev and UDP-based file servers.

Regarding QUIC from above: I tested it in a quick Unreal project with 10 different players in real time, and there was no difference compared to UDP, except that the connection is more secure.

@maniwani
Author

maniwani commented Oct 2, 2022

QUIC having secure connections baked in is an enormous plus, for sure. That said, IMO the only thing that's relevant in this context is that it exposes an unreliable channel.

At the moment, WebRTC is the only web API that exposes them, which is why there are crates like https://github.com/triplehex/webrtc-unreliable and https://github.com/naia-lib/webrtc-unreliable-client. Otherwise, you're stuck with WebSockets (built on TCP).

There's a new WebTransport API (built on QUIC) that exposes an unreliable channel with no headaches. That would be the go-to choice, but it isn't available in all browsers yet.

@heavyrain266

Right, I forgot about its WebAssembly limitations. Good point.

@maniwani
Author

Closing this as (1) it hasn't been my focus for quite some time and (2) a lot of the implementation details discussed—particularly the paragraphs about allocation/compression—would IMO be too hard to implement in a reasonable timeframe, either because they're reliant on Rust features that still haven't stabilized yet, or because there's already a queue of large ECS features that have higher priority.

Basically, implementing this RFC as envisioned would first need bevy_ecs to implement a way to partition component storage (e.g. into "networked" and "non-networked"), and I just don't see that coming any time soon.

@maniwani maniwani closed this Apr 18, 2024
7 participants