Our network topology is a little weird. We have two backend services involved in user-to-user comms. One is reticulum, a mesh network of Erlang/Elixir/Phoenix nodes, which is responsible for all non-voice/video traffic between users. When you connect to a room, you are connecting to a load-balanced node on this mesh over websockets, and messages are relayed between all users in that room across the mesh via a pub/sub system called Phoenix channels. Meanwhile, for voice and video, you are also connected to a shared SFU node running Janus and our Janus plugin, so all users in a room are connected to the same physical node for voice/video relay. There is no P2P traffic. Everything operational is captured in our ops repo.
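To make the room fan-out semantics concrete, here is a toy in-memory model of it. This is an illustration only -- reticulum does this with Phoenix channels across the Elixir mesh, not with anything like this TypeScript, and all names here are hypothetical:

```typescript
// Toy model of room fan-out: each client subscribes to a room topic, and a
// message published to that topic is relayed to every *other* subscriber.
type Handler = (senderId: string, payload: unknown) => void;

class RoomBus {
  private subscribers = new Map<string, Handler>();

  join(clientId: string, onMessage: Handler): void {
    this.subscribers.set(clientId, onMessage);
  }

  leave(clientId: string): void {
    this.subscribers.delete(clientId);
  }

  // Relay a message from one client to every other client in the room.
  broadcast(senderId: string, payload: unknown): void {
    for (const [id, handler] of this.subscribers) {
      if (id !== senderId) handler(senderId, payload);
    }
  }
}

// Usage: three peers in a room; a message from "alice" reaches the other two.
const room = new RoomBus();
const received: string[] = [];
for (const id of ["alice", "bob", "carol"]) {
  room.join(id, (from, msg) => received.push(`${id}<-${from}:${msg}`));
}
room.broadcast("alice", "hello");
console.log(received); // ["bob<-alice:hello", "carol<-alice:hello"]
```

In the real system this topic lives on the mesh, so the subscribers can be connected to different reticulum nodes and still see every broadcast.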
Our game networking itself is implemented using the networked-aframe library. Currently, authorization and authentication are enforced on a very small subset of messages, like joining and kicking, but the capability is there for us to do message-level authorization. However, all simulation is done on clients -- there is no server-side simulation of any kind (e.g. physics). The servers are basically just a message bus for the clients, applying slight modifications and authorization to messages along the way before broadcasting them to all peers. Things like ownership over objects, and other incidental concerns involved in orchestrating the in-game experience among peers, are all based on the client protocol implementation.
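The "message bus with light authorization" idea can be sketched like this. The message shapes and permission model below are hypothetical stand-ins, not the actual networked-aframe / reticulum protocol:

```typescript
// Only a small subset of message types is authorized (here: join/kick);
// everything else passes through with a slight modification (stamping the
// sender) before being broadcast to peers.
interface Message { type: string; from?: string; [key: string]: unknown }

const AUTHORIZED_TYPES = new Set(["join", "kick"]);

function canPerform(senderId: string, msg: Message, moderators: Set<string>): boolean {
  if (msg.type === "kick") return moderators.has(senderId); // only mods may kick
  return true; // "join" could consult room permissions; all else is allowed
}

// Returns the message to broadcast, or null if it was rejected.
function relay(senderId: string, msg: Message, moderators: Set<string>): Message | null {
  if (AUTHORIZED_TYPES.has(msg.type) && !canPerform(senderId, msg, moderators)) {
    return null; // rejected: never broadcast to peers
  }
  return { ...msg, from: senderId }; // slight modification: stamp the sender
}

const mods = new Set(["alice"]);
console.log(relay("bob", { type: "kick", target: "carol" }, mods));   // null (bob isn't a mod)
console.log(relay("alice", { type: "kick", target: "carol" }, mods)); // stamped kick message
console.log(relay("bob", { type: "update", pos: [0, 1, 2] }, mods));  // passed through, stamped
```

The point is that the server never simulates anything -- it only decides whether to forward a message, and in what form.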
For persistent state, we're not doing anything fancy. We have a PostgreSQL database behind reticulum and a file store as the two methods of durable storage. Reticulum manages both, and when you update permanent room state, pin objects, etc., you are interfacing with APIs in reticulum to update bits on those two backing stores.
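A toy sketch of the two-store shape, with in-memory stand-ins for both backends (the names and the pinning API are hypothetical; the real APIs live in reticulum):

```typescript
// Pinning an object touches both durable stores behind one API call:
// a metadata row (stand-in for the PostgreSQL side) and the asset bytes
// (stand-in for the file store).
interface PinnedRow { roomId: string; objectId: string; fileKey: string }

const db: PinnedRow[] = [];                  // stand-in for PostgreSQL
const fileStore = new Map<string, Buffer>(); // stand-in for the file store

function pinObject(roomId: string, objectId: string, blob: Buffer): void {
  const fileKey = `${roomId}/${objectId}`;
  fileStore.set(fileKey, blob);           // durable asset bytes
  db.push({ roomId, objectId, fileKey }); // durable room state referencing them
}

pinObject("room-1", "duck-model", Buffer.from("...glb bytes..."));
```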
Our networking serialization is... somewhat embarrassingly... JSON. (And somewhat remarkably so, when you consider how well it actually works in practice.)
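Concretely, the wire format is just an object that gets `JSON.stringify`'d on send and `JSON.parse`'d on receive. The message shape below is illustrative, not the exact networked-aframe schema:

```typescript
// A component update as it might travel over the websocket: plain JSON,
// no binary framing. (Field names here are made up for illustration.)
const update = {
  networkId: "abc123",
  owner: "client-42",
  components: { position: { x: 1, y: 1.6, z: -2 } },
};

const wire = JSON.stringify(update); // what actually goes over the websocket
const decoded = JSON.parse(wire);    // what the receiving peer sees

console.log(decoded.components.position.x); // 1
```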
All the assets are backed by the file store and encrypted with single-use keys. The files are served out via reticulum, which is fronted by a CDN for caching.
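The single-use-key idea can be sketched with Node's built-in crypto. Whether Hubs uses this exact cipher is an assumption on my part; the point is that each file gets its own random key, so one leaked key only ever exposes one asset:

```typescript
// Per-asset encryption with a fresh random key (AES-256-GCM, an assumed
// choice for illustration -- not necessarily what Hubs actually uses).
import * as crypto from "crypto";

function encryptAsset(plain: Buffer) {
  const key = crypto.randomBytes(32); // fresh key, used for this one file only
  const iv = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plain), cipher.final()]);
  return { key, iv, tag: cipher.getAuthTag(), ciphertext };
}

function decryptAsset(e: ReturnType<typeof encryptAsset>): Buffer {
  const decipher = crypto.createDecipheriv("aes-256-gcm", e.key, e.iv);
  decipher.setAuthTag(e.tag); // GCM authenticates as well as encrypts
  return Buffer.concat([decipher.update(e.ciphertext), decipher.final()]);
}

const asset = Buffer.from("fake .glb bytes");
const enc = encryptAsset(asset);
console.log(decryptAsset(enc).equals(asset)); // true
```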
Incidentally, we are using EFS on AWS for the filesystem, but this approach is nice in that it doesn't couple the service to any AWS services. Currently we have only one relatively tight AWS coupling -- the new screenshot service is a Lambda that is used to generate a thumbnail when you paste an arbitrary web URL into Hubs. Also, all of our ops Terraform scripts are AWS-based, but the code itself does not (for example) require any AWS services to run, nor does it even have a place to hand it AWS credentials.
We are running a "real" physics simulation on every client, and the main difference between clients is which client at any point is responsible for simulating a given object versus receiving messages about that object. We have a basic ownership-transfer algorithm that is guaranteed to converge on an owner (mod any bugs :)). There can be brief periods of conflict, but eventually the owner should converge, and hence the game state should converge across all clients for a given object. However, our networking implementation is not designed for use cases like your typical FPS or highly sensitive game mechanics -- I wouldn't be surprised if someone could get a basic "game" like that working with Hubs, but you sure wouldn't want it to be a competitive one :smiley: For example, our clients are not adversarial and are (generally speaking) trusted by one another -- a sane assumption for our use case, where you are already mutually sharing the room link with trusted peers (and for the failure mode of a hacked client trying to perform abuse, you have other mitigations like kicking, etc.), whereas for typical games a hacked client is often disguised to look like other peers while having some subtle advantage over them. We don't design any mitigations for that kind of thing, since there's no concept of "game advantage" in Hubs.
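To show how that kind of convergence can work, here is an assumed stand-in for the tiebreak (the real networked-aframe algorithm isn't spelled out above): every client applies the same deterministic rule to the set of ownership claims it has seen, so even if two clients briefly both think they own an object, everyone lands on the same winner once all claims propagate.

```typescript
// Deterministic ownership tiebreak (hypothetical rule for illustration):
// the highest claim timestamp wins, and the lowest clientId breaks ties.
// Because every client evaluates the same pure function over the same
// claims, the chosen owner converges without any server arbitration.
interface Claim { clientId: string; timestamp: number }

function resolveOwner(claims: Claim[]): string | null {
  if (claims.length === 0) return null;
  return claims.reduce((best, c) =>
    c.timestamp > best.timestamp ||
    (c.timestamp === best.timestamp && c.clientId < best.clientId)
      ? c
      : best
  ).clientId;
}

// Two conflicting claims at the same instant: every client picks the same winner.
const claims: Claim[] = [
  { clientId: "carol", timestamp: 100 },
  { clientId: "bob", timestamp: 100 },
  { clientId: "alice", timestamp: 90 },
];
console.log(resolveOwner(claims)); // "bob"
```

The "slight period of conflict" in the text corresponds to the window before all claims have reached all clients; once they have, this function gives the same answer everywhere.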