Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce sharding rules to MongoDB collections #642

Closed
wants to merge 11 commits into from
90 changes: 90 additions & 0 deletions build/docker/sharding/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,90 @@
## Components
A MongoDB sharded cluster consists of the following components.
1. shard: Each shard contains a subset of the sharded data.
2. mongos: The mongos acts as a query router, providing an interface between client applications and the sharded cluster.
3. config servers: Config servers store metadata and configuration settings for the cluster.

For a production deployment, consider the following to ensure data redundancy and system availability.
* Config Server (3 member replica set): `config1`,`config2`,`config3`
* 3 Shards (each a 3 member replica set):
* `shard1-1`,`shard1-2`, `shard1-3`
* `shard2-1`,`shard2-2`, `shard2-3`
* `shard3-1`,`shard3-2`, `shard3-3`
* 2 Mongos: `mongos1`, `mongos2`

## How to deploy
Go to the `prod` or `dev` directory and execute the following command.
It takes 1~2 minutes to completely deploy a cluster.
```bash
docker-compose up -d
```

## How to examine
You can examine the configuration and status using the following commands.
### Config server
```bash
docker-compose exec config1 mongosh --port 27017
rs.status()
```

### Shards
```bash
docker-compose exec shard1-1 mongosh --port 27017
rs.status()
```

### Mongos
```bash
docker-compose exec mongos1 mongosh --port 27017
sh.status()
```

## How to connect
You can use the sharded cluster by adding `mongo-connection-uri` option in Yorkie.
Note that the Yorkie should be on the same network as the cluster.
For the production,
```
--mongo-connection-uri "mongodb://localhost:27020,localhost:27021"
```
For the development,
```
--mongo-connection-uri "mongodb://localhost:27020"
```


## Details
As mongos determines which shard a document is located and routes queries, it's necessary to configure mongos to identify shards and sharding rules.
Therefore, the following commands should be applied to the primary mongos.
* `sh.addShard()` method adds each shard to the cluster.
* `sh.shardCollections()` method shards each collection with the specified shard key and strategy.

Two strategies are available for sharding collections.
1. Hashed sharding: It uses a hashed index of a single field as the shard key to partition data across your sharded cluster.
2. Range-based sharding: It can use multiple fields as the shard key and divides data into contiguous ranges determined by the shard key values.

```javascript
sh.addShard("shard-rs-1/shard1-1:27017,shard1-2:27017,shard1-3:27017")
sh.addShard("shard-rs-2/shard2-1:27017,shard2-2:27017,shard2-3:27017")
sh.addShard("shard-rs-3/shard3-1:27017,shard3-2:27017,shard3-3:27017")

sh.enableSharding("yorkie-meta")
sh.shardCollection("yorkie-meta.projects", { _id: "hashed" })
sh.shardCollection("yorkie-meta.users", { username: "hashed" })
sh.shardCollection("yorkie-meta.clients", { project_id: "hashed" })
sh.shardCollection("yorkie-meta.documents", { project_id: "hashed" })
sh.shardCollection("yorkie-meta.changes", { doc_id: "hashed", server_seq: 1 })
sh.shardCollection("yorkie-meta.snapshots", { doc_id: "hashed" })
sh.shardCollection("yorkie-meta.syncedseqs", { doc_id: "hashed" })
```

Considering the common query patterns used in Yorkie, it's possible to employ Range-based sharding using the following command.
```javascript
sh.shardCollection("<database>.<collection>", { <shard key field> : 1, ... } )
```

## Considerations


## References
* https://www.mongodb.com/docs/v6.0/tutorial/deploy-shard-cluster/
* https://www.mongodb.com/docs/manual/core/sharded-cluster-components/
126 changes: 126 additions & 0 deletions build/docker/sharding/dev/docker-compose.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,126 @@
version: '3'
services:

# Config Server
config1:
image: mongo:7.0.1
container_name: mongo-config1
command:
- /bin/sh
- -c
- | # run the init script for the primary replica after mongod has been started.
sh -c "sleep 5 && mongosh < /scripts/init-config1.js" &
mongod --port 27017 --configsvr --replSet config-rs --bind_ip_all
volumes:
- ./scripts:/scripts
ports:
- 27100:27017
restart: always
networks:
- sharding

# Shards
# Shards 1
shard1-1:
image: mongo:7.0.1
container_name: mongo-shard1-1
command:
- /bin/sh
- -c
- | # run the init script for the primary replica after mongod has been started.
sh -c "sleep 5 && mongosh < /scripts/init-shard1-1.js" &
mongod --port 27017 --shardsvr --replSet shard-rs-1 --bind_ip_all
volumes:
- ./scripts:/scripts
ports:
- 27110:27017
restart: always
networks:
- sharding

shard1-2:
image: mongo:7.0.1
container_name: mongo-shard1-2
command: mongod --port 27017 --shardsvr --replSet shard-rs-1 --bind_ip_all
volumes:
- ./scripts:/scripts
ports:
- 27111:27017
restart: always
networks:
- sharding

shard1-3:
image: mongo:7.0.1
container_name: mongo-shard1-3
command: mongod --port 27017 --shardsvr --replSet shard-rs-1 --bind_ip_all
volumes:
- ./scripts:/scripts
ports:
- 27112:27017
restart: always
networks:
- sharding

# Shards 2
shard2-1:
image: mongo:7.0.1
container_name: mongo-shard2-1
command:
- /bin/sh
- -c
- | # run the init script for the primary replica after mongod has been started.
sh -c "sleep 5 && mongosh < /scripts/init-shard2-1.js" &
mongod --port 27017 --shardsvr --replSet shard-rs-2 --bind_ip_all
volumes:
- ./scripts:/scripts
ports:
- 27113:27017
restart: always
networks:
- sharding

shard2-2:
image: mongo:7.0.1
container_name: mongo-shard2-2
command: mongod --port 27017 --shardsvr --replSet shard-rs-2 --bind_ip_all
volumes:
- ./scripts:/scripts
ports:
- 27114:27017
restart: always
networks:
- sharding

shard2-3:
image: mongo:7.0.1
container_name: mongo-shard2-3
command: mongod --port 27017 --shardsvr --replSet shard-rs-2 --bind_ip_all
volumes:
- ./scripts:/scripts
ports:
- 27115:27017
restart: always
networks:
- sharding

# Mongos
mongos1:
image: mongo:7.0.1
container_name: mongos1
command:
- /bin/sh
- -c
- | # run the init script for the primary replica after the config servers and the shards has been configured
sh -c "sleep 40 && mongosh < /scripts/init-mongos1.js" &
mongos --port 27017 --configdb config-rs/config1:27017 --bind_ip_all
ports:
- 27017:27017
restart: always
volumes:
- ./scripts:/scripts
networks:
- sharding

networks:
sharding:
9 changes: 9 additions & 0 deletions build/docker/sharding/dev/scripts/init-config1.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
rs.initiate(
{
_id: "config-rs",
configsvr: true,
members: [
{ _id : 0, host : "config1:27017" },
]
}
)
11 changes: 11 additions & 0 deletions build/docker/sharding/dev/scripts/init-mongos1.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
sh.addShard("shard-rs-1/shard1-1:27017,shard1-2:27017,shard1-3:27017")
sh.addShard("shard-rs-2/shard2-1:27017,shard2-2:27017,shard2-3:27017")

sh.enableSharding("yorkie-meta")
sh.shardCollection("yorkie-meta.projects", { _id: "hashed" })
sh.shardCollection("yorkie-meta.users", { username: "hashed" })
sh.shardCollection("yorkie-meta.clients", { project_id: "hashed" })
sh.shardCollection("yorkie-meta.documents", { project_id: "hashed" })
sh.shardCollection("yorkie-meta.changes", { doc_id: "hashed", server_seq: 1 })
sh.shardCollection("yorkie-meta.snapshots", { doc_id: "hashed" })
sh.shardCollection("yorkie-meta.syncedseqs", { doc_id: "hashed" })
10 changes: 10 additions & 0 deletions build/docker/sharding/dev/scripts/init-shard1-1.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
rs.initiate(
{
_id : "shard-rs-1",
members: [
{ _id : 0, host : "shard1-1:27017" },
{ _id : 1, host : "shard1-2:27017" },
{ _id : 2, host : "shard1-3:27017" }
]
}
)
10 changes: 10 additions & 0 deletions build/docker/sharding/dev/scripts/init-shard2-1.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
rs.initiate(
{
_id : "shard-rs-2",
members: [
{ _id : 0, host : "shard2-1:27017" },
{ _id : 1, host : "shard2-2:27017" },
{ _id : 2, host : "shard2-3:27017" }
]
}
)
Loading