Skip to content

Commit

Permalink
Introduce MongoDB sharding rules to Project-wide and Document-wide co…
Browse files Browse the repository at this point in the history
…llections (#776)

This commit introduces MongoDB sharding rules to Project-wide and
Document-wide collections.

There are three types of collections:
1. Cluster-wide: `users`, `projects`(less than 10K)
4. Project-wide: `documents`, `clients`(more than 1M)
7. Document-wide: `changes`, `snapshots`, `syncedseqs`(more than 100M)

We determine whether a collection is required to be sharded based on
the expected data count of its types.

1. Cluster-wide: not sharded
2. Project-wide, Document-wide: sharded 

Project-wide collections contain range queries using a `project_id`
filter, and document-wide collections usually contain them using a
`doc_id` filter. We choose the shard key based on the query pattern of
each collection: 

1. Project-wide: `project_id`
2. Document-wide: `doc_id`

This involves changes in reference keys of collections:
1. `documents`: `_id` -> `(project_id, _id)`
2. `clients`: `_id` -> `(project_id, _id)`
4. `changes`: `_id` -> `(project_id, doc_id, server_seq)`
5. `snapshots`: `_id` -> `(project_id, doc_id, server_seq)`
6. `syncedseqs`: `_id` -> `(project_id, doc_id, client_id)`

---------

Co-authored-by: Youngteac Hong <[email protected]>
  • Loading branch information
sejongk and hackerwins authored Jan 29, 2024
1 parent 768dad2 commit 0a7f4c8
Show file tree
Hide file tree
Showing 53 changed files with 2,809 additions and 1,624 deletions.
40 changes: 40 additions & 0 deletions .github/workflows/ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -91,3 +91,43 @@ jobs:
fail-on-alert: false
github-token: ${{ secrets.GITHUB_TOKEN }}
comment-always: true

sharding_test:
name: sharding_test
runs-on: ubuntu-latest
steps:

- name: Set up Go ${{ env.GO_VERSION }}
uses: actions/setup-go@v3
with:
go-version: ${{ env.GO_VERSION }}

- name: Check out code
uses: actions/checkout@v4

- name: Get tools dependencies
run: make tools

- name: Check Docker Compose Version
run: docker compose --version

- name: Run the Config server, Shard 1 and Shard 2
run: docker compose -f build/docker/sharding/docker-compose.yml up --build -d --wait config1 shard1-1 shard2-1

- name: Initialize the Config server
run: docker compose -f build/docker/sharding/docker-compose.yml exec config1 mongosh test /scripts/init-config1.js

- name: Initialize the Shard 1
run: docker compose -f build/docker/sharding/docker-compose.yml exec shard1-1 mongosh test /scripts/init-shard1-1.js

- name: Initialize the Shard 2
run: docker compose -f build/docker/sharding/docker-compose.yml exec shard2-1 mongosh test /scripts/init-shard2-1.js

- name: Run the Mongos
run: docker compose -f build/docker/sharding/docker-compose.yml up --build -d --wait mongos1

- name: Initialize the Mongos
run: docker compose -f build/docker/sharding/docker-compose.yml exec mongos1 mongosh test /scripts/init-mongos1.js

- name: Run the tests with sharding tag
run: go test -tags sharding -race -v ./test/sharding/...
4 changes: 3 additions & 1 deletion api/types/project.go
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,9 @@

package types

import "time"
import (
"time"
)

// Project is a project that consists of multiple documents and clients.
type Project struct {
Expand Down
54 changes: 54 additions & 0 deletions api/types/resource_ref_key.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,54 @@
/*
* Copyright 2023 The Yorkie Authors. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package types

import (
"fmt"
)

// DocRefKey represents an identifier used to reference a document.
type DocRefKey struct {
ProjectID ID
DocID ID
}

// String returns the string representation of the given DocRefKey.
func (r DocRefKey) String() string {
return fmt.Sprintf("Document (%s.%s)", r.ProjectID, r.DocID)
}

// ClientRefKey represents an identifier used to reference a client.
type ClientRefKey struct {
ProjectID ID
ClientID ID
}

// String returns the string representation of the given ClientRefKey.
func (r ClientRefKey) String() string {
return fmt.Sprintf("Client (%s.%s)", r.ProjectID, r.ClientID)
}

// SnapshotRefKey represents an identifier used to reference a snapshot.
type SnapshotRefKey struct {
DocRefKey
ServerSeq int64
}

// String returns the string representation of the given SnapshotRefKey.
func (r SnapshotRefKey) String() string {
return fmt.Sprintf("Snapshot (%s.%s.%d)", r.ProjectID, r.DocID, r.ServerSeq)
}
5 changes: 5 additions & 0 deletions build/charts/yorkie-mongodb/values.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,11 @@ sharded:
restartPolicy: Never
backoffLimit: 0
rules:
- collectionName: clients
shardKeys:
- name: project_id
method: "1"
unique: false
- collectionName: documents
shardKeys:
- name: project_id
Expand Down
29 changes: 29 additions & 0 deletions build/docker/sharding/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
# Docker Compose File for Sharded Cluster

These files deploy and set up a MongoDB sharded cluster using `docker compose`.

The cluster consists of the following components and offers the minimum
configuration required for testing.
- Config Server (Primary Only)
- 2 Shards (Each Primary Only)
- 1 Mongos

```bash
# Run the deploy.sh script to deploy and set up a sharded cluster.
./scripts/deploy.sh

# Shut down the apps
docker compose -f docker-compose.yml down
```

The files we use are as follows:
- `docker-compose.yml`: This file is used to run Yorkie's integration tests with a
MongoDB sharded cluster. It runs a MongoDB sharded cluster.
- `scripts/init-config.yml`: This file is used to set up a replica set of the config
server.
- `scripts/init-shard1.yml`: This file is used to set up a replica set of the shard 1.
- `scripts/init-shard2.yml`: This file is used to set up a replica set of the shard 2.
- `scripts/init-mongos.yml`: This file is used to shard the `yorkie-meta` database and
the collections of it.
- `scripts/deploy.sh`: This script runs a MongoDB sharded cluster and sets up the cluster
step by step.
121 changes: 121 additions & 0 deletions build/docker/sharding/docker-compose.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,121 @@
version: '3'
services:

# Config Server
config1:
image: mongo:6.0
container_name: mongo-config1
command: mongod --port 27017 --configsvr --replSet config-rs --bind_ip_all
volumes:
- ./scripts:/scripts
ports:
- 27100:27017
restart: always
networks:
- sharding
healthcheck:
test:
[
"CMD",
"mongosh",
"--host",
"config1",
"--port",
"27017",
"--quiet",
"--eval",
"'db.runCommand(\"ping\").ok'"
]
interval: 30s

# Shards
# Shards 1
shard1-1:
image: mongo:6.0
container_name: mongo-shard1-1
command: mongod --port 27017 --shardsvr --replSet shard-rs-1 --bind_ip_all
volumes:
- ./scripts:/scripts
ports:
- 27110:27017
restart: always
networks:
- sharding
healthcheck:
test:
[
"CMD",
"mongosh",
"--host",
"shard1-1",
"--port",
"27017",
"--quiet",
"--eval",
"'db.runCommand(\"ping\").ok'"
]
interval: 30s

# Shards 2
shard2-1:
image: mongo:6.0
container_name: mongo-shard2-1
command: mongod --port 27017 --shardsvr --replSet shard-rs-2 --bind_ip_all
volumes:
- ./scripts:/scripts
ports:
- 27113:27017
restart: always
networks:
- sharding
healthcheck:
test:
[
"CMD",
"mongosh",
"--host",
"shard2-1",
"--port",
"27017",
"--quiet",
"--eval",
"'db.runCommand(\"ping\").ok'"
]
interval: 30s

# Mongos
mongos1:
image: mongo:6.0
container_name: mongos1
command: mongos --port 27017 --configdb config-rs/config1:27017 --bind_ip_all
ports:
- 27017:27017
restart: always
volumes:
- ./scripts:/scripts
networks:
- sharding
healthcheck:
test:
[
"CMD",
"mongosh",
"--host",
"mongos1",
"--port",
"27017",
"--quiet",
"--eval",
"'db.runCommand(\"ping\").ok'"
]
interval: 30s
depends_on:
config1:
condition: service_healthy
shard1-1:
condition: service_healthy
shard2-1:
condition: service_healthy

networks:
sharding:
19 changes: 19 additions & 0 deletions build/docker/sharding/scripts/deploy.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
#!/bin/sh
echo "Run the Config server, Shard 1 and Shard 2"
docker compose -f docker-compose.yml up --build -d --wait config1 shard1-1 shard2-1
echo $?
echo "Init config"
docker compose -f docker-compose.yml exec config1 mongosh test /scripts/init-config1.js
echo $?
echo "Init shard1"
docker compose -f docker-compose.yml exec shard1-1 mongosh test /scripts/init-shard1-1.js
echo $?
echo "Init shard2"
docker compose -f docker-compose.yml exec shard2-1 mongosh test /scripts/init-shard2-1.js
echo $?
echo "Run the Mongos"
docker compose -f docker-compose.yml up --build -d --wait mongos1
echo $?
echo "Init mongos1"
docker compose -f docker-compose.yml exec mongos1 mongosh test /scripts/init-mongos1.js
echo $?
9 changes: 9 additions & 0 deletions build/docker/sharding/scripts/init-config1.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
rs.initiate(
{
_id: "config-rs",
configsvr: true,
members: [
{ _id: 0, host: "config1:27017" },
]
}
)
39 changes: 39 additions & 0 deletions build/docker/sharding/scripts/init-mongos1.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,39 @@
sh.addShard("shard-rs-1/shard1-1:27017")
sh.addShard("shard-rs-2/shard2-1:27017")

function findAnotherShard(shard) {
if (shard == "shard-rs-1") {
return "shard-rs-2"
} else {
return "shard-rs-1"
}
}

function shardOfChunk(minKeyOfChunk) {
return db.getSiblingDB("config").chunks.findOne({ min: { project_id: minKeyOfChunk } }).shard
}

// Shard the database for the mongo client test
const mongoClientDB = "test-yorkie-meta-mongo-client"
sh.enableSharding(mongoClientDB)
sh.shardCollection(mongoClientDB + ".clients", { project_id: 1 })
sh.shardCollection(mongoClientDB + ".documents", { project_id: 1 })
sh.shardCollection(mongoClientDB + ".changes", { doc_id: 1 })
sh.shardCollection(mongoClientDB + ".snapshots", { doc_id: 1 })
sh.shardCollection(mongoClientDB + ".syncedseqs", { doc_id: 1 })

// Split the inital range at `splitPoint` to allow doc_ids duplicate in different shards.
const splitPoint = ObjectId("500000000000000000000000")
sh.splitAt(mongoClientDB + ".documents", { project_id: splitPoint })
// Move the chunk to another shard.
db.adminCommand({ moveChunk: mongoClientDB + ".documents", find: { project_id: splitPoint }, to: findAnotherShard(shardOfChunk(splitPoint)) })

// Shard the database for the server test
const serverDB = "test-yorkie-meta-server"
sh.enableSharding(serverDB)
sh.shardCollection(serverDB + ".clients", { project_id: 1 })
sh.shardCollection(serverDB + ".documents", { project_id: 1 })
sh.shardCollection(serverDB + ".changes", { doc_id: 1 })
sh.shardCollection(serverDB + ".snapshots", { doc_id: 1 })
sh.shardCollection(serverDB + ".syncedseqs", { doc_id: 1 })

8 changes: 8 additions & 0 deletions build/docker/sharding/scripts/init-shard1-1.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
rs.initiate(
{
_id: "shard-rs-1",
members: [
{ _id: 0, host: "shard1-1:27017" },
]
}
)
8 changes: 8 additions & 0 deletions build/docker/sharding/scripts/init-shard2-1.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
rs.initiate(
{
_id: "shard-rs-2",
members: [
{ _id: 0, host: "shard2-1:27017" },
]
}
)
11 changes: 0 additions & 11 deletions client/client.go
Original file line number Diff line number Diff line change
Expand Up @@ -581,17 +581,6 @@ func handleResponse(
return nil, ErrUnsupportedWatchResponseType
}

// FindDocKey returns the document key of the given document id.
func (c *Client) FindDocKey(docID string) (key.Key, error) {
for _, attachment := range c.attachments {
if attachment.docID.String() == docID {
return attachment.doc.Key(), nil
}
}

return "", ErrDocumentNotAttached
}

// ID returns the ID of this client.
func (c *Client) ID() *time.ActorID {
return c.id
Expand Down
1 change: 1 addition & 0 deletions server/backend/database/change_info.go
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,7 @@ var ErrDecodeOperationFailed = errors.New("decode operations failed")
// ChangeInfo is a structure representing information of a change.
type ChangeInfo struct {
ID types.ID `bson:"_id"`
ProjectID types.ID `bson:"project_id"`
DocID types.ID `bson:"doc_id"`
ServerSeq int64 `bson:"server_seq"`
ClientSeq uint32 `bson:"client_seq"`
Expand Down
Loading

0 comments on commit 0a7f4c8

Please sign in to comment.