Remove workers from committee (MystenLabs/narwhal#670)
* Remove workers from committee

* Fix local benchmark, client demo & tests

* Fix remote benchmark

* revert settings.json

* Fix docker compose configurations

* rebase

* Use singleton cache & update reconfiguration

* Update todo comments

* Fix license check

* Address review comments
arun-koshy authored Aug 23, 2022
1 parent 4cf926a commit f7932a1
Showing 63 changed files with 1,190 additions and 598 deletions.
30 changes: 16 additions & 14 deletions narwhal/Docker/README.md
@@ -16,13 +16,13 @@ First, you must install:
* [Docker](https://docs.docker.com/get-docker/)
* [Docker-compose](https://docs.docker.com/compose/install/)

-Afterward, you will start the Narwhal cluster.
+Afterward, you will start the Narwhal cluster.

First, **make sure that you are on the `Docker folder`** . In the rest of the
document, we'll assume that we are under this folder:
```
$ cd Docker # Change to Docker directory
-$ pwd # Print the current directory
+$ pwd # Print the current directory
narwhal/Docker
```

@@ -32,7 +32,7 @@ $ docker-compose -f docker-compose.yml up
```

The first time this runs, `docker-compose` will build the Narwhal docker image. (This can take a few minutes
-since the narwhal node binary needs to be built from the source code.) And then it will spin up
+since the narwhal node binary needs to be built from the source code.) And then it will spin up
a cluster for *four nodes* by doing the necessary setup for `primary` and `worker` nodes. Each
`primary` node will be connected to *one worker* node.

@@ -80,7 +80,7 @@ for the primary nodes, and that allows interaction with the node (ex. the consen
The gRPC server for a primary node is running on port `8000`. However, by default, a container's port
is not accessible to hit by the host (local) machine unless it's exported a mapping between a host's
machine port and the corresponding container's port (ex. for someone to use a gRPC client on their
-computer to hit a primary's node container gRPC server). The [docker-compose.yml](docker-compose.yml) file
+computer to hit a primary's node container gRPC server). The [docker-compose.yml](docker-compose.yml) file
exports the gRPC port for each primary node so they can be accessible from the host machine.

For the default setup of *four primary* nodes, the gRPC servers are listening to the following
@@ -128,7 +128,7 @@ Here is the Docker folder structure:
```

Under the `validators` folder find the independent configuration
-folder for each validator node. (Remember, each `validator` is
+folder for each validator node. (Remember, each `validator` is
constituted from one `primary` node and several `worker` nodes.)

The `key.json` file contains the private `key` for the corresponding node that
@@ -154,7 +154,7 @@ The following environment variables are available to be used for each service in
ID of the validator that the node/service corresponds to. This defines which
configuration to use under the `validators` folder.
* `LOG_LEVEL` is the level of logging for the node defined as number of `v` parameters (ex `-vvv`). The following
-levels are defined according to the number of "v"s provided: `0 | 1 => "error", 2 => "warn", 3 => "info",
+levels are defined according to the number of "v"s provided: `0 | 1 => "error", 2 => "warn", 3 => "info",
4 => "debug", 5 => "trace"`.
* `CONSENSUS_DISABLED`. This value disables consensus (`Tusk`) for a primary node and enables the
`gRPC` server. The corresponding argument is: `--consensus-disabled`
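
The `LOG_LEVEL` mapping described in the hunk above can be sketched as a small lookup. This is only an illustration: `log_level_name` is a hypothetical helper that does not exist in the repository, and rejecting more than five "v"s is an assumption, since the README text does not say what happens beyond `5 => "trace"`.

```python
# Mapping copied from the README text: number of "v"s -> log level name.
LEVELS = {0: "error", 1: "error", 2: "warn", 3: "info", 4: "debug", 5: "trace"}


def log_level_name(flag):
    """Return the level name for a flag such as "-vvv" (hypothetical helper).

    An empty string means that no flag was passed, i.e. zero "v"s.
    """
    # Accept only "" or "-" followed by one or more "v" characters.
    if flag and (not flag.startswith("-") or set(flag[1:]) != {"v"}):
        raise ValueError("expected '' or a flag like '-vvv', got {!r}".format(flag))
    count = len(flag) - 1 if flag else 0
    if count not in LEVELS:
        # Assumption: counts above 5 are treated as an error in this sketch.
        raise ValueError("unsupported number of v's: {}".format(count))
    return LEVELS[count]


if __name__ == "__main__":
    for flag in ("", "-v", "-vv", "-vvv", "-vvvv", "-vvvvv"):
        print(repr(flag), "=>", log_level_name(flag))
```
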
@@ -168,14 +168,16 @@ from the database and log data. This is useful to preserve the state between mul
- You must build the narwhal `node` binary at top level:

```cargo build --release --features "benchmark"```

That binary is necessary for generating the keys for the validators and the committee.json seed file.

### Running the `gen.validators.sh #` script to generate a larger cluster.


```
-./gen.validators.sh 6
+# arguments for script are {num_primary} & {num_worker_per_primary} in that order
+./gen.validators.sh 6 1
# That will create a docker-compose.yaml file in ./validators-6/docker-compose.yaml
@@ -224,12 +226,12 @@ browse the logs via the "Explorer", selecting the Loki datasource.
If you encounter errors while the Docker image is being built, for example errors like:
```
error: could not compile `tonic`
-#9 373.3
+#9 373.3
#9 373.3 Caused by:
#9 373.4 process didn't exit successfully: `rustc --crate-name tonic --edition=2018
....
#9 398.4 The following warnings were emitted during compilation:
-#9 398.4
+#9 398.4
#9 398.4 warning: c++: fatal error: Killed signal terminated program cc1plus
#9 398.4 warning: compilation terminated.
```
@@ -240,11 +242,11 @@ compile the code. In this case please, increase the available RAM to at least 2G
### 2. Mounts denied or cannot start service errors

If you try to spin up the nodes via `docker-compose` and you come across errors such as `mounts denied`
-or `cannot start service`, make sure that you allow Docker to share your host's [Docker/validators](validators) folder
+or `cannot start service`, make sure that you allow Docker to share your host's [Docker/validators](validators) folder
with the containers. If you are using Docker Desktop, you can find more information on how to do
that here: [mac](https://docs.docker.com/desktop/mac/#file-sharing), [linux](https://docs.docker.com/desktop/linux/#file-sharing),
[windows](https://docs.docker.com/desktop/windows/#file-sharing) .

Also, check that you are not using the deprecated `devicemapper storage driver`, which might also
-cause you issues. See how to [migrate to an overlayfs driver](https://docs.docker.com/storage/storagedriver/overlayfs-driver/) .
-More information about the deprecation can be found [here](https://docs.docker.com/engine/deprecated/#device-mapper-storage-driver)
+cause you issues. See how to [migrate to an overlayfs driver](https://docs.docker.com/storage/storagedriver/overlayfs-driver/) .
+More information about the deprecation can be found [here](https://docs.docker.com/engine/deprecated/#device-mapper-storage-driver)
3 changes: 3 additions & 0 deletions narwhal/Docker/entry.sh
@@ -15,6 +15,7 @@ fi
 NODE_BIN="./bin/node"
 KEYS_PATH=${KEYS_PATH:="/validators/validator-$VALIDATOR_ID/key.json"}
 COMMITTEE_PATH=${COMMITTEE_PATH:="/validators/committee.json"}
+WORKERS_PATH=${WORKERS_PATH:="/validators/workers.json"}
 PARAMETERS_PATH=${PARAMETERS_PATH:="/validators/parameters.json"}
 DATA_PATH=${DATA_PATH:="/validators"}

@@ -37,6 +38,7 @@ if [[ "$NODE_TYPE" = "primary" ]]; then
 $NODE_BIN $LOG_LEVEL run \
 --keys $KEYS_PATH \
 --committee $COMMITTEE_PATH \
+--workers $WORKERS_PATH \
 --store "${DATA_PATH}/validator-$VALIDATOR_ID/db-primary" \
 --parameters $PARAMETERS_PATH \
 primary $CONSENSUS_DISABLED
@@ -46,6 +48,7 @@ elif [[ "$NODE_TYPE" = "worker" ]]; then
 $NODE_BIN $LOG_LEVEL run \
 --keys $KEYS_PATH \
 --committee $COMMITTEE_PATH \
+--workers $WORKERS_PATH \
 --store "${DATA_PATH}/validator-$VALIDATOR_ID/db-worker-$WORKER_ID" \
 --parameters $PARAMETERS_PATH \
 worker --id $WORKER_ID
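
To make the effect of the new `--workers` flag easier to see, the sketch below assembles the argument list that the two `entry.sh` branches shown above pass to the node binary. It is an illustration only: `node_command` is a hypothetical helper, the paths are the defaults visible in the script, and the handling of `LOG_LEVEL` and `CONSENSUS_DISABLED` is inferred from where those variables appear in the command line.

```python
def node_command(node_type, validator_id, worker_id=None,
                 log_level="", consensus_disabled=False,
                 data_path="/validators"):
    """Build the argv that entry.sh appears to run (hypothetical helper)."""
    if node_type not in ("primary", "worker"):
        raise ValueError("node_type must be 'primary' or 'worker'")
    if node_type == "worker" and worker_id is None:
        raise ValueError("a worker needs a worker_id")

    # Primaries and workers keep separate stores under the validator folder.
    if node_type == "primary":
        store = "{}/validator-{}/db-primary".format(data_path, validator_id)
    else:
        store = "{}/validator-{}/db-worker-{}".format(
            data_path, validator_id, worker_id)

    argv = ["./bin/node"]
    if log_level:  # e.g. "-vvv"
        argv.append(log_level)
    argv += [
        "run",
        "--keys", "/validators/validator-{}/key.json".format(validator_id),
        "--committee", "/validators/committee.json",
        "--workers", "/validators/workers.json",  # flag added in this commit
        "--store", store,
        "--parameters", "/validators/parameters.json",
    ]
    if node_type == "primary":
        argv.append("primary")
        if consensus_disabled:
            argv.append("--consensus-disabled")
    else:
        argv += ["worker", "--id", str(worker_id)]
    return argv


if __name__ == "__main__":
    print(" ".join(node_command("primary", "00", log_level="-vvv")))
    print(" ".join(node_command("worker", "00", worker_id=0)))
```

Both node types now receive the same `workers.json` path, which matches the commit's intent of moving worker addresses out of `committee.json`.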
20 changes: 12 additions & 8 deletions narwhal/Docker/gen.validators.sh
@@ -1,10 +1,13 @@
 #!/usr/bin/env bash
 set -e

-# number of primary+worker instances to start.
-num=$1
+# number of primary instances to start.
+num_primary=$1

-if [ -z "${num}" ]; then
+# number of worker instances per primary to start.
+num_worker=$2
+
+if [ -z "${num_primary}" ]; then
 echo usage: $0 number_of_instances
 exit 1
 fi
@@ -16,10 +19,10 @@ fi

 node=../target/release/node

-target=validators-${num}
+target=validators-${num_primary}
 mkdir -p $target

-./scripts/gen.compose.py -n ${num} -t templates/node.template > ${target}/docker-compose.yaml
+./scripts/gen.compose.py -np ${num_primary} -t templates/node.template > ${target}/docker-compose.yaml

 # loki config
 cat > ${target}/loki-config.yaml <<EOF
@@ -90,7 +93,7 @@ scrape_configs:
 __path__: /validators/validator-*/logs/log-*.txt
 EOF

-t=$(($num - 1))
+t=$(($num_primary - 1))
 for i in $(seq -f %02g 0 ${t})
 do
 val=${target}/validator-${i}
@@ -100,12 +103,13 @@ done

 cp validators/parameters.json ${target}/parameters.json

-./scripts/gen.committee.py -n ${num} -d ${target} > ${target}/committee.json
+./scripts/gen.committee.py -n ${num_primary} -d ${target} > ${target}/committee.json
+./scripts/gen.workers.py -np ${num_primary} -nw ${num_worker} -d ${target} > ${target}/workers.json

 cp -r templates/{grafana,prometheus} ${target}/

 # add the primary and worker nodes to the prometheus.yaml scrape configs.
-t=$(($num - 1))
+t=$(($num_primary - 1))
 for i in $(seq -f %02g 0 ${t})
 do
 scrape_primary="primary_${i}:8010"
18 changes: 7 additions & 11 deletions narwhal/Docker/scripts/gen.committee.py
@@ -9,14 +9,16 @@

 def main():
     "mainly main."
-    parser = argparse.ArgumentParser(description="committee file generator")
+    parser = argparse.ArgumentParser(
+        description="committee file generator")

-    parser.add_argument("-n", default=4, type=int, help="number of primary+worker instances")
-    parser.add_argument("-f", default="committee.json", help="committee.json file name")
+    parser.add_argument("-n", default=4, type=int,
+                        help="number of primary instances")
+    parser.add_argument("-f", default="committee.json",
+                        help="committee.json file name")
     parser.add_argument("-d", default=None, help="target directory")
     args = parser.parse_args()

-
     # load keys
     keys = []
     for i in range(args.n):
@@ -31,16 +33,10 @@ def main():
                 "worker_to_primary": "/dns/primary_{:02d}/tcp/3001/http".format(i)
             },
             "stake": 1,
-            "workers": {
-                "0": {
-                    "primary_to_worker": "/dns/worker_{:02d}/tcp/4000/http".format(i),
-                    "transactions": "/dns/worker_{:02d}/tcp/4001/http".format(i),
-                    "worker_to_worker": "/dns/worker_{:02d}/tcp/4002/http".format(i)
-                }
-            }
         }
     out = {"authorities": temp, "epoch": 0}
     print(json.dumps(out, indent=4))

+
 if __name__ == '__main__':
     sys.exit(main())
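
After this change the committee entries carry only the primary addresses and the stake. A minimal sketch of the resulting structure follows; `build_committee` is a hypothetical function, it takes the authority names directly instead of reading the `key.json` files, and the port-3000 `primary_to_primary` address is taken from the `committee.json` hunk further down rather than from the script lines visible here.

```python
import json


def build_committee(names):
    """Return a committee dict without per-authority workers (illustrative only)."""
    authorities = {}
    for i, name in enumerate(names):
        authorities[name] = {
            "primary": {
                "primary_to_primary": "/dns/primary_{:02d}/tcp/3000/http".format(i),
                "worker_to_primary": "/dns/primary_{:02d}/tcp/3001/http".format(i),
            },
            "stake": 1,
            # No "workers" key any more: worker addresses live in workers.json.
        }
    return {"authorities": authorities, "epoch": 0}


if __name__ == "__main__":
    # The names are placeholders standing in for the public keys in key.json.
    print(json.dumps(build_committee(["authority-a", "authority-b"]), indent=4))
```
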
15 changes: 10 additions & 5 deletions narwhal/Docker/scripts/gen.compose.py
@@ -218,23 +218,28 @@
 FOOTER = """...
 """

+
 def main():
     "mainly main."
-    parser = argparse.ArgumentParser(description="docker compose config generator")
+    parser = argparse.ArgumentParser(
+        description="docker compose config generator")

-    parser.add_argument("-n", default=4, type=int, help="number of primary+worker instances")
-    parser.add_argument("-t", default="node.template", help="node template file")
+    parser.add_argument("-np", default=4, type=int,
+                        help="number of primary instances")
+    parser.add_argument("-t", default="node.template",
+                        help="node template file")
     args = parser.parse_args()

     templ = open(args.t).read()

     print(HEADER)

-    for i in range(args.n):
+    for i in range(args.np):
         tmp = "{:02d}".format(i)
-        print(templ.format(counter=tmp, num=args.n))
+        print(templ.format(counter=tmp, num=args.np))

     print(FOOTER)

+
 if __name__ == '__main__':
     sys.exit(main())
48 changes: 48 additions & 0 deletions narwhal/Docker/scripts/gen.workers.py
@@ -0,0 +1,48 @@
+#!/usr/bin/env python3
+"""Generate a workers.json file
+"""
+
+import sys
+import argparse
+import json
+
+
+def main():
+    "mainly main."
+    parser = argparse.ArgumentParser(
+        description="workers file generator")
+
+    parser.add_argument("-np", default=4, type=int,
+                        help="number of primary instances")
+    parser.add_argument("-nw", default=1, type=int,
+                        help="number of worker instances per primary")
+    parser.add_argument("-f", default="workers.json",
+                        help="workers.json file name")
+    parser.add_argument("-d", default=None, help="target directory")
+    args = parser.parse_args()
+
+    # load keys
+    keys = []
+    for i in range(args.np):
+        k = open("{}/validator-{:02d}/key.json".format(args.d, i)).read()
+        keys.append(json.loads(k))
+
+    temp = {}
+    starting_port = 4000
+    for i, k in enumerate(keys):
+        workers = {}
+        port = starting_port
+        for i in range(args.nw):
+            workers[i] = {
+                "primary_to_worker": "/dns/worker_{:02d}/tcp/{}/http".format(i, port),
+                "transactions": "/dns/worker_{:02d}/tcp/{}/http".format(i, port+1),
+                "worker_to_worker": "/dns/worker_{:02d}/tcp/{}/http".format(i, port+2)
+            }
+            port += 3
+        temp[k['name']] = workers
+    out = {"workers": temp, "epoch": 0}
+    print(json.dumps(out, indent=4))
+
+
+if __name__ == '__main__':
+    sys.exit(main())
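
In the new script the inner loop reuses the name `i`, so the `worker_{:02d}` hostname is formatted with the worker index, not the primary index; with `-nw 1` every authority would therefore point at `worker_00`, which may not be intended. The sketch below shows the same generation with distinct loop variables. It rests on two assumptions: that each primary keeps its own `worker_NN` host (as in the previous `committee.json`), and that additional workers on that host take consecutive port triples. `build_workers` is a hypothetical name.

```python
import json


def build_workers(names, workers_per_primary=1, starting_port=4000):
    """Return a workers.json-style dict, one worker map per authority (sketch)."""
    result = {}
    for primary_idx, name in enumerate(names):
        workers = {}
        port = starting_port
        # Assumption: the worker host is numbered after its primary.
        host = "worker_{:02d}".format(primary_idx)
        for worker_id in range(workers_per_primary):
            workers[worker_id] = {
                "primary_to_worker": "/dns/{}/tcp/{}/http".format(host, port),
                "transactions": "/dns/{}/tcp/{}/http".format(host, port + 1),
                "worker_to_worker": "/dns/{}/tcp/{}/http".format(host, port + 2),
            }
            port += 3  # each worker uses three consecutive ports
        result[name] = workers
    return {"workers": result, "epoch": 0}


if __name__ == "__main__":
    # json.dumps turns the integer worker ids into string keys ("0", "1", ...).
    print(json.dumps(build_workers(["authority-a", "authority-b"], 2), indent=4))
```
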
38 changes: 5 additions & 33 deletions narwhal/Docker/validators/committee.json
@@ -5,57 +5,29 @@
                 "primary_to_primary": "/dns/primary_0/tcp/3000/http",
                 "worker_to_primary": "/dns/primary_0/tcp/3001/http"
             },
-            "stake": 1,
-            "workers": {
-                "0": {
-                    "primary_to_worker": "/dns/worker_0/tcp/4000/http",
-                    "transactions": "/dns/worker_0/tcp/4001/http",
-                    "worker_to_worker": "/dns/worker_0/tcp/4002/http"
-                }
-            }
+            "stake": 1
         },
         "fbhvgLnet2HdE0NUITUpekQxdRRWKxbZczM6Qg55sP8=": {
             "primary": {
                 "primary_to_primary": "/dns/primary_1/tcp/3000/http",
                 "worker_to_primary": "/dns/primary_1/tcp/3001/http"
             },
-            "stake": 1,
-            "workers": {
-                "0": {
-                    "primary_to_worker": "/dns/worker_1/tcp/4000/http",
-                    "transactions": "/dns/worker_1/tcp/4001/http",
-                    "worker_to_worker": "/dns/worker_1/tcp/4002/http"
-                }
-            }
+            "stake": 1
         },
         "noDjBFfXGqQioHTf6jEIPYthhUWCMsC12ZJ9DMh7Ujk=": {
             "primary": {
                 "primary_to_primary": "/dns/primary_2/tcp/3000/http",
                 "worker_to_primary": "/dns/primary_2/tcp/3001/http"
             },
-            "stake": 1,
-            "workers": {
-                "0": {
-                    "primary_to_worker": "/dns/worker_2/tcp/4000/http",
-                    "transactions": "/dns/worker_2/tcp/4001/http",
-                    "worker_to_worker": "/dns/worker_2/tcp/4002/http"
-                }
-            }
+            "stake": 1
         },
         "Z+K3OEI/eldyTTdp27mQFDdBPqjkss9wOkN6RceDTuM=": {
             "primary": {
                 "primary_to_primary": "/dns/primary_3/tcp/3000/http",
                 "worker_to_primary": "/dns/primary_3/tcp/3001/http"
             },
-            "stake": 1,
-            "workers": {
-                "0": {
-                    "primary_to_worker": "/dns/worker_3/tcp/4000/http",
-                    "transactions": "/dns/worker_3/tcp/4001/http",
-                    "worker_to_worker": "/dns/worker_3/tcp/4002/http"
-                }
-            }
+            "stake": 1
         }
     },
     "epoch": 0
-}
+}
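
Because worker addresses now live in a separate file, the two files can drift apart. The following sketch checks that every authority in `committee.json` has at least one worker in `workers.json` and that both files name the same epoch. The `workers.json` layout (top-level `workers` and `epoch` keys) is assumed from `gen.workers.py` in this commit; `authorities_without_workers` and `epochs_match` are hypothetical helpers, not part of the repository.

```python
def authorities_without_workers(committee, workers):
    """Return the sorted authority names that have no worker entry (sketch)."""
    worker_map = workers.get("workers", {})
    return sorted(
        name for name in committee.get("authorities", {})
        if not worker_map.get(name)  # missing key or empty worker map
    )


def epochs_match(committee, workers):
    """Return True when both files declare the same epoch."""
    return committee.get("epoch") == workers.get("epoch")


if __name__ == "__main__":
    # Placeholder data shaped like the two JSON files.
    committee = {"authorities": {"a": {"stake": 1}, "b": {"stake": 1}}, "epoch": 0}
    workers = {"workers": {"a": {"0": {}}}, "epoch": 0}
    print("missing:", authorities_without_workers(committee, workers))
    print("same epoch:", epochs_match(committee, workers))
```

In practice the two dictionaries would come from `json.load` on the generated files.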