oasis-chain-monit on another host rather than validator #13

Open
valentinbud opened this issue Aug 21, 2021 · 0 comments

Greetings everyone 👋. I really like the approach this tool takes to monitor an Oasis Validator Node.
I would like to use it with an oasis-node at version 21.1.1 for a couple of days, until I upgrade
the Validator to 21.2.8.

What would I like to achieve?

I would like to install oasis-chain-monit on a separate server, other than the validator.
The docs say that we should contact you if that's the case. @chris-remus pointed me
here to open an issue.

InfluxDB and Grafana should also be installed on that separate server.

The validator would only have Telegraf and Prometheus Node Exporter installed.

What have I done?

Variant 1

I have installed NGINX on the validator and configured it as a gRPC proxy to the
oasis-node UNIX socket.

cat /etc/nginx/conf.d/oasis-node.conf
server {
    listen 9000 http2;

    location / {
        grpc_pass unix:/mnt/data/node/internal.sock;
    }
}

I have installed oasis-chain-monit on the metrics node, together with Grafana
and InfluxDB. Telegraf is installed on the validator and successfully delivers data to
InfluxDB.

I thought that one of the following settings in config.toml would work.

[validator_details]
...
socket_path = "X.X.X.X:9000" 
# socket_path="grpc://X.X.X.X:9000"

It didn't. oasis-chain-monit complains:

2021/08/21 06:52:52 Error while fetching validator block height from db
2021/08/21 06:52:52 Version: 21.1
2021/08/21 06:52:52 sendMessage resp: {"ok":true,"result":{"message_id":1426,"from":{"id":1905491558,"is_bot":true,"first_name":"OasisAlerts","username":"munaynetworkOAbot"},"chat":{"id":637960901,"first_name":"Valentin","last_name":"Bud","username":"valentinbud","type":"private"},"date":1629521572,"text":"Oasis node on your validator instance is not running: \nstat 9000: no such file or directory"}}
2021/08/21 06:52:52 sendMessage req : map[chat_id:[637960901] disable_notification:[false] disable_web_page_preview:[false] text:[Oasis node on your validator instance is not running:
stat 9000: no such file or directory]]

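I believe this is because the socket_path configuration parameter accepts local paths only:
the alert text "stat 9000: no such file or directory" suggests the value is checked as a file on disk.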
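
To illustrate what I had in mind, a target could be classified as either a UNIX socket or a TCP host:port before dialing. This is only a minimal sketch using Go's standard library. `splitTarget` is a hypothetical helper, not a function from oasis-chain-monit, and I have not verified how the tool actually parses socket_path.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// splitTarget is a hypothetical helper (not part of oasis-chain-monit).
// It maps a configured target to the (network, address) pair that
// net.Dial expects:
//
//	"unix:/path/to.sock" -> ("unix", "/path/to.sock")
//	"host:port"          -> ("tcp", "host:port")
func splitTarget(target string) (string, string, error) {
	if strings.HasPrefix(target, "unix:") {
		path := strings.TrimPrefix(target, "unix:")
		if path == "" {
			return "", "", fmt.Errorf("empty unix socket path in %q", target)
		}
		return "unix", path, nil
	}
	// SplitHostPort only checks the syntax; it does not resolve the host.
	if _, _, err := net.SplitHostPort(target); err != nil {
		return "", "", fmt.Errorf("target %q is neither unix:<path> nor host:port: %w", target, err)
	}
	return "tcp", target, nil
}

func main() {
	targets := []string{"unix:/mnt/data/node/internal.sock", "X.X.X.X:9000"}
	for _, t := range targets {
		network, address, err := splitTarget(t)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("dial network=%s address=%s\n", network, address)
	}
}
```

The returned pair could be passed to net.Dial, or used in a custom dialer for a gRPC client. Whether the NGINX proxy above would then be sufficient is something I cannot confirm.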

Variant 2

Because of the above, and because the InfluxDB hostname is hardcoded to localhost
in https://github.com/Chainflow/oasis-mission-control/blob/master/main.go#L26,
I moved to installing oasis-mission-control on the validator, together with InfluxDB
and Telegraf. Grafana is still on the metrics node. I would have run InfluxDB on the
metrics node as well, had that been possible.
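
For what it's worth, a configurable InfluxDB host with a localhost default could look roughly like the sketch below. The `host` parameter and the INFLUXDB_HOST environment variable are assumptions made for illustration; the config.toml I am using only has `port` and `database` under [influxdb], and I do not know how the maintainers would prefer to expose this.

```go
package main

import (
	"fmt"
	"net"
	"os"
)

// influxAddr builds the InfluxDB HTTP URL from a host and a port.
// Empty values fall back to localhost and 8086, so the current
// behaviour is kept when no host is configured.
// The host parameter is hypothetical: it does not exist in config.toml today.
func influxAddr(host, port string) string {
	if host == "" {
		host = "localhost"
	}
	if port == "" {
		port = "8086"
	}
	// JoinHostPort also brackets IPv6 literals correctly.
	return "http://" + net.JoinHostPort(host, port)
}

func main() {
	// INFLUXDB_HOST is an assumed variable name used only in this sketch.
	fmt.Println(influxAddr(os.Getenv("INFLUXDB_HOST"), "8086"))
}
```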

Now oasis-chain-monit starts and mostly works.

2021/08/21 07:19:02 File exists!
2021/08/21 07:19:02 Network epoch number : 0
2021/08/21 07:19:02 block height : 4680900
2021/08/21 07:19:02 val hex address from val set : B70BD502585CE*******86112B1B0C9E65EE6
2021/08/21 07:19:02 VOTING POWER: 22456883370378
2021/08/21 07:19:02 Validator epoch number : 7791
2021/08/21 07:19:02 Epoch difference : 7791
2021/08/21 07:19:02 Blok meta header height : 4680899
2021/08/21 07:19:02 Network height: 0
2021/08/21 07:19:02 Network height: 0 and Validator Height: 4676939
2021/08/21 07:19:02 validator worker epoch number : 7797
2021/08/21 07:19:02 Address Balance: 19645
2021/08/21 07:19:02 Block time diff: 6.00
2021/08/21 07:19:02 Version: 21.1
2021/08/21 07:19:02 Peers count : 45 and validator latest height : 4680900
2021/08/21 07:19:02 validator status... true
2021/08/21 07:19:02 a1, a2 and present time :  2:30PM 2:30AM 5:19AM
2021/08/21 07:19:02 val alert count.. 0

Two things I've noticed:

  • The network height is always 0, and so is the network epoch number. I am almost
    sure that the real network height is much bigger than that.
  • Most of the dashboards imported into Grafana version 8.1.1 are empty and show N/A.
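
Note that the "Epoch difference : 7791" in the log is simply the validator epoch (7791) compared against a network epoch of 0, so it does not look like a real difference. A sanity check along the following lines would distinguish "network value not fetched" from an actual gap. This is a sketch under my own assumptions: `diffIfKnown` is a hypothetical helper, and I have not checked how the tool computes these values.

```go
package main

import "fmt"

// diffIfKnown is a hypothetical helper. It returns the network-minus-validator
// difference only when a network value was actually obtained. A network value
// of 0 (or less) is treated as "not fetched" instead of a real measurement.
func diffIfKnown(network, validator int64) (int64, bool) {
	if network <= 0 {
		return 0, false
	}
	return network - validator, true
}

func main() {
	// Epoch values taken from the log in this issue.
	if d, ok := diffIfKnown(0, 7791); ok {
		fmt.Println("epoch difference:", d)
	} else {
		fmt.Println("network epoch unavailable, skipping epoch difference")
	}
}
```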

For reference, my config.toml is below.

validator_addr = "oasis1*****vm"
validator_hex_addr = "B70BD502*******B1B0C9E65EE6"
validator_name = "oasis-validator-mainnet"
network_url = "http://157.230.100.229:3000"
network_node_name = "Oasis_Local"
socket_path = "unix:/mnt/data/node/internal.sock"

[alerts_threshold]
voting_power_threshold = 50
num_peers_threshold = 1
missed_blocks_threshold = 4
block_diff_threshold = 3
epoch_diff_threshold = 3
emergency_missed_blocks_threshold = 50

[enable_alerts]
enable_telegram_alerts = "yes"
enable_email_alerts = "false"

[daily_alerts]
alert_time1 = "02:30PM"
alert_time2 = "02:30AM"

[telegram]
tg_chat_id = 63***901
tg_bot_token = "190***558:AAHhmV*******xJKkYXt99I"

[sendgrid]
sendgrid_token = "SG.fMdY2lmRQ8absvgsdgsdlGpqQ.5OKsFzyc_1ccoC8y_kwvIxsofJ_1UuOvRFiQVBdMb1Q"
email_address = "[email protected]"
pagerduty_email = "[email protected]"

[influxdb]
port = "8086"
database = "oasis"
# username = "vitwit"

[scraper]
rate = "3s"
validator_rate = "60s"

What am I missing? Am I doing something wrong? And would it be possible to run oasis-mission-control on a different machine? Thanks.
