Deployment: Design and recommendation for production deployment #100

Closed
ashcherbakov opened this issue Oct 29, 2021 · 6 comments

Comments

@ashcherbakov
Contributor

ashcherbakov commented Oct 29, 2021

Acceptance criteria:

  • Design Production Deployment:
    • Define validators deployment pattern
    • Define observers deployment pattern
    • Define clients deployment pattern, types of client Apps and how they connect to the pool
    • Define security of the validators and network (sentry nodes, firewalls, etc.)
    • Other production-ready recommendations (HSM, DB setup, etc.)
  • Do necessary experiments
  • Design doc as the main output (as a wiki document for example)

Links


@ashcherbakov
Contributor Author

I believe we can start with the following items:

  1. Decide whether we need Sentry nodes, taking into account the permissioned nature of the DCL network and the strict firewall/IPSec rules we are going to apply.
  2. Decide whether we need IPSec/VPN.
  3. Decide how exactly we are going to protect Validators (firewall, IPSec, Sentry, etc.)
  4. Investigate HSM usage and whether it requires bare-metal machines.

@ashcherbakov ashcherbakov self-assigned this Jan 27, 2022
@ashcherbakov
Contributor Author

Current Design Notes: https://github.com/zigbee-alliance/distributed-compliance-ledger/wiki/DCL-MainNet-Deployment

Answers to the questions above

  • 1: Sentry nodes are optional for a permissioned DCL, but we recommend using them for most deployments. Reasons:
    • Public Sentries are essentially Observers, so no additional Observers are needed
    • Harder to DDoS the real Validator node (if malicious Validators are present)
    • Hides the real Validator node's IP, making the validator harder to attack
    • Can support HSM and Validators on physical machines without Internet access (if not from the beginning, HSM support can be added later)
    • Sentry nodes can potentially be auto-scaled (new Sentries are created when an attack is detected)
  • 2,3: need to be investigated in more detail
  • 4: HSM is optional for a permissioned DCL. The main reason it is a must-have for permissionless PoS networks (such as Cosmos) is protection against double-sign attacks, which can cause loss of money (tokens), clients, and reputation. In DCL we don't have any tokens, and if a double sign is detected, the node will simply be removed from the pool.
    Only physical HSMs (YubiHSM2 or Ledger) can be supported, as 25519 curves (Ed25519) are used for signing in Tendermint/Cosmos. AWS CloudHSM doesn't support 25519.

Things to be checked/experimented (@andkononykhin):

  • TLS/HTTPS config for gRPC and REST
  • IPSec/VPN VS whitelist-based firewalls for secure communication between VNs
    • There are concerns about the authenticated encryption at the Tendermint P2P level, so an additional layer of authenticated encryption at the IP level, provided by IPSec, can be beneficial.
  • Health & Monitoring details
    • Prometheus/Grafana for metrics
    • Exact metrics to be monitored
    • Metric-based alerts
    • Log analysis (ELK stack?)
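
For the whitelist-based firewall option, here is a sketch of what the rules could look like. It only prints iptables commands for review and does not apply them. The helper name p2p_whitelist_rules and the peer addresses (documentation-range IPs) are hypothetical; 26656 is the default Tendermint P2P port and may differ per deployment.

```shell
# Print (do not apply) iptables rules that accept P2P traffic only from
# whitelisted validator peers and drop everything else on that port.
# Usage: p2p_whitelist_rules PORT PEER_IP [PEER_IP ...]
p2p_whitelist_rules() {
  port="$1"
  shift
  for peer in "$@"; do
    # One ACCEPT rule per whitelisted peer address.
    echo "iptables -A INPUT -p tcp -s ${peer} --dport ${port} -j ACCEPT"
  done
  # Final rule: drop any other inbound connection to the P2P port.
  echo "iptables -A INPUT -p tcp --dport ${port} -j DROP"
}

# Example with two hypothetical peer validators.
p2p_whitelist_rules 26656 192.0.2.11 192.0.2.12
```

The ACCEPT rules have to come before the DROP rule, which is why the helper emits them in that order. Whether this is sufficient without IPSec is exactly the open question in the list above.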

@andkononykhin
Member

andkononykhin commented Feb 4, 2022

TLS/HTTPS config for gRPC and REST
Requirement: TLS 1.3 or later must be supported.

TLS 1.3 is supported for Tendermint RPC only. Verified with a generated self-signed certificate:

$ openssl ecparam -genkey -name secp384r1 -out server.key
$ openssl req -new -x509 -sha256 -key server.key -out server.crt -days 3650
$ # ... start a node with server.key and server.crt provided
$ curl -Lv --cacert tmp/server.crt https://localhost:26657/
*   Trying 127.0.0.1:26657...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 26657 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: tmp/server.crt
  CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256
* ALPN, server accepted to use h2
* Server certificate:
*  subject: C=AU; ST=Some-State; O=Internet Widgits Pty Ltd
*  start date: Feb  4 10:22:52 2022 GMT
*  expire date: Feb  2 10:22:52 2032 GMT
* SSL: unable to obtain common name from peer certificate
* Closing connection 0
* TLSv1.3 (OUT), TLS alert, close notify (256):
curl: (60) SSL: unable to obtain common name from peer certificate
More details here: https://curl.haxx.se/docs/sslcerts.html

curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.
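
Note on the curl error above: the handshake itself completed over TLSv1.3; curl rejected the certificate because its subject has no common name. Below is a minimal sketch of generating a test certificate that should pass hostname verification for localhost. It assumes OpenSSL 1.1.1 or newer (needed for -addext) and works in a scratch directory; the CN and subjectAltName values are illustrative and should match the real node hostname.

```shell
# Work in a scratch directory so existing keys are not overwritten.
workdir="$(mktemp -d)"
cd "$workdir"

# Generate the EC private key (-noout omits the EC parameters block).
openssl ecparam -genkey -name secp384r1 -noout -out server.key

# Self-signed certificate with an explicit CN and subjectAltName,
# so clients can match the certificate against the host name.
openssl req -new -x509 -sha256 -key server.key -out server.crt -days 3650 \
  -subj "/CN=localhost" \
  -addext "subjectAltName=DNS:localhost,IP:127.0.0.1"

# Inspect the result before handing it to the node.
openssl x509 -in server.crt -noout -subject -ext subjectAltName
```

For production, a certificate issued by a proper CA is still the better choice; this only makes the local experiment verifiable.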

For both Cosmos gRPC and Cosmos REST (over gRPC), only plain HTTP is available. It looks like TLS for these endpoints is not considered part of the Cosmos codebase (cosmos/cosmos-sdk#6420 (comment)).
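
A quick way to check which endpoints accept TLS 1.3 is a small wrapper around openssl s_client. This is a sketch: check_tls13 is a hypothetical helper name, and the ports in the usage comment (26657 for Tendermint RPC, 9090 for gRPC, 1317 for REST) are the usual defaults, which a given deployment may have changed.

```shell
# Return 0 if HOST:PORT completes a TLS 1.3 handshake, non-zero otherwise.
# Usage: check_tls13 HOST PORT
check_tls13() {
  host="$1"
  port="$2"
  # -tls1_3 restricts the client to TLS 1.3 only; stdin is closed so
  # s_client exits right after the handshake attempt.
  openssl s_client -connect "${host}:${port}" -tls1_3 </dev/null 2>/dev/null \
    | grep -q "^New, TLSv1.3"
}

# Example (not executed here), assuming default ports on a local node:
#   for p in 26657 9090 1317; do
#     if check_tls13 localhost "$p"; then echo "$p: TLS 1.3"; else echo "$p: no TLS 1.3"; fi
#   done
```

The helper only checks the handshake, not certificate validity, so it complements the curl --cacert check and does not replace it.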

@andkononykhin
Member

TLS (client part):

  • The CLI client communicates with Tendermint RPC and works (verified) with server-side TLS enabled, but it requires the node endpoint to be configured with an https URL (e.g. dcld status --node https://localhost:26657) and a valid certificate on the server

@ashcherbakov
Contributor Author

ashcherbakov commented Feb 7, 2022

@ashcherbakov ashcherbakov changed the title Recommendation for production deployment Deployment: Design and Recommendation for production deployment Feb 9, 2022
@ashcherbakov ashcherbakov changed the title Deployment: Design and Recommendation for production deployment Deployment: Design and recommendation for production deployment Feb 9, 2022