P2P get stuck on public api #2210

Closed
Tracked by #2405
social244305-Architect opened this issue Feb 16, 2023 · 13 comments
Labels
type: bug Issues that need priority attention -- something isn't working

Comments

@social244305-Architect

Summary of Bug

Enabling the REST API on a public Cosmos Hub node causes the node to stop syncing on mainnet.

Version

Ubuntu-2204-jammy-amd64-base:~# gaiad version
v8.0.0

Steps to Reproduce

Enable API on a public node (available on Cosmos Directory) on Cosmos Hub mainnet.
Start the gaiad process. The node syncs a few blocks and then stops syncing. More than 100 peers remain connected to the node, but no further blocks are synced.
Logs attached


Public API logs.txt

@MSalopek
Contributor

Hello! Thanks for posting the issue.

Would you mind sharing the node configuration too (<node_home>/config/config.toml and <node_home>/config/app.toml)? It gives us more insight into the setup and speeds up debugging and simulations.

If you post the config, feel free to remove any sensitive information.

Does the issue still persist or has it stopped?

@social244305-Architect
Author

I disabled the public API and kept only RPC available to avoid this issue.

app.toml
minimum-gas-prices = "0.001uatom"
pruning = "custom"
pruning-keep-recent = "300000"
pruning-keep-every = "0"
pruning-interval = "67"
halt-height = 0
halt-time = 0
min-retain-blocks = 0
inter-block-cache = true
index-events = []
iavl-cache-size = 781250
[telemetry]
service-name = ""
enabled = false
enable-hostname = false
enable-hostname-label = false
enable-service-label = false
prometheus-retention-time = 0
global-labels = [
]
[api]
enable = true
swagger = false
address = "tcp://0.0.0.0:XXXXX"
max-open-connections = 100
rpc-read-timeout = 10
rpc-write-timeout = 0
rpc-max-body-bytes = 1000000
enabled-unsafe-cors = false
[rosetta]
enable = false
address = ":8080"
blockchain = "app"
network = "network"
retries = 3
offline = false
[grpc]
enable = true
address = "127.0.0.1:XXXXX"
[grpc-web]
enable = false
address = "0.0.0.0:XXXXX"
enable-unsafe-cors = false
[state-sync]
snapshot-interval = 0
snapshot-keep-recent = 10

config.toml
proxy_app = "tcp://127.0.0.1:XXXXX"
moniker = "Architect Nodes RPC"
fast_sync = true
db_backend = "goleveldb"
db_dir = "data"
log_level = "info"
log_format = "plain"
genesis_file = "config/genesis.json"
priv_validator_key_file = "config/priv_validator_key.json"
priv_validator_state_file = "data/priv_validator_state.json"
priv_validator_laddr = ""
node_key_file = "config/node_key.json"
abci = "socket"
filter_peers = false
[rpc]
laddr = "tcp://0.0.0.0:XXXXX"
cors_allowed_origins = []
cors_allowed_methods = ["HEAD", "GET", "POST", ]
cors_allowed_headers = ["Origin", "Accept", "Content-Type", "X-Requested-With", "X-Server-Time", ]
grpc_laddr = ""
grpc_max_open_connections = 900
unsafe = false
max_open_connections = 900
max_subscription_clients = 100
max_subscriptions_per_client = 20
timeout_broadcast_tx_commit = "10s"
max_body_bytes = 1000000
max_header_bytes = 1048576
tls_cert_file = ""
tls_key_file = ""
pprof_laddr = "localhost:XXXXX"
[p2p]
laddr = "tcp://0.0.0.0:XXXXX"
external_address = ""
seeds = ""
persistent_peers = ""
upnp = false
addr_book_file = "config/addrbook.json"
addr_book_strict = true
max_num_inbound_peers = 200
max_num_outbound_peers = 200
unconditional_peer_ids = ""
persistent_peers_max_dial_period = "0s"
flush_throttle_timeout = "100ms"
max_packet_msg_payload_size = 1024
send_rate = 5120000
recv_rate = 5120000
pex = true
seed_mode = false
private_peer_ids = ""
allow_duplicate_ip = false
handshake_timeout = "20s"
dial_timeout = "3s"
[mempool]
recheck = true
broadcast = true
wal_dir = ""
size = 5000
max_txs_bytes = 1073741824
cache_size = 10000
keep-invalid-txs-in-cache = false
max_tx_bytes = 1048576
max_batch_bytes = 0
[statesync]
enable = false
rpc_servers = ""
trust_height = 0
trust_hash = ""
trust_period = "168h0m0s"
discovery_time = "15s"
temp_dir = ""
chunk_request_timeout = "10s"
chunk_fetchers = "4"
[fastsync]
version = "v0"
[consensus]
wal_file = "data/cs.wal/wal"
timeout_propose = "3s"
timeout_propose_delta = "500ms"
timeout_prevote = "1s"
timeout_prevote_delta = "500ms"
timeout_precommit = "1s"
timeout_precommit_delta = "500ms"
timeout_commit = "5s"
double_sign_check_height = 0
skip_timeout_commit = false
create_empty_blocks = true
create_empty_blocks_interval = "0s"
peer_gossip_sleep_duration = "100ms"
peer_query_maj23_sleep_duration = "2s"
[tx_index]
indexer = "kv"
[instrumentation]
prometheus = false
prometheus_listen_addr = ":XXXXX"
max_open_connections = 3
namespace = "tendermint"

github-project-automation bot moved this to 🩹 Triage in Cosmos Hub on Feb 23, 2023
@mmulji-ic moved this from 🩹 Triage to 🏗 In progress in Cosmos Hub on Feb 23, 2023
@joslee7410

Facing the same issue here. Only with the API server disabled does the node run and sync smoothly.
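For reference, with the app.toml layout posted earlier in this thread, disabling the API server comes down to one key in the [api] section. A minimal sketch, assuming every other setting stays as posted and gaiad is restarted afterwards:

```toml
# app.toml: turn off the REST API server entirely.
# The Tendermint RPC is configured separately in config.toml [rpc] and is not affected.
[api]
enable = false
```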

@mmulji-ic self-assigned this on Feb 24, 2023
@social244305-Architect
Author

Checking in to see whether you need any more details to reproduce. The issue is fairly consistent when the node is exposed as a public API.

@mmulji-ic
Contributor

Hi @social244305-Architect, we fixed some syncing issues in v8.0.1. Could you try that version to see if you still have issues?

@social244305-Architect
Author

Sure. Let me test it tonight and update with my findings.

@joslee7410

Hi @social244305-Architect , we fixed some issues with synching with v8.0.1 could you try with that version to see if you still have issues?

I tried v8.0.1, but it did not really fix the problem.

But I tried what @social244305-Architect suggested: I now allow only my own server to access the API, and so far no out-of-sync problems have occurred. Thank you.
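A sketch of one way to keep the API reachable only from the node's own machine, assuming access is restricted by bind address rather than by a firewall (the comment above does not say which). Port 1317 is the Cosmos SDK default and an assumption here, since the posted config redacts the real port:

```toml
# app.toml: serve the REST API on the loopback interface only, so external
# clients cannot connect, while processes on the same host still can.
# Port 1317 is an assumed default; use whatever port the node actually uses.
[api]
enable = true
address = "tcp://127.0.0.1:1317"
```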

@social244305-Architect
Author

I tried v8.0.1 but realized that my API endpoint had been removed from Cosmos Directory. I created a PR to add my API endpoint back for testing.

@social244305-Architect
Author

The issue is still present with v8.0.1.

@mpoke added the "type: bug" label on Mar 6, 2023
@mmulji

mmulji commented Mar 7, 2023

I tried with v8.0.1 but realized that my API end point was removed from Cosmos Directory. Created a PR to add my API end point for testing.

Hi @social244305-Architect, do you mean the chain-registry or something else? Could you add a link to the PR here?

@social244305-Architect
Author

@mmulji You are correct. My REST endpoint was deleted from the chain registry, probably due to unavailability. I created a PR to reinstate that endpoint; it was merged, and I was able to test with v8.0.1, which again caused my node to choke up. The PR is linked below. The issue seems to be related to the traffic the node receives when it is listed on Cosmos Directory. If I turn off REST and restart the node, I see no problems with P2P.

cosmos/chain-registry#1539

@mmulji-ic
Contributor

Hi @social244305-Architect thanks for the update.

Indeed, if your node is published to the chain-registry, others will use it, which increases the load.
To mitigate this while keeping the node published, reduce max_num_inbound_peers / max_num_outbound_peers in your config.toml.
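As an illustration of that advice, the peer limits live in the [p2p] section of config.toml, where the config posted earlier sets both to 200. The values below are the Tendermint defaults, used here only as a hedged starting point rather than a tested recommendation for a public node:

```toml
# config.toml: lower the peer limits from the 200/200 posted earlier.
# 40/10 are the Tendermint defaults, shown as an illustrative starting point.
[p2p]
max_num_inbound_peers = 40
max_num_outbound_peers = 10
```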

Mainnet has now been upgraded to v9.0.1. I would like to check whether the issue persists with the new release.

@mmulji-ic
Contributor

Hi @social244305-Architect, I wanted to check in to see whether this is still an issue with the newer release. If we don't hear back from you by next week, we will close this issue.

@mmulji-ic moved this from 🏗 In progress to 🛑 Blocked in Cosmos Hub on Apr 4, 2023
github-project-automation bot moved this from 🛑 Blocked to ✅ Done in Cosmos Hub on Apr 11, 2023
Projects
Status: ✅ Done
Development

No branches or pull requests

6 participants