
Parallel Ingestion broken (Horizon vers 2.15.1) #4321

Closed
jcx120 opened this issue Apr 5, 2022 · 7 comments

jcx120 commented Apr 5, 2022

--parallel-workers seems to be broken.

From stellar.private.horizon (keybase channel)

The ingestion started with 24 worker threads but slowly dwindled to 1.

> enode@c95vr35adhrhjpm9eap0:~/bin$ ./monitor-full-history.py 
> [9693588, 'applying', '96.69%', '3308 to go']
> [9792722, 'applying', '95.86%', '4142 to go']
> [9885780, 'applying', '88.94%', '11052 to go']
> [9980985, 'applying', '84.18%', '15815 to go']
> [10071786, 'applying', '75.01%', '24982 to go']
> [10151121, 'applying', '54.37%', '45615 to go']
> [10264485, 'applying', '67.77%', '32219 to go']
> [10373081, 'applying', '76.4%', '23591 to go']
> [10472306, 'applying', '75.66%', '24334 to go']
> [10537983, 'applying', '41.36%', '58625 to go']
> [10651883, 'applying', '55.29%', '44693 to go']
> [10696576, 'processing']
> [10796544, 'processing']
> [10896512, 'processing']
> [10996480, 'processing']
> [11096448, 'processing']
> [11196416, 'processing']
> [11296384, 'processing']
> [11396352, 'processing']
> [11496320, 'processing']
> [11696256, 'processing']
> [11796224, 'processing']
> [11896192, 'processing']
> enode@c95vr35adhrhjpm9eap0:~/bin$ ./monitor-full-history.py | wc -l
> 24
> enode@c95vr35adhrhjpm9eap0:~/bin$ ./monitor-full-history.py | wc -l
> 23
> enode@c95vr35adhrhjpm9eap0:~/bin$ ./monitor-full-history.py | wc -l
> 1
> enode@c95vr35adhrhjpm9eap0:~/bin$ ./monitor-full-history.py | wc -l
> 1
> enode@c95vr35adhrhjpm9eap0:~/bin$ ./monitor-full-history.py
> [12895872, 'processing']
> enode@c95vr35adhrhjpm9eap0:~/bin$ tail -n1 /data/nodestate/collector.log | jq
> {
>   "time": "2022-04-05T11:17:33.116795",
>   "level": "fatal",
>   "responding": false,
>   "healthy": false,
>   "network": "mainnet",
>   "ref": "https://dev.xxxxxx/",
>   "peer_count": 0,
>   "peers_missing": 2,
>   "peer_count_expected": 2,
>   "block_number": 12945856,
>   "highest_block_number": 40339141,
>   "blocks_behind": 27393285,
>   "stellar_core": {
>     "ingest_low": 12895872,
>     "ingest_high": 12995840,
>     "ingest_num_threads": 1
>   },
>   "message": "Can't request "https://localhost:8000/"",
>   "status": "Booting"
> }
> enode@c95vr35adhrhjpm9eap0:~/bin$ ./monitor-full-history.py | wc -l
> 1
> enode@c95vr35adhrhjpm9eap0:~/bin$ ./monitor-full-history.py | wc -l
> 1
> enode@c95vr35adhrhjpm9eap0:~/bin$ ./monitor-full-history.py
> [12895872, 'processing']
> enode@c95vr35adhrhjpm9eap0:~/bin$ ./monitor-full-history.py
> [12895872, 'processing']
> enode@c95vr35adhrhjpm9eap0:~/bin$ ./monitor-full-history.py
> [12895872, 'processing']
> enode@c95vr35adhrhjpm9eap0:~/bin$ ./monitor-full-history.py
> [12895872, 'processing']
> enode@c95vr35adhrhjpm9eap0:~/bin$ tail -n1 /data/nodestate/collector.log | jq
> {
>   "time": "2022-04-05T12:09:15.702804",
>   "level": "fatal",
>   "responding": false,
>   "healthy": false,
>   "network": "mainnet",
>   "ref": "<URL>",
>   "peer_count": 0,
>   "peers_missing": 2,
>   "peer_count_expected": 2,
>   "block_number": 12945856,
>   "highest_block_number": 40339642,
>   "blocks_behind": 27393786,
>   "stellar_core": {
>     "ingest_low": 12895872,
>     "ingest_high": 12995840,
>     "ingest_num_threads": 1
>   },
>   "message": "Can't request "https://localhost:8000/"",
>   "status": "Booting"
> }
> This is currently stopping us from being able to deploy FH (full-history) nodes.

Version Info ---

> <>:/# stellar-horizon version
> 2.15.1-e29c7803d487c3f5b44a74773e6277fee16482cf
> go1.17.8
> <>:/# stellar-core version
> stellar-core 18.4.0 (13ef7c0f3ae85306ddb8633702c649c8f6ee94bb)
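
A simple way to watch the worker count drop over time, using only the same monitor-full-history.py wrapper and wc shown above (the 60-second polling interval is arbitrary):

> watch -n 60 './monitor-full-history.py | wc -l'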

jcx120 commented Apr 5, 2022

May be related to this issue from a different customer: #4319

2opremio self-assigned this Apr 6, 2022

2opremio commented Apr 6, 2022

Related: #4255

The parsing error is probably a red herring; it is most likely caused by Stellar Core crashing (my guess is that it is being OOM-killed).


d4vido commented Apr 6, 2022

GCP instance: n2d-standard-32 (32 vCPUs, 128 GB memory)
https://cloud.google.com/compute/docs/general-purpose-machines#n2d-standard

Version
root@abae030320fd:/# stellar-horizon version
2.15.1-e29c7803d487c3f5b44a74773e6277fee16482cf
go1.17.8
root@abae030320fd:/# stellar-core version
stellar-core 18.4.0 (13ef7c0f3ae85306ddb8633702c649c8f6ee94bb)

Command (ingesting from ledger 1 to 40355048, i.e. 40355048 ledgers, using reingest range):

  • docker exec -t -e PARALLEL_JOB_SIZE=100000 -e RETRIES=10 -e RETRY_BACKOFF_SECONDS=5 stellar-horizon stellar-horizon db reingest range --parallel-workers=24 1 40355048


2opremio commented Apr 6, 2022

24 workers is probably too many.

With recent ledgers, Captive Core consumes >10 GB of RAM.

That's 24 * 10 = 240 GB of RAM, which is more than the 128 GB available (without even counting the RAM consumed by the system and by Horizon).

Did you monitor the used RAM while running reingestion?
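
For the 128 GB machine above, a rough sizing is available RAM divided by ~10 GB per Captive Core worker, so on the order of 10 workers at most. A more conservative run of the same command might look like this (the worker count of 8 is an illustrative guess, not a tested recommendation):

  • docker exec -t -e PARALLEL_JOB_SIZE=100000 -e RETRIES=10 -e RETRY_BACKOFF_SECONDS=5 stellar-horizon stellar-horizon db reingest range --parallel-workers=8 1 40355048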


2opremio commented Apr 6, 2022

Note that we are planning to reduce the RAM requirements in Horizon 3.0 by not requiring the use of Captive Core.


2opremio commented Apr 6, 2022

Also note that there is an issue in Core versions > 18.2.0 and < 18.5.0 which breaks reingestion.

See stellar/stellar-core#3360

Please upgrade to Core 18.5.0 or downgrade to Core 18.2.0
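
Before re-running the reingest, it may be worth confirming which Core build the Horizon container actually ships (assuming, as in the version info above, that the captive-core binary lives inside the same stellar-horizon container):

  • docker exec -t stellar-horizon stellar-core version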


jcx120 commented Apr 14, 2022

Seems to be resolved after updating Core to >= 18.5.0.

jcx120 closed this as completed Apr 14, 2022