This repository has been archived by the owner on Jan 22, 2025. It is now read-only.

beta channel ./net.sh sanity fails frequently #2314

Closed
3 tasks done
mvines opened this issue Jan 6, 2019 · 13 comments
Labels
monitor Need more information. Waiting for this issue to reproduce

Comments

@mvines
Contributor

mvines commented Jan 6, 2019

With #2278 fixed, ./net.sh sanity sometimes succeeds but is still not stable (observable via the "testnet-edge Sanity" notifications in #ci-status).

Most of the recent failures appear to come from failing to execute or confirm an airdrop transaction. This needs further debugging on a local setup.

Blocked by:

@mvines mvines added this to the v0.12 Beacons milestone Jan 6, 2019
@mvines mvines self-assigned this Jan 6, 2019
@mvines
Contributor Author

mvines commented Jan 7, 2019

Note: it's all good with leader rotation disabled

@mvines
Contributor Author

mvines commented Jan 10, 2019

Simple steps to reproduce (STR) for at least one of the N restart/sanity problems:

$ cd net/
$ ./ec2.sh create -n 3 -c 0
$ ./net.sh start
$ while sleep 60; do ./net.sh restart -r; done
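
The loop above restarts the network indefinitely. To put a number on how often a given step fails, a small wrapper can count passes over a fixed number of runs. This is a minimal sketch: `pass_rate` is a hypothetical helper, not part of the net/ scripts, and it assumes the wrapped command exits nonzero on failure.

```shell
# pass_rate N CMD [ARGS...]
# Hypothetical helper (not part of net/): runs CMD N times and prints "passed/total".
pass_rate() {
  runs=$1; shift
  passed=0
  i=0
  while [ "$i" -lt "$runs" ]; do
    # A run counts as a pass only when the command exits with status 0.
    if "$@" >/dev/null 2>&1; then
      passed=$((passed + 1))
    fi
    i=$((i + 1))
  done
  echo "$passed/$runs"
}
```

For example, `pass_rate 10 ./net.sh sanity` would print the ratio of passing runs to total runs, under the assumption that `./net.sh sanity` reports failure through its exit status.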

@mvines
Contributor Author

mvines commented Jan 16, 2019

Mostly waiting on #2317 to be fixed now

@mvines
Contributor Author

mvines commented Jan 18, 2019

#2484 contains a couple of fixes that ought to help with ./net.sh sanity

@mvines
Contributor Author

mvines commented Feb 1, 2019

Recently there's been a noticeable improvement in the ./net.sh sanity pass rate. Needs continued monitoring but we appear to be on the right path

@mvines
Contributor Author

mvines commented Feb 12, 2019

We've moved backwards here significantly with all the new features coming in over the last couple of weeks. Putting this on hold for a while until more of the forking work lands

@mvines mvines added the blocked Unable to proceed label Feb 12, 2019
@mvines mvines removed the blocked Unable to proceed label Mar 6, 2019
@mvines mvines added the blocked Unable to proceed label Mar 8, 2019
@mvines mvines changed the title from "edge channel ./net.sh sanity fails frequently" to "beta channel ./net.sh sanity fails frequently" Mar 23, 2019
@mvines mvines removed the blocked Unable to proceed label Mar 23, 2019
@mvines
Contributor Author

mvines commented Mar 23, 2019

No longer blocked by anything. This still manifests on testnet-beta. Here's one example from today, https://buildkite.com/solana-labs/testnet-management/builds/26334#01916cd9-c153-4ef7-99b8-987818aea73a, where an airdrop failed during the sanity test:

[2019-03-23T02:13:36.204647707Z INFO  solana_drone::drone] request_airdrop_transaction: drone_addr=13.57.20.27:9900 id=Bh8xara7wSc2LekDF7ML8CSEzedRRYQCXLzgZK252p1M lamports=60 blockhash=CnkLbhs9xGuXCBoNu1yJWRK3WNHRajqfoJ2KuxSDW18W
Error: RpcRequestError("Unable to fetch new blockhash, blockhash stuck at CnkLbhs9xGuXCBoNu1yJWRK3WNHRajqfoJ2KuxSDW18W")
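
When triaging many sanity logs, it can help to pull out the blockhash each failing run got stuck on. This is a minimal sketch, assuming the error text matches the line quoted above exactly; `stuck_blockhashes` is a hypothetical helper name, not an existing tool.

```shell
# Reads log text on stdin and prints each blockhash reported as stuck, one per line.
# Hypothetical helper; assumes the "blockhash stuck at <hash>" wording quoted above.
stuck_blockhashes() {
  # The hash is base58, so a plain alphanumeric class is enough to capture it.
  sed -n 's/.*blockhash stuck at \([A-Za-z0-9]*\).*/\1/p'
}
```

Piping the output through `sort | uniq -c` would then show whether several failures share the same stuck blockhash.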

@mvines mvines removed their assignment Mar 23, 2019
@pgarg66
Contributor

pgarg66 commented Mar 29, 2019

@CriesofCarrots , is this the same problem you triaged yesterday? If so, could you add your analysis here?

@CriesofCarrots
Contributor

Looking at the logs for these failures, it appears that the poh_recorder on the leader is no longer progressing, which results in the stuck blockhash and no transactions being processed.
That is as far as I dug, since my focus was on testnet boot errors.
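
The client-side symptom described here, a value that never advances, can be illustrated with a generic polling loop. This is a minimal sketch and not the actual RPC client logic: `wait_for_change` is a hypothetical helper, and the command that prints the current blockhash is left to the caller.

```shell
# wait_for_change OLD TRIES DELAY CMD [ARGS...]
# Hypothetical helper: runs CMD up to TRIES times, DELAY seconds apart.
# Prints the first output that differs from OLD and returns 0;
# returns 1 if the output never changes (the "stuck" case).
wait_for_change() {
  old=$1; tries=$2; delay=$3; shift 3
  while [ "$tries" -gt 0 ]; do
    new=$("$@")
    if [ "$new" != "$old" ]; then
      echo "$new"
      return 0
    fi
    tries=$((tries - 1))
    sleep "$delay"
  done
  return 1
}
```

A nonzero return after all tries corresponds to the "blockhash stuck at" error: the leader kept answering, but with the same value each time.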

@mvines
Contributor Author

mvines commented Apr 13, 2019

#3749 hopefully helps here...

@mvines mvines added the monitor Need more information. Waiting for this issue to reproduce label Apr 15, 2019
@mvines
Contributor Author

mvines commented Apr 15, 2019

Tip of master has looked pretty stable recently. Let's see how beta performs when we branch off v0.13 later today...

@mvines
Contributor Author

mvines commented Apr 17, 2019

065cf51

@mvines
Contributor Author

mvines commented Apr 18, 2019

Looking pretty solid now. No sanity failures overnight. If we're in a similar state tomorrow then this issue can be closed

@mvines mvines closed this as completed Apr 19, 2019
tao-stones pushed a commit to tao-stones/solana that referenced this issue Jul 29, 2024
willhickey pushed a commit that referenced this issue Aug 3, 2024
…2314) (#2342)

ledger-tool: Set initial last full snapshot slot (#2314)

(cherry picked from commit 75a640e)

Co-authored-by: Brooks <[email protected]>
ripatel-fd pushed a commit to ripatel-fd/solana that referenced this issue Aug 8, 2024
…olana-labs#2314) (solana-labs#2343)

ledger-tool: Set initial last full snapshot slot (solana-labs#2314)

(cherry picked from commit 75a640e)

Co-authored-by: Brooks <[email protected]>
thesoftwarejedi pushed a commit to step-finance/solana that referenced this issue Aug 8, 2024
…olana-labs#2314) (solana-labs#2342)

ledger-tool: Set initial last full snapshot slot (solana-labs#2314)

(cherry picked from commit 75a640e)

Co-authored-by: Brooks <[email protected]>
4 participants