test: v3 e2e upgrade #3910

Merged
merged 33 commits into from
Oct 9, 2024

Conversation

cmwaters
Contributor

@cmwaters cmwaters commented Sep 25, 2024

Closes #3772
Opens #3947

Testing

make test-e2e MajorUpgradeToV3
test-e2e2024/10/08 22:36:00 --- ✅ PASS: MajorUpgradeToV3

@rootulp rootulp self-assigned this Sep 30, 2024
@rootulp
Collaborator

rootulp commented Sep 30, 2024

$ make test-e2e MajorUpgradeToV3
--> Running end to end tests
go run ./test/e2e MajorUpgradeToV3
test-e2e2024/09/30 10:24:36 === RUN MajorUpgradeToV3
test-e2e2024/09/30 10:24:36 Creating testnet
INFO[2024-09-30T10:24:36-04:00]log/logger.go:41 Current log level                             env_log_level=LOG_LEVEL log_level=info
ERRO[2024-09-30T10:24:41-04:00]traefik/traefik.go:355 Failed to discover Traefik API resources      error="the server could not find the requested resource"
2024/09/30 10:24:41 failed to create testnet: traefik API is not available
exit status 1
make: *** [test-e2e] Error 1

@evan-forbes @mojtaba-esk have either of you hit this?

@mojtaba-esk
Member

Yes, Traefik is not installed on your cluster. Ask the @celestiaorg/devops team to install it on your target cluster.

@rootulp
Collaborator

rootulp commented Sep 30, 2024

Sysrex added traefik API to the Robusta cluster so this is now unblocked. The test hangs for me though:

$ make test-e2e MajorUpgradeToV3
--> Running end to end tests
go run ./test/e2e MajorUpgradeToV3
test-e2e2024/09/30 13:53:39 === RUN MajorUpgradeToV3
test-e2e2024/09/30 13:53:39 Creating testnet
INFO[2024-09-30T13:53:39-04:00]log/logger.go:41 Current log level                             env_log_level=LOG_LEVEL log_level=info
{"level":"info","scope":"runmajorupgradetov3-20240930-135339","time":"2024-09-30T13:54:04-04:00","message":"Knuu initialized"}
test-e2e2024/09/30 13:54:04 Running major upgrade to v3 test version ef37dcd
test-e2e2024/09/30 13:54:04 Creating genesis nodes
test-e2e2024/09/30 13:54:04 Creating txsim
{"level":"info","name":"txsim","directory":"/var/folders/y0/dd92_x8x4tlf397xstgwfz_c0000gn/T/txsim","time":"2024-09-30T13:54:06-04:00","message":"txsim keyring directory created"}
{"level":"info","name":"txsim","pk":"PubKeySecp256k1{0300454F0C8C5CBE6ADBFF78C2CAC1AB432041A42FA29DB24510C7A15EFA9D32B2}","time":"2024-09-30T13:54:06-04:00","message":"txsim account created and added to genesis"}
{"level":"info","name":"txsim","image":"ghcr.io/celestiaorg/txsim:ef37dcd","args":"--key-path /home/celestia --grpc-endpoint 10.45.255.174:9090 --poll-time 1s --seed 42 --upgrade-schedule 20:3","time":"2024-09-30T13:54:06-04:00","message":"created tx client"}
test-e2e2024/09/30 13:54:06 Setting up testnet
{"level":"info","name":"val0","directory":"/var/folders/y0/dd92_x8x4tlf397xstgwfz_c0000gn/T/val0","time":"2024-09-30T13:54:06-04:00","message":"Creating validator's config and data directories"}
{"level":"info","name":"val1","directory":"/var/folders/y0/dd92_x8x4tlf397xstgwfz_c0000gn/T/val1","time":"2024-09-30T13:54:06-04:00","message":"Creating validator's config and data directories"}
{"level":"info","name":"val2","directory":"/var/folders/y0/dd92_x8x4tlf397xstgwfz_c0000gn/T/val2","time":"2024-09-30T13:54:06-04:00","message":"Creating validator's config and data directories"}
{"level":"info","name":"val3","directory":"/var/folders/y0/dd92_x8x4tlf397xstgwfz_c0000gn/T/val3","time":"2024-09-30T13:54:06-04:00","message":"Creating validator's config and data directories"}
test-e2e2024/09/30 13:54:06 Starting testnet
{"level":"info","time":"2024-09-30T13:54:11-04:00","message":"create endpoint proxies for genesis nodes"}
{"level":"info","name":"val0","version":"ef37dcd","time":"2024-09-30T13:54:20-04:00","message":"started and ports forwarded"}
{"level":"info","name":"val1","version":"ef37dcd","time":"2024-09-30T13:54:25-04:00","message":"started and ports forwarded"}
{"level":"info","name":"val2","version":"ef37dcd","time":"2024-09-30T13:54:31-04:00","message":"started and ports forwarded"}
{"level":"info","name":"val3","version":"ef37dcd","time":"2024-09-30T13:54:37-04:00","message":"started and ports forwarded"}
{"level":"info","time":"2024-09-30T13:54:37-04:00","message":"waiting for genesis nodes to sync"}
{"level":"info","name":"val0","time":"2024-09-30T13:54:37-04:00","message":"waiting for node to sync"}
{"level":"debug","RPC Address":"http://151.115.14.57:80/val0-f0257b5f-26657","time":"2024-09-30T13:54:37-04:00","message":"Creating HTTP client for node"}
{"level":"info","attempts":0,"name":"val0","time":"2024-09-30T13:54:37-04:00","message":"node has synced"}
{"level":"info","name":"val1","time":"2024-09-30T13:54:37-04:00","message":"waiting for node to sync"}
{"level":"debug","RPC Address":"http://151.115.14.57:80/val1-4060741b-26657","time":"2024-09-30T13:54:37-04:00","message":"Creating HTTP client for node"}
{"level":"info","attempts":0,"name":"val1","time":"2024-09-30T13:54:38-04:00","message":"node has synced"}
{"level":"info","name":"val2","time":"2024-09-30T13:54:38-04:00","message":"waiting for node to sync"}
{"level":"debug","RPC Address":"http://151.115.14.57:80/val2-c4614da0-26657","time":"2024-09-30T13:54:38-04:00","message":"Creating HTTP client for node"}
{"level":"info","attempts":0,"name":"val2","time":"2024-09-30T13:54:38-04:00","message":"node has synced"}
{"level":"info","name":"val3","time":"2024-09-30T13:54:38-04:00","message":"waiting for node to sync"}
{"level":"debug","RPC Address":"http://151.115.14.57:80/val3-b221ba4c-26657","time":"2024-09-30T13:54:38-04:00","message":"Creating HTTP client for node"}
{"level":"info","name":"val3","attempt":0,"time":"2024-09-30T13:54:38-04:00","message":"node is not synced yet, waiting..."}
{"level":"info","name":"val3","attempt":1,"time":"2024-09-30T13:54:38-04:00","message":"node is not synced yet, waiting..."}
{"level":"info","attempts":2,"name":"val3","time":"2024-09-30T13:54:39-04:00","message":"node has synced"}
{"level":"info","name":"txsim","time":"2024-09-30T13:54:40-04:00","message":"txsim started"}
test-e2e2024/09/30 14:31:30 waiting for upgrade
{"level":"debug","RPC Address":"http://151.115.14.57:80/val0-f0257b5f-26657","time":"2024-09-30T14:31:30-04:00","message":"Creating HTTP client for node"}
height 1669
height 1671
height 1674
height 1676
height 1678
height 1680
height 1683
height 1685
height 1687
height 1690
height 1692
height 1694
height 1696
height 1699
height 1701
height 1703
height 1705
height 1708
height 1710
test-e2e2024/09/30 14:32:30 --- ERROR MajorUpgradeToV3: failed to upgrade to v3, last height: 1710
exit status 1
make: *** [test-e2e] Error 1

Update: it didn't hang, it just took forever to complete.

@mojtaba-esk
Member

Can we please merge this PR after #3911 is merged? That PR includes updates that bump knuu to v0.16.1.
@cmwaters @rootulp

@rootulp
Collaborator

rootulp commented Oct 1, 2024

Sure! This PR isn't ready and #3911 is ready.

@cmwaters
Contributor Author

cmwaters commented Oct 1, 2024

I wasn't aware of this problem when testing. The last problem I experienced with this test is that the txsim wasn't able to execute the MsgSignalVersion. The keyring couldn't find the account of the validator to submit the message. So I was in the process of investigating which keys got added as I had assumed the validator keys would be part of it.

@evan-forbes added the "WS: V3 3️⃣" (item is directly relevant to the v3 hardfork) and "required" (issue is required to be closed before workstream can be closed) labels Oct 2, 2024
@rootulp
Collaborator

rootulp commented Oct 3, 2024

Note to self: can see logs for txsim by looking in Lens admin@k8s-knuu-1 and then filtering to namespace like: runmajorupgradetov3-20241003-163612

txsim logs say:

Starting txsim with command:
/bin/txsim --key-path /home/celestia --grpc-endpoint 10.42.89.14:9090 --poll-time 1s --seed 42 --upgrade-schedule 20:3

Error: no sequences specified. Use --stake, --send or --blob
Usage:
  txsim [flags]

Examples:
txsim --key-path /path/to/keyring --grpc-endpoint localhost:9090 --seed 1234 --poll-time 1s --blob 5

Flags:
      --blob int                  number of blob sequences to run
      --blob-amounts string       range of blobs per PFB specified as a single value or a min-max range (e.g., 10 or 5-10). A single value indicates the exact number of blobs to be created. (default "1")
      --blob-share-version int    optionally specify a share version to use for the blob sequences (default -1)
      --blob-sizes string         range of blob sizes to send (default "100-1000")
      --feegrant                  use the feegrant module to pay for fees
      --grpc-endpoint string      grpc endpoint to a running node
  -h, --help                      help for txsim
      --key-mnemonic string       space separated mnemonic for the keyring. The hdpath used is an empty string
      --key-path string           path to the keyring
      --master string             the account name of the master account. Leaving empty will result in using the account with the most funds.
      --poll-time duration        poll time for the transaction client (default 3s)
      --seed int                  seed for the random number generator

I'm considering adding a post-hook so that the validators can submit a signal message instead of requiring txSim to do that.

Update: I don't think this test is using the new txSim binary b/c the --upgrade-schedule flag isn't listed there
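For context, the `--upgrade-schedule 20:3` argument reads as a `height:version` pair, meaning "signal version 3 at height 20". A rough sketch of how such a value could be parsed, assuming a comma-separated list of pairs (the list format and the `parseUpgradeSchedule` name are assumptions for illustration, not txsim's actual implementation):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseUpgradeSchedule converts a string such as "20:3" or "20:3,40:4" into a
// map from block height to app version. The comma-separated list format is an
// assumption made for this sketch.
func parseUpgradeSchedule(s string) (map[int64]uint64, error) {
	schedule := make(map[int64]uint64)
	if strings.TrimSpace(s) == "" {
		return schedule, nil
	}
	for _, pair := range strings.Split(s, ",") {
		parts := strings.Split(strings.TrimSpace(pair), ":")
		if len(parts) != 2 {
			return nil, fmt.Errorf("invalid schedule entry %q: want height:version", pair)
		}
		height, err := strconv.ParseInt(parts[0], 10, 64)
		if err != nil {
			return nil, fmt.Errorf("invalid height in %q: %w", pair, err)
		}
		version, err := strconv.ParseUint(parts[1], 10, 64)
		if err != nil {
			return nil, fmt.Errorf("invalid version in %q: %w", pair, err)
		}
		schedule[height] = version
	}
	return schedule, nil
}

func main() {
	schedule, err := parseUpgradeSchedule("20:3")
	if err != nil {
		panic(err)
	}
	fmt.Println("upgradeScheduleMap:", schedule)
}
```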

@rootulp
Collaborator

rootulp commented Oct 4, 2024

Starting txsim with command:
/bin/txsim --key-path /home/celestia --grpc-endpoint 10.44.168.16:9090 --poll-time 1s --seed 42 --blob 1 --blob-amounts 100 --blob-sizes 100-2000 --upgrade-schedule 20:3

upgradeScheduleMap: map[20:3]
{"level":"info","address":"celestia1uu3jgdyvaqdntshffs7e6j3lsqceq5wd5ugfpm","balance":9999998999899600,"time":"2024-10-04T16:59:03Z","message":"set master account"}
{"level":"info","height":17,"address":"celestia1uu3jgdyvaqdntshffs7e6j3lsqceq5wd5ugfpm","msgs":"/cosmos.bank.v1beta1.MsgSend,/cosmos.bank.v1beta1.MsgSend","time":"2024-10-04T16:59:06Z","message":"tx committed"}
{"level":"info","address":"celestia1syvxa98zd5hr25r7qjq8tja9pkd3y4tydhmzjz","balance":1000000000,"time":"2024-10-04T16:59:06Z","message":"initialized account"}
{"level":"info","address":"celestia1wuh7pqua3qm70gs6cqvkk4k49jcuk8gfrk8wt8","balance":100000,"time":"2024-10-04T16:59:06Z","message":"initialized account"}
{"level":"error","error":"key with address celestia1j3wq8c7edndasx460hwfyrm26f26frj6a3fnyp not found: key not found","address":"celestia1j3wq8c7edndasx460hwfyrm26f26frj6a3fnyp","msgs":"/celestia.signal.v1.MsgSignalVersion","time":"2024-10-04T16:59:06Z","message":"tx failed"}
{"level":"error","error":"sequence 1: key with address celestia1j3wq8c7edndasx460hwfyrm26f26frj6a3fnyp not found: key not found [celestiaorg/[email protected]/crypto/keyring/keyring.go:489]","time":"2024-10-04T16:59:06Z","message":"sequence failed"}
{"level":"error","error":"broadcast tx error: share version 1 is not supported in 2. Supported from v3 onwards: unsupported share version","address":"celestia1syvxa98zd5hr25r7qjq8tja9pkd3y4tydhmzjz","blobs count":"100","total byte size of blobs":105956,"time":"2024-10-04T16:59:06Z","message":"tx failed"}
{"level":"error","error":"sequence 0: broadcast tx error: share version 1 is not supported in 2. Supported from v3 onwards: unsupported share version","time":"2024-10-04T16:59:06Z","message":"sequence failed"}
Error: sequence 0: broadcast tx error: share version 1 is not supported in 2. Supported from v3 onwards: unsupported share version

which makes sense because it's trying to send authored blobs before the upgrade from v2 -> v3 happens.

@rootulp
Collaborator

rootulp commented Oct 4, 2024

I hit the error @cmwaters described

Starting txsim with command:
/bin/txsim --key-path /home/celestia --grpc-endpoint 10.45.61.218:9090 --poll-time 1s --seed 42 --blob 1 --blob-amounts 100 --blob-sizes 100-2000 --upgrade-schedule 20:3 --blob-share-version 0

upgradeScheduleMap: map[20:3]
{"level":"info","address":"celestia165t58ve34dr9kz4s9tvhtazdj0chkwshzfjwz5","balance":10000000000000000,"time":"2024-10-04T17:04:25Z","message":"set master account"}
{"level":"info","height":14,"address":"celestia165t58ve34dr9kz4s9tvhtazdj0chkwshzfjwz5","msgs":"/cosmos.bank.v1beta1.MsgSend,/cosmos.bank.v1beta1.MsgSend","time":"2024-10-04T17:04:28Z","message":"tx committed"}
{"level":"info","address":"celestia15jhfgfrmuhzj2yszdnr0wgc6vnh57sfktylm4g","balance":1000000000,"time":"2024-10-04T17:04:28Z","message":"initialized account"}
{"level":"info","address":"celestia17mhtmkp8mtp7u75sjs0jsyw7v54xqffc002q93","balance":100000,"time":"2024-10-04T17:04:28Z","message":"initialized account"}
{"level":"error","error":"key with address celestia1aq0pgp009862vskwr59785jnwc327mv6gtv5f5 not found: key not found","address":"celestia1aq0pgp009862vskwr59785jnwc327mv6gtv5f5","msgs":"/celestia.signal.v1.MsgSignalVersion","time":"2024-10-04T17:04:28Z","message":"tx failed"}
{"level":"error","error":"sequence 1: key with address celestia1aq0pgp009862vskwr59785jnwc327mv6gtv5f5 not found: key not found [celestiaorg/[email protected]/crypto/keyring/keyring.go:489]","time":"2024-10-04T17:04:28Z","message":"sequence failed"}

@rootulp
Collaborator

rootulp commented Oct 4, 2024

Some debug logs:

Starting txsim with command:
/bin/txsim --key-path /home/celestia --grpc-endpoint 10.35.177.178:9090 --poll-time 1s --seed 42 --blob 1 --blob-amounts 100 --blob-sizes 100-2000 --upgrade-schedule 20:3 --blob-share-version 0

keys: %!v(MISSING)
reccord name: txsim address 4318238D9FF69295A3E86EE6C699457C40773B17
upgradeScheduleMap: map[20:3]
{"level":"info","address":"celestia1gvvz8rvl76fftglgdmnvdx2903q8wwch02sph6","balance":10000000000000000,"time":"2024-10-04T17:35:26Z","message":"set master account"}
{"level":"info","height":22,"address":"celestia1gvvz8rvl76fftglgdmnvdx2903q8wwch02sph6","msgs":"/cosmos.bank.v1beta1.MsgSend,/cosmos.bank.v1beta1.MsgSend","time":"2024-10-04T17:35:29Z","message":"tx committed"}
{"level":"info","address":"celestia1zwkfr4v6scp393y36dvm9tng0rgc3dkdnesu0q","balance":1000000000,"time":"2024-10-04T17:35:29Z","message":"initialized account"}
{"level":"info","address":"celestia1jatkxmcnz825vjd6cm83zk5gdzwe9un0cxnlc7","balance":100000,"time":"2024-10-04T17:35:29Z","message":"initialized account"}
{"level":"error","error":"key with address celestia1h6aj6gp3wz405dvw2jtr9pj6hz0wuzl2w07rlk not found: key not found","address":"celestia1h6aj6gp3wz405dvw2jtr9pj6hz0wuzl2w07rlk","msgs":"/celestia.signal.v1.MsgSignalVersion","time":"2024-10-04T17:35:29Z","message":"tx failed"}
{"level":"error","error":"sequence 1: key with address celestia1h6aj6gp3wz405dvw2jtr9pj6hz0wuzl2w07rlk not found: key not found [celestiaorg/[email protected]/crypto/keyring/keyring.go:489]","time":"2024-10-04T17:35:29Z","message":"sequence failed"}
{"level":"info","height":25,"address":"celestia1zwkfr4v6scp393y36dvm9tng0rgc3dkdnesu0q","blobs count":"100","total byte size of blobs":105956,"time":"2024-10-04T17:35:32Z","message":"tx committed"}

Looks like validator keys don't get added. Validator instances get a volume /home/celestia/.celestia-app. TxSim nodes use a temp directory to add a key.

@rootulp
Collaborator

rootulp commented Oct 4, 2024

Still didn't work:

Starting txsim with command:
/bin/txsim --key-path /home/celestia/.celestia-app --grpc-endpoint 10.45.203.36:9090 --poll-time 1s --seed 42 --blob 1 --blob-amounts 100 --blob-sizes 100-2000 --upgrade-schedule 20:3 --blob-share-version 0

keys: {0xc00120e5d0 0xc00121e220 test {[{}] [{}]}}
reccord name: txsim address celestia1vtskfhazquysg58ufc5dq4g5jp9rfd86kdz0fc
upgradeScheduleMap: map[20:3]
{"level":"info","address":"celestia1vtskfhazquysg58ufc5dq4g5jp9rfd86kdz0fc","balance":10000000000000000,"time":"2024-10-04T17:53:41Z","message":"set master account"}
{"level":"info","height":90,"address":"celestia1vtskfhazquysg58ufc5dq4g5jp9rfd86kdz0fc","msgs":"/cosmos.bank.v1beta1.MsgSend,/cosmos.bank.v1beta1.MsgSend","time":"2024-10-04T17:53:44Z","message":"tx committed"}
{"level":"info","address":"celestia1keldxjcxavgztpmjd9yfd9fk0gsdgfded2zhwp","balance":1000000000,"time":"2024-10-04T17:53:44Z","message":"initialized account"}
{"level":"info","address":"celestia10e7r95r7v32l2q7l0rmyww6nse58xyz8mfkyn6","balance":100000,"time":"2024-10-04T17:53:44Z","message":"initialized account"}
{"level":"error","error":"key with address celestia16lnajh2lcrf8crrv3ez7l54pk2tegsgh56dk0a not found: key not found","address":"celestia16lnajh2lcrf8crrv3ez7l54pk2tegsgh56dk0a","msgs":"/celestia.signal.v1.MsgSignalVersion","time":"2024-10-04T17:53:44Z","message":"tx failed"}
{"level":"error","error":"sequence 1: key with address celestia16lnajh2lcrf8crrv3ez7l54pk2tegsgh56dk0a not found: key not found [celestiaorg/[email protected]/crypto/keyring/keyring.go:489]","time":"2024-10-04T17:53:44Z","message":"sequence failed"}

Update: I don't see any logic to copy the keys from the validators to the txSim nodes.
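A small sketch of the mismatch, using plain string sets instead of the real keyring (the `missingSigners` helper and the placeholder addresses are hypothetical): the `MsgSignalVersion` has to be signed by a validator address, but the txsim keyring only holds the txsim account, so a pre-flight check like this one would report the validator as missing.

```go
package main

import "fmt"

// missingSigners returns the required signer addresses that have no matching
// key in the keyring, preserving the order of required.
func missingSigners(keyringAddrs []string, required []string) []string {
	have := make(map[string]struct{}, len(keyringAddrs))
	for _, addr := range keyringAddrs {
		have[addr] = struct{}{}
	}
	var missing []string
	for _, addr := range required {
		if _, ok := have[addr]; !ok {
			missing = append(missing, addr)
		}
	}
	return missing
}

func main() {
	// Placeholder addresses for illustration only.
	txsimKeyring := []string{"txsim-master-address"}
	signalSigners := []string{"validator-0-address"}

	fmt.Println("missing before copying keys:", missingSigners(txsimKeyring, signalSigners))

	// If the validator key were copied into the txsim keyring, nothing would be missing.
	txsimKeyring = append(txsimKeyring, "validator-0-address")
	fmt.Println("missing after copying keys:", missingSigners(txsimKeyring, signalSigners))
}
```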

@rootulp
Collaborator

rootulp commented Oct 7, 2024

Did you encounter a similar issue?

I encountered that on an earlier commit of this PR but thought it was fixed after subsequent commits. Hmm I can repro on the most recent commit though.

@rootulp rootulp marked this pull request as draft October 7, 2024 20:19
@rootulp
Collaborator

rootulp commented Oct 9, 2024

713bd50 works
acc95b0 doesn't work
9641129 doesn't work
6e4fa05 doesn't work
11a01b0 doesn't work

@rootulp rootulp requested review from rootulp, staheri14 and evan-forbes and removed request for rootulp October 9, 2024 03:07
@rootulp rootulp marked this pull request as ready for review October 9, 2024 03:08
Contributor Author

@cmwaters cmwaters left a comment

LGTM, I can't approve because it's my PR :)

test/e2e/major_upgrade_v3.go (review thread resolved)
Contributor

@staheri14 staheri14 left a comment

LGTM!

Labels
required issue is required to be closed before workstream can be closed WS: V3 3️⃣ item is directly relevant to the v3 hardfork
Development

Successfully merging this pull request may close these issues.

Add an e2e major upgrade test for v3
6 participants