-
Notifications
You must be signed in to change notification settings - Fork 89
chore(consensus): reuse db and prevent InsufficientPeers error #2242
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2242 +/- ##
==========================================
+ Coverage 66.31% 66.34% +0.02%
==========================================
Files 139 139
Lines 18341 18387 +46
Branches 18341 18387 +46
==========================================
+ Hits 12163 12199 +36
- Misses 4883 4891 +8
- Partials 1295 1297 +2 ☔ View full report in Codecov by Sentry. |
5f948dd
to
a76a409
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status: all files reviewed, 11 unresolved discussions (waiting on @asmaastarkware)
crates/sequencing/papyrus_consensus/run_consensus.py
line 139 at r1 (raw file):
# 1. The bootnode (validator 1) is started first for network peering. # 2. Validators 2+ are started next to join the network through the bootnode. # 3. Validator 0, which is the proposer, is started last so the validators don't miss the proposals.
Why did you remove this?
Code quote:
# Validators are started in a specific order to ensure proper network formation:
# 1. The bootnode (validator 1) is started first for network peering.
# 2. Validators 2+ are started next to join the network through the bootnode.
# 3. Validator 0, which is the proposer, is started last so the validators don't miss the proposals.
crates/sequencing/papyrus_consensus/run_consensus.py
line 149 at r1 (raw file):
return nodes def acquire_lock(data_dir):
Have you tested this both succeeding and failing?
Code quote:
def acquire_lock(data_dir):
crates/sequencing/papyrus_consensus/run_consensus.py
line 165 at r1 (raw file):
logs_dir = tempfile.mkdtemp() if db_dir:
Suggestion:
if db_dir is not None:
crates/sequencing/papyrus_consensus/run_consensus.py
line 166 at r1 (raw file):
if db_dir: assert os.path.exists(db_dir), f"The specified directory {db_dir} does not exist."
listdir triggers FileNotFoundError
which I think is descriptive enough.
crates/sequencing/papyrus_consensus/run_consensus.py
line 167 at r1 (raw file):
if db_dir: assert os.path.exists(db_dir), f"The specified directory {db_dir} does not exist." data_dirs = [d for d in os.listdir(db_dir) if os.path.isdir(os.path.join(db_dir, d))]
Just create here as a set.
Suggestion:
data_dirs = {d for d in os.listdir(db_dir) if os.path.isdir(os.path.join(db_dir, d))}
crates/sequencing/papyrus_consensus/run_consensus.py
line 172 at r1 (raw file):
assert ( len(data_dirs) == num_validators ), f"The specified directory {db_dir} must contain exactly {num_validators} validator directories."
This is not needed based on the expected_dirs
check below.
crates/sequencing/papyrus_consensus/run_consensus.py
line 176 at r1 (raw file):
# Ensure the directories are named data0, data1, ..., data{num_validators - 1} expected_dirs = {f"data{i}" for i in range(num_validators)} actual_dirs = set(data_dirs)
Create as set above
crates/sequencing/papyrus_consensus/run_consensus.py
line 180 at r1 (raw file):
assert ( expected_dirs == actual_dirs ), f"The directories in {db_dir} must be named {', '.join(expected_dirs)}."
We want to know that there are enough db folders. We don't care if there are also other folders.
Suggestion:
assert (
expected_dirs.issubset(actual_dirs)
), f"The directories in {db_dir} must be named {', '.join(expected_dirs)}."
crates/sequencing/papyrus_consensus/run_consensus.py
line 187 at r1 (raw file):
# Acquire lock on the db_dir lock_fd = acquire_lock(db_dir)
Using enter and exit to enable a with statement is a more pythonic way. Means you don't need to put the close explicitly here.
Suggestion:
with LockFile(db_dif) as lockfile:
crates/sequencing/papyrus_consensus/run_consensus.py
line 209 at r1 (raw file):
parser.add_argument("--base_layer_node_url", required=True) parser.add_argument("--num_validators", type=int, required=True) parser.add_argument("--db_dir", required=False, default=None)
Let's add a description:
Directory with existing DBs that this simulation can reuse.
Code quote:
parser.add_argument("--db_dir", required=False, default=None)
crates/sequencing/papyrus_consensus/src/lib.rs
line 116 at r2 (raw file):
{ // Add a short delay to allow peers to connect and avoid "InsufficientPeers" error tokio::time::sleep(std::time::Duration::from_secs(5)).await;
Let's make this part of the consensus config.
Code quote:
// Add a short delay to allow peers to connect and avoid "InsufficientPeers" error
tokio::time::sleep(std::time::Duration::from_secs(5)).await;
a76a409
to
b97d984
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 6 files reviewed, 11 unresolved discussions (waiting on @matan-starkware)
crates/sequencing/papyrus_consensus/run_consensus.py
line 139 at r1 (raw file):
Previously, matan-starkware wrote…
Why did you remove this?
Done.
crates/sequencing/papyrus_consensus/run_consensus.py
line 149 at r1 (raw file):
Previously, matan-starkware wrote…
Have you tested this both succeeding and failing?
Yes, I have run simulations with the following cases:
- without db_dir.
- reuse an existing db_dir with a single simulation.
- used an invalid db_dir that either doesn't exist or doesn't have the required data dirs.
- ran multiple simulations simultaneously using the same db_dir.
crates/sequencing/papyrus_consensus/run_consensus.py
line 165 at r1 (raw file):
logs_dir = tempfile.mkdtemp() if db_dir:
Done.
crates/sequencing/papyrus_consensus/run_consensus.py
line 166 at r1 (raw file):
Previously, matan-starkware wrote…
listdir triggers
FileNotFoundError
which I think is descriptive enough.
Done.
crates/sequencing/papyrus_consensus/run_consensus.py
line 167 at r1 (raw file):
Previously, matan-starkware wrote…
Just create here as a set.
Done.
crates/sequencing/papyrus_consensus/run_consensus.py
line 172 at r1 (raw file):
Previously, matan-starkware wrote…
This is not needed based on the
expected_dirs
check below.
Done.
crates/sequencing/papyrus_consensus/run_consensus.py
line 176 at r1 (raw file):
Previously, matan-starkware wrote…
Create as set above
Done.
crates/sequencing/papyrus_consensus/run_consensus.py
line 180 at r1 (raw file):
Previously, matan-starkware wrote…
We want to know that there are enough db folders. We don't care if there are also other folders.
Done.
crates/sequencing/papyrus_consensus/run_consensus.py
line 187 at r1 (raw file):
Previously, matan-starkware wrote…
Using enter and exit to enable a with statement is a more pythonic way. Means you don't need to put the close explicitly here.
Done.
crates/sequencing/papyrus_consensus/run_consensus.py
line 209 at r1 (raw file):
Previously, matan-starkware wrote…
Let's add a description:
Directory with existing DBs that this simulation can reuse.
Done.
crates/sequencing/papyrus_consensus/src/lib.rs
line 116 at r2 (raw file):
Previously, matan-starkware wrote…
Let's make this part of the consensus config.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r3, 5 of 5 files at r4, all commit messages.
Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @asmaastarkware)
crates/sequencing/papyrus_consensus/run_consensus.py
line 149 at r1 (raw file):
Previously, asmaastarkware (asmaa-starkware) wrote…
Yes, I have run simulations with the following cases:
- without db_dir.
- reuse an existing db_dir with a single simulation.
- used an invalid db_dir that either doesn't exist or doesn't have the required data dirs.
- ran multiple simulations simultaneously using the same db_dir.
Great.
Have you tried running multiple simulations simultaneously correctly (different db_dir)?
crates/sequencing/papyrus_consensus/run_consensus.py
line 38 at r3 (raw file):
command = f"curl -s -X GET http://localhost:{port}/monitoring/metrics | grep -oP 'papyrus_consensus_height \\K\\d+'" result = subprocess.run(command, shell=True, capture_output=True, text=True) # returns the most recently decided height, or None if node is not ready or consensus has not yet reached any height.
This line is too long.
Code quote:
# returns the most recently decided height, or None if node is not ready or consensus has not yet reached any height.
crates/sequencing/papyrus_consensus/run_consensus.py
line 186 at r3 (raw file):
actual_dirs = {d for d in os.listdir(db_dir) if os.path.isdir(os.path.join(db_dir, d))} # Ensure the directories are named data0, data1, ..., data{num_validators - 1}
Let's just delete this and the empty line above. I think it's obvious enough from he code, and we can see the else
right here where we originally created them.
crates/sequencing/papyrus_consensus/run_consensus.py
line 203 at r3 (raw file):
print( f"Output files will be stored in: {logs_dir} and data files will be stored in: {db_dir}" )
I think it's easier to read these on 2 separate lines (see how you feel actually running the script).
Suggestion:
print(f"DB files will be stored in: {db_dir}")
print(f"Logs will be stored in: {logs_dir}")
crates/sequencing/papyrus_consensus/src/config.rs
line 27 at r4 (raw file):
pub num_validators: u64, /// The delay in seconds before starting consensus to prevent InsufficientPeers error. pub consensus_delay: u64,
Can you put a Duration in a config?
Suggestion:
Duration
crates/sequencing/papyrus_consensus/src/config.rs
line 61 at r4 (raw file):
&self.consensus_delay, "The delay in seconds before starting consensus to prevent 'InsufficientPeers' \ error.",
Suggestion:
Delay (seconds) before starting consensus to give time for network peering.
crates/sequencing/papyrus_consensus/src/lib.rs
line 109 at r4 (raw file):
start_height: BlockNumber, validator_id: ValidatorId, consensus_delay: u64,
Suggestion:
Duration
b97d984
to
1d18575
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @matan-starkware)
crates/sequencing/papyrus_consensus/run_consensus.py
line 149 at r1 (raw file):
Previously, matan-starkware wrote…
Great.
Have you tried running multiple simulations simultaneously correctly (different db_dir)?
done
crates/sequencing/papyrus_consensus/run_consensus.py
line 38 at r3 (raw file):
Previously, matan-starkware wrote…
This line is too long.
Done.
crates/sequencing/papyrus_consensus/run_consensus.py
line 186 at r3 (raw file):
Previously, matan-starkware wrote…
Let's just delete this and the empty line above. I think it's obvious enough from he code, and we can see the
else
right here where we originally created them.
Done.
crates/sequencing/papyrus_consensus/run_consensus.py
line 203 at r3 (raw file):
Previously, matan-starkware wrote…
I think it's easier to read these on 2 separate lines (see how you feel actually running the script).
Done.
crates/sequencing/papyrus_consensus/src/config.rs
line 27 at r4 (raw file):
Previously, matan-starkware wrote…
Can you put a Duration in a config?
Done.
crates/sequencing/papyrus_consensus/src/config.rs
line 61 at r4 (raw file):
&self.consensus_delay, "The delay in seconds before starting consensus to prevent 'InsufficientPeers' \ error.",
Done.
crates/sequencing/papyrus_consensus/src/lib.rs
line 109 at r4 (raw file):
start_height: BlockNumber, validator_id: ValidatorId, consensus_delay: u64,
Done.
1d18575
to
3841618
Compare
3841618
to
647e080
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 6 of 6 files at r5, 5 of 5 files at r6, all commit messages.
Reviewable status:complete! all files reviewed, all discussions resolved (waiting on @asmaastarkware)
There hasn't been any activity on this pull request recently, and in order to prioritize active work, it has been marked as stale. |
This change is![Reviewable](https://camo.githubusercontent.com/1541c4039185914e83657d3683ec25920c672c6c5c7ab4240ee7bff601adec0b/68747470733a2f2f72657669657761626c652e696f2f7265766965775f627574746f6e2e737667)