Added fault checking on every step in the net framework #338
The check in `process_step` looks good to me.
Not sure whether the `store_step` functionality should better be left to the individual test?
We could output steps from |
I see; returning them from |
78a4e1f
to
a457e9b
Compare
The basic design philosophy behind However, all So, you're right, the fact that |
So, to summarize, we actually need to solve three issues?
|
Thanks, @mbr. I'll work through your points.
@mbr, please check my response to your review comments. |
cace687
to
8a23603
Compare
8a23603
to
1caa4fd
Compare
I feel like I might be missing something here. Most changes are just docs and smaller details that I would be willing to merge without; another minor thing is the error style.
My biggest question is whether the fault-checking logic is placed correctly. I feel like there could be a simpler way to do this that does not alter internal APIs.
tests/net/err.rs
Outdated
@@ -29,6 +34,10 @@ where
MessageLimitExceeded(usize),
/// The execution time limit has been reached or exceeded.
TimeLimitHit(time::Duration),
/// Fault encountered.
I know it's a bit nitpicky, but I would really like to have a slightly longer comment here. Coming back to this after not touching the tests for a week or two, messages like "The algorithm run by the node produced a `DistAlgorithm::Error` while processing a message." (see above) are instantly clear, but "Fault encountered." is not. Ideally, this would mention when a fault could possibly be encountered.
I believe the docs for `CrankError::Crypto` and `CrankError::Algorithm` could also use an extra half-sentence.
tests/net/mod.rs
Outdated
@@ -73,6 +72,8 @@ pub struct Node<D: DistAlgorithm> {
is_faulty: bool,
/// Captured algorithm outputs, in order.
outputs: Vec<D::Output>,
/// Collected fault log.
, in order.
Can you explain what you mean by collecting in order?
tests/net/err.rs
Outdated
/// Fault encountered.
Fault(Fault<D::NodeId>),
/// Threshold cryptography error.
Crypto(crypto::error::Error),
This is very nondescript. I believe this is similar to having a lot of `Io(io::Error)` errors (this came up in the discussion about error styles we had a quarter earlier, when we adopted Nick's excellent style).
So maybe this should be an `InitialKeyGeneration` error, with `/// The initial key generation for threshold cryptography failed.` or similar.
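To make the suggestion concrete, here is a minimal sketch of an error enum whose variant docs say when each error can occur. It is not the PR's actual code: the `Fault` struct, the `usize` node IDs, the `String` payload of `InitialKeyGeneration`, the wording of the `MessageLimitExceeded` and `Fault` docs, and the `Display` messages are all assumptions made for illustration.

```rust
use std::fmt;
use std::time::Duration;

/// Hypothetical, simplified fault report: which node is blamed and why.
#[derive(Debug, Clone, PartialEq)]
pub struct Fault {
    pub node_id: usize,
    pub kind: String,
}

/// Sketch of an error enum whose variant docs state when each error occurs.
#[derive(Debug, Clone, PartialEq)]
pub enum CrankError {
    /// The configured maximum number of messages has been reached or exceeded.
    MessageLimitExceeded(usize),
    /// The execution time limit has been reached or exceeded.
    TimeLimitHit(Duration),
    /// A correct node produced a step whose fault log blames another correct node.
    Fault(Fault),
    /// The initial key generation for threshold cryptography failed.
    InitialKeyGeneration(String),
}

impl fmt::Display for CrankError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CrankError::MessageLimitExceeded(max) => {
                write!(f, "message limit of {} exceeded", max)
            }
            CrankError::TimeLimitHit(limit) => write!(f, "time limit of {:?} hit", limit),
            CrankError::Fault(fault) => write!(
                f,
                "correct node {} was reported as faulty: {}",
                fault.node_id, fault.kind
            ),
            CrankError::InitialKeyGeneration(cause) => {
                write!(f, "initial key generation failed: {}", cause)
            }
        }
    }
}

impl std::error::Error for CrankError {}
```

Naming the variant after the failed operation, rather than after the library that raised the error, keeps the message useful when it shows up in a test log weeks later.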
message_count
.store_step(step);
if error_on_fault {
I am not sure this is the right place to perform the check that no faults occurred (I am willing to be convinced otherwise, though). Performing it here means adding extra cruft to the process step method; I would assume a much more natural place is in the `crank()` method itself?
I want to check for faults whenever an algorithm makes a step. This goes outside the scope of `crank`. In fact, it matches calls to `process_step` one-to-one.
@@ -915,12 +979,16 @@ where
// All messages are expanded and added to the queue. We opt for copying them, so we can
// return unaltered step later on for inspection.
self.message_count = self.message_count.saturating_add(process_step(
match process_step(
&mut self.nodes,
receiver.clone(),
&step,
We have a `&DaStep` here; could we not leave the code as is and check for faults above or below the `process_step` call?
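A rough sketch of what checking "above the `process_step` call" could look like, so that `process_step` keeps returning a plain message count. Everything here is a simplified stand-in rather than the framework's real types: `Step`, `Fault`, the `usize` node IDs, the `BTreeSet` of faulty nodes, and the `handle_step` and `first_fault_against_correct` helpers are hypothetical names chosen for this example.

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified fault report naming the blamed node.
#[derive(Debug, Clone, PartialEq)]
pub struct Fault {
    pub node_id: usize,
}

/// Hypothetical, simplified step: outgoing messages plus a fault log.
#[derive(Debug, Clone, Default)]
pub struct Step {
    pub messages: Vec<String>,
    pub fault_log: Vec<Fault>,
}

/// Returns the first fault in `step` that blames a node not in `faulty`.
pub fn first_fault_against_correct<'a>(
    step: &'a Step,
    faulty: &BTreeSet<usize>,
) -> Option<&'a Fault> {
    step.fault_log
        .iter()
        .find(|fault| !faulty.contains(&fault.node_id))
}

/// Stand-in for `process_step`, left untouched: queues the step's messages
/// and returns how many were queued.
pub fn process_step(queue: &mut Vec<String>, step: &Step) -> usize {
    queue.extend(step.messages.iter().cloned());
    step.messages.len()
}

/// Performs the fault check next to the `process_step` call instead of
/// inside it. Only steps produced by correct senders are checked.
pub fn handle_step(
    queue: &mut Vec<String>,
    message_count: &mut usize,
    sender: usize,
    step: &Step,
    faulty: &BTreeSet<usize>,
    error_on_fault: bool,
) -> Result<(), Fault> {
    if error_on_fault && !faulty.contains(&sender) {
        if let Some(fault) = first_fault_against_correct(step, faulty) {
            return Err(fault.clone());
        }
    }
    *message_count = message_count.saturating_add(process_step(queue, step));
    Ok(())
}
```

If every call site of `process_step` goes through a wrapper like this, the check still matches step processing one-to-one without changing the signature of `process_step` itself.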
0e00709
to
d0caa26
Compare
@@ -318,6 +343,9 @@ where
message_limit: Option<usize>,
/// Optional time limit.
time_limit: Option<time::Duration>,
/// Property to cause an error if a `Fault` is output from a correct node. By default,
/// encountering a fault leads to an error.
error_on_fault: bool,
Do we ever want to turn that off?
I imagine that we'll always want any tests to fail if a correct node blames a correct node?
There was a comment asking to have this option but I cannot find it now.
By disabling errors on faults you would be able to see more faults and get stalling or divergence.
Not sure if that's really useful, but I'm fine with it, too.
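For reference, a small sketch of how such an opt-out could look on a builder, with the default left at `true`. The `NetBuilder` name, the constructor, and the getter methods are assumptions for this example and may not match the test framework's actual builder.

```rust
/// Hypothetical builder for a test network; only the fields relevant to
/// this discussion are modelled.
#[derive(Debug, Clone)]
pub struct NetBuilder {
    /// Optional message limit.
    message_limit: Option<usize>,
    /// Whether a fault reported by a correct node aborts the run with an error.
    error_on_fault: bool,
}

impl NetBuilder {
    /// Creates a builder with no message limit and errors on faults enabled.
    pub fn new() -> Self {
        NetBuilder {
            message_limit: None,
            error_on_fault: true,
        }
    }

    /// Sets the maximum number of messages before the run is aborted.
    pub fn message_limit(mut self, limit: usize) -> Self {
        self.message_limit = Some(limit);
        self
    }

    /// Changes the default setting (`true`) for erroring on faults.
    pub fn error_on_fault(mut self, error_on_fault: bool) -> Self {
        self.error_on_fault = error_on_fault;
        self
    }

    /// Returns the configured message limit, if any.
    pub fn configured_message_limit(&self) -> Option<usize> {
        self.message_limit
    }

    /// Returns whether faults reported by correct nodes cause an error.
    pub fn errors_on_fault(&self) -> bool {
        self.error_on_fault
    }
}
```

A test that wants to observe stalling or divergence after a fault would call `NetBuilder::new().error_on_fault(false)`; every other test keeps the strict default without writing anything.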
tests/net/mod.rs
Outdated
/// Property to cause an error if a `Fault` is output from a correct node. By default,
/// encountering a fault leads to an error.
///
/// The deault setting `true` can be changed using this function.
Typo: default
Looks good to me.
Feel free to merge once @mbr's comments have been addressed.
Now we have a maintenance task: the |
cf7e96e
to
74678b3
Compare
74678b3
to
3d7f516
Compare
See #120.
Tests will fail if there is a step with a fault report referring to a correct node. This PR also adds storage for the fault log in the test node struct alongside output storage.
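As an illustration of the storage part, the sketch below keeps a fault log next to the captured outputs and appends to both in the order the steps arrive. The `Step` and `Fault` types, the `usize` node IDs, and the accessor names are simplified assumptions rather than the framework's real definitions.

```rust
/// Hypothetical, simplified fault report naming the blamed node.
#[derive(Debug, Clone, PartialEq)]
pub struct Fault {
    pub node_id: usize,
}

/// Hypothetical, simplified step carrying outputs and a fault log.
#[derive(Debug, Clone)]
pub struct Step<O> {
    pub output: Vec<O>,
    pub fault_log: Vec<Fault>,
}

/// Sketch of a test node that records what its algorithm produced.
#[derive(Debug)]
pub struct Node<O> {
    /// Whether the node is simulated as faulty.
    is_faulty: bool,
    /// Captured algorithm outputs, in order.
    outputs: Vec<O>,
    /// Collected fault log, in order.
    faults: Vec<Fault>,
}

impl<O: Clone> Node<O> {
    /// Creates a node with empty output and fault storage.
    pub fn new(is_faulty: bool) -> Self {
        Node {
            is_faulty,
            outputs: Vec::new(),
            faults: Vec::new(),
        }
    }

    /// Appends the step's outputs and faults to the node's storage.
    pub fn store_step(&mut self, step: &Step<O>) {
        self.outputs.extend(step.output.iter().cloned());
        self.faults.extend(step.fault_log.iter().cloned());
    }

    /// Returns whether the node is simulated as faulty.
    pub fn is_faulty(&self) -> bool {
        self.is_faulty
    }

    /// Returns the captured outputs in the order they were produced.
    pub fn outputs(&self) -> &[O] {
        &self.outputs
    }

    /// Returns the collected faults in the order they were reported.
    pub fn faults(&self) -> &[Fault] {
        &self.faults
    }
}
```

Keeping the fault log in arrival order lets a test line up a reported fault with the outputs that preceded it.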