Added fault checking on every step in the net framework #338

vkomenda · 2018-11-13T17:39:41Z

Tests will fail if there is a step with a fault report referring to a correct node. This PR also adds storage for the fault log in the test node struct alongside output storage.
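For readers without the diff at hand, here is a minimal sketch of the check described above. The `Fault`, `Step` and `Node` types are simplified stand-ins invented for illustration, not the framework's real definitions, and `fault_against_correct_node` is a hypothetical helper name.

```rust
use std::collections::BTreeSet;

/// Simplified stand-in for a fault report: which node is blamed, and why.
#[derive(Debug, Clone, PartialEq)]
pub struct Fault {
    pub node_id: usize,
    pub kind: String,
}

/// Simplified stand-in for an algorithm step carrying outputs and a fault log.
#[derive(Debug, Default)]
pub struct Step {
    pub output: Vec<u64>,
    pub fault_log: Vec<Fault>,
}

/// Test-side node record that stores outputs and faults side by side.
#[derive(Debug, Default)]
pub struct Node {
    pub outputs: Vec<u64>,
    pub faults: Vec<Fault>,
}

impl Node {
    /// Copies the outputs and fault reports of `step` into the node's storage.
    pub fn store_step(&mut self, step: &Step) {
        self.outputs.extend(step.output.iter().cloned());
        self.faults.extend(step.fault_log.iter().cloned());
    }
}

/// Returns the first fault in `step` that blames a node outside the `faulty`
/// set, i.e. a report against a correct node. Such a report should fail the test.
pub fn fault_against_correct_node<'a>(
    step: &'a Step,
    faulty: &BTreeSet<usize>,
) -> Option<&'a Fault> {
    step.fault_log.iter().find(|f| !faulty.contains(&f.node_id))
}
```

A report that blames a node in the faulty set is expected and passes; only a report against a correct node is flagged.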

afck

The check in process_step looks good to me.
Not sure whether the store_step functionality should better be left to the individual test?

vkomenda · 2018-11-14T09:17:47Z

Not sure whether the store_step functionality should better be left to the individual test?

VirtualNet::new produces Steps internally but doesn't output them. It calls process_step, from where the step can be stored.

We could output steps from new of course if it's better.

afck · 2018-11-14T09:36:04Z

I see; returning them from VirtualNet::new sounds like the right thing to do then, but maybe there's a reason why that's not the case? (@mbr)

vkomenda · 2018-11-15T15:15:55Z

@mbr, your input would be welcome on this.

mbr · 2018-11-19T16:55:42Z

The basic design philosophy behind VirtualNet is that it makes all the steps available and everything is passed back to the test (which may choose to ignore the faults, for example, when checking for their presence).

However, all Steps are passed through processing functions to copy out all messages, which are handled automatically by the network. The fact that outputs are collected is just a convenience bonus.

So, you're right, the fact that new does not pass out the Steps is probably a mistake. Furthermore, collecting faults like outputs is definitely a possibility, however we should not panic on these. At most, we could add an optional panic_on_fault convenience method. Ideally though, there'd be no panic, and we'd introduce a new error for the crank function and others.

mbr · 2018-11-19T16:58:01Z

So, to summarize, we actually need to solve three issues?

Make new (and hopefully all other functions) return the generated Steps for inspection by the actual test.
Collect FaultLogs as a convenience.
Add a new optional failure condition (might even be on by default) that causes an error if a fault is encountered.
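The three points above could fit together roughly as follows. This is only a sketch under heavy assumptions: `ToyNet` is a made-up toy, not VirtualNet, its types are simplified stand-ins, and it treats every reported fault as an error because it does not model which nodes are faulty.

```rust
/// Simplified stand-in for a fault report naming the blamed node.
#[derive(Debug, Clone, PartialEq)]
pub struct Fault {
    pub node_id: usize,
}

/// Simplified stand-in for a step produced by one node.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Step {
    pub output: Vec<u64>,
    pub fault_log: Vec<Fault>,
}

/// Hypothetical error type; only the fault case is modeled here.
#[derive(Debug, PartialEq)]
pub enum CrankError {
    /// A step contained a fault report while fault checking was enabled.
    Fault(Fault),
}

/// Toy network: a queue of pending steps plus the three behaviors listed above.
pub struct ToyNet {
    pending: Vec<Step>,
    /// Point 2: fault logs are collected as a convenience.
    pub faults: Vec<Fault>,
    /// Point 3: optional failure condition, on by default.
    error_on_fault: bool,
}

impl ToyNet {
    pub fn new(pending: Vec<Step>) -> Self {
        ToyNet { pending, faults: Vec::new(), error_on_fault: true }
    }

    /// Builder-style switch for the failure condition.
    pub fn error_on_fault(mut self, on: bool) -> Self {
        self.error_on_fault = on;
        self
    }

    /// Point 1: every processed step is handed back to the caller for inspection.
    /// Returns `None` once no steps are left.
    pub fn crank(&mut self) -> Option<Result<Step, CrankError>> {
        if self.pending.is_empty() {
            return None;
        }
        let step = self.pending.remove(0);
        // Faults are recorded whether or not they are treated as errors.
        self.faults.extend(step.fault_log.iter().cloned());
        if self.error_on_fault {
            if let Some(fault) = step.fault_log.first() {
                return Some(Err(CrankError::Fault(fault.clone())));
            }
        }
        Some(Ok(step))
    }
}
```

Returning an error instead of panicking leaves the decision to the test: it can unwrap, match on the fault, or switch the check off with `error_on_fault(false)` and inspect the collected `faults` afterwards.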

vkomenda · 2018-11-19T17:02:42Z

Thanks, @mbr. I'll follow through on your points.

vkomenda · 2018-11-20T12:06:42Z

@mbr, please check my response to your review comments.

mbr

I feel like I might be missing something here. Most changes are just docs and smaller details that I would be willing to merge without; another minor thing is the error style.

My biggest question is whether the fault-checking logic is placed correctly. I feel like there could be a simpler way to do this without altering internal APIs.

mbr · 2018-12-04T12:36:24Z

tests/net/err.rs

@@ -29,6 +34,10 @@ where
    MessageLimitExceeded(usize),
    /// The execution time limit has been reached or exceeded.
    TimeLimitHit(time::Duration),
+    /// Fault encountered.


I know it's a bit nitpicky, but I would really like a slightly longer comment here. Coming back to this after not touching the tests for a week or two, messages like "The algorithm run by the node produced a DistAlgorithm::Error while processing a message." (see above) are instantly clear, but "Fault encountered." is not. Ideally, this would mention when a fault could possibly be encountered.

I believe the docs for CrankError::Crypto and CrankError::Algorithm could also use an extra half of a sentence.

mbr · 2018-12-04T12:37:27Z

tests/net/mod.rs

@@ -73,6 +72,8 @@ pub struct Node<D: DistAlgorithm> {
    is_faulty: bool,
    /// Captured algorithm outputs, in order.
    outputs: Vec<D::Output>,
+    /// Collected fault log.


, in order.

Can you explain what you mean by collecting in order?

mbr · 2018-12-04T12:54:38Z

tests/net/err.rs

+    /// Fault encountered.
+    Fault(Fault<D::NodeId>),
+    /// Threshold cryptography error.
+    Crypto(crypto::error::Error),


This is very nondescript. I believe this is similar to having a lot of Io(io::Error) errors (this was in the discussion about error styles we had a quarter earlier, when we adopted Nick's excellent style).

So maybe this should be an InitialKeyGeneration error, with /// The initial key generation for threshold cryptography failed. or similar.
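To make the suggested style concrete, here is a small sketch of an error enum whose variants name the operation that failed, not the library the error came from. `CryptoError` is a placeholder type, and the `Fault` variant's shape and wording are assumptions made for this example only.

```rust
use std::fmt;

/// Placeholder for the underlying threshold-cryptography error.
#[derive(Debug, PartialEq)]
pub struct CryptoError(pub String);

/// Hypothetical error enum following the "name the failed operation" style.
#[derive(Debug, PartialEq)]
pub enum CrankError {
    /// The initial key generation for threshold cryptography failed.
    InitialKeyGeneration(CryptoError),
    /// A step produced by a correct node contained a fault report blaming
    /// node `reported`, which is itself correct.
    Fault { reported: usize },
}

impl fmt::Display for CrankError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CrankError::InitialKeyGeneration(e) => write!(
                f,
                "initial key generation for threshold cryptography failed: {}",
                e.0
            ),
            CrankError::Fault { reported } => write!(
                f,
                "correct node {} was reported as faulty by a correct node",
                reported
            ),
        }
    }
}

impl std::error::Error for CrankError {}
```

With variants like these, a failing test log says which stage broke without the reader having to trace where a generic `Crypto` error could have come from.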

mbr · 2018-12-04T13:21:25Z

tests/net/mod.rs

-
-    message_count
+        .store_step(step);
+    if error_on_fault {


I am not sure this is the right place to perform the check that no faults occurred (I am willing to be convinced otherwise though). Performing it here means adding extra cruft to the process_step method; I would assume a much more natural place is in the crank() method itself?

I want to check for faults whenever an algorithm makes a step. This goes outside the scope of crank. In fact, it matches calls to process_step one-to-one.

mbr · 2018-12-04T13:23:09Z

tests/net/mod.rs

@@ -915,12 +979,16 @@ where

        // All messages are expanded and added to the queue. We opt for copying them, so we can
        // return unaltered step later on for inspection.
-        self.message_count = self.message_count.saturating_add(process_step(
+        match process_step(
            &mut self.nodes,
            receiver.clone(),
            &step,


We have a &DaStep here, could we not leave the code as is and check for faults above or below the process_step call?
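One way to read this suggestion, sketched with stand-in types: `process_step` keeps returning a message count, and the caller runs a separate fault check on the same `&Step` just before it. All names and signatures below are invented for illustration; the real `process_step` takes different arguments.

```rust
/// Simplified stand-in for a fault report naming the blamed node.
pub struct Fault {
    pub node_id: usize,
}

/// Simplified stand-in for a step with outgoing messages and a fault log.
pub struct Step {
    pub messages: Vec<String>,
    pub fault_log: Vec<Fault>,
}

/// Unchanged helper: queues the step's messages and returns how many were added.
pub fn process_step(queue: &mut Vec<String>, step: &Step) -> usize {
    queue.extend(step.messages.iter().cloned());
    step.messages.len()
}

/// Separate check: returns the id of the first blamed node that is not marked
/// faulty. Ids missing from `is_faulty` are treated as correct nodes.
pub fn check_faults(step: &Step, is_faulty: &[bool]) -> Result<(), usize> {
    for fault in &step.fault_log {
        if !is_faulty.get(fault.node_id).copied().unwrap_or(false) {
            return Err(fault.node_id);
        }
    }
    Ok(())
}

/// Caller side: the fault check sits directly above the `process_step` call,
/// so the two still match one-to-one while `process_step` stays as it was.
pub fn handle_step(
    queue: &mut Vec<String>,
    message_count: &mut usize,
    step: &Step,
    is_faulty: &[bool],
    error_on_fault: bool,
) -> Result<(), usize> {
    if error_on_fault {
        check_faults(step, is_faulty)?;
    }
    *message_count = message_count.saturating_add(process_step(queue, step));
    Ok(())
}
```

The trade-off is that every call site of `process_step` has to remember the check, which is the concern behind wanting the two tied together.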

afck · 2018-12-10T11:30:49Z

tests/net/mod.rs

@@ -318,6 +343,9 @@ where
    message_limit: Option<usize>,
    /// Optional time limit.
    time_limit: Option<time::Duration>,
+    /// Property to cause an error if a `Fault` is output from a correct node. By default,
+    /// encountering a fault leads to an error.
+    error_on_fault: bool,


Do we ever want to turn that off?
I imagine that we'll always want any tests to fail if a correct node blames a correct node?

There was a comment asking to have this option but I cannot find it now.

By disabling errors on faults you would be able to see more faults and get stalling or divergence.

Not sure if that's really useful, but I'm fine with it, too.

afck · 2018-12-10T11:31:07Z

tests/net/mod.rs

+    /// Property to cause an error if a `Fault` is output from a correct node. By default,
+    /// encountering a fault leads to an error.
+    ///
+    /// The deault setting `true` can be changed using this function.


Typo: default

afck

Looks good to me.
Feel free to merge once @mbr's comments have been addressed.

afck · 2018-12-10T12:00:35Z

tests/net/mod.rs

@@ -318,6 +343,9 @@ where
    message_limit: Option<usize>,
    /// Optional time limit.
    time_limit: Option<time::Duration>,
+    /// Property to cause an error if a `Fault` is output from a correct node. By default,
+    /// encountering a fault leads to an error.
+    error_on_fault: bool,


Not sure if that's really useful, but I'm fine with it, too.

vkomenda · 2018-12-10T12:29:56Z

Now we have a maintenance task: the crossbeam dependency of the old example has to be updated or the example should be removed.

vkomenda requested review from mbr and afck November 13, 2018 17:42

afck reviewed Nov 14, 2018

vkomenda force-pushed the vk-net-fault-check1 branch 2 times, most recently from 78a4e1f to a457e9b Compare November 19, 2018 14:35

vkomenda force-pushed the vk-net-fault-check1 branch from cace687 to 8a23603 Compare November 20, 2018 15:29

vkomenda force-pushed the vk-net-fault-check1 branch from 8a23603 to 1caa4fd Compare November 28, 2018 15:03

mbr reviewed Dec 4, 2018

vkomenda force-pushed the vk-net-fault-check1 branch from 0e00709 to d0caa26 Compare December 4, 2018 14:54

afck reviewed Dec 10, 2018

afck approved these changes Dec 10, 2018

vkomenda force-pushed the vk-net-fault-check1 branch from cf7e96e to 74678b3 Compare December 10, 2018 23:28

vkomenda added 7 commits December 11, 2018 07:35

added fault checking in the net framework

db1da8a

check that the node in the fault report is not faulty

443e13b

simplified a condition

7b027eb

made error on fault a parameter of VirtualNet

922e789

updated the BA test to error on fault

bfcc728

explained errors and refactored an assignment

bb31e79

typo fix

3d7f516

vkomenda force-pushed the vk-net-fault-check1 branch from 74678b3 to 3d7f516 Compare December 11, 2018 07:40

vkomenda merged commit c1c7fff into master Dec 11, 2018

vkomenda deleted the vk-net-fault-check1 branch December 11, 2018 08:12
