Don't trust reported last seen times #1

Conversation

teor2345

Here are some changes:

  • fixes for my leap second changes
  • extra property tests we could do
  • fixes for clippy

I've commented out the extra property tests because they depend on ZcashFoundation#2203.

jvff and others added 20 commits May 25, 2021 23:43
Returning `impl IntoIterator` means that the caller will always be
forced to call `.into_iter()`, and returning `impl Iterator` still
allows them to call `.into_iter()` because it becomes the identity
function.
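A minimal sketch of that difference, using a hypothetical function rather than Zebra's API: callers can consume an `impl Iterator` directly, and `.into_iter()` still works because every `Iterator` also implements `IntoIterator` as the identity.

fn peer_ids() -> impl Iterator<Item = u32> {
    vec![1, 2, 3].into_iter()
}

fn main() {
    // Callers can consume the iterator directly...
    assert_eq!(peer_ids().count(), 3);

    // ...and `.into_iter()` is still available, as the identity function.
    assert_eq!(peer_ids().into_iter().count(), 3);
}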
Due to clock skew, the peers could end up at the front of the
reconnection queue or far at the back. The solution to this is to offset
the reported times by the difference between the most recent reported
sighting (on the remote clock) and the current time (on the local clock).
Times in the past don't have any security implications, so there's no
point in trying to apply the offset to them as well.
If any of the times gossiped by a peer are in the future, apply the
necessary offset to all the times gossiped by that peer. This ensures
that all gossiped peers from a malicious peer are moved further back in
the queue.
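A minimal sketch of that offset logic, assuming a plain list of `chrono` times rather than Zebra's actual `validate_addrs` signature:

use chrono::{DateTime, Duration, Utc};

// Sketch only: if the newest gossiped `last_seen` time is in the future,
// shift all of that peer's times back by the same amount.
fn offset_future_times(last_seen_times: &mut [DateTime<Utc>], now: DateTime<Utc>) {
    let newest = match last_seen_times.iter().max() {
        Some(newest) => *newest,
        None => return,
    };

    if newest > now {
        // The estimated difference between the remote clock and the local clock.
        let offset: Duration = newest - now;

        // Apply the same offset to every time gossiped by this peer, so a
        // malicious peer can't push its addresses to the front of the queue.
        for time in last_seen_times.iter_mut() {
            *time = *time - offset;
        }
    }
    // Times already in the past are left unchanged.
}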

Co-authored-by: teor <[email protected]>
If an overflow occurs, the reported `last_seen` times are either very
wrong or malicious, so reject all addresses gossiped by that peer.
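One way that rejection could look, sketched with `chrono`'s checked arithmetic (illustrative names, not Zebra's actual code):

use chrono::{DateTime, Duration, Utc};

// If subtracting the offset overflows for any address, return `None` so the
// caller can discard everything gossiped by that peer.
fn apply_offset_or_reject(
    last_seen_times: &[DateTime<Utc>],
    offset: Duration,
) -> Option<Vec<DateTime<Utc>>> {
    last_seen_times
        .iter()
        .map(|time| time.checked_sub_signed(offset))
        .collect()
}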
Use some mock gossiped peers that all have `last_seen` times in the
future and check that they all have a specific offset applied to them.
Use some mock gossiped peers that all have `last_seen` times in the
past, and check that the `validate_addrs` function doesn't change their
`last_seen` times.
Use some mock gossiped peers where some have `last_seen` times in the
past and some have times in the future. Check that all the peers have
an offset applied to them by the `validate_addrs` function.

This tests if the offset is applied to all peers that a malicious peer
gossiped to us.
If the most recent `last_seen` time reported by a peer is exactly the
limit, the offset doesn't need to be applied because no times are in the
future.
Provides a strategy for generating arbitrary `MetaAddr` instances that
are created as if they have been gossiped by another peer.
Given a generated list of gossiped peers, ensure that after running the
`validate_addrs` function none of the resulting peers have a `last_seen`
time that's after the specified limit.
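A sketch of what that strategy and property could look like, using `proptest` with a simplified stand-in struct and a toy `validate_addrs` instead of Zebra's real `MetaAddr` code:

use proptest::prelude::*;

// A simplified stand-in for a gossiped `MetaAddr`: just a `last_seen` time
// in seconds since the epoch.
#[derive(Clone, Copy, Debug, PartialEq)]
struct GossipedPeer {
    last_seen_secs: i64,
}

// Strategy for peers "as gossiped by another peer": the reported time can be
// anywhere in the serializable `u32` range.
fn gossiped_peer() -> impl Strategy<Value = GossipedPeer> {
    (0..=u32::MAX as i64).prop_map(|last_seen_secs| GossipedPeer { last_seen_secs })
}

// Toy validation: shift all times back so none exceed the limit.
fn validate_addrs(peers: &[GossipedPeer], limit_secs: i64) -> Vec<GossipedPeer> {
    let newest = peers
        .iter()
        .map(|peer| peer.last_seen_secs)
        .max()
        .unwrap_or(limit_secs);
    let offset = (newest - limit_secs).max(0);

    peers
        .iter()
        .map(|peer| GossipedPeer {
            last_seen_secs: peer.last_seen_secs - offset,
        })
        .collect()
}

proptest! {
    #[test]
    fn validated_times_never_exceed_limit(
        peers in proptest::collection::vec(gossiped_peer(), 1..16),
        limit_secs in 0..=u32::MAX as i64,
    ) {
        for peer in validate_addrs(&peers, limit_secs) {
            prop_assert!(peer.last_seen_secs <= limit_secs);
        }
    }
}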
Make it clear why all peers have the time offset applied to them.

Co-authored-by: teor <[email protected]>
- Make the security impact clearer and in a separate section.
- Instead of listing an assumption as almost a side-note, describe it
  clearly inside a `Panics` section.
- A sentence was previously very awkward and bordering on incorrect. It
  was updated to be simpler, clearer, and more precise.

Co-authored-by: teor <[email protected]>
1. Validated MetaAddrs serialize without errors
2. Serialized bytes deserialize into the same MetaAddr
//
// Compare timestamps, allowing an extra second, to account for `chrono` leap seconds:
// See https://docs.rs/chrono/0.4.19/chrono/naive/struct.NaiveTime.html#leap-second-handling
prop_assert!(peer.get_last_seen().timestamp() <= last_seen_limit.timestamp() + 1,
Owner

I'm not sure I understand this change. Why should the timestamps be compared instead of the time itself?

Author
@teor2345 teor2345 May 27, 2021

Sorry for the confusion here.

I made a follow-up comment about this change, but it seems like it got lost? (Maybe it was internet trouble.)

The chrono library stores dates to nanosecond precision. During a leap second, the nanosecond field goes up to nearly 2 billion (chrono represents a leap second as nanoseconds in the range 1,000,000,000 to 1,999,999,999). We need to account for these extra nanoseconds in our calculations - in particular, we don't want Zebra to panic during a leap second. (Delaying things by a second should mostly be fine.)

So I made the random MetaAddr and datetime_full proptest times include nanoseconds in ZcashFoundation#2195.

But I actually don't think adjusting the tests is very robust. So I added commit bc90054 to normalise the time instead. (We only get nanoseconds from OS times. The serialized formats are all in seconds.)
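(A sketch of one way to normalise a `chrono` time to whole seconds by dropping the nanosecond field; this is an assumption about the approach, not the contents of commit bc90054.)

use chrono::{DateTime, TimeZone, Utc};

// Rebuild the time from its whole-second timestamp, discarding nanoseconds
// (which can exceed 1_000_000_000 during a leap second).
fn truncate_to_seconds(time: DateTime<Utc>) -> DateTime<Utc> {
    Utc.timestamp(time.timestamp(), 0)
}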

Comment on lines 31 to 72
use zebra_chain::serialization::{ZcashDeserialize, ZcashSerialize};

// Check that malicious peers can't make Zebra send bad times to other peers
// (after Zebra's standard sanitization)
let sanitized_peer = peer.sanitize();

// Check that sanitization doesn't put times in the future
prop_assert!(
    sanitized_peer.get_last_seen().timestamp() <= last_seen_limit.timestamp() + 1,
    "sanitized peer timestamp {} was greater than limit {}, original timestamp: {}",
    sanitized_peer.get_last_seen().timestamp(),
    last_seen_limit.timestamp(),
    peer.get_last_seen().timestamp(),
);

// Check that malicious peers can't make Zebra's serialization fail
let addr_bytes = peer.zcash_serialize_to_vec();
prop_assert!(
    addr_bytes.is_ok(),
    "unexpected serialization error: {:?}, original timestamp: {}, sanitized timestamp: {}",
    addr_bytes,
    peer.get_last_seen().timestamp(),
    sanitized_peer.get_last_seen().timestamp(),
);

// Assume other implementations deserialize like Zebra
let deserialized_peer = MetaAddr::zcash_deserialize(addr_bytes.unwrap().as_slice());
prop_assert!(
    deserialized_peer.is_ok(),
    "unexpected deserialization error: {:?}, original timestamp: {}, sanitized timestamp: {}",
    deserialized_peer,
    peer.get_last_seen().timestamp(),
    sanitized_peer.get_last_seen().timestamp(),
);
let deserialized_peer = deserialized_peer.unwrap();

// Check that serialization hasn't modified the address
// (like the sanitized round-trip test)
prop_assert_eq!(sanitized_peer, deserialized_peer);

// Check that sanitization, serialization, and deserialization don't
// put times in the future
prop_assert!(
    deserialized_peer.get_last_seen().timestamp() <= last_seen_limit.timestamp() + 1,
    "deserialized peer timestamp {} was greater than limit {}, original timestamp: {}, sanitized timestamp: {}",
    deserialized_peer.get_last_seen().timestamp(),
    last_seen_limit.timestamp(),
    peer.get_last_seen().timestamp(),
    sanitized_peer.get_last_seen().timestamp(),
);
Owner

I think I'm also missing something about these tests. Should each of them be separate #[proptest]s?

Also, should some of them be done in separate PRs? I mean, I don't understand how serialization and sanitization are related to the code to ensure the time limit. My intuition is that there should be a proptest to ensure that any arbitrary MetaAddr is serializable correctly and can be properly sanitized?

Author

I'm still working through all of this myself.

Here's where I've got to today:

If we want to serialize the last seen time correctly, it needs to fit in a u32. But DateTime<Utc> returns seconds as an i64. (And uses roughly 56 bits internally.) So it might not uphold this invariant.

> My intuition is that there should be a proptest to ensure that any arbitrary MetaAddr is serializable correctly and can be properly sanitized?

There is now. I wrote those tests in ZcashFoundation#2203 and they just got committed as 9f8b4f8 and 5cdcc52.

Sanitization truncates to the nearest 30 minutes. Since timestamp 0 is already at a 30 minute mark, it just happens to uphold the u32 range invariant. (Despite DateTime<Utc> not requiring that invariant.)
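A sketch of that truncation (illustrative only, not Zebra's actual sanitize() code): rounding down to a 30-minute boundary keeps non-negative timestamps non-negative.

use chrono::{DateTime, TimeZone, Utc};

// Round a time down to the previous 30-minute boundary. Since the Unix epoch
// is itself on a 30-minute boundary, a non-negative input stays non-negative.
fn truncate_to_half_hour(time: DateTime<Utc>) -> DateTime<Utc> {
    const INTERVAL: i64 = 30 * 60;
    let secs = time.timestamp();
    Utc.timestamp(secs - secs.rem_euclid(INTERVAL), 0)
}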

> Also, should some of them be done in separate PRs? I mean, I don't understand how serialization and sanitization are related to the code to ensure the time limit.

The validation we're adding subtracts arbitrary times. This can move the last seen time outside the u32 range. So the MetaAddr appears to be valid, until we go to sanitize and serialize it. Then the entire connection gets closed due to the serialization error. (Which is a denial of service risk.)

I used these proptests to discover that error. (And similar errors with leap seconds.)

I think the best way to fix these bugs is to create a DateTime32(u32) type. (Which is part of ticket ZcashFoundation#2171.) Using a 32-bit internal representation will make sure we can't go outside that range.

Otherwise we will have to validate 32-bit datetimes every time we use or modify them. (In general, we try to use representations that automatically ensure our constraints.)
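A sketch of what such a DateTime32 newtype could look like (illustrative only; the actual type proposed in ZcashFoundation#2171 may differ):

use std::convert::TryFrom;

use chrono::{DateTime, TimeZone, Utc};

/// 32-bit seconds since the Unix epoch, so the value always fits the
/// network serialization format by construction.
#[derive(Copy, Clone, Debug, Eq, PartialEq, Ord, PartialOrd)]
struct DateTime32(u32);

impl DateTime32 {
    fn to_chrono(self) -> DateTime<Utc> {
        Utc.timestamp(self.0.into(), 0)
    }
}

impl TryFrom<DateTime<Utc>> for DateTime32 {
    type Error = std::num::TryFromIntError;

    /// Fails for times before 1970 or after the u32 range, instead of
    /// silently producing an unserializable value.
    fn try_from(time: DateTime<Utc>) -> Result<Self, Self::Error> {
        u32::try_from(time.timestamp()).map(DateTime32)
    }
}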

Author

I'll go work on DateTime32 now.

Author
@teor2345 teor2345 May 27, 2021

I created DateTime32 and made MetaAddr use it in ZcashFoundation#2210.

For the CandidateSet validation, here's how we could move forward (a rough sketch follows the list):

  • take the last seen limit as DateTime32 (no nanoseconds or leap seconds, better proptest coverage)
    • panic if the current time is outside of DateTime32 (by 2038 the protocol should have been updated)
  • do all the calculations in chrono::DateTime (better API, no overflows or underflows because it's ~56 bits)
  • convert the times to DateTime32 for each address, returning an error if they're out of range
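A rough sketch of that flow, with illustrative names and plain u32 seconds standing in for DateTime32 (not Zebra's actual API):

use std::convert::TryFrom;

use chrono::{DateTime, TimeZone, Utc};

// Do the comparison and any adjustment in `chrono`, then convert each result
// back to 32-bit seconds, rejecting out-of-range times.
fn validate_last_seen(
    reported: DateTime<Utc>,
    last_seen_limit_secs: u32,
) -> Result<u32, std::num::TryFromIntError> {
    // chrono has roughly 56 usable bits, so this arithmetic can't overflow.
    let limit: DateTime<Utc> = Utc.timestamp(last_seen_limit_secs.into(), 0);

    // Example adjustment: never keep a time later than the limit.
    let adjusted = reported.min(limit);

    // Narrow back to 32-bit seconds for each address, returning an error if
    // the adjusted time is out of range (for example, before 1970).
    u32::try_from(adjusted.timestamp())
}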

This fix modifies both the tests and the implementation.
@jvff jvff force-pushed the dont-trust-reported-last-seen-times branch from b19043c to 15a8ff0 on May 31, 2021 at 14:51
@teor2345
Author

This is obsoleted by other PRs.

@teor2345 teor2345 closed this May 31, 2021