Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make shielded syncing a separate command #2422

Closed
wants to merge 7 commits into from

Conversation

batconjurer
Copy link
Member

@batconjurer batconjurer commented Jan 22, 2024

Describe your changes

Previously, the scanning of MASP notes was done as part of various MASP commands. This moves that logic into a separate command so that it can be done out of band.

Indicate on which release or other PRs this topic is based on

#2363

Checklist before merging to draft

  • I have added a changelog
  • Git history is in acceptable state

@batconjurer batconjurer requested a review from grarco January 22, 2024 15:02
@batconjurer batconjurer mentioned this pull request Jan 23, 2024
Copy link
Collaborator

@murisi murisi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for these changes, I think I understand the general flow. But maybe I am missing some of the higher level considerations that motivated this approach to separating MASP synchronization into a separate command. Some questions:

  • Have alternatives to interrupts been considered? Like continuously saving/appending new transactions to a file?
  • Given interrupt usage and SyncStatus modality, will the MASP functionality be easily usable from the SDK on all platforms (including the web)?
  • How do the ShieldedContext lock guard in the SDK and SyncStatus marker type interact? Is there any overlap or redundancy here?
  • How do we handle shielded context file consistency when multiple clients (from separate processes) are running?

@@ -1471,6 +1463,22 @@ pub struct IndexedTx {
pub index: TxIndex,
}

impl PartialOrd for IndexedTx {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the difference between this ordering of IndexedTx and the one from #[derive(Ord, PartialOrd)]? Isn't the latter also lexicographic? No?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't trust the derived ord (especially if this struct changes in the future) and I wanted to be very explicit to other devs since this ordering is important.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's leave a note about it (or a test)

@batconjurer
Copy link
Member Author

batconjurer commented Jan 24, 2024

Thanks for these changes, I think I understand the general flow. But maybe I am missing some of the higher level considerations that motivated this approach to separating MASP synchronization into a separate command. Some questions:

* Have alternatives to interrupts been considered? Like continuously saving/appending new transactions to a file?

* Given interrupt usage and SyncStatus modality, will the MASP functionality be easily usable from the SDK on all platforms (including the web)?

* How do the `ShieldedContext` lock guard in the SDK and `SyncStatus` marker type interact? Is there any overlap or redundancy here?

* How do we handle shielded context file consistency when multiple clients (from separate processes) are running?
  • I considered having a daemon process that would do such a thing. I think it's more work and will be more complicated to manage multiple process reading and writing to the same shielded context. It is something I'd like eventually, but Adrian didn't consider it a prioriy.
  • This functionality will only work in the CLI. I will pull the syncing logic up into apps to reflect this. For the web, something bespoke will need to be written
  • The SyncStatus is there so the syncing and non-syncing related functions are actually implemented on separate types. This was especially helpful in making sure I had gotten rid of all the fetch calls in client code. This wasn't caught by the lock guards.
  • I hadn't considered this point because my impression was that the existing Namada trait implementations covered these problems. If that isn't the case, indeed I will need to do more work checking this. In particular, I won't allow more than one process access to a shielded context at a time. This can be relaxed later.

@@ -187,7 +187,7 @@ fn run_ledger_ibc() -> Result<()> {
}

#[test]
fn run_ledger_ibc_with_hermes() -> Result<()> {
fn drun_ledger_ibc_with_hermes() -> Result<()> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small typo here

// Describe how a Transfer simply subtracts from one
// account and adds the same to another
tx_ctx.scan_tx(*indexed_tx, *epoch, tx, stx)?;
self.unscanned.txs.remove(indexed_tx);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This process of removing transactions starting from the oldest and going forwards in time could be interrupted, right? If this were to happen, then the very first accepted transactions would not be in self.unscanned.txs from the point that break is called and afterwards. Right? My question is: if the client were then to again add more new unknown keys to their wallet, would fetch start the scan from the very beginning? Or would it start the rescan at the transaction that was interrupted (despite the keys being new/unknown up till this point)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is part of the bug I mentioned above. I have fixed this. If we have new keys, we need to scan from 0 to self.last_fetched. If some of those blocks, are already in the local cache, we can skip fetching them.

std::mem::swap(&mut self.unscanned.txs, &mut txs);
let txs = ProgressLogging::new(txs, io, ProgressType::Scan);
for (indexed_tx, (epoch, tx, stx)) in txs {
if self.interrupted() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If an interrupt happens here, would self.unscanned.txs be left in a state where it's missing the very first transactions? If this were the case, would this hamper the client's future ability to scan the very first transactions with viewing keys that are not yet known at this point?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the cache is cleared out. If new keys appear later, we have to repopoulate this cache.

last_query_height,
)
.await?;
self.unscanned.txs.extend(fetched);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before this line, is self.unscanned.txs supposed to contain all transactions up till self.latest_unscanned()? After this line, is self.unscanned.txs supposed to contain all transactions up till last_query_height?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.latest_unscanned() is supposed to tell us the most recent block height in the local cache. However, there is a bug here. We need to fetch from zero up to self.last_fetched since we are trying to sync up new keys with existing keys.

@murisi
Copy link
Collaborator

murisi commented Jan 24, 2024

  • I considered having a daemon process that would do such a thing. I think it's more work and will be more complicated to manage multiple process reading and writing to the same shielded context. It is something I'd like eventually, but Adrian didn't consider it a prioriy.

I was thinking that there could be alternatives to interrupts other than daemon processes. Like for instance, we could do things as before but atomically commit the ShieldedContext to file storage (using ShieldedUtils::save) whenever we reach consistent states (like a transaction fetched into the context, or a transaction scanned into the context). This way if Ctrl-C is pressed or there's an OS crash, the last saved state is still preserved. I say this because reasoning about interrupts, when they can occur, how to leave/save the context in a consistent state, how to maintain correctness as more functionality is added to the shielded pool, and how to integrate it into web clients and other contexts may be more challenging than a model without channels and receivers.

@murisi
Copy link
Collaborator

murisi commented Jan 24, 2024

  • The SyncStatus is there so the syncing and non-syncing related functions are actually implemented on separate types. This was especially helpful in making sure I had gotten rid of all the fetch calls in client code. This wasn't caught by the lock guards.

I see, this is a valid approach. That being said, though it may be undesirable in to do a sync just before displaying balances in our client. I wouldn't say that it is necessarily illogical (to the point of being a type error) for users of the SDK to interleave syncing and non-syncing operations for their own applications.

@murisi murisi requested a review from tzemanovic January 24, 2024 17:12
This was referenced Jan 25, 2024
@murisi
Copy link
Collaborator

murisi commented Jan 29, 2024

When working with you on the note scanning algorithm, I noticed two things:

  1. The SyncStatus caused friction in our SDK usage when we were trying to save the ShieldedContext. We eventually worked around this issue by cloning the ShieldedContext before saving it (which should not be necessary since Borsh only requires a ShieldedContext reference to do serialization). We also had to introduce a PhantomData<SyncStatus> since the SDK code does not actually depend on the sync status.
  2. We have essentially moved the interruption logic out of the SDK and into the apps crate. This means that the SDK no longer really deals with partiatially completed synchronizations in its interface.

In light of the above, it may be worth reconsidering if and how we do SyncStatus. If we are doing it for aesthetic reasons, then it should be fine. However if we are doing it to enforce correctness, then we should establish (in the absence of interrupts) what sequence of save, fetch, load, and gen_shielded_transfer operations will corrupt the internal state of the ShieldedContext (and render it incorrect/unusable) and therefore require the overhead of the type system to prevent. My general fear is that we may be ossifying the apps crate's latest ShieldedContext usage pattern (synchronization subcommand completely separated from other MASP actions) into the SDK unnecessarily, thereby creating friction for other SDK use cases (for instance a non-interactive program that repeatedly fetches and does useful shielded work).

If the SyncStatus must remain, then we should also consider moving it into the apps crate where such information may be more relevant. Or alternatively, we could move these SyncStatus API changes into a separate PR since this partitioning of ShieldedContext (rather than the creation of a separate syncing command) seems to form the bulk of this PR's diff. Doing this could make reviewing the sync subcommand changes easier. That being said, I'm happy to review whichever approach you decide upon.

@brentstone
Copy link
Collaborator

Closing since this has already been done.

@brentstone brentstone closed this Apr 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants