Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Logging and persist async operation for Sapling migration #4002

Merged
merged 3 commits into from
May 13, 2019

Conversation

Eirik0
Copy link
Contributor

@Eirik0 Eirik0 commented May 9, 2019

Currently zcashd will automatically remove the last async migration operations when it reaches the height where it sends the transactions it just made. This is not in alignment with other async operations, which are not removed until a node is restarted or a user calls z_getoperationresult. This PR removes the calls to pop the operations so that they can be accessed and reviewed later. In this PR I also correct the operation's amount_migrated field to exclude the transaction fee (this field existed for debugging purposes, but should be consistent with z_getmigrationstatus), and have included the list of migration txids in the operation's result (this is similar to the async rpcs such as z_sendmany).

Documentation: none needed.

Test plan:

  • After migration transactions have been created, list the opids and call z_getoperationresult on them.
  • Check that the operations' amount_migrated fields exclude the fee.

@Eirik0 Eirik0 changed the title Do not automatically remove async migration operations and return txids Add Logging and persist async operation for Sapling migration May 9, 2019
Copy link
Contributor

@daira daira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ut(ACK+cov)

@zcash zcash deleted a comment from Eirik0 May 10, 2019
@@ -67,6 +67,7 @@ void AsyncRPCOperation_saplingmigration::main() {
}

bool AsyncRPCOperation_saplingmigration::main_impl() {
LogPrint("zrpcunsafe", "Beginning AsyncRPCOperation_saplingmigration. id=%s\n", getId());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

%s should be at the start of the log message, to follow the convention used by other async operations. Also makes it easier to visually identify when examining log file or perform parsing based on opid.

@@ -126,21 +140,30 @@ bool AsyncRPCOperation_saplingmigration::main_impl() {
builder.AddSaplingOutput(ovkForShieldingFromTaddr(seed), migrationDestAddress, amountToSend - FEE);
CTransaction tx = builder.Build().GetTxOrThrow();
if (isCancelled()) {
LogPrint("zrpcunsafe", "%s: Canceled. Stopping.\n");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Runtime error will occur. Format specifier without any data to output. I guess you want a getId() here.

int noteIndex = 0;
CCoinsViewCache coinsView(pcoinsTip);
do {
CAmount amountToSend = chooseAmount(availableFunds);
auto builder = TransactionBuilder(consensusParams, targetHeight_, pwalletMain, pzcashParams, &coinsView, &cs_main);
LogPrint("zrpcunsafe", "%s: Beginning creating transaction with Sapling output amount=%s\n", getId(), amountToSend - FEE);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible runtime error here. Format specifier is %s, but passing in CAmount. Missing FormatMoney() here?

@@ -108,6 +113,15 @@ bool AsyncRPCOperation_saplingmigration::main_impl() {
}
availableFunds -= fromNoteAmount;
for (const CSproutNotePlaintextEntry& sproutEntry : fromNotes) {
std::string data(sproutEntry.plaintext.memo().begin(), sproutEntry.plaintext.memo().end());
LogPrint("zrpcunsafe", "%s: Adding Sprout note input (txid=%s, vjoinsplit=%d, ciphertext=%d, amount=%s, memo=%s)\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ciphertext= is incorrect, as the uint8_t index of the joinsplit output is passed in here. Should be jsoutindex= (and in other asyncrpcoperation classes, where this line was copied from).

asyncrpcoperation_mergetoaddress.cpp:648:            LogPrint("zrpcunsafe", "%s: spending note (txid=%s, vjoinsplit=%d, ciphertext=%d, amount=%s, height=%d, confirmations=%d)\n",
asyncrpcoperation_saplingmigration.cpp:117:            LogPrint("zrpcunsafe", "%s: Adding Sprout note input (txid=%s, vjoinsplit=%d, ciphertext=%d, amount=%s, memo=%s)\n",
asyncrpcoperation_sendmany.cpp:787:            LogPrint("zrpcunsafe", "%s: spending note (txid=%s, vjoinsplit=%d, ciphertext=%d, amount=%s, height=%d, confirmations=%d)\n",
asyncrpcoperation_sendmany.cpp:1051:        LogPrint("zrpcunsafe", "%s: found unspent Sprout note (txid=%s, vjoinsplit=%d, ciphertext=%d, amount=%s, memo=%s)\n",

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Eirik0 commit 5600562 is missing the fix for mergetoaddress and one (of the two) in sendmany.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed and squashed in to the same commit.

++numTxCreated;
amountMigrated += amountToSend;
amountMigrated += amountToSend - FEE;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was the FEE excluded previously, but now included?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to prevent confusion between this value and the amount_migrated returned by z_getmigrationstatus. It probably should have excluded the FEE originally.

@@ -604,7 +604,7 @@ void CWallet::RunSaplingMigration(int blockHeight) {
// height N-5
if (blockHeight % 500 == 495) {
std::shared_ptr<AsyncRPCQueue> q = getAsyncRPCQueue();
std::shared_ptr<AsyncRPCOperation> lastOperation = q->popOperationForId(saplingMigrationOperationId);
std::shared_ptr<AsyncRPCOperation> lastOperation = q->getOperationForId(saplingMigrationOperationId);
if (lastOperation != nullptr) {
lastOperation->cancel();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So now the base class cancel() will be invoked. Is that the intended behaviour? Will the cancel state be set correctly i.e. is the operation currently in READY state to transition to CANCEL state?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also note that the operation will still exist in an internal job map used by AsyncRPCQueue. That means, the queue will still try and process the operation. If the cancel state has been set correctly, this is fine. If not, the operation will be executed. Popping the operation removes the operation from the internal job map, which means the queue will have nothing to execute.

/**
 * Return the operation for a given operation id and then remove the operation from internal storage.
 */
std::shared_ptr<AsyncRPCOperation> AsyncRPCQueue::popOperationForId(AsyncRPCOperationId id) {
    std::shared_ptr<AsyncRPCOperation> ptr = getOperationForId(id);
    if (ptr) {
        std::lock_guard<std::mutex> guard(lock_);
        // Note: if the id still exists in the operationIdQueue, when it gets processed by a worker
        // there will no operation in the map to execute, so nothing will happen.
        operation_map_.erase(id);
    }
    return ptr;
}

...
void AsyncRPCQueue::run(size_t workerId) {
   ...
        if (!operation) {
            // cannot find operation in map, may have been removed
        } else if (operation->isCancelled()) {
            // skip cancelled operation
        } else {
            operation->main();
        }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So now the base class cancel() will be invoked. Is that the intended behaviour? Will the cancel state be set correctly i.e. is the operation currently in READY state to transition to CANCEL state?

The intent of this is to cancel the operation if it is already running, this however does not work, and the fix does not seem to be simple (see #3988). What will currently happen is if a node is not finished generating migration transactions by the time we try to send them, then we will send what we have, but it will continue to generate transactions after the fact. These extra transactions will then be discarded in the next round of migrations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also note that the operation will still exist in an internal job map used by AsyncRPCQueue. That means, the queue will still try and process the operation. If the cancel state has been set correctly, this is fine. If not, the operation will be executed. Popping the operation removes the operation from the internal job map, which means the queue will have nothing to execute.

This has been changed from pop to get

@Eirik0 Eirik0 requested a review from bitcartel May 10, 2019 19:30
@Eirik0 Eirik0 force-pushed the migration-status-info branch from 5600562 to 7f3b00b Compare May 10, 2019 21:03
@LarryRuane
Copy link
Collaborator

Suggestion, we can probably catch many of the LogPrint() format string errors with this modification:

$ git diff
diff --git a/qa/rpc-tests/sprout_sapling_migration.py b/qa/rpc-tests/sprout_sapling_migration.py
index cd67f9ea8..3f9c0c3d3 100755
--- a/qa/rpc-tests/sprout_sapling_migration.py
+++ b/qa/rpc-tests/sprout_sapling_migration.py
@@ -45,9 +45,10 @@ class SproutSaplingMigration(BitcoinTestFramework):
         # Add migration parameters to nodes[0]
         extra_args[0] = extra_args[0] + [
             '-migration',
-            '-migrationdestaddress=' + SAPLING_ADDR
+            '-migrationdestaddress=' + SAPLING_ADDR,
+            '-debug=zrpcunsafe',
         ]
-        assert_equal(4, len(extra_args[0]))
+        assert_equal(5, len(extra_args[0]))
         assert_equal(2, len(extra_args[1]))
         return start_nodes(4, self.options.tmpdir, extra_args)
 

With this change, running the test (without Eirik's latest commit) generates the expected error:

=== Running testscript sprout_sapling_migration.py ===
Initializing test directory /tmp/testQ9vGJ_
Running test using '-migrationdestaddress'...
Mining blocks...
waiting for async operation opid-289167a8-8aba-471f-a6f2-6405b54497ee
Mining to block 494 % 500...
migration operation: [{u'status': u'failed', u'creation_time': 1557521612, u'id': u'opid-0d6361c0-108c-431c-a703-9cb85139bcbc', u'target_height': 500, u'error': {u'message': u'runtime error: tinyformat: Too many conversion specifiers in format string', u'code': -1}, u'method': u'saplingmigration'}]
waiting for async operation opid-0d6361c0-108c-431c-a703-9cb85139bcbc
Assertion failed: (left == right)
  left: <None>
 right: <runtime error: tinyformat: Too many conversion specifiers in format string>
  File "/home/larry/zcash/qa/rpc-tests/test_framework/test_framework.py", line 121, in main
    self.run_test()
  File "/home/larry/zcash/qa/rpc-tests/sprout_sapling_migration.py", line 162, in run_test
    self.run_migration_test(self.nodes[0], sproutAddr0, SAPLING_ADDR, 500)
  File "/home/larry/zcash/qa/rpc-tests/sprout_sapling_migration.py", line 86, in run_migration_test
    result = wait_and_assert_operationid_status_result(node, operationstatus[0]['id'])
  File "/home/larry/zcash/qa/rpc-tests/test_framework/util.py", line 412, in wait_and_assert_operationid_status_result
    assert_equal(in_errormsg, errormsg)
  File "/home/larry/zcash/qa/rpc-tests/test_framework/util.py", line 362, in assert_equal
    raise AssertionError("(left == right)%s\n  left: <%s>\n right: <%s>" % (message, str(expected), str(actual)))
Stopping nodes
Cleaning up
Failed
!!! FAIL: sprout_sapling_migration.py !!!

The only downside is that more lines get written to the test's /tmp/testQ9vGJ_/node0/regtest/debug.log but that temp directory gets deleted anyway at the end of the test.

I'd even suggest enabling all debug logging (for all RPC tests), and run the full test suite -- I bet we'd uncover many more of these format string bugs. I may just try that.

@Eirik0
Copy link
Contributor Author

Eirik0 commented May 10, 2019

@LarryRuane I like your suggestion. I will add this.

@Eirik0 Eirik0 force-pushed the migration-status-info branch from 7f3b00b to 42daf46 Compare May 10, 2019 21:20
@Eirik0
Copy link
Contributor Author

Eirik0 commented May 10, 2019

The error found by @LarryRuane was in the TransactionBuilder that is now fixed too.

Copy link
Contributor

@bitcartel bitcartel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK with sprout-to-sapling test passing.

@bitcartel
Copy link
Contributor

I rebased locally and there were no conflicts.

Copy link
Collaborator

@LarryRuane LarryRuane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK (well... some testing, only ran the python test)

Copy link
Contributor

@daira daira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re-utACK

@mms710 mms710 added this to the v2.0.5 milestone May 10, 2019
@Eirik0 Eirik0 force-pushed the migration-status-info branch from 42daf46 to 23d9826 Compare May 10, 2019 23:28
@Eirik0
Copy link
Contributor Author

Eirik0 commented May 13, 2019

@zkbot r+

@zkbot
Copy link
Contributor

zkbot commented May 13, 2019

📌 Commit 23d9826 has been approved by Eirik0

@zkbot
Copy link
Contributor

zkbot commented May 13, 2019

⌛ Testing commit 23d9826 with merge 4880676...

zkbot added a commit that referenced this pull request May 13, 2019
Add Logging and persist async operation for Sapling migration

Currently zcashd will automatically remove the last async migration operations when it reaches the height where it sends the transactions it just made. This is not in alignment with other async operations, which are not removed until a node is restarted or a user calls `z_getoperationresult`. This PR removes the calls to pop the operations so that they can be accessed and reviewed later. In this PR I also correct the operation's `amount_migrated` field to exclude the transaction fee (this field existed for debugging purposes, but should be consistent with `z_getmigrationstatus`), and have included the list of migration txids in the operation's result (this is similar to the async rpcs such as `z_sendmany`).

Documentation: none needed.

Test plan:
* After migration transactions have been created, list the opids and call `z_getoperationresult` on them.
* Check that the operations' `amount_migrated` fields exclude the fee.
Copy link
Contributor

@daira daira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rereviewed from scratch after force push. utACK.

@zkbot
Copy link
Contributor

zkbot commented May 13, 2019

☀️ Test successful - pr-merge
Approved by: Eirik0
Pushing 4880676 to master...

@zkbot zkbot merged commit 23d9826 into zcash:master May 13, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants