Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TIP-555: Network upgrade logic optimization #555

Closed
jwrct opened this issue Jun 9, 2023 · 8 comments
Closed

TIP-555: Network upgrade logic optimization #555

jwrct opened this issue Jun 9, 2023 · 8 comments

Comments

@jwrct
Copy link
Contributor

jwrct commented Jun 9, 2023

Tip: 555
Title: Network upgrade logic optimization
Author: [email protected]
Status: Final
Type: Standards Track
Category: core
Date: 2023-06-09

Simple Summary

This TIP is to specifically explain the current problems and solutions in the network upgrade logic in java-tron.

Abstract

The network upgrade strategy currently adopted by java-tron is to complete the upgrade after reaching the specified network upgrade time and the minimum percentage of upgraded SRs. On this basis, specific business logic upgrades can also be implemented through proposals. After a SR produces a block, the version (java-tron version currently used by the SR) field will be filled in the block header. Nodes will record the SR's version when processing the block, and then confirm whether the particular version meets the upgrade conditions.

Motivation

Currently, the network upgrade logic adopted by java-tron has some problems in certain scenarios. For example, in the scenario where the number of SRs changes, the upgraded version may roll back, or the state array may be accessed out of bounds; the lower version cannot be upgraded to the higher version, etc. In order to avoid abnormalities during network upgrades, it is necessary to solve these problems.

Rationale

The current network upgrade implementation logic has the following problems:

Percent Calculation Problem

When calculating the percentage of upgraded SRs, the denominator uses the number of current SR lists. If the number of SR lists changes from 5 to 6, the calculation results for historical versions may be wrong. For example, as shown in Figure 1, assuming a percentage of 80%, versions 1 and 2 have been upgraded. After the status of the non-upgraded version is reset during the maintenance period, the result becomes the picture on the right. The states of versions 1 and 2 have also been reset, because (4 / 6 < 80%). Reset results are not as expected.
image

Version Roll Back

case 1

If an SR actively downgrades the local version, other nodes will perform downgrading when processing the blocks produced by this SR, and reset the bit corresponding to the status of the not yet upgraded version that is greater than this SR’s version. The code is as the following,

private void downgrade(int version, int slot) {
    for (ForkBlockVersionEnum versionEnum : ForkBlockVersionEnum.values()) {
      int versionValue = versionEnum.getValue();
      if (versionValue > version) {
        byte[] stats = manager.getDynamicPropertiesStore().statsByVersion(versionValue);
        if (!check(stats) && Objects.nonNull(stats)) {
          stats[slot] = VERSION_DOWNGRADE;
          manager.getDynamicPropertiesStore().statsByVersion(versionValue, stats);
        }
      }
    }
  }
  
  private boolean check(byte[] stats) {
    if (stats == null || stats.length == 0) {
      return false;
    }
    for (int i = 0; i < stats.length; i++) {
      if (check[i] != stats[i]) {
        return false;
      }
    }
    return true;
  }

The check method is a full SR check. Using this method to check the version upgraded by percentage will cause problems. For example, the state array of version 2 is (1, 1, 0, 1, 1). Assuming that the passing percentage is 80%, then version 2 has been upgraded. If the second SR is downgraded to version 1, the state array of version 2 will change to (1, 0, 0, 1, 1), which causes the upgraded version to be downgraded. This is not what should be expected.

case 2

After the maintenance period is processed, an SR is added at the end of the SR list. The version of the newly added SR has an upgraded version but not the latest one. Afterward, when processing the newly added SR block, the state array of its version will be reset since the length of the array does not match the SR list length, causing the upgraded version to become a not upgraded one. The code looks like this:

  byte[] stats = manager.getDynamicPropertiesStore().statsByVersion(version);
  if (Objects.isNull(stats) || stats.length != witnesses.size()) {
    stats = new byte[witnesses.size()];
  }

State array access out of bounds

After the maintenance period is processed, an SR is added at the end of the SR list, and if the version of the newly added SR is not the latest version, then an exception of out-of-bounds access to the state array will occur during the downgrade process. As shown in the example in Figure 2.
image
A new SR5 is added, and its version is v2. When processing the SR5 version, the corresponding state in the state array of the version higher than v2 will be reset during the downgrade process. Since the length of the state array of v3 is 4, the array access would be out of bounds.

Historical versions cannot be upgraded

When dealing with block versions, when the version meets the upgrade conditions, the historical version lower than this version also needs to be upgraded, the code is as follows:

if (check(stats)) {
    upgrade(version, stats.length);
    return;
}

The check method is a full SR check. For example, if the state array of version v2 is (1, 1, 0, 1, 1), it has been upgraded, then the non-upgraded version lower than v2 should be upgraded, but the result is not what should be expected.

Specification

The upgraded version of our specification should not be allowed to be downgraded.

Implementation

For the problems described above, I propose the following solutions:

Percent Calculation Problem

When calculating the percentage of upgraded SRs, the denominator uses the size of state array.

Version Roll Back

case 1

During the downgrade process, use the pass method instead of the check method to determine whether the version has been upgraded. That is, it is judged whether the version is upgraded according to the rules corresponding to the version number.

case 2

The state array of the upgraded version is not allowed to be reset.

State array access out of bounds

In order to avoid a series of problems caused by different lengths of state arrays, the reset operation can be placed after the consensus module processing (new SR), which solves the problem of state array access out of bounds. The code modification looks like this:

    boolean flag = chainBaseManager.getDynamicPropertiesStore().getNextMaintenanceTime()
        <= block.getTimeStamp();
    if (flag) {
      proposalController.processProposals();
    }
    if (!consensus.applyBlock(block)) {
      throw new BadBlockException("consensus apply block failed");
    }
    if (flag) {
      chainBaseManager.getForkController().reset();
    }

Historical versions cannot be upgraded

Use the pass method (according to the rules corresponding to the version number to determine whether the version has been upgraded) to judge whether the version meets the upgrade condition, and upgrade the version that has not completed the upgrade history after the upgrade condition is met.

@jwrct jwrct changed the title TIP-: Optimize hard fork upgrade TIP-555: Optimize hard fork upgrade Jun 9, 2023
@jwrct
Copy link
Contributor Author

jwrct commented Jun 9, 2023

For specific implementation details, please refer to the PR: feat(hardFork): optimize hard fork logic by chengtx01 · Pull Request #5247 · tronprotocol/java-tron

@jwrct jwrct changed the title TIP-555: Optimize hard fork upgrade TIP-555: Hard fork logic optimization Jun 13, 2023
@jwrct jwrct changed the title TIP-555: Hard fork logic optimization TIP-555: Network upgrade logic optimization Jun 13, 2023
@Jamestepfoward
Copy link

It's quite complex, basically, you are describing four potential problems and giving solutions: 1. the percent calculation of SRs if number of SRs change, 2. the SRs may rollback its node version, 3. state arrays out of bound, 4. in some situation, some nodes that deserves upgraded did not upgrade.

Then I have some questions: a. will the number of SRs change actually? It is 27 now, and it is not a chain parameter. b. is problem 2, 3, 4 all caused by SR rollback its node version? c. would you explain the check method and pass method further?

@ufmannk
Copy link

ufmannk commented Jun 14, 2023

The network upgrade logic is too complicated.

@souppopnix
Copy link

It's quite complex, basically, you are describing four potential problems and giving solutions: 1. the percent calculation of SRs if number of SRs change, 2. the SRs may rollback its node version, 3. state arrays out of bound, 4. in some situation, some nodes that deserves upgraded did not upgrade.

Then I have some questions: a. will the number of SRs change actually? It is 27 now, and it is not a chain parameter. b. is problem 2, 3, 4 all caused by SR rollback its node version? c. would you explain the check method and pass method further?

I have the same question about SR number change, though I think the solution seems rational.

For other issues, one SR downgrades or does not have latest version will cause all SRs reset to the previous version? This should be fixed.

@jwrct jwrct closed this as completed Jun 14, 2023
@jwrct jwrct reopened this Jun 14, 2023
@jwrct
Copy link
Contributor Author

jwrct commented Jun 14, 2023

It's quite complex, basically, you are describing four potential problems and giving solutions: 1. the percent calculation of SRs if number of SRs change, 2. the SRs may rollback its node version, 3. state arrays out of bound, 4. in some situation, some nodes that deserves upgraded did not upgrade.

Then I have some questions: a. will the number of SRs change actually? It is 27 now, and it is not a chain parameter. b. is problem 2, 3, 4 all caused by SR rollback its node version? c. would you explain the check method and pass method further?

At present, the number of SRs on the main network will not change, but it is likely to change on the private network. Only case 1 of problem 2 is caused by SR rollback node version. The check method means that all SRs have been upgraded before the version upgrade is considered complete, while the pass method judges whether the version upgrade is complete according to the rules corresponding to the version, for example, the number of upgraded SRs reaches 80% of all SRs.

@jwrct
Copy link
Contributor Author

jwrct commented Jun 14, 2023

It's quite complex, basically, you are describing four potential problems and giving solutions: 1. the percent calculation of SRs if number of SRs change, 2. the SRs may rollback its node version, 3. state arrays out of bound, 4. in some situation, some nodes that deserves upgraded did not upgrade.
Then I have some questions: a. will the number of SRs change actually? It is 27 now, and it is not a chain parameter. b. is problem 2, 3, 4 all caused by SR rollback its node version? c. would you explain the check method and pass method further?

I have the same question about SR number change, though I think the solution seems rational.

For other issues, one SR downgrades or does not have latest version will cause all SRs reset to the previous version? This should be fixed.

Yes, this problem will be fixed in the next version.

@jwrct
Copy link
Contributor Author

jwrct commented Jun 16, 2023

The network upgrade logic is too complicated.

@ufmannk Actually, the core logic behind the network upgrade of Java-Tron is relatively easy to understand. The network upgrade is considered complete when the majority of SRs have completed their upgrade, and then specific protocol upgrades are controlled through proposals. Unfortunately, some special scenarios were not handled properly in the past, but this optimization will solve these issues, which is the focus of this article. Looking at the network upgrade of Java-Tron from the perspective of this TIP may give the impression that it is complex.

@jwrct
Copy link
Contributor Author

jwrct commented Jul 3, 2023

Close this issue as it is implemented by GreatVoyage-v4.7.2.
Check TIP detail at TIP-555
Check implementation PR at tronprotocol/java-tron#5247, tronprotocol/java-tron#5268

@jwrct jwrct closed this as completed Jul 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants