[LocalNet] Execution Height and Connection Errors #433

Closed
6 tasks
gokutheengineer opened this issue Jan 8, 2023 · 5 comments
Labels: bug, consensus, core

Comments

@gokutheengineer
Contributor

gokutheengineer commented Jan 8, 2023

Objective

I am experiencing the following unexpected behaviors when running LocalNet:

Issue 1. When LocalNet is run on the main branch using the commands currently in the README, a fresh LocalNet reports a height of 2 in PrintNodeState after the first TriggerNextView command. The expected height is 1.

Issue 2. I am also seeing a second error, though I am not sure whether it is specific to my setup. After a few iterations of TriggerNextView, PrintNodeState, and then ResetToGenesis, one or more of the nodes refuse connections.

Origin Document

Issue 1:
Screen Shot 2023-01-08 at 11 33 45

Issue 2:
Screen Shot 2023-01-08 at 15 57 52

Goals

  • Identify the cause of the behavior and fix the problem (if one exists)
  • Enable smooth execution of LocalNet

Deliverable

  • LocalNet runs as expected

Non-goals / Non-deliverables

  • Change business logic

General issue deliverables

  • Update the appropriate CHANGELOG(s)
  • Update any relevant local/global README(s)
  • Update relevant source code tree explanations
  • Add or update any relevant or supporting mermaid diagrams

Testing Methodology


Creator: @gokutheengineer
Co-Owners:

@gokutheengineer gokutheengineer added the bug Something isn't working - expected behaviour is incorrect label Jan 8, 2023
@gokutheengineer gokutheengineer self-assigned this Jan 8, 2023
@Olshansk
Member

Olshansk commented Jan 9, 2023

Issue 1. When LocalNet is run on main branch with the current commands in the README, and running the fresh LocalNet after executing the first TriggerNextView command and running PrintNodeState the height is printed as 2, which is supposed to be 1.

Check the persisted state. You can also run make db_drop && make docker_wipe_nodes to clear the state and restart LocalNet from scratch.

Inside of consensus/module.go we have this function:

// TODO: Populate the entire state from the persistence module: validator set, quorum cert, last block hash, etc...
func (m *consensusModule) loadPersistedState() error {
	persistenceContext, err := m.GetBus().GetPersistenceModule().NewReadContext(-1) // Unknown height
	if err != nil {
		return nil
	}
	defer persistenceContext.Close()

	latestHeight, err := persistenceContext.GetLatestBlockHeight()
	if err != nil || latestHeight == 0 {
		// TODO: Proper state sync not implemented yet
		return nil
	}

	m.height = uint64(latestHeight) + 1 // +1 because the height of the consensus module is where it is actively participating in consensus

	m.nodeLog(fmt.Sprintf("Starting consensus module at height %d", latestHeight))

	return nil
}
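
The +1 in that function can be illustrated in isolation. This is a minimal sketch under stated assumptions, not code from the repository: startingHeight is a hypothetical helper that only mirrors the arithmetic of loadPersistedState, and current stands in for whatever m.height held before the call.

```go
package main

import "fmt"

// startingHeight mirrors the arithmetic in loadPersistedState (hypothetical helper).
// When nothing has been persisted (latestPersisted == 0) the in-memory height is
// left untouched; otherwise consensus resumes one block above the latest persisted one.
func startingHeight(latestPersisted int64, current uint64) uint64 {
	if latestPersisted == 0 {
		return current
	}
	return uint64(latestPersisted) + 1
}

func main() {
	for _, persisted := range []int64{0, 1, 5} {
		fmt.Printf("persisted=%d -> consensus height %d\n",
			persisted, startingHeight(persisted, 0))
	}
}
```

Under these assumptions, a block already persisted at height 1 when the node starts would put the consensus module at height 2, which is why clearing the state is worth trying first.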

@Olshansk Olshansk moved this to In Progress in V1 Dashboard Jan 9, 2023
@Olshansk
Member

Olshansk commented Jan 9, 2023

Issue 2. Also, there is one other weird error I am receiving, but I am not sure it is only in my setup. After doing few iterations of TriggerNextView, PrintNodeState, and then ResetToGenesis one or some of the node refuses connections.

Quoting @deblasis from Discord:

@gokutheengineer
Contributor Author

gokutheengineer commented Jan 10, 2023

make db_drop && make docker_wipe_nodes

This command doesn't solve the issue: I already run make docker_wipe before every LocalNet restart, which deletes the containers, images, and volumes. As a result, running make db_drop fails as expected with:

Error: No such container: pocket-db
make: *** [db_drop] Error 1

Since nothing is left to load, the persistence module should not be starting from an existing state. I will investigate further to see what might be going wrong. I also tried running make db_drop && make docker_wipe_nodes first and make docker_wipe afterwards, but as expected it didn't help.

Could you please confirm that you are able to run LocalNet on the latest main branch? If so, then something is wrong with my local environment.

@gokutheengineer
Contributor Author

@Olshansk I think this can be closed, since the problems outlined here are tracked elsewhere: issue #441 covers the start at height 2, and PRs #403 and #425, which are already merged, cover the connection issues. Wdyt?

@Olshansk
Member

Closed per @gokutheengineer's comment above 👌

@github-project-automation github-project-automation bot moved this from In Progress to Done in V1 Dashboard Jan 27, 2023
@Olshansk Olshansk added core Core infrastructure - protocol related consensus Consensus specific changes labels Jan 27, 2023