Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: use a queue of open ports for tests #14893

Merged
merged 13 commits into from
Feb 6, 2023
Merged

Conversation

facundomedica
Copy link
Member

@facundomedica facundomedica commented Feb 2, 2023

Description

In this PR we first store a list of 200 free ports and then use them one by one.

Closes: #14837


Author Checklist

All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.

I have...

  • included the correct type prefix in the PR title
  • added ! to the type prefix if API or client breaking change
  • targeted the correct branch (see PR Targeting)
  • provided a link to the relevant issue or specification
  • followed the guidelines for building modules
  • included the necessary unit and integration tests
  • added a changelog entry to CHANGELOG.md
  • included comments for documenting Go code
  • updated the relevant documentation or specification
  • reviewed "Files changed" and left comments if necessary
  • confirmed all CI checks have passed

Reviewers Checklist

All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.

I have...

  • confirmed the correct type prefix in the PR title
  • confirmed ! in the type prefix if API or client breaking change
  • confirmed all author checklist items have been addressed
  • reviewed state machine logic
  • reviewed API design and naming
  • reviewed documentation is accurate
  • reviewed tests and test coverage
  • manually tested (if applicable)

server/api/server.go Outdated Show resolved Hide resolved
@facundomedica
Copy link
Member Author

After some experimentation there are a couple of ways we can improve this:

  1. Modify FreeTCPAddr to return a closing function, so we keep them open and then close them all together right before calling startInProcess(). This alone should improve the situation given that FreeTCPAddr gets called 5 times for the first validator and then 2 more for every other validator.
  2. Stop using FreeTCPAddr whenever possible. We should be able to pass port=0 and read back the chosen port. This would require some changes both in the SDK and in Comet (You can get the port back from Comet but it's never updated, it will always return 0).

Let's give the first option a go and see how that goes.

@github-actions github-actions bot added C:CLI C:x/genutil genutil module issues labels Feb 3, 2023
@facundomedica
Copy link
Member Author

sync.Pool doesn't seem to be useful here, elements in the pool get deleted before they are used (probably GC'ed) and the New() function would lead us to the same issue.

The solution here is pretty simple, just a buffered channel. Also there's a mutex that only allows to run a single network at a time, so that means port collision happens during the same execution and not between multiple executions of network.New.

Some tests use t.Parallel() along with network.New() but these are incompatible as t.Parallel() will be pretty much ignored due to the mutex.

I've tried removing the mutex lock, but it looks like it can cause some issues with almost-flaky tests (usually timing issues). Would be useful to investigate if we can run multiple tests/networks at the same time to improve our CI times (maybe add some limit in the amount to avoid saturating resources).

@facundomedica facundomedica changed the title refactor!: remove as many usages of FreeTCPAddr as possible test: use a queue of open ports for tests Feb 5, 2023
@facundomedica facundomedica marked this pull request as ready for review February 5, 2023 19:20
@facundomedica facundomedica requested a review from a team as a code owner February 5, 2023 19:20
@github-prbot github-prbot requested review from a team, julienrbrt and atheeshp and removed request for a team February 5, 2023 19:20
@julienrbrt julienrbrt added the backport/v0.47.x PR scheduled for inclusion in the v0.47's next stable release label Feb 6, 2023
Copy link
Member

@julienrbrt julienrbrt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK.

@tac0turtle
Copy link
Member

We should remove network.new. This will happen with the integration and e2e framework work.

Copy link
Member

@tac0turtle tac0turtle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UtAck. Nice solution

@facundomedica facundomedica enabled auto-merge (squash) February 6, 2023 13:09
@facundomedica facundomedica merged commit 5e57be0 into main Feb 6, 2023
@facundomedica facundomedica deleted the facu/remove-freetcpaddr branch February 6, 2023 13:29
mergify bot pushed a commit that referenced this pull request Feb 6, 2023
Co-authored-by: Marko <[email protected]>
(cherry picked from commit 5e57be0)

# Conflicts:
#	testutil/network/network.go
#	testutil/network/util.go
#	x/genutil/client/cli/init_test.go
Copy link
Contributor

@alexanderbez alexanderbez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't how I imagined the solution to look like honestly. IMO FreeTCPAddr should be the thing that is protected against collisions.

@facundomedica
Copy link
Member Author

@alexanderbez wdym? That we should move the port pool inside of FreeTCPAddr?

@alexanderbez
Copy link
Contributor

@alexanderbez wdym? That we should move the port pool inside of FreeTCPAddr?

https://go.dev/play/p/wHZzDm_ccZK

@prettymuchbryce
Copy link
Contributor

prettymuchbryce commented Feb 7, 2023

Hey all. I just wanted to point out something non-obvious about the way go test works, and why using any package-level locking mechanism such as waitgroups, mutexes, etc doesn't solve this problem.

When go tests are invoked with go test ./..., each package tested which imports testutil/network will initialize the testutil/network package as a separate instance. Therefore, package level locks like this one don't work. Each package's tests will have a reference to a different lock. You might assume the package itself is "global" across the test run, and that it will only be initialized once, but this is not the case.

I'd encourage you to test this out yourself to see what I mean. For example, add a Println into the init function added in this PR and then run your test suite via go test ./... -v (or add the -v flag here and run make test) to see how many times the init function is called. Also note that portPool will be empty at the beginning of each init call.

For this reason I don't think the current solution solves the issue, as the init function will be called multiple times in parallel by different packages during the test run. Each of those init functions retrieve 200 ports, and then immediately release them, leaving them available for other packages running in parallel to acquire the same ports.

As a workaround for these issues, we ended up adding a file-level lock around network.New.

@alexanderbez
Copy link
Contributor

Intersting point @prettymuchbryce you're right. I didn't realize Go handles tests this way. Indeed this makes both my suggestion to the PR and the original PR itself useless.

Is there a cleaner way to guarantee package-level synchronization across tests?

@facundomedica
Copy link
Member Author

As a workaround for these issues, we ended up adding a file-level lock around network.New.

I think this would be a good solution, I'll close the backport to v0.47 for now and reopen #14837. I can implement it although would be cool to see your solution @prettymuchbryce

Is there a cleaner way to guarantee package-level synchronization across tests?

I don't have an answer for this, will do some research. The obvious answer right now is the filesystem, either directly or through a db.

@prettymuchbryce
Copy link
Contributor

I can implement it although would be cool to see your solution @prettymuchbryce

Our solution uses file-level locks. This solution works, but the disadvantage is that it could potentially leave temporary files on the system if the process is killed or crashes during the test.

import (
  "github.com/gofrs/flock"
)

var fileLock = flock.New("/tmp/test-cosmos-network.lock")

func New(t *testing.T, configs ...network.Config) *network.Network {
	err := fileLock.Lock()
	require.NoError(t, err)
	defer func() {
		err := fileLock.Unlock()
		require.NoError(t, err)
	}()

	c := make(chan os.Signal, 1)
	signal.Notify(c, os.Interrupt, syscall.SIGTERM)
	go func() {
		for range c {
			fileLock.Close()
			return
		}
	}()

	cfg := network.DefaultConfig(app.NewTestNetworkFixture)
	net, err := network.New(t, t.TempDir(), cfg)
	if err != nil {
		panic(err)
	}

	t.Cleanup(net.Cleanup)
	return net
}

This is why I suggested the approach of listening on port 0 with the gRPC server directly, rather than the existing 2-step process which reserves a port, and then immediately releases it.

@alexanderbez
Copy link
Contributor

As a workaround for these issues, we ended up adding a file-level lock around network.New.

Why have you not opted for the zero-port allocation yourself?

@facundomedica
Copy link
Member Author

I've explored passing the 0 as a port. The issues with that are:

  • You need to expose a new method or attribute to the server structs that return the chosen port, this is pretty easy on Cosmos SDK
  • Modify CometBFT to do the same. There are ways of getting the host and port but they are not "dynamic", meaning they just return whatever you initialized it with. The solution here would be to update those with whatever the listener returns.

So it's doable, but a bit cumbersome. Would it make sense to go after this?

tsenart pushed a commit to meka-dev/cosmos-sdk that referenced this pull request Apr 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/v0.47.x PR scheduled for inclusion in the v0.47's next stable release C:CLI C:x/genutil genutil module issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

FreeTCPAddr returns duplicate ports, results in bind: address already in use errors
5 participants