speeds up fate lock acquisition #5262

Open
wants to merge 7 commits into
base: main
keith-turner
Contributor

Stores the lock data for fate locks in the ZooKeeper node name instead of in the ZooKeeper data for the node. Ran some local performance tests with hundreds of fate operations and saw lock times go from 750ms to 15ms.
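
A minimal sketch of why moving the lock data into the node name could reduce lock time, assuming the cost comes from reading each child's data separately. `FakeLockDir` is a hypothetical in-memory stand-in for a ZooKeeper parent node, not a ZooKeeper API, and the `flock#` prefix and name layout are assumptions made for illustration. It only counts simulated read calls.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LockReadCost {

  // Hypothetical in-memory stand-in for a ZooKeeper parent node; counts simulated reads.
  static class FakeLockDir {
    private final Map<String, String> children = new LinkedHashMap<>(); // name -> data
    int calls = 0;

    void create(String name, String data) {
      children.put(name, data);
    }

    List<String> getChildren() {
      calls++;
      return new ArrayList<>(children.keySet());
    }

    String getData(String name) {
      calls++;
      return children.get(name);
    }
  }

  // Lock data kept in the node data: listing the children is followed by one read per child.
  static int readsWithDataInNode(int locks) {
    FakeLockDir dir = new FakeLockDir();
    for (int i = 0; i < locks; i++) {
      dir.create(String.format("flock#%010d", i), "lockdata-" + i);
    }
    for (String child : dir.getChildren()) {
      dir.getData(child);
    }
    return dir.calls;
  }

  // Lock data embedded in the node name: listing the children already returns everything.
  static int readsWithDataInName(int locks) {
    FakeLockDir dir = new FakeLockDir();
    for (int i = 0; i < locks; i++) {
      dir.create(String.format("flock#lockdata-%d#%010d", i, i), "");
    }
    dir.getChildren();
    return dir.calls;
  }

  public static void main(String[] args) {
    System.out.println("reads, data in node: " + readsWithDataInNode(1000));
    System.out.println("reads, data in name: " + readsWithDataInName(1000));
  }
}
```

In this model the number of reads grows with the number of locks in the first layout and stays constant in the second.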

fixes #5181

}
}

// TODO change data arg from byte[] to String.. in the rest of the code its always a String.
Contributor Author

I will open a follow-on issue for this and remove the TODO before merging.

Contributor Author

Removed the TODO and opened #5264. In the TODO I was thinking of using String, but realized that using more concrete types would be better.

@ctubbsii ctubbsii modified the milestones: 2.1.4, 4.0.0 Jan 15, 2025
List<FateLock.FateLockNode> lockNodes =
FateLock.validateAndWarn(fLockPath, zr.getChildren(fLockPath.toString()));

lockNodes.sort(Comparator.comparingLong(ln -> ln.sequence));
Member

Should the return type be a sorted set instead of a list that you sort later?

Contributor Author

Updated to a sorted set in 3f25ce0
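
A rough sketch of what returning a sorted set could look like. This is not the code from 3f25ce0; `LockNode` is a hypothetical stand-in for `FateLock.FateLockNode`, and the tie-break on lock data is an assumption added so that two distinct nodes never collapse into one set entry.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

class SortedLockNodes {

  // Hypothetical stand-in for FateLock.FateLockNode.
  static final class LockNode {
    final String lockData;
    final long sequence;

    LockNode(String lockData, long sequence) {
      this.lockData = lockData;
      this.sequence = sequence;
    }
  }

  // Orders nodes by sequence number at insertion time, so callers need no separate sort.
  static SortedSet<LockNode> sortBySequence(List<LockNode> nodes) {
    Comparator<LockNode> bySequence =
        Comparator.comparingLong((LockNode n) -> n.sequence).thenComparing(n -> n.lockData);
    SortedSet<LockNode> sorted = new TreeSet<>(bySequence);
    sorted.addAll(nodes);
    return sorted;
  }

  public static void main(String[] args) {
    List<LockNode> unordered =
        Arrays.asList(new LockNode("b", 7), new LockNode("a", 2), new LockNode("c", 5));
    for (LockNode n : sortBySequence(unordered)) {
      System.out.println(n.sequence + " " + n.lockData);
    }
  }
}
```

The lowest sequence number is then available through `first()`.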


List<TabletId> tabletIds;
// start a compaction on each tablet
try (var tablets = client.tableOperations().getTabletInformation(table1, new Range())) {
Contributor Author

I initially wrote this test to check performance, but realized it is also a good test of many compactions on the same table with different configs.

I did not initially see a performance difference with this test. Looking into it, the reason was that the compactions were finishing too quickly, so only 8 or 9 compaction fate operations were active at a time. Observing the performance bug requires many more active fate operations. Locally, I modified the test to kill the compactor processes and start 1K compaction fate operations. With those changes I could see orders-of-magnitude differences in fate lock acquisition time.

Comment on lines 83 to 86
Preconditions.checkArgument(nodeName.startsWith(PREFIX) && nodeName.charAt(len - 11) == '#',
"Illegal node name %s", nodeName);
sequence = Long.parseLong(nodeName.substring(len - 10));
lockData = nodeName.substring(PREFIX.length(), len - 11);
Contributor

While this is unlikely to happen, the sequential numbering can overflow and produce a negative number, which I think would cause a problem with this fixed-offset parsing. Splitting on # would avoid that.

Contributor Author

Added handling and a test for negative seq numbers in 2309b9b
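
One way the negative case could be handled, sketched here and not necessarily what 2309b9b does: locate the last `#` instead of relying on a fixed offset from the end of the name. ZooKeeper's sequential counter is a signed 32-bit integer that wraps to -2147483648 after 2147483647, which adds a leading `-` to the 10-digit suffix. The class name and the value of `PREFIX` below are assumptions made for illustration.

```java
class LockNodeNameParser {

  // Assumed prefix for illustration; the real constant is defined in FateLock.
  static final String PREFIX = "flock#";

  final String lockData;
  final long sequence;

  LockNodeNameParser(String nodeName) {
    // The sequence suffix never contains '#', so the last '#' always precedes it,
    // whether or not the suffix carries a leading '-'.
    int sep = nodeName.lastIndexOf('#');
    if (!nodeName.startsWith(PREFIX) || sep < PREFIX.length()) {
      throw new IllegalArgumentException("Illegal node name " + nodeName);
    }
    String seq = nodeName.substring(sep + 1);
    // Sequential suffix: 10 digits, with an optional '-' after the counter overflows.
    if (!seq.matches("-?\\d{10}")) {
      throw new IllegalArgumentException("Illegal node name " + nodeName);
    }
    this.sequence = Long.parseLong(seq);
    this.lockData = nodeName.substring(PREFIX.length(), sep);
  }

  public static void main(String[] args) {
    LockNodeNameParser normal = new LockNodeNameParser("flock#data#0000000042");
    LockNodeNameParser wrapped = new LockNodeNameParser("flock#data#-2147483648");
    System.out.println(normal.lockData + " " + normal.sequence);
    System.out.println(wrapped.lockData + " " + wrapped.sequence);
  }
}
```

Searching from the end also keeps the parse correct if the lock data itself contains `#`.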

Successfully merging this pull request may close these issues.

Improve performance of FateLock code for the case when there are lots of locks.