Use reference count in pages index #10634

Closed

Contributor

@sopel39 sopel39 commented May 17, 2018

This addresses a retained size accounting issue that occurs when deserialized pages are passed directly to PagesIndex.
This can happen in the Order By operator, but also in other operators that use PagesIndex when task_concurrency=1.

Related to: #10337
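
To illustrate the accounting problem described above, here is a minimal, self-contained sketch. It assumes that every block of a deserialized page is backed by the same underlying buffer, so summing per-block retained sizes counts that buffer once per channel. `FakeBlock`, `naiveSize`, and `dedupedSize` are hypothetical names made up for this illustration; they are not Presto classes or methods.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

class RetainedSizeSketch {
    /** Hypothetical stand-in for a block: it only knows which buffers keep it alive. */
    static final class FakeBlock {
        final List<byte[]> parts;

        FakeBlock(List<byte[]> parts) {
            this.parts = parts;
        }

        /** Retained size as seen by this block alone, shared buffers included. */
        long retainedSize() {
            long size = 0;
            for (byte[] part : parts) {
                size += part.length;
            }
            return size;
        }
    }

    /** Sums per-block retained sizes; a buffer shared by N blocks is counted N times. */
    static long naiveSize(List<FakeBlock> blocks) {
        long total = 0;
        for (FakeBlock block : blocks) {
            total += block.retainedSize();
        }
        return total;
    }

    /** Counts each distinct buffer once, comparing buffers by identity rather than equals(). */
    static long dedupedSize(List<FakeBlock> blocks) {
        Set<Object> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        long total = 0;
        for (FakeBlock block : blocks) {
            for (byte[] part : block.parts) {
                if (seen.add(part)) {
                    total += part.length;
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Assumption: three channels of one page all reference the same 1 KiB buffer.
        byte[] shared = new byte[1024];
        List<FakeBlock> page = List.of(
                new FakeBlock(List.of(shared)),
                new FakeBlock(List.of(shared)),
                new FakeBlock(List.of(shared)));
        System.out.println(naiveSize(page));   // prints 3072
        System.out.println(dedupedSize(page)); // prints 1024
    }
}
```

The identity-based set matters here: two distinct buffers with equal contents must still be counted separately, while the same buffer reached through several blocks must be counted once.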

for (int i = 0; i < channels.length; i++) {
    Block block = page.getBlock(i);
    if (eagerCompact) {
        block = block.copyRegion(0, block.getPositionCount());
    }
    channels[i].add(block);
    pagesMemorySize += block.getRetainedSizeInBytes();

    block.retainedBytesForEachPart((object, size) -> {
Contributor

My memory is that this is unbelievably expensive, which is why we only did this very sparingly. Did you do some performance analysis?

Contributor

Yes, we should avoid using retainedBytesForEachPart as much as possible. The ultimate goal is to remove this interface and the reference count map if possible. Memory accounting dedup should happen somewhere else, or be done in a more elegant way. Also, if we have to use this interface, please run benchmarks with huge inputs (e.g., 1 billion distinct blocks or something like that) to show the effect.

Contributor Author

@sopel39 sopel39 May 17, 2018

Thanks for the clarification. I was wondering whether there are issues with using the retainedBytesForEachPart method.

> Did you do some performance analysis?

I will add some benchmarks of building and compacting PagesIndex with pages that contain a high number of channels.

> Also, if we have to use this interface, run benchmarks with huge inputs (e.g., 1 billion distinct blocks or something like that) to show the effect.

This ReferenceCountMap exists only for the duration of the page processing (it should be the same in compact() too), so the overhead should be linear in the number of page channels. I will add such benchmarks.
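
The scoping argument above can be sketched roughly as follows. This is only an illustrative model, not the PR's code: `PerPageAccountingSketch`, `addPage`, and the `mapOperations` counter are hypothetical, and `java.util.IdentityHashMap` stands in for the reference count map. The sketch assumes one map is created per page and dropped afterwards, so the number of map operations per page is bounded by the number of channels times the parts per block.

```java
import java.util.IdentityHashMap;
import java.util.Map;

class PerPageAccountingSketch {
    long retainedBytes;  // running total across all pages added so far
    long mapOperations;  // how many times a per-page map was touched

    /**
     * Hypothetical model of adding one page: channels[i] lists the buffers
     * retained by the block of channel i.
     */
    void addPage(byte[][][] channels) {
        // Fresh map for every page; it becomes garbage when this method returns.
        Map<Object, Integer> referenceCounts = new IdentityHashMap<>();
        for (byte[][] blockParts : channels) {
            for (byte[] part : blockParts) {
                mapOperations++;
                // merge returns the updated count; 1 means first sighting within this page.
                if (referenceCounts.merge(part, 1, Integer::sum) == 1) {
                    retainedBytes += part.length;
                }
            }
        }
    }

    public static void main(String[] args) {
        PerPageAccountingSketch index = new PerPageAccountingSketch();
        // Assumption: each page has 4 channels that all share one 100-byte buffer.
        byte[] first = new byte[100];
        byte[] second = new byte[100];
        index.addPage(new byte[][][] {{first}, {first}, {first}, {first}});
        index.addPage(new byte[][][] {{second}, {second}, {second}, {second}});
        System.out.println(index.retainedBytes); // prints 200
        System.out.println(index.mapOperations); // prints 8
    }
}
```

Because the map is rebuilt per page, a buffer shared across two different pages would be counted twice in this model. Whether the constant cost per map operation is acceptable is exactly what the requested benchmarks would have to show.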

stale bot commented Apr 3, 2019

This pull request has been automatically marked as stale because it has not had recent activity. If you'd still like this PR merged, please comment on the task, make sure you've addressed reviewer comments, and rebase on the latest master. Thank you for your contributions!
