Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update spatial index memory while building R-Tree #13251

Merged
merged 1 commit into from
Aug 22, 2019

Conversation

viczhang861
Copy link
Contributor

@viczhang861 viczhang861 commented Aug 19, 2019

Update memory usage during construction of rtree

Before this change: large spatial query fail with internal communication error
After this change : same query failed with message "Query exceeded distributed user memory limit of 5TB"

== RELEASE NOTES ==

General Changes
-----------------
* Improve memory tracking for geospatial query indexing and prevent worker node down due to out of memory

// update memory size when 1000 more leaf nodes are inserted
if (position % 1000 == 999) {
localUserMemoryContext.setBytes(estimatedLeafNodesSize);
}
}

rtree.build();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the trailing geometries after the last localUserMemoryContext, and the cost of the built rtree? For the JTS rtree this is quite significant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The overhead of non-leaf node is about 10% of leaf nodes. After SRTree is built, a final size check is already done on all nodes at all levels, which includes trailing geometries.

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@viczhang861 Vic, thanks for making this change. Updating memory usage while building the R-Tree will help avoid JVM OOMs and allow for proper memory cap enforcement. I'm curious how this is done for regular hash table. CC: @nezihyigitbasi

The commit title Monitor spatial index memory usage is misleading because this change doesn't do anything about the monitoring. I enabled the monitoring logic by starting to send timely updates to memory usage. I'd change the commit message to:

Update memory usage while building R-Tree

estimatedLeafNodesSize = estimatedLeafNodesSize + ENVELOPE_INSTANCE_SIZE + geometryWithPosition.getEstimatedMemorySizeInBytes();

// update memory size when 1000 more leaf nodes are inserted
if (position % 1000 == 999) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to update the memory every so many bytes (or megabytes) rather than every so many objects.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about 100 MB? for a cluster with 600 nodes, this means > 60GB total user memory change per update.

@mbasmanova
Copy link
Contributor

@viczhang861 Vic, is there a way to test this change in a unit test? CC: @nezihyigitbasi

@viczhang861 viczhang861 force-pushed the spatial-query-memory branch from 718e21f to 0ed7ee1 Compare August 19, 2019 18:58
@viczhang861 viczhang861 changed the title Monitor spatial index memory Update spatial index memory while building R-Tree Aug 19, 2019
@viczhang861 viczhang861 requested a review from mbasmanova August 19, 2019 20:11
Copy link
Contributor

@aweisberg aweisberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this will end up doing what we want in terms of aborting if there isn't enough memory due to how setBytes works.

This is how I had to fix it https://github.com/prestodb/presto/compare/master...aweisberg:fix_spatial_index_accounting?expand=1

@arhimondr Can probably confirm, but basically you have to yield to the driver loop with the memory accounted for as each page is processed in order to get the query to halt if it uses too much memory.

@viczhang861
Copy link
Contributor Author

@aweisberg This diff was tested in a real cluster, I can share you query ids offline if you are interested

@aweisberg
Copy link
Contributor

I'm also working on the memory accounting for the hash table associated with PagesIndex. It's not as bad because it tends to be smaller, but it's also built in one giant go with no memory accounting.

@mbasmanova
Copy link
Contributor

you have to yield to the driver loop with the memory accounted for as each page is processed in order to get the query to halt if it uses too much memory.

@nezihyigitbasi Nezih, could you clarify how memory accounting/limit enforcement works?

@aweisberg
Copy link
Contributor

aweisberg commented Aug 20, 2019

@viczhang861 Can you try 20190806_050942_00000_yrda6 and see how it does?

@viczhang861
Copy link
Contributor Author

you have to yield to the driver loop with the memory accounted for as each page is processed in order to get the query to halt if it uses too much memory.

I don't understand what this means, but let me explain why this PR is working.

SpatialIndexBuilderOperator updates memory when each page is added (query will be killed if memory budget exceeds at this step) or when R-Tree build is done (query could also be killed at this step). However, memory is not updated when R-Tree is built. Memory will increase significantly when data is deserialized and geometry objects are created by JTS library.

We don't need to worry about setBytes, which is already doing the right thing, otherwise, query will not be killed in above mentioned steps. For this reason, unit test on behaviors of calling setBytes is not the scope of this PR.

@viczhang861
Copy link
Contributor Author

@aweisberg This is exactly the query I tested. Query id after fixing: 20190819_051830_00005_hpdfs

@aweisberg
Copy link
Contributor

I don't follow why this works. Doesn't mean that it doesn't though.

That query builds a 180+ gigabyte tree on one node that causes an OOM. Calling setBytes won't fix that because it should still go ahead and build the entire tree and OOM the worker. Even if the coordinator detects the query exceeding the limits and returns an error to the client saying it killed the query (is killing asynchronous?) we still want to avoid OOMing the worker.

The coordinator can't kill the query as it is building the tree. The operator is in a loop that won't be interrupted as far as I can tell.

I checked the logs and couldn't really tell if the workers had OOMed. There are eventually a spate of worker restarts but exit code is 0 and there is no message. This is not typical of what I have seen for OOMs, but maybe this cluster was configured differently?

If it does work, great! I just want to make sure we aren't missing something because it's an unexpected result to me.

@viczhang861
Copy link
Contributor Author

viczhang861 commented Aug 20, 2019

@aweisberg I monitored heap memory usage on worker nodes before and after fix, the worker nodes didn't crash after fix. I assume that is the result of killing the query due to "Exceeded global user memory limit of ***"

@viczhang861
Copy link
Contributor Author

@aweisberg Another good signal is that worker didn't restart after the fix.

GeometryWithPosition geometryWithPosition = new GeometryWithPosition(ogcGeometry, partition, position);
rtree.insert(getJtsEnvelope(ogcGeometry, radius), geometryWithPosition);

estimatedMemorySizeInBytes += ENVELOPE_INSTANCE_SIZE + geometryWithPosition.getEstimatedMemorySizeInBytes();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an array of references collecting the geometries inside the rtree. It's not 100% accurate, but can you add size of an object reference each time as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you be more specific about which array ? I only include leaf nodes into consideration, the total size will be off by 11%.

Copy link
Contributor

@aweisberg aweisberg Aug 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason not to be as precise as we can?

The rtree before you invoke build is wrapping what you provide in an ItemBoundable and stashing it in a big array which it then discards when the rtree is built.

This is a slightly more accurate accounting https://github.com/aweisberg/presto/blob/fix_spatial_index_accounting/presto-main/src/main/java/com/facebook/presto/operator/PagesSpatialIndexBuilder.java#L143

Copy link
Contributor Author

@viczhang861 viczhang861 Aug 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

11% more nodes are generated when insertion is done. I will add ItemBoundable. What is the 8-bytes array? reference to new element in ItemBoundable array?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What file/line in the rtree library generates more nodes on insertion? When I looked at the rtree code it just wraps it and stores it in an array.

Do you mean it uses more after it builds?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@viczhang861 viczhang861 Aug 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean it uses more after it builds?

Yes, when insertion is done, parent nodes will be created after final build

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


// update memory size when estimated change is more than 100 MB
if (estimatedMemorySizeInBytes - lastUpdatedMemorySizeInBytes >= 100_000_000) {
localUserMemoryContext.setBytes(estimatedMemorySizeInBytes);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is dropping the memory used by the PagesIndex itself?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I included PagesIndex in

@viczhang861 viczhang861 force-pushed the spatial-query-memory branch from 0ed7ee1 to 687d496 Compare August 20, 2019 18:50
@viczhang861 viczhang861 requested a review from aweisberg August 20, 2019 20:03
Copy link
Contributor

@aweisberg aweisberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@viczhang861 Looks good overall % some comments.

GeometryWithPosition geometryWithPosition = new GeometryWithPosition(ogcGeometry, partition, position);
rtree.insert(getJtsEnvelope(ogcGeometry, radius), geometryWithPosition);

// 8-bytes for reference to the new element in itemBoundables array of STRtree
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an implementation detail of the STRtree and therefore shouldn't be used here. How about just tracking size of the envelope and geometry?

private static final int MEMORY_USAGE_UPDATE_INCREMENT_BYTES = 100 * 1024 * 1024;   // 100 MB

            addedSizeInBytes += ENVELOPE_INSTANCE_SIZE + geometryWithPosition.getEstimatedMemorySizeInBytes();

            if (addedSizeInBytes >= MEMORY_USAGE_UPDATE_INCREMENT_BYTES) {
                localUserMemoryContext.setBytes(recordedSizeInBytes + addedSizeInBytes);
                recordedSizeInBytes += addedSizeInBytes;
                addedSizeInBytes = 0;
            }

@viczhang861 viczhang861 force-pushed the spatial-query-memory branch from 687d496 to e107042 Compare August 22, 2019 14:48
@viczhang861 viczhang861 force-pushed the spatial-query-memory branch from e107042 to f4367da Compare August 22, 2019 15:36
@viczhang861
Copy link
Contributor Author

@mbasmanova Thanks for review, all comments addressed and rebased to be compatible with FlatBush Rtree

@viczhang861 viczhang861 requested a review from mbasmanova August 22, 2019 15:40
Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@viczhang861 Looks good to me.

@mbasmanova mbasmanova merged commit 8492bfa into prestodb:master Aug 22, 2019
@arhimondr
Copy link
Member

Agree with @aweisberg , an Operator should return control to the engine between memory usage updates to ensure the memory manager works. No idea how it works.

@nezihyigitbasi , you have the most expertise in the memory management. Your input here will be greatly appreciated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants