Expose spill configuration to users #2119
Comments
We shouldn't actually need to configure this for scalar indices. It is a bug that we are getting a The bug is because I had incorrectly assumed that
Unfortunately, it turns out that |
I'm getting a Resource Exhausted error when trying to construct a scalar index on a large dataset (>1TB) despite having more than enough RAM and ample disk space. Is this the same issue?
Yes, that looks like the same issue. The root cause is unfortunately an upstream issue: apache/datafusion#10073. It will give you this error no matter what size the spill pool has been configured to. The only workaround right now is to bypass spilling completely by setting
Thanks @westonpace! I just found this workaround independently in the lance docs, and it works for me. FYI, as a new user to lancedb, it's a little tricky to figure out what part of the lance documentation carries over to lancedb tables, so I've had to do a fair bit of guess and check. (Another example of this is Thanks again for the workaround!
You can configure the memory limit for spilling when making vector indices (#1702) but not for scalar indices (#2043). We should make them both configurable from the same place.