Allow marking an index as cold and moving its in-memory items to disk #23546
Comments
While this sounds like an appealing idea, there are quite a few caveats attached to it. Essentially, this request is about opening an index reader lazily, once it's requested. Yet if you take a step closer, what that means is that we need to load the in-memory data structures once the reader is accessed. Let's say you have 2k lazy indices in your cluster and you call …
@s1monw, the problem is that the current way to deal with lots of cold indices is to close/open them. So if I have a custom-built auto-opener, I can still hit the issue where lots of indices are opened at the same time and run out of memory. Anyone who would use this feature already has to deal with that case, and you can put "caveats" around it. But the open/close mechanism is more likely to cause issues in the long run: it probably takes longer to open an index than to warm up a lazy reader, the cluster goes red while the index is opening (and might not even recover if there are allocation issues), and closed indices aren't recovered automatically when a node goes down.
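(For reference, the workaround described above leans on the existing open/close index APIs; a rough sketch, with illustrative index names:)

```
# Close an index to release its heap; it must be reopened before it is searchable again.
POST /logs-2016.12.01/_close

# An external "auto-opener" reopens it when a search needs it.
POST /logs-2016.12.01/_open

# If searches hit many closed indices at once, a naive auto-opener reopens them
# all in parallel and can exhaust the heap -- the caveat discussed above.
```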
I disagree with this statement. Lemme explain: opening/closing an index is an admin-like operation that isn't executed by accident, and it also won't apply wildcards. It's also executed by a privileged user (I hope), so there is more control over this in general. Yet, if we allow this to be executed by
We are currently in the process of designing a better open/close implementation that is also replicated etc., which might make it easier to allocate these indices and will prevent the cluster from blinking (going red for a while). I think we need to design something that is safe and gives you an easier time here. Yet, we are not there yet.
@s1monw, I think what you are missing is that those of us who need either this feature or #10869 will implement our own open/close mechanism, which can be executed by accident and can be applied to wildcards. We need to be able to have lots of data available in ES without requiring a ton of RAM. So the only mechanism is to close/open the indices, and to do it automatically when searches are made. In the end, we need a way to reduce the RAM usage of really cold indices while still keeping those indices easily searchable.
I agree, but we can't just add some risky feature just to fix a problem. We need a sustainable solution that deals with it. This suggestion has problems that I am not willing to sign up for as a compromise. If we make compromises here, they need to be safe!
I will reopen this to use it as a discussion area for now.
How about adding a circuit breaker for the in-memory data structures when opening an index reader?
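(For comparison, the existing circuit breakers are configured as cluster settings like the ones below; a breaker covering the memory allocated when a reader is opened would need a new setting, which does not exist today — the name in the comment is purely hypothetical:)

```
PUT /_cluster/settings
{
  "transient": {
    "indices.breaker.fielddata.limit": "40%",
    "indices.breaker.request.limit": "40%"
  }
}

# Hypothetical, not a real setting: a cap on heap used by lazily opened readers,
# e.g. "indices.breaker.reader_open.limit": "10%"
```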
If we do this, I'd like to make sure we are addressing an actual problem rather than the symptom of some mis-configuration (e.g. too many shards in a cluster). Also, how much memory are we talking about, and where is it spent? Maybe the right fix is to make Lucene indices more memory-efficient.
@jpountz, improving the memory usage of Lucene could work for me as well. This is really a cost issue. It is really cheap to add more disk for cold-storage indices (we have it over iSCSI, which is slow, but these are indices that are used once a day at most). Adding additional RAM, however, costs quite a bit. https://discuss.elastic.co/t/why-is-my-heap-usage-always-high/45017/9 is my discuss question to see if there is something I can do in my configuration. The last reply from Mark Walkom is that I just need to add new nodes, and that increases our overhead.
To me the issue is that there are 17745 shards for only 5.6 billion docs. I'd recommend using the rollover and shrink APIs to manage indices rather than daily indices in order to better utilize resources.
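(A minimal sketch of that approach, assuming a `logs_write` write alias and illustrative index names; size-based rollover conditions depend on the version, so a doc-count condition stands in for a size target here:)

```
# Roll over to a new index once the current one is "full", instead of daily.
POST /logs_write/_rollover
{
  "conditions": {
    "max_age":  "7d",
    "max_docs": 200000000
  }
}

# Once an index is no longer written to, shrink it into fewer, larger shards.
# (Requires first making the index read-only and relocating a copy of every
#  shard onto a single node.)
POST /logs-000042/_shrink/logs-000042-shrunk
{
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 1
  }
}
```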
@jpountz, you need to read my last comment there. We are now doing rollover based on size instead of daily indices. As of today, we have 7300 shards with 55 billion docs that take up 180TB of disk.
180TB for 7300 shards means that shards are only 25GB on average; I think you could still aim for larger shards. That said, I agree this is the kind of scale that makes keeping these indices open quite costly. I don't have good ideas for how to improve this, but I don't think opening indices on demand is a good solution.
@jpountz, yeah, we've been working on that. All of our most recent indices should average around 40GB per shard. It takes a while to combine the older indices.
@s1monw, is the design for the better open/close mechanism targeted at 5.x or 6.x? We are trying to handle opening/closing indices, rebalancing, and node removal better. That seems to be the only way to accomplish this, unless there has been more discussion about it?
This is similar to #10869, but instead of closing/reopening indices, it would be nice to just have the index flush its in-memory items to disk. Looking at the output of /{index}/_segments?verbose=true shows a lot of BlockTree terms, FSTs, and postings structures held in memory. For a really old index, having the ability to reduce its in-memory footprint while still leaving it open would be beneficial. Going through a close/reopen process causes the cluster to go red a lot, and if a node goes down we have to make sure that any closed indices on that dead node are re-opened so that their replicas get moved to other nodes.
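(The per-segment memory being referred to can be inspected with the segments APIs; the index name below is illustrative:)

```
# Full per-segment breakdown, including the in-memory structures (terms index, FSTs, ...)
GET /logs-000042/_segments?verbose=true

# Compact view of on-disk size vs. heap used per segment
GET /_cat/segments/logs-000042?v&h=index,segment,size,size.memory
```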
This could be a call like /_cache/clear that we have to run periodically, so the index itself doesn't have to worry about whether it is "cold". When the new call is run, the index moves its in-memory data to disk, and the next time a query hits it, it brings the data back into memory. Later, we would manually run the call again to move it back to disk.
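(To make the proposal concrete, here is how such a call might sit next to the existing cache-clear API; the second endpoint is purely hypothetical and does not exist:)

```
# Existing: drops query/request/fielddata caches, but not the terms index / FST
# memory this issue is about.
POST /logs-000042/_cache/clear

# Hypothetical: release the reader's in-memory structures and rebuild them
# lazily on the next search against this index.
POST /logs-000042/_release_memory
```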