Deprecate and remove Multiple Data Paths #71205
Comments
Pinging @elastic/es-core-infra (Team:Core/Infra)
This commit adds a node level deprecation log message when multiple data paths are specified. relates elastic#71205
@rjernst
This commit adds a node level deprecation log message when multiple data paths are specified. relates #71205
This commit adds a deprecation note to the multiple data paths doc. It also removes mention of multiple paths support in the setup settings table. relates elastic#71205
This commit adds a deprecation note to the multiple data paths doc. It also removes mention of multiple paths support in the setup settings table. relates #71205
@willemdh When removing features, we carefully weigh the benefit and ubiquity of a feature against the cost of continuing to support that feature. In this case, multiple data paths is a feature that has a very high cost (numerous bugs and deficiencies in its design), relatively few users, and, most importantly, better alternatives that are standard outside of Elasticsearch. While we understand it can be frustrating to need to change deployment methodologies, we believe that effort will ultimately improve these deployments of Elasticsearch by relying on industry standards, thereby allowing developers to focus time on other improvements like the frozen tier, index lifecycle management, or ARM hardware support (just a few of the improvements we recently worked on). Please do note that MDP is only being deprecated here, and won't be removed until 8.0, so there is still plenty of time to work on a planned migration away from this legacy feature.
Thanks for taking the time to answer my concern @rjernst, but we are still not happy about this. As a 5+ year Elastic user, this decision (and a few others I'm not going into) is making me wonder what Elastic's other plans for the future are. What other core functionalities will be deprecated...? If you had said deprecated by Elastic 9, I'd understand, but Elastic 8 is not that far away imho. It's already barely doable to keep up with the large amount of breaking changes that come with every major update, and not updating is not an option, considering almost every major release has included security bug fixes that were not always backported, plus new security bugs. Also, if I remember right, last year an Elastic support engineer advised us to use MDP on our NVMe disks (please don't ask me to look that up... our older nodes do not have MDP; at a certain point we switched from SSD to NVMe and raised that question somewhere, as our physical RAID controllers did not support NVMe). These physical nodes are supposed to run for something like 5 years, and now we will have to redeploy in less than a year, imho a waste of time, as MDP worked fine here afaik. These nodes have up to 10 TB of data each, and redeploying takes careful planning and timing. Why not keep it deprecated during Elastic 8 and log a deprecation warning during the full Elastic 8 lifecycle? Even now I cannot find anything in https://www.elastic.co/guide/en/elasticsearch/reference/current/important-settings.html#path-settings about MDP being deprecated or advice not to use it. The warning about rebalancing does not count imho, as when you use ILM on all data, shards are moved to other nodes anyway after x days. Please, please, Elastic, can you think just a little more about the impact on your (paying) customers when you decide to change / break things that worked for years... Just so you know, I am not the only one who thinks one of the greatest disadvantages of Elastic is the rate at which things are changing / breaking.
Hi @willemdh, thanks for sharing your thoughts on this topic. We work in the open and we expect our users to be an active part of the discussion. As @rjernst mentioned, the maintenance cost of this feature and managing the confusion it causes for our users has a pretty high cost for the team, and we performed a serious analysis of how our users (and paying customers) will be affected by this change. I also want to be sure that you are aware that we do provide security bugfixes for the latest minor of the previous major (6.8 today, the latest 7.x once 8.0 is out), so running Elasticsearch 7.x after 8.0 is released will not expose you to security risks. We suggest planning the upgrade as soon as possible, but we don't want to put our users at risk if they need more time to move to the new version. If you feel that there is a security bugfix that has not been backported, feel free to open an issue and we'll go through it case by case. We'll monitor this issue, and if we see an unexpectedly strong pushback from a relevant number of users, we still have time to reconsider our decision before 8.0.
This commit converts the deprecation messages for multiple data paths into errors. It effectively removes support for multiple data paths. relates #71205
* Adjust Multiple data paths deprecation to warning (#79463) Since Multiple Data Paths support has been added back to 8.0, the deprecation is no longer critical, as the feature will not be immediately removed after 7.16. This commit adjusts the deprecation level to indicate it is a warning, instead of a critical deprecation that would otherwise need to be addressed before upgrading. relates #71205 * fix message
Leaving a comment to echo what other people have said on this issue. We use MDP in our clusters and need it to handle our current load. Removing this feature is definitely something that is going to cause us issues going forward.
This is 50% great news. Thank you Elastic for considering this. I can safely upgrade my cluster to 7.16.2 with multiple data paths on clusters. Thank you again.
If I understand this correctly, I won't be able to attach more than one disk per Elasticsearch node in the future. If that's right, it'll cause serious problems for me, since in GCP I have limited (and costly) options. To have 30GB of spare memory for the Java heap, I need a fairly big instance. Then, to keep my VM costs down, I'm attaching multiple 2TB disks to house my data, since it would skyrocket my expenses to create a hot-warm-cold cluster. I need to cap the disks at 2TB because beyond that I/O performance will drop. So with this change, I either suffer from I/O rate deterioration or have to create a much more expensive cluster.
https://www.elastic.co/guide/en/elasticsearch/reference/current/important-settings.html#path-settings So, given that MDP is still mentioned in the 8.0 docs (albeit as "deprecated" and slated for removal), does this feature still work in 8.0? If so, when is it slated for removal, or have there been any more recent discussions about its fate? I have always been a huge fan and vocal proponent of Elasticsearch, but it feels to me like the one-sided decision to remove this feature is creating a lot of stress and confusion for lots of users, which is very sad to see.
Yes, MDP is available (as a deprecated feature) in Elasticsearch 8.0.
@bytebilly Two weeks ago I posted asking whether I understand this topic correctly, and I'm still not sure my interpretation is right. Could you please elaborate on alternative options where one needs to house hundreds of TBs of data but can't attach more than one disk to an instance?
@st4r-fish You will not be able to specify multiple paths to hold data. If you want to discuss your scenario and possible solutions further, I suggest you engage on https://discuss.elastic.co/ (or open a support case if you have a support account), since those are the best places to get technical support from both our engineers and our community.
Come on. You're phrasing it like it won't induce any significant change. That's totally false. LVM the way you'd suggest it is like a gigantic RAID0. When one has 3 disks, this could be considered, but on machines with 24x4TB disks it's plain suicide. And for 24x4TB disks, any RAID5/6/X method is just a disaster waiting to happen (hello, URE). So you'll need a lot of splitting and parity, and you'll lose quite some raw space. MDP as a logical JBOD is unique in that it's not possible to achieve its feature set with any of the """"industry standard"""" methods you're trying to shape as an alternative. MDP is the only setup where losing a disk means losing data if one doesn't have replication, but not losing all data, without having to pay one overly expensive license per disk. Of course, if you aim at selling one dockerish license per disk, this deprecation and removal of the feature is quite "smart". Still waiting for real technical solutions, and, in the meantime, pondering whether we should move towards an alternative to Elastic.
I'd have thought the GitHub issue that's literally tracking the removal of MDP would be the best place to discuss this. It's not a "technical support" issue. It's a "Why are you doing this to us?" issue.
"So why are you doing this to us?" I see no reason why this feature 'has' to be removed..? Imho they could easily add a note to the docs that MDP is less supported instead of deprecating it completely. |
I have 10 data nodes, each with 4x2TB SSDs. If I RAID them, then losing one disk means the whole node is lost, but if I use individual disks, losing one disk is pretty negligible.
Hi @bytebilly, unfortunately the community isn't eager to discuss this scenario: Thank you!
Then don't use RAID, use a union filesystem. Elastic has historically handled this internally by using the idea that "one node == one filesystem"; as far as I can tell, Multiple Data Paths is an abstraction that basically presented multiple nodes (one per disk) to the problem. Shards in Elasticsearch have always been bound to a single "data path" (go ahead and create an index with 1 shard and 0 replicas on a stand-alone node with 4 disks, only 1 will ever get used!).

Nothing is stopping you from re-using the old logic: you can start 4 copies of Elasticsearch on one node, each with a single data path, then define an index that shards across all of those disks. Obviously this has complexity and "concerns" (which is what I think Elastic is looking to rid themselves of here!).

A proper union filesystem would let you work around the MDP deprecation AND allow you to span a single shard across multiple disks. (This is configurable at the FS level, based on how the routing rules work) by creating a "spanned volume" that places files (at configurable grouping levels) on specific disks. Using (for example) mergerfs, the branches feature could be used to group a "shard" onto a single disk, effectively replicating the prior deprecated behaviour (see the sketch below). This approach LACKS the ability to define which disk a shard is allocated to, which is the advantage of running multiple instances of ES (you can then set up shard allocation strategies that limit what data lives where / etc).

IMO: MDP deprecation is a good thing. It removes complexity from a block that can be implemented better by other, more focused, specialized products and allows the project to move forward with new features that are being held back by supporting "broad" features like this.

IMO: what would be even better is if an alternative replacement idea were well documented, so that the community had a "common replacement path" for a widely used feature, but I get that doing so would effectively be a "hot potato" topic that few would want to take on or maintain over multiple years.
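For illustration only, a minimal sketch of the mergerfs idea under stated assumptions (the mount points, pool path, and create policy below are made up, and the fstab line is shown only as a comment):

```yaml
# Hypothetical layout: four data disks pooled with mergerfs.
# Example /etc/fstab entry (paths and options are assumptions, not a recommendation):
#   /mnt/disk1:/mnt/disk2:/mnt/disk3:/mnt/disk4  /mnt/espool  fuse.mergerfs  allow_other,category.create=epmfs  0 0
# A path-preserving create policy such as epmfs keeps new files inside an existing
# directory (e.g. a shard folder) on the branch where that directory already lives.

# elasticsearch.yml then needs only a single data path:
path.data: /mnt/espool/elasticsearch
```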
"you can start 4 copies of Elasticsearch on one node" ?? What kind of a workaround is that lol. Yes let's pay 4 times the amount we pay now, just because we want to remove some complexity for a feature which has been running perfectly fine for years...... |
Maybe you should give this a second thought. First, apart from overlayfs, mergerfs and aufs, there are not that many maintained union filesystems, and overlayfs isn't suited to presenting a JBOD as one filesystem unless you set it up like a RAID0, which means losing all data when you lose one disk.
Nope. MDP is just "oh let's use all these folders instead of just one".
Indeed, but as you should know, generally any index has multiple shards and a cluster isn't made of just one index.
I guess you'd be happy to pay for the three additional licenses?
Either it's an overly complex thing that will be a mess, or it's basically a RAID0. Both are quite a huge waste of time.
Rebuilding a mergerfs pool when one disk fails is quite tedious (you have to extract the data from all remaining disks, recreate the mergerfs with the dead disk replaced, and put the data back on it) - and don't talk to me about SnapRAID - compared to replacing one disk and letting Elastic rebalance the data, which Elastic will still have to do if a disk dies in a mergerfs anyway.
Still waiting for a relevant product here. And still waiting for an explanation of how it's not a huge waste of time for the one setting their cluster up, with much maintenance cost added into the bargain.
While I don't see any suggested solution that wouldn't multiply the costs or leave I/O performance in the dust (large disks), in the Elastic forum it seems that "multiple data paths are not supported".
We already raised this issue, but as a company Elastic has already decided to go that route.
In talking to our ES sales rep, I learned that they license nodes based on memory size, each node being worth 64GB of RAM (you should definitely double-check with your sales rep). If you currently have a single server running one instance of ES with 30 GB of heap, you could split it out into a number of instances and divide the heap based on the number of drives attached to that server. This is what we're planning on doing.
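For illustration, a rough sketch of what such a split might look like, assuming two drives and two instances per host; the node names, paths, ports, and heap figures are hypothetical examples, not a prescribed layout:

```yaml
# Instance A (e.g. /etc/elasticsearch-a/elasticsearch.yml); all values are placeholders
---
node.name: host1-a
path.data: /mnt/disk1/elasticsearch
http.port: 9200
transport.port: 9300
# Instance B (e.g. /etc/elasticsearch-b/elasticsearch.yml)
---
node.name: host1-b
path.data: /mnt/disk2/elasticsearch
http.port: 9201
transport.port: 9301
# Each instance would get its own slice of the heap (e.g. -Xms15g/-Xmx15g via
# jvm.options or ES_JAVA_OPTS) instead of a single 30 GB heap.
```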
That's nice. However, I've read suggestions like this multiple times:
I have 10*2TB disks per ES instance and the CPU is already clocking around 80% (indexing rate ~140K/s), which means I can't do anything about that, since I can't customize the VM resources ($$$). As I said earlier, I'd either have to attach larger disks (which would cause a significant drop in I/O performance due to throttling) or spin up multiple smaller VMs, resulting in wasting money on a problem I didn't have in the past 5 years. I can't even go to hot-warm-cold, since I need the data being indexed and there are 320 (*0.8) CPUs already doing that. That's why I'd like to see a solution that wouldn't cause performance issues (I love how fast Elasticsearch currently is) and wouldn't require paying 3-10 times more than I'm paying right now for analytics (e.g., managed cloud is waaay too expensive for me).
I have a huge cluster like that, and now I can't upgrade to 8.x. Someone up the food chain just told me that if we can't upgrade, we should look for an alternative, so we are already evaluating a few other technologies. Can't say anything more than that.
@elasticforme It looks like it is still possible to use MDP in this version. See the logs from a test cluster with two 20GB volumes configured.
Seems that the critical alert in the upgrade assistant was a bug and should have been a warning... At least that gives us some extra time to move away from it.
There isn't much going on there, nor here. I'm fine with y'all saying that we have to solve it ourselves, but I need to know whether Elastic is going to help our case or not. As previously discussed, most of us would need to invest thousands of dollars to be able to scale with this change, which is a no-go. Moving to another solution needs planning and testing (time & money too) but needs to be set in motion now to be done before EOL. Not sure if you've ever moved hundreds of terabytes of data, but it isn't a walk in the park.
One thing is for sure: Elastic will lose some clients if they decide to keep putting their heads in the sand singing "lalalalala industry standards lalalalala".
It's a shame; I was hoping that the Multiple Data Paths feature would have solved the low disk space issue in my cluster.
But in my case I was wondering if this other solution is possible:
I would not bet that the matter is closed on the customers' side. I think some day Elastic will again push to drop the feature, despite the lack of any equivalent feature in the "industry".
Multiple Data Paths (MDP) is a pseudo-software-RAID-0 feature within Elasticsearch that allows multiple paths to be specified in the path.data setting (usually pointing to different disks). Although it has been used in the past as a simple way to run multi-disk setups, it has long been a source of user complaints due to confusing or unintuitive behavior. Additionally, the implementation is complex, and neither well tested nor well maintained, with practically no benefit over spanning the data path filesystem across multiple drives and/or running one node for each data path.
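For reference, a minimal sketch of the setting in question (the mount points are placeholders): the deprecated form lists several paths in path.data, while the recommended setup uses a single path and handles multiple drives at the OS/volume level or by running one node per disk.

```yaml
# Deprecated multiple-data-path form: several entries in path.data
path.data:
  - /mnt/disk1
  - /mnt/disk2

# Recommended alternative: a single data path per node, for example
# path.data: /var/lib/elasticsearch
```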
We have long advised against using MDP, and are now ready to deprecate and remove it. This is a meta-issue to track that work.
- Remove documentation from 8.0
- Block MDP in 8.0
- Remove MDP from 8.0