Keep write support for old codecs? [LUCENE-9234] #10274

asfimport · 2020-02-18T18:15:23Z

Currently we maintain read/write support for the latest codec in lucene/core, and read-only support for codecs of previous versions (up to {N-1}.0) in lucene/backward-codecs. We often keep write support in test-framework for testing purposes only.

This raises challenges for Elasticsearch with regard to rolling upgrades: we have some users who index very large amounts of data on clusters that are quite large, so that rolling upgrades take significant time. Meanwhile, several indices may be created.

Allocating indices when the cluster has nodes of different versions requires care as Lucene indices created on nodes with a newer version cannot be read by the nodes running the older version. It is possible to force primary replicas to be allocated on the older nodes, but this brings other problems like availability, uneven disk usage across nodes, or moving a lot of data around.

If Lucene could write data using the minimum version that exists in the cluster, this would avoid this problem as the written data could be read by any node of the cluster. I understand this change would not come for free, especially when it comes to testing as we'd need to make sure that older Lucene versions can read indices created by this "compatibility mode".
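To make the idea concrete, here is a minimal sketch of the selection step only: given the versions of all nodes currently in the cluster, pick the oldest one as the format to write. This is not Lucene or Elasticsearch code; `parse_version` and `write_format_version` are hypothetical names, and it assumes versions are plain `major.minor.patch` strings.

```python
from typing import Iterable, Tuple

# A version is compared as a tuple of integers, e.g. (8, 4, 1).
Version = Tuple[int, int, int]


def parse_version(text: str) -> Version:
    """Parse a 'major.minor.patch' string into a comparable tuple (assumed format)."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def write_format_version(node_versions: Iterable[str]) -> Version:
    """Return the oldest version present in the cluster.

    Writing in this format would let every node read the resulting data,
    provided the newer nodes still know how to write it (hypothetical).
    """
    versions = [parse_version(v) for v in node_versions]
    if not versions:
        raise ValueError("cluster has no nodes")
    return min(versions)


if __name__ == "__main__":
    # Mixed-version cluster in the middle of a rolling upgrade.
    print(write_format_version(["8.4.1", "8.5.0", "8.4.1"]))
```

Versions are compared as integer tuples rather than strings so that, for example, 8.10.0 sorts after 8.9.0.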

I'd be curious to understand whether this is a problem for Solr too, if not how this problem is being handled, and maybe whether there are other problems that you have encountered that would also benefit from the ability to write data with an older format.

Migrated from LUCENE-9234 by Adrien Grand (@jpountz), resolved Oct 14 2020

asfimport · 2020-02-18T20:20:54Z

Robert Muir (@rmuir) (migrated from JIRA)

I don't think we should do this. Having to write not just N but N-1 and support reads for those writes later is too much.
This seems to be a situation where the distributed systems are too lazy to have enough nodes or the correct logic.

> I understand this change would not come for free, especially when it comes to testing

If this is the case, then perhaps offer a "concrete bargain" from the distributed systems side. Personally I assume they are just lazy, and trying to force the work on Lucene (simply look at their tests for inconclusive proof of this!). So I would like to know what they would be willing to trade off in return. For example, Solr tests running successfully in 5 minutes on my machine?

asfimport · 2020-02-18T22:49:26Z

Tomas Eduardo Fernandez Lobbe (@tflobbe) (migrated from JIRA)

> I'd be curious to understand whether this is a problem for Solr too, if not how this problem is being handled, and maybe whether there are other problems that you have encountered that would also benefit from the ability to write data with an older format.

Yes, this is the same problem in Solr in my experience. For existing collections, things are hidden a bit by the fact that newly elected leaders tend to be the oldest active replica (because of how leader election works), but this is in no way guaranteed. For new collections, I guess one could use placement rules to define where the replicas should land, but as you said, this creates imbalances. Certainly having a Solr cluster with more than one version is a recipe for problems.

asfimport · 2020-02-21T18:10:40Z

Adrien Grand (@jpountz) (migrated from JIRA)

I think this option is appealing because it doesn't require direct trade-offs from the users, but it definitely has a big maintenance/test cost.
Thanks @tflobbe for checking!

asfimport · 2020-02-24T18:13:01Z

David Smiley (@dsmiley) (migrated from JIRA)

I tend to agree with Rob. Distributed systems on top of Lucene should be able to cope with the status quo, and this may mean more work for replica placement to consider the version if this wasn't thought of in the past. And a truly big/hard-core user could do some relatively basic Lucene re-packaging to ship the previous version if they were sufficiently motivated to care. Not all big search users would even care about this since a re-index or backup/restore may be feasible (it is where I work).
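For illustration, version-aware replica placement could look roughly like the sketch below. It assumes, as a simplification, that a node can hold a replica only if its own version is at least the version that wrote the index; `eligible_nodes` and the node names are hypothetical, not Solr or Elasticsearch APIs.

```python
from typing import Dict, List, Tuple

# A version is compared as a tuple of integers, e.g. (8, 5, 0).
Version = Tuple[int, int, int]


def eligible_nodes(index_written_by: Version, nodes: Dict[str, Version]) -> List[str]:
    """Return the names of nodes that may hold a replica of the index.

    Assumption: a node can read the index only if its version is greater
    than or equal to the version of the node that wrote it.
    """
    return sorted(
        name for name, version in nodes.items() if version >= index_written_by
    )


if __name__ == "__main__":
    cluster = {
        "node-a": (8, 4, 1),  # not yet upgraded
        "node-b": (8, 5, 0),
        "node-c": (8, 5, 0),
    }
    # An index created on an upgraded node cannot be placed on node-a.
    print(eligible_nodes((8, 5, 0), cluster))
```

The filter also shows where the imbalance mentioned earlier in the thread comes from: during the upgrade, indices created on new nodes can only land on the subset of nodes that has already been upgraded.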

asfimport closed this as completed Oct 14, 2020

jpountz mentioned this issue Jun 23, 2023

Support writes with previous major lucene versions #12391

Open

dreamer-89 mentioned this issue Jun 27, 2023

[Segment Replication] Support for mixed cluster versions (Rolling Upgrade) opensearch-project/OpenSearch#3881

Closed
