Ghost fields and postings/points [LUCENE-10357] #11393
I've just seen an occurrence of this,
@jpountz I am looking into fixing the points issue that I mentioned above. We could for instance ensure that
It's a bit more complicated than that. Callers indeed cannot call
The introduction of dynamic pruning for string sorts (apache#11669) caused a bug with string sorts on ghost fields: it triggered a `NullPointerException` because the code assumed that `LeafReader#terms` is non-null whenever the field infos report the field as indexed. This commit fixes the issue and adds tests for ghost fields across all sort types. Hopefully we can simplify the code and remove the null check in the future, once we improve the handling of ghost fields (apache#11393).
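A minimal sketch of the null check described above, not the actual Lucene fix: `SegmentReader`, `Terms`, and `canPrune` are hypothetical, heavily simplified stand-ins for `LeafReader`, Lucene's `Terms`, and the pruning setup. It models a segment whose field infos still list a field as indexed after its postings were merged away, so `terms(field)` returns null and pruning is skipped instead of throwing.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical, heavily simplified stand-ins for Lucene's FieldInfos / LeafReader / Terms.
final class GhostTermsSketch {

  // Stand-in for Terms: only the number of unique terms is modeled.
  record Terms(long size) {}

  // Stand-in for a segment reader: field infos may list fields whose postings were merged away.
  record SegmentReader(Set<String> indexedFieldInfos, Map<String, Terms> postings) {
    boolean isIndexed(String field) {
      return indexedFieldInfos.contains(field);
    }

    // Like LeafReader#terms, this may return null even when isIndexed(field) is true (ghost field).
    Terms terms(String field) {
      return postings.get(field);
    }
  }

  // Returns true if dynamic pruning could be enabled for a string sort on this field.
  static boolean canPrune(SegmentReader reader, String field) {
    if (!reader.isIndexed(field)) {
      return false; // the field was never indexed in this segment
    }
    Terms terms = reader.terms(field);
    if (terms == null) {
      return false; // ghost field: field infos say indexed, but no terms remain
    }
    return terms.size() > 0; // an empty terms dictionary leaves nothing to prune with
  }

  public static void main(String[] args) {
    SegmentReader ghost = new SegmentReader(Set.of("title"), Map.of());
    SegmentReader live = new SegmentReader(Set.of("title"), Map.of("title", new Terms(3)));
    System.out.println(canPrune(ghost, "title")); // false, instead of a NullPointerException
    System.out.println(canPrune(live, "title"));  // true
  }
}
```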
`getPointValues` may currently return null for unknown fields or for fields that don't index points. After deletes and merges, a field may no longer have points for any document in a segment: its field info still reports that the field exists and has points, yet calling `getPointValues` returns null. With this change, `getPointValues` no longer returns null for ghost fields; it returns an empty instance of `PointValues` instead. Relates to apache#11393
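A sketch of the idea behind this change, under simplified assumptions: `PointValues` below is a hypothetical one-method interface (Lucene's real `PointValues` is an abstract class with many more methods), and `PointsSegment` and `EMPTY` are illustrative names. Null is still returned for unknown fields, while a field whose field info reports points but which has none left gets a shared empty instance.

```java
import java.util.Map;

final class EmptyPointValuesSketch {

  // Hypothetical stand-in for PointValues: only the document count is modeled.
  interface PointValues {
    int getDocCount();
  }

  // Shared empty instance returned for ghost fields.
  static final PointValues EMPTY = () -> 0;

  // Hypothetical segment: fieldHasPoints mirrors what the field info reports.
  record PointsSegment(Map<String, Boolean> fieldHasPoints, Map<String, PointValues> points) {

    PointValues getPointValues(String field) {
      if (!fieldHasPoints.getOrDefault(field, false)) {
        return null; // unknown field, or a field that does not index points
      }
      // Ghost field: field info says points exist, but all of them were deleted and merged away.
      return points.getOrDefault(field, EMPTY);
    }
  }

  public static void main(String[] args) {
    PointsSegment segment = new PointsSegment(Map.of("price", true), Map.of());
    // No null check needed here, because field info says "price" has points.
    System.out.println(segment.getPointValues("price").getDocCount()); // 0
    System.out.println(segment.getPointValues("missing"));             // null
  }
}
```

Returning a shared empty object means a caller that trusts the field info can skip the null check for fields that index points.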
`FieldExistsQuery` checks whether a field has points and then retrieves the corresponding point values. When all documents that had points for a field have been deleted from a segment and then merged away, the field info may still report that the field has points while the corresponding point values are null. This change adds a null check in `FieldExistsQuery`. Long term, we will likely want to prevent this situation from happening. Relates to apache#11393
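A minimal sketch of that null check, assuming hypothetical stand-ins (`FieldInfo`, `PointValues`, `countDocsWithPoints`) rather than the actual internals of `FieldExistsQuery`; only the points case is modeled here. A map lookup plays the role of `getPointValues`, so a missing entry models a ghost field.

```java
import java.util.Map;

final class FieldExistsSketch {

  // Hypothetical stand-in for what a field info reports about a field.
  record FieldInfo(String name, int pointDimensionCount) {}

  // Hypothetical stand-in for PointValues: only the document count is modeled.
  record PointValues(int docCount) {}

  // Counts documents that have points for the field in one segment.
  // pointsByField plays the role of getPointValues, which may return null.
  static int countDocsWithPoints(FieldInfo info, Map<String, PointValues> pointsByField) {
    if (info == null || info.pointDimensionCount() == 0) {
      return 0; // in this sketch, only points are considered
    }
    PointValues values = pointsByField.get(info.name());
    if (values == null) {
      // Ghost field: the field info still reports points, but those docs were deleted and merged away.
      return 0;
    }
    return values.docCount();
  }

  public static void main(String[] args) {
    FieldInfo price = new FieldInfo("price", 1);
    System.out.println(countDocsWithPoints(price, Map.of()));                            // 0
    System.out.println(countDocsWithPoints(price, Map.of("price", new PointValues(42)))); // 42
  }
}
```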
For doc values and norms, we require that codec APIs return non-null empty instances for fields that have the feature enabled in their `FieldInfo`, even if all values have been merged away.
However, postings and points have the choice: they may return either empty instances or `null`. See e.g. `BasePostingsFormatTestCase#testGhosts`.
I fear that this could be a source of bugs, as a caller could be tempted to assume that they would get non-null terms for a `FieldInfo` whose `IndexOptions` are not `NONE`. Should we introduce a contract that `FieldsProducer` (resp. `PointsReader`) must return a non-null instance when postings (resp. points) are indexed?
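One way the proposed contract could be enforced, sketched under assumptions: `FieldsProducer`, `Terms`, and `IndexOptions` below are hypothetical minimal stand-ins for the Lucene types of the same names, and `checking` is an illustrative wrapper, not an existing Lucene API. It throws when a field whose `IndexOptions` are not `NONE` yields null terms, so a codec that returns an empty instance for ghost fields passes while one that returns null fails fast.

```java
import java.util.Map;

final class NonNullPostingsContractSketch {

  // Hypothetical subset of Lucene's IndexOptions; only NONE vs. not-NONE matters here.
  enum IndexOptions { NONE, DOCS, DOCS_AND_FREQS }

  // Hypothetical stand-in for Terms.
  record Terms(long size) {}

  // Hypothetical stand-in for a FieldsProducer: returns the terms for a field, or null.
  interface FieldsProducer {
    Terms terms(String field);
  }

  // Wraps a producer and enforces the proposed contract:
  // if the field's IndexOptions are not NONE, terms(field) must not be null.
  static FieldsProducer checking(FieldsProducer in, Map<String, IndexOptions> fieldInfos) {
    return field -> {
      Terms terms = in.terms(field);
      IndexOptions options = fieldInfos.getOrDefault(field, IndexOptions.NONE);
      if (options != IndexOptions.NONE && terms == null) {
        throw new IllegalStateException("null terms for indexed field: " + field);
      }
      return terms;
    };
  }

  public static void main(String[] args) {
    Map<String, IndexOptions> infos = Map.of("body", IndexOptions.DOCS);
    FieldsProducer ghostReturnsEmpty = field -> new Terms(0);
    FieldsProducer ghostReturnsNull = field -> null;
    // An empty instance satisfies the contract.
    System.out.println(checking(ghostReturnsEmpty, infos).terms("body").size());
    try {
      checking(ghostReturnsNull, infos).terms("body");
    } catch (IllegalStateException e) {
      System.out.println("contract violated: " + e.getMessage());
    }
  }
}
```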
Migrated from LUCENE-10357 by Adrien Grand (@jpountz), updated Jul 13 2022
Pull requests: #907