-
Notifications
You must be signed in to change notification settings - Fork 113
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bugfix/202 transform date add date conversion #622
Bugfix/202 transform date add date conversion #622
Conversation
0d50db4
to
77d4499
Compare
Codecov Report
@@ Coverage Diff @@
## main #622 +/- ##
============================================
- Coverage 75.90% 75.88% -0.02%
- Complexity 2811 2849 +38
============================================
Files 362 364 +2
Lines 16042 16176 +134
Branches 2301 2324 +23
============================================
+ Hits 12176 12275 +99
- Misses 2535 2563 +28
- Partials 1331 1338 +7
|
src/main/kotlin/org/opensearch/indexmanagement/transform/TargetIndexMappingService.kt
Show resolved
Hide resolved
src/main/kotlin/org/opensearch/indexmanagement/transform/TargetIndexMappingService.kt
Outdated
Show resolved
Hide resolved
src/main/kotlin/org/opensearch/indexmanagement/transform/util/TransformContext.kt
Outdated
Show resolved
Hide resolved
...n/org/opensearch/indexmanagement/transform/action/preview/TransportPreviewTransformAction.kt
Outdated
Show resolved
Hide resolved
src/main/kotlin/org/opensearch/indexmanagement/transform/util/TransformUtils.kt
Show resolved
Hide resolved
PR comments need to be addressed |
src/main/kotlin/org/opensearch/indexmanagement/transform/TargetIndexMappingService.kt
Outdated
Show resolved
Hide resolved
src/main/kotlin/org/opensearch/indexmanagement/transform/TransformIndexer.kt
Outdated
Show resolved
Hide resolved
src/main/kotlin/org/opensearch/indexmanagement/transform/TargetIndexMappingService.kt
Outdated
Show resolved
Hide resolved
} | ||
val sourceFieldType = IndexUtils.getFieldFromMappings(dimension.sourceField, sourceIndexMapping) | ||
// Consider only date fields as relevant for building the target index mapping | ||
if (dimension !is DateHistogram && sourceFieldType?.get(TYPE) != null && sourceFieldType[TYPE] == "date") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why excludes DateHistogram here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well the DateHistogram is supported by default.
The whole idea about this PR is about enabling support for grouping on the timestamp field (which is initial bug described here).
I listed here all the changes that I did in the PR.
So basically, grouping by using date_histogram is supported by default, only the format was missing - which I added in the PR (also in the comment of the PR I mentioned on which things we must take care when using date_histogram and format-interval relation).
You can also see that the integration tests I added are related with the date used in term aggregation - so If we decide to go without this support - I would suggest complete rework of this P (removal of term date agg support by adding validation that prevents user creating the grouping by date; then the question is do we need to create target index mapping based on a date fields at all)
src/test/kotlin/org/opensearch/indexmanagement/transform/TransformRunnerIT.kt
Outdated
Show resolved
Hide resolved
src/test/kotlin/org/opensearch/indexmanagement/transform/TransformRunnerIT.kt
Outdated
Show resolved
Hide resolved
src/main/kotlin/org/opensearch/indexmanagement/transform/TargetIndexMappingService.kt
Show resolved
Hide resolved
src/main/kotlin/org/opensearch/indexmanagement/transform/TransformIndexer.kt
Show resolved
Hide resolved
src/main/kotlin/org/opensearch/indexmanagement/transform/TargetIndexMappingService.kt
Outdated
Show resolved
Hide resolved
src/main/kotlin/org/opensearch/indexmanagement/transform/TargetIndexMappingService.kt
Outdated
Show resolved
Hide resolved
if (!isFieldInMappings(dimension.sourceField, sourceIndexMapping)) { | ||
throw TransformIndexException("Missing field ${dimension.sourceField} in source index") | ||
} | ||
val sourceFieldType = IndexUtils.getFieldFromMappings(dimension.sourceField, sourceIndexMapping) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here seems to be the a position to check and restrict user to only do date_histogram grouping on date field
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check out this comment.
If we decide to support only date_histogram than this PR should be reworked just to prevent user from doing a grouping based on the timestamp field (since date_histogram is already supported - just the format field is missing which is also added in this PR).
Pasting the Ravi response from mine, Praveen and Ravi communication:
"
It was a miss during initial implementation to correctly index the date field columns.
I believe the best way to fix this would be while creating the target index set the mapping types for target fields same as the mapping types of source fields -
https://github.com/opensearch-project/index-management/blob/main/src/main/kotlin/org/opensearch/indexmanagement/transform/TransformIndexer.kt#L54,at the moment we rely on cluster determine the mapping type for target fields.
We do something similar for rollups but instead of setting individual field mapping during rollup index creation we set the mapping type based on field suffix -
But for transforms we cannot define the types in template since we don’t enforce suffixes.
"
Now, we just need to decide :) @bowenlan-amzn what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I prefer to go with dynamic mapping then for transform and don't restrict the grouping on date type field
7975edd
to
a755e51
Compare
src/main/kotlin/org/opensearch/indexmanagement/transform/TargetIndexMappingService.kt
Outdated
Show resolved
Hide resolved
src/main/kotlin/org/opensearch/indexmanagement/transform/TransformRunner.kt
Outdated
Show resolved
Hide resolved
src/main/kotlin/org/opensearch/indexmanagement/common/model/dimension/DateHistogram.kt
Show resolved
Hide resolved
src/main/kotlin/org/opensearch/indexmanagement/transform/TargetIndexMappingService.kt
Show resolved
Hide resolved
src/main/kotlin/org/opensearch/indexmanagement/transform/TransformSearchService.kt
Outdated
Show resolved
Hide resolved
...n/org/opensearch/indexmanagement/transform/action/preview/TransportPreviewTransformAction.kt
Show resolved
Hide resolved
src/test/kotlin/org/opensearch/indexmanagement/transform/TransformRunnerIT.kt
Show resolved
Hide resolved
a755e51
to
20af048
Compare
Signed-off-by: Stevan Buzejic <[email protected]>
…g once the transform is being triggered. Signed-off-by: Stevan Buzejic <[email protected]>
… index for the date fields when transform is executed Signed-off-by: Stevan Buzejic <[email protected]>
…d in aggregations or as a term aggregation for defining the buckets Signed-off-by: Stevan Buzejic <[email protected]>
…nsform preview action is triggered Signed-off-by: Stevan Buzejic <[email protected]>
Signed-off-by: Stevan Buzejic <[email protected]>
Signed-off-by: Stevan Buzejic <[email protected]>
…ield. Updated transform preview action to consider target index mapping when using a date field. Kept formatting of the date field in target index. Signed-off-by: Stevan Buzejic <[email protected]>
20af048
to
0f29a0b
Compare
Signed-off-by: Stevan Buzejic <[email protected]>
Signed-off-by: Stevan Buzejic <[email protected]>
c39b680
to
7e43c7c
Compare
…form mapping json. Target date field mappings are generated after transform validation when running transform. Removed target index date field values formatting. emoved default format for date_histogram because of the rollup. Updated schema version in test. Signed-off-by: Stevan Buzejic <[email protected]>
e470657
to
96af8b6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
* 202: Added format property when specifying the date histogram Signed-off-by: Stevan Buzejic <[email protected]> * 202: Added component responsible for building the target index mapping once the transform is being triggered. Signed-off-by: Stevan Buzejic <[email protected]> * 202: date_histogram considered in the case of the creating the target index for the date fields when transform is executed Signed-off-by: Stevan Buzejic <[email protected]> * 202: Enabled target index date field mappings if those fields are used in aggregations or as a term aggregation for defining the buckets Signed-off-by: Stevan Buzejic <[email protected]> * Updated code according to comments. Added targetIndexMapping when transform preview action is triggered Signed-off-by: Stevan Buzejic <[email protected]> * Updated schema versions Signed-off-by: Stevan Buzejic <[email protected]> * Addressed the comments Signed-off-by: Stevan Buzejic <[email protected]> * Refactored transform tests related with aggregation based on a date field. Updated transform preview action to consider target index mapping when using a date field. Kept formatting of the date field in target index. Signed-off-by: Stevan Buzejic <[email protected]> * detekt fix Signed-off-by: Stevan Buzejic <[email protected]> * Added zone in IT Signed-off-by: Stevan Buzejic <[email protected]> * Added function for creating target index mapping that considers transform mapping json. Target date field mappings are generated after transform validation when running transform. Removed target index date field values formatting. emoved default format for date_histogram because of the rollup. Updated schema version in test. Signed-off-by: Stevan Buzejic <[email protected]> --------- Signed-off-by: Stevan Buzejic <[email protected]> (cherry picked from commit 42833b1)
* 202: Added format property when specifying the date histogram Signed-off-by: Stevan Buzejic <[email protected]> * 202: Added component responsible for building the target index mapping once the transform is being triggered. Signed-off-by: Stevan Buzejic <[email protected]> * 202: date_histogram considered in the case of the creating the target index for the date fields when transform is executed Signed-off-by: Stevan Buzejic <[email protected]> * 202: Enabled target index date field mappings if those fields are used in aggregations or as a term aggregation for defining the buckets Signed-off-by: Stevan Buzejic <[email protected]> * Updated code according to comments. Added targetIndexMapping when transform preview action is triggered Signed-off-by: Stevan Buzejic <[email protected]> * Updated schema versions Signed-off-by: Stevan Buzejic <[email protected]> * Addressed the comments Signed-off-by: Stevan Buzejic <[email protected]> * Refactored transform tests related with aggregation based on a date field. Updated transform preview action to consider target index mapping when using a date field. Kept formatting of the date field in target index. Signed-off-by: Stevan Buzejic <[email protected]> * detekt fix Signed-off-by: Stevan Buzejic <[email protected]> * Added zone in IT Signed-off-by: Stevan Buzejic <[email protected]> * Added function for creating target index mapping that considers transform mapping json. Target date field mappings are generated after transform validation when running transform. Removed target index date field values formatting. emoved default format for date_histogram because of the rollup. Updated schema version in test. Signed-off-by: Stevan Buzejic <[email protected]> --------- Signed-off-by: Stevan Buzejic <[email protected]> (cherry picked from commit 42833b1) Co-authored-by: Stevan Buzejic <[email protected]>
* 202: Added format property when specifying the date histogram Signed-off-by: Stevan Buzejic <[email protected]> * 202: Added component responsible for building the target index mapping once the transform is being triggered. Signed-off-by: Stevan Buzejic <[email protected]> * 202: date_histogram considered in the case of the creating the target index for the date fields when transform is executed Signed-off-by: Stevan Buzejic <[email protected]> * 202: Enabled target index date field mappings if those fields are used in aggregations or as a term aggregation for defining the buckets Signed-off-by: Stevan Buzejic <[email protected]> * Updated code according to comments. Added targetIndexMapping when transform preview action is triggered Signed-off-by: Stevan Buzejic <[email protected]> * Updated schema versions Signed-off-by: Stevan Buzejic <[email protected]> * Addressed the comments Signed-off-by: Stevan Buzejic <[email protected]> * Refactored transform tests related with aggregation based on a date field. Updated transform preview action to consider target index mapping when using a date field. Kept formatting of the date field in target index. Signed-off-by: Stevan Buzejic <[email protected]> * detekt fix Signed-off-by: Stevan Buzejic <[email protected]> * Added zone in IT Signed-off-by: Stevan Buzejic <[email protected]> * Added function for creating target index mapping that considers transform mapping json. Target date field mappings are generated after transform validation when running transform. Removed target index date field values formatting. emoved default format for date_histogram because of the rollup. Updated schema version in test. Signed-off-by: Stevan Buzejic <[email protected]> --------- Signed-off-by: Stevan Buzejic <[email protected]>
* 202: Added format property when specifying the date histogram Signed-off-by: Stevan Buzejic <[email protected]> * 202: Added component responsible for building the target index mapping once the transform is being triggered. Signed-off-by: Stevan Buzejic <[email protected]> * 202: date_histogram considered in the case of the creating the target index for the date fields when transform is executed Signed-off-by: Stevan Buzejic <[email protected]> * 202: Enabled target index date field mappings if those fields are used in aggregations or as a term aggregation for defining the buckets Signed-off-by: Stevan Buzejic <[email protected]> * Updated code according to comments. Added targetIndexMapping when transform preview action is triggered Signed-off-by: Stevan Buzejic <[email protected]> * Updated schema versions Signed-off-by: Stevan Buzejic <[email protected]> * Addressed the comments Signed-off-by: Stevan Buzejic <[email protected]> * Refactored transform tests related with aggregation based on a date field. Updated transform preview action to consider target index mapping when using a date field. Kept formatting of the date field in target index. Signed-off-by: Stevan Buzejic <[email protected]> * detekt fix Signed-off-by: Stevan Buzejic <[email protected]> * Added zone in IT Signed-off-by: Stevan Buzejic <[email protected]> * Added function for creating target index mapping that considers transform mapping json. Target date field mappings are generated after transform validation when running transform. Removed target index date field values formatting. emoved default format for date_histogram because of the rollup. Updated schema version in test. Signed-off-by: Stevan Buzejic <[email protected]> --------- Signed-off-by: Stevan Buzejic <[email protected]> (cherry picked from commit 42833b1)
* 202: Added format property when specifying the date histogram Signed-off-by: Stevan Buzejic <[email protected]> * 202: Added component responsible for building the target index mapping once the transform is being triggered. Signed-off-by: Stevan Buzejic <[email protected]> * 202: date_histogram considered in the case of the creating the target index for the date fields when transform is executed Signed-off-by: Stevan Buzejic <[email protected]> * 202: Enabled target index date field mappings if those fields are used in aggregations or as a term aggregation for defining the buckets Signed-off-by: Stevan Buzejic <[email protected]> * Updated code according to comments. Added targetIndexMapping when transform preview action is triggered Signed-off-by: Stevan Buzejic <[email protected]> * Updated schema versions Signed-off-by: Stevan Buzejic <[email protected]> * Addressed the comments Signed-off-by: Stevan Buzejic <[email protected]> * Refactored transform tests related with aggregation based on a date field. Updated transform preview action to consider target index mapping when using a date field. Kept formatting of the date field in target index. Signed-off-by: Stevan Buzejic <[email protected]> * detekt fix Signed-off-by: Stevan Buzejic <[email protected]> * Added zone in IT Signed-off-by: Stevan Buzejic <[email protected]> * Added function for creating target index mapping that considers transform mapping json. Target date field mappings are generated after transform validation when running transform. Removed target index date field values formatting. emoved default format for date_histogram because of the rollup. Updated schema version in test. Signed-off-by: Stevan Buzejic <[email protected]> --------- Signed-off-by: Stevan Buzejic <[email protected]> (cherry picked from commit 42833b1) Co-authored-by: Stevan Buzejic <[email protected]>
opensearch-project#803) * 202: Added format property when specifying the date histogram Signed-off-by: Stevan Buzejic <[email protected]> * 202: Added component responsible for building the target index mapping once the transform is being triggered. Signed-off-by: Stevan Buzejic <[email protected]> * 202: date_histogram considered in the case of the creating the target index for the date fields when transform is executed Signed-off-by: Stevan Buzejic <[email protected]> * 202: Enabled target index date field mappings if those fields are used in aggregations or as a term aggregation for defining the buckets Signed-off-by: Stevan Buzejic <[email protected]> * Updated code according to comments. Added targetIndexMapping when transform preview action is triggered Signed-off-by: Stevan Buzejic <[email protected]> * Updated schema versions Signed-off-by: Stevan Buzejic <[email protected]> * Addressed the comments Signed-off-by: Stevan Buzejic <[email protected]> * Refactored transform tests related with aggregation based on a date field. Updated transform preview action to consider target index mapping when using a date field. Kept formatting of the date field in target index. Signed-off-by: Stevan Buzejic <[email protected]> * detekt fix Signed-off-by: Stevan Buzejic <[email protected]> * Added zone in IT Signed-off-by: Stevan Buzejic <[email protected]> * Added function for creating target index mapping that considers transform mapping json. Target date field mappings are generated after transform validation when running transform. Removed target index date field values formatting. emoved default format for date_histogram because of the rollup. Updated schema version in test. Signed-off-by: Stevan Buzejic <[email protected]> --------- Signed-off-by: Stevan Buzejic <[email protected]> (cherry picked from commit 42833b1) Co-authored-by: Stevan Buzejic <[email protected]> Signed-off-by: Ronnak Saxena <[email protected]>
Issue #, if available:
#202
Description of changes:
This PR enables creation of target index mapping for a date fields by following next "rules":
@bowenlan-amzn
Here is some contextual background:
Once grouping by date field as term is executed, the request is being translated to composite aggregation request, that is executed against the cluster (on transform side) ie.:
is being translated to:
First thing noticed is that composite aggregation doesn't allow specifying a format of a date field that will be used for the purpose of grouping the data into buckets when using composite aggregation. On the other hand, cluster can return "original" field value as a string if regular terms grouping is being used or if the aggregation function (like min/max) is applied:
Cluster, then returns:
So long story short, the cluster "flattened" the field values to epcoh millis and then did the grouping - in the case of composite aggregation the format can't be specified UNLESS the date_histogram is being used. In the case of "normal" term grouping, cluster by default returns the value _as_string.
Specifying a format is enabled on a date_histogram:
Caveat:
When defining the interval and format (when date_histogram is used), user must be aware of their co-relation ie. since transform creates a buckets and once each result group is being returned it returns after_key (which is used in order to detect where the processing ended, in which bucket) which is used later for pagination, cluster won't know how to create the after key (since it did bucketing on the hour level but because of the format it flattens in a date format and after key that is being returned is in one moment always the same).
Beside the date format when using the date_histogram, the idea was to create target index mapping with the date field types used as a field definition, when grouping by a date is applied.
Further improvement would be - taking care about the date format that is used for a date field in a source index (user can specify multiple formats so, somehow once the execution of transform occurs, we would need to figure out in which format the date is stored and maybe use this format for storing and formatting the target index date field once the cluster retrieves the value)
CheckList:
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.