Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Separation of Concerns in DocValueFormat #47469

Open
not-napoleon opened this issue Oct 2, 2019 · 6 comments
Open

Separation of Concerns in DocValueFormat #47469

not-napoleon opened this issue Oct 2, 2019 · 6 comments
Labels
:Analytics/Aggregations Aggregations Meta >refactoring Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) >tech debt

Comments

@not-napoleon
Copy link
Member

FieldType defines a method docValueFormat(String format, ZoneId timezone) which specific field types can override to provide support for formatters in aggregations. The resulting DocValueFormat, however, has multiple meanings which are sometimes incompatible. There are at least three different ways in which this formatter is used:

  • Parsing string values from the user supplied missing parameter
  • Formatting the key (i.e. bucket id) on bucket aggregations
  • Formatting the value on metric aggregations

There may be other, as yet unidentified, roles this formatter is being used for.

This creates a number of challenges, most notably when the input type and output type of an aggregation differ, as the same formatter will not make sense for both the input and the output. Most aggregations to this point have not encountered this issue because the input and output types line up, but as we try to expand aggregation support to more field types, that will stop being true. Especially after the planned values source refactor (see #42949), which will enable dynamically registering type support for aggregations, relying on inputs and outputs using the same format breaks down.

Please use this issue to discuss possible solutions and record issues we are running into with the current implementation.

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)

csoulios added a commit to csoulios/elasticsearch that referenced this issue Oct 3, 2019
StringStatsAggregatorTests#testSingleValuedFieldFormatter fails because
of elastic#47469
@not-napoleon
Copy link
Member Author

As per conversation with @jpountz , while we want to split this out for aggregations, we don't want to make a change at the field mapping level.

@jtibshirani
Copy link
Contributor

Adding another use of docvalues formats for context: they're used to format docvalues_fields in the search response. It's easy to forget this use case and we have some related bugs (#53246, #55993).

@imotov
Copy link
Contributor

imotov commented Mar 9, 2021

@jpountz @jtibshirani I would like to start addressing this issue on the aggs level. Just to keep things simple, I would like to narrow the scope of this particular issue to depreciating the format parameter in aggregations and introducing two separate input_format and output_format with first one to be used for handling missing parameter and the second one to be used for handling the formatting of the keys in bucket aggs and string representation of values in metrics aggs.

In other words we will not address the issues raised by Julie nor change the way format in field mapping works, we will just start with ability to have 2 separate format strings. If you are ok with this I will rename the issue accordingly.

@jpountz
Copy link
Contributor

jpountz commented Mar 9, 2021

@imotov Internally distinguishing the input format and the output format makes sense to me. I know of use-cases for configuring the output format on aggregations, e.g. making sure dates are formatted as millis since epoch rather than the format that is configured on the field. However I'm not sure I know of a use-case for configuring the input format, do you know of one?

@imotov
Copy link
Contributor

imotov commented Mar 9, 2021

Yes, we are using it while parsing string values specified by the user in the missing parameter, for example.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Aggregations Aggregations Meta >refactoring Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) >tech debt
Projects
None yet
Development

No branches or pull requests

7 participants