
BigInteger/BigDecimal support #5683

Closed · wants to merge 6 commits

Conversation

@jprante (Contributor) commented Apr 4, 2014

For XContentBuilder/XContentParser and document mapping, this will add support for "big" numeric types BigInteger/BigDecimal.

BigInteger/BigDecimal support for XContentBuilder/XContentParser is implemented using Jackson's existing support for the "big" numeric types. A new method losslessDecimals() switches the XContentParser into recognizing BigInteger/BigDecimal in preference to the primitive numeric types, which is convenient when using the Java API to parse document sources with BigInteger/BigDecimal field values.

For the document mapping, new core types biginteger and bigdecimal are introduced. With a new flag lossless_numeric_detection, the precedence of BigInteger/BigDecimal over primitive numeric types can be controlled in the mapping. When set to true, new dynamic numeric fields are assigned to "big" numeric types first. Default is false, where primitive numeric types still take precedence.
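A sketch of what a mapping using the proposed types and flag might look like, per the description above (the field names are illustrative, and the exact placement of the `lossless_numeric_detection` flag is my reading of this PR, not confirmed API):

```json
{
  "properties": {
    "serial": {
      "type": "biginteger"
    },
    "amount": {
      "type": "bigdecimal",
      "lossless_numeric_detection": true
    }
  }
}
```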

Caveat: BigInteger/BigDecimal support is only meant for search and indexing/storing. The "big" numeric types are degraded to their .longValue() and .doubleValue() components when used in NumericRangeQuery and related contexts, so using values larger than Long.MAX_VALUE or Double.MAX_VALUE in analytical queries such as facets and aggregations is not recommended; strange cut-offs or underflows/overflows may occur.
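The cut-offs described in this caveat are just the JDK's own narrowing behavior; a minimal standalone illustration (the class name is mine, nothing here is from the patch):

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class NarrowingDemo {
    public static void main(String[] args) {
        // 2^64 + 5 does not fit in a long; longValue() keeps only the
        // low-order 64 bits, silently yielding 5.
        BigInteger big = BigInteger.valueOf(2).pow(64).add(BigInteger.valueOf(5));
        System.out.println(big.longValue());     // prints 5

        // doubleValue() rounds to the nearest representable double,
        // losing the low-order digits of this 35-digit value.
        BigDecimal exact = new BigDecimal("19999999999999999999999999999999999");
        System.out.println(exact.doubleValue()); // prints 2.0E34
    }
}
```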

@jpountz (Contributor) commented Apr 15, 2014

> The "big" numeric types are degraded to their .longValue() and .doubleValue() components when they are used in NumericRangeQuery and related contexts

FYI, there is some discussion on https://issues.apache.org/jira/browse/LUCENE-5596 about adding range support for types wider than 64 bits.

@jpountz (Contributor) commented Aug 22, 2014

Quick update: most of this change is good and would be a good start toward supporting big integers/decimals in the future. I added the stalled label, since I think it is important to support efficient range queries on such types without information loss (either via https://issues.apache.org/jira/browse/LUCENE-5879 or https://issues.apache.org/jira/browse/LUCENE-5596). Some other thoughts/open questions:

  • these types should probably be forbidden in the numeric metrics aggregations; otherwise we would either need to use big decimals there, which would kill performance, or the information loss would make results unusable
  • these types should probably be opt-in only, since they would have different capabilities than the other numeric fields
  • for sorting, should we use SORTED or BINARY doc value types? (I would lean towards SORTED, which would make sorting faster)
  • should they be specified as strings or numbers in the _source document? (would there be compatibility issues with some languages/JSON parsers/JSON generators with numbers?)

@jprante (Contributor, Author) commented Aug 22, 2014

@jpountz

> these types should probably be forbidden in the numeric metrics aggregations, otherwise we would either need to use big decimals there which would kill performance, or the information loss would make results unusable

I agree that numeric metrics aggregations must never use BigInteger/BigDecimal types. One thought is to add a special aggregation type, like a "monetary/financial" aggregation, where performance is less important than the exactness/correctness of the numeric results and BigDecimal is not converted to double/float.

> should they be specified as strings or numbers in the _source document? (would there be compatibility issues with some languages/json parsers/json generators with numbers?)

The Jackson library maps them to the JSON number type (http://wiki.fasterxml.com/JacksonDataBinding). There are mechanisms to let the parser auto-detect BigInteger (no fraction), but BigDecimal (with fraction) must be explicitly configured to take precedence over double/float.

@clintongormley (Contributor) commented:

> "should they be specified as strings or numbers in the _source document? (would there be compatibility issues with some languages/json parsers/json generators with numbers?)"
>
> The Jackson library maps it to "JSON Type number" http://wiki.fasterxml.com/JacksonDataBinding — there are some mechanisms to let the parser auto-detect BigInteger (no fraction), but BigDecimal must be configured to override double/float (with fraction).

My concern here is more with other languages, e.g. JavaScript can't support big integers/decimals, and we'll find lots of similar issues. It may be OK to accept them as numbers, as long as we also support coercing from strings. That way users of languages without support can still use them.

@jprante (Contributor, Author) commented Aug 22, 2014

The problem with JavaScript is its poor support for numbers; even 64-bit ints fail (and ES/Lucene has supported 64-bit longs for a while now). BigInteger/BigDecimal can be added as an extension, at least in Node.js: https://www.npmjs.org/package/json-bignum

@kul (Contributor) commented Jan 14, 2015

👍 much awaited.

@mikemccand (Contributor) commented:

I think https://issues.apache.org/jira/browse/LUCENE-6697 (just released in Lucene 5.3.0) is a compelling way to allow fast range filters on BigInteger/Decimal values.

Values for the field must be indexed as a SortedSetDocValuesField (with the BigInteger/BigDecimal value converted to a byte[]), and the field must use the RangeTreeDocValuesFormat; then use NumericRangeTreeQuery at search time.

Some care must be taken with the byte[] encoding so that byte-wise sort order matches numeric order. I think this means the BigInteger field must have a max allowed value (set once up front in the mapping), the BigDecimal field may need the same up-front scale across all values (?), and the sign bit needs to be flipped like we do for NumericField.

But I think it should work well, and from my limited perf testing on the original issue, the resulting index is smaller and filters are faster than NumericField/RangeQuery.

One caveat: because this code is very new, it lives in the sandbox for now, and there's no guarantee of back-compat for the file format it writes. But then, the file format is also ridiculously simple ...
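One way the fixed-width, sort-order-preserving encoding Mike describes could look: sign-extend the two's-complement bytes to a width fixed up front, then flip the sign bit so unsigned byte order matches numeric order. This is a sketch of the general technique, not code from the patch; the class, method, and chosen width are mine.

```java
import java.math.BigInteger;
import java.util.Arrays;

public class SortableBigIntEncoding {
    // Encode v into exactly `width` bytes whose unsigned lexicographic
    // order equals numeric order. Assumes every value fits in `width`
    // bytes, i.e. a max value fixed once up front in the mapping.
    public static byte[] encode(BigInteger v, int width) {
        byte[] src = v.toByteArray();  // big-endian two's complement
        if (src.length > width) {
            throw new IllegalArgumentException("value exceeds configured width");
        }
        byte[] out = new byte[width];
        // Sign-extend into the leading bytes.
        byte pad = (byte) (v.signum() < 0 ? 0xFF : 0x00);
        Arrays.fill(out, 0, width - src.length, pad);
        System.arraycopy(src, 0, out, width - src.length, src.length);
        out[0] ^= 0x80;  // flip the sign bit so negatives sort before positives
        return out;
    }

    public static void main(String[] args) {
        byte[] a = encode(BigInteger.valueOf(-5), 16);
        byte[] b = encode(BigInteger.valueOf(3), 16);
        byte[] c = encode(new BigInteger("99999999999999999999999999999999"), 16);
        // Unsigned byte-wise comparison now agrees with numeric order.
        System.out.println(Arrays.compareUnsigned(a, b) < 0);  // prints true
        System.out.println(Arrays.compareUnsigned(b, c) < 0);  // prints true
    }
}
```

A BigDecimal variant would first normalize all values to one scale (as noted above) and then encode the unscaled BigInteger the same way.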

@muelli commented Sep 29, 2015

Big integers are also interesting for cryptographic applications.

@SKumarMN commented Oct 5, 2015

@jprante Does the above fix support range and filter queries too? Any idea when Elasticsearch is going to add BigDecimal/BigInteger support officially?

@jprante (Contributor, Author) commented Oct 5, 2015

From what I can see, BigDecimal/BigInteger support is implemented in Lucene 5.3, which will appear in Elasticsearch 2.x (not 2.0).

@SKumarMN commented Oct 5, 2015

@jprante

Hey, I have applied the fix mentioned in this post, but when I index or fetch data, it is getting rounded off. I am using the REST API calls. Am I doing anything wrong here?

Here is my mapping:

{
  "tweety": {
    "properties": {
      "message": {
        "type": "string"
      },
      "post_date": {
        "type": "date",
        "format": "dateOptionalTime"
      },
      "newint": {
        "type": "biginteger",
        "lossless_numeric_detection": true
      }
    }
  }
}

Data:

{
  "newint": 19999999999999999999999999999999999,
  "post_date": "2009-11-15T14:12:12",
  "message": "trying out Elastic-search"
}

Get result:

{
  "_index": "twitter",
  "_type": "tweety",
  "_id": "1",
  "_version": 1,
  "found": true,
  "_source": {
    "newint": 2e+34,
    "post_date": "2009-11-15T14:12:12",
    "message": "trying out Elastic-search"
  }
}

@jprante (Contributor, Author) commented Oct 5, 2015

@SKumarMN the patch is only 50% of the required work. It only means that BigInteger/BigDecimal is accepted as JSON input. The default is to downgrade the accepted values to double/float wherever possible; otherwise, the change would not be compatible with existing ES applications. REST actions would have to be changed to prefer BigInteger/BigDecimal.
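The 2e+34 that came back in the get result above is exactly what downgrading to double produces; a quick standalone JDK check (the class name is mine, not part of the patch):

```java
import java.math.BigInteger;

public class RoundingDemo {
    public static void main(String[] args) {
        String value = "19999999999999999999999999999999999";
        // Parsed as a double, the 35-digit integer rounds to the nearest
        // representable value, printed as 2.0E34 -- matching the _source above.
        System.out.println(Double.parseDouble(value));  // prints 2.0E34
        // Parsed as a BigInteger, it round-trips exactly.
        System.out.println(new BigInteger(value));
    }
}
```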

@clintongormley (Contributor) commented:

> From what I can see BigDecimal/BigInteger is implemented in Lucene 5.3 which will appear in Elasticsearch 2.x (not 2.0)

This code is in the Lucene sandbox only. We need to wait until it graduates to core before we can start using it.

@mikemccand (Contributor) commented:

> We need to wait until it graduates to core before we can start using it.

I'm working on graduating this to Lucene's core ... here's the first step: https://issues.apache.org/jira/browse/LUCENE-6825

@clintongormley (Contributor) commented:

w00t!

@SKumarMN commented:
@jpountz

Hi,

I have used the fix https://github.com//pull/5758 in my 1.4.4 code to support big integers by changing the IPv6 mapper. Search and range queries work fine. Our application needs support for BigDecimal too. Could you please give me pointers on how I can implement BigDecimal support with range functionality as well?

@clintongormley (Contributor) commented:

Closing in favour of #17006

Labels: high hanging fruit; :Search Foundations/Mapping (Index mappings, including merging and defining field types)
7 participants