Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decimal serialisation #3497

Closed
RevoluPowered opened this issue Nov 26, 2018 · 14 comments
Closed

Decimal serialisation #3497

RevoluPowered opened this issue Nov 26, 2018 · 14 comments

Comments

@RevoluPowered
Copy link

In C# we use decimal to represent money, I am just wondering if decimal is stored in elastic as two integers or in a decimal format which is okay for use with currency?

Or alternatively, how is the decimal type stored inside ES?

What's the error rate? etc

@Henr1k80
Copy link
Contributor

AFAIK, decimals are stored as doubles in the index.
When they are deserialized, they are of course decimal again, so you only need to worry about storage precision (unless you have business logic in elastic update scripts, then I guess the precision is double)

@RevoluPowered
Copy link
Author

RevoluPowered commented Nov 26, 2018

Thanks for your reply, we need more precision in our storage than double could provide for our monetary calculations, I am curious is there a good way of implementing a way of storing decimal to two ints perhaps, instead of to the double type in es?

@Henr1k80
Copy link
Contributor

I would probably make two properties for the integers with a getter that gets the integer part on serialization.
The setters, (only called on deserialization) just sets a backing field.
I would mark them [Obsolete("Only for serialization/deserialization", error: true)] to ensure that they are only used to serialize and not from code.

The decimal getter should have some logic to tell whether it should be initialized from the integers (only on first use after deserialization).
I would set [Ignore] to avoid any confusion by it being stored and indexed, unless it is needed for sorting where I guess double precision is ok?

@RevoluPowered
Copy link
Author

Something similar to this?

public class Currency
{
	public Currency(decimal value)
	{
		Value = value;
	}
	
	// stored in elastic
	public int pence;
	public int pounds;

	// could be lazy loaded too, to prevent having to do math when returning the value 
	[Ignore]
	public decimal Value{   
	get 
	{ 
		// add fractions of a pence to pounds amount in decimal types.
		return ((decimal)pence/100m)+ (decimal)pounds; 
	}
	set { 
		// note: not tested this yet
		pounds = Math.Truncate(value); // pounds / dollars
		pence = value - Math.Truncate(value); // pence / cents value
	}
	}
}

Then I could maybe do some further work on this so instead of having to declare 3 vars, I could have a Currency type? which serializes into elastic safely with the two int's?

So in future it uses something nested like this:,

public class Car
{
     public Currency price = new Currency( 2500.50m );
}

Shame there isn't a tidier way of doing this, I guess though it's better knowing things are all POCO and no special cases exist, then it can be planned for appropriately.

@Henr1k80
Copy link
Contributor

If you only are storing two decimals (pence), then I am pretty sure that double precision for storage is enough. At least for int pounds. Eg.:

public class Car
{
    [Ignore] // do not store without rounding
    public decimal Price { get; set; }

    [Obsolete("Only for serialization/deserialization, use Price", error: true)]
    [Number("Price")] // named Price in elastic for transparency
    public decimal StoredPrice {
        get { return Math.Round(Price, 2); }
        set { Price = Math.Round(value, 2); }
    }
}

Even simpler, only store the price in pence, if the floating point representation isn't needed in elastic

public class Car
{
    [Ignore] // do not store without rounding
    public decimal Price { get; set; }

    [Obsolete("Only for serialization/deserialization, use Price", error: true)]
    public long PencePrice {
        get { return Price*100L; }
        set { Price = value/100m; }
    }
}

@Mpdreamz
Copy link
Member

Love the solutions on this thread, we inherit our decimal serialization behaviour from Json.NET here which has no lossless support:

JamesNK/Newtonsoft.Json#1726

Another solution would be to create a custom json.net serializer that reads and writes longs while also mapping the Price field as scaled_float in Elasticsearch:

https://www.elastic.co/guide/en/elasticsearch/reference/current/number.html

This could be something we oughta be considering shipping out of the box support for, thoughts @codebrain @russcam ?

@codebrain
Copy link
Contributor

codebrain commented Nov 27, 2018

Currency storage is a little bit of a minefield. In most domains you very quickly get into the situation where you need to store both the currency representation and the ISO code.

https://github.com/dgg/nmoneys is a pretty decent resource on how to do this in .NET - I believe the value type used is decimal (https://github.com/dgg/nmoneys/blob/b9da020206cbdff51ff68bc53f599f68ee1e13df/src/NMoneys/Serialization/Data.net.cs#L39)

I terms of storing in ES... difficult if the data store does not support precise decimal values (128 bit).

I think if we try to work around this in the client we'd very quickly get into situations like:

  • Store the decimal value as a double in ES in order to use numeric-based functions (ranges / aggregations etc).
  • and Store the decimal value as a string in order to support lossless parsing when exact value is needed or Store decimal as two integer values to represent whole units and fractional units - both versions require some kind of parsing after retrieval.

I think we run the risk of being a little over-opinionated on our default implementation if we place this inside the client code - however, its a perfect candidate for an example serialiser.

@RevoluPowered
Copy link
Author

We store huge fractions of a pence unfortunately for what we're doing so 2dp is not enough. we use carbon per kWh, so fractions of a pence is extremely common in the smallest calculation, plus we do the calculation across literally millions of docs.
@Henr1k80

I'd love to see elastic fully support currency, obviously it's a headache but probably worth it if you develop a good solution as the code examples above have some overhead which across 1m+ docs would be slow, especially if you're using multiple fields.

@codebrain
Perhaps the ISO code wouldn't matter in some systems, could disable it by default and only enable it if someone specifies that the field requires it?

for example in my system we only use GBP for now, anything in USD etc would just be using a currency conversion lookup.

@codebrain
Copy link
Contributor

codebrain commented Nov 28, 2018

I would envisage currency support in two stages:

  1. Implement decimal types in ES core.
  2. Build a currency type based on an amalgam of string and decimal.

I'll ping our ES core developers and see where (and if) decimal support is on the roadmap.

@RevoluPowered your best option at the moment, if you want exact precision per individual value, is to store as a long or int where something like 1000000 = $1.000000 and convert accordingly.

@jpountz
Copy link

jpountz commented Nov 28, 2018

This issue has some context: elastic/elasticsearch#17006. In short aggregations use doubles internally, which makes supporting any data type more accurate than double probably deceptive in practice since values would be casted to the nearest double when aggregating. So we decided against supporting a decimal type.

@RevoluPowered
Copy link
Author

@codebrain yeah I will probably go with the long solution as a workaround for now, it would be nice to see something official though, this wrapper we have above is definitely not following the KISS principle.

@jpountz yeah, I get what you're saying but it might be the case that it needs a re-think, I'm fairly sure clients are storing calculations in C# using decimal, and blindly overlooking that its storing as a double in ES core, I think that could potentially go quite wrong.

Overall though, it's a lot of work so I'll leave it to you guys! Thanks for the feedback on this.

@russcam
Copy link
Contributor

russcam commented Nov 29, 2018

There is a section in the mapping documentation about mapping of .NET types to Elasticsearch field datatypes, with an important admonition about decimal type. This came about from discussion on #2463

@RevoluPowered
Copy link
Author

Just had a double precision issue in our aggregations and one of our configurations for our devices.

Is ES Core going to put decimal on the roadmap? if not I might have to make something down the custom serializer route for decimal.

If that is the case is there any way to handle anything other than double in aggregations, I assume it handles long, int but would a custom serializer function in aggs?

@russcam
Copy link
Contributor

russcam commented Mar 13, 2019

@RevoluPowered there isn't a way to plug your own serialization into aggregations in NEST; you can control the complete serialization/deserialization with the low level client, Elasticsearch.Net. Here's an example

var pool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));
var settings = new ConnectionConfiguration(pool);
var client = new ElasticLowLevelClient(settings);
var response = client.Search<StringResponse>(PostData.Serializable(
	new {
		size = 0,
		aggs = new 
		{
			genres = new
			{
				terms = new
				{
					field = "genre"
				}
			}
		}
	}
));

var jObject = JsonConvert.DeserializeObject<JObject>(response.Body);

This returns the response from Elasticsearch as a string, then deserializes to a Json.NET JObject.

I'm going to close this issue for now as there isn't anything further actionable in the client that we can do for decimal serialization.

@russcam russcam closed this as completed Mar 13, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants