Add unbiased rounding. #1

Open: wants to merge 1 commit into base: master
Makefile: 2 changes (1 addition, 1 deletion)
@@ -18,7 +18,7 @@ TESTS = $(wildcard test/sql/*.sql)
REGRESS_BRIN := $(shell pg_config --version | grep -qE "XL 9\.[5-9]| 10\.0" && echo brin-xl)
REGRESS_BRIN += $(shell pg_config --version | grep -E "9\.[5-9]| 10\.0" | grep -qEv "XL" && echo brin)
REGRESS_VERSION_SPECIFIC := $(shell pg_config --version | grep -qE "XL" && echo index-xl || echo index)
-REGRESS = $(shell echo aggregate cast comparison overflow $(REGRESS_BRIN) $(REGRESS_VERSION_SPECIFIC))
+REGRESS = $(shell echo aggregate cast comparison overflow round $(REGRESS_BRIN) $(REGRESS_VERSION_SPECIFIC))

REGRESS_OPTS = --inputdir=test --outputdir=test --load-extension=fixeddecimal


README.md: 119 changes (80 additions, 39 deletions)
@@ -6,6 +6,51 @@ Works with PostgreSQL 9.5 and Postgres-XL 9.5
Overview
--------

+XXX Remove this paragraph if accepted upstream. This feature fork adds
+half-even rounding. Rounding works correctly when parsing a string and when
+casting from the numeric data type. During division, extra digits are used to
+detect whether the result might lie on a midpoint. If so, a modulus is used to
+signal midpoint rounding when only zeros follow the midpoint digit (5). I am
+not sure whether the modulus reliably reports a remainder when the digits
+after the midpoint are zeros up to the scale of this calculation and only
+become non-zero beyond it. This is where base 2 can cause problems for
+someone expecting base 10 results.
+Regardless, this feature significantly reduces the bias present in truncation
+and has less bias than biased rounding modes such as round-half-up. This may
+be good enough for statistics, but financial calculations may want to cast to
+numeric before performing division or multiplication until this code is
+proven to match decimal (base 10) calculations using standard half-even
+rounding. These changes should not affect the performance advantage of this
+data type outside of the operations described above.
+In my opinion, per the principle of least surprise, an exact data type should
+not offer inexact operations without requiring something explicit, such as a
+type cast. I expect PostgreSQL to honor that principle. I suspect that
+truncation was originally chosen because it may produce consistent results
+between base 10 and base 2 operations, but I cannot think of many use cases
+where truncation would be preferable to a low-bias rounding method, even
+with the base 10 consistency that it offers.
+Note that "exact" here means the ability to represent base 10 values and to
+apply base 10 rounding rules when fractional digits underflow. Base 2 is not
+inexact in itself; some fractions, such as 0.1, simply have an exact decimal
+representation but no exact binary one. The reverse can also happen in
+practice, because every type has limited precision. Consider this, where
+float produces an exact answer and numeric does not:
+SELECT (1 * (987654321.0 * 123456789.0) / (0.123456789 / 998877665544332211.0)) / (987654321.0 * 123456789.0) * (0.123456789 / 998877665544332211.0) AS "Should be 1";
+SELECT (1::FLOAT * (987654321.0 * 123456789.0) / (0.123456789 / 998877665544332211.0)) / (987654321.0 * 123456789.0) * (0.123456789 / 998877665544332211.0) AS "Should be 1";
+The expectation that base 2 is less exact probably comes from the fact that
+we display base 2 numbers in decimal notation. If numbers were commonly
+displayed in binary notation, we would call float exact and decimal inexact.
+Likewise, decimal is inexact for dozenal (1/3 is exactly 0.4 in base 12).
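The representability point can be made concrete. A fraction in lowest terms has a finite expansion in base b exactly when every prime factor of its denominator divides b. The C sketch below tests that condition. It uses C because the extension is implemented in C. `terminates_in_base` and `gcd_u64` are hypothetical helper names and are not part of fixeddecimal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Greatest common divisor by Euclid's algorithm. */
static uint64_t gcd_u64(uint64_t a, uint64_t b)
{
    while (b != 0) {
        uint64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/*
 * Hypothetical helper: true if num/den has a finite expansion in `base`.
 * After reducing the fraction, this holds exactly when every prime
 * factor of the denominator also divides the base.
 */
static bool terminates_in_base(uint64_t num, uint64_t den, uint64_t base)
{
    assert(den != 0 && base >= 2);
    den /= gcd_u64(num, den);          /* reduce to lowest terms */

    uint64_t g;
    while ((g = gcd_u64(den, base)) > 1) {
        while (den % g == 0)           /* strip factors shared with base */
            den /= g;
    }
    return den == 1;                   /* no factor left that base lacks */
}
```

Here are some example results:

- `terminates_in_base(1, 10, 2)` is false, because 0.1 has no finite binary form.
- `terminates_in_base(1, 3, 12)` is true, because 1/3 is 0.4 in base 12.
- `terminates_in_base(1, 3, 10)` is false.
- `terminates_in_base(1, 8, 10)` is true. Since 2 divides 10, every finite binary fraction is also finite in decimal.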
+
+XXX Also, fix numeric and the round function so that they use unbiased
+rounding (someone might be working on this). For example, compare the results
+of these queries before and after this patch:
+SELECT (54::fixeddecimal / 0.03::fixeddecimal) / 54::fixeddecimal * 0.03::fixeddecimal AS "Should be 1";
+SELECT (54::numeric(8,4) / 0.03::numeric(8,4)) / 54::numeric(8,4) * 0.03::numeric(8,4) AS "Should be 1";
+
+XXX Fix capitalization inconsistencies: FixedDecimal, Fixeddecimal,
+fixeddecimal, FIXEDDECIMAL.
+
FixedDecimal is a fixed precision decimal type which provides a subset of the
features of PostgreSQL's builtin NUMERIC type, but with vastly increased
performance. Fixeddecimal is targeted to cases where performance and disk space
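The division step described in the Overview can be sketched as follows. Extra digits detect a possible midpoint, and a modulus then distinguishes a true tie. This is an illustration under stated assumptions, not the patch's code. It assumes:

- a scale of 2, so raw values are scaled by 100;
- three guard digits;
- non-negative operands;
- the GCC/Clang `__int128` extension.

All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MULTIPLIER 100   /* assumption: scale of 2 digits */
#define SKETCH_GUARD      1000  /* assumption: 3 extra guard digits */

/*
 * Hypothetical: divide two scaled values a / b and round half to even.
 * The quotient is computed with extra guard digits in a 128-bit
 * intermediate. The guard digits decide the rounding; only when they
 * read exactly 500 (a possible midpoint) is the remainder of the
 * division consulted to tell a true tie from a value just above it.
 * Non-negative operands only, to keep the sketch short.
 */
static int64_t fixed_div_guard(int64_t a, int64_t b)
{
    assert(a >= 0 && b > 0);
    __int128 n = (__int128) a * SKETCH_MULTIPLIER * SKETCH_GUARD;
    __int128 q_ext = n / b;                /* quotient incl. guard digits */
    __int128 rem = n % b;                  /* exact remainder, 0 <= rem < b */
    __int128 q = q_ext / SKETCH_GUARD;     /* quotient at the target scale */
    __int128 guard = q_ext % SKETCH_GUARD; /* the extra digits */

    if (guard > SKETCH_GUARD / 2 ||
        (guard == SKETCH_GUARD / 2 && (rem != 0 || q % 2 != 0)))
        q += 1;                            /* above midpoint, or tie to even */

    assert(q <= INT64_MAX);
    return (int64_t) q;
}
```

With these assumptions, the sketch behaves as follows:

- `fixed_div_guard(200, 300)` returns 67 (2.00 / 3.00 gives 0.67).
- `fixed_div_guard(1, 200)` returns 0. The quotient 0.01 / 2.00 = 0.005 is an exact tie and rounds to the even 0.00.
- `fixed_div_guard(10, 1999)` returns 1. The quotient 0.10 / 19.99 = 0.0050025... lies just above the midpoint, even though the guard digits read exactly 500.

Because the operands in this sketch are integers, its remainder is exact.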
@@ -31,7 +76,11 @@ from NUMERIC.
although the underlying type is unable to represent the full range
of the 17th significant digit.

-2. FIXEDDECIMAL always rounds towards zero.
+2. FIXEDDECIMAL uses base 2 instead of base 10 for operations. It is exact
+until a multiplication produces digits beyond the scale, or until you divide.
+From then on, digits past the scale are subject to base 2 representation and
+may round differently than a base 10 operation would. See the Caution section
+for details.

3. FIXEDDECIMAL does not support NaN.
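Item 2 says results stay exact until a multiplication produces digits beyond the scale. With scaled integers this is easy to see. Addition needs no rescaling. The raw product of two values carries twice the scale and must be divided by the multiplier, which is where rounding enters. A minimal sketch follows. It assumes a scale of 2 and the GCC/Clang `__int128` extension, and the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MULTIPLIER 100  /* assumption: scale of 2 digits */

/* Adding two scaled values needs no rescaling, so it is exact
 * (overflow aside). */
static int64_t fixed_add(int64_t a, int64_t b)
{
    return a + b;
}

/*
 * Hypothetical: multiply two scaled values. The raw product carries
 * scale 4, so it is divided by the multiplier; this is the only step
 * where digits past the scale are discarded and rounding is needed.
 */
static int64_t fixed_mul(int64_t a, int64_t b)
{
    __int128 p = (__int128) a * b;         /* 128-bit intermediate */
    __int128 q = p / SKETCH_MULTIPLIER;    /* truncates toward zero */
    __int128 r = p % SKETCH_MULTIPLIER;    /* discarded digits, sign of p */
    __int128 ar = r < 0 ? -r : r;

    /* Round half to even: compare the discarded part with half a unit. */
    if (2 * ar > SKETCH_MULTIPLIER ||
        (2 * ar == SKETCH_MULTIPLIER && q % 2 != 0))
        q += (p < 0) ? -1 : 1;

    assert(q >= INT64_MIN && q <= INT64_MAX);
    return (int64_t) q;
}
```

Here `fixed_mul(135, 50)` returns 68. The product 1.35 × 0.50 = 0.675 is a tie and rounds to the even 0.68, where truncation would give 0.67. `fixed_add` never rounds.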

@@ -71,57 +120,49 @@ FIXEDDECIMAL_MULTIPLIER should be set to 10000. Doing this will mean that the
absolute limits of the type decrease to a range of -922337203685477.5808 to
922337203685477.5807.

+The rounding is half-even to reduce bias and to match the expectations set by
+various accounting and computing standards, such as IEEE 754 (roundTiesToEven).
+
Caution
-------

FIXEDDECIMAL is mainly intended as a fast and efficient data type which will
suit a limited set of numerical data storage and retrieval needs. Complex
-arithmetic could be said to be one of fixeddecimal's limits. As stated above
-division always rounds towards zero. Please observe the following example:
-
-```
-test=# select '2.00'::fixeddecimal / '3.00'::fixeddecimal;
-?column?
-----------
-0.66
-(1 row)
-```
+arithmetic could be said to be one of FIXEDDECIMAL's limits. As stated above,
+FIXEDDECIMAL uses base 2 for operations. This means that when using division or
+multiplication with a non-zero fraction, FIXEDDECIMAL may produce results that
+are inconsistent with the same operations as performed in base 10.

-A workaround of this would be to perform all calculations in NUMERIC, and
-ROUND() the result into the maximum scale of FIXEDDECIMAL:
+A workaround is to perform any calculation that is not exclusively addition
+and subtraction in NUMERIC, and ROUND() the result into the maximum scale of
+FIXEDDECIMAL:

```
-test=# select round('2.00'::numeric / '3.00'::numeric, 2)::fixeddecimal;
+test=# select round('18.00'::numeric / '59.00'::numeric, 2)::fixeddecimal;
?column?
----------
-0.67
+0.31
(1 row)
```

-It should also be noted that excess precision is ignored by fixeddecimal.
-With a FIXEDDECIMAL_PRECISION of 2, any value after the 2nd digit following
-the decimal point is completely ignored rather than rounded. The following
-example demonstrates this:
-
-```
-test=# select '1.239'::fixeddecimal;
-fixeddecimal
---------------
-1.23
-(1 row)
-```
-
-It is especially important to remember that this truncation also occurs during
-arithmetic. Notice in the following example the result is 1120 rather than
-1129:
-
-```
-test=# select '1000'::fixeddecimal * '1.129'::fixeddecimal;
-?column?
-----------
-1120.00
-(1 row)
-```
+FIXEDDECIMAL uses an additional set of decimal digits to perform unbiased
+rounding. This set only exists when 128-bit arithmetic is used for
+multiplication and division. When these extra digits are a 5 followed only by
+zeros, the remainder of the division is checked for any non-zero value. This
+check may not be accurate 100% of the time. It is meant to remove bias the way
+an IEEE 754 data type does, but until the math is proven, it cannot offer the
+same guarantee.
+
+XXX Operations that can produce a fraction that overflows the scale, and so
+trigger a problem in this logic, should be removed, or should at least operate
+on and return the numeric data type instead of FIXEDDECIMAL. Based on the
+history of other data types, there is a good argument for making FIXEDDECIMAL /
+FIXEDDECIMAL return numeric instead of FIXEDDECIMAL. For example, money
+divided by money does not produce a money result. It is a ratio, which means
+that it loses the money unit. However, there is a case to be made that
+because this might not be a unit type, but rather a performance type, it may
+not be subject to unit rules. Unfortunately, this interpretation may be
+application- or user-specific and thus not have a single correct answer.

Installation
------------
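Compare the old truncating behavior with the new half-even rounding on the Caution examples. The following C sketch is a minimal illustration. It assumes a scale of 2, with raw values scaled by 100. It also assumes operands small enough that the scaled numerator fits in 64 bits. `div_truncate` and `div_half_even` are hypothetical names. The rounding decision uses the exact remainder of the integer division.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MULTIPLIER 100  /* assumption: scale of 2 digits */

/* Old behavior described in the README: drop digits past the scale. */
static int64_t div_truncate(int64_t a, int64_t b)
{
    assert(b != 0);
    return a * SKETCH_MULTIPLIER / b;      /* C division truncates toward zero */
}

/* Half-even rounding decided from the exact remainder. */
static int64_t div_half_even(int64_t a, int64_t b)
{
    assert(b != 0);
    int64_t n = a * SKETCH_MULTIPLIER;     /* assumes no overflow */
    int64_t q = n / b;
    int64_t r = n % b;                     /* same sign as n */
    /* magnitudes as unsigned, so negating INT64_MIN cannot overflow */
    uint64_t ar = r < 0 ? 0 - (uint64_t) r : (uint64_t) r;
    uint64_t ab = b < 0 ? 0 - (uint64_t) b : (uint64_t) b;

    if (ar > ab - ar || (ar == ab - ar && q % 2 != 0))
        q += ((n < 0) != (b < 0)) ? -1 : 1; /* step away from zero */
    return q;
}
```

Here is how the two functions compare:

- `div_truncate(200, 300)` gives 66 (0.66, as in the removed example).
- `div_half_even(200, 300)` gives 67.
- `div_half_even(1800, 5900)` gives 31, matching the `round()` workaround.

On exact ties such as 0.125 and 0.375, truncation always moves toward zero. Half-even sends them to 0.12 and 0.38, so the errors tend to cancel instead of accumulating.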

fixeddecimal--1.0.0_base.sql: 4 changes (2 additions, 2 deletions)
@@ -309,7 +309,7 @@ AS 'fixeddecimal', 'int4fixeddecimalmul'
LANGUAGE C IMMUTABLE STRICT;

CREATE FUNCTION int4fixeddecimaldiv(INT4, FIXEDDECIMAL)
-RETURNS DOUBLE PRECISION
+RETURNS FIXEDDECIMAL
AS 'fixeddecimal', 'int4fixeddecimaldiv'
LANGUAGE C IMMUTABLE STRICT;

@@ -405,7 +405,7 @@ AS 'fixeddecimal', 'int2fixeddecimalmul'
LANGUAGE C IMMUTABLE STRICT;

CREATE FUNCTION int2fixeddecimaldiv(INT2, FIXEDDECIMAL)
-RETURNS DOUBLE PRECISION
+RETURNS FIXEDDECIMAL
AS 'fixeddecimal', 'int2fixeddecimaldiv'
LANGUAGE C IMMUTABLE STRICT;
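With this change, `INT4 / FIXEDDECIMAL` and `INT2 / FIXEDDECIMAL` return FIXEDDECIMAL instead of DOUBLE PRECISION. Below is one way such a function could stay in exact integer arithmetic. It is a hypothetical illustration, not the patch's `int4fixeddecimaldiv`. It assumes a scale of 2, the GCC/Clang `__int128` extension, and half-even rounding of the final quotient. The integer operand is unscaled and the divisor is stored scaled by the multiplier M, so the scaled result is a × M × M / b.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MULTIPLIER 100  /* assumption: scale of 2 digits */

/*
 * Hypothetical sketch of INT4 / FIXEDDECIMAL returning FIXEDDECIMAL.
 * `a` is a plain integer; `b` is the raw fixeddecimal value (scaled by
 * the multiplier). The scaled result is therefore a * M * M / b.
 */
static int64_t int4_fixed_div_sketch(int32_t a, int64_t b)
{
    assert(b != 0);
    __int128 n = (__int128) a * SKETCH_MULTIPLIER * SKETCH_MULTIPLIER;
    __int128 q = n / b;                 /* truncates toward zero */
    __int128 r = n % b;                 /* same sign as n */
    __int128 ar = r < 0 ? -r : r;
    __int128 ab = b < 0 ? -(__int128) b : (__int128) b;

    /* Round half to even, stepping away from zero when rounding up. */
    if (ar > ab - ar || (ar == ab - ar && q % 2 != 0))
        q += ((a < 0) != (b < 0)) ? -1 : 1;

    assert(q >= INT64_MIN && q <= INT64_MAX);
    return (int64_t) q;
}
```

Under these assumptions, 2 / 3.00 yields raw 67 (0.67) instead of a double such as 0.666..., and 54 / 0.03 yields exactly 1800.00.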
