2ndQuadrant · pgstuff · Feb 11, 2016
diff --git a/Makefile b/Makefile
@@ -18,7 +18,7 @@ TESTS = $(wildcard test/sql/*.sql)
 REGRESS_BRIN := $(shell pg_config --version | grep -qE "XL 9\.[5-9]| 10\.0" && echo brin-xl)
 REGRESS_BRIN += $(shell pg_config --version | grep -E "9\.[5-9]| 10\.0" | grep -qEv "XL" && echo brin)
 REGRESS_VERSION_SPECIFIC := $(shell pg_config --version | grep -qE "XL" && echo index-xl || echo index)
-REGRESS = $(shell echo aggregate cast comparison overflow $(REGRESS_BRIN) $(REGRESS_VERSION_SPECIFIC))
+REGRESS = $(shell echo aggregate cast comparison overflow round $(REGRESS_BRIN) $(REGRESS_VERSION_SPECIFIC))
 
 REGRESS_OPTS = --inputdir=test --outputdir=test --load-extension=fixeddecimal
 

diff --git a/README.md b/README.md
@@ -6,6 +6,51 @@ Works with PostgreSQL 9.5 and Postgres-XL 9.5
 Overview
 --------
 
+XXX Remove this paragraph if accepted upstream. This feature fork adds half even
+    rounding. The rounding works accurately when parsing a string and casting
+    from the numeric data type. During division, extra digits are used to detect
+    if the result might be on a midpoint. If so, a modulus is used to indicate
+    midpoint rounding if there are only 0's past the midpoint (5). I am not sure
+    whether the modulus can accurately indicate a remainder when the digits
+    past the midpoint are a string of 0's that changes to non-0 digits beyond
+    the scale of this calculation. This is where base 2 can cause
+    problems for someone expecting base 10 results.
+    Regardless, this feature significantly removes the bias present in
+    truncation and has less bias than the more biased rounding types. This may
+    be good enough for statistics, but financial calculations may want to type
+    cast to numeric before performing division or multiplication until this code
+    is proven to match decimal (base 10) calculations with the standard half
+    even rounding. These changes should not impact the performance advantage of
+    this data type outside of the operations described above.
+    It is my opinion that an exact data type should not offer inexact operations
+    without requiring something explicit like a type cast per the principle of
+    least surprise. I expect PostgreSQL not to violate that principle. I expect
+    that truncation was originally chosen because it may produce consistent
+    results between base 10 and base 2 operations, but I cannot think of many
+    use cases where truncation would be preferable over a low bias rounding
+    method, even with the base 10 consistency that it offers.
+    Note that exact means the ability to represent base 10 and its rounding
+    rules when fraction data underflows. Base 2 is not inexact in itself, it is
+    just that there are fractions in base 10 that can be exactly represented as
+    a number while base 2 cannot exactly represent the same number. But this
+    rule is true the other way around. Consider this, where float produces an
+    exact answer and numeric does not:
+SELECT (1 * (987654321.0 * 123456789.0) / (0.123456789 / 998877665544332211.0)) / (987654321.0 * 123456789.0) * (0.123456789 / 998877665544332211.0) AS "Should be 1";
+SELECT (1::FLOAT * (987654321.0 * 123456789.0) / (0.123456789 / 998877665544332211.0)) / (987654321.0 * 123456789.0) * (0.123456789 / 998877665544332211.0) AS "Should be 1";
+    The expectation that base 2 is less exact is probably due to the fact that
+    we display base 2 numbers in decimal notation. If numbers were commonly
+    displayed in binary notation, we would call float exact and decimal inexact.
+    Likewise, decimal is inexact for dozenal.
+
+XXX Also, fix numeric and the round function so that they use unbiased rounding
+    (someone might be working on this). For example, check the results of these
+    before and after this patch:
+SELECT (54::fixeddecimal / 0.03::fixeddecimal) / 54::fixeddecimal * 0.03::fixeddecimal AS "Should be 1";
+SELECT (54::numeric(8,4) / 0.03::numeric(8,4)) / 54::numeric(8,4) * 0.03::numeric(8,4) AS "Should be 1";
+
+XXX Fix capitalization inconsistencies: FixedDecimal, Fixeddecimal,
+    fixeddecimal, FIXEDDECIMAL.
+
 FixedDecimal is a fixed precision decimal type which provides a subset of the
 features of PostgreSQL's builtin NUMERIC type, but with vastly increased
 performance. Fixeddecimal is targeted to cases where performance and disk space
@@ -31,7 +76,11 @@ from NUMERIC.
 	although the underlying type is unable to represent the full range of
 	of the 17th significant digit.
 
-2.	FIXEDDECIMAL always rounds towards zero.
+2.	FIXEDDECIMAL uses base 2 instead of base 10 for operations. Results are
+	exact until you divide, or multiply so that the result has digits past the
+	scale. After that, digits past the scale are subject to base 2
+	representation and may round differently than a base 10 operation would.
+	See the Caution section for details.
 
 3.	FIXEDDECIMAL does not support NaN.
 
@@ -71,57 +120,49 @@ FIXEDDECIMAL_MULTIPLIER should be set to 10000. Doing this will mean that the
 absolute limits of the type decrease to a range of -922337203685477.5808 to
 922337203685477.5807.
 
+The rounding is half even to reduce bias and to match the rounding expectations
+set by various accounting and computer standards.
+
 Caution
 -------
 
 FIXEDDECIMAL is mainly intended as a fast and efficient data type which will
 suit a limited set numerical data storage and retrieval needs. Complex
-arithmetic could be said to be one of fixeddecimal's limits. As stated above
-division always rounds towards zero. Please observe the following example:
-
-```
-test=# select '2.00'::fixeddecimal / '3.00'::fixeddecimal;
- ?column?
-----------
- 0.66
-(1 row)
-```
+arithmetic could be said to be one of FIXEDDECIMAL's limits. As stated above,
+FIXEDDECIMAL uses base 2 for operations. This means that division, or
+multiplication by a value with a non-zero fraction, may produce results that
+differ from the same operations performed in base 10.
 
-A workaround of this would be to perform all calculations in NUMERIC, and
-ROUND() the result into the maximum scale of FIXEDDECIMAL:
+A workaround for this is to perform any calculation that is not exclusively
+addition and subtraction in NUMERIC, and ROUND() the result to the maximum
+scale of FIXEDDECIMAL:
 
 ```
-test=# select round('2.00'::numeric / '3.00'::numeric, 2)::fixeddecimal;
+test=# select round('18.00'::numeric / '59.00'::numeric, 2)::fixeddecimal;
  ?column?
 ----------
- 0.67
+ 0.31
 (1 row)
 ```
 
-It should also be noted that excess precision is ignored by fixeddecimal.
-With a FIXEDDECIMAL_PRECISION of 2, any value after the 2nd digit following
-the decimal point is completely ignored rather than rounded. The following
-example demonstrates this:
-
-```
-test=# select '1.239'::fixeddecimal;
- fixeddecimal
---------------
- 1.23
-(1 row)
-```
-
-It is especially important to remember that this truncation also occurs during
-arithmetic. Notice in the following example the result is 1120 rather than
-1129:
-
-```
-test=# select '1000'::fixeddecimal * '1.129'::fixeddecimal;
- ?column?
-----------
- 1120.00
-(1 row)
-```
+FIXEDDECIMAL uses an additional set of decimal digits to perform unbiased
+rounding. This set only exists when using 128-bit arithmetic for multiplication
+and division. When this additional set begins with 5 and the remaining digits
+are 0's, the remainder of the division is checked for any non-zero value. This
+check may not always be accurate. It is used to remove bias like an IEEE 754
+data type, but until the math is proven, it cannot offer the same guarantee.
+
+XXX Operations that can produce a fraction that overflows the scale and causes a
+    problem for this logic should be removed, or at least operate with and
+    return the numeric data type instead of FIXEDDECIMAL. Based on the history
+    of other data types, there is a good argument for making FIXEDDECIMAL /
+    FIXEDDECIMAL return numeric instead of FIXEDDECIMAL. For example, money
+    divided by money does not produce a money result. It is a ratio, which means
+    that it loses the money unit. However, there is a case to be made that
+    because this might not be a unit type, but rather might be a performance
+    type, it may not be subject to unit rules. Unfortunately, this
+    interpretation may be application/user specific and thus not have a single
+    correct answer.
 
 Installation
 ------------

diff --git a/fixeddecimal--1.0.0_base.sql b/fixeddecimal--1.0.0_base.sql
@@ -309,7 +309,7 @@ AS 'fixeddecimal', 'int4fixeddecimalmul'
 LANGUAGE C IMMUTABLE STRICT;
 
 CREATE FUNCTION int4fixeddecimaldiv(INT4, FIXEDDECIMAL)
-RETURNS DOUBLE PRECISION
+RETURNS FIXEDDECIMAL
 AS 'fixeddecimal', 'int4fixeddecimaldiv'
 LANGUAGE C IMMUTABLE STRICT;
 
@@ -405,7 +405,7 @@ AS 'fixeddecimal', 'int2fixeddecimalmul'
 LANGUAGE C IMMUTABLE STRICT;
 
 CREATE FUNCTION int2fixeddecimaldiv(INT2, FIXEDDECIMAL)
-RETURNS DOUBLE PRECISION
+RETURNS FIXEDDECIMAL
 AS 'fixeddecimal', 'int2fixeddecimaldiv'
 LANGUAGE C IMMUTABLE STRICT;
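
For reference, the half even behaviour that the README changes describe can be sketched on scaled integers. This is not the patch's implementation: instead of extra guard digits plus a modulus check, the sketch compares twice the exact remainder against the divisor, which is one way to decide a midpoint without ambiguity. The names `MULTIPLIER`, `div_half_even`, `fixed_div`, and `fixed_mul` are hypothetical, and a scale of 2 (multiplier of 100) is assumed. Python integers are arbitrary precision, so the 128-bit intermediate that C code would need does not appear here.

```python
# Assumed scale of 2 decimal digits: a value v is stored as round(v * 100).
MULTIPLIER = 100


def div_half_even(num: int, den: int) -> int:
    """Integer division of num by den, rounding the quotient half to even."""
    if den == 0:
        raise ZeroDivisionError("division by zero")
    # Work on magnitudes so that rounding is symmetric around zero.
    sign = -1 if (num < 0) != (den < 0) else 1
    n, d = abs(num), abs(den)
    q, r = divmod(n, d)
    twice = 2 * r
    # Above the midpoint: round up. Exactly on it: round to the even quotient.
    if twice > d or (twice == d and q % 2 == 1):
        q += 1
    return sign * q


def fixed_div(a: int, b: int) -> int:
    """Divide two scaled integers and return a scaled integer."""
    # Scale the dividend first so that the quotient keeps the same scale.
    return div_half_even(a * MULTIPLIER, b)


def fixed_mul(a: int, b: int) -> int:
    """Multiply two scaled integers and return a scaled integer."""
    # The raw product carries the scale twice, so divide one factor back out.
    return div_half_even(a * b, MULTIPLIER)
```

Under these assumptions, `fixed_div(1800, 5900)` yields `31`, i.e. 0.31, which agrees with the NUMERIC workaround shown in the README hunk, and `fixed_div(100, 800)` yields `12` because 0.125 sits exactly on a midpoint and 12 is even.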