Background for precision loss due to change to scientific notation #798
Replies: 17 comments 21 replies
-
I did the conversion and I kept every digit of precision that was in any conversion multiplier. This was about counting zeros and nothing else.

Jack Hodges, Ph.D., Arbor Studios

On Oct 30, 2023, at 3:13 AM, Florian Kleedorfer wrote:
Hi all,
In the latest release notes, @steveraysteveray writes:
We decided to return to using scientific notation, but only for very large (>10^5) and very small (<10^-5) values. While this issue has come up before, we currently believe the ability to express numbers without many zeroes outweighs the small errors (typically in the 4th decimal place or smaller) introduced in calculations. Users should be aware, of course, that critical applications should always look to authoritative sources for numbers such as conversion factors and constant values, such as ISO or NIST.
I am trying to understand this move and what it means for our use case, in which the 4th decimal place may be very relevant at times. I have reviewed the changes made to the units file in commit d82ee0684858d6035baefd5f176d27135ce4b7a7, and I have two questions:
1. I don't see why there should be a loss of precision. All numbers that I checked visually do seem correct and retain the same level of precision, at least in the units file.
2. However, if there is some systematic loss of precision introduced by this change, I don't understand the reasoning. Should we not strive for the highest possible level of correctness? I understand that there may always be bugs and so users have to be cautious, but systematic errors would be an unexpected design choice. Is it a trade-off between systematic errors and manual entry errors?
Cheers!
Florian
-
@fkleedorfer, I do not profess to be an expert here, but the tradeoffs seem to be detailed here and here.
-
Thanks for explaining. I'll try to sum up what I have understood so far:
If that is correct then I must admit I wish the choice had been the opposite: decimal notation. The way it is, QUDT is saying: here are the numbers, please don't use them. In order to figure out what that means for our library downstream of QUDT I made a quick JUnit test:
Which convinces me that QUDTLib does not need to undergo any changes to accommodate the switch to scientific notation. This immunity is owed to the robust implementation of BigDecimal. Other projects on other platforms might be in a different situation. However, the stability problems will sooner or later bite anyone who follows section 4 or 5 of the How-To, i.e., who uses the multipliers/offsets/constants directly in SPARQL. Having said that, I am possibly unaware of the advantages of using scientific notation that were factored into the decision, and being made aware of them might change my opinion. As I see it now, I'd recommend walking back that decision.
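The JUnit test itself is not shown above, but the gist can be sketched roughly as follows (the multiplier value below is hypothetical, chosen only for illustration, not taken from QUDT):

```java
import java.math.BigDecimal;

public class NotationDemo {
    public static void main(String[] args) {
        // Hypothetical conversion multiplier lexical form (illustration only).
        String lexical = "4.4482216152605";

        // BigDecimal preserves every digit of the lexical form, whether it
        // is written in plain decimal or in scientific notation.
        BigDecimal plain = new BigDecimal(lexical);
        BigDecimal sci = new BigDecimal("4.4482216152605E0");
        System.out.println(plain.compareTo(sci) == 0); // true

        // Routing the same lexical form through double (as an
        // xsd:double-typed literal typically is in most stacks) snaps the
        // value to the nearest binary64, which differs from the exact
        // decimal value.
        BigDecimal viaDouble = new BigDecimal(Double.parseDouble(lexical));
        System.out.println(viaDouble.compareTo(plain) == 0); // false
    }
}
```

This is why a BigDecimal-based library is immune to the notation switch, while anything that interprets the literals as doubles is not.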
-
AFAICT the reason to use scientific notation is only to avoid potential errors counting large numbers of consecutive zeros, either before or after the significant figures; i.e., this is a crutch to help people manually editing, which should not happen often. To me that seems the wrong optimisation. Integrity in the values is more important. So, for the reasons explained by @fkleedorfer, xsd:decimal is a better solution.
-
@dr-shorthair, ...except for the very small and very large numbers that require xsd:double, right?
-
As long as it 'never' needs to be edited manually, by anyone (including casual submissions), then having decimal values is fine. Personally, I would like to be able to see what the value is every once in a while, and counting zeros on either side of the decimal point is simply ridiculous. If we had a SPARQL insert that could convert to any numeric value, that we could execute when a PR is submitted, that might be the way to go.

Jack Hodges, Ph.D., Arbor Studios

On Nov 1, 2023, at 3:49 PM, Simon Cox wrote:
AFAICT the reason to use scientific notation is only to avoid potential errors counting large numbers of consecutive zeros - either before or after the significant figures. i.e. this is a crutch to help people manually editing. Which should not happen often.
To me that seems the wrong optimisation. Integrity in the values is more important. So for the reasons explained by @fkleedorfer xsd:decimal is a better solution.
-
There should be no exceptions. All decimal if we are going this route, with the provisos I mentioned in my earlier response. As I mentioned today, it is almost worth having both a decimal value and a double value.

Jack Hodges, Ph.D., Arbor Studios

On Nov 1, 2023, at 4:08 PM, steveraysteveray wrote:
@dr-shorthair, ...except for the very small and very large numbers that require xsd:double, right?
-
Certainly, but isn't that a presentation issue? Is it worth sacrificing precision and numeric stability?
-
As part of this problem set is about ensuring the correctness of community submissions, how about requiring two unit tests for each numeric value that involves other units/constants, and including those in the upcoming GitHub CI pipeline? By linking to two other (existing) units/constants we would implicitly get tests for them, and in the aggregate, a network of tests covering the whole database (assuming we'd also do this for existing entities). This would ensure correctness beyond a reviewer not making mistakes counting zeros. For example, the pipeline could have a query that calculates conversions between units and a CSV file with inputs and expected outputs. For each unit, there could be two entries in the CSV file, such as:
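The example entries did not survive in this thread; the following is only a hedged sketch of what such a check might look like. The unit names, multiplier values, and CSV rows below are invented for illustration, and a real pipeline would look the multipliers up in the QUDT graph rather than hard-coding them:

```java
import java.math.BigDecimal;
import java.math.MathContext;

public class ConversionCsvCheck {
    // Hypothetical CSV rows: fromUnit,toUnit,input,expectedOutput
    static final String[] ROWS = {
        "unit:KiloM,unit:M,1,1000",
        "unit:M,unit:KiloM,250,0.25",
    };

    // Hypothetical lookup of conversion multipliers relative to the base
    // unit; a real pipeline would query these from the QUDT graph.
    static BigDecimal multiplier(String unit) {
        switch (unit) {
            case "unit:KiloM": return new BigDecimal("1000");
            case "unit:M":     return BigDecimal.ONE;
            default: throw new IllegalArgumentException(unit);
        }
    }

    public static void main(String[] args) {
        for (String row : ROWS) {
            String[] f = row.split(",");
            BigDecimal input = new BigDecimal(f[2]);
            BigDecimal expected = new BigDecimal(f[3]);
            // valueTo = valueFrom * mult(from) / mult(to), in exact
            // decimal arithmetic (DECIMAL128 bounds the division).
            BigDecimal actual = input.multiply(multiplier(f[0]))
                    .divide(multiplier(f[1]), MathContext.DECIMAL128);
            System.out.println(actual.compareTo(expected) == 0);
        }
    }
}
```

Each submitted unit would then be exercised in both directions against two existing units, and failures would surface in CI before a reviewer has to count zeros.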
-
It is unacceptable to compromise precision, but it seems that the stability problem is not in the model but in the system that interprets it. The question I ask, given the stability example provided, is how many significant digits anyone is responsible for. Even saying this, it is more a rhetorical question because precision should not be compromised.

Jack Hodges, Ph.D., Arbor Studios

On Nov 2, 2023, at 1:50 AM, Florian Kleedorfer wrote:
Personally, I would like to be able to see what the value is every once in a while, and counting zeros, on either side of the decimal point is, simply, ridiculous.
Certainly, but isn't that a presentation issue? Is it worth sacrificing precision and numeric stability?
-
Bing (i.e. ChatGPT) does conversions from decimal to scientific notation.
-
It is always worth asking ChatGPT things, but you (we all) always have to check. We should ask for a conversion from xsd:double to xsd:decimal since that is what we currently have. I tried casting xsd:decimal to xsd:double in SPARQL but it didn't work.

Jack Hodges, Ph.D., Arbor Studios

On Nov 28, 2023, at 1:31 PM, Simon Cox wrote:
I just checked and the answer from Bing is wrong, so I just disproved the point I thought I was making
-
We really need to write the function in SPIN or SHACL so that we can apply it to the entire graph anyway.

Jack Hodges, Ph.D., Arbor Studios

On Nov 28, 2023, at 1:26 PM, Simon Cox wrote:
Bing (i.e. ChatGPT) does conversions from decimal to scientific notation.
This makes it easy to count zeroes when checking data manually.
Question: What is 0.0000000000000000000000000000000000000000000000000000346789 in scientific notation
Answer:
[The number 0.0000000000000000000000000000000000000000000000000000346789 in scientific notation is 3.46789 x 10^-53 ](https://www.calculatorsoup.com/calculators/math/scientific-notation-converter.php)[1](https://www.calculatorsoup.com/calculators/math/scientific-notation-converter.php).
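Such a conversion need not rely on a chatbot at all. A minimal sketch in Java (illustrative only; the thread asks for SPIN or SHACL, which would wrap the same arithmetic, and `toScientific` is a hypothetical helper, not a QUDT or QUDTLib API):

```java
import java.math.BigDecimal;

public class SciNotation {
    // Convert a decimal lexical form to normalized scientific notation
    // (exactly one nonzero digit before the point), so nobody has to
    // count zeros by hand.
    static String toScientific(String lexical) {
        BigDecimal x = new BigDecimal(lexical);
        if (x.signum() == 0) return "0.0E0";
        // For a BigDecimal, precision() counts significant digits and
        // scale() counts digits after the point, so the normalized
        // exponent is precision - scale - 1.
        int exponent = x.precision() - x.scale() - 1;
        BigDecimal mantissa = x.movePointLeft(exponent);
        return mantissa.toPlainString() + "E" + exponent;
    }

    public static void main(String[] args) {
        System.out.println(toScientific("0.0346789")); // 3.46789E-2
        System.out.println(toScientific("100000"));    // 1.00000E5
    }
}
```

Because BigDecimal parses the lexical form exactly, this conversion is lossless, unlike anything routed through double.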
-
Maybe the answer is another optional property for the value in scientific notation? The property could be called 'qudt:valueInScientificNotation'. This could be used if preferred by specific software executables.
-
After discussion in the Board, our intention is to create variants of the following relations:
...namely:
These new relations will be used to identify the scientific notation version of each of the respective values. So, each Unit instance will have qudt:conversionMultiplier and qudt:conversionMultiplierSN. The former will be expressed as a decimal number (xsd:decimal) and the latter in scientific notation that is commonly interpreted as an xsd:double. Applications can choose whichever value they like for computation, display, etc. ConstantValue instances will be handled similarly. This change will likely take place in March 2024, time permitting.
-
Good point. We only use it for Celsius and Fahrenheit currently, but we should treat it the same way.
-
Closed with #870, which concludes the implementation of the decisions made in this thread.
-