-
Notifications
You must be signed in to change notification settings - Fork 9.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
- adds decimal format #3190
- adds decimal format #3190
Conversation
I guess encoding floating point as strings allows retaining leading/trailing zeroes, but if this is for code generation then how does one know which type to generate? Is that made clear in the ISO standard? I can't read ISO standards because I don't have the budget to buy access as a freelancer. Is there a public specification that can be referenced? |
I'll defer the question to @ioggstream who made the suggestion originally as to which standard to use. maybe @ralfhandl can also help considering his comment I was merely trying to capture the different discussions in a single PR, and later on splat some of the formats to separate PRs to facilitate the main one's merge as you know. As you mentioned the main value of this format over float is to retain the trailing zeros which "maintains the level of precision". Something anybody working with money, scientific data (measurements), etc... will appreciate. It will come at a cost in terms of serialization format. (i.e. using string as opposed to numbers, and having to parse things "twice") |
The draft link @ioggstream supplied as a free alternative no longer works. I do notice that he requested fixed point decimal and not floating point, so I feel like floating point is a.) unnecessary scope creep, and b.) an undesirable overload of a single format. If people want floating point in strings, add a |
2464efc
to
02cc280
Compare
Also, how is anyone expected to tell if the value is intended to be fixed vs floating? type: string
format: decimal applied to instance:
Is this fixed point or floating point, and why? |
@handrews Some background: SQL has two types / type families for numbers that are stored with a decimal mantissa, which is quite useful for financial data:
These numbers can natively be represented in many programming languages, with the notable exception of JavaScript (as of today, one can hope that BigDecimal will arrive some time after BigInt). They also can be represented as JSON numbers (unless you want to JSON.parse() it with JavaScript), but to be on the safe side it is more robust to allow wrapping them in JSON strings. Now we need to express that a JSON string will contain such a decimal number: type: string
format: decimal
x-sap-precision: 23
x-sap-scale: 2 and type: string
format: decimal
x-sap-precision: 34
# no scale We are currently using SAP-specific extensions for OpenAPI 3.0 and will contact you for how to define a corresponding JSON Schema vocabulary for OpenAPI 3.1. |
@ralfhandl that's helpful but still does not answer the most important question: Why overload If you insist on overloading it, again, without relying on someone else's extension keyword that may or may not be present, how do I tell, given schema And for that matter, if we're using This proposal for A format should solve one clearly defined problem, unambiguously. If you want to solve multiple propblems, you need multiple formats or a vocabulary. You should not mix them. |
@ralfhandl I should qualify this assertion:
Realistically, it might be easier to find an implementation with extensible However:
So, if the point of a format is to indicate the data type into which the instance value should be parsed, whether that is through code generation, or through reading the Based on that wikipedia article, it's not possible from just that information to determine whether DECIMAL or DECFLOAT is intended. The format needs to indicate which is intended, or it's useless – different applications will behave differently given the same input. |
The interesting aspect here is that languages don't seem to agree with one another:
|
@baywet yeah, wow that's a mess, isn't it? All the more reason that the format specification needs to say exactly what is meant, so that implementers can make the appropriate effort to produce compliant behavior, whatever that means. We don't want the behavior to be "whatever the underlying language uses", because then the schema will behave inconsistently. There are several problems to solve here: Number in a string or mathematical guarantees?If all we are doing is indicating that it's safe to parse the string as a base-10 number of some sort, then we should just have a Fixed vs floating?If we want to make mathematical guarantees, fixed-point and floating-point are radically different, and I cannot conceive of a use case where I would both care about the mathematical properties and not care whether those were fixed-point or floating-point. Given the variations in what "decimal" means, I think we need a different name or names that are less ambiguous (see next section for ideas). Each name needs to correspond to clear behavior. If someone really wants a "fixed or floating, I don't care", that's what This is a hill I wil die on. There is no conceivable way in which I will voluntarily approve a format that spans both fixed and floating point. I'm not part of @OAI/tsc so folks can overrule me if they want, and I'll take it in good grace, but nothing anyone has said here has given me any reason to change my mind in the slightest. Does the format constrain the range and precision, or is that application-interpreted?If I have a schema of: type: array
items:
type: string
format: fixed-point and an instance of: - 1.22
- 111.2 What is happening? We have a number with a precision of 3 and a scale of 2, and one with a precision of 4 and a scale of 1. Neither of which likely matches the actual storage precision and scale of any data type. This, btw, is why no one has succeeded in defining a fixed-point format despite it having been requested repeatedly for years. I've yet to see anyone propose a workable, interoperable approach grounded in real use cases. For floating point, there are interchange formats For fixed point, the tricky thing is that while precision is probably dictated by storage format, scale can be anything allowed within that precision. Enumerating different scales in the format name is impractical (this is one of many reasons that vocabularies are a better solution).
|
Interestingly enough the IEE-754 seems to define decimals as floating points as well.
Also most languages seem to leverage 128 bits behind the scenes if that helps. |
Yes, I was proposing that if we want floating-point-as-strings that But the original ask was fixed-point-as-strings. As far as I'm concerned, all of this floating point stuff is a distraction from the original ask and should be a separate issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this point I think it is clear that this concept is not ready. I'm going to ask that this PR be closed and two issues be filed: one for fixed-point numbers as strings, and one for floating-point numbers as strings.
Questions that need to be resolved (separately in each of the two issues) include
- does this concept need a
format
? - do there need to be multiple formats based on standardized storage sizes?
- how does the application determine the precision and scale to use?
- does the formatting (leading/trailing zeros) of the instance affect the precision and scale used, and if so, how?
- what should the format(s) be called?
These questions need to be answered by people who will be using these formats (a group of people that does not include me). A PR that conflates two major use cases and has a complex conversation already is not the place to debate these questions. We should be using issues to hammer out what is really needed, and then move to PRs when the only work remaining is how to word the pages.
I'm marking this review as "request changes" but I'd prefer to mark it "reject in favor of further work in issues." @MikeRalphson @darrelmiller @ralfhandl what do you think?
@handrews my goal here was merely to capture discussions that already happened here #889 #845 and keep the registry with "what's in the wild" up to date. That being said it seems that most descriptions using the decimal format today are actually azure related. I suspect that's a reflection of the dotnet decimal type for the most part, which probably is itself tied to some kind of data storage ultimately. I do understand the harm that putting something in the registry that's not "specific enough" could make. We've just spent a couple of weeks cleaning that up... Could we define this format as equivalent to IEE-754 decimal128, floating or fixed, since more client languages seem to lean towards that? And if people need more precision let them contribute to JSON schema so additional attributes about the size/precision can be added to the schema? |
@baywet I remain vehemently against anything that combines fixed point and floating point, as: This is at least the third time over the years that somoene has tried to add a format of this sort, and the result is the same every time: I raise a bunch of concerns pointing out that it's not clear how to implement the proposal, and the person making the proposal refuses to address them and keeps trying to get me to approve something that i think is too vague to implement. And what's lacking here, as in all past attempts, is someone who actually is using this format themselves who can answer the hard questions in way that is grounded in real usage and results in a definition of the format that anyone can implement without additional keywords. I don't think it's a good idea to put a format in the registry just because it exists in the wild - the registry is for making people aware of formats that can be used in a reasonably interoperable way. That is not what this format is doing. I still don't even understand why you added floating-point in the first place, and am not at all convinced that it's relevant to the original ask or the existing usage in the wild. I don't have anything else to say on this proposal, so I am handing this off to @OAI/tsc . I'll go with whatever they decide, but I think my position is clear. |
@handrews I appreciate your effort to drive precision work here, thanks a lot with your help on this (and other) pull requests. I think the (potential) disconnect here might come from multiple factors:
Would it help if we got a formal direction to follow from TSC on that last point? And documented that on the index page of the format registry? We can always come back to the nitty gritty details of this specific format later on. |
@handrews I'm fine with splitting into fixed and floating-point decimals. I like the idea of using |
@handrews Fixed-point decimals DECIMAL(precision,scale) can only be fully described by specifying both precision and scale in addition to the bit "this is a fixed-point decimal". We could define a family of formats A validator can extract precision and scale from the format name and use them for validating a numeric value. This proposal doesn't need additional keywords because all information is encoded in the format names. I'm open for cosmetic changes on how to encode precision and scale in the format name. |
@baywet my frustration (and I apologize for letting that show a bit too much) doesn't have much to do with you at all. While I do feel like there has been a bit of repetition without much progress, that is equally true of my own comments. Most of my frustration has to do with the fact that I don't think The only specifically frustrating thing here is that, unless I've missed it, you have never directly addressed my concern with mixing fixed and floating point. @ralfhandl has said he is OK with splitting them, so I hope you are now on board with that. If you are not, I have tried to show how you can make a case. For example, if you think this is what is happening in the wild, you need to show that in detail, not just tell me that that's what you are doing and expect me to accept it – I've never see a fixed point / floating point mix so I don't understand your argument. If the usage is inconsistent then I don't see a point in registering it. My last comment wasn't intended as a storming off in anger, but as a recognition that we've taken this as far as I can take it and still contribute productively. I would like to hand this off to someone before I start showing more frustration, becasue I do not have the authority to reject or fully approve this, and I don't personally care about fixed point and floating point, nor do I have the necessary expertise to fully evaluate proposals. I feel like I became the person responding because I was responding about the encoding formats, but those are very different beasts. I am begging someone else to take this over for me as I have already spent way more time researching this topic than I intended. |
And if whoever takes it over wants to just approve it as-is, I am truly 100% fine with that. I am just not comfortable with approving it myself, and the push to accept something is increasing my discomfort/frustration. So I am trying to get out of the way. |
Thanks @handrews for all your input, and sorry to have left you to it as it were. My only concerns with registering a new format are:
My feeling is we could add decimal as a string representing a fixed point number and get some benefit, or drop the proposal until we have more consensus on the original issue. In one corpus, decimal is the 16th most used format (after many of the 'built-in' ones). Which is both an argument for adopting it, and not blindly assuming what people's intent was when using it. I can try and pull up sibling example properties of format: decimal from the data set if that would help? @baywet @ralfhandl |
Eh, I tend to put these things on myself and then get frustrated about it. I'm trying to work on not doing that so much. But thank you for both the sentiment and stepping in! 🙂 |
@handrews no worries, in case I didn't make this clear: I don't "care" about floating & fixed point vs fixed point only. I'm just trying to facilitate the work and get those formats published. While doing so, the only reason I included floating point in the description was because of an earlier comment from @ralfhandl in the previous PR. But I'm happy to remove it. I think at this point we've all reached a consensus that decimal, which is already in use, most likely represents a fixed point decimal number. And because we don't want to introduce new semantics/keywords at this point, the information of "how it is structured" should come from a convention that relies on a broadly adopted standard (kind of what we did with the different date formats). Is that a fair representation of where things are at? Assuming that's where we are all at, we'd need to remove the floating point part from that format description. And my suggestion is to align on a 128 bits wide type, as it seems to be what most languages are using today. Now, I don't know enough about that space to properly map it to a standardized type information, and that's where I need help. Once I get that information, I'll update the PR to reflect the above. Does that work for everyone? |
@baywet That means you would add a file ---
owner: baywet
issue: 889
description: A 128-bit floating point decimal number as defined by IEEE 754-2008
base_type: string
layout: default
---
# <a href="..">{{ page.collection }}</a>
## {{ page.slug }} - {{ page.description }}
Base type: `{{ page.base_type }}`.
The `{{page.slug}}` format represents a [128-bit floating point decimal number](https://en.wikipedia.org/wiki/Decimal128_floating-point_format) as defined by IEEE 754-2008.
{% if page.issue %}
### GitHub Issue
* [#{{ page.issue }}](https://github.com/OAI/OpenAPI-Specification/issues/{{ page.issue }})
{% endif %}
{% if page.remarks %}
### Remarks
{{ page.remarks }}
{% endif %} Or should I rather open a separate PR for |
@ralfhandl my suggestion he is to add a "decimal" format that maps to fixed 128 to it captures the existing. |
The 128-bit wide IEEE EEE 754 decimal128 representation is for floating-point decimals, so it unfortunately is of no use if What we can do is define a new What we could in addition do is define a new "template format" |
What would you map the existing usage of "decimal" (keep the exact same name) to with this suggestion? |
The current "decimal" format doesn't map to anything specific, and API designers are advised to use one of the concrete formats
instead if they want to map to a specific decimal type. We only know that "decimal" is widely used, and we don't know its API-specific interpretation. |
Notes from today's meeting:
TODO (for me)
|
Co-authored-by: Mike Ralphson <[email protected]>
Signed-off-by: Vincent Biret <[email protected]>
02cc280
to
3d2cbf9
Compare
@handrews @ralfhandl Thanks again for the support through this one. I updated it to reflect our recent discussions. |
Signed-off-by: Vincent Biret <[email protected]>
@baywet I'm less concerned about warning about loss of precision (loss compared to what? It does not guarantee precision) than I am about warning of inconsistent usage and implementation. Registering a format sets the expectation that it will behave according to the registry. This is a reasonable expectation for all other registered formats, which all meet at least one of the following criteria:
The most complex is HTML, because of the nature of that spec, but that is a problem with parsing HTML in general (and is addressed in the spec, at least in theory). It needs to be very clear from this registry that using |
"guarantee" might the wrong word b/c the registry doesn't guarantee things, but I hope you get what I mean. If not, please ask, as I am happy to keep discussing this one point. |
Signed-off-by: Vincent Biret <[email protected]>
@ralfhandl @handrews added changes from the comments. I'm almost tempted to mark it as deprecated now, or should we wait for the PR for decimal128 (or equivalent) to do that? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@baywet thanks for working through everything on this- I think we've reached the finish line! I appreciate your perseverance.
Let's not mark it deprecated now, as the way in which decimal
is discouraged (lack of interoperability) is very different from the way that binary
and byte
are discouraged (a better alternative outside of format
entirely) are very different. If we need to have a discussion about whether this is also a "deprecation" let's have that in a separate issue or PR so this one can get merged now.
Thanks everyone for pulling through this one and making sure we documented the right thing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - thank you all!
related #3167
related #889