-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow provider-specific data types #529
base: develop
Are you sure you want to change the base?
Conversation
Co-authored-by: Antanas Vaitkus <[email protected]>
…a types off from basic data types.
Co-authored-by: Antanas Vaitkus <[email protected]>
…with "namespace-specific".
I have a general question: how to understand, if a prefix denotes a provider-specific datatype, not a general-purpose common datatype. In other words, compare |
apparently, by underscore as the first symbol, correct? |
Co-authored-by: Antanas Vaitkus <[email protected]>
Provider-specific data type names are only going to appear in |
Co-authored-by: Antanas Vaitkus <[email protected]>
@merkys should we merge? |
Ok, I'll take a deeper look at this over the next couple of days. |
Likewise, as I want to use this pretty much straight away! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for writing this up @merkys!
I only have one general comment; should we encourage the custom type to be described in the description of the fields that use it? Otherwise I don't see where this information would go (even if its only human-readable anyway). I've added two suggestions below along these lines.
@@ -2158,7 +2167,7 @@ A Property Definition MUST be composed according to the combination of the requi | |||
|
|||
- :field:`x-optimade-type`: String. | |||
Specifies the OPTIMADE data type for this level of the defined property. | |||
MUST be one of :val:`"string"`, :val:`"integer"`, :val:`"float"`, :val:`"boolean"`, :val:`"timestamp"`, :val:`"list"`, or :val:`"dictionary"`. | |||
MUST be one of :val:`"string"`, :val:`"integer"`, :val:`"float"`, :val:`"boolean"`, :val:`"timestamp"`, :val:`"list"`, :val:`"dictionary"` or the name of a namespace-specific data type that starts with an underscore followed by the provider-specific prefix. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MUST be one of :val:`"string"`, :val:`"integer"`, :val:`"float"`, :val:`"boolean"`, :val:`"timestamp"`, :val:`"list"`, :val:`"dictionary"` or the name of a namespace-specific data type that starts with an underscore followed by the provider-specific prefix. | |
MUST be one of :val:`"string"`, :val:`"integer"`, :val:`"float"`, :val:`"boolean"`, :val:`"timestamp"`, :val:`"list"`, :val:`"dictionary"` or the name of a namespace-specific data type that starts with an underscore followed by the provider-specific prefix. | |
When a namespace-specific data type is used, the human-readable property description should be used to describe its usage and format. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess in the end we will want this info to go into our "standards" definitions (see https://github.com/Materials-Consortia/OPTIMADE/tree/develop/schemas#standards-definitions ). For the OPTIMADE one see here: https://schemas.optimade.org/defs/v1.2/standards/optimade.html
If forced to handle this today I would put that info into the description field there. Something along the lines of:
Description: The Magnetic materials standard node types and properties.
Apart from the supplied property definitions, the Magnetic materials standard defines a namespace-specific data type for the magnetic susceptibility tensor "_mag_sustensor" that allows comparison using < and > to express filters of the magnetic susceptibility magnitude. In more detail, the following operators are valid:
...
BUT, writing this up; even though the magnetic susceptibility tensor example is quite bogus, I realize we may be overly limiting deciding all custom datatypes must be represented as strings? Why not allow it to be expressed as any basic or container datatype?
But, back to @ml-evs comment: to allow not repeating the definition a lot, maybe this is better?:
When a namespace-specific data type is used, the human-readable property description should be used to describe its usage and format or to give a reference to where this information is available.
(That reference can then be to the "standards definition" if it is included there).
@@ -2158,7 +2167,7 @@ A Property Definition MUST be composed according to the combination of the requi | |||
|
|||
- :field:`x-optimade-type`: String. | |||
Specifies the OPTIMADE data type for this level of the defined property. | |||
MUST be one of :val:`"string"`, :val:`"integer"`, :val:`"float"`, :val:`"boolean"`, :val:`"timestamp"`, :val:`"list"`, or :val:`"dictionary"`. | |||
MUST be one of :val:`"string"`, :val:`"integer"`, :val:`"float"`, :val:`"boolean"`, :val:`"timestamp"`, :val:`"list"`, :val:`"dictionary"` or the name of a namespace-specific data type that starts with an underscore followed by the provider-specific prefix. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MUST be one of :val:`"string"`, :val:`"integer"`, :val:`"float"`, :val:`"boolean"`, :val:`"timestamp"`, :val:`"list"`, :val:`"dictionary"` or the name of a namespace-specific data type that starts with an underscore followed by the provider-specific prefix. | |
MUST be one of :val:`"string"`, :val:`"integer"`, :val:`"float"`, :val:`"boolean"`, :val:`"timestamp"`, :val:`"list"`, :val:`"dictionary"` or the name of a namespace-specific data type that starts with an underscore followed by the provider-specific prefix. | |
When a namespace-specific data type is used, the property definition MUST list the top-level `type` as `'string'` (or `['string', 'null']` for optional fields). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work with this!
The one blocking issue I see is that the PR misses a corresponding edit under "Property Definition keys from JSON Schema" -> "type", where in the current edition of the PR (see below) should say that when x-optimade-type
is set to a string that indicates a provider-specific datatype, "type" MUST be "string". (I see @ml-evs is "fixing" this on line 2171, but whether or not that is added, the change should also be done on in that part of the specification).
(I couldn't add this as an "edit" comment, it is too far away...)
I would be ok with merging this for string custom datatypes only. However, I think it is worthwhile to pause for one minute here to consider what I wrote in a reply to one of @ml-evs suggestions: can we lightly edit the PR to allow custom data types to be represented using any of the basic, list or dict types?
I foresee that basically the first thing that will happen once we implement this is someone coming along and asking "oh, but in our community we have this data thing that is always represented as an NxM matrix, and for this data the whole community obviously want the ">" operator to mean Z." ...
Thanks for reviews, @ml-evs and @rartino! Data type description in both human-readable text and Property Definitions is noncontroversial, thus I will address it later on. Now on the extension beyond the string-based data types. Historically, I drafted this PR having SMILES in mind, thus strings were enough for me. Moreover, we already have strings with internal semantics (timestamps). Now I do not see why we cannot extend this to all basic+list+dict data types, but yes, this needs some more thought. Some issues coming to my mind as I am writing this:
|
This PR attempts to generalize #436 by allowing to define provider-specific data types. The following conditions for such data types apply:
The latter condition might be too strict for some uses, but since casting is unclear at the moment, let us leave such extensions for future.
Merging this PR will allow to reassign #392 and #436 (SMILES) to
_cheminfo_
namespace where it can become a property with special set of rules for comparison.