Allow provider-specific data types #529

Open: wants to merge 13 commits into develop
optimade.rst: 13 changes (11 additions, 2 deletions)
@@ -220,6 +220,7 @@ representation in all contexts. They are as follows:
A list can be empty, i.e., contain no items.
- **dictionary**: an associative array of **keys** and **values**, where **keys** are pre-determined strings, i.e., for the same entry property, the **keys** remain the same among different entries whereas the **values** change.
The **values** of a dictionary can be any basic type, list, dictionary, or unknown.
+- **namespace-specific data type**: a type introduced by a namespace provider (described in `Type handling and conversions in comparisons`_).

An entry property value that is not present in the database is **unknown**.
This is equivalently expressed by the statement that the value of that entry property is :val:`null`.
@@ -627,6 +628,7 @@ In the JSON response format, property types translate as follows:
- **timestamp** uses a string representation of date and time as defined in `RFC 3339 Internet Date/Time Format <https://tools.ietf.org/html/rfc3339#section-5.6>`__.
- **dictionary** is represented by the JSON object type.
- **unknown** properties are represented by either omitting the property or by a JSON :field-val:`null` value.
+- **namespace-specific data types** are represented by the JSON string type.

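To make the mapping above concrete, here is a minimal Python sketch of a server-side JSON encoder. It assumes timestamps are stored as timezone-aware `datetime` objects and models a hypothetical namespace-specific type `_exmpl_set` as a Python class with a `to_token()` method; the class, the method, and the comma-separated encoding are illustrative assumptions, not part of OPTIMADE.

```python
import json
from datetime import datetime, timezone


class ExmplSet(frozenset):
    """Hypothetical namespace-specific type `_exmpl_set` (illustrative only)."""

    def to_token(self) -> str:
        # Assumed encoding: members sorted and joined with commas.
        return ",".join(sorted(self))


class OptimadeJSONEncoder(json.JSONEncoder):
    """Serializes the values that the json module does not handle natively."""

    def default(self, obj):
        if isinstance(obj, datetime):
            # timestamp -> RFC 3339 string (assumes a timezone-aware datetime).
            return obj.isoformat()
        if hasattr(obj, "to_token"):
            # Namespace-specific data type -> JSON string.
            return obj.to_token()
        return super().default(obj)


attributes = {
    "last_modified": datetime(2019, 6, 8, 4, 13, 37, tzinfo=timezone.utc),
    "_exmpl_set": ExmplSet({"b", "a"}),
    "chemical_formula_descriptive": None,  # unknown value -> JSON null
}
print(json.dumps(attributes, cls=OptimadeJSONEncoder))
# prints {"last_modified": "2019-06-08T04:13:37+00:00", "_exmpl_set": "a,b", "chemical_formula_descriptive": null}
```

Because the namespace-specific value leaves the server as an ordinary JSON string, a client that does not know about `_exmpl_set` can still parse the response.
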
Every response SHOULD contain the following fields, and MUST contain at least :field:`meta`:

@@ -1815,7 +1817,10 @@ The following tokens are used in the filter query component:

(Note that at the end of the string value above the four final backslashes represent the two terminal backslashes in the value, and the final double quote is a terminator, it is not escaped.)

-String value tokens are also used to represent **timestamps** in form of the `RFC 3339 Internet Date/Time Format <https://tools.ietf.org/html/rfc3339#section-5.6>`__.
+String value tokens are also used to represent:
+
+- **timestamps** in the form of the `RFC 3339 Internet Date/Time Format <https://tools.ietf.org/html/rfc3339#section-5.6>`__.
+- **namespace-specific data types**.

- **Numeric values** are represented as decimal integers or in scientific notation, using the usual programming language conventions.
A regular expression giving the number syntax is given below as a `POSIX Extended Regular Expression (ERE) <https://en.wikipedia.org/w/index.php?title=Regular_expression&oldid=786659796#Standards>`__ or as a `Perl-Compatible Regular Expression (PCRE) <http://www.pcre.org>`__:
@@ -2070,6 +2075,10 @@ As the filter language syntax does not define a lexical token for timestamps, va
In a comparison with a timestamp property, a string token represents a timestamp value that would result from parsing the string according to RFC 3339 Internet Date/Time Format.
Interpretation failures MUST be reported with error :http-error:`400 Bad Request`.
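One way a server might interpret such a token is sketched below in Python. `BadRequest` is a hypothetical exception that the surrounding web framework is assumed to translate into a 400 response, and `datetime.fromisoformat` is only an approximation of an RFC 3339 parser: it is more permissive (for example, it also accepts times without seconds).

```python
from datetime import datetime, timezone


class BadRequest(Exception):
    """Hypothetical error that a web framework would map to HTTP 400."""

    status_code = 400


def parse_timestamp_token(token: str) -> datetime:
    """Interpret a filter string token as an RFC 3339 timestamp (approximation)."""
    # Python versions before 3.11 reject the 'Z' suffix, so normalize it first.
    text = token[:-1] + "+00:00" if token[-1:] in ("Z", "z") else token
    try:
        value = datetime.fromisoformat(text)
    except ValueError as exc:
        raise BadRequest(f"cannot interpret {token!r} as an RFC 3339 timestamp") from exc
    if value.tzinfo is None:
        # RFC 3339 date-times always carry a UTC offset.
        raise BadRequest(f"timestamp {token!r} has no UTC offset")
    return value


cutoff = parse_timestamp_token("2019-06-08T04:13:37Z")
print(cutoff > datetime(2019, 1, 1, tzinfo=timezone.utc))  # True
```
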

+Namespace providers MAY introduce namespace-specific data types, representing them with string lexical tokens both in filters and in responses.
+It is up to the providers to decide which comparison operators to support for these types and how the comparisons are performed.
+For example, a provider that introduces a set-valued property :property:`_exmpl_set` may decide to override the :val:`CONTAINS` operator so that :filter:`_exmpl_set CONTAINS "value"` is true when the set encoded by the string :val:`"value"` is a subset of the set stored in :property:`_exmpl_set`.

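A provider-side sketch of the set example, in Python. The comma-separated string encoding and the function names are hypothetical choices made for illustration; the text above deliberately leaves both the encoding and the operator semantics to the provider.

```python
def decode_exmpl_set(token: str) -> frozenset:
    """Decode the hypothetical string form of `_exmpl_set` (comma-separated members)."""
    return frozenset(part.strip() for part in token.split(",") if part.strip())


def exmpl_set_contains(property_value: str, operand: str) -> bool:
    """Overridden CONTAINS: true if the operand's set is a subset of the property's set."""
    return decode_exmpl_set(operand) <= decode_exmpl_set(property_value)


# Hypothetical stored values of _exmpl_set for three entries.
entries = {"e1": "a,b,c", "e2": "a,c", "e3": "b"}
# Evaluate the filter: _exmpl_set CONTAINS "a,c"
matches = sorted(eid for eid, value in entries.items() if exmpl_set_contains(value, "a,c"))
print(matches)  # ['e1', 'e2']
```

This particular encoding cannot represent members that contain commas, so a real provider would have to document an escaping rule alongside the type.
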
Optional filter features
~~~~~~~~~~~~~~~~~~~~~~~~

@@ -2158,7 +2167,7 @@ A Property Definition MUST be composed according to the combination of the requi

- :field:`x-optimade-type`: String.
Specifies the OPTIMADE data type for this level of the defined property.
-MUST be one of :val:`"string"`, :val:`"integer"`, :val:`"float"`, :val:`"boolean"`, :val:`"timestamp"`, :val:`"list"`, or :val:`"dictionary"`.
+MUST be one of :val:`"string"`, :val:`"integer"`, :val:`"float"`, :val:`"boolean"`, :val:`"timestamp"`, :val:`"list"`, :val:`"dictionary"` or the name of a namespace-specific data type that starts with an underscore followed by the provider-specific prefix.
Review comment by a Member:

Suggested change:
-MUST be one of :val:`"string"`, :val:`"integer"`, :val:`"float"`, :val:`"boolean"`, :val:`"timestamp"`, :val:`"list"`, :val:`"dictionary"` or the name of a namespace-specific data type that starts with an underscore followed by the provider-specific prefix.
+MUST be one of :val:`"string"`, :val:`"integer"`, :val:`"float"`, :val:`"boolean"`, :val:`"timestamp"`, :val:`"list"`, :val:`"dictionary"` or the name of a namespace-specific data type that starts with an underscore followed by the provider-specific prefix.
+When a namespace-specific data type is used, the human-readable property description should be used to describe its usage and format.

Review comment by a Contributor:

I guess in the end we will want this info to go into our "standards" definitions (see https://github.com/Materials-Consortia/OPTIMADE/tree/develop/schemas#standards-definitions). For the OPTIMADE one, see: https://schemas.optimade.org/defs/v1.2/standards/optimade.html

If forced to handle this today, I would put that info into the description field there. Something along the lines of:

Description: The Magnetic materials standard node types and properties.

Apart from the supplied property definitions, the Magnetic materials standard defines a namespace-specific data type for the magnetic susceptibility tensor "_mag_sustensor" that allows comparison using < and > to express filters of the magnetic susceptibility magnitude. In more detail, the following operators are valid:
...

BUT, writing this up, I realize (even though the magnetic susceptibility tensor example is quite bogus) that we may be overly limiting ourselves by deciding that all custom data types must be represented as strings. Why not allow them to be expressed as any basic or container data type?

But, back to @ml-evs's comment: to avoid repeating the definition many times, maybe this is better:

When a namespace-specific data type is used, the human-readable property description should be used to describe its usage and format or to give a reference to where this information is available.

(That reference can then be to the "standards definition" if it is included there).

Review comment by a Member:

Suggested change:
-MUST be one of :val:`"string"`, :val:`"integer"`, :val:`"float"`, :val:`"boolean"`, :val:`"timestamp"`, :val:`"list"`, :val:`"dictionary"` or the name of a namespace-specific data type that starts with an underscore followed by the provider-specific prefix.
+MUST be one of :val:`"string"`, :val:`"integer"`, :val:`"float"`, :val:`"boolean"`, :val:`"timestamp"`, :val:`"list"`, :val:`"dictionary"` or the name of a namespace-specific data type that starts with an underscore followed by the provider-specific prefix.
+When a namespace-specific data type is used, the property definition MUST set the top-level :field:`type` to :val:`"string"` (or :val:`["string", "null"]` for optional fields).

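A rough Python sketch of how a validator could check the proposed `x-optimade-type` rule, combined with the `type` constraint from the suggestion above. The function name and its list-of-problems return value are hypothetical, and the prefix test follows the proposed wording literally (an underscore followed by the provider prefix).

```python
STANDARD_TYPES = {"string", "integer", "float", "boolean", "timestamp", "list", "dictionary"}


def check_optimade_type(definition: dict, prefix: str) -> list:
    """Return human-readable problems with `x-optimade-type` in a property definition."""
    opt_type = definition.get("x-optimade-type")
    if not isinstance(opt_type, str):
        return ["x-optimade-type must be a string"]
    if opt_type in STANDARD_TYPES:
        return []
    if not opt_type.startswith("_" + prefix):
        return [f"{opt_type!r} is neither a standard type nor prefixed with '_{prefix}'"]
    # Namespace-specific type: the suggested wording requires a JSON `type`
    # of "string", or ["string", "null"] for optional fields.
    json_type = definition.get("type")
    if json_type == "string":
        return []
    if isinstance(json_type, list) and sorted(json_type) == ["null", "string"]:
        return []
    return [f"namespace-specific type {opt_type!r} requires type 'string' or ['string', 'null']"]


print(check_optimade_type({"type": ["string", "null"], "x-optimade-type": "_mag_sustensor"}, "mag"))  # []
```
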

- :field:`x-optimade-unit`: String.
A (compound) symbol for the physical unit in which the value of the defined property is given or one of the strings :val:`dimensionless` or :val:`inapplicable`.