diff --git a/docs/source/format/CanonicalExtensions.rst b/docs/source/format/CanonicalExtensions.rst
new file mode 100644
index 0000000000000..3ede97ef7dcae
--- /dev/null
+++ b/docs/source/format/CanonicalExtensions.rst
@@ -0,0 +1,75 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements. See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership. The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied. See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+.. _format_canonical_extensions:
+
+*************************
+Canonical Extension Types
+*************************
+
+============
+Introduction
+============
+
+The Arrow Columnar Format allows defining
+:ref:`extension types ` so as to extend
+standard Arrow data types with custom semantics. Often these semantics
+will be specific to a system or application. However, it is beneficial
+to share the definitions of well-known extension types so as to improve
+interoperability between different systems integrating Arrow columnar data.
+
+Standardization
+===============
+
+These rules must be followed for the standardization of canonical extension
+types:
+
+* Canonical extension types are described and maintained below in this document.
+
+* Each canonical extension type requires a distinct discussion and vote
+  on the `Arrow development mailing-list `__.
+
+* The specification text to be added *must* follow these requirements:
+
+  1) It *must* define a well-defined extension name starting with "``arrow.``".
+
+  2) Its parameters, if any, *must* be described in the proposal.
+
+  3) Its serialization *must* be described in the proposal and should
+     not require undue implementation work or unusual software dependencies
+     (for example, a trivial custom text format or JSON would be acceptable).
+
+  4) Its expected semantics *should* be described as well and any
+     potential ambiguities or pain points addressed or at least mentioned.
+
+* The extension type *should* have one implementation submitted,
+  preferably two if non-trivial (for example if parameterized).
+
+Making Modifications
+====================
+
+Like standard Arrow data types, canonical extension types should be considered
+stable once standardized. Modifying a canonical extension type (for example
+to expand the set of parameters) should be an exceptional event, follow the
+same rules as laid out above, and provide backwards compatibility guarantees.
+
+
+=============
+Official List
+=============
+
+No canonical extension types have been standardized yet.
diff --git a/docs/source/format/Columnar.rst b/docs/source/format/Columnar.rst
index 109b81e2b9dff..5f9537384c000 100644
--- a/docs/source/format/Columnar.rst
+++ b/docs/source/format/Columnar.rst
@@ -1167,6 +1167,11 @@ structure. These extension keys are:
 * ``'ARROW:extension:metadata'`` for a serialized representation
   of the ``ExtensionType`` necessary to reconstruct the custom type
 
+.. note::
+   Extension names beginning with ``arrow.`` are reserved for
+   :ref:`canonical extension types `
+   and should not be used for third-party extension types.
+
 This extension metadata can annotate any of the built-in Arrow logical
 types. The intent is that an implementation that does not support an
 extension type can still handle the underlying data. For example a
@@ -1190,6 +1195,10 @@ extension types:
   metadata indicating the market trading calendar the data corresponds
   to
 
+.. seealso::
+   :ref:`format_canonical_extensions`
+
+
 Implementation guidelines
 =========================
 
diff --git a/docs/source/format/Glossary.rst b/docs/source/format/Glossary.rst
index 5944d7c18cffe..ac18c1618bceb 100644
--- a/docs/source/format/Glossary.rst
+++ b/docs/source/format/Glossary.rst
@@ -52,6 +52,14 @@ Glossary
       device (e.g. GPU) memory, etc., though not all Arrow
       implementations support all of these possibilities.
 
+   canonical extension type
+      An :term:`extension type` that has been standardized by the
+      Arrow community so as to improve interoperability between
+      implementations.
+
+      .. seealso::
+         :ref:`format_canonical_extensions`.
+
    child array
    parent array
       In an array of a :term:`nested type`, the parent array
@@ -112,10 +120,10 @@ Glossary
 
    extension type
    storage type
-      A user-defined :term:`data type` that adds additional semantics
-      to an existing data type. This allows implementations that do
-      not support a particular extension type to still handle the
-      underlying data type (the "storage type").
+      An extension type is a user-defined :term:`data type` that adds
+      additional semantics to an existing data type. This allows
+      implementations that do not support a particular extension type to
+      still handle the underlying data type (the "storage type").
 
       For example, a UUID can be represented as a 16-byte fixed-size
       binary type.
diff --git a/docs/source/index.rst b/docs/source/index.rst
index b261474c6fa10..60879993e45f5 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -74,6 +74,7 @@ target environment.**
 
    format/Versioning
    format/Columnar
+   format/CanonicalExtensions
    format/Flight
    format/FlightSql
    format/Integration
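A minimal, dependency-free Python sketch of how the naming and serialization rules above fit together: an extension type is recorded in a field's key/value metadata under ``ARROW:extension:name`` and ``ARROW:extension:metadata``, names starting with ``arrow.`` are kept for canonical types, and parameters use a simple serialization such as JSON. The helper names ``extension_field_metadata`` and ``parse_extension_field_metadata``, the example name ``myorg.trading_date``, and the choice to store an empty string when a type has no parameters are all assumptions made for illustration; this is not the API of any Arrow implementation.

```python
import json

# Field-level metadata keys that the Arrow columnar format uses for
# extension types.
EXTENSION_NAME_KEY = "ARROW:extension:name"
EXTENSION_METADATA_KEY = "ARROW:extension:metadata"

# Name prefix reserved for canonical extension types.
CANONICAL_PREFIX = "arrow."


def extension_field_metadata(name, params=None, canonical=False):
    """Build the metadata dict that annotates a field with an extension type.

    Hypothetical helper for illustration; not part of any Arrow library.
    `params` is serialized as JSON, one of the simple serializations the
    standardization rules consider acceptable.
    """
    reserved = name.startswith(CANONICAL_PREFIX)
    if canonical and not reserved:
        raise ValueError(f"canonical extension name {name!r} must start with 'arrow.'")
    if reserved and not canonical:
        raise ValueError(f"{name!r} uses the 'arrow.' prefix reserved for canonical types")
    # Assumption: a type without parameters gets an empty serialized form.
    # sort_keys=True makes the JSON output deterministic.
    serialized = "" if params is None else json.dumps(params, sort_keys=True)
    return {EXTENSION_NAME_KEY: name, EXTENSION_METADATA_KEY: serialized}


def parse_extension_field_metadata(metadata):
    """Recover (name, params) from metadata built by the helper above.

    Returns None when the field carries no extension name, i.e. it is a
    plain field of its storage type.
    """
    name = metadata.get(EXTENSION_NAME_KEY)
    if name is None:
        return None
    serialized = metadata.get(EXTENSION_METADATA_KEY, "")
    params = json.loads(serialized) if serialized else None
    return name, params


if __name__ == "__main__":
    # "myorg.trading_date" is a hypothetical third-party extension name.
    meta = extension_field_metadata("myorg.trading_date", {"calendar": "NYSE"})
    print(meta)
    print(parse_extension_field_metadata(meta))
```

The ``canonical`` flag merely stands in for the community vote described in the standardization rules: the sketch checks only the naming rule, not whether a type has actually been standardized.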