ARROW-13055: [Doc] Create canonical extension types document (#14167)
Vote result at https://lists.apache.org/thread/sxd5fhc42hb6svs79t3fd79gkqj83pfh

Authored-by: Antoine Pitrou
Signed-off-by: Antoine Pitrou
pitrou authored Sep 20, 2022
1 parent 2577ac1 commit 2629f20
Showing 4 changed files with 97 additions and 4 deletions.
75 changes: 75 additions & 0 deletions docs/source/format/CanonicalExtensions.rst
@@ -0,0 +1,75 @@
.. Licensed to the Apache Software Foundation (ASF) under one
.. or more contributor license agreements. See the NOTICE file
.. distributed with this work for additional information
.. regarding copyright ownership. The ASF licenses this file
.. to you under the Apache License, Version 2.0 (the
.. "License"); you may not use this file except in compliance
.. with the License. You may obtain a copy of the License at

.. http://www.apache.org/licenses/LICENSE-2.0

.. Unless required by applicable law or agreed to in writing,
.. software distributed under the License is distributed on an
.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
.. KIND, either express or implied. See the License for the
.. specific language governing permissions and limitations
.. under the License.

.. _format_canonical_extensions:

*************************
Canonical Extension Types
*************************

============
Introduction
============

The Arrow Columnar Format allows defining
:ref:`extension types <format_metadata_extension_types>` so as to extend
standard Arrow data types with custom semantics. Often these semantics
will be specific to a system or application. However, it is beneficial
to share the definitions of well-known extension types so as to improve
interoperability between different systems integrating Arrow columnar data.

Standardization
===============

These rules must be followed for the standardization of canonical extension
types:

* Canonical extension types are described and maintained below in this document.

* Each canonical extension type requires a distinct discussion and vote
  on the `Arrow development mailing list <https://arrow.apache.org/community/>`__.

* The specification text to be added *must* follow these requirements:

  1) It *must* specify a well-defined extension name starting with "``arrow.``".

  2) Its parameters, if any, *must* be described in the proposal.

  3) Its serialization *must* be described in the proposal and should
     not require undue implementation work or unusual software dependencies
     (for example, a trivial custom text format or JSON would be acceptable).

  4) Its expected semantics *should* be described as well, and any
     potential ambiguities or pain points should be addressed or at least
     mentioned.

* The extension type *should* have one implementation submitted,
  preferably two if non-trivial (for example, if parameterized).
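
Rules 1) and 3) are mechanical enough to check in code. The following is a
minimal Python sketch, assuming the proposal's parameters form a JSON object;
the function ``check_proposal`` and the name ``arrow.example`` are hypothetical
and are not part of any Arrow API or of the official standardization process:

```python
import json

# Hypothetical helper: mechanical checks for rules 1) and 3) of a canonical
# extension type proposal. Not an Arrow API.
CANONICAL_PREFIX = "arrow."


def check_proposal(name: str, params: dict) -> bytes:
    """Validate a proposed name and return its JSON-serialized parameters."""
    # Rule 1): the name must start with "arrow." and have something after it.
    if not name.startswith(CANONICAL_PREFIX) or len(name) == len(CANONICAL_PREFIX):
        raise ValueError(f"invalid canonical extension name: {name!r}")
    # Rule 3): a trivial format such as JSON is acceptable, but the serialized
    # form must reconstruct the parameters exactly.
    serialized = json.dumps(params, sort_keys=True).encode("utf-8")
    if json.loads(serialized.decode("utf-8")) != params:
        raise ValueError("parameters do not round-trip through JSON")
    return serialized


print(check_proposal("arrow.example", {"unit": "m", "precision": 3}))
# b'{"precision": 3, "unit": "m"}'
```

Sorting the keys keeps the serialized form deterministic. Parameters that JSON
cannot represent faithfully (for example a Python tuple, which comes back as a
list) are rejected by the round-trip check.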

Making Modifications
====================

Like standard Arrow data types, canonical extension types should be considered
stable once standardized. Modifying a canonical extension type (for example
to expand the set of parameters) should be an exceptional event, follow the
same rules as laid out above, and provide backwards compatibility guarantees.


=============
Official List
=============

No canonical extension types have been standardized yet.
9 changes: 9 additions & 0 deletions docs/source/format/Columnar.rst
@@ -1167,6 +1167,11 @@ structure. These extension keys are:
* ``'ARROW:extension:metadata'`` for a serialized representation
  of the ``ExtensionType`` necessary to reconstruct the custom type

.. note::
   Extension names beginning with ``arrow.`` are reserved for
   :ref:`canonical extension types <format_canonical_extensions>`;
   they should not be used for third-party extension types.
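
As a rough illustration of the two metadata keys and the reserved prefix, the
following Python sketch builds the field metadata for a third-party extension
type. The key strings ``'ARROW:extension:name'`` and
``'ARROW:extension:metadata'`` come from the format; the helper
``extension_metadata``, the name ``myorg.trading_date`` and its JSON payload
are hypothetical:

```python
# Hypothetical helper that builds the field metadata annotating a storage type.
# Only the two key strings come from the Arrow columnar format.
RESERVED_PREFIX = "arrow."
EXT_NAME_KEY = "ARROW:extension:name"
EXT_META_KEY = "ARROW:extension:metadata"


def extension_metadata(name: str, serialized: str, third_party: bool = True) -> dict:
    """Return the field metadata for an extension type."""
    # Names beginning with "arrow." are reserved for canonical extension types.
    if third_party and name.startswith(RESERVED_PREFIX):
        raise ValueError(f"{name!r} uses the reserved 'arrow.' prefix")
    return {EXT_NAME_KEY: name, EXT_META_KEY: serialized}


# Illustrative third-party type carrying its trading calendar as metadata.
meta = extension_metadata("myorg.trading_date", '{"calendar": "NYSE"}')
print(meta[EXT_NAME_KEY])  # myorg.trading_date
```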

This extension metadata can annotate any of the built-in Arrow logical
types. The intent is that an implementation that does not support an
extension type can still handle the underlying data. For example a
@@ -1190,6 +1195,10 @@ extension types:
metadata indicating the market trading calendar the data corresponds
to

.. seealso::
   :ref:`format_canonical_extensions`
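
The intent that an implementation which does not support an extension type
can still handle the underlying data can be sketched as a registry lookup
with a fallback to the storage type. This is a simplified Python illustration
that represents types as plain strings; ``Field``, ``REGISTRY`` and
``resolve_type`` are hypothetical names, not Arrow APIs, and a real
implementation may also want to preserve the extension metadata on the field:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

EXT_NAME_KEY = "ARROW:extension:name"
EXT_META_KEY = "ARROW:extension:metadata"


@dataclass
class Field:
    # Hypothetical stand-in for a schema field; types are plain strings here.
    storage_type: str
    metadata: Dict[str, str] = field(default_factory=dict)


# Hypothetical registry of the extension types this reader understands,
# mapping an extension name to a function that builds the logical type.
REGISTRY: Dict[str, Callable[[str, str], str]] = {
    "myorg.uuid": lambda storage, serialized: f"uuid<{storage}>",
}


def resolve_type(f: Field) -> str:
    """Return the logical type a reader would use for this field."""
    name = f.metadata.get(EXT_NAME_KEY)
    if name is None:
        # No extension annotation: an ordinary built-in type.
        return f.storage_type
    factory = REGISTRY.get(name)
    if factory is None:
        # Unknown extension type: fall back to the underlying storage type.
        return f.storage_type
    return factory(f.storage_type, f.metadata.get(EXT_META_KEY, ""))


print(resolve_type(Field("fixed_size_binary(16)", {EXT_NAME_KEY: "myorg.uuid"})))
# uuid<fixed_size_binary(16)>
print(resolve_type(Field("date32", {EXT_NAME_KEY: "myorg.trading_date"})))
# date32
```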


Implementation guidelines
=========================

16 changes: 12 additions & 4 deletions docs/source/format/Glossary.rst
Expand Up @@ -52,6 +52,14 @@ Glossary
device (e.g. GPU) memory, etc., though not all Arrow
implementations support all of these possibilities.

canonical extension type
An :term:`extension type` that has been standardized by the
Arrow community so as to improve interoperability between
implementations.

.. seealso::
:ref:`format_canonical_extensions`.

child array
parent array
In an array of a :term:`nested type`, the parent array
@@ -112,10 +120,10 @@ Glossary

extension type
storage type
A user-defined :term:`data type` that adds additional semantics
to an existing data type. This allows implementations that do
not support a particular extension type to still handle the
underlying data type (the "storage type").
An extension type is a user-defined :term:`data type` that adds
additional semantics to an existing data type. This allows
implementations that do not support a particular extension type to
still handle the underlying data type (the "storage type").

For example, a UUID can be represented as a 16-byte fixed-size
binary type.
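
To make the storage-type idea concrete, the following Python sketch uses the
standard ``uuid`` module to show that the storage view of a UUID is exactly
16 raw bytes. It only illustrates the byte layout and does not use any Arrow
library; the UUID value is arbitrary:

```python
import uuid

# The storage view of a UUID: 16 raw bytes, which is why a 16-byte
# fixed-size binary type can serve as its storage type.
u = uuid.UUID("12345678-1234-5678-1234-567812345678")
storage_value = u.bytes
assert len(storage_value) == 16

# A reader that understands the extension type can rebuild the UUID...
restored = uuid.UUID(bytes=storage_value)
print(restored)  # 12345678-1234-5678-1234-567812345678

# ...while one that does not still sees valid 16-byte binary values.
print(storage_value.hex())  # 12345678123456781234567812345678
```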
1 change: 1 addition & 0 deletions docs/source/index.rst
@@ -74,6 +74,7 @@ target environment.**

format/Versioning
format/Columnar
format/CanonicalExtensions
format/Flight
format/FlightSql
format/Integration
