BJData optimized binary array type
Introduces a dedicated `B` marker for bytes. This is used as the strong
type marker in optimized array format to encode binary data such that
it can also be decoded back to binary data (instead of decoding as an
integer array).

See NeuroJSON/bjdata#6 for further information.
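
To make the wire format concrete, here is a minimal, library-independent sketch of the strongly typed byte array described above. The helpers `encode_bjdata_binary` and `decode_bjdata_binary` are hypothetical and are not part of nlohmann/json. The sketch assumes the count is always written as a `U` (uint8) value, so it handles at most 255 bytes; the library's writer may choose a different count marker.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not part of nlohmann/json): encode a byte buffer as a
// BJData optimized array with the strong type marker 'B':
//   '[' '$' 'B' '#' 'U' <count> <raw bytes...>
// Assumption: the count is stored as uint8 ('U'), so at most 255 bytes.
std::vector<std::uint8_t> encode_bjdata_binary(const std::vector<std::uint8_t>& bytes)
{
    if (bytes.size() > 255)
    {
        throw std::length_error("sketch only supports counts that fit in uint8");
    }
    std::vector<std::uint8_t> out = {'[', '$', 'B', '#', 'U',
                                     static_cast<std::uint8_t>(bytes.size())};
    // With a type and a count present, the payload is written without
    // per-element markers and without a closing ']'.
    out.insert(out.end(), bytes.begin(), bytes.end());
    return out;
}

// Hypothetical helper: accept only the "[$B#U" form produced above and copy
// the payload into `out`. Returns false for any other input, including a
// "[$U" array, which would be a plain integer array rather than binary data.
bool decode_bjdata_binary(const std::vector<std::uint8_t>& in,
                          std::vector<std::uint8_t>& out)
{
    const std::vector<std::uint8_t> header = {'[', '$', 'B', '#', 'U'};
    if (in.size() < header.size() + 1 ||
        !std::equal(header.begin(), header.end(), in.begin()))
    {
        return false;
    }
    const std::size_t count = in[header.size()];
    const std::size_t payload_start = header.size() + 1;
    if (in.size() - payload_start != count)
    {
        return false;
    }
    out.assign(in.begin() + static_cast<std::ptrdiff_t>(payload_start), in.end());
    return true;
}
```

Because the type marker is `B` rather than `U`, a decoder can tell binary data apart from an array of small unsigned integers, which is what makes the round trip lossless.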
nebkat committed Nov 25, 2024
1 parent ee32bfc commit 646a609
Showing 5 changed files with 166 additions and 62 deletions.
25 changes: 15 additions & 10 deletions docs/mkdocs/docs/features/binary_formats/bjdata.md
@@ -2,10 +2,10 @@

The [BJData format](https://neurojson.org) was derived from and improved upon
[Universal Binary JSON(UBJSON)](https://ubjson.org) specification (Draft 12). Specifically, it introduces an optimized
- array container for efficient storage of N-dimensional packed arrays (**ND-arrays**); it also adds 4 new type markers -
- `[u] - uint16`, `[m] - uint32`, `[M] - uint64` and `[h] - float16` - to unambiguously map common binary numeric types;
- furthermore, it uses little-endian (LE) to store all numerics instead of big-endian (BE) as in UBJSON to avoid
- unnecessary conversions on commonly available platforms.
+ array container for efficient storage of N-dimensional packed arrays (**ND-arrays**); it also adds 5 new type markers -
+ `[u] - uint16`, `[m] - uint32`, `[M] - uint64`, `[h] - float16` and `[B] - byte` - to unambiguously map common binary
+ numeric types; furthermore, it uses little-endian (LE) to store all numerics instead of big-endian (BE) as in UBJSON to
+ avoid unnecessary conversions on commonly available platforms.

Compared to other binary JSON-like formats such as MessagePack and CBOR, both BJData and UBJSON demonstrate a rare
combination of being both binary and **quasi-human-readable**. This is because all semantic elements in BJData and
@@ -49,6 +49,7 @@ The library uses the following mapping from JSON values types to BJData types ac
| string | *with shortest length indicator* | string | `S` |
| array | *see notes on optimized format/ND-array* | array | `[` |
| object | *see notes on optimized format* | map | `{` |
+ | binary | *see notes on binary values* | array | `[$B` |

!!! success "Complete mapping"

@@ -128,15 +129,17 @@ The library uses the following mapping from JSON values types to BJData types ac

Due to diminished space saving, hampered readability, and increased security risks, in BJData, the allowed data
types following the `$` marker in an optimized array and object container are restricted to
- **non-zero-fixed-length** data types. Therefore, the valid optimized type markers can only be one of `UiuImlMLhdDC`.
- This also means other variable (`[{SH`) or zero-length types (`TFN`) can not be used in an optimized array or object
- in BJData.
+ **non-zero-fixed-length** data types. Therefore, the valid optimized type markers can only be one of
+ `UiuImlMLhdDCB`. This also means other variable (`[{SH`) or zero-length types (`TFN`) can not be used in an
+ optimized array or object in BJData.

!!! info "Binary values"

- If the JSON data contains the binary type, the value stored is a list of integers, as suggested by the BJData
- documentation. In particular, this means that the serialization and the deserialization of JSON containing binary
- values into BJData and back will result in a different JSON object.
+ BJData provides a dedicated `B` marker (defined in the [BJData specification (Draft 3)][BJDataBinArr]) that is used
+ in optimized arrays to designate binary data. This means that, unlike UBJSON, binary data can be both serialized and
+ deserialized.
+
+ [BJDataBinArr]: https://github.com/NeuroJSON/bjdata/blob/master/Binary_JData_Specification.md#optimized-binary-array)

??? example

@@ -171,11 +174,13 @@ The library maps BJData types to JSON value types as follows:
| int32 | number_integer | `l` |
| uint64 | number_unsigned | `M` |
| int64 | number_integer | `L` |
+ | byte | number_unsigned | `B` |
| string | string | `S` |
| char | string | `C` |
| array | array (optimized values are supported) | `[` |
| ND-array | object (in JData annotated array format)|`[$.#[.`|
| object | object (optimized values are supported) | `{` |
+ | binary | binary (strongly-typed byte array) | `[$B` |

!!! success "Complete mapping"

20 changes: 19 additions & 1 deletion include/nlohmann/detail/input/binary_reader.hpp
@@ -2310,6 +2310,16 @@ class binary_reader
case 'Z': // null
return sax->null();

+ case 'B': // byte
+ {
+ if (input_format != input_format_t::bjdata)
+ {
+ break;
+ }
+ std::uint8_t number{};
+ return get_number(input_format, number) && sax->number_unsigned(number);
+ }
+
case 'U':
{
std::uint8_t number{};
@@ -2510,7 +2520,7 @@ class binary_reader
return false;
}

- if (size_and_type.second == 'C')
+ if (size_and_type.second == 'C' || size_and_type.second == 'B')
{
size_and_type.second = 'U';
}
Expand All @@ -2532,6 +2542,13 @@ class binary_reader
return (sax->end_array() && sax->end_object());
}

// If BJData type marker is 'B' decode as binary
if (input_format == input_format_t::bjdata && size_and_type.first != npos && size_and_type.second == 'B')
{
binary_t result;
return get_binary(input_format, size_and_type.first, result) && sax->binary(result);
}

if (size_and_type.first != npos)
{
if (JSON_HEDLEY_UNLIKELY(!sax->start_array(size_and_type.first)))
@@ -2973,6 +2990,7 @@ class binary_reader

#define JSON_BINARY_READER_MAKE_BJD_TYPES_MAP_ \
make_array<bjd_type>( \
+ bjd_type{'B', "byte"}, \
bjd_type{'C', "char"}, \
bjd_type{'D', "double"}, \
bjd_type{'I', "int16"}, \
11 changes: 6 additions & 5 deletions include/nlohmann/detail/output/binary_writer.hpp
@@ -847,11 +847,11 @@ class binary_writer
oa->write_character(to_char_type('['));
}

- if (use_type && !j.m_data.m_value.binary->empty())
+ if (use_type && (use_bjdata || !j.m_data.m_value.binary->empty()))
{
JSON_ASSERT(use_count);
oa->write_character(to_char_type('$'));
- oa->write_character('U');
+ oa->write_character(use_bjdata ? 'B' : 'U');
}

if (use_count)
Expand All @@ -870,7 +870,7 @@ class binary_writer
{
for (size_t i = 0; i < j.m_data.m_value.binary->size(); ++i)
{
oa->write_character(to_char_type('U'));
oa->write_character(to_char_type(use_bjdata ? 'B' : 'U'));
oa->write_character(j.m_data.m_value.binary->data()[i]);
}
}
@@ -1618,7 +1618,8 @@ class binary_writer
bool write_bjdata_ndarray(const typename BasicJsonType::object_t& value, const bool use_count, const bool use_type)
{
std::map<string_t, CharType> bjdtype = {{"uint8", 'U'}, {"int8", 'i'}, {"uint16", 'u'}, {"int16", 'I'},
- {"uint32", 'm'}, {"int32", 'l'}, {"uint64", 'M'}, {"int64", 'L'}, {"single", 'd'}, {"double", 'D'}, {"char", 'C'}
+ {"uint32", 'm'}, {"int32", 'l'}, {"uint64", 'M'}, {"int64", 'L'}, {"single", 'd'}, {"double", 'D'},
+ {"char", 'C'}, {"byte", 'B'}
};

string_t key = "_ArrayType_";
@@ -1651,7 +1652,7 @@ class binary_writer
write_ubjson(value.at(key), use_count, use_type, true, true);

key = "_ArrayData_";
- if (dtype == 'U' || dtype == 'C')
+ if (dtype == 'U' || dtype == 'C' || dtype == 'B')
{
for (const auto& el : value.at(key))
{
31 changes: 25 additions & 6 deletions single_include/nlohmann/json.hpp
@@ -11563,6 +11563,16 @@ class binary_reader
case 'Z': // null
return sax->null();

+ case 'B': // byte
+ {
+ if (input_format != input_format_t::bjdata)
+ {
+ break;
+ }
+ std::uint8_t number{};
+ return get_number(input_format, number) && sax->number_unsigned(number);
+ }
+
case 'U':
{
std::uint8_t number{};
@@ -11763,7 +11773,7 @@ class binary_reader
return false;
}

- if (size_and_type.second == 'C')
+ if (size_and_type.second == 'C' || size_and_type.second == 'B')
{
size_and_type.second = 'U';
}
Expand All @@ -11785,6 +11795,13 @@ class binary_reader
return (sax->end_array() && sax->end_object());
}

// If BJData type marker is 'B' decode as binary
if (input_format == input_format_t::bjdata && size_and_type.first != npos && size_and_type.second == 'B')
{
binary_t result;
return get_binary(input_format, size_and_type.first, result) && sax->binary(result);
}

if (size_and_type.first != npos)
{
if (JSON_HEDLEY_UNLIKELY(!sax->start_array(size_and_type.first)))
@@ -12226,6 +12243,7 @@ class binary_reader

#define JSON_BINARY_READER_MAKE_BJD_TYPES_MAP_ \
make_array<bjd_type>( \
+ bjd_type{'B', "byte"}, \
bjd_type{'C', "char"}, \
bjd_type{'D', "double"}, \
bjd_type{'I', "int16"}, \
@@ -15994,11 +16012,11 @@ class binary_writer
oa->write_character(to_char_type('['));
}

- if (use_type && !j.m_data.m_value.binary->empty())
+ if (use_type && (use_bjdata || !j.m_data.m_value.binary->empty()))
{
JSON_ASSERT(use_count);
oa->write_character(to_char_type('$'));
- oa->write_character('U');
+ oa->write_character(use_bjdata ? 'B' : 'U');
}

if (use_count)
Expand All @@ -16017,7 +16035,7 @@ class binary_writer
{
for (size_t i = 0; i < j.m_data.m_value.binary->size(); ++i)
{
oa->write_character(to_char_type('U'));
oa->write_character(to_char_type(use_bjdata ? 'B' : 'U'));
oa->write_character(j.m_data.m_value.binary->data()[i]);
}
}
@@ -16765,7 +16783,8 @@ class binary_writer
bool write_bjdata_ndarray(const typename BasicJsonType::object_t& value, const bool use_count, const bool use_type)
{
std::map<string_t, CharType> bjdtype = {{"uint8", 'U'}, {"int8", 'i'}, {"uint16", 'u'}, {"int16", 'I'},
- {"uint32", 'm'}, {"int32", 'l'}, {"uint64", 'M'}, {"int64", 'L'}, {"single", 'd'}, {"double", 'D'}, {"char", 'C'}
+ {"uint32", 'm'}, {"int32", 'l'}, {"uint64", 'M'}, {"int64", 'L'}, {"single", 'd'}, {"double", 'D'},
+ {"char", 'C'}, {"byte", 'B'}
};

string_t key = "_ArrayType_";
@@ -16798,7 +16817,7 @@ class binary_writer
write_ubjson(value.at(key), use_count, use_type, true, true);

key = "_ArrayData_";
- if (dtype == 'U' || dtype == 'C')
+ if (dtype == 'U' || dtype == 'C' || dtype == 'B')
{
for (const auto& el : value.at(key))
{