Add Cbor features for COSE compliance (#2412)
This PR contains all features required to serialize and parse COSE-compliant CBOR (thanks to @nodh). While some canonicalization steps (such as sorting keys) still need to be performed manually, it gets the job done quite well. Namely, we have successfully used the features introduced here to create and validate ISO/IEC 18013-5:2021-compliant mobile driving license data.

This PR introduces the following features to the CBOR format:

- Serial Labels
- Tagging of keys and values
- Definite length encoding (this is the largest change, as it effectively makes the CBOR encoder two-pass)
- Option to globally prefer major type 2 for byte array encoding
- Various QoL changes, such as public CborEncoder/CborDecoder interfaces and a separate CborConfiguration class.

This PR obsoletes #2371 and #2359 as it contains the features of both PRs and many more.

Fixes #1955
Fixes #1560

Co-authored-by: Christian Kollmann <[email protected]>
Co-authored-by: Leonid Startsev <[email protected]>
3 people authored Jul 22, 2024
1 parent af5095e commit 2017084
Showing 31 changed files with 4,230 additions and 1,472 deletions.
1 change: 1 addition & 0 deletions benchmark/build.gradle.kts
@@ -61,6 +61,7 @@ dependencies {
implementation(libs.okio)
implementation(libs.kotlinx.io)
implementation(project(":kotlinx-serialization-core"))
implementation(project(":kotlinx-serialization-cbor"))
implementation(project(":kotlinx-serialization-json"))
implementation(project(":kotlinx-serialization-json-okio"))
implementation(project(":kotlinx-serialization-json-io"))
62 changes: 62 additions & 0 deletions benchmark/src/jmh/kotlin/kotlinx/benchmarks/cbor/CborBaseLine.kt
@@ -0,0 +1,62 @@
/*
 * Copyright 2017-2024 JetBrains s.r.o. Use of this source code is governed by the Apache 2.0 license.
 */

package kotlinx.benchmarks.cbor

import kotlinx.serialization.Serializable
import kotlinx.serialization.cbor.*
import org.openjdk.jmh.annotations.*
import java.util.concurrent.*

@Serializable
data class KTestAllTypes(
    val i32: Int,
    val i64: Long,
    val f: Float,
    val d: Double,
    val s: String,
    val b: Boolean = false,
)

@Serializable
data class KTestOuterMessage(
    val a: Int,
    val b: Double,
    val inner: KTestAllTypes,
    val s: String,
    val ss: List<String>
)

@Warmup(iterations = 5, time = 1)
@Measurement(iterations = 10, time = 1)
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Benchmark)
@Fork(1)
open class CborBaseline {
    val baseMessage = KTestOuterMessage(
        42,
        256123123412.0,
        s = "string",
        ss = listOf("a", "b", "c"),
        inner = KTestAllTypes(-123124512, 36253671257312, Float.MIN_VALUE, -23e15, "foobarbaz")
    )

    val cbor = Cbor {
        encodeDefaults = true
        encodeKeyTags = false
        encodeValueTags = false
        useDefiniteLengthEncoding = false
        preferCborLabelsOverNames = false
    }

    val baseBytes = cbor.encodeToByteArray(KTestOuterMessage.serializer(), baseMessage)

    @Benchmark
    fun toBytes() = cbor.encodeToByteArray(KTestOuterMessage.serializer(), baseMessage)

    @Benchmark
    fun fromBytes() = cbor.decodeFromByteArray(KTestOuterMessage.serializer(), baseBytes)

}
@@ -53,6 +53,7 @@ afterEvaluate { // Can be applied only when the project is evaluated
"kotlinx-serialization-core" -> "kotlinx.serialization.internal.SuppressAnimalSniffer"
"kotlinx-serialization-hocon" -> "kotlinx.serialization.hocon.internal.SuppressAnimalSniffer"
"kotlinx-serialization-protobuf" -> "kotlinx.serialization.protobuf.internal.SuppressAnimalSniffer"
"kotlinx-serialization-cbor" -> "kotlinx.serialization.cbor.internal.SuppressAnimalSniffer"
else -> "kotlinx.serialization.json.internal.SuppressAnimalSniffer"
}

90 changes: 90 additions & 0 deletions docs/formats.md
@@ -13,6 +13,10 @@ stable, these are currently experimental features of Kotlin Serialization.
* [CBOR (experimental)](#cbor-experimental)
* [Ignoring unknown keys](#ignoring-unknown-keys)
* [Byte arrays and CBOR data types](#byte-arrays-and-cbor-data-types)
* [Definite vs. Indefinite Length Encoding](#definite-vs-indefinite-length-encoding)
* [Tags and Labels](#tags-and-labels)
* [Arrays](#arrays)
* [Custom CBOR-specific Serializers](#custom-cbor-specific-serializers)
* [ProtoBuf (experimental)](#protobuf-experimental)
* [Field numbers](#field-numbers)
* [Integer types](#integer-types)
@@ -164,6 +168,8 @@ Per the [RFC 7049 Major Types] section, CBOR supports the following data types:

By default, Kotlin `ByteArray` instances are encoded as **major type 4**.
When **major type 2** is desired, then the [`@ByteString`][ByteString] annotation can be used.
Moreover, the `alwaysUseByteString` configuration switch allows for globally preferring **major type 2** without needing
to annotate every `ByteArray` in a class hierarchy.
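
As a rough illustration of the two approaches described above (a sketch, not part of the original guide), the per-property annotation and the global switch could be combined as follows. `Attachment`, `byteStringCbor` and `encodeBoth` are hypothetical names introduced only for this example.

```kotlin
@file:OptIn(ExperimentalSerializationApi::class)

import kotlinx.serialization.*
import kotlinx.serialization.cbor.*

// Hypothetical class: only `payload` is opted into major type 2.
@Serializable
class Attachment(
    @ByteString val payload: ByteArray, // encoded as a byte string (major type 2)
    val checksum: ByteArray             // encoded as an array of numbers (major type 4)
)

// Global alternative: every ByteArray is encoded as a byte string.
val byteStringCbor = Cbor { alwaysUseByteString = true }

// Encodes the same value with the default instance and with the global switch.
fun encodeBoth(value: Attachment): Pair<ByteArray, ByteArray> =
    Cbor.encodeToByteArray(value) to byteStringCbor.encodeToByteArray(value)
```

The annotation is the finer-grained tool. The switch is convenient when the serialized classes cannot be annotated or when every `ByteArray` should be treated the same way.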

<!--- INCLUDE
import kotlinx.serialization.*
@@ -221,6 +227,90 @@ BF # map(*)
FF # primitive(*)
```

### Definite vs. Indefinite Length Encoding
CBOR supports two encodings for maps and arrays: definite and indefinite length encoding. kotlinx.serialization defaults
to the latter, which means that a map's or array's number of elements is not encoded, but instead a terminating byte is
appended after the last element.
Definite length encoding, on the other hand, omits this terminating byte, but instead prepends the number of elements
to the contents of a map or array. The `useDefiniteLengthEncoding` configuration switch allows for toggling between the
two modes of encoding.
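
To make the wire-level difference concrete, here is a minimal hand-rolled sketch. It is not the library's encoder; `encodeSmallUIntArray` and `toHex` are hypothetical helpers, and the sketch only handles short lists of integers in the range 0..23 so that every header and value fits into a single byte.

```kotlin
// Hand-rolled sketch, not the library's encoder: encodes a list of small
// unsigned integers as a CBOR array (major type 4) in either length mode.
fun encodeSmallUIntArray(values: List<Int>, definite: Boolean): ByteArray {
    require(values.size <= 23 && values.all { it in 0..23 }) {
        "this sketch only handles up to 23 elements in the range 0..23"
    }
    val out = ArrayList<Byte>()
    if (definite) {
        out.add((0x80 or values.size).toByte()) // header carries the element count
    } else {
        out.add(0x9F.toByte())                  // header marks "indefinite length"
    }
    values.forEach { out.add(it.toByte()) }     // each value fits in a single byte
    if (!definite) out.add(0xFF.toByte())       // terminating "break" byte
    return out.toByteArray()
}

fun ByteArray.toHex(): String =
    joinToString("") { (it.toInt() and 0xFF).toString(16).padStart(2, '0') }

fun main() {
    println(encodeSmallUIntArray(listOf(1, 2, 3), definite = true).toHex())  // 83010203
    println(encodeSmallUIntArray(listOf(1, 2, 3), definite = false).toHex()) // 9f010203ff
}
```

With the library itself, the same choice is made through the builder, for example `Cbor { useDefiniteLengthEncoding = true }`.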


### Tags and Labels

CBOR allows for optionally defining *tags* for properties and their values. These tags are encoded into the resulting
byte string to transport additional information
(see [RFC 8949 Tagging of Items](https://datatracker.ietf.org/doc/html/rfc8949#name-tagging-of-items) for more info).
The [`@KeyTags`](Tags.kt) and [`@ValueTags`](Tags.kt) annotations define such tags. Writing them is toggled using the
`encodeKeyTags` and `encodeValueTags` configuration switches, and verifying them is toggled using `verifyKeyTags` and
`verifyValueTags`.
In addition, it is possible to directly declare classes to always be tagged.
This then applies to all instances of such a tagged class, regardless of whether they are used as values in a list
or as a property in another class.
Forcing objects to always be tagged in such a manner is accomplished by the [`@ObjectTags`](Tags.kt) annotation,
which works just like `@ValueTags`, but for class definitions.
When serializing, object tags are always encoded directly before the data of the tagged object, i.e. a
value-tagged property of an object-tagged type will have the value tags preceding the object tags.
Writing and verifying object tags can be toggled using the `encodeObjectTags` and `verifyObjectTags` configuration
switches. Note that verifying only value tags can result in some data with superfluous tags still deserializing
successfully, since in this case, by definition, only a partial validation of tags happens.
Well-known tags are specified in [`CborTag`](Tags.kt).
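
The following sketch shows how the three annotations and the six switches might fit together. `Envelope`, `Message`, `taggingCbor` and `roundTrip` are hypothetical names, the tag numbers are chosen only for illustration, and writing the tags as `ULong` literals is an assumption about the annotations' parameter type.

```kotlin
@file:OptIn(ExperimentalSerializationApi::class, ExperimentalUnsignedTypes::class)

import kotlinx.serialization.*
import kotlinx.serialization.cbor.*

// Hypothetical class that is always tagged, wherever it appears.
@Serializable
@ObjectTags(55799uL) // illustrative tag number
data class Envelope(val id: Int)

// Hypothetical class with a tagged key and a tagged value.
@Serializable
data class Message(
    @KeyTags(34uL)   // illustrative tag written before the key
    @ValueTags(1uL)  // tag 1: epoch-based date/time in RFC 8949
    val sentAt: Long,
    val envelope: Envelope
)

// Write and verify all three kinds of tags.
val taggingCbor = Cbor {
    encodeKeyTags = true
    encodeValueTags = true
    encodeObjectTags = true
    verifyKeyTags = true
    verifyValueTags = true
    verifyObjectTags = true
}

// Encodes with tags and decodes again, verifying the tags on the way back.
fun roundTrip(message: Message): Message =
    taggingCbor.decodeFromByteArray<Message>(taggingCbor.encodeToByteArray(message))
```

Turning off one of the `verify*` switches relaxes only that category of checks, which is the partial validation the note above warns about.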

In addition, CBOR supports map keys of all types, which play the same role as `SerialName`s.
COSE restricts map keys to strings and numbers and calls these restricted map keys *labels*. String labels can be
assigned by using `@SerialName`, while number labels can be assigned using the [`@CborLabel`](CborLabel.kt) annotation.
The `preferCborLabelsOverNames` configuration switch can be used to prefer number labels over SerialNames in case both
are present for a property. This duality allows for compact representation of a type when serialized to CBOR, while
keeping expressive diagnostic names when serializing to JSON.

A predefined Cbor instance (in addition to the default [`Cbor.Default`](Cbor.kt) one) is available, adhering to COSE
encoding requirements as [`Cbor.CoseCompliant`](Cbor.kt). This instance uses definite length encoding,
encodes and verifies all tags and prefers labels to serial names.
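
As a minimal sketch of the label/name duality (assuming a hypothetical `Header` class; the label numbers and helper names are made up for this example), the same type can be encoded with text keys by the default instance and with number labels by the COSE-compliant one:

```kotlin
@file:OptIn(ExperimentalSerializationApi::class)

import kotlinx.serialization.*
import kotlinx.serialization.cbor.*

// Hypothetical class carrying both a string label and a number label per property.
@Serializable
data class Header(
    @CborLabel(1) @SerialName("alg") val algorithm: Int,
    @CborLabel(4) @SerialName("kid") val keyId: String
)

// Default instance: keys are the serial names "alg" and "kid".
fun encodeNamed(header: Header): ByteArray = Cbor.encodeToByteArray(header)

// COSE-compliant instance: prefers the number labels 1 and 4 as keys.
fun encodeLabeled(header: Header): ByteArray = Cbor.CoseCompliant.encodeToByteArray(header)

// Decoding must use an instance with the same label preference as the encoder.
fun decodeLabeled(bytes: ByteArray): Header = Cbor.CoseCompliant.decodeFromByteArray<Header>(bytes)
```

Because each number label here fits in a single byte while each name needs a text header plus its characters, the labeled form is the more compact one.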

### Arrays

Classes may be serialized as a CBOR Array (major type 4) instead of a CBOR Map (major type 5).

Example usage:

```
@Serializable
data class DataClass(
    val alg: Int,
    val kid: String?
)

Cbor.encodeToByteArray(DataClass(alg = -7, kid = null))
```

will normally produce a CBOR map. With definite length encoding enabled, the bytes are `0xa263616c6726636b6964f6`, or in diagnostic notation:

```
A2             # map(2)
   63          # text(3)
      616C67   # "alg"
   26          # negative(6)
   63          # text(3)
      6B6964   # "kid"
   F6          # primitive(22)
```

When annotated with `@CborArray`, serialization of the same object will produce a CBOR array instead. With definite length encoding enabled, the bytes are `0x8226f6`, or in diagnostic notation:

```
82      # array(2)
   26   # negative(6)
   F6   # primitive(22)
```
This may be used to encode COSE structures, see [RFC 9052 2. Basic COSE Structure](https://www.rfc-editor.org/rfc/rfc9052#section-2).
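
For completeness, the annotated variant could look like the sketch below. `ArrayDataClass`, `definiteCbor` and `encodeAsArray` are hypothetical names, and definite length encoding is switched on explicitly here as an assumption, so that the output has the `82 26 F6` shape shown above.

```kotlin
@file:OptIn(ExperimentalSerializationApi::class)

import kotlinx.serialization.*
import kotlinx.serialization.cbor.*

// Same shape as DataClass above, but serialized as a CBOR array:
// property names are dropped and only the values are written, in declaration order.
@Serializable
@CborArray
data class ArrayDataClass(
    val alg: Int,
    val kid: String?
)

// Assumption for this sketch: definite length encoding is enabled explicitly.
val definiteCbor = Cbor { useDefiniteLengthEncoding = true }

fun encodeAsArray(value: ArrayDataClass): ByteArray = definiteCbor.encodeToByteArray(value)

fun decodeFromArray(bytes: ByteArray): ArrayDataClass =
    definiteCbor.decodeFromByteArray<ArrayDataClass>(bytes)
```

Since the keys are gone, both sides have to agree on the property order, which is what COSE structures define.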


### Custom CBOR-specific Serializers
CBOR encoders and decoders implement the interfaces [CborEncoder](CborEncoder.kt) and [CborDecoder](CborDecoder.kt), respectively.
These interfaces contain a single property, `cbor`, exposing the current CBOR serialization configuration.
This enables custom CBOR-specific serializers to reuse the current `Cbor` instance to produce embedded byte arrays or
to react to configuration settings such as `preferCborLabelsOverNames` or `useDefiniteLengthEncoding`.
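
One possible use, sketched under assumptions: a serializer that stores a nested object as an embedded, separately encoded byte array, reusing whatever `Cbor` instance is currently driving the encoding. `Inner` and `EmbeddedInnerSerializer` are hypothetical names, and falling back to the default `Cbor` instance for non-CBOR formats is a design choice of this sketch, not something the library prescribes.

```kotlin
@file:OptIn(ExperimentalSerializationApi::class)

import kotlinx.serialization.*
import kotlinx.serialization.builtins.ByteArraySerializer
import kotlinx.serialization.cbor.*
import kotlinx.serialization.descriptors.SerialDescriptor
import kotlinx.serialization.encoding.Decoder
import kotlinx.serialization.encoding.Encoder

// Hypothetical payload type.
@Serializable
data class Inner(val x: Int)

// Serializes Inner as a byte array that itself contains CBOR-encoded data.
object EmbeddedInnerSerializer : KSerializer<Inner> {
    private val bytes = ByteArraySerializer()

    override val descriptor: SerialDescriptor =
        SerialDescriptor("EmbeddedInner", bytes.descriptor)

    override fun serialize(encoder: Encoder, value: Inner) {
        // Reuse the active Cbor instance if there is one, otherwise the default.
        val cbor = (encoder as? CborEncoder)?.cbor ?: Cbor
        val embedded = cbor.encodeToByteArray(Inner.serializer(), value)
        encoder.encodeSerializableValue(bytes, embedded)
    }

    override fun deserialize(decoder: Decoder): Inner {
        val cbor = (decoder as? CborDecoder)?.cbor ?: Cbor
        val embedded = decoder.decodeSerializableValue(bytes)
        return cbor.decodeFromByteArray(Inner.serializer(), embedded)
    }
}
```

Because the embedded bytes are produced by the same instance, settings such as `useDefiniteLengthEncoding` apply to the nested data as well.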

## ProtoBuf (experimental)

[Protocol Buffers](https://developers.google.com/protocol-buffers) is a language-neutral binary format that normally
4 changes: 4 additions & 0 deletions docs/serialization-guide.md
@@ -147,6 +147,10 @@ Once the project is set up, we can start serializing some classes.
* <a name='cbor-experimental'></a>[CBOR (experimental)](formats.md#cbor-experimental)
* <a name='ignoring-unknown-keys'></a>[Ignoring unknown keys](formats.md#ignoring-unknown-keys)
* <a name='byte-arrays-and-cbor-data-types'></a>[Byte arrays and CBOR data types](formats.md#byte-arrays-and-cbor-data-types)
* <a name='definite-vs-indefinite-length-encoding'></a>[Definite vs. Indefinite Length Encoding](formats.md#definite-vs-indefinite-length-encoding)
* <a name='tags-and-labels'></a>[Tags and Labels](formats.md#tags-and-labels)
* <a name='arrays'></a>[Arrays](formats.md#arrays)
* <a name='custom-cbor-specific-serializers'></a>[Custom CBOR-specific Serializers](formats.md#custom-cbor-specific-serializers)
* <a name='protobuf-experimental'></a>[ProtoBuf (experimental)](formats.md#protobuf-experimental)
* <a name='field-numbers'></a>[Field numbers](formats.md#field-numbers)
* <a name='integer-types'></a>[Integer types](formats.md#integer-types)
120 changes: 119 additions & 1 deletion formats/cbor/api/kotlinx-serialization-cbor.api
@@ -7,26 +7,144 @@ public synthetic class kotlinx/serialization/cbor/ByteString$Impl : kotlinx/seri

public abstract class kotlinx/serialization/cbor/Cbor : kotlinx/serialization/BinaryFormat {
public static final field Default Lkotlinx/serialization/cbor/Cbor$Default;
public synthetic fun <init> (ZZLkotlinx/serialization/modules/SerializersModule;Lkotlin/jvm/internal/DefaultConstructorMarker;)V
public synthetic fun <init> (Lkotlinx/serialization/cbor/CborConfiguration;Lkotlinx/serialization/modules/SerializersModule;Lkotlin/jvm/internal/DefaultConstructorMarker;)V
public fun decodeFromByteArray (Lkotlinx/serialization/DeserializationStrategy;[B)Ljava/lang/Object;
public fun encodeToByteArray (Lkotlinx/serialization/SerializationStrategy;Ljava/lang/Object;)[B
public final fun getConfiguration ()Lkotlinx/serialization/cbor/CborConfiguration;
public fun getSerializersModule ()Lkotlinx/serialization/modules/SerializersModule;
}

public final class kotlinx/serialization/cbor/Cbor$Default : kotlinx/serialization/cbor/Cbor {
public final fun getCoseCompliant ()Lkotlinx/serialization/cbor/Cbor;
}

public abstract interface annotation class kotlinx/serialization/cbor/CborArray : java/lang/annotation/Annotation {
}

public synthetic class kotlinx/serialization/cbor/CborArray$Impl : kotlinx/serialization/cbor/CborArray {
public fun <init> ()V
}

public final class kotlinx/serialization/cbor/CborBuilder {
public final fun getAlwaysUseByteString ()Z
public final fun getEncodeDefaults ()Z
public final fun getEncodeKeyTags ()Z
public final fun getEncodeObjectTags ()Z
public final fun getEncodeValueTags ()Z
public final fun getIgnoreUnknownKeys ()Z
public final fun getPreferCborLabelsOverNames ()Z
public final fun getSerializersModule ()Lkotlinx/serialization/modules/SerializersModule;
public final fun getUseDefiniteLengthEncoding ()Z
public final fun getVerifyKeyTags ()Z
public final fun getVerifyObjectTags ()Z
public final fun getVerifyValueTags ()Z
public final fun setAlwaysUseByteString (Z)V
public final fun setEncodeDefaults (Z)V
public final fun setEncodeKeyTags (Z)V
public final fun setEncodeObjectTags (Z)V
public final fun setEncodeValueTags (Z)V
public final fun setIgnoreUnknownKeys (Z)V
public final fun setPreferCborLabelsOverNames (Z)V
public final fun setSerializersModule (Lkotlinx/serialization/modules/SerializersModule;)V
public final fun setUseDefiniteLengthEncoding (Z)V
public final fun setVerifyKeyTags (Z)V
public final fun setVerifyObjectTags (Z)V
public final fun setVerifyValueTags (Z)V
}

public final class kotlinx/serialization/cbor/CborConfiguration {
public final fun getAlwaysUseByteString ()Z
public final fun getEncodeDefaults ()Z
public final fun getEncodeKeyTags ()Z
public final fun getEncodeObjectTags ()Z
public final fun getEncodeValueTags ()Z
public final fun getIgnoreUnknownKeys ()Z
public final fun getPreferCborLabelsOverNames ()Z
public final fun getUseDefiniteLengthEncoding ()Z
public final fun getVerifyKeyTags ()Z
public final fun getVerifyObjectTags ()Z
public final fun getVerifyValueTags ()Z
public fun toString ()Ljava/lang/String;
}

public abstract interface class kotlinx/serialization/cbor/CborDecoder : kotlinx/serialization/encoding/Decoder {
public abstract fun getCbor ()Lkotlinx/serialization/cbor/Cbor;
}

public final class kotlinx/serialization/cbor/CborDecoder$DefaultImpls {
public static fun decodeNullableSerializableValue (Lkotlinx/serialization/cbor/CborDecoder;Lkotlinx/serialization/DeserializationStrategy;)Ljava/lang/Object;
public static fun decodeSerializableValue (Lkotlinx/serialization/cbor/CborDecoder;Lkotlinx/serialization/DeserializationStrategy;)Ljava/lang/Object;
}

public abstract interface class kotlinx/serialization/cbor/CborEncoder : kotlinx/serialization/encoding/Encoder {
public abstract fun getCbor ()Lkotlinx/serialization/cbor/Cbor;
}

public final class kotlinx/serialization/cbor/CborEncoder$DefaultImpls {
public static fun beginCollection (Lkotlinx/serialization/cbor/CborEncoder;Lkotlinx/serialization/descriptors/SerialDescriptor;I)Lkotlinx/serialization/encoding/CompositeEncoder;
public static fun encodeNotNullMark (Lkotlinx/serialization/cbor/CborEncoder;)V
public static fun encodeNullableSerializableValue (Lkotlinx/serialization/cbor/CborEncoder;Lkotlinx/serialization/SerializationStrategy;Ljava/lang/Object;)V
public static fun encodeSerializableValue (Lkotlinx/serialization/cbor/CborEncoder;Lkotlinx/serialization/SerializationStrategy;Ljava/lang/Object;)V
}

public final class kotlinx/serialization/cbor/CborKt {
public static final fun Cbor (Lkotlinx/serialization/cbor/Cbor;Lkotlin/jvm/functions/Function1;)Lkotlinx/serialization/cbor/Cbor;
public static synthetic fun Cbor$default (Lkotlinx/serialization/cbor/Cbor;Lkotlin/jvm/functions/Function1;ILjava/lang/Object;)Lkotlinx/serialization/cbor/Cbor;
}

public abstract interface annotation class kotlinx/serialization/cbor/CborLabel : java/lang/annotation/Annotation {
public abstract fun label ()J
}

public synthetic class kotlinx/serialization/cbor/CborLabel$Impl : kotlinx/serialization/cbor/CborLabel {
public fun <init> (J)V
public final synthetic fun label ()J
}

public final class kotlinx/serialization/cbor/CborTag {
public static final field BASE16 J
public static final field BASE64 J
public static final field BASE64_URL J
public static final field BIGFLOAT J
public static final field BIGNUM_NEGAIVE J
public static final field BIGNUM_POSITIVE J
public static final field CBOR_ENCODED_DATA J
public static final field CBOR_SELF_DESCRIBE J
public static final field DATE_TIME_EPOCH J
public static final field DATE_TIME_STANDARD J
public static final field DECIMAL_FRACTION J
public static final field INSTANCE Lkotlinx/serialization/cbor/CborTag;
public static final field MIME_MESSAGE J
public static final field REGEX J
public static final field STRING_BASE64 J
public static final field STRING_BASE64_URL J
public static final field URI J
}

public abstract interface annotation class kotlinx/serialization/cbor/KeyTags : java/lang/annotation/Annotation {
public abstract fun tags ()[J
}

public synthetic class kotlinx/serialization/cbor/KeyTags$Impl : kotlinx/serialization/cbor/KeyTags {
public synthetic fun <init> ([JLkotlin/jvm/internal/DefaultConstructorMarker;)V
public final synthetic fun tags ()[J
}

public abstract interface annotation class kotlinx/serialization/cbor/ObjectTags : java/lang/annotation/Annotation {
public abstract fun tags ()[J
}

public synthetic class kotlinx/serialization/cbor/ObjectTags$Impl : kotlinx/serialization/cbor/ObjectTags {
public synthetic fun <init> ([JLkotlin/jvm/internal/DefaultConstructorMarker;)V
public final synthetic fun tags ()[J
}

public abstract interface annotation class kotlinx/serialization/cbor/ValueTags : java/lang/annotation/Annotation {
public abstract fun tags ()[J
}

public synthetic class kotlinx/serialization/cbor/ValueTags$Impl : kotlinx/serialization/cbor/ValueTags {
public synthetic fun <init> ([JLkotlin/jvm/internal/DefaultConstructorMarker;)V
public final synthetic fun tags ()[J
}
