Using Fleece
Version 1.3 … Feb 11, 2022
This document is a guide to the Fleece APIs for C and C++, at a higher level than the API docs in the headers.
To begin with, I'll explain the concepts behind the API, without the language-specific details of types and methods. Those will come next.
Fleece's data model is almost identical to JSON's, with the addition of a binary data (blob) type. This means Fleece has seven data types: null, boolean, numbers, strings, data, arrays, and dictionaries. Arrays can contain any types, and dictionaries have strings as keys and values of any types.
When Fleece-encoded data is parsed, it isn't converted into heap-allocated objects. Instead, the Fleece objects used in the API point directly into the encoded data. This means the parsing is incredibly fast and allocates no memory!
Warning: The downside of this is that if the encoded data is invalidated, for example by freeing the heap block containing it, all the Fleece objects are invalidated too, and accessing them will likely return garbage or crash. There is a Doc class that helps by acting as a safer ref-counted container for the data.
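The lifetime rule above can be illustrated without Fleece itself. The following is a minimal sketch in plain C++; ToyDoc and its members are made-up names, not Fleece's Doc class. It assumes only that the "encoded data" is a byte buffer and that parsed values are views into it.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <string_view>
#include <utility>

// Toy stand-in for the idea behind Doc: a ref-counted owner of the encoded
// bytes. Hypothetical type, NOT Fleece's Doc class.
class ToyDoc {
public:
    explicit ToyDoc(std::string bytes)
        : _data(std::make_shared<const std::string>(std::move(bytes))) {}

    // A non-owning view into the shared buffer. Like a parsed value, it is
    // only valid while some ToyDoc copy keeps the buffer alive.
    // (substr throws std::out_of_range if offset is past the end.)
    std::string_view view(std::size_t offset, std::size_t length) const {
        return std::string_view(*_data).substr(offset, length);
    }

    long useCount() const { return _data.use_count(); }

private:
    std::shared_ptr<const std::string> _data;
};

// Copies the doc, destroys the original, then reads through the copy.
inline std::string readAfterOriginalIsGone() {
    auto original = std::make_unique<ToyDoc>("hello, fleece");
    ToyDoc keeper = *original;               // shares the same buffer
    std::string_view v = keeper.view(7, 6);  // points into the shared buffer
    original.reset();                        // buffer survives: keeper owns it
    return std::string(v);
}
```

If keeper were destroyed as well, v would dangle; that is exactly the hazard the warning describes for values parsed from a raw memory block.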
These Fleece objects are immutable, since they're frozen inside the encoded data block. So how do you create new ones? There are two ways: with an Encoder, or using mutable objects.
An Encoder is an object that generates encoded Fleece data using a streaming API. You write a value to it, and get the encoded data at the end. The value you write is probably a collection, so you call beginArray or beginDict, then write the values one at a time, then endArray or endDict. Those values can be scalars, given as C/C++ types like float or string, or they can themselves be collections that require a nested begin and end call. (Details are below.)
Fleece also supports mutable collections. Unlike the ones parsed from encoded data, these are individually allocated from the heap, like the objects in a typical collection API.
You can create an empty mutable array or dictionary from scratch and use it as the root of an object tree, adding values to it (including immutable value references.) You can also create a mutable copy of an immutable collection. This copy operation can be shallow (only the collection is copied; its contents will be the same values as the originals) or deep (values inside the collection are mutable-copied too, recursively.)
Note: Under the hood, making a shallow mutable copy is very cheap: instead of copying the entire collection into heap memory, it just allocates a small stub that points back to the immutable object. Its contents are inherited from the source object, as in JavaScript. As you make changes to the mutable collection, those are added to its heap data, shadowing the original contents. (You don't need to know about this to use collections, but it's pretty cool.)
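To make the shadowing idea concrete, here is a toy model in plain C++. It is only a sketch of the technique described in the note, not Fleece's implementation: ShadowDict and ImmutableDict are hypothetical names, and values are limited to int for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>

using ImmutableDict = std::map<std::string, int>;  // stands in for parsed data

// Toy shallow mutable copy: a small stub that points back at the immutable
// source and records only the changes. Hypothetical type, not Fleece's.
class ShadowDict {
public:
    explicit ShadowDict(const ImmutableDict* source) : _source(source) {
        assert(source != nullptr);
    }

    void set(const std::string& key, int value) { _changes[key] = value; }
    void remove(const std::string& key)         { _changes[key] = std::nullopt; }

    // Look in the local changes first, then fall back to the source.
    std::optional<int> get(const std::string& key) const {
        if (auto c = _changes.find(key); c != _changes.end())
            return c->second;            // changed (or removed) locally
        if (auto s = _source->find(key); s != _source->end())
            return s->second;            // inherited from the source
        return std::nullopt;
    }

    std::size_t changeCount() const { return _changes.size(); }

private:
    const ImmutableDict* _source;        // not owned; must outlive this object
    std::map<std::string, std::optional<int>> _changes;
};
```

Creating a ShadowDict copies nothing; only set and remove allocate. The source is never modified, which is why the stub still depends on the source staying alive.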
To save a mutable object as encoded data, you write it to an Encoder. (See “Workflow”, below.)
Mutable values are reference-counted: Array and Dictionary each have retain and release functions that increment/decrement the reference count. Adding a mutable value to a collection also increments its reference count. The value (and its contents) will remain alive as long as its reference count is positive.
Note: In C++ the reference counting is done for you. MutableArray and MutableDict are a type of smart pointer that retains the value while in scope.
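For readers new to manual reference counting, the retain/release pattern can be sketched as follows. This is a toy model with hypothetical names (ToyValue, ToyArray, retain, release); it is not Fleece's code, it is not thread-safe, and it only shows how adding a value to a collection takes a second reference.

```cpp
#include <cassert>
#include <vector>

// Toy ref-counted value. Hypothetical type, not Fleece's implementation.
struct ToyValue {
    int refCount = 1;                   // the creator holds the first reference
    static inline int liveCount = 0;    // number of live objects, for the demo
    ToyValue()  { ++liveCount; }
    ~ToyValue() { --liveCount; }
};

inline ToyValue* retain(ToyValue* v) {
    if (v) ++v->refCount;
    return v;
}

inline void release(ToyValue* v) {
    if (!v) return;
    assert(v->refCount > 0);
    if (--v->refCount == 0) delete v;   // last reference gone: free the value
}

// Toy collection that retains what it holds and releases it when destroyed.
struct ToyArray {
    ToyArray() = default;
    ToyArray(const ToyArray&) = delete;
    ToyArray& operator=(const ToyArray&) = delete;
    ~ToyArray() { for (ToyValue* v : items) release(v); }

    void append(ToyValue* v) { items.push_back(retain(v)); }

    std::vector<ToyValue*> items;
};
```

After append, the caller can release its own reference and the value stays alive until the array is destroyed. A C++ smart-pointer wrapper automates the same retain and release calls.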
Immutable values are, as explained above, at the mercy of the memory block they're parsed from. Their lifespan is the same as the lifespan of that block; they cannot be individually retained or released. You must watch out for this, and avoid freeing that memory prematurely!
Warning: A mutable collection may contain immutable values; this happens when you make a shallow mutable copy of a nested immutable collection. The mutable collection cannot retain those values; they're still limited to the lifespan of their parsed memory block. If that block gets invalidated, the mutable collection remains valid but its contents will be garbage! If this is a problem, making a deep copy of the collection with the kFLDeepCopyImmutables mode will ensure that all values within the collection are copied to the heap.
A typical workflow for updating persistent Fleece data is:
1. Read the data into memory
2. Parse the data, which returns a reference to the root collection
3. Make a mutable copy of the root
4. Make changes to the copy (possibly making mutable copies of nested collections)
5. Encode the copy
6. Write the encoded data back to storage
7. Free the original and updated data blocks
If the persistent data doesn't exist yet, you'd initialize by creating a new empty mutable collection, then jumping to step 4.
Now let's go into the actual details of the API.
The first thing you need to do is include the right headers. Make sure to add the Fleece source tree's API subdirectory to the compiler's header search path.
#include <fleece/Fleece.h>
#include <fleece/Fleece.hh>
#include <fleece/Mutable.hh> // Only needed if you use mutable classes
using namespace fleece; // Optional but recommended :)
Note: There is also an internal C++ API, in the package fleece::impl. This used to be the public API, but it's been superseded. Please don't use it!
The basic Fleece API type is Value (FLValue in C.) This is a reference to a value of any type. Of the other data types, Array and Dictionary have their own C++ types (FLArray and FLDict in C) which are subclasses of Value. Scalar values are just accessed using C/C++ types.
In C, FLValue, FLArray and FLDict are typedefs for opaque pointers. The methods on them are functions that take the receiver as the first parameter. The name of the function reflects the type it operates on; for example, Dict's count method is called as the function FLDict_Count(FLDict).
Note: C doesn't support inheritance, so FLArray and FLDict are not type-compatible with FLValue. If you need to pass one of those to a function parameter that expects an FLValue, just type-cast.
In C++, they're classes that act as “smart pointers”, even though you don't use pointer syntax (*, ->) with them. They are reference types, not value types like std::vector or std::map.
We need to mention a few support types that are lower-level than values. These are used to represent both strings and binary data blobs. They have more extensive documentation, but here's a short intro:
A slice is a simple struct consisting of a pointer and a length. All it does is point to a range of memory. It doesn't imply any ownership; it just says “over here, for this many bytes.” Nonetheless, it's very useful, and there are a lot of utility methods on it, including ones to convert to and from C++ and C strings.
alloc_slice is a subclass that does own memory. It always points to a heap block that it manages. The heap block is reference-counted, so it's freed when the last alloc_slice pointing to it goes out of scope.
Note: The null slice is {NULL, 0}. You test a slice for null by comparing its pointer (buf) with NULL. Comparing its size with 0 isn't the same: it's possible to have an empty but non-null slice. (C++ slice has a bool conversion operator that tests for null.) It is, however, illegal to have a slice with a null pointer but nonzero size.
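The difference between a null slice and an empty one is easy to get wrong, so here is a small sketch. ToySlice and sliceOf are hypothetical stand-ins written for this example; they mimic the pointer-plus-length shape described above but are not fleece::slice.

```cpp
#include <cstddef>
#include <cstring>

// Toy pointer-plus-length struct. Hypothetical stand-in, not fleece::slice.
struct ToySlice {
    const void* buf  = nullptr;
    std::size_t size = 0;

    // Null test: looks at the pointer, not the size.
    explicit operator bool() const { return buf != nullptr; }

    // Emptiness test: looks at the size only.
    bool empty() const { return size == 0; }
};

// Wraps a C string without copying it. A null pointer gives the null slice.
inline ToySlice sliceOf(const char* cstr) {
    return ToySlice{cstr, cstr ? std::strlen(cstr) : 0};
}
```

Here sliceOf("") is empty but not null, while a default-constructed ToySlice is both null and empty. Code that wants to know "was there a string at all?" should test for null, not for zero size.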
A slice literal can be written as a string literal with _sl appended, e.g. "something"_sl.
The constant nullslice represents a null slice.
The C++ alloc_slice manages ref-counting automatically.
There are a great number of utility methods on slices; look in fleece/slice.hh for details. Comparisons, matching, splitting, hex or base64 encoding...
In C, these types are called FLSlice and FLSliceResult, and an FLSlice literal can be written as FLSTR("something").
The constant kFLSliceNull represents a null slice.
As usual, reference-counting is up to you in C: whenever an FLSliceResult is returned from an API call, you are responsible for calling FLSliceResult_Release when you're done with it.
There are only a couple of utility functions in C. FLSlice_Equal compares two slices for equality. FLSlice_Compare is a 3-way comparison, like strcmp(). You can create new ref-counted FLSliceResults with FLSliceResult_New and FLSlice_Copy.
Value's type property returns an enumeration that identifies which type of value it really is.
There are methods to get a scalar from a Value, or to cast it to a more specific (collection) type. They all return an empty default result if the value is not of the assumed type:
- Boolean: asBool (returns false if the value is not boolean)
- Numbers: asInt, asUnsigned, asFloat, asDouble (returns 0 if the value is not numeric)
- Strings: asString (returns a null slice if the value is not a string)
- Data: asData (returns a null slice if the value is not data)
- Arrays: asArray (returns NULL if the Value is not an Array)
- Dicts: asDict (returns NULL if the Value is not a Dict)
A Value can be a NULL pointer (i.e. a reference to address 0.) This is different from a JSON null! It means that there is literally no value. It's equivalent to JavaScript's undefined. It's returned from collection getters when the requested index or key doesn't exist, or from asDict / asArray when the value is not of the required type. It's also the initial state of a Value in C++.
It's safe to operate on a NULL Value: in general, any operation on it will return NULL, or false, or zero. (If you're used to Objective-C, it acts like nil. And it might remind you of ?. in Swift or Kotlin.) This is unusual, but it has the benefit of making it easy and safe to work with values whose schema is unknown or can't be guaranteed. For example, you can dive into nested properties like this:
double width = root.asDict()["dimensions"].asArray()[0].asDouble();
There are six things that could go wrong here, if the data isn't in the expected form: root might be NULL, or it might not be a dictionary, or it might not have a dimensions property, or the value of that property might not be an array, or that array might be empty, or its first value might not be a number. If anything goes wrong, all that happens is that width is set to 0. (If NULL weren't safe, you'd have to insert six error checks, turning that one line of code into about 18, or else risk crashing!)
If you want to distinguish between those failures and the case where width exists but really is 0, you can do this:
Value widthVal = root.asDict()["dimensions"].asArray()[0];
if (widthVal.type() != kFLNumber)
    throw "missing or invalid width!";
double width = widthVal.asDouble();
This works because type() called on a NULL Value returns kFLUndefined.
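How can a whole chain of accessors be safe on a missing value? One way to get that behavior is for every accessor to check for null and for the wrong type before doing anything. The sketch below shows the pattern with a toy tree; Node, Ref, get, at and sampleRoot are hypothetical names chosen for this example and do not reflect how Fleece stores or accesses data.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy tree node. For a dict, keys[i] names items[i]. Hypothetical layout.
struct Node {
    enum Type { Number, Array, Dict } type = Number;
    double number = 0;
    std::vector<std::string> keys;
    std::vector<Node> items;
};

// Toy null-safe reference: every accessor tolerates a null or wrongly
// typed node and answers with null or zero instead of crashing.
class Ref {
public:
    Ref(const Node* n = nullptr) : _n(n) {}
    explicit operator bool() const { return _n != nullptr; }

    bool   isNumber() const { return _n && _n->type == Node::Number; }
    double asDouble() const { return isNumber() ? _n->number : 0.0; }
    Ref    asArray()  const { return (_n && _n->type == Node::Array) ? *this : Ref(); }
    Ref    asDict()   const { return (_n && _n->type == Node::Dict)  ? *this : Ref(); }

    // Array element; null Ref if this is not an array or i is out of range.
    Ref at(std::size_t i) const {
        if (_n && _n->type == Node::Array && i < _n->items.size())
            return Ref(&_n->items[i]);
        return Ref();
    }

    // Dict lookup; null Ref if this is not a dict or the key is absent.
    Ref get(const std::string& key) const {
        if (_n && _n->type == Node::Dict)
            for (std::size_t i = 0; i < _n->keys.size(); ++i)
                if (_n->keys[i] == key) return Ref(&_n->items[i]);
        return Ref();
    }

private:
    const Node* _n;
};

// Builds {"dimensions": [10, 16]} for experimenting.
inline Node sampleRoot() {
    Node ten;     ten.number = 10;
    Node sixteen; sixteen.number = 16;
    Node dims;    dims.type = Node::Array;
    dims.items.push_back(ten);
    dims.items.push_back(sixteen);
    Node root;    root.type = Node::Dict;
    root.keys.push_back("dimensions");
    root.items.push_back(dims);
    return root;
}
```

With this, Ref(&root).asDict().get("dimensions").asArray().at(0).asDouble() walks the tree, and any broken link along the way simply yields 0. Checking isNumber() on the final Ref separates "missing" from "really zero", in the same spirit as the type() check above.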
The collection API should be pretty familiar if you've used other frameworks...
- Array and Dict both have a count property, and a boolean empty property (for convenience, and because in some cases it can take longer to determine the actual count than to just check if the collection is empty.)
- Arrays are indexed by (unsigned) integers starting at zero. Getting an index past the end of an array just returns NULL.
- Dicts are indexed by strings. If the Dict doesn't contain the key you requested, it returns NULL.
All collections have an isMutable property that tells you if the instance is actually mutable, and an asMutable property that type-casts to the appropriate mutable subclass, or returns NULL if it's not mutable.
You can create a mutable collection from scratch by calling MutableArray::newArray or MutableDict::newDict. Or you can copy an existing collection by calling its mutableCopy method. There are three modes for copying, which are progressively more expensive (but safer):
- kFLDefaultCopy: A shallow copy that makes a new mutable collection object but leaves its values the same.
- kFLDeepCopy: Nested mutable collections will also be copied. This is useful if you want to ensure that no other references can modify the object tree.
- kFLDeepCopyImmutables: Immutable collections (and scalars) are also copied. The resulting object tree is now entirely heap-based, detached from any parsed Fleece data, so there's no danger of dangling references if that data is invalidated.
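The practical difference between a shallow and a deep copy shows up when a nested object is modified afterwards. The following sketch uses a toy heap-allocated tree (Tree, shallowCopy and deepCopy are hypothetical names, not Fleece calls). It does not model the mutable/immutable distinction that separates kFLDeepCopy from kFLDeepCopyImmutables; it only shows sharing versus copying of nested objects.

```cpp
#include <memory>
#include <vector>

// Toy heap-allocated tree. Hypothetical type, not a Fleece collection.
struct Tree {
    int value = 0;
    std::vector<std::shared_ptr<Tree>> children;
};

// Shallow: a new top-level object whose children are the SAME objects
// as the original's.
inline std::shared_ptr<Tree> shallowCopy(const Tree& t) {
    return std::make_shared<Tree>(t);
}

// Deep: every nested object is copied too, recursively.
inline std::shared_ptr<Tree> deepCopy(const Tree& t) {
    auto copy = std::make_shared<Tree>();
    copy->value = t.value;
    for (const auto& child : t.children)
        copy->children.push_back(child ? deepCopy(*child) : nullptr);
    return copy;
}
```

Changing a child through the shallow copy is visible through the original, because both hold the same child. The deep copy is unaffected, at the cost of allocating a copy of every node.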
Mutable collections have a set method to store a value at a particular index/key, and a remove method to remove one. MutableArray also has append to add a value at the end. The setters return a special Slot type, which is a reference to where the value is stored; Slot has setter methods that store different types into it. For example, to store 17 into the first item of an array, call array.set(0).setInt(17).
Note: Collections can contain null values, but not NULL.
MutableArray and MutableDict also have the slightly-confusing-but-useful methods getMutableArray and getMutableDict. These are very useful when you have an immutable collection and want to make a mutable copy of it with some nested values changed.
Arrays and Dicts have iterators that let you look at their values one by one. Regular iterators are “shallow”, but there's a DeepIterator class for when you need to recursively visit every value in a tree.
It's OK to iterate over a NULL collection reference; it acts like an empty collection.
Warning: As with most other collection APIs, it's illegal to modify a mutable collection while you're iterating it. There's no explicit check for this, but the results will be, as they say, “undefined”.
The idiom is that you use a for loop to construct the iterator, test whether it's done, and move it to the next item:
for (Array::iterator i(myArray); i; ++i) {
    doSomethingWith( *i );
}
for (Dict::iterator i(myDict); i; ++i) {
    doSomethingWith( i.key(), i.value() );
}
Everything's a bit more awkward in C, isn't it? 😝
FLArrayIterator iter;
FLArrayIterator_Begin(myArray, &iter);
FLValue value;
while (NULL != (value = FLArrayIterator_GetValue(&iter))) {
    doSomethingWith( value );
    FLArrayIterator_Next(&iter);
}

FLDictIterator iter;
FLDictIterator_Begin(myDict, &iter);
FLValue value;
while (NULL != (value = FLDictIterator_GetValue(&iter))) {
    FLString key = FLDictIterator_GetKeyString(&iter);
    doSomethingWith( key, value );
    FLDictIterator_Next(&iter);
}
WARNING: It is illegal to call FLArrayIterator_Next or FLDictIterator_Next when the iterator's already at the end! In particular, do not do this:
// Incorrect code, for demonstration only:
do {
    value = FLDictIterator_GetValue(&iter);
    if (value) { ... }
} while (FLDictIterator_Next(&iter)); // wrong! ☠️
This looks reasonable, but if myDict is empty the iterator starts out at the end, so the first call to FLDictIterator_Next is already illegal. The recommended loop in the first listing avoids this problem.
As described previously, an Encoder is an object that generates encoded Fleece data using a streaming API. You use it like this:
1. Construct an Encoder
2. Tell the encoder to begin a collection: beginArray or beginDict
3. Write values to the encoder, which adds them to the collection:
   - If the collection is a dictionary, call writeKey to define the key for the value.
   - To write a scalar, call: writeNull, writeBool, writeInt, etc.
   - To write a collection, recursively perform steps 2–4: Begin the collection, write values, end it.
4. End the collection: endArray or endDict
5. Call finish, which returns the encoded data
The most recently begun collection is the “current collection” that values will be added to. When that collection is ended, the containing collection becomes current.
Encoder enc;
enc.beginDict();
enc.writeKey("dimensions");
enc.beginArray();
enc.writeInt(10);
enc.writeInt(16);
enc.endArray();
enc.writeKey("color");
enc.writeString("blue");
enc.endDict();
alloc_slice encodedData = enc.finish();
If you already have the root collection as an Array or Dict object, just write it as the only value, without a begin or end call:
Encoder enc;
enc.writeValue(myRootDict);
alloc_slice encodedData = enc.finish();
Encoding can fail for a number of reasons, mostly through programmer error (like not nesting begin/end calls properly), but also if memory runs out.
The individual begin/end/write methods return false on error, but it's easiest to just ignore those until the end of encoding and then check whether finish returned a null slice. If so, you can check the Encoder's error and errorMessage properties for details.
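The "current collection" rule and the check-only-at-the-end error style can both be seen in a miniature streaming encoder. The sketch below is a toy: ToyEncoder is a hypothetical class that borrows the method names used in this section but emits JSON text, does no string escaping, and is not Fleece's Encoder. It keeps a stack of open collections and a sticky failure flag.

```cpp
#include <string>
#include <vector>

// Toy streaming encoder that emits JSON text. Hypothetical class, not
// Fleece's Encoder; strings are written without escaping.
class ToyEncoder {
public:
    bool beginArray() { return begin('['); }
    bool beginDict()  { return begin('{'); }
    bool endArray()   { return end('[', ']'); }
    bool endDict()    { return end('{', '}'); }

    bool writeKey(const std::string& key) {
        if (_failed || _stack.empty()) return fail();
        Level& top = _stack.back();
        if (top.open != '{' || top.keyPending) return fail();
        if (top.count > 0) _out += ',';
        _out += '"' + key + "\":";
        top.keyPending = true;
        return true;
    }

    bool writeInt(long long i)             { return value(std::to_string(i)); }
    bool writeString(const std::string& s) { return value('"' + s + '"'); }

    // Returns the output, or an empty string if any earlier call failed,
    // a collection is still open, or nothing was written.
    std::string finish() {
        if (_failed || !_stack.empty() || _out.empty()) {
            _failed = true;
            return std::string();
        }
        return _out;
    }

    bool failed() const { return _failed; }

private:
    struct Level { char open; int count; bool keyPending; };

    bool fail() { _failed = true; return false; }

    // Checks that a value may be written here and emits any needed comma.
    bool prepareValue() {
        if (_failed) return false;
        if (_stack.empty())
            return _out.empty() ? true : fail();  // only one root value
        Level& top = _stack.back();               // the current collection
        if (top.open == '{') {
            if (!top.keyPending) return fail();   // dict value without a key
            top.keyPending = false;
        } else if (top.count > 0) {
            _out += ',';
        }
        ++top.count;
        return true;
    }

    bool value(const std::string& text) {
        if (!prepareValue()) return false;
        _out += text;
        return true;
    }

    bool begin(char open) {
        if (!prepareValue()) return false;
        _out += open;
        _stack.push_back(Level{open, 0, false});  // becomes the current collection
        return true;
    }

    bool end(char open, char close) {
        if (_failed || _stack.empty() || _stack.back().open != open
                || _stack.back().keyPending)
            return fail();
        _stack.pop_back();                        // the container becomes current again
        _out += close;
        return true;
    }

    std::string        _out;
    std::vector<Level> _stack;
    bool               _failed = false;
};
```

Because the failure flag is sticky, a caller can ignore the individual return values, as suggested above, and test only the result of finish(). A mismatched call such as beginArray followed by endDict makes every later call a no-op and finish() return an empty string.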
The fastest, most dangerous way to parse Fleece is to call Value::fromData, which takes a slice pointing to a block of Fleece data, and returns a pointer to its root object. (This pointer is not heap-allocated; it points inside the input data.) If the data isn't valid, NULL is returned.
On the plus side, this allocates no memory; its only overhead is a quick scan through the data to make sure it’s not corrupted. The drawback is that you bear full responsibility for making sure the lifespan of that block of encoded data outlives any of the object pointers you got from it.
The better way to parse Fleece is to use a Doc object. Its constructor takes an alloc_slice containing Fleece data – this has to be heap-allocated, but it’s ref-counted, and the Doc takes ownership and ensures the data stays alive as long as it does. And Doc itself is ref-counted, so it’s easier to manage. There’s more documentation of Doc on the Advanced Fleece page.
Both methods of parsing take a trust parameter, whose value can be kFLTrusted or kFLUntrusted. This determines how much checking is done. Untrusted data is thoroughly scanned to make sure it's valid, at least to the extent that it won't lead to a crash. Trusted data goes through less (but some) checking. It's a speed vs. security tradeoff.
Warning: For security reasons, always use kFLUntrusted if the data comes from the network, or from an arbitrary file. Only use kFLTrusted if the data is under your control, e.g. a record inside a database inside your app, or if the data has already passed a previous untrusted parse, or if you just encoded it yourself. Trusting corrupt or malicious data could cause Fleece to read outside the data's bounds in memory, resulting in garbage or crashes.
Fleece interoperates well with JSON!
Fleece has a JSON converter that takes JSON data and returns it translated to Fleece. The usual next step is to parse the Fleece (in trusted mode) to get to its root object. The Doc class encapsulates this for convenience:
slice jsonData("{\"hello\":12345}");
Doc convertedDoc = Doc::fromJSON(jsonData);
Dict root = convertedDoc.root().asDict();
There are two ways to generate JSON from Fleece:
- You can call toJSON() on any Value and get back a JSON string.
- You can create an Encoder whose output format is JSON, by passing the format value kFLEncodeJSON to its constructor. The Encoder works just as usual, except that its output will be JSON instead of Fleece.
Note: Remember that binary-data type that isn't in JSON? Those values turn into base64-encoded strings.
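For reference, this is what base64 encoding of a blob looks like. The sketch assumes the standard RFC 4648 alphabet with '=' padding; the text above does not say which base64 variant Fleece emits, so treat that as an assumption. base64Encode is a name made up for this example, not a Fleece function.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Standard base64 (RFC 4648 alphabet, '=' padding). Illustrative helper,
// not part of Fleece.
inline std::string base64Encode(const std::vector<std::uint8_t>& bytes) {
    static const char kAlphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve((bytes.size() + 2) / 3 * 4);
    for (std::size_t i = 0; i < bytes.size(); i += 3) {
        std::size_t remaining = bytes.size() - i;
        // Pack up to three bytes into a 24-bit group.
        std::uint32_t chunk = std::uint32_t(bytes[i]) << 16;
        if (remaining > 1) chunk |= std::uint32_t(bytes[i + 1]) << 8;
        if (remaining > 2) chunk |= std::uint32_t(bytes[i + 2]);
        // Emit four 6-bit symbols, padding where input bytes were missing.
        out += kAlphabet[(chunk >> 18) & 0x3F];
        out += kAlphabet[(chunk >> 12) & 0x3F];
        out += remaining > 1 ? kAlphabet[(chunk >> 6) & 0x3F] : '=';
        out += remaining > 2 ? kAlphabet[chunk & 0x3F] : '=';
    }
    return out;
}
```

Note that the conversion is lossy in one direction: once a blob has become a JSON string, nothing in the JSON marks it as having been binary data.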
JSON5 is a superset of JSON syntax that adds some JavaScript sugar for convenience. You can use single or double quotes; you can omit the quotes around keys; you can leave trailing commas at the end of a collection ... it's wonderful. 🤩
All of Fleece's JSON APIs support JSON5. You just need to change the method name slightly or pass an optional parameter; see the API docs for details.
Ready for more? Continue to the Advanced Fleece document, if you dare!