Signed Documents
This is a specification for digitally signing a JSON document. It's not tied to Couchbase Lite (or the Couchbase Sync Gateway) though it was created for use with them. Nor do those products require signed documents.
Signing a document provides these benefits:
- The enclosed public key can be used as an identifier of the entity that signed the document.
- Any unauthorized modification of the document can be detected, as it'll invalidate the signature.
Thus a signature serves as a form of authentication. Why do we need this when servers like the Sync Gateway and CouchDB already support several types of authentication?
- The signature authenticates a document, not a connection. This is a very important distinction when documents are replicated, especially when they can pass between multiple servers. A document may be forwarded by an entity that didn't create it, so the fact that the replicator connection is authenticated does not authenticate the document. The document has to carry its own credentials.
- Public keys allow for many types of identity and authentication. In the simplest case, an entity can create a key-pair and use the public key as its sole identification; this is useful even though it doesn't tie that entity to any external form of ID. More complex systems can use a hierarchical public-key infrastructure like X.509 or a "web of trust" like PGP.
History: This is an evolution of an earlier spec that I first wrote in 2009. The data format has been simplified but the basic principles are the same. This in turn was heavily influenced by SDSI, an experimental public-key infrastructure that used S-expressions as its universal data representation.
There are two main algorithms here: the signature algorithm takes a JSON object and a private key and produces a signature object, and the verification algorithm takes a JSON object and a signature object and determines whether or not the signature is valid for that object.
Unlike some other JSON-signature systems, the object being signed doesn't need to be specially encoded. This is important because it doesn't get in the way of systems (like Couchbase Lite or CouchDB) that read the object.
Another advantage is that the signature doesn't need to be contained in the signed object (or vice versa). It is common for the signature to be contained, and there's a special `(signed)` property defined for it, but there are situations where this isn't practical. For example, some storage systems may require metadata such as a signature to be stored externally. In this case it's up to the application to have a way to find the signature of an object.
When signing documents belonging to a CouchDB-family database (also including Couchbase Lite and the Couchbase Sync Gateway) it's important to handle the document metadata correctly.
The key point is that the document ID and the parent revision ID must be signed. If not, the document can be used for replay attacks. If the doc ID isn't signed, an attacker can change it to another ID and create a copy of the document. If the signature doesn't include the parent revision ID, an attacker can re-post the document at any time, reverting it to an older version.
Specifically:
- When signing a document or verifying a signature, its `_id` property MUST be included in the JSON being signed.
- The parent rev ID isn't stored in the document, so it needs to be explicitly added as a `parent_rev` property. This property MUST be stored in a signed Couch-type document (and included in the JSON being signed) and MUST be equal to the `_rev` property of the parent revision, unless this is a first-generation document with no parent revision, in which case the property MUST be absent.
- The only other metadata property that typically appears is `_attachments`. In principle this would be good to sign, but I am not sure its exact contents will stay the same across replication (for example, the `digest` properties might change format). More research is needed.
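The metadata rules above could be applied before signing roughly as follows. This is only a sketch: `prepare_for_signing` is a hypothetical helper name, and dropping `_rev`, `_attachments` and any existing `(signed)` property from the body is an assumption of the sketch, not something this spec mandates.

```python
from typing import Optional

# Assumption: these properties are left out of the signed body. The spec only
# requires that `_id` and `parent_rev` are covered by the signature.
EXCLUDED = {"_rev", "_attachments", "(signed)"}


def prepare_for_signing(doc: dict, parent_rev: Optional[str]) -> dict:
    """Return a copy of `doc` containing the properties that should be signed."""
    if "_id" not in doc:
        raise ValueError("_id MUST be included in the JSON being signed")
    body = {k: v for k, v in doc.items()
            if k not in EXCLUDED and k != "parent_rev"}
    if parent_rev is not None:
        # Must equal the `_rev` of the parent revision.
        body["parent_rev"] = parent_rev
    # First-generation documents have no parent, so `parent_rev` stays absent.
    return body
```

The returned dictionary is what would be passed to the canonical digest; the application still has to store `parent_rev` in the document itself.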
An algorithmic blob is a data blob tagged with the identity of the algorithm that produced it. It's encoded as an array of two strings:
- A short string identifying the algorithm or algorithm family, e.g. "SHA" for the SHA digest algorithms.
- The base64-encoded data blob.
It's not necessary to use separate algorithm strings for every size variation of an algorithm, since the size can be determined by the length of the associated blob. For example, instead of "RSA-2048" and "RSA-4096", just "RSA" suffices since the associated key already determines the key size.
- `SHA`: A SHA-family digest (SHA-1, SHA-256, etc.)
- `RSA`: An RSA public key, encoded in ASN.1 BSAFE format
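As an illustration, an algorithmic blob can be built and read back with a few lines of Python. The helper names (`make_blob`, `read_blob`, `sha_variant`) are hypothetical, and mapping digest lengths to specific SHA-1/SHA-2 variants is an assumption about how "the size can be determined by the length" would be applied.

```python
import base64
import hashlib

# Assumption: digest length in bytes selects the SHA-1/SHA-2 variant.
SHA_SIZES = {20: "SHA-1", 28: "SHA-224", 32: "SHA-256", 48: "SHA-384", 64: "SHA-512"}


def make_blob(algorithm: str, data: bytes) -> list:
    """Encode raw bytes as [algorithm, base64-data]."""
    return [algorithm, base64.b64encode(data).decode("ascii")]


def read_blob(blob) -> tuple:
    """Return (algorithm, raw bytes); raises ValueError on malformed input."""
    if not (isinstance(blob, list) and len(blob) == 2):
        raise ValueError("blob must be a 2-element array")
    algorithm, encoded = blob
    return algorithm, base64.b64decode(encoded, validate=True)


def sha_variant(blob) -> str:
    """Infer which SHA variant produced a "SHA" blob from its length."""
    algorithm, data = read_blob(blob)
    if algorithm != "SHA" or len(data) not in SHA_SIZES:
        raise ValueError("not a recognized SHA digest blob")
    return SHA_SIZES[len(data)]


blob = make_blob("SHA", hashlib.sha256(b"hello").digest())
```

Here `sha_variant(blob)` reports SHA-256 purely from the 32-byte length, without a separate algorithm string per size.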
A signature object is a JSON object that acts as a digital signature of some other JSON object (without specifying where that other object is).
A signature object has at least the following properties:
- `digest`: A cryptographic digest of the canonical encoding of the object being signed, encoded as an algorithmic blob.
- `key`: The public key of the key-pair performing the signing, encoded as an algorithmic blob.
- `sig`: The digital signature of the canonical encoding of the signature object minus this field. This is just a base64-encoded blob; the algorithm is implicitly the same as the one used for the `key` property.
Optional properties include:
- `date`: A JSON-format timestamp identifying when the signature was generated.
- `expires`: The number of minutes the signature remains valid after being generated.
A signed object is simply a JSON object that directly contains its signature as the value of a `(signed)` property. Obviously this property needs to be ignored while computing the canonical digest of the object.
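A small sketch of moving between a signed object and its two parts may help. The function names are hypothetical; only the `(signed)` property name comes from this spec.

```python
SIGNED_KEY = "(signed)"


def attach_signature(obj: dict, signature: dict) -> dict:
    """Return a signed object: a copy of `obj` carrying its own signature."""
    return {**obj, SIGNED_KEY: signature}


def split_signed_object(signed: dict) -> tuple:
    """Return (content, signature). The content excludes the `(signed)`
    property, so it is what the canonical digest is computed over."""
    content = {k: v for k, v in signed.items() if k != SIGNED_KEY}
    return content, signed.get(SIGNED_KEY)
```

If the signature is stored externally instead, only the content half exists in the object and the application has to look the signature up itself.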
Digest algorithms like SHA-1 operate on raw binary data, not abstract objects like JSON. There are many different ways to encode the same JSON object as data, which will all result in different digests. So for the signer and verifier to agree on the same digest of an object, we have to define a canonical encoding algorithm that always maps equivalent objects to identical data.
There is no standard for canonical JSON encoding yet, but the OLPC group has documented one that's pretty reasonable:
- No whitespace.
- Numbers must be representable as 48-bit integers (i.e. in the range [-2^47 .. 2^47-1].)
- Numbers cannot have decimal points nor scientific notation nor leading zeros. "-0" is not allowed.
- Strings (including keys) are converted to Unicode Normalization Form C.
- No escape sequences in strings, other than `\"` and `\\`. All other characters must be represented literally, including control characters.
- Object keys are lexicographically sorted by Unicode character values (code points). The sorting occurs before escape sequences are added.
- The entire output is encoded in UTF-8.
Note: Non-integers are forbidden because different formatting libraries will convert them to textual form in different ways.
Note: Integers are restricted to 48-bit, not 64-bit, because many JSON parsers convert numbers to double-precision floating point, which is a 64-bit value but has only 53 bits of precision (mantissa).
Note: The above-linked OLPC spec says "string are uninterpreted bytes" and "arbitrary [binary] content may be represented as a string" — this is untrue. The JSON specification states that "a string is a sequence of zero or more Unicode characters". The encoding of a string must therefore be valid UTF-8 data. The only safe way to store binary blobs in JSON is to encode them somehow, typically as Base64.
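The encoding rules listed above can be sketched as a short Python encoder. This is an illustrative reading of those rules, not a reference implementation; `canonical_json` and its helpers are hypothetical names, and rejecting keys that collide after normalization is an extra assumption.

```python
import unicodedata

MIN_INT, MAX_INT = -(2 ** 47), 2 ** 47 - 1


def _encode_string(text: str) -> str:
    # Normalize to NFC, then escape only backslash and double quote.
    text = unicodedata.normalize("NFC", text)
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def _encode(value) -> str:
    if value is None:
        return "null"
    if isinstance(value, bool):  # must be tested before int
        return "true" if value else "false"
    if isinstance(value, int):
        if not MIN_INT <= value <= MAX_INT:
            raise ValueError("integer does not fit in 48 bits")
        return str(value)
    if isinstance(value, str):
        return _encode_string(value)
    if isinstance(value, (list, tuple)):
        return "[" + ",".join(_encode(item) for item in value) + "]"
    if isinstance(value, dict):
        normalized = {}
        for key, item in value.items():
            if not isinstance(key, str):
                raise ValueError("object keys must be strings")
            nkey = unicodedata.normalize("NFC", key)
            if nkey in normalized:
                raise ValueError("keys collide after normalization")
            normalized[nkey] = item
        # Python compares strings by code point; sorting precedes escaping.
        members = (_encode_string(k) + ":" + _encode(normalized[k])
                   for k in sorted(normalized))
        return "{" + ",".join(members) + "}"
    # Floats and any other types are rejected.
    raise ValueError("cannot canonically encode " + type(value).__name__)


def canonical_json(value) -> bytes:
    """Canonical encoding of a JSON-compatible value, as UTF-8 bytes."""
    return _encode(value).encode("utf-8")
```

The canonical digest of an object is then just a hash of these bytes, for example `hashlib.sha256(canonical_json(obj)).digest()`.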
1. Compute the canonical digest of the object being signed.
2. Create an unfinished signature object with only `digest` and `key` properties.
3. If desired, add `date` and `expires` properties.
4. Add any other optional properties desired.
5. Compute the canonical digest of the unfinished signature object.
6. Generate a digital signature of the canonical digest from step 5, using the private key that matches the public key used in step 2.
7. Add the base64-encoded signature as the `sig` property of the signature object.
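The signing steps above might look like this in Python. Several things here are assumptions made to keep the sketch self-contained: `json.dumps` with sorted keys is only a simplified stand-in for the canonical encoding (it does not enforce the number or normalization rules and escapes control characters), SHA-256 is chosen arbitrarily as the digest, and `sign` is a caller-supplied callable wrapping whatever RSA library holds the private key.

```python
import base64
import hashlib
import json
from datetime import datetime, timezone


def canonical_digest(obj) -> bytes:
    # Stand-in for the canonical encoding; adequate for simple ASCII data only.
    data = json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(data).digest()


def _b64(data: bytes) -> str:
    return base64.b64encode(data).decode("ascii")


def make_signature(obj, public_key: bytes, sign,
                   expires_minutes=None, now=None) -> dict:
    """Build a signature object for `obj`.

    `sign(digest: bytes) -> bytes` is a hypothetical callable that signs a
    digest with the private key matching `public_key`.
    """
    # Unfinished signature object with only `digest` and `key`.
    signature = {
        "digest": ["SHA", _b64(canonical_digest(obj))],
        "key": ["RSA", _b64(public_key)],
    }
    # Optional `date` and `expires` properties.
    if expires_minutes is not None:
        now = now or datetime.now(timezone.utc)
        signature["date"] = now.strftime("%Y-%m-%dT%H:%M:%SZ")
        signature["expires"] = expires_minutes
    # Sign the digest of the unfinished signature object, then add `sig`.
    signature["sig"] = _b64(sign(canonical_digest(signature)))
    return signature
```

Passing the signer in as a callable keeps the private key out of this code; any RSA implementation can be plugged in behind it.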
Note: The purpose of removing underscore-prefixed properties from the canonical form is to exclude those properties from the signature.
1. Temporarily remove from the target object any properties that should not be considered part of its content; for instance, metadata like a date received (or the `(signed)` property itself, in the case of a signed object).
2. Compute the canonical digest of the target object, using the algorithm given in the signature's `digest` property.
3. Compare this digest with the one contained in the `digest` property of the signature. If they aren't equal, fail (the object does not match what was signed).
4. Copy the signature object and remove the `sig` property from the copy.
5. Compute the canonical digest of the copied signature object.
6. Verify the digest against the signature contained in the `sig` property, using the public key contained in the `key` property. If verification fails, fail (the signature itself has been altered).
7. If the signature contains a `date` property:
   - If that date is in the future, fail (not valid yet, or else there's unacceptable clock skew).
   - If the signature also contains an `expires` property, add that number of minutes to the `date`. If the resulting time is in the past, fail (signature expired).
8. Succeed: the signature is valid!
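A sketch of the verification steps above, under stated assumptions: `json.dumps` with sorted keys stands in for the real canonical encoding, the SHA variant is inferred from the digest length, the same hash is reused for the signature object's own digest, and `verify` is a caller-supplied callable around an RSA library. None of these names come from the spec.

```python
import base64
import hashlib
import json
from datetime import datetime, timedelta, timezone

SHA_BY_LENGTH = {20: hashlib.sha1, 28: hashlib.sha224, 32: hashlib.sha256,
                 48: hashlib.sha384, 64: hashlib.sha512}


def canonical_bytes(obj) -> bytes:
    # Simplified stand-in for the canonical encoding.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def verify_signature(target: dict, signature: dict, verify,
                     now=None, ignored=("(signed)",)) -> bool:
    """`verify(public_key, digest, sig) -> bool` is a hypothetical callable."""
    now = now or datetime.now(timezone.utc)
    try:
        algorithm, encoded = signature["digest"]
        expected = base64.b64decode(encoded, validate=True)
        _key_algorithm, key_b64 = signature["key"]
        public_key = base64.b64decode(key_b64, validate=True)
        sig_bytes = base64.b64decode(signature["sig"], validate=True)
    except (KeyError, TypeError, ValueError):
        return False  # syntactically invalid signature
    hash_fn = SHA_BY_LENGTH.get(len(expected)) if algorithm == "SHA" else None
    if hash_fn is None:
        return False  # unrecognized algorithm: cannot verify, so do not trust

    # Digest of the target, minus properties that are not part of its content.
    content = {k: v for k, v in target.items() if k not in ignored}
    if hash_fn(canonical_bytes(content)).digest() != expected:
        return False  # object does not match what was signed

    # Digest of the signature object minus `sig`, checked with the public key.
    unsigned = {k: v for k, v in signature.items() if k != "sig"}
    if not verify(public_key, hash_fn(canonical_bytes(unsigned)).digest(), sig_bytes):
        return False  # signature itself has been altered

    if "date" in signature:
        try:
            signed_at = datetime.fromisoformat(
                signature["date"].replace("Z", "+00:00"))
            if signed_at.tzinfo is None:  # assumption: bare timestamps are UTC
                signed_at = signed_at.replace(tzinfo=timezone.utc)
            lifetime = timedelta(minutes=signature.get("expires", 0))
        except (AttributeError, TypeError, ValueError):
            return False
        if signed_at > now:
            return False  # not valid yet, or unacceptable clock skew
        if "expires" in signature and signed_at + lifetime < now:
            return False  # signature expired
    return True
```

Returning a plain boolean keeps the sketch short; a real verifier would probably report which check failed.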
At any step:
- If any value in the signature object is invalid (date not in ISO-8601 format, invalid base64, digest not expressed as 2-element array, etc.), then fail (the signature is syntactically invalid).
- If any algorithm string is unrecognized or the program can't perform that algorithm, then fail (not possible to verify the signature.) It is not known whether the signature is valid, but the application should not trust the signature or the object that was signed.
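The two failure cases above could be separated in code so callers can tell "malformed" from "cannot verify". This is a sketch with hypothetical exception and function names; it assumes `expires` is a non-negative integer and relies on `datetime.fromisoformat`, which accepts only a subset of ISO-8601 on older Python versions.

```python
import base64
from datetime import datetime


class InvalidSignatureSyntax(ValueError):
    """The signature object is malformed."""


class UnsupportedAlgorithm(ValueError):
    """Well-formed, but names an algorithm this program cannot perform."""


# Algorithm strings listed in this spec, per property.
KNOWN_ALGORITHMS = {"digest": {"SHA"}, "key": {"RSA"}}


def _decode(text) -> bytes:
    if not isinstance(text, str):
        raise InvalidSignatureSyntax("expected a base64 string")
    try:
        return base64.b64decode(text, validate=True)
    except ValueError as error:  # binascii.Error is a ValueError subclass
        raise InvalidSignatureSyntax("invalid base64") from error


def check_signature_syntax(signature: dict) -> None:
    """Raise if the signature object is malformed or uses unknown algorithms."""
    for name in ("digest", "key"):
        blob = signature.get(name)
        if not (isinstance(blob, list) and len(blob) == 2
                and isinstance(blob[0], str)):
            raise InvalidSignatureSyntax(name + " is not a 2-element array")
        _decode(blob[1])
        if blob[0] not in KNOWN_ALGORITHMS[name]:
            raise UnsupportedAlgorithm(blob[0])
    _decode(signature.get("sig"))
    if "date" in signature:
        try:
            datetime.fromisoformat(str(signature["date"]).replace("Z", "+00:00"))
        except ValueError as error:
            raise InvalidSignatureSyntax("date is not ISO-8601") from error
    expires = signature.get("expires", 0)
    if isinstance(expires, bool) or not isinstance(expires, int) or expires < 0:
        raise InvalidSignatureSyntax("expires must be a number of minutes")
```

Either exception means the application should not trust the signature or the object that was signed.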
{
"digest": ["SHA", "CVVhu8Ux+kv7dZHV+9gV7q5tWVA="],
"key": [
"RSA",
"MIIBCgKCAQEAyjt5YE/R0f8fkQS95RjV0KqogjtNIIMiv7EuxYZLXL0AxXHKH13VmUTUis9PjtlsW3CoBNldPUyj5Xmujn39AlIhTioUXtBOrySrfAiaqfo28ytavY2q2X75YzQwLbTt1mRaP4Vl/NyYFf1sx7EfBoC807VcXbSbajxf7T5E9o/zwBgTME6nvXy1OPT+LkOHPmoat9RM37rOhBhA1hLHG2zxeQum31Ck0TrKJrefgmITQQ/SQsX5d8b9vXwvYrc7enGU0EfxBL4Ni62+mPYuFkh6uYrvoTNZ0wqSnhljF8C8JsxwQZ7zUWvRFhpsR1Xk71XYGcis/ZxiWXOQ+7LvAQIDAQAB"
],
"sig": "nBMh3nrOPwsQIrduAhHSaXIwtHQ74xFz0S4YN2IrSPhxUtTomuRSwO0vHHjHHZFKyReYJUikmVrJ7gkObdO325E07bRYfRG2phao1R1D/Jmj0rBEhAXaDbkfSd99URJjzsjxCagwRXU2JjrjNsih53dUJXKwYcyPpjgwhBy7Nzs8PjJCr4szA//ckLtSBI8G5pjY8eTrPR2udLIflwUgji51sxRvT6+GRFjqWH9JeLPoyvK6J1E3+xsCj397dUAcodCgomotnjghC/VywK/O7wDjgA9aj8/OyMhTyf3MPGjF05zQj2ggjo76Yuqz9Z7aHp5A9eJeARNKqTy2646gWQ=="
}