forked from w3c-ccg/hashlink
-
Notifications
You must be signed in to change notification settings - Fork 0
/
index.xml
467 lines (442 loc) · 17.9 KB
/
index.xml
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rfc PUBLIC "-//IETF//DTD RFC 2629//EN" "http://xml.resource.org/authoring/rfc2629.dtd" [
<!ENTITY rfc2119 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
<!ENTITY rfc7049 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.7049.xml">
]>
<?xml-stylesheet type="text/xsl" href="rfc2629.xsl" ?>
<?rfc compact="yes" ?>
<?rfc subcompact="no" ?>
<?rfc toc="yes" ?>
<?rfc sortrefs="yes" ?>
<?rfc symrefs="yes" ?>
<rfc category="std" ipr="trust200902" submissionType="independent"
docName="draft-sporny-hashlink-04">
<front>
<title>Cryptographic Hyperlinks</title>
<author initials="M.S." surname="Sporny" fullname="Manu Sporny">
<organization>Digital Bazaar</organization>
<address>
<postal>
<street>203 Roanoke Street W.</street>
<city>Blacksburg</city>
<region>VA</region>
<code>24060</code>
<country>US</country>
</postal>
<phone>+1 540 961 4469</phone>
<email>[email protected]</email>
<uri>http://manu.sporny.org/</uri>
</address>
</author>
<date month="Nov" year="2019" />
<area>Security</area>
<workgroup />
<keyword>hyperlink</keyword>
<keyword>cryptography</keyword>
<keyword>security</keyword>
<abstract>
<t>
When using a hyperlink to fetch a resource from the Internet, it is often useful
to know if the resource has changed since the data was published. Cryptographic
hashes, such as SHA-256, are often used to determine if published data has
changed in unexpected ways. Due to the nature of most hyperlinks, the
cryptographic hash is often published separately from the link itself.
This specification describes a data model and serialization formats for
expressing cryptographically protected hyperlinks. The mechanisms described
in the document enables a system to publish a hyperlink in a way that
empowers a consuming application to determine if the resource associated with
the hyperlink has changed in unexpected ways.
</t>
</abstract>
<note title="Feedback">
<t>
This specification is a work product of the
<eref target="https://w3c-dvcg.github.io/">W3C Digital Verification Community Group</eref>
and the
<eref target="https://w3c-ccg.github.io/">W3C Credentials Community Group</eref>.
Feedback related to this specification should be logged in the
<eref target="https://github.com/w3c-dvcg/hashlink/issues">issue tracker</eref>
or be sent to
<eref target="mailto:[email protected]">[email protected]</eref>.
</t>
</note>
</front>
<middle>
<section anchor="intro" title="Introduction">
<t>
Uniform Resource Locators (URLs) enable software developers to build
distributed systems that are able to publish information using hyperlinks.
When a client fetches a resource at the given hyperlink, the result is
typically a stream of data that the client may further process. Due to the
design of most hyperlinks, the data associated with a hyperlink may change
over time. This design feature is often not an issue for systems that
do not depend on static data.
</t>
<t>
Some software systems expect data published at a specific URL to not change.
For example, firmware files, operating system releases, security upgrades,
and other high-risk files are often distributed with associated manifest files.
These manifest files typically utilize a cryptographic hash per URL to
ensure that an attack to modify the files themselves will be detected:
<figure>
<artwork>
b1a653e5...de5d3e8f3 https://example.com/operating-system.iso
7b23bf52...557a0902c https://example.com/firmware-v4.35.bin
</artwork>
</figure>
</t>
<t>
An unfortunate downside of the manifest file approach is that a separate system
from the URL itself must be utilized to add this level of content integrity
protection. In addition, the cryptographic hash format for the files are often
application specific and are not easily upgradeable once newer and more
advanced cryptographic hash formats are standardized.
</t>
<t>
New types of distributed file storage networks have been deployed over the
past several decades. Examples include HTTP file mirrors for the Debian
Operating System, peer-to-peer file networks such as BitTorrent, and
content-addressed networks, such as the Inter Planetary File System (IPFS).
While each one of these systems have their own URL format, it is currently not
possible to express a content-addressed URL that associates the content
address to a file published on each one of these networks.
</t>
<t>
This specification provides a simple data model and serialization formats
for cryptographic hyperlinks that:
<list style="symbols">
<t>Enable existing URLs to add content integrity protection.</t>
<t>Provide a URL format for multi-sourced content integrity protected data.</t>
<t>Enable URL metadata to be discarded without having to re-encode the URL.</t>
<t>Enable algorithm agility for all data model components</t>
</list>
</t>
</section>
<section anchor="data-model" title="Hashlink Data Model">
<t>
The hashlink data model is a simple expression of a cryptographic hash of
the resource, one or more URLs, and a content type.
</t>
<section anchor="resource-hash" title="The Resource Hash">
<t>
The resource hash is the the mechanism that enables content integrity
protection for the associated data stream. The resource hash value MUST be
provided in a hashlink.
</t>
</section>
<section anchor="metadata" title="The Optional Metadata">
<t>
All metadata associated with the hashlink is optional and is provided to
enable a client to more easily discover data that matches the provided
resource hash.
</t>
<section anchor="md-url" title="URLs">
<t>
A hashlink may be associated with a set of one or more URLs that, when
dereferenced, result in data that matches the resource hash.
</t>
</section>
<section anchor="md-content-type" title="Content Type">
<t>
A hashlink may be associated with exactly one Content Type that may be used
in protocols that support content types, such as HTTP's Accept header.
</t>
</section>
<section anchor="md-x" title="Experimental Metadata">
<t>
Application developers often need to express other important metadata related
to their specific application. These developers MUST use this field to do so.
Data expressed in this field MAY conflict with keys chosen by other developers
in other applications. Experimental fields that become widely used are expected
to be standardized and become core metadata fields.
</t>
</section>
</section>
</section>
<section anchor="serialization" title="Hashlink Serialization">
<t>
A hashlink may be serialized in one or two ways. The first is the
RECOMMENDED method, called a "Hashlink URL", which is a compact URL
representation of the Hashlink data model. The second is called a
"Hashlink as a Parameterized URL", which MUST NOT be used unless there is no
mechanism available to upgrade the application's URL resolver.
</t>
<section anchor="hashlink-url" title="Hashlink URL">
<t>
The beginning of a Hashlink URL always starts with the following three
characters:
</t>
<figure>
<artwork>hl:</artwork>
</figure>
<t>
The remainder of the URL is a concatenation of the resource hash and,
optionally, the Hashlink URL metadata.
</t>
<section anchor="serializing-rh" title="Serializing the Resource Hash">
<t>
The value of the resource hash can be generated by utilizing the following
algorithm:
<list style="numbers">
<t>
Generate the raw hash value by processing the resource data using the
cryptographic hashing algorithm.
</t>
<t>
Generate the multihash value by encoding the raw hash using the
<xref target="multihash">Multihash Data Format</xref>.
</t>
<t>
Generate the multibase hash by encoding the multihash value using the
<xref target="multibase">Multibase Data Format</xref>.
</t>
<t>
Output the multibase hash as the resource hash.
</t>
</list>
The example below demonstrates the output of the algorithm above for a
hashlink that expresses the data "Hello World!" processed using the
SHA-2, 256 bit, 32 byte cryptographic algorithm which is then expressed using
the base-58 Bitcoin base-encoding format:
<figure>
<artwork>zQmWvQxTqbG2Z9HPJgG57jjwR154cKhbtJenbyYTWkjgF3e</artwork>
</figure>
</t>
</section>
<section anchor="serializing-md" title="Serializing the Metadata">
<t>
To generate the value for the metadata, the metadata values are encoded in
the <xref target="RFC7049">CBOR Data Format</xref> using the following
algorithm:
<list style="numbers">
<t>
Create the raw output map (CBOR major type 5).
</t>
<t>
If at least one URL exists, add a CBOR key of 15 (0x0f) to the raw output map
with a value that is an array (CBOR major type 4).
<list style="numbers">
<t>
Encode each URL as a CBOR URI (CBOR type 32) and place it into the array.
</t>
</list>
</t>
<t>
If the content type exists, add a CBOR key of 14 (0x0e) to the raw output
map with a value that is a UTF-8 byte string (0x6) and the
value of the content type.
</t>
<t>
If experimental metadata exists, add a CBOR key of 13 (0x0d) and encode it
as a map by creating a raw output map (CBOR major type 5). For each item
in the map, serialize to CBOR where the CBOR major types, the key name,
and the value is derived from the input data. For example a key of "foo" and a
value of 200 would be encoded as a CBOR major type of 2 for the key and a
CBOR major type of 0 for the value.
</t>
<t>
Generate the multibase value by encoding the raw output map using the
Multibase Data Format.
</t>
</list>
The example below demonstrates the output of the algorithm above for metadata
containing a single URL ("http://example.org/hw.txt") with a content type
of "text/plain" expressed using the base-58 Bitcoin base-encoding format:
<figure>
<artwork>zuh8iaLobXC8g9tfma1CSTtYBakXeSTkHrYA5hmD4F7dCLw8XYwZ1GWyJ3zwF</artwork>
</figure>
</t>
</section>
<section anchor="deserializing-md" title="Deserializing the Metadata">
<t>
To deserialize the metadata, the "Serializing the Metadata" algorithm is
reversed. Implementers MUST use the following table to deserialize keys to
JSON:
</t>
<texttable anchor="mh-registry-table" title="Multihash Algorithms Registry">
<ttcol align="center">Key (hex)</ttcol>
<ttcol align="center">JSON key</ttcol>
<ttcol align="center">JSON value</ttcol>
<c>0x0f</c><c>"url"</c><c>Array of strings</c>
<c>0x0e</c><c>"content-type"</c><c>string</c>
<c>0x0d</c><c>"experimental"</c><c>JSON Object</c>
</texttable>
<t>
The example below demonstrates the output of the algorithm above for metadata
containing a single URL ("http://example.org/hw.txt") with a content type
of "text/plain", and an experimental metadata key of "foo" and value of 123:
<figure>
<artwork>{
"url": ["http://example.org/hw.txt"],
"content-type": "text/plain",
"experimental": {
"foo": 123
}
}</artwork>
</figure>
</t>
</section>
<section anchor="ex-simple" title="A Simple Hashlink Example">
<t>
The example below demonstrates a simple hashlink that provides content
integrity protection for the "http://example.org/hw.txt" file, which has a
content type of "text/plain" (line breaks added for readability purposes):
<figure>
<artwork>hl:
zQmWvQxTqbG2Z9HPJgG57jjwR154cKhbtJenbyYTWkjgF3e:
zuh8iaLobXC8g9tfma1CSTtYBakXeSTkHrYA5hmD4F7dCLw8XYwZ1GWyJ3zwF</artwork>
</figure>
</t>
</section>
</section>
<section anchor="hl-embedded" title="Alternate Embedded Encoding">
<t>
If a hashlink is embedded in a format where colons ":" are deemed inappropriate,
(such as an HTTP URL path, query or fragment)
each colon SHOULD be replaced by two hyphens "--".
</t>
<t>
The placement of any such embedding SHOULD be specified by the applicable
protocol or application.
</t>
<section anchor="ex-embedded" title="Embedded Example">
<t>
The example below demonstrates a simple hashlink that provides content
integrity protection for the "http://example.org/hw.txt" file, which has a
content type of "text/plain" (line breaks added for readability purposes):
<figure>
<artwork>hl--
zQmWvQxTqbG2Z9HPJgG57jjwR154cKhbtJenbyYTWkjgF3e--
zuh8iaLobXC8g9tfma1CSTtYBakXeSTkHrYA5hmD4F7dCLw8XYwZ1GWyJ3zwF</artwork>
</figure>
</t>
</section>
</section>
</section>
<section anchor="processors" title="Hashlink Encoders and Decoders">
<t>
Hashlink encoders and decoders MUST support the following core algorithms:
<list style="numbers">
<t>
The SHA-2, 256 bit, 32 byte output cryptographic hashing algorithm and the
associated Multihash Data Format.
</t>
<t>
The Bitcoin base58-encoding and decoding algorithm and the associated Multibase
Data Format.
</t>
</list>
Implementations MAY support algorithms and data formats in addition to the ones
listed above.
</t>
</section>
<section anchor="security" title="Security Considerations">
<t>
This section documents the security attacks that are out of scope for this
specification as well as known attacks and mitigations against those attacks.
</t>
<section anchor="sec-badhash" title="Insecure Hashing Functions">
<t>
There are a number of insecure cryptographic hashing functions in deployment
today. Among these are MD5 and SHA-1. Implementers MUST throw an error by
default when encoding or decoding these values. Implementers MAY provide a
non-default library option to override the error.
</t>
</section>
</section>
</middle>
<back>
<references title="Normative References">
&rfc2119;
&rfc7049;
<reference anchor="multibase" target="https://tools.ietf.org/id/draft-multiformats-multibase-00.html">
<front>
<title>The Multihash Data Format</title>
<author initials="J.B." surname="Benet" fullname="Juan Benet">
<organization>Protocol Labs</organization>
</author>
<author initials="M.S." surname="Sporny" fullname="Manu Sporny">
<organization>Digital Bazaar</organization>
</author>
<date month="December" year="2018" />
</front>
</reference>
<reference anchor="multihash" target="https://tools.ietf.org/id/draft-multiformats-multihash-00.html">
<front>
<title>The Multihash Data Format</title>
<author initials="J.B." surname="Benet" fullname="Juan Benet">
<organization>Protocol Labs</organization>
</author>
<author initials="M.S." surname="Sporny" fullname="Manu Sporny">
<organization>Digital Bazaar</organization>
</author>
<date month="August" year="2018" />
</front>
</reference>
</references>
<section anchor="appendix-a" title="Security Considerations">
<t>
There are a number of security considerations to take into account when
implementing or utilizing this specification: TBD
</t>
</section>
<section anchor="appendix-c" title="Test Values">
<t>
The following test values may be used to verify the conformance of Hashlink
encoders and decoders.
</t>
<section anchor="tv-simple" title="Simple Hashlink URL">
<t>
The following Hashlink URL encodes the data "Hello World!" served from the
"http://example.org/hw.txt" URL with a content type of "text/plain". The
resource hash is generated using the SHA-2, 256 bit, 32 byte cryptographic
algorithm which is then encoded using the base-58 Bitcoin base-encoding format.
The metadata options are encoded using the base-58 Bitcoin base-encoding format.
The final Hashlink URL is (new lines added for readability purposes):
<figure>
<artwork>
hl:
zQmWvQxTqbG2Z9HPJgG57jjwR154cKhbtJenbyYTWkjgF3e:
zuh8iaLobXC8g9tfma1CSTtYBakXeSTkHrYA5hmD4F7dCLw8XYwZ1GWyJ3zwF
</artwork>
</figure>
</t>
</section>
<section anchor="tv-multisource" title="Multi-sourced Hashlink URL">
<t>
The following Hashlink URL encodes the data "Hello World!" served from three
different networks. The first is a standard Web-based URL
("http://example.org/hw.txt"), the second is an IPFS-based URL
("ipfs:/ipfs/QmXfrS3pHerg44zzK6QKQj6JDk8H6cMtQS7pdXbohwNQfK/hello"), and
the third is a Tor-based URL ("http://c4m3g2upq6pkufl4.onion/hworld.txt"). The
resource hash is generated using the SHA-2, 256 bit, 32 byte cryptographic
algorithm which is then encoded using the base-58 Bitcoin base-encoding format.
The metadata options are encoded using the base-58 Bitcoin base-encoding format.
The final Hashlink URL is (new lines added for readability purposes):
<figure>
<artwork>
hl:
zQmWvQxTqbG2Z9HPJgG57jjwR154cKhbtJenbyYTWkjgF3e:
z333PdTakFeJueF2bim3PaaDqbtqjkpxUc8ETSWXe6dQLWXQWvqiUdw8TJrncx3uKhwfc
88MtM5xZbR27FhVRUKv9ogekamVtdE3UbXnXpMRT1AseCtoBUt1NE8x2SsnJxGfiZN45V
VSCp6jh4dgcufL16tWrHREiSYESEGP1J75yXCvAdvKPr7nb5aYujLeay8Ww
</artwork>
</figure>
</t>
</section>
</section>
<section anchor="acknowledgements" title="Acknowledgements">
<t>
The editors would like to thank the following individuals for feedback on and
implementations of the specification (in alphabetical order): TBD
</t>
<t>
Portions of the work on this specification have been funded by the United
States Department of Homeland Security's Science and Technology Directorate
under contract HSHQDC-17-C-00019. The content of this specification does not
necessarily reflect the position or the policy of the U.S. Government and no
official endorsement should be inferred.
</t>
</section>
</back>
</rfc>