Aggregations #117

Closed
wants to merge 1 commit into from
41 changes: 38 additions & 3 deletions 1.2.0/index.html
@@ -348,7 +348,7 @@ <h2>Terminology</h2>
[[FRICTIONLESS-DATA-PACKAGE]] specification. It MUST contain the following
keys:

- `profile`: Set to `data-package`
- `profile`: Set to `wacz`
- `resources`: a list of file names, paths, sizes and fixity for all files
contained in the WACZ.

@@ -374,8 +374,8 @@ <h2>Terminology</h2>
that allow rendering applications to present the user with <a>contextual
information</a> about the web archive:

- `profile`: the string "wacz/1.2.0"
- `title`: a string or one sentence description for the collection
- `profile`: the string "data-package/wacz"
- `title`: a string or one sentence description for the web archive
- `description`: a longer description of the archive's contents
which MUST be Markdown formatted (plain text is valid Markdown)
[[RFC7763]].
@@ -396,6 +396,41 @@ <h2>Terminology</h2>
- `url`: The URL of the collection's home page
- `ts`: An [[RFC3339]] date for when the snapshot of URL was made

## Aggregations

Due to file size limitations, technical workflow details, and the need to
thematically group web archives into collections, it can be useful to provide an
*aggregated* view of multiple WACZ files. To support these use cases, the
`resources` list in a WACZ's `datapackage.json` MAY contain links to WACZ files
instead of WARC files. The metadata in the WACZ's `datapackage.json` refers to
the aggregation, and in addition:

* `profile`: MUST be set to "data-package/wacz-aggregation"
* `resources`: each resource MUST contain a `path` whose value is the URL of the referenced WACZ file

Other metadata in the `datapackage.json` refers to the aggregation as a whole.
If desired, additional properties MAY be included for each entry in `resources`.
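
The two requirements above can be checked mechanically. The following is a minimal sketch in Python and is not part of the specification: the function name `validate_aggregation` is hypothetical, and reading "a URL" as an absolute URL with both a scheme and a host is an assumption the text does not state.

```python
from urllib.parse import urlparse

# Profile value required for aggregations by the text above.
AGGREGATION_PROFILE = "data-package/wacz-aggregation"


def validate_aggregation(datapackage: dict) -> list[str]:
    """Return a list of problems found; an empty list means both checks passed."""
    problems = []

    if datapackage.get("profile") != AGGREGATION_PROFILE:
        problems.append(f"profile must be {AGGREGATION_PROFILE!r}")

    resources = datapackage.get("resources")
    if not isinstance(resources, list):
        problems.append("resources must be a list")
        return problems

    for i, resource in enumerate(resources):
        path = resource.get("path") if isinstance(resource, dict) else None
        if not isinstance(path, str):
            problems.append(f"resources[{i}] has no path")
            continue
        # Assumption: a usable URL has a scheme and a network location.
        parsed = urlparse(path)
        if not parsed.scheme or not parsed.netloc:
            problems.append(f"resources[{i}].path is not an absolute URL: {path!r}")

    return problems
```

A caller would load `datapackage.json` with `json.load` and treat a non-empty return value as a reason to reject the aggregation or warn the user.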

<pre class="example">
"profile": "data-package/wacz-aggregation",
"title": "My Collection",
"resources": [
  {
    "name": "Website Archive 1",
    "path": "https://example.org/web-archive-1.wacz",
    "hash": "sha256:8a7fc0d302700bed02294404a627ddbbf0e35487565b1c6181c729dff8d2fff6",
    "bytes": 75293838
  },
  {
    "name": "Website Archive 2",
    "path": "https://example.org/web-archive-2.wacz",
    "hash": "sha256:0e7101316ba5d4b66f86a371ee615fbd20f9d3f32d32563ed2c829db062f7714",
    "bytes": 11469796
  },
  ...
]
</pre>
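
A consumer that downloads one of the listed WACZ files may want to check it against the entry's `bytes` and `hash` values. The sketch below, in Python, assumes the `hash` value has the form `algorithm:hexdigest`, as in the example above. The function name `verify_fixity` and the set of accepted algorithms are hypothetical choices, not something this specification defines.

```python
import hashlib

# Assumption: only these digest algorithms are accepted by this sketch.
SUPPORTED_ALGORITHMS = {"md5", "sha1", "sha256", "sha512"}


def verify_fixity(resource: dict, content: bytes) -> bool:
    """Check downloaded WACZ content against the resource's `bytes` and `hash`."""
    # Size check: cheap, so do it before hashing.
    if "bytes" in resource and resource["bytes"] != len(content):
        return False

    if "hash" in resource:
        # Split "sha256:abcd..." into the algorithm name and the hex digest.
        algorithm, _, expected = resource["hash"].partition(":")
        if algorithm not in SUPPORTED_ALGORITHMS:
            raise ValueError(f"unsupported hash algorithm: {algorithm!r}")
        actual = hashlib.new(algorithm, content).hexdigest()
        return actual == expected.lower()

    # Nothing further to compare against.
    return True
```

For large WACZ files, a real implementation would probably hash the download in chunks rather than hold the whole file in memory; the single `bytes` argument here keeps the sketch short.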

## CDXJ

The CDXJ format provides a standardized way of representing the files in