An openly-licensed corpus of small example files, covering a wide range of formats and creation tools.
All items, apart from the source code under 'tools', are CC0 licensed unless otherwise stated. The source code is Apache 2.0 licensed unless otherwise stated.
A recent summary of the contents of the repository can be found here.
See http://wiki.curatecamp.org/index.php/Collecting_format_ID_test_files for more information.
See metadata-template.ext.md for a simple per-file metadata template.
As well as pooling example files, we also pool format signatures:
- Tika signatures staged here: https://github.com/openplanets/format-corpus/tree/master/tools/fidget/src/main/resources/tika-bl-staging
- Tika signatures are later merged [https://github.com/openplanets/format-corpus/blob/master/tools/fidget/src/main/resources/org/apache/tika/mime/custom-mimetypes.xml here].
- DROID signatures go [https://github.com/openplanets/format-corpus/tree/master/tools/fidget/src/main/resources/droid here].
More details here: http://wiki.curatecamp.org/index.php/Improving_format_ID_coverage
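
For readers new to format signatures, the following is a minimal sketch of the general idea: a signature pairs a format with a byte pattern expected at a known offset, and identification means checking a file against those patterns. It is not how Tika or DROID are implemented, and it does not read the signature files linked above. The `SIGNATURES` table and the `identify` and `identify_file` functions are hypothetical names introduced purely for illustration; the two magic values used are the well-known leading bytes of PDF and PNG files.

```python
from typing import Optional

# Hypothetical, hand-written signature table for illustration only:
# (format name, byte offset, magic bytes expected at that offset).
SIGNATURES = [
    ("PDF", 0, b"%PDF-"),
    ("PNG", 0, b"\x89PNG\r\n\x1a\n"),
]


def identify(data: bytes) -> Optional[str]:
    """Return the name of the first signature that matches, or None."""
    for name, offset, magic in SIGNATURES:
        if data[offset:offset + len(magic)] == magic:
            return name
    return None


def identify_file(path: str, read_limit: int = 64) -> Optional[str]:
    """Identify a file by reading only its first read_limit bytes."""
    with open(path, "rb") as handle:
        return identify(handle.read(read_limit))
```

Calling `identify_file` on an example file from the corpus would return `"PDF"`, `"PNG"`, or `None` under this toy table. Real signature sets cover far more formats and support richer patterns, which is why pooling and testing them against shared example files is useful.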