Skip to content

Small Python package to infer the file type checking the magic numbers signature

License

Notifications You must be signed in to change notification settings

ninousf/filetype.py

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

58 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

filetype.py Build Status PyPI API

Small and dependency free Python package to infer file type and MIME type checking the magic numbers signature of a file or buffer.

This is a Python port from filetype Go package.

Features

  • Simple and friendly API
  • Supports a wide range of file types
  • Provides file extension and MIME type inference
  • File discovery by extension or MIME type
  • File discovery by kind (image, video, audio...)
  • Pluggable: add new custom type matchers
  • Fast, even processing large files
  • Only first 261 bytes representing the max file header is required, so you can just pass a list of bytes
  • Dependency free (just Python code, no C extensions, no libmagic bindings)
  • Cross-platform file recognition

Installation

pip install filetype

API

See annotated API reference.

Examples

Simple file type checking

import filetype

def main():
    kind = filetype.guess('tests/fixtures/sample.jpg')
    if kind is None:
        print('Cannot guess file type!')
        return

    print('File extension: %s' % kind.extension)
    print('File MIME type: %s' % kind.mime)

if __name__ == '__main__':
    main()

Supported types

Image

  • jpg - image/jpeg
  • png - image/png
  • gif - image/gif
  • webp - image/webp
  • cr2 - image/x-canon-cr2
  • tif - image/tiff
  • bmp - image/bmp
  • jxr - image/vnd.ms-photo
  • psd - image/vnd.adobe.photoshop
  • ico - image/x-icon

Video

  • mp4 - video/mp4
  • m4v - video/x-m4v
  • mkv - video/x-matroska
  • webm - video/webm
  • mov - video/quicktime
  • avi - video/x-msvideo
  • wmv - video/x-ms-wmv
  • mpg - video/mpeg
  • flv - video/x-flv

Audio

  • mid - audio/midi
  • mp3 - audio/mpeg
  • m4a - audio/m4a
  • ogg - audio/ogg
  • flac - audio/x-flac
  • wav - audio/x-wav
  • amr - audio/amr

Archive

  • epub - application/epub+zip
  • zip - application/zip
  • tar - application/x-tar
  • rar - application/x-rar-compressed
  • gz - application/gzip
  • bz2 - application/x-bzip2
  • 7z - application/x-7z-compressed
  • xz - application/x-xz
  • pdf - application/pdf
  • exe - application/x-msdownload
  • swf - application/x-shockwave-flash
  • rtf - application/rtf
  • eot - application/octet-stream
  • ps - application/postscript
  • sqlite - application/x-sqlite3
  • nes - application/x-nintendo-nes-rom
  • crx - application/x-google-chrome-extension
  • cab - application/vnd.ms-cab-compressed
  • deb - application/x-deb
  • ar - application/x-unix-archive
  • Z - application/x-compress
  • lz - application/x-lzip

Font

  • woff - application/font-woff
  • woff2 - application/font-woff
  • ttf - application/font-sfnt
  • otf - application/font-sfnt

Benchmarks

Measured using real files.

Environment: OSX x64 i7 2.7 Ghz

------------------------------------------------------------------------------------------ benchmark: 7 tests ------------------------------------------------------------------------------------------
Name (time in ns)                       Min                     Max                   Mean                StdDev                 Median                   IQR            Outliers(*)  Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_infer_image_from_bytes        357.6279 (1.0)       29,166.5395 (1.0)       1,642.3360 (1.0)        380.9934 (1.0)       1,509.9843 (1.0)        158.9457 (1.0)       9095;13752  102301           6
test_infer_audio_from_bytes        953.6743 (2.67)      96,082.6874 (3.29)     16,534.5880 (10.07)    3,002.1143 (7.88)     15,974.0448 (10.58)      953.6743 (6.00)       4514;6051   41528           1
test_infer_video_from_bytes     13,828.2776 (38.67)    272,989.2731 (9.36)     16,151.3144 (9.83)     3,361.2320 (8.82)     15,020.3705 (9.95)       953.6743 (6.00)       2522;2887   22193           1
test_infer_image_from_disk      15,974.0448 (44.67)    108,957.2906 (3.74)     18,621.0844 (11.34)    3,895.4441 (10.22)    17,166.1377 (11.37)    1,192.0929 (7.50)       1528;1804   10206           1
test_infer_video_from_disk      23,841.8579 (66.67)    229,120.2545 (7.86)     28,691.3476 (17.47)    6,242.9901 (16.39)    25,987.6251 (17.21)    4,053.1158 (25.50)      1987;1247   15651           1
test_infer_zip_from_disk        26,941.2994 (75.33)    230,073.9288 (7.89)     32,123.3861 (19.56)    7,524.4988 (19.75)    29,087.0667 (19.26)    4,768.3716 (30.00)      1349;1292   16132           1
test_infer_tar_from_disk        33,855.4382 (94.67)    164,031.9824 (5.62)     36,884.4401 (22.46)    4,489.4443 (11.78)    36,001.2054 (23.84)      953.6743 (6.00)       1036;1828   14666           1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

License

MIT - Tomas Aparicio

About

Small Python package to infer the file type checking the magic numbers signature

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Python 97.2%
  • Makefile 2.8%