Document stream() detection limitation #434

Merged 9 commits on Jul 22, 2021
38 changes: 25 additions & 13 deletions core.d.ts
@@ -355,32 +355,44 @@ declare namespace core {
*/
const mimeTypes: Set<core.MimeType>;

interface StreamOptions {
/**
Overrides the default sample size of 4100 bytes.
*/
readonly sampleSize?: number
}

/**
Detect the file type of a readable stream.
Returns a `Promise` which resolves to the original readable stream argument, but with an added `fileType` property, which is an object like the one returned from `FileType.fromFile()`.

This method is handy to insert in the middle of a stream pipeline, but it comes at a price.
Internally, `stream()` buffers up to `sampleSize` bytes and uses them as the sample for determining the file type.
The sample size determines how reliably the file type can be detected.
A smaller sample size lowers the probability of detecting the correct file type.

*Note:* This method is only available when using Node.js.

@param readableStream - A [readable stream](https://nodejs.org/api/stream.html#stream_class_stream_readable) containing a file to examine.
@returns A `Promise` which resolves to the original readable stream argument, but with an added `fileType` property, which is an object like the one returned from `FileType.fromFile()`.

@example
```
import * as fs from 'fs';
import * as crypto from 'crypto';
import fileType = require('file-type');
import got = require('got');
import FileType = require('file-type');

(async () => {
const read = fs.createReadStream('encrypted.enc');
const decipher = crypto.createDecipheriv(alg, key, iv);
const stream = await fileType.stream(read.pipe(decipher));
const url = 'https://upload.wikimedia.org/wikipedia/en/a/a9/Example.jpg';

console.log(stream.fileType);
//=> {ext: 'mov', mime: 'video/quicktime'}
(async () => {
const stream1 = got.stream(url);
const stream2 = await FileType.stream(stream1, {sampleSize: 1024});

const write = fs.createWriteStream(`decrypted.${stream.fileType.ext}`);
stream.pipe(write);
if (stream2.fileType && stream2.fileType.mime === 'image/jpeg') {
// stream2 can be used to stream the JPEG image (from the very beginning of the stream)
}
})();
```
*/
function stream(readableStream: ReadableStream): Promise<core.ReadableStreamWithFileType>
function stream(readableStream: ReadableStream, options?: StreamOptions): Promise<core.ReadableStreamWithFileType>
}

export = core;
9 changes: 7 additions & 2 deletions core.js
@@ -1414,10 +1414,15 @@ async function _fromTokenizer(tokenizer) {
}
}

const stream = readableStream => new Promise((resolve, reject) => {
const stream = (readableStream, options) => new Promise((resolve, reject) => {
// Using `eval` to work around issues when bundling with Webpack
const stream = eval('require')('stream'); // eslint-disable-line no-eval

options = {
sampleSize: minimumBytes,
...options
};

readableStream.on('error', reject);
readableStream.once('readable', async () => {
// Set up output stream
@@ -1431,7 +1436,7 @@ const stream = readableStream => new Promise((resolve, reject) => {
}

// Read the input stream and detect the filetype
const chunk = readableStream.read(minimumBytes) || readableStream.read() || Buffer.alloc(0);
const chunk = readableStream.read(options.sampleSize) || readableStream.read() || Buffer.alloc(0);
try {
const fileType = await fromBuffer(chunk);
pass.fileType = fileType;
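The `core.js` change above comes down to three steps: merge the caller's options over a default sample size, read at most that many bytes once the stream becomes readable, and hand back a pass-through stream that still starts at the first byte. The following is a minimal sketch of that idea using only Node.js built-ins. `peekSample` and the `sample` property are hypothetical names chosen for illustration, and this version writes the sample into the `PassThrough` explicitly, which is an assumption of the sketch and not necessarily how the library forwards the data.

```js
const {PassThrough, Readable} = require('stream');

// Hypothetical helper (not part of file-type): resolves to a PassThrough that
// replays the whole input, with the first `sampleSize` bytes exposed as `.sample`.
function peekSample(readableStream, options) {
	// Same defaulting pattern as in the diff: caller options override the default.
	const {sampleSize} = {sampleSize: 4100, ...options};

	return new Promise((resolve, reject) => {
		readableStream.on('error', reject);
		readableStream.once('readable', () => {
			// Ask for `sampleSize` bytes; fall back to whatever is buffered, then to an empty buffer.
			const sample = readableStream.read(sampleSize) || readableStream.read() || Buffer.alloc(0);

			const pass = new PassThrough();
			pass.sample = sample;

			// Re-emit the sample first so consumers still see the stream from its first byte.
			if (sample.length > 0) {
				pass.write(sample);
			}

			// Forward the rest of the input behind the sample.
			readableStream.pipe(pass);
			resolve(pass);
		});
	});
}

// Usage: a 10-byte in-memory stream, sampled with a sampleSize of 4.
const source = new Readable({read() {}});
source.push(Buffer.from('0123456789'));
source.push(null);

peekSample(source, {sampleSize: 4}).then(async output => {
	const chunks = [];
	for await (const chunk of output) {
		chunks.push(chunk);
	}

	// Logs the 4-byte sample followed by the full, unmodified stream content.
	console.log(output.sample.toString(), Buffer.concat(chunks).toString());
});
```

The sample is held in memory before anything is forwarded downstream, which is the "price" the documentation mentions: a larger `sampleSize` means more buffering before the first byte reaches the consumer.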
44 changes: 40 additions & 4 deletions readme.md
@@ -278,13 +278,49 @@ Type: [`ITokenizer`](https://github.com/Borewit/strtok3#tokenizer)

A file source implementing the [tokenizer interface](https://github.com/Borewit/strtok3#tokenizer).

### FileType.stream(readableStream)

Detect the file type of a readable stream.
### FileType.stream(readableStream, options?)

Returns a `Promise` which resolves to the original readable stream argument, but with an added `fileType` property, which is an object like the one returned from `FileType.fromFile()`.

*Note:* This method is only available using Node.js.
This method is handy to insert in the middle of a stream pipeline, but it comes at a price.
Internally, `stream()` buffers up to `sampleSize` bytes and uses them as the sample for determining the file type.
The sample size determines how reliably the file type can be detected.
A smaller sample size lowers the probability of detecting the correct file type.

*Note:* This method is only available when using Node.js.

#### readableStream

Type: [`stream.Readable`](https://nodejs.org/api/stream.html#stream_class_stream_readable)

#### options

Type: `object`

##### sampleSize

Type: `number`\
Default: `4100`

The sample size in bytes.

#### Example

```js
const got = require('got');
const FileType = require('file-type');

const url = 'https://upload.wikimedia.org/wikipedia/en/a/a9/Example.jpg';

(async () => {
const stream1 = got.stream(url);
const stream2 = await FileType.stream(stream1, {sampleSize: 1024});

if (stream2.fileType && stream2.fileType.mime === 'image/jpeg') {
// stream2 can be used to stream the JPEG image (from the very beginning of the stream)
}
})();
```

#### readableStream

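The README text above states that a smaller sample lowers the chance of a correct detection. One reason is that some formats keep their identifying bytes well past the start of the file. The sketch below is a toy detector, not file-type's implementation: `detectToy` is a hypothetical function that knows only two well-known signatures, the 8-byte PNG header at offset 0 and the `ustar` magic that POSIX tar archives store at offset 257.

```js
// Toy signature table (illustrative only; file-type's real checks are far more extensive).
const signatures = [
	{ext: 'png', offset: 0, magic: Buffer.from([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])},
	{ext: 'tar', offset: 257, magic: Buffer.from('ustar')}
];

// Hypothetical detector: returns the extension, or undefined when the sample
// is too short to contain a signature or matches none of them.
function detectToy(sample) {
	for (const {ext, offset, magic} of signatures) {
		const end = offset + magic.length;
		if (sample.length >= end && sample.subarray(offset, end).equals(magic)) {
			return ext;
		}
	}

	return undefined;
}

// Build a fake tar header: 512 zero bytes with "ustar" written at offset 257.
const tarHeader = Buffer.alloc(512);
tarHeader.write('ustar', 257);

console.log(detectToy(tarHeader.subarray(0, 30))); // undefined: the magic lies beyond the sample
console.log(detectToy(tarHeader.subarray(0, 512))); // tar
```

The same effect is what the `sampleSize` test in this pull request checks against the real library: the OGM fixture is not detected with a 30-byte sample but is detected with 4100 bytes.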
16 changes: 16 additions & 0 deletions test.js
@@ -337,6 +337,12 @@ test('.stream() method - short stream', async t => {
t.deepEqual(bufferA, bufferB);
});

test('.stream() method - no end-of-stream errors', async t => {
const file = path.join(__dirname, 'fixture', 'fixture.ogm');
const stream = await FileType.stream(fs.createReadStream(file), {sampleSize: 30});
t.is(stream.fileType, undefined);
});

test('.stream() method - error event', async t => {
const errorMessage = 'Fixture';

@@ -351,6 +357,16 @@ test('.stream() method - error event', async t => {
await t.throwsAsync(FileType.stream(readableStream), errorMessage);
});

test('.stream() method - sampleSize option', async t => {
const file = path.join(__dirname, 'fixture', 'fixture.ogm');
let stream = await FileType.stream(fs.createReadStream(file), {sampleSize: 30});
t.is(typeof (stream.fileType), 'undefined', 'file-type cannot be determined with a sampleSize of 30');

stream = await FileType.stream(fs.createReadStream(file), {sampleSize: 4100});
t.is(typeof (stream.fileType), 'object', 'file-type can be determined with a sampleSize of 4100');
t.is(stream.fileType.mime, 'video/ogg');
});

test('FileType.extensions.has', t => {
t.true(FileType.extensions.has('jpg'));
t.false(FileType.extensions.has('blah'));