Merge pull request #320 from ZJONSSON/update-docs
Update docs
ZJONSSON authored Jun 8, 2024
2 parents 23cc3b8 + 1b4b210 commit 2608bc2
Showing 2 changed files with 204 additions and 173 deletions.
375 changes: 203 additions & 172 deletions README.md
@@ -13,28 +13,218 @@

# unzipper

## Installation

```bash
$ npm install unzipper
```

## Open methods

The Open methods provide random access to the files inside a zip archive, whether it is stored on disk, on the web, on S3, or behind a custom source.

The Open methods return a promise that resolves to the contents of the zip file's central directory, with the individual `files` listed in an array.

Each file record has the following methods, providing random access to the underlying files:
* `stream([password])` - returns a stream of the unzipped content which can be piped to any destination
* `buffer([password])` - returns a promise that resolves to the buffered content of the file.

If the file is encrypted, you must supply a password to decrypt it; otherwise the argument can be omitted.
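
As a rough illustration of the file-record methods listed above, the sketch below looks up one record by path and returns its content as text. The helper name `readEntryAsText` is hypothetical and not part of unzipper; it only assumes that `directory.files` is an array whose records expose `path` and `buffer([password])`, as described above.

```js
// Hypothetical helper, not part of unzipper: reads one archive member as text.
// `directory` is whatever an Open method resolved to (an object with a `files` array).
async function readEntryAsText(directory, entryPath, password) {
  const file = directory.files.find(f => f.path === entryPath);
  if (!file) {
    throw new Error(`No entry named "${entryPath}" in archive`);
  }
  // Pass the password only when one was given; unencrypted files need none.
  const content = password === undefined
    ? await file.buffer()
    : await file.buffer(password);
  return content.toString('utf8');
}
```

With a real archive this might be called as `readEntryAsText(await unzipper.Open.file('path/to/archive.zip'), 'README.md')`, where both the archive path and the entry name are placeholders.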

Unlike `adm-zip`, the Open methods will never read the entire zipfile into a buffer.

The last argument to the `Open` methods is an optional `options` object in which you can specify `tailSize` (default 80 bytes), i.e. how many bytes to read from the end of the zipfile when locating the end of central directory record. The position of this record varies with the size of the zip64 extensible data sector. You can also pass `crx: true`, which checks for a crx header and, if one is found, parses the file accordingly by shifting all file offsets by the length of the crx header.
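
To make the role of `tailSize` more concrete, here is a minimal sketch of how an end of central directory record can be located by scanning the last bytes of an archive for its signature. This is not unzipper's implementation; the function name `findEndOfCentralDirectory` is hypothetical, and the field offsets follow the standard zip record layout.

```js
// Signature of the end of central directory (EOCD) record, "PK\x05\x06" read little-endian.
const EOCD_SIGNATURE = 0x06054b50;
// An EOCD record without a trailing comment is 22 bytes long.
const EOCD_MIN_SIZE = 22;

// Hypothetical helper: scans backwards through the last `tailSize` bytes of `buffer`.
// Returns the parsed record, or null if no signature is found inside the tail.
// A real parser would also validate the record, since the signature bytes
// could appear by coincidence inside a comment.
function findEndOfCentralDirectory(buffer, tailSize = 80) {
  const start = Math.max(0, buffer.length - tailSize);
  for (let pos = buffer.length - EOCD_MIN_SIZE; pos >= start; pos--) {
    if (buffer.readUInt32LE(pos) === EOCD_SIGNATURE) {
      return {
        offset: pos,
        entryCount: buffer.readUInt16LE(pos + 10),
        centralDirectorySize: buffer.readUInt32LE(pos + 12),
        centralDirectoryOffset: buffer.readUInt32LE(pos + 16),
        commentLength: buffer.readUInt16LE(pos + 20),
      };
    }
  }
  return null;
}
```

If the archive ends with a long comment or extra trailing data, the record may sit further from the end than the tail covers, which is the situation a larger `tailSize` is meant to handle.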


### Open.file([path], [options])

Returns a Promise that resolves to the central directory information, with methods to extract individual files. `start` and `end` options are used to avoid reading the whole file.

Here is a simple example of opening up a zip file, printing out the directory information and then extracting the first file inside the zipfile to disk:
```js
const fs = require('fs');
const unzipper = require('unzipper');

async function main() {
  // Reads only the central directory, not the whole archive
  const directory = await unzipper.Open.file('path/to/archive.zip');
  console.log('directory', directory);
  return new Promise( (resolve, reject) => {
    directory.files[0]
      .stream()
      .pipe(fs.createWriteStream('firstFile'))
      .on('error',reject)
      .on('finish',resolve)
  });
}

main();
```

If you want to extract all files from the zip file, the directory object supplies an extract method. Here is a quick example:

```js
const unzipper = require('unzipper');

async function main() {
  const directory = await unzipper.Open.file('path/to/archive.zip');
  await directory.extract({ path: '/path/to/destination' })
}

main();
```


### Open.url([requestLibrary], [url | params], [options])

This function returns a Promise that resolves to the central directory information of a zipfile located at a URL. Range headers are used to avoid reading the whole file. Unzipper does not ship with a request library, so you have to provide one as the first argument.

Live Example: (extracts a tiny xml file from the middle of a 500MB zipfile)

```js
const request = require('request');
const unzipper = require('unzipper');

async function main() {
  const directory = await unzipper.Open.url(request,'http://www2.census.gov/geo/tiger/TIGER2015/ZCTA5/tl_2015_us_zcta510.zip');
  const file = directory.files.find(d => d.path === 'tl_2015_us_zcta510.shp.iso.xml');
  const content = await file.buffer();
  console.log(content.toString());
}

main();
```
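
The Range headers mentioned above follow the standard HTTP byte-range syntax. As a small sketch (the helper names `byteRange` and `suffixRange` are hypothetical, and this is not a description of how unzipper builds its requests), a byte window and a "last N bytes" request could be expressed like this:

```js
// Hypothetical helper: Range header value for `length` bytes starting at `offset`.
// HTTP byte ranges are inclusive on both ends, hence the "- 1".
// When `length` is omitted, the range runs from `offset` to the end of the resource.
function byteRange(offset, length) {
  if (!Number.isInteger(offset) || offset < 0) {
    throw new RangeError('offset must be a non-negative integer');
  }
  if (length === undefined) {
    return `bytes=${offset}-`;
  }
  if (!Number.isInteger(length) || length <= 0) {
    throw new RangeError('length must be a positive integer');
  }
  return `bytes=${offset}-${offset + length - 1}`;
}

// Hypothetical helper: Range header value for the last `tailSize` bytes,
// e.g. the region where the end of central directory record is expected.
function suffixRange(tailSize) {
  if (!Number.isInteger(tailSize) || tailSize <= 0) {
    throw new RangeError('tailSize must be a positive integer');
  }
  return `bytes=-${tailSize}`;
}
```

A server that supports range requests answers these with `206 Partial Content`, which is what lets a small file be pulled out of a large remote archive without downloading all of it.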


This function takes a second parameter which can either be a string containing the `url` to request, or an `options` object to invoke the supplied `request` library with. This can be used when other request options are required, such as custom headers or authentication to a third party service.

```js
const request = require('google-oauth-jwt').requestWithJWT();
const unzipper = require('unzipper');

// `google.storage.credentials` is assumed to hold an already loaded service-account key
const googleStorageOptions = {
  url: `https://www.googleapis.com/storage/v1/b/my-bucket-name/o/my-object-name`,
  qs: { alt: 'media' },
  jwt: {
    email: google.storage.credentials.client_email,
    key: google.storage.credentials.private_key,
    scopes: ['https://www.googleapis.com/auth/devstorage.read_only']
  }
};

async function getFile(req, res, next) {
  const directory = await unzipper.Open.url(request, googleStorageOptions);
  const file = directory.files.find((file) => file.path === 'my-filename');
  return file.stream().pipe(res);
}
```

## Quick Examples

### Open.s3([aws-sdk], [params], [options])

This function returns a Promise that resolves to the central directory information of a zipfile on S3. Range headers are used to avoid reading the whole file. Unzipper does not ship with the aws-sdk, so you have to provide an instantiated client as the first argument. The params object requires `Bucket` and `Key` to fetch the correct file.

Example:

```js
const fs = require('fs');
const unzipper = require('unzipper');
const AWS = require('aws-sdk');
// `config` is your own client configuration (region, credentials, ...)
const s3Client = new AWS.S3(config);

async function main() {
  const directory = await unzipper.Open.s3(s3Client,{Bucket: 'unzipper', Key: 'archive.zip'});
  return new Promise( (resolve, reject) => {
    directory.files[0]
      .stream()
      .pipe(fs.createWriteStream('firstFile'))
      .on('error',reject)
      .on('finish',resolve)
  });
}

main();
```


### Open.buffer(buffer, [options])

If you already have the zip file in-memory as a buffer, you can open the contents directly.

Example:

```js
const fs = require('fs');
const unzipper = require('unzipper');

// never use readFileSync - only used here to simplify the example
const buffer = fs.readFileSync('path/to/archive.zip');

async function main() {
  const directory = await unzipper.Open.buffer(buffer);
  console.log('directory',directory);
  // ...
}

main();
```


### Open.custom(source, [options])

This function can be used to provide a custom source implementation. The source parameter must implement a `stream` function and a `size` function. The `size` function should return a `Promise` that resolves to the total size of the file. The `stream` function should return a `Readable` stream for the supplied offset and length parameters.

Example:

```js
// Custom source implementation for reading a zip file from Google Cloud Storage
const { Storage } = require('@google-cloud/storage');
const unzipper = require('unzipper');

async function main() {
  const storage = new Storage();
  const bucket = storage.bucket('my-bucket');
  const zipFile = bucket.file('my-zip-file.zip');

  const customSource = {
    // Readable stream over the requested byte window of the object
    stream: function(offset, length) {
      return zipFile.createReadStream({
        start: offset,
        end: length && offset + length
      })
    },
    // Total size of the zip file in bytes
    size: async function() {
      const objMetadata = (await zipFile.getMetadata())[0];
      return objMetadata.size;
    }
  };

  const directory = await unzipper.Open.custom(customSource);
  console.log('directory', directory);
  // ...
}

main();
```
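
For comparison, the same `stream`/`size` contract can be sketched against a local file using only Node's built-in `fs` module. The factory name `fileSource` is hypothetical. This sketch returns exactly `length` bytes, which is an assumption about what the consumer wants; because `fs.createReadStream` treats `end` as inclusive, the last byte index is `offset + length - 1`.

```js
const fs = require('fs');

// Hypothetical factory, not part of unzipper: a custom source backed by a local file.
function fileSource(filePath) {
  return {
    // Returns a Readable for `length` bytes starting at `offset`,
    // or everything from `offset` to the end of the file when `length` is omitted.
    stream(offset, length) {
      const options = { start: offset };
      if (length) {
        // `end` is inclusive in fs.createReadStream
        options.end = offset + length - 1;
      }
      return fs.createReadStream(filePath, options);
    },
    // Resolves to the total size of the file in bytes.
    size: async () => (await fs.promises.stat(filePath)).size,
  };
}
```

Passing `fileSource('path/to/archive.zip')` to `unzipper.Open.custom` should then behave much like `Open.file`, although that equivalence is an expectation rather than something stated in this document.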


### Open.[method].extract()

The directory object returned from `Open.[method]` provides an `extract` method which extracts all the files to a specified `path`, with an optional `concurrency` (default: 1).

Example (with concurrency of 5):

```js
unzipper.Open.file('path/to/archive.zip')
  .then(d => d.extract({path: '/extraction/path', concurrency: 5}));
```


Please note: Methods that use the Central Directory instead of parsing the entire file can be found under [`Open`](#open-methods)

Chrome extension files (.crx) are zipfiles with an [extra header](http://www.adambarth.com/experimental/crx/docs/crx.html) at the start of the file. Unzipper will parse .crx files with the streaming methods (`Parse` and `ParseOne`).


## Streaming an entire zip file (legacy)

This library began as an active fork and drop-in replacement of [node-unzip](https://github.com/EvanOxfeld/node-unzip), created to address the following issues:
* finish/close events are not always triggered, particularly when the input stream is slower than the receivers
* Files are buffered into memory before being passed on to the entry

Originally the only way to use the library was to stream the entire zip file. This method is inefficient if you are only interested in selected files from the zip file. Additionally, this method can be error-prone since it relies on the local file headers, which could be wrong.

The structure of this fork is similar to the original, but it uses Promises and the inherent guarantees provided by Node streams to ensure a low memory footprint, and it emits finish/close events at the end of processing. The new `Parser` will push any parsed `entries` downstream if you pipe from it, while still supporting the legacy `entry` event as well.

Breaking changes: The new `Parser` will not automatically drain entries if there are no listeners or pipes in place.
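
Because entries are no longer drained automatically, every entry has to be either consumed or explicitly discarded. The sketch below shows one way to do that with an async iterator. The helper name `readOneEntry` is hypothetical, and it assumes each entry exposes `path`, `buffer()` and `autodrain()`, as the streaming entries in the library's README do.

```js
// Hypothetical helper: walks a stream of entries, keeps the content of one entry
// and drains all others so the parser can keep making progress.
async function readOneEntry(entryStream, wantedPath) {
  let content = null;
  for await (const entry of entryStream) {
    if (content === null && entry.path === wantedPath) {
      // Consume the wanted entry fully into memory.
      content = await entry.buffer();
    } else {
      // Entries that are not needed must still be discarded.
      entry.autodrain();
    }
  }
  // null when no entry with that path was seen
  return content;
}
```

The entry stream would typically come from something like `fs.createReadStream('path/to/archive.zip').pipe(unzipper.Parse({ forceStream: true }))`, where the archive path is a placeholder.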

Unzipper provides simple APIs similar to [node-tar](https://github.com/isaacs/node-tar) for parsing and extracting zip files.
There are no added compiled dependencies - inflation is handled by Node.js's built-in zlib support.

### Extract to a directory
```js
@@ -203,164 +393,5 @@ fs.createReadStream('path/to/archive.zip')
});
```

## Open
Previous methods rely on the entire zipfile being received through a pipe. The Open methods take a different approach: they load the central directory first (at the end of the zipfile) and let you pick and choose which files to extract, even extracting them in parallel. The Open methods return a promise that resolves to the contents of the directory, with individual `files` listed in an array. Each file element has the following methods:
* `stream([password])` - returns a stream of the unzipped content which can be piped to any destination
* `buffer([password])` - returns a promise that resolves to the buffered content of the file.
If the file is encrypted, you must supply a password to decrypt it; otherwise the argument can be omitted.
Unlike `adm-zip`, the Open methods will never read the entire zipfile into a buffer.

The last argument is an optional `options` object in which you can specify `tailSize` (default 80 bytes), i.e. how many bytes to read from the end of the zipfile when locating the end of central directory record. The position of this record varies with the size of the zip64 extensible data sector. You can also pass `crx: true`, which checks for a crx header and, if one is found, parses the file accordingly by shifting all file offsets by the length of the crx header.


## Licenses
See LICENCE