#The Bundler Service

Monitors a directory for completed bundles of files. A complete bundle consists of a given file and its .manifest file. Once the complete bundles reach a threshold of size, count, or age, they are merged.

Each file is expected to be a SequenceFile<BulkIngestKey,Value>. The files are read one at a time and concatenated into a single file in the working directory. Once all files have been written to that single file, it is moved to the target directory. If bundler.dateBundleOutput is enabled, the current date is appended to the target directory.

Once the concatenated file has been successfully moved to the target directory, each manifest is read, and the paths of the files it references are transformed using the configured values (bundler.manifestPathToReplace and bundler.manifestPathReplacement).
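The path transformation can be sketched roughly as follows. This is a minimal illustration, not the service's actual code: the class and method names are hypothetical, the manifest is assumed to hold one file path per line, and the replacement is assumed to be a literal (non-regex) substitution of every occurrence.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the manifest path rewrite; names are illustrative only. */
final class ManifestPathSketch {

    /** Replaces the configured path fragment with its replacement (literal match, all occurrences). */
    static String rewritePath(String path, String toReplace, String replacement) {
        return path.replace(toReplace, replacement);
    }

    /** Applies the rewrite to every path in a manifest (assumed: one path per line). */
    static List<String> rewriteAll(List<String> manifestLines, String toReplace, String replacement) {
        List<String> rewritten = new ArrayList<>();
        for (String line : manifestLines) {
            rewritten.add(rewritePath(line, toReplace, replacement));
        }
        return rewritten;
    }

    public static void main(String[] args) {
        // Example paths are made up for illustration.
        List<String> manifest = Arrays.asList("/data/flagged/a.seq", "/data/flagged/b.seq");
        for (String path : rewriteAll(manifest, "/flagged/", "/loaded/")) {
            System.out.println(path);
        }
    }
}
```

Here `toReplace` and `replacement` stand in for the values of bundler.manifestPathToReplace and bundler.manifestPathReplacement.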

##Required configuration

###FileScannerProperties (prefix: file)

file.inputDir - Directory to monitor for new files

file.frequency - interval in milliseconds at which file.inputDir is scanned for files to process

file.ignorePrefix - Files that match this prefix will not be processed

file.maxAge - (default -1) max age in MS before a file should be processed regardless of count or size

file.maxSize - (default -1) max size in bytes an aggregated set of files should be before they should be processed regardless of count or age

file.maxFiles - (default -1) max number of files before files should be processed regardless of age or size

file.recursive - (default false) if true recursively search inputDir for files

file.errorRetryInterval - (default 60000) interval during which a file that resulted in a processing error will not be retried

file.errorRetryTimeUnit - (default MILLISECONDS) time unit for file.errorRetryInterval

file.fsConfigResources - List of files to be added to the Hadoop Configuration, applied in order. This should include the Hadoop core/site configuration files
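The interaction of file.maxAge, file.maxSize, and file.maxFiles can be sketched as below. This is an illustration under stated assumptions rather than the service's implementation: the class and its fields are hypothetical, a negative value (such as the default -1) is assumed to disable a threshold, age is assumed to be measured from the oldest file's modification time, and each threshold is assumed to trigger when reached or exceeded.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the age/size/count thresholds; not the service's actual code. */
final class ThresholdSketch {

    /** Minimal stand-in for a scanned file: size in bytes and last-modified time in ms. */
    static final class FileInfo {
        final long sizeBytes;
        final long lastModifiedMs;

        FileInfo(long sizeBytes, long lastModifiedMs) {
            this.sizeBytes = sizeBytes;
            this.lastModifiedMs = lastModifiedMs;
        }
    }

    /** Returns true when any enabled threshold is reached; negative limits are treated as disabled. */
    static boolean shouldProcess(List<FileInfo> files, long maxAgeMs, long maxSizeBytes,
                                 long maxFiles, long nowMs) {
        if (files.isEmpty()) {
            return false;
        }
        long totalSize = 0;
        long oldest = Long.MAX_VALUE;
        for (FileInfo f : files) {
            totalSize += f.sizeBytes;
            oldest = Math.min(oldest, f.lastModifiedMs);
        }
        boolean countReached = maxFiles >= 0 && files.size() >= maxFiles;
        boolean sizeReached = maxSizeBytes >= 0 && totalSize >= maxSizeBytes;
        boolean ageReached = maxAgeMs >= 0 && nowMs - oldest >= maxAgeMs;
        return countReached || sizeReached || ageReached;
    }

    public static void main(String[] args) {
        List<FileInfo> files = Arrays.asList(new FileInfo(100, 1000), new FileInfo(200, 2000));
        // Only the count threshold is enabled here (two files, limit of two).
        System.out.println(shouldProcess(files, -1, -1, 2, 5000));
    }
}
```

Because the three checks are combined with a logical OR, any single threshold is enough to trigger processing, which matches the "regardless of" wording in the property descriptions.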

###BundlerProperties (prefix: bundler)

bundler.workDir - the work dir where files will be aggregated

bundler.bundleOutputDir - final destination of all aggregated files

bundler.dateBundleOutput - if true, the current date, formatted with bundler.dateFormat, is appended to bundler.bundleOutputDir for each file

bundler.dateFormat - SimpleDateFormat parsable format for the date to be applied if bundler.dateBundleOutput is true

bundler.manifestPathToReplace - the piece of the manifest path that should be replaced with bundler.manifestPathReplacement

bundler.manifestPathReplacement - replaces bundler.manifestPathToReplace before the referenced files in the manifest are moved

Failure to move a file referenced in a manifest will not cause a file's processing to fail.
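As an illustration of how bundler.dateBundleOutput and bundler.dateFormat might combine with bundler.bundleOutputDir, here is a small sketch. It rests on several assumptions: the class and method names are hypothetical, the formatted date is assumed to be appended as a subdirectory (joined with `/`), and UTC is used only to make the example deterministic.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

/** Hypothetical sketch of building a dated output directory; not the service's actual code. */
final class DatedOutputSketch {

    /** Returns the output directory, with the formatted date appended when dateBundleOutput is true. */
    static String targetDir(String bundleOutputDir, boolean dateBundleOutput,
                            String dateFormat, Date now) {
        if (!dateBundleOutput) {
            return bundleOutputDir;
        }
        SimpleDateFormat fmt = new SimpleDateFormat(dateFormat);
        // Assumption: UTC, chosen so the example does not depend on the local time zone.
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        // Assumption: the date is added as a subdirectory of the output directory.
        String base = bundleOutputDir.endsWith("/") ? bundleOutputDir : bundleOutputDir + "/";
        return base + fmt.format(now);
    }

    public static void main(String[] args) {
        // "/bundles/out" is a made-up path for illustration.
        System.out.println(targetDir("/bundles/out", true, "yyyy/MM/dd", new Date()));
    }
}
```

A pattern such as `yyyy/MM/dd` would spread bundles across nested year, month, and day directories, while `yyyyMMdd` would produce one flat directory per day.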