Add support for multipart or incomplete uploads #251

Open
steve-mays opened this issue Nov 12, 2024 · 0 comments

  • We have a customer with an SFTP Gateway that is connected to the "unscanned" storage bucket.
  • When a client uploads using an SFTP client (e.g. WinSCP, FileZilla, SnapLogic, etc.), the client can break the file into multiple parts during upload.
  • These parts are written directly to the storage bucket, and this triggers a 'google.cloud.storage.object.v1.finalized' event for each update.
  • This results in partial file scans and can break the upload process for the clients.

We need to add logic to handle these multipart uploads.

  • Option 1: Compare the file size from the req.body to the metadata.size from the object. These won't match until the file has finished uploading.
  • Option 2: Add an array of "fileExclusionPatterns" to config.json listing files that are ignored during processing. Each entry could be a regular expression to allow complex pattern matching. E.g. "\\.filepart$" would exclude any file that has the ".filepart" extension, as used by WinSCP.
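
A minimal sketch of the two options above, not the code from any of the referenced commits. The `FinalizedEvent` and `ObjectMetadata` shapes and all function names are hypothetical; only the size comparison and the "fileExclusionPatterns" idea come from the issue text. Sizes are accepted as string or number because Cloud Storage payloads often serialize them as strings.

```typescript
// Hypothetical shapes for illustration; not taken from the scanner's source.
interface FinalizedEvent {
  name: string;
  size: string | number; // size reported in the event (req.body)
}

interface ObjectMetadata {
  size: string | number; // size currently recorded on the stored object
}

// Option 1: treat the upload as complete only when the size in the event
// matches the size in the object's metadata.
function isUploadComplete(event: FinalizedEvent, metadata: ObjectMetadata): boolean {
  return Number(event.size) === Number(metadata.size);
}

// Option 2: compile the configured patterns once, then test each object name.
function compileExclusionPatterns(patterns: string[]): RegExp[] {
  return patterns.map((pattern) => new RegExp(pattern));
}

function isExcluded(name: string, patterns: RegExp[]): boolean {
  return patterns.some((re) => re.test(name));
}

// The JSON string "\\.filepart$" decodes to the regular expression \.filepart$
const exclusions = compileExclusionPatterns(["\\.filepart$"]);
console.log(isExcluded("incoming/report.csv.filepart", exclusions)); // true
console.log(isExcluded("incoming/report.csv", exclusions)); // false
console.log(isUploadComplete({ name: "a.bin", size: "1024" }, { size: 2048 })); // false
```

The two checks complement each other: the pattern check skips temp files by name without a metadata lookup, while the size check catches in-progress uploads that use the final file name.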
steve-mays added a commit to steve-mays/docker-clamav-malware-scanner that referenced this issue Nov 12, 2024
…ded files by comparing the object metadata size to the file size in the request body. If they don't match, then assume file is not finished uploading.

Add fileExclusionPatterns array to the config object (including config.json.tmpl) and associated processing to ignore files that match any of the array items.

Add additional logging and error handling when moving files between buckets.
steve-mays added a commit to steve-mays/docker-clamav-malware-scanner that referenced this issue Nov 12, 2024
This commit introduces two enhancements:

Partial Upload Handling: Addresses issue GoogleCloudPlatform#251 by verifying the uploaded file size against the object metadata size. This prevents processing incomplete uploads.

File Exclusion Patterns: Adds a fileExclusionPatterns array to the configuration (including the template file) allowing specific files to be ignored during processing. This improves efficiency and avoids unnecessary scans.
steve-mays added a commit to steve-mays/docker-clamav-malware-scanner that referenced this issue Nov 14, 2024
steve-mays added a commit to steve-mays/docker-clamav-malware-scanner that referenced this issue Nov 14, 2024
nielm added a commit that referenced this issue Nov 29, 2024
* feat: Handle partial uploads, zero-length files and file exclusion patterns

This commit introduces three enhancements:

Partial Upload Handling: Addresses issue #251 by verifying the uploaded file size against the object metadata size. This prevents processing incomplete or still-in-progress chunked uploads.

Option: Ignore zero-length files: Some upload tools create a zero-length file to verify permissions, then update the file later. Ignoring zero-length files prevents the file from being processed until it is ready.

File Exclusion Patterns: Adds a fileExclusionPatterns array to the configuration, allowing specific files to be ignored during processing. This can help with upload tools that parallelize uploads by creating several temp files and then concatenating them. This improves efficiency, avoids unnecessary scans, and prevents these temp files from being moved before the upload tool has finished.



Co-authored-by: Niel Markwick <[email protected]>
Co-authored-by: Niel Markwick <[email protected]>
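
The three checks described in this commit message could be combined into one decision step, sketched below. This is an illustration under stated assumptions, not the merged implementation: the `ScanConfig` shape, the `ignoreZeroLengthFiles` option name, and the `skipReason` function are hypothetical; only `fileExclusionPatterns` is named in the source.

```typescript
// Hypothetical config shape; only fileExclusionPatterns is named in the issue.
interface ScanConfig {
  fileExclusionPatterns: string[];
  ignoreZeroLengthFiles: boolean; // assumed option name
}

type SkipReason = "excluded" | "zero-length" | "size-mismatch" | null;

// Returns why an object should be skipped, or null if it is ready to scan.
function skipReason(
  name: string,
  eventSize: number, // size reported in the triggering event
  currentSize: number, // size currently recorded in object metadata
  config: ScanConfig,
): SkipReason {
  // Temp files from upload tools are ignored by name.
  if (config.fileExclusionPatterns.some((p) => new RegExp(p).test(name))) {
    return "excluded";
  }
  // Placeholder files created to verify permissions are ignored when enabled.
  if (config.ignoreZeroLengthFiles && currentSize === 0) {
    return "zero-length";
  }
  // A mismatch suggests the object changed after the event was emitted.
  if (eventSize !== currentSize) {
    return "size-mismatch";
  }
  return null;
}

const config: ScanConfig = {
  fileExclusionPatterns: ["\\.filepart$"],
  ignoreZeroLengthFiles: true,
};
console.log(skipReason("data.csv.filepart", 10, 10, config)); // "excluded"
console.log(skipReason("data.csv", 0, 0, config)); // "zero-length"
console.log(skipReason("data.csv", 10, 20, config)); // "size-mismatch"
console.log(skipReason("data.csv", 20, 20, config)); // null
```

Running the name check first means excluded temp files are skipped regardless of their size.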