GoogleCloudPlatform/docker-clamav-malware-scanner

Malware Scanner Service

This repository contains the code to build a pipeline that scans objects uploaded to GCS for malware, moving the documents to a clean or quarantined bucket depending on the malware scan status.

It illustrates how to use Cloud Run and Eventarc to build such a pipeline.

Architecture diagram

How to use this example

Use the tutorial to understand how to configure your Google Cloud Platform project to use Cloud Run and Eventarc.

Using Environment variables in the configuration

The tutorial uses a configuration file, config.json, built into the Docker container to specify the unscanned, clean, quarantined, and CVD updater Cloud Storage buckets.

Environment variables can be used to vary the deployment in two ways:

Expansion of environment variables

Any environment variables specified using shell-format within the config.json file will be expanded using envsubst.
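To illustrate what shell-format expansion means here, the following is a minimal TypeScript sketch that imitates envsubst on a config template. It is not the service's code (the service relies on envsubst itself); the helper name expandShellFormat, the EnvVars type, and the PROJECT_ID variable are hypothetical and used only for illustration.

```typescript
// Hypothetical type: a plain map of environment variable names to values.
type EnvVars = Record<string, string | undefined>;

// Hypothetical helper that imitates shell-format expansion ($VAR and ${VAR}).
function expandShellFormat(template: string, env: EnvVars): string {
  // Match either ${NAME} or $NAME, where NAME is a shell identifier.
  return template.replace(
    /\$\{([A-Za-z_][A-Za-z0-9_]*)\}|\$([A-Za-z_][A-Za-z0-9_]*)/g,
    (_match: string, braced: string | undefined, bare: string | undefined): string => {
      const varName = braced !== undefined ? braced : (bare as string);
      const value = env[varName];
      // Like envsubst, an unset variable expands to the empty string.
      return value !== undefined ? value : "";
    }
  );
}

// PROJECT_ID is an assumed variable name, not one defined by this repository.
const configTemplate = '{ "buckets": [ { "unscanned": "unscanned-${PROJECT_ID}" } ] }';
const expanded = expandShellFormat(configTemplate, { PROJECT_ID: "my-project" });
// expanded is: { "buckets": [ { "unscanned": "unscanned-my-project" } ] }
```

With this style of expansion, a single config.json can be reused across projects by changing only the environment of each deployment.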

Passing entire configuration as environment variable

An alternative to building the configuration file into the container is to pass the service configuration in an environment variable, so that multiple deployments can use the same container image and configuration updates do not require a container rebuild.

This can be done by setting the environment variable CONFIG_JSON to the JSON configuration, which overrides any configuration in the config.json file.
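The precedence described above can be sketched in TypeScript as follows. This is an illustration under stated assumptions, not the service's actual loader: the ScannerConfig interface only mirrors the example fields shown in this README, resolveConfig is a hypothetical name, and treating an empty CONFIG_JSON as unset is an assumption.

```typescript
// Shape of the configuration as shown in this README's examples (assumed, not exhaustive).
interface ScannerConfig {
  buckets: { unscanned: string; clean: string; quarantined: string }[];
  ClamCvdMirrorBucket: string;
  fileExclusionPatterns?: (string | [string, string])[];
  ignoreZeroLengthFiles?: boolean;
}

// Hypothetical loader: CONFIG_JSON, when set, takes precedence over the file contents.
function resolveConfig(
  env: Record<string, string | undefined>,
  configFileText: string
): ScannerConfig {
  const fromEnv = env["CONFIG_JSON"];
  // Assumption: an empty CONFIG_JSON is treated the same as an unset one.
  const source = fromEnv !== undefined && fromEnv !== "" ? fromEnv : configFileText;
  // JSON.parse throws a SyntaxError on invalid JSON, for example an unquoted key.
  return JSON.parse(source) as ScannerConfig;
}
```

Parsing the value before deploying is a cheap way to catch malformed JSON, because the value must be strict JSON with double-quoted keys.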

If you deploy with the gcloud run deploy command, this environment variable must be set using the --env-vars-file argument, specifying a YAML file containing the environment variable definitions. (This is because the commas in JSON would break the parsing of --set-env-vars.)

Take care when embedding JSON in YAML: it is recommended to use the Literal Block Scalar style using |, as this preserves newlines and quotes.

For example, the CONFIG_JSON environment variable could be defined in a file config-env.yaml as follows:

CONFIG_JSON: |
  {
    "buckets": [
      {
        "unscanned": "unscanned-bucket-name",
        "clean": "clean-bucket-name",
        "quarantined": "quarantined-bucket-name"
      }
    ],
    "ClamCvdMirrorBucket": "cvd-mirror-bucket-name",
    "fileExclusionPatterns": [],
    "ignoreZeroLengthFiles": false
  }

An example command line using this file to specify the environment variables:

gcloud beta run deploy "${SERVICE_NAME}" \
  --source . \
  --region "${REGION}" \
  --no-allow-unauthenticated \
  --memory 4Gi \
  --cpu 1 \
  --concurrency 20 \
  --min-instances 1 \
  --max-instances 5 \
  --no-cpu-throttling \
  --cpu-boost \
  --service-account="${SERVICE_ACCOUNT}" \
  --env-vars-file=config-env.yaml

If you deploy with Terraform, the equivalent is to set the environment variable in the env block of the google_cloud_run_v2_service resource, using jsonencode:

resource "google_cloud_run_v2_service" "malware-scanner" {
  name = "malware-scanner"
  // other service parameters...
  template {
    // other template parameters...
    containers {
      // other container parameters...
      env {
        name = "CONFIG_JSON"
        value = jsonencode({
          buckets = [
            {
              unscanned   = "unscanned-bucket-name",
              clean       = "clean-bucket-name",
              quarantined = "quarantined-bucket-name"
            }
          ]
          ClamCvdMirrorBucket = "cvd-mirror-bucket-name",
          fileExclusionPatterns = [],
          ignoreZeroLengthFiles = false
        })
      }
    }
  }
}

Notes on fileExclusionPatterns

The fileExclusionPatterns array in the config file can be used to ignore any uploaded files whose names match a regular expression.

This is useful, for example, if you have an upload system that creates temporary files, then renames them once the files are fully uploaded.

The elements in the fileExclusionPatterns array can either be simple strings, for example:

"fileExclusionPatterns": [
  "\\.tmp$",
  "^ignore_me.*\\.txt$"
]

or they can be an array of 2 string values, allowing regular expression flags to be specified, for example "i" for case-insensitive matches:

"fileExclusionPatterns": [
  [ "\\.tmp$", "i" ],
  [ "tempfile.*\\.upload$", "i" ]
]

Files matching these patterns are ignored by the scanner and left in the unscanned bucket, and an ignored-files counter is incremented.
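As a rough illustration of how the two entry forms behave, the TypeScript sketch below compiles each entry into an ECMAScript RegExp and tests file names against the result. The names ExclusionPattern, compileExclusionPatterns, and isExcluded are hypothetical, and this is not presented as the scanner's own implementation.

```typescript
// An entry is either a pattern string or a [pattern, flags] pair.
type ExclusionPattern = string | [string, string];

// Hypothetical helper: turn each config entry into a RegExp.
function compileExclusionPatterns(patterns: ExclusionPattern[]): RegExp[] {
  return patterns.map((entry: ExclusionPattern): RegExp =>
    typeof entry === "string" ? new RegExp(entry) : new RegExp(entry[0], entry[1])
  );
}

// Hypothetical helper: a file is excluded if any pattern matches its name.
function isExcluded(fileName: string, patterns: RegExp[]): boolean {
  return patterns.some((re: RegExp): boolean => re.test(fileName));
}

const compiled = compileExclusionPatterns([
  "\\.tmp$",
  ["^ignore_me.*\\.txt$", "i"]
]);

const lowerTmp = isExcluded("upload.tmp", compiled);             // true
const upperTxt = isExcluded("IGNORE_ME_notes.TXT", compiled);    // true, because of the "i" flag
const upperTmp = isExcluded("upload.TMP", compiled);             // false, first pattern is case-sensitive
const otherFile = isExcluded("report.pdf", compiled);            // false
```

Note that the string literals in this sketch use the same backslash escaping as JSON, so "\\.tmp$" is the regular expression \.tmp$ in both.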

Helpful tools for regular expressions include the Regular Expression Cheatsheet, and the Regex101 playground (ensure ECMAScript flavor is selected).

Note that when adding regular expressions to the config file, care must be taken with \ and " characters: each occurrence of these characters in the regular expression must be escaped with a preceding \.
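One way to avoid escaping by hand is to let a JSON serializer produce the string literal. The sketch below uses the standard JSON.stringify and JSON.parse functions; the helper name toConfigLiteral is hypothetical.

```typescript
// Hypothetical helper: produce the JSON string literal for a regular expression source.
function toConfigLiteral(regexSource: string): string {
  return JSON.stringify(regexSource);
}

// The regular expression \.tmp$ is written here as the TypeScript literal "\\.tmp$",
// because TypeScript strings use the same backslash escaping as JSON.
const literal = toConfigLiteral("\\.tmp$");
// literal is the JSON text "\\.tmp$" (including the surrounding double quotes).

// Round trip: parsing the JSON text gives back the original regular expression source.
const roundTripped: string = JSON.parse(literal);
```

The serialized text can be pasted directly into the fileExclusionPatterns array.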

Change history

See CHANGELOG.md

Upgrading from v2.x to v3.x

In version 3.x, metrics reporting was changed to OpenTelemetry, which uses a different naming convention for metrics, so the metric names have changed from:

custom.googleapis.com/opencensus/malware-scanning/METRIC-NAME

to

workload.googleapis.com/googlecloudplatform/gcs-malware-scanning/METRIC-NAME

Any dashboards or alerts using these metrics must be updated.
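When many dashboards or alert definitions need updating, the prefix change can be applied mechanically. The following TypeScript sketch rewrites a v2.x metric type to its v3.x form; migrateMetricType is a hypothetical helper, and it assumes that the METRIC-NAME suffix itself is unchanged between versions, which should be verified against the metrics your deployment actually exports.

```typescript
const OLD_PREFIX = "custom.googleapis.com/opencensus/malware-scanning/";
const NEW_PREFIX = "workload.googleapis.com/googlecloudplatform/gcs-malware-scanning/";

// Hypothetical helper: rewrite a v2.x metric type to its v3.x form.
// Assumption: only the prefix changes; the METRIC-NAME suffix is kept as-is.
// Metric types that do not start with the old prefix are returned unchanged.
function migrateMetricType(metricType: string): string {
  if (metricType.indexOf(OLD_PREFIX) !== 0) {
    return metricType;
  }
  return NEW_PREFIX + metricType.slice(OLD_PREFIX.length);
}

const migrated = migrateMetricType(
  "custom.googleapis.com/opencensus/malware-scanning/METRIC-NAME"
);
// migrated is: workload.googleapis.com/googlecloudplatform/gcs-malware-scanning/METRIC-NAME
```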

Upgrading from v1.x to v2.x

Version 2 has a different way of handling ClamAV updates to avoid issues with the ClamAV content distribution network.

See upgrade_from_v1.md for upgrading instructions.

License

Copyright 2022 Google LLC

Licensed under the Apache License, Version 2.0 (the "License"); you may not use
this file except in compliance with the License. You may obtain a copy of the
License at

https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed
under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.