Support multiple outputs in a single run #3243

Closed
jawnsy opened this issue Dec 1, 2022 · 22 comments · Fixed by #4452
Labels
kind/feature Categorizes issue or PR as related to a new feature. priority/backlog Higher priority than priority/awaiting-more-evidence.


@jawnsy

jawnsy commented Dec 1, 2022

Thanks so much for producing and maintaining this excellent tool!

Summary

When running in build systems, it would be convenient to print a report to the terminal and also save a report to a file, sometimes in a different format.

Current behavior

Only one format/output pair can be specified, so we can output to a table or JSON in a given trivy run, but not both. Additionally, we can output results to the terminal or to a file, but not both.

Desired behavior

Configure format/file as a single variable, and allow multiple such values to be passed. For example: trivy image --output=json=out.json --output=table=- --output=cyclonedx=sbom.cdx
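As a rough illustration of the proposed syntax, here is a minimal Go sketch that splits one --output=FORMAT=FILE value on its first =, assuming - means stdout as in the example above. OutputSpec and parseFormatFile are hypothetical names, not part of Trivy; strings.Cut requires Go 1.18 or later.

```go
package main

import (
	"fmt"
	"strings"
)

// OutputSpec pairs a report format with a destination path.
// Hypothetical type; "-" stands for stdout, as in the proposal.
type OutputSpec struct {
	Format string
	Path   string
}

// parseFormatFile splits a value such as "json=out.json" on the first '='.
// Cutting only once keeps any later '=' characters as part of the path.
func parseFormatFile(value string) (OutputSpec, error) {
	format, path, ok := strings.Cut(value, "=")
	if !ok || format == "" || path == "" {
		return OutputSpec{}, fmt.Errorf("expected FORMAT=FILE, got %q", value)
	}
	return OutputSpec{Format: format, Path: path}, nil
}

func main() {
	// The three values from the example command line.
	for _, v := range []string{"json=out.json", "table=-", "cyclonedx=sbom.cdx"} {
		spec, err := parseFormatFile(v)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		if spec.Path == "-" {
			fmt.Printf("%s -> stdout\n", spec.Format)
		} else {
			fmt.Printf("%s -> %s\n", spec.Format, spec.Path)
		}
	}
}
```

Splitting only on the first = keeps paths such as reports/run=1/out.json intact, at the cost that a format name itself can never contain =.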

Workaround

If we want to log and show the same output format (for example, a table shown to stdout as well as recorded in a txt file), then we can use tee.

If we have different desired output formats, then there are a few workarounds:

  1. Run the scan multiple times. Trivy is usually pretty fast, and if the image already exists, it's not too much work to scan the file contents twice.
  2. Run with an intermediate SBOM format: we can use Trivy to generate an SBOM, then immediately "scan" the SBOM for the desired output format (e.g. generate a cyclonedx file, then scan the SBOM and output a table). However, this approach only works for vulnerability scanning, since the SBOM format is not meaningful for secret checks or config checks.
@jawnsy jawnsy added the kind/feature Categorizes issue or PR as related to a new feature. label Dec 1, 2022
@github-actions

This issue is stale because it has been labeled with inactivity.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and will be auto-closed. label Jan 31, 2023
@jawnsy
Author

jawnsy commented Feb 1, 2023

It'd still be nice to have this feature. Judging by the 👍 reactions, this is something that lots of people would find useful.

@romainsuire

We also need this feature

@cnaslain

cnaslain commented Feb 1, 2023

I would love to have this too. JSON + HTML or TXT.

@github-actions github-actions bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and will be auto-closed. label Feb 2, 2023
@manzsolutions-lpr

manzsolutions-lpr commented Feb 15, 2023

We actually run Trivy five times for that reason:

  1. gitlab.tpl
  2. html.tpl
  3. junit.tpl
  4. Full table on stdout
  5. --exit-code 1 --ignore-unfixed --severity CRITICAL to cause the CI job to fail on critical vulnerabilities.

So while the duration is still reasonable for most projects, a slow scan literally multiplies itself.

Besides supporting multiple outputs in a single run, somehow caching the results for re-runs would also work, but that obviously has other caveats.

//edit: Hmm, bad research on my end:
Apparently there is a cache but somehow it's not working for us yet:

https://aquasecurity.github.io/trivy/v0.37/docs/vulnerability/examples/cache/#cache-directory

#2750

@Z4ck404

Z4ck404 commented Mar 10, 2023

+1
Having this is super useful.

@Mo0rBy

Mo0rBy commented Mar 14, 2023

+1
My team would find this feature super useful instead of needing to run Trivy multiple times

@itaysk
Contributor

itaysk commented Mar 15, 2023

Trivy is using a robust cache, so running the same scan multiple times essentially doesn't perform a rescan; it just reformats the output. Given this information, do you still think multiple outputs are necessary, or is it reasonable to run Trivy again to get another output (it will not rescan)?

@exiett
Copy link

exiett commented Mar 24, 2023

Trivy is using a robust cache so running the same scan multiple time essentially doesn't perform a rescan, just reformats the output. Given this information, do you still think multiple outputs are necessary or it's reasonable to run trivy again to get another output (will not rescan).

IMHO it is better to have support for multiple outputs in a single run, because it makes the command run in the pipeline easier to maintain when flags get deprecated (as recently happened when the --scanners flag replaced the --security-checks flag).

@knqyf263 knqyf263 added the priority/backlog Higher priority than priority/awaiting-more-evidence. label Apr 3, 2023
@knqyf263
Collaborator

knqyf263 commented Apr 3, 2023

A lot of people seem to want it, so we have changed our minds and decided to support this feature. What about adding a new flag, --outputs?

trivy image --outputs json=out.json --outputs table=- --outputs cyclonedx=sbom.cdx

The existing flags --format and --output cannot be used with --outputs.

We need to think about templates, as they also need template strings or files. We'd love to hear your thoughts. Thanks.

@exiett

exiett commented Apr 3, 2023

On our end, it would greatly improve the experience to have stdout in one format (we use table so developers can easily spot the packages and their respective fixes) and also generate a JSON file containing the vulnerabilities that were found, so we can create security tickets for the developers to fix their repositories. --outputs would work great.

@knqyf263 knqyf263 added this to the v0.40.0 milestone Apr 4, 2023
@knqyf263
Collaborator

knqyf263 commented Apr 4, 2023

ChatGPT suggested this UI, and it looks good.

$ trivy image --outputs json:result.json --outputs table --outputs template:@junit.tpl:junit.xml --outputs template:@gitlab.tpl:gitlab.json

The default output is stdout, like the table in the above example. For templates, you can pass the template path in the form of template:/path/to/template_file[:/path/to/output].
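A minimal Go sketch of how values in this colon-separated form might be parsed, assuming a missing output path means stdout and that a template output always carries a template path (here with the @ file prefix from the example). Output and parseColonOutput are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// Output is a hypothetical parsed form of one --outputs value.
type Output struct {
	Format   string
	Template string // set only for format "template"
	Path     string // empty means stdout
}

// parseColonOutput parses FORMAT[:PATH] or template:TEMPLATE[:PATH].
func parseColonOutput(value string) (Output, error) {
	parts := strings.Split(value, ":")
	if parts[0] == "" {
		return Output{}, fmt.Errorf("missing format in %q", value)
	}
	out := Output{Format: parts[0]}
	rest := parts[1:]
	if out.Format == "template" {
		if len(rest) == 0 || rest[0] == "" {
			return Output{}, fmt.Errorf("template output needs a template: %q", value)
		}
		out.Template, rest = rest[0], rest[1:]
	}
	switch len(rest) {
	case 0:
		// No path given: write to stdout.
	case 1:
		out.Path = rest[0]
	default:
		return Output{}, fmt.Errorf("too many ':' fields in %q", value)
	}
	return out, nil
}

func main() {
	for _, v := range []string{"json:result.json", "table", "template:@junit.tpl:junit.xml"} {
		o, err := parseColonOutput(v)
		fmt.Printf("%+v %v\n", o, err)
	}
}
```

Because : is the separator, a Windows path such as C:\reports\junit.xml would be split incorrectly, which is one drawback of a colon-based syntax.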

@itaysk
Copy link
Contributor

itaysk commented Apr 7, 2023

It's close to what we have in Tracee; the only difference is in the template example, which in Tracee is gotemplate=/my.tmpl:res.json. I think I like the Tracee one better, since it's clear that the template file belongs to the format and not the output file, and it's also clear what kind of template this is (gotemplate). Just my thoughts; if people prefer the suggested version, that's also fine.

About the flag name: is there a way we can keep it --output/-o? Pretty much every tool I know uses this flag name for controlling the output format; -o json in particular is muscle memory for many folks. Actually, isn't the proposed --outputs compatible with the current --output?

  1. If the flag value is format:file, there's no risk of conflict, since this wasn't supported before.
  2. If the flag value is a file path, it's the same behavior as the current --output.
  3. If the flag value is a format, this is the potential issue, but it's easy to tell whether the value is a file or a format.

Yes, we will need to do some smart detection of the flag value, but as far as I understand the proposal, we would need to do that for the new --outputs flag anyway.
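A sketch of the detection described in the three cases above, assuming a value is a format only when it matches a known format name, optionally followed by :FILE, and a file path otherwise. The knownFormats set is an illustrative subset, not Trivy's authoritative list of formats, and classify is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// Illustrative subset of format names; not an authoritative list.
var knownFormats = map[string]bool{
	"table": true, "json": true, "sarif": true,
	"cyclonedx": true, "spdx-json": true, "template": true,
}

// classify reports how a bare -o/--output value would be interpreted.
// It returns the format (if any) and the file path (if any).
func classify(value string) (format, path string) {
	// Case 1: "format:file", e.g. "json:out.json".
	if f, p, ok := strings.Cut(value, ":"); ok && knownFormats[f] {
		return f, p
	}
	// Case 3: a known format name on its own, e.g. "json".
	if knownFormats[value] {
		return value, ""
	}
	// Case 2: anything else is a file path, as --output behaves today.
	return "", value
}

func main() {
	for _, v := range []string{"json:out.json", "json", "results.json"} {
		f, p := classify(v)
		fmt.Printf("%-14q format=%q path=%q\n", v, f, p)
	}
}
```

A file that is literally named json would still be read as a format under this rule, so some ambiguity remains.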

@knqyf263
Collaborator

After thinking for a while, I'm leaning towards the Buildkit approach. Something like the following:

$ trivy image --outputs format=json,out_file=result.json --outputs format=template,[email protected],out_file=result.junit

This is because we might have more options for each output. For example, we might add support for template URLs.

$ trivy image --outputs format=template,template_url=https://example.com/trivy/templates/my_custom_report.tpl,out_file=result.txt
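To show how a BuildKit-style value could be broken into per-output options, here is a minimal Go sketch. It assumes fields are separated by , and that each key is cut at the first =, so values may contain = (as URL query strings can) but not commas. parseOptions is a hypothetical helper, not Trivy's or BuildKit's parser.

```go
package main

import (
	"fmt"
	"strings"
)

// parseOptions turns "k1=v1,k2=v2" into a map. Values are cut at the first
// '=', so they may contain '=' but not ','.
func parseOptions(value string) (map[string]string, error) {
	opts := make(map[string]string)
	for _, field := range strings.Split(value, ",") {
		key, val, ok := strings.Cut(field, "=")
		if !ok || key == "" {
			return nil, fmt.Errorf("expected key=value, got %q", field)
		}
		if _, dup := opts[key]; dup {
			return nil, fmt.Errorf("duplicate key %q", key)
		}
		opts[key] = val
	}
	return opts, nil
}

func main() {
	opts, err := parseOptions("format=template,template_url=https://example.com/trivy/templates/my_custom_report.tpl,out_file=result.txt")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(opts) // fmt prints map keys in sorted order
}
```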

Also, I have a plan to generate SBOM and VEX referencing the SBOM.

$ trivy image --vex-template /path/to/vex.template --outputs format=spdx-json,out_file=trivy.spdx.json,vex_format=openvex,out_vex_file=trivy.openvex --outputs format=cyclonedx,out_file=trivy.sbom.cdx,vex_format=cyclonedx,out_vex_file=trivy.vex.cdx

SBOM and VEX formats can be specified independently:

  • SBOM: SPDX JSON, VEX: OpenVEX
  • SBOM: CycloneDX, VEX: CycloneDX
  • SBOM: CycloneDX, VEX: OpenVEX

It is hard to represent these structured options with --outputs json:result.json.

about the flag name - is there a way we can keep it --output/-o? I think pretty much every tool I know uses this flag name for controlling output format, especially -o json is a muscle memory for many folks.

I'm also sure many Linux tools use the --output <file> style, such as curl, sort, base64, git, etc. For example, I've been using the following flag in cURL millions of times more than kubectl and aws.

$ curl -h
 -o, --output <file>        Write to file instead of stdout

I want to keep the current behavior of --output so it will not add a breaking change.

Yes we will need do some smart detection of the flag value

I thought a new flag was more intuitive for users, but we can use the existing --output for the structured options.

  1. If --output doesn't contain =, consider the value as a file path.
$ trivy image --output result.json --format json IMAGE_NAME
  2. If --output contains =, consider the value as structured options.
$ trivy image --output format=json,out_file=result.json --output format=template,[email protected],out_file=result.junit

--format will be ignored, and Trivy will show a warning message.

The downside of this detection is that file paths can include =, which leads to false detection. We can probably make it smarter, though.
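A sketch contrasting the rule above with one possible way to make it smarter: treat the value as structured options only if the text before the first = is a recognized option key. The optionKeys set and both function names are hypothetical; a real set would have to track whatever keys are finally supported.

```go
package main

import (
	"fmt"
	"strings"
)

// naiveStructured is the rule above: any '=' means structured options.
func naiveStructured(value string) bool {
	return strings.Contains(value, "=")
}

// Hypothetical set of option keys a structured value may start with.
var optionKeys = map[string]bool{
	"format": true, "out_file": true, "template": true, "template_url": true,
}

// smarterStructured only treats the value as structured options when the
// text before the first '=' is a recognized option key.
func smarterStructured(value string) bool {
	key, _, ok := strings.Cut(value, "=")
	return ok && optionKeys[key]
}

func main() {
	for _, v := range []string{"result.json", "format=json,out_file=result.json", "reports/run=1/result.json"} {
		fmt.Printf("%-34s naive=%v smarter=%v\n", v, naiveStructured(v), smarterStructured(v))
	}
}
```

The heuristic still misreads a path that happens to start with a key, such as format=x.json, so it reduces rather than removes false detection.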

@itaysk
Contributor

itaysk commented Apr 19, 2023

Makes sense. I think the two suggestions are closer than they seem, except for the colon divider.
I think we should plan this with plugins in mind. Maybe plugins can be designated for formatting or outputting and then used seamlessly with Trivy.
I'll think of a suggestion that considers all that and post here

@itaysk
Copy link
Contributor

itaysk commented May 19, 2023

@knqyf263 I'm summarizing your suggestion and tweaking it a bit to address my wishlist, let me know what you think:

Requirements

  1. A single "outputs" flag should contain all the information for one output scenario.
  2. A typical output scenario includes:
    1. Format - how to serialize the results
    2. Destination - where to write the results to
  3. Specific formats, and destinations can have their own configuration.
  4. TBD: formats and destinations can be builtin (e.g. json, stdout) or plugins (e.g. html, aws-securityhub)

Usage

  1. General form: --outputs format=myformat[,myformat_setting=value...],dest=mydestination[,mydestination_setting=value]
  2. At the very least, outputs define format= and dest=
  3. Specific configurations depend on the format and dest used. For example, if dest=file is specified, then file_path= is mandatory, but if format=table is specified, then table_width= is optional.
  4. Specific configurations are conventionally prefixed with the format/dest they refer to.
  5. By default, format=table,dest=stdout is selected, so omitting either is fine.
  6. A special shorthand is available: if the content of --outputs is a string with no =, then it is interpreted as the value of format= or file_path=, depending on whether it contains a slash / or dot . character. For example, --outputs json is the same as --outputs format=json, and --outputs /path/to/file is the same as --outputs dest=file, file_path=/path/to/file.
  7. TBD: given the previous point, is --outputs compatible with the current --output?
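A minimal Go sketch of how rules 3, 5 and 6 could be applied to a single --outputs value: parse key=value pairs (trimming the spaces used in the examples), fall back to the shorthand when there is no =, fill in the format=table and dest=stdout defaults, and reject dest=file without file_path. normalize is a hypothetical name, and the sketch ignores format-specific options.

```go
package main

import (
	"fmt"
	"strings"
)

// normalize applies rules 3, 5 and 6 to one --outputs value and returns
// the complete option map. A sketch, not Trivy's implementation.
func normalize(value string) (map[string]string, error) {
	opts := make(map[string]string)
	if !strings.Contains(value, "=") {
		// Rule 6 shorthand: a path-like value is a file, otherwise a format.
		if strings.ContainsAny(value, "/.") {
			opts["dest"], opts["file_path"] = "file", value
		} else {
			opts["format"] = value
		}
	} else {
		for _, field := range strings.Split(value, ",") {
			key, val, ok := strings.Cut(strings.TrimSpace(field), "=")
			if !ok || key == "" {
				return nil, fmt.Errorf("expected key=value, got %q", field)
			}
			opts[key] = val
		}
	}
	// Rule 5: default to a table on stdout.
	if opts["format"] == "" {
		opts["format"] = "table"
	}
	if opts["dest"] == "" {
		opts["dest"] = "stdout"
	}
	// Rule 3: a file destination needs a path.
	if opts["dest"] == "file" && opts["file_path"] == "" {
		return nil, fmt.Errorf("dest=file requires file_path")
	}
	return opts, nil
}

func main() {
	for _, v := range []string{"json", "/path/to/file", "format=json, dest=file, file_path=out.json", "dest=file"} {
		opts, err := normalize(v)
		fmt.Println(v, "=>", opts, err)
	}
}
```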

Plugins

We need to discuss plugins in a separate issue, but since this proposal takes the future design of plugins into account, I'll address the relevant assumptions I'm making:

  1. Trivy will support "format" and "destination" plugins.
  2. Users register plugins with Trivy before running a scan, possibly using the existing trivy plugin install mechanism.
  3. Users can use a plugin for formatting or as a destination just like a builtin. For example, --outputs dest=webhook, webhook_url=http://myendpoint, or --outputs format=html, html_usejavascript=true

Builtin formats

  1. table (default)
    1. table_colors (true/false)
  2. json
  3. sarif (?)
  4. spdx-json
  5. cyclonedx
  6. gotemplate
    1. gotemplate_file (/path/to/file.tmpl)

Builtin destinations

  1. stdout (default)
  2. file
    1. file_path (/path/to/file)
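The builtin lists above can double as a validation table. A sketch, assuming each option must belong either to the chosen format or to the chosen destination (the prefix convention from the usage rules); the maps mirror the lists above, and validate is a hypothetical helper.

```go
package main

import "fmt"

// Option keys accepted by each builtin format and destination, taken from
// the lists above (sarif is still marked "(?)" there).
var formatOptions = map[string][]string{
	"table":      {"table_colors"},
	"json":       nil,
	"sarif":      nil,
	"spdx-json":  nil,
	"cyclonedx":  nil,
	"gotemplate": {"gotemplate_file"},
}

var destOptions = map[string][]string{
	"stdout": nil,
	"file":   {"file_path"},
}

// validate rejects unknown formats/destinations and options that belong to
// neither the chosen format nor the chosen destination.
func validate(opts map[string]string) error {
	format, dest := opts["format"], opts["dest"]
	formatKeys, ok := formatOptions[format]
	if !ok {
		return fmt.Errorf("unknown format %q", format)
	}
	destKeys, ok := destOptions[dest]
	if !ok {
		return fmt.Errorf("unknown dest %q", dest)
	}
	allowed := map[string]bool{"format": true, "dest": true}
	for _, k := range formatKeys {
		allowed[k] = true
	}
	for _, k := range destKeys {
		allowed[k] = true
	}
	for k := range opts {
		if !allowed[k] {
			return fmt.Errorf("option %q does not apply to format=%s,dest=%s", k, format, dest)
		}
	}
	return nil
}

func main() {
	fmt.Println(validate(map[string]string{"format": "table", "dest": "stdout", "table_colors": "false"}))
	fmt.Println(validate(map[string]string{"format": "json", "dest": "stdout", "table_colors": "true"}))
}
```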

@knqyf263
Collaborator

It basically looks good. There are some things to discuss.

Usage

  3. Specific configurations are depending on the format and dest used. For example if dest=file is specified, then file_path= is mandatory. But if format=table is specified, then table_width= is optional.

dest is enough, no? --outputs dest=foo.json (file) or --outputs dest=- (stdout) can describe all the destinations.

  6. a special shorthand is available, if the content of --outputs is a string with no =, then it is interpreted as the value to format= or file_path=, depending if it contains a backslash / or dot . character. For example --outputs json is same as --outputs format=json, and --outputs /path/to/file is same as --outputs dest=file, file_path=/path/to/file.

This doesn't seem very easy. I want to keep the current behavior of --format and --output, which already covers this shorthand. That means there is no need for a special shorthand.

  2. At the very least, outputs define format= and dest=

The above rule must be satisfied.

Plugins

  3. User can utilize plugin for formatting or destination just like a builtin. For example, --outputs dest=webhook, webhook_url=http://myendpoint, or --outputs format=html, html_usejavascript=true

Is there any advantage to distinguishing between formatting and destination? What about a plugin= key, like --outputs plugin=webhook,plugin.webhook_url=http://myendpoint --outputs plugin=html, plugin.html_usejavascript=true? The plugin would be executed with the JSON result piped to its stdin, and the plugin.xxx options would be passed to the plugin.

$ trivy image debian:11 --outputs plugin=webhook,plugin.webhook_url=http://myendpoint,plugin.use_ssl

is the same as

$ trivy image debian:11 -f json | trivy-plugin-webhook --webhook-url=http://myendpoint --use_ssl

The same applies to formatting plugins.

$ trivy image debian:11 --outputs 'plugin=csv,plugin.delimiter=;'

would be

$ trivy image debian:11 -f json | trivy-plugin-csv --delimiter=';'

We would expect the plugin to also work standalone.

$ trivy image debian:11 -f json -o debian11.json
$ trivy csv --delimiter=';' ./debian11.json

Multiple plugins can be specified.

$ trivy image debian:11 --outputs plugin=webhook,plugin.webhook_url=http://myendpoint,plugin.use_ssl --outputs 'plugin=csv,plugin.delimiter=;'

Template URLs

Also, we have to think about remote templates.
#4079

Or should we decline this suggestion and ask people to create plugins rather than templates?

@knqyf263
Collaborator

knqyf263 commented May 21, 2023

After starting the implementation, I realized that in spf13/cobra and spf13/viper, repeated flags are comma-separated and concatenated.

$ trivy image --outputs format=table,dest=table.txt --outputs format=json,dest=foo.json debian:11

It is treated as format=table,dest=table.txt,format=json,dest=foo.json. It is a bit difficult to determine which output group a key/value pair belongs to.

$ trivy image --outputs format=table --outputs dest=foo.txt,format=json debian:11

In the above case, the outputs would be format=table,dest=foo.txt,format=json. Is this destination for table or json?

I have some ideas.

  1. The output must start with format=.
    • e.g. --outputs format=table,dest=foo.txt is allowed, but --outputs dest=foo.txt,format=table is not.
  2. Use {}
    • AWS CLI uses this syntax.
    • e.g. $ trivy image --outputs {format=table,dest=table.txt} --outputs {format=json,dest=foo.json} debian:11
  3. Use [] or ()
    • gcloud uses this syntax.
    • e.g. $ trivy image --outputs table(dest=table.txt) --outputs plugin(name=csv,dest=foo.json) debian:11
  4. Use a different separator such as ; or &
    • e.g. $ trivy image --outputs format=table;dest=table.txt --outputs format=json;dest=foo.json debian:11
  5. Escape double quotes
    • e.g. $ trivy image --outputs \"format=table,dest=table.txt\" --outputs \"format=json,dest=foo.json\" debian:11

I'm not sure if 2 and 3 work, as the values might still be split on the commas inside the brackets. Any ideas are welcome.

UPDATE: 2 and 3 didn't work. Viper reads values as csv.
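The problem and idea 1 can be reproduced without cobra or viper. This Go sketch mimics the comma splitting by reading each flag occurrence as a CSV record with encoding/csv (consistent with the update above), then regroups the flattened list by starting a new output at every format= key. flatten and regroup are hypothetical helpers, not Trivy code.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// flatten mimics a comma-separated slice flag: each occurrence is read as
// one CSV record and all fields are appended to a single list.
func flatten(flagValues []string) ([]string, error) {
	var all []string
	for _, v := range flagValues {
		fields, err := csv.NewReader(strings.NewReader(v)).Read()
		if err != nil {
			return nil, err
		}
		all = append(all, fields...)
	}
	return all, nil
}

// regroup applies idea 1: every "format=" item opens a new output group.
func regroup(items []string) ([]map[string]string, error) {
	var groups []map[string]string
	for _, item := range items {
		key, val, ok := strings.Cut(item, "=")
		if !ok {
			return nil, fmt.Errorf("expected key=value, got %q", item)
		}
		if key == "format" {
			groups = append(groups, map[string]string{})
		} else if len(groups) == 0 {
			return nil, fmt.Errorf("%q appears before any format=", item)
		}
		g := groups[len(groups)-1]
		if _, dup := g[key]; dup {
			return nil, fmt.Errorf("duplicate %q in one output", key)
		}
		g[key] = val
	}
	return groups, nil
}

func main() {
	items, err := flatten([]string{"format=table,dest=table.txt", "format=json,dest=foo.json"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(items)
	groups, err := regroup(items)
	fmt.Println(groups, err)
}
```

Once the values are flattened, the flag boundaries are gone: --outputs format=table --outputs dest=foo.txt,format=json regroups as {format=table, dest=foo.txt} and {format=json}. Idea 1 therefore only works if users actually follow the rule; the parser cannot enforce it from the flattened list.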

@knqyf263
Copy link
Collaborator

@itaysk
Copy link
Contributor

itaysk commented May 22, 2023

dest is enough, no? --outputs dest=foo.json (file) or --outputs dest=- (stdout) can describe all the destinations.

I had in mind destinations other than files that some users have asked for and that could be plugins. Examples: DefectDojo, SonarQube, webhook, and even Aqua (future integration). Also, dest=file is the default, so I think you can still configure file output with a single setting: --outputs file_path=/path/to/file.

Is there any advantage to distinguishing between formatting and destination?

I thought it's the same motivation as having separate --format and --output flags. If a single plugin does both, there might be a lot of redundant work. For example, JSON serialization can be implemented once (--outputs format=json) and sent to different destinations (--outputs dest=file / --outputs dest=webhook / --outputs dest=aqua). Another example is SBOM formats: with --outputs format=spdx-json, dest=webhook there is no need to reimplement SPDX in the plugin.

@itaysk
Copy link
Contributor

itaysk commented May 23, 2023

After discussing this at length offline, we realized that we were conflating different solutions.
Providing multiple outputs in Trivy is quite complicated, because a single run of Trivy can produce different kinds of outputs depending on the scanners involved. But the underlying use case, running a scan once and repurposing the results for different purposes, is something we can improve. This has been discussed in the past in "Multiple report options" (issue #720) and also implemented in "feat(conversion): from a json report generate other repo" (pull request #3014 by utix). We will follow up on that proposal and add it to Trivy.
This should solve the problem without the complexity of multiple outputs, so I will be closing this issue.
Separately, we will discuss plugins as outputs in "plugin as output option" (discussion #4451).

@reitzig
Copy link

reitzig commented Dec 13, 2023

For reference (all those issue links sent me down some loopy loops!), the answer as of today is: generate a report in Trivy's JSON format, then use trivy convert.
