Add datadog certifier #2366

Open
wants to merge 1 commit into
base: main
from

Conversation

robert-cronin
Contributor

@robert-cronin robert-cronin commented Dec 12, 2024

Description of the PR

Fixes #2345

I am not sure whether a parser or attestation is needed, since we're just ingesting CertifyBad for a particular pURL. If the source information does need to be represented in a predicate, I'd be happy to try to figure out how to add that.

PR Checklist

  • All commits have a Developer Certificate of Origin (DCO) -- they are generated using -s flag to git commit.
  • All new changes are covered by tests
  • If GraphQL schema is changed, make generate has been run
  • If GraphQL schema is changed, GraphQL client updates/additions have been made
  • If OpenAPI spec is changed, make generate has been run
  • If ent schema is changed, make generate has been run
  • If collectsub protobuf has been changed, make proto has been run
  • All CI checks are passing (tests and formatting)
  • All dependent PRs have already been merged

Signed-off-by: robert-cronin <[email protected]>
@funnelfiasco
Contributor

As a general comment, I wonder if we want to call it something more specific than "DataDog"? "DataDog Malicious Packages DataSet" is unwieldy, but I'm concerned that there might be some future thing that pulls from DataDog proper and the name is already taken. I don't have any great ideas and this may not be a concern worth worrying about right now, but I wanted to raise it.

Comment on lines +167 to +177
// if no versions specified in dataset, skip
if len(maliciousVersions) == 0 {
	// package known but no malicious versions listed?
	continue
}

// certify only if the package has a specified version and that exact version is known malicious
if pkgInput.Version == nil {
	logger.Debugf("Package %s has no version specified, skipping...", purl)
	continue
}
Contributor

Am I reading it correctly that we'll ignore things like "aiohtttps" (in PyPI) because it applies to all versions?

Contributor Author

@robert-cronin robert-cronin Dec 13, 2024

I am not sure. I initially thought that an empty malicious-version list meant no malicious versions were found, and that's what the code currently does. Is it instead the case that an empty list means all versions are known malicious? I couldn't find anything in the DataDog malicious dataset repo to suggest this.

Contributor

I believe the empty list means all versions are known malicious. For example, "aiohtttps" is a typosquat of "aiohttps" (notice the third 't'). I opened DataDog/malicious-software-packages-dataset#135 to request clarification, but I'm confident in my interpretation.
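
To make the two cases concrete, here is a minimal sketch of what the version check could look like under that interpretation. It is not code from this PR: isKnownMalicious is a hypothetical helper, and the rule that an empty list means "all versions" is an assumption until the dataset maintainers confirm it.

```go
package main

import "fmt"

// isKnownMalicious is a hypothetical helper, not part of this PR.
// ASSUMPTION: an empty maliciousVersions list means every version of the
// package is malicious. This is the reading proposed in this thread and has
// not been confirmed by the dataset maintainers.
func isKnownMalicious(version *string, maliciousVersions []string) bool {
	// Empty list: treat the whole package as malicious (e.g. a typosquat),
	// so the version on the package under test does not matter.
	if len(maliciousVersions) == 0 {
		return true
	}
	// Specific versions are listed, so an exact version is required to match.
	if version == nil {
		return false
	}
	for _, v := range maliciousVersions {
		if v == *version {
			return true
		}
	}
	return false
}

func main() {
	v := "1.0.0"
	fmt.Println(isKnownMalicious(nil, nil))               // true: empty list, all versions
	fmt.Println(isKnownMalicious(&v, []string{"1.0.0"}))  // true: exact version listed
	fmt.Println(isKnownMalicious(&v, []string{"2.0.0"}))  // false: different version listed
	fmt.Println(isKnownMalicious(nil, []string{"1.0.0"})) // false: no version to compare
}
```

The only behavioral difference from the snippet under review is the first branch: an empty list leads to certification instead of a skip, even when the package has no version.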

@robert-cronin
Contributor Author

robert-cronin commented Dec 13, 2024

> As a general comment, I wonder if we want to call it something more specific than "DataDog"? "DataDog Malicious Packages DataSet" is unwieldy, but I'm concerned that there might be some future thing that pulls from DataDog proper and the name is already taken. I don't have any great ideas and this may not be a concern worth worrying about right now, but I wanted to raise it.

Yeah, that is a solid point. If DataDog eventually spins out other datasets, I can see how that might cause some confusion. The data itself mostly comes from GuardDog, but I think not exclusively. Maybe we can go with something like datadog-malware-dataset or datadog-mspd, though mspd is not a known acronym. The alternative is datadog-malicious-software-packages-dataset, but like you said, that is a bit unwieldy.
The datadog-malware-dataset one sounds to me like the best compromise between clarity and brevity.

Successfully merging this pull request may close these issues.

[feature] Add support for DataDog's malicious software package dataset
2 participants