Internally version all detected content by extension #620
Please review.
Self comments
}

// detector is analysis extensions used by the worker.
detector = dbutil.MigrationQuery{
New Table: Detectors
Detector should contain all metadata information of all the extensions used to scan a layer.
var (
	// entities are the basic building blocks to relate the vulnerabilities with
	// the ancestry.
	entities = dbutil.MigrationQuery{
Entities are the real immutable content in the database.
On a higher level, Layer (layer hash with layer content) is not really immutable, because a new extension can scan a layer and add new associated features.
ext/featurefmt/driver.go
Outdated
if name == "" {
	panic("featurefmt: could not register a Lister with an empty name")
}
func RegisterLister(l Lister) {
I will change Lister to detector in the following PRs.
database/detector.go
Outdated
// Detector is an extention to scan a layer's content.
type Detector struct {
	// Name of the detector
Please review this design, which uses a string to represent the Detector Type.
I can also define FeatureDetector and NamespaceDetector struct types for the same purpose.
database/models.go
Outdated
@@ -71,6 +65,7 @@ type Layer struct {
type Namespace struct {
	Name string
	VersionFormat string
	DetectedBy Detector
Using Detector here may not be valid. As the database schema shows, a namespace or feature may be detected by two different detectors in two different layers.
database/detector.go
Outdated
	// Version of the detector
	Version string
	// Type of the detector
	Type DetectorType
type is a keyword so I'm a little uneasy about using this name.
As a counter-point, the reflect standard library package does define their own type Type interface {...}
database/detector.go
Outdated
}

// NewNamespaceDetector returns a new namespace detector.
func NewNamespaceDetector(name string, version string) Detector {
name, version string
database/detector.go
Outdated
}

// NewFeatureDetector returns a new feature detector.
func NewFeatureDetector(name string, version string) Detector {
name, version string
api/v3/clairpb/clair.proto
Outdated
}
// The name of the detector.
string name = 1;
// The Version of the detector.
Version -> version
RegisterMigration(dbutil.NewSimpleMigration(1,
	[]dbutil.MigrationQuery{
		entities, detector, layer, ancestry,
		vulnerability, updaterLock, notification,
one line each
pkg/dbutil/migrationutil.go
Outdated
@@ -0,0 +1,30 @@
package dbutil
copyright headers
and let's move this into database/psql/migrations
ff97d47 to 3ff6ac9
All processors will now be used to process the layers.
b67ac8c to 11e5efa
database/models.go
Outdated
type Layer struct {
	Hash string
	Namespaces DetectedNamespaces
	DetectedFeatures DetectedFeatures
DetectedFeatures -> Features
3e02efa to 8938ec9
database/models.go
Outdated
// ProcessedBy.
// DetectedNamespaces contains a map from detector to the detected namespace.
// Nil detected namespace means it's not found.
type DetectedNamespaces map[Detector]*Namespace
The new map type introduces a lot more complexity. It complicates both the worker and the database implementation. Maybe we don't need a map to organize this, since it will contain at most around 1000 features.
Flatten the map to be []DetectedNamespace, which contains Detector and Namespace.
This simplifies the data structure; we can then use pointers and a new data structure if we need to improve the performance.
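A minimal sketch of what the flattened form could look like, assuming a comparable Detector struct with string fields. The field name `By` and the helper `namespacesBy` are hypothetical and are not taken from the PR; the point is only that a linear scan over a small slice replaces the map lookup.

```go
package main

import "fmt"

// Detector identifies the extension that scanned a layer.
// The field set mirrors the discussion above; the exact shape is an assumption.
type Detector struct {
	Name    string
	Version string
	DType   string
}

// Namespace mirrors the struct shown in database/models.go.
type Namespace struct {
	Name          string
	VersionFormat string
}

// DetectedNamespace pairs a namespace with the detector that found it.
// "By" is a hypothetical field name used for illustration.
type DetectedNamespace struct {
	Namespace Namespace
	By        Detector
}

// namespacesBy is a hypothetical helper: it scans the slice linearly and
// returns the namespaces found by the given detector.
func namespacesBy(all []DetectedNamespace, d Detector) []Namespace {
	var out []Namespace
	for _, dn := range all {
		if dn.By == d { // structs with only comparable fields support ==
			out = append(out, dn.Namespace)
		}
	}
	return out
}

func main() {
	osRelease := Detector{Name: "os-release", Version: "1.0", DType: "namespace"}
	aptSources := Detector{Name: "apt-sources", Version: "1.0", DType: "namespace"}

	detected := []DetectedNamespace{
		{Namespace{"debian:9", "dpkg"}, osRelease},
		{Namespace{"debian:9", "dpkg"}, aptSources},
	}
	fmt.Println(namespacesBy(detected, osRelease))
}
```

With at most around 1000 entries, the linear scan is unlikely to matter; the same namespace can also appear once per detector, which the map keyed by Detector could not express without pointer values.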
c6d9b77 to 51a6a87
api/v3/clairpb/clair.proto
Outdated
@@ -44,17 +44,42 @@ message Vulnerability {
	repeated Feature affected_versions = 8;
}

message Detector {
	enum Type {
		DETECTOR_TYPE_INVALID = 0;
This name is a requirement by the prototool linter.
api/v3/util.go
Outdated
// this feature for now, we should refactor the implementation if there's
// any performance issue. It's expected that the number of features is less
// than 1000.

nit: remove line
cmd/clair/main.go
Outdated
clair.EnabledUpdaters = strutil.CompareStringListsInBoth(config.Updater.EnabledUpdaters, updaters)
	"Detectors": strings.Join(database.SerializeDetectors(clair.EnabledDetectors), ","),
	"Updaters": strings.Join(clair.EnabledUpdaters, ","),
}).Info("Enabled Clair components")
nit: lower case
ext/featurens/driver.go
Outdated
		nslist = append(nslist, *ns)
	}
	return nslist, nil
	return namespaces, nil
}

// RequiredFilenames returns the total list of files required for all
update the comment
ext/featurens/driver_test.go
Outdated
@@ -3,11 +3,12 @@ package featurens_test
import (
	"testing"

	"github.com/coreos/clair/ext/featurens"
reorder
pkg/dbutil/dbutil.go
Outdated
	return nil
}

func CacheRelatedVulnerability(datastore database.Datastore, features []database.NamespacedFeature) error {
TODO: I'm starting to think that the abstraction on the database layer is wrong. Somehow we need to figure out a way to get rid of the implicit database feature relationship.
pkg/dbutil/dbutil.go
Outdated
	return FilterDetectors(detectors, database.NamespaceType)
}

func FilterDetectors(detectors []database.Detector, dtype database.DetectorType) []database.Detector {
is this being used? remove
9e42bed to 4804d41
Note to other PR reviewers:
- TODO means it's something for a future PR, not this one.
- NIT means it does not block merging the PR.
- Not related means it's an old issue that need not be addressed in this PR.
- Anything else blocks the PR.
api/v3/clairpb/clair.proto
Outdated
repeated string scanned_listers = 4;
// The configured list of namespace detectors used to scan an ancestry.
repeated string scanned_detectors = 5;
// The detectors used to scan this Ancestry. It maybe not the current set of
Rephrase: The detectors used to scan this Ancestry. It may not be the same set of detectors in the Clair instance.
worker.go
Outdated
}
if !ok {
	return false, nil
ancestry, ok, err := dbutil.FindAncestry(datastore, name)
It's wasteful to find the ancestry and all of its associated features here. We should later consider putting this function on the database interface.
4804d41 to 52a3e03
This is my first pass. I'm going to need to take a much harder look at the database changes.
api/v3/clairpb/convert.go
Outdated
return &Detector{
	Name: detector.Name,
	Version: detector.Version,
	Type: Detector_Type(Detector_Type_value[string(detector.DType)]),
Wow this kinda sucks. I wonder if gogo protobuf generates something more ergonomic than this.
database/detector.go
Outdated
// NamespaceDetectorType is a type of detector that extracts the namespaces.
NamespaceDetectorType DetectorType = "DETECTOR_TYPE_NAMESPACE"
// FeatureDetectorType is a type of detector that extracts the features.
FeatureDetectorType DetectorType = "DETECTOR_TYPE_FEATURE"
Is there a reason to even use a string? Why not have an integer?
const (
	NamespaceDetectorType DetectorType = iota
	FeatureDetectorType
	...
)
Because I want to store the enum in the database. Well, I can write a conversion to make this work anyway.
Enums are most often just integers in the database that have special meaning for the application. You use enums because it makes the column simpler (and thus easier to migrate, copy, compare values, etc.).
Yes, that's the point, isn't it? DetectorType (Detector Enum) is just like the Severity Enum. You can look up the database schema for that.
}

// SerializeDetectors returns the string representation of given detectors.
func SerializeDetectors(detectors []Detector) []string {
We could put this in strutil if you change the function to be:
func StringifyList(xs []fmt.Stringer) (ys []string) {
	for _, x := range xs {
		ys = append(ys, x.String())
	}
	return
}
ok, then it will require the type conversion from []database.Detector to []fmt.Stringer, which sucks
Yeah. This is super unfortunate... It's because an interface value has a different memory layout from the concrete type, so building a slice of them is O(N); you have to make the copy explicitly yourself rather than having Go do it for you when you pass it into the function.
It makes me wonder whether this kind of design is wrong after all. I think we are treating these functions as if they used generics, but interfaces are not generics.
I think I should keep the original design for the serialization.
👍
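For readers following this thread, here is a small sketch of the conversion cost being discussed. Go does not implicitly convert a `[]Detector` to a `[]fmt.Stringer`, so a generic `StringifyList` needs an explicit element-by-element copy first. The `Detector` fields, its `String` format, and the helper `toStringers` are hypothetical stand-ins.

```go
package main

import (
	"fmt"
	"strings"
)

// Detector is a simplified stand-in for database.Detector.
type Detector struct {
	Name    string
	Version string
}

// String uses a made-up format purely for illustration.
func (d Detector) String() string { return d.Name + ":" + d.Version }

// StringifyList is the helper proposed in the review comment.
func StringifyList(xs []fmt.Stringer) (ys []string) {
	for _, x := range xs {
		ys = append(ys, x.String())
	}
	return
}

// toStringers performs the explicit O(N) copy: each Detector is boxed into an
// interface value, which has a different memory layout than the struct itself.
func toStringers(ds []Detector) []fmt.Stringer {
	out := make([]fmt.Stringer, len(ds))
	for i, d := range ds {
		out[i] = d
	}
	return out
}

func main() {
	ds := []Detector{{"dpkg", "1.0"}, {"rpm", "1.0"}}

	// StringifyList(ds) would not compile: []Detector is not []fmt.Stringer.
	fmt.Println(strings.Join(StringifyList(toStringers(ds)), ","))
}
```

The extra copy is why a type-specific function such as SerializeDetectors ends up simpler at the call site.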
// DetectedFeatures are the features introduced by this layer when it was
// Hash is the name of the ancestry's layer.
Hash string
This comment is also wrong. Hash is the sha256 digest of the manifest for a particular image.
pkg/dbutil/dbutil.go
Outdated
// See the License for the specific language governing permissions and
// limitations under the License.

package dbutil
I think everything in here makes sense to just have in the database package rather than in a utility package.
pkg/testutil/testutil.go
Outdated
@@ -0,0 +1,285 @@
package testutil
I think all of this is useful for only the database. I think this would also be fine being just in the database package.
Also, copyright header
Yes, those are for comparing the database models but it's not only for the database tests. It's also used in the worker test and in the future updater test.
52a3e03 to c1c8b29
I think the general approach for the database is good. We discussed it enough beforehand that it's pretty much exactly what I expected. I think that there were a bunch of things I noticed for database optimizations, but I don't think we should really spend time on that until we're positive it's the finalized schema.
Some additional thoughts... I ran the codebase and sat pondering our current schema for a bit and came up with a few things that might not necessarily need to be done in this PR:
- Get rid of namespaced_feature
  ancestry_feature -> (id, ancestry_layer_id, feat_id, feat_detector_id, ns_id, ns_detector_id)
- make kv table consistent
  - rename fields to updater/$NAMESPACE
  - prefix value to include the SCM: git-sha:hf92dk9
- detector type
  - rename column type -> dtype
  - change column to this enum: CREATE TYPE detector_type AS ENUM ('feature', 'namespace');
- debian:unstable ns needs to be mapped to a number
- we should lstrip "v" off of the alpine namespace version numbers, no other namespaces do that
cmd/clair/main.go
Outdated
@@ -180,6 +154,18 @@ func Boot(config *Config) {
	st.Stop()
}

func initLogger(flagLogLevel *string) {
I avoid using the word init to avoid confusion around the keyword.
cmd/clair/main.go
Outdated
clair.EnabledUpdaters = strutil.CompareStringListsInBoth(config.Updater.EnabledUpdaters, updaters)
	"Detectors": strings.Join(database.SerializeDetectors(clair.EnabledDetectors), ","),
	"Updaters": strings.Join(clair.EnabledUpdaters, ","),
Do these have to be strings? It'd be nice to keep them as arrays in the JSON logs.
Also the detectors string is borderline unreadable:
{"Detectors":"DETECTOR_TYPE_FEATUREDetector/apk/1.0,DETECTOR_TYPE_FEATUREDetector/dpkg/1.0,DETECTOR_TYPE_FEATUREDetector/rpm/1.0,DETECTOR_TYPE_NAMESPACEDetector/alpine-release/1.0,DETECTOR_TYPE_NAMESPACEDetector/apt-sources/1.0,DETECTOR_TYPE_NAMESPACEDetector/lsb-release/1.0,DETECTOR_TYPE_NAMESPACEDetector/os-release/1.0,DETECTOR_TYPE_NAMESPACEDetector/redhat-release/1.0"}
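To illustrate the suggestion, the sketch below emits the enabled components as JSON arrays and uses a shorter per-detector string. It uses only `encoding/json`; the compact `dtype/name/version` format, the lower-case keys, and the helpers `detectorNames` and `logLine` are assumptions for illustration, not what Clair's logger actually produces.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Detector is a simplified stand-in for database.Detector.
type Detector struct {
	DType   string
	Name    string
	Version string
}

// String uses a hypothetical compact format: "<dtype>/<name>/<version>".
func (d Detector) String() string {
	return d.DType + "/" + d.Name + "/" + d.Version
}

// detectorNames converts detectors to their string form, one entry each.
func detectorNames(ds []Detector) []string {
	names := make([]string, 0, len(ds))
	for _, d := range ds {
		names = append(names, d.String())
	}
	return names
}

// logLine renders the enabled components with slices kept as JSON arrays
// instead of comma-joined strings.
func logLine(ds []Detector, updaters []string) (string, error) {
	b, err := json.Marshal(map[string][]string{
		"detectors": detectorNames(ds),
		"updaters":  updaters,
	})
	return string(b), err
}

func main() {
	line, err := logLine(
		[]Detector{{"feature", "apk", "1.0"}, {"namespace", "os-release", "1.0"}},
		[]string{"debian"},
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(line)
}
```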
database/dbutil.go
Outdated
// PersistNamespaces wraps session PersistNamespaces function with begin and
// commit.
func PersistNamespaces(datastore Datastore, namespaces []Namespace) error {
We should come up with a naming convention for the functions that are just wrapping calls in a transaction.
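As a sketch of one possible convention, the wrapper below names the transaction outcome in the function name (an `...AndCommit` suffix). The `Session` and `Datastore` interfaces here are trimmed-down assumptions, and the fake types exist only to make the example runnable.

```go
package main

import (
	"errors"
	"fmt"
)

// Namespace is a simplified stand-in for database.Namespace.
type Namespace struct {
	Name          string
	VersionFormat string
}

// Session and Datastore are minimal, assumed interfaces for this sketch.
type Session interface {
	PersistNamespaces([]Namespace) error
	Commit() error
	Rollback() error
}

type Datastore interface {
	Begin() (Session, error)
}

// PersistNamespacesAndCommit opens a session, persists the namespaces, and
// commits on success. On failure it rolls back and returns the original error.
func PersistNamespacesAndCommit(ds Datastore, ns []Namespace) error {
	tx, err := ds.Begin()
	if err != nil {
		return err
	}
	if err := tx.PersistNamespaces(ns); err != nil {
		tx.Rollback() // best effort; the persist error is the one reported
		return err
	}
	return tx.Commit()
}

// fakeSession records which terminal call was made, for demonstration only.
type fakeSession struct {
	fail       bool
	committed  bool
	rolledBack bool
}

func (s *fakeSession) PersistNamespaces([]Namespace) error {
	if s.fail {
		return errors.New("persist failed")
	}
	return nil
}
func (s *fakeSession) Commit() error   { s.committed = true; return nil }
func (s *fakeSession) Rollback() error { s.rolledBack = true; return nil }

type fakeStore struct{ s *fakeSession }

func (f fakeStore) Begin() (Session, error) { return f.s, nil }

func main() {
	s := &fakeSession{}
	err := PersistNamespacesAndCommit(fakeStore{s}, []Namespace{{"debian:9", "dpkg"}})
	fmt.Println(err, s.committed, s.rolledBack)
}
```

A read-only counterpart could use an `...AndRollback` suffix, so the caller can tell from the name whether the call leaves changes behind.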
database/detector.go
Outdated
	}
}

return "", ErrFailedToParseSeverity
copypasta?
database/detector.go
Outdated
// String returns a unique string representation of the detector.
func (d Detector) String() string {
	return fmt.Sprintf("%sDetector/%s/%s", d.DType, d.Name, d.Version)
I commented on this in main, but yeah, this is pretty verbose and hard to read.
}

// SerializeDetectors returns the string representation of given detectors.
func SerializeDetectors(detectors []Detector) []string {
👍
database/models.go
Outdated
// Ancestry is a manifest that keeps all layers in an image in order.
type Ancestry struct {
	// Name is expected to be the sha-256 digest of the manifest for a
Name is a globally unique value for a set of layers. This is often the sha256 digest of an OCI/Docker manifest.
All detected features and namespaces under the context of Layer and Ancestry will now have the detectors associated, so that the API can provide the detection information to the Client.
'detector' table is added to store the metadata of detectors. 'layer_feature', 'layer_namespace', and 'ancestry_feature' tables are modified to store the detection relationship between the feature/namespace with the detector.
Golang-set library is added to make it easier to support set operations.
1. Every Lister and Detector is versioned.
2. Detected content is returned in a map with detector info as the key.
Aggregate queries in their corresponding files instead of having a single file for all queries, because the database is more complicated.
Change the V3 implementation to accommodate the detectors.
The worker is changed to accommodate the new database model and API. Worker is refactored to move the database query helper functions to pkg.
c1c8b29 to 8b60a78
* Refactor layer and ancestry
* Add tests
* Fix bugs introduced when the queries were moved
8b60a78 to b50a745
This is done:
Move dbutil and testutil to database from pkg Rename all "result"
b50a745 to e657d26
All database utility functions are renamed to explicitly say if it will commit changes or rollback changes on success.
Rename detector type to DType because reserved keywords should not be used as type or variable names.
851c8ec to 69c0c84
Conceptual change:
- Detector now represents anything that extracts information from a layer blob.
- Detector has two different types for now: Namespace (detector to extract namespaces) and Feature (detector to extract features).
- Processor and Lister are renamed to Detector.
Changes in API:
- scanned_* to be just detector
Changes in Database Model:
- Entities: objects relating the vulnerabilities and the ancestries as the result of Analysis.
- Ancestries: layers and ancestries scanned with detectors with metadata (name/structure/relation to entities/detectors) saved in the database.
- Vulnerabilities: vulns fetched by the updaters.
- Notification: vulnerability notifications spawned by vulnerability changes.
- UpdaterLock: lock and keyValue.
Changes in Behavior:
Additional responsibility to Client: