Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: install grype from path #111

Merged
merged 15 commits into from
Aug 14, 2023
Merged

feat: install grype from path #111

merged 15 commits into from
Aug 14, 2023

Conversation

willmurphyscode
Copy link
Contributor

Add the ability to install grype from a local path. Fixes #108.

Before this change, yardstick could be configured to install grype from a remote repository, or from a published released. But yardstick is used as part of the quality tests in grype. Therefore, when iterating on those tests, it's valuable to have a mechanism to point yardstick at a local directory that can build grype; otherwise changes have to be committed and pushed at least to a branch before yardstick can use them.

Example of the new config:

        - name: grype
          # version: git:current-commit
          version: path:/Users/willmurphy/work/grype
          takes: SBOM

The above config will cause yardstick to compile grype from the source found at /Users/willmurphy/work/grype.

Here's an example of how the version is logged:

 Results used:
    ├── 2a3d3f2f-ea4f-48ca-a826-176f77a5c302 : grype@path:/Users/willmurphy/work/grype against ...

Yardstick replaces / with _ in the path when saving the tool:

❯ ls .yardstick/tools/grype
_Users_willmurphy_work_grype v0.65.1

Signed-off-by: Will Murphy <[email protected]>
Signed-off-by: Will Murphy <[email protected]>
Signed-off-by: Will Murphy <[email protected]>
Signed-off-by: Will Murphy <[email protected]>
**kwargs,
) -> "Grype":
# get the description and head ref from the repo
src_repo_path = os.path.abspath(src_path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@staticmethod
def _run_go_build(
abspath: str,
abspath: str, # TODO: rename; what is an absolute path to?
Copy link
Contributor

@wagoodman wagoodman Aug 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the directory the binary will be installed to.. maybe abs_bin_dir_path ? or abs_install_dir?

os.makedirs(dest_path, exist_ok=True)
cls._run_go_build(
abspath=os.path.abspath(dest_path),
description=f"local_build:{src_path}",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the description is typically used as the version specified in the ld-flags as well as the version on the python Grype object (in the constructor the version_details) like here https://github.com/anchore/yardstick/pull/111/files#diff-eba0a362cbd33d93d1a6ed803e109b7da6fde23ce94f3118f577ad8140ce8b3aL155 . This way the version makes it into the scan configuration.

The goal is to try and make the description distinguishable from other grype versions. This is why the git describe is used in other approaches (maybe with --dirty makes sense too here).

fake_repo.git.describe.return_value = "test-version"
repo.return_value = fake_repo
tool = Grype.install(
version="path:/where/grype/is/cloned", path=".yardstick/tools/grype/path:_where_grype_is_cloned", update_db=False
Copy link
Contributor

@wagoodman wagoodman Aug 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems a little confusing (there is an encoding of path: within a path within the path variable). Also, one thing that's important is to make certain that subsequent runs wont try an reuse the same (already installed) binary just because the path matches... in case you make code changes and try and run this again. There is some of this logic in other places, but I don't see it in this execution path, so you're probably fine (but wanted to mention it just in case).

Lastly, maybe instead of the full path within path taking the sha256sum of the path string to get a single directory would flatten out where it is stored within the tools directory some (I could be misunderstanding something here though, but it looks like it could make a deeply nested path here). (edit: see #111 (comment))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other code in yardstick normalizes the path by replacing / with _:

return os.path.join(
store_root,
TOOL_DIR,
config.tool_name.replace("/", "_"),
config.tool_version.replace("/", "_"),
)

So this ends up with a potentially long path, but there's no additional nesting.

For example, where I've been using this in grype:

❯ ls -1 .yardstick/tools/grype
_Users_willmurphy_work_grype
v0.65.1

I think keeping the path human-readable is important, since it enables comparing two different local checkouts of grype and being able to tell at a glance which is which.

The concern that a subsequent run with a changed grype at the same path is valid. Right now, _install_from_path builds grype unconditionally, so I don't think anything should go wrong there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point -- we are already normalizing these in the store 🎉

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from an offline conversation we think that path:<the-path>-<git description w/dirty>[-<git diff hash if dirty>] will work pretty well here as the resolved version value.

Signed-off-by: Will Murphy <[email protected]>
Signed-off-by: Will Murphy <[email protected]>
Signed-off-by: Will Murphy <[email protected]>
Signed-off-by: Will Murphy <[email protected]>
@willmurphyscode
Copy link
Contributor Author

A combination of git status and a hash of git diff end up in the path where yardstick installs grype:

❯ ls .yardstick/tools/grype/~_work_grype                                                                                                               
v0.65.1-19-g6a36695-clean                                          
v0.65.1-19-g6a36695-dirty-266b74736bef4f1c0aebb531723cc5320cf6d993
v0.65.1-19-g6a36695-dirty-bcaa98c197ed2992b4f444b6759b0a283de829e8

The current drawback to this approach is that if someone re-runs yardstick and the only changes in grype between the runs are in untracked files, the artifacts might be reused. I might look for a quick way to fix that.

To make it more difficult to accidentally reuse build artifacts if
something has changed.

Signed-off-by: Will Murphy <[email protected]>
Signed-off-by: Will Murphy <[email protected]>
hash_obj = hashlib.sha1()
for untracked in repo.untracked_files:
with open(os.path.join(repo.working_dir, untracked), "rb") as untracked_file:
for chunk in iter(lambda: untracked_file.read(4096), b""): # pylint: disable=cell-var-from-loop
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like some advice from someone with a little more Python experience here; this pylint: disable doesn't make me very happy.

Copy link
Contributor

@wagoodman wagoodman Aug 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the lambda is getting defined in a loop, which is what the linter is complaining about. I think breaking it out will make the linter happy and the code more readible:

def hash_file(path: str) -> str:
    hasher = hashlib.sha1()

    with open(path, "rb") as f:
        while True:
            data = f.read(65536)
            if not data:
                break
            hasher.update(data)

    return hasher.hexdigest()

git_desc = repo.git.describe("--tags", "--always", "--long", "--dirty")
if repo.is_dirty():
hash_obj = hashlib.sha1()
for untracked in repo.untracked_files:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is one more fileset that needs to get included: modified files

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that will be included in the output on git diff.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh! missed that, thanks 🙌

@willmurphyscode willmurphyscode merged commit 34d2530 into main Aug 14, 2023
@willmurphyscode willmurphyscode deleted the feat-install-from-path branch August 14, 2023 16:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

make it possible to install grype from a local path
2 participants