-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[APPAI-1776] Fixed sentry error due to invalid version #737
Conversation
bayesian/utils.py
Outdated
if ecosystem == "maven": | ||
"""Needed for maven version like 1.5.2.RELEASE to be converted to | ||
1.5.2 - RELEASE for semantic version to work.""" | ||
version = version.replace('.', '-', 3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what value will be sent for ingestion in case the pkg is unknown? Also, is the version in graph going to be stored with . or - ? While fetching details from graph, will it expect in . or - form?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This functions is used to format the version we report in recommended version field of CA POST call. This is not used for any query graph / ingestion purpose. Also I don't know why maven
version are converted from '.' to '-' forms, this generic conversion was creating issue for golang for sure, so moved maven specific code under ecosystem check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok. can you ask the original coder to check why do we need this in the first place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@deepak1725 Can you add oldie who can provide us reasoning behind this strange version transformation for Maven ecosystem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code was written 3 years ago, You will hardly found the author of that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yzainee You suggest to check on version_comparator to remove this customize version transformation in API server. I did integrated version_comparator, but we cannot use this library directly because of following reasons.
- Version comparator is used for version comparison only we cannot transform them to form a tuple with major, minor, patch and build. This tuple is refereed further in complete flow of CA response builder.
- Invalid version string validation is missing in version comparator. This functionality is done by existing function in API server, which is a valid use-case.
- With golang, we have to support semver and pseudo version format, which is taken care by existing function.
@dgpatelgit |
We need to validate version string for invalid version like 'upstream/1.0.0' or 'kubernetes.1.43.0' that we get in the request. Such validation are not provided by version comparator. I verified the current test cases using version comparator, not every thing can be handled. One such example is if we have 'upstream/1.0.1' (invalid version) and empty version (which is default HIGH version), current code return empty version as other version is invalid, but same code in version comparator will return 'upstream/1.0.1' as high version. This is not correct. |
version comparator, as the name suggests, is used to compare versions. Its not a version validator. If there is any validation which is needed, it can be done in the code outside its purview. The comment is just to stop using tuple so that comparison can be done easily. how is tuple anyways helping you solve and check version;s validation? |
@yzainee The issue is not in version comparison, this bug or sentry error was due to version validation only. So, as part of the investigation, this PR fixes only version parsing / validation part. No modifying any part of version comparison logic. If required we can have separate task for this. |
@yzainee Finally your wish came true, I have removed customer function and use version comparator to get highest non-cve version for recommendation. |
highest_version = version | ||
try: | ||
input_version_comparable = ComparableVersion(self.version) | ||
for version in latest_non_cve_versions: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
latest non cve version will never return a list/set. Its always a single value. Why are we using a loop here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the existing flow this is function variable and comes as a list. Below was the code that populates this parameter:
latest_non_cve_versions = result_data[0].get('package', {}).get('latest_non_cve_version', [])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
latest_non_cve_version = mgmt.makePropertyKey('latest_non_cve_version').dataType(String.class).make();
schema for this value. it will not allow to store a list
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use gremlin to fetch the data, below is batch gremlin query we are using in CA POST flow.
ca_batch_query = """
epv = [];packages.each {g.V().has('pecosystem', ecosystem).has('pname', it.name)
.has('version', it.version).as('version', 'cve').select('version').in('has_version')
.dedup().as('package').select('package', 'version', 'cve')
.by(valueMap()).by(valueMap()).by(out('has_snyk_cve')
.valueMap().fold()).fill(epv);};epv;
"""
Out of this query provides packages with non-cve version as an list, though it has always one value, but technically its a list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact most of the properties are list as shown in snypet of response below:
"package": {
"ecosystem": [
"npm"
],
"gh_subscribers_count": [
5
],
"gh_contributors_count": [
-1
],
"latest_version_last_updated": [
"20191117"
],
"vertex_label": [
"Package"
],
"libio_dependents_repos": [
"634788"
],
"latest_non_cve_version": [
"7.0.3"
],
"gh_issues_last_year_opened": [
-1
],
"gh_issues_last_month_closed": [
-1
],
"gh_open_issues_count": [
5
],
"libio_dependents_projects": [
"290"
],
"latest_version": [
"7.0.3"
],
"tokens": [
"cliui"
],
"package_relative_used": [
"not used"
],
"gh_stargazers": [
210
],
"gh_forks": [
16
],
"package_dependents_count": [
-1
],
"gh_prs_last_month_opened": [
-1
],
"gh_issues_last_year_closed": [
-1
],
"last_updated": [
1.6046478431252089E9
],
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in that case you can pick [0]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yzainee I would have not touched the existing logical flow, however as per above discussion I have removed latest non-cve into simple comparison check. PTAL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
if latest_non_cve_version_comparable > input_version_comparable: | ||
highest_version = latest_non_cve_versions[0] | ||
|
||
except Exception as e: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't cache generic exception, Catch the specific one which will be thrown by ComparableVersion.
highest_version = latest_non_cve_versions[0] | ||
|
||
except Exception as e: | ||
logger.error(f"Package {self.package} @ {self.version} raised exception {e}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not recommended to use preformatted string in logging. it would expanded regardless of log level which is not good.
logger.error(f"Package {self.package} @ {self.version} raised exception {e}") | |
logger.exception("Unable to parse version %s@%s", self.package, self.version) |
except Exception as e: | ||
logger.error(f"Package {self.package} @ {self.version} raised exception {e}") | ||
|
||
logger.info("Highest non-cve version for " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto
Codecov Report
@@ Coverage Diff @@
## master #737 +/- ##
==========================================
+ Coverage 84.09% 84.59% +0.50%
==========================================
Files 21 21
Lines 1603 1584 -19
==========================================
- Hits 1348 1340 -8
+ Misses 255 244 -11
Continue to review full report at Codecov.
|
Edit: ✌️ Image Build Successfull @dgpatelgit, Avaliable at: |
try: | ||
input_version_comparable = ComparableVersion(self.version) | ||
|
||
# latest non-cve is list with only one entry. | ||
latest_non_cve_version_comparable = ComparableVersion(latest_non_cve_versions[0]) | ||
except TypeError: | ||
logger.error("Package %s@%s raised a TypeError", self.package, self.version) | ||
|
||
if latest_non_cve_version_comparable > input_version_comparable: | ||
highest_version = latest_non_cve_versions[0] | ||
|
||
logger.info("Highest non-cve version for %s@%s is %s", self.package, self.version, | ||
highest_version) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've stated that the comparison must go into exception else block, this is incorrect.
try: | |
input_version_comparable = ComparableVersion(self.version) | |
# latest non-cve is list with only one entry. | |
latest_non_cve_version_comparable = ComparableVersion(latest_non_cve_versions[0]) | |
except TypeError: | |
logger.error("Package %s@%s raised a TypeError", self.package, self.version) | |
if latest_non_cve_version_comparable > input_version_comparable: | |
highest_version = latest_non_cve_versions[0] | |
logger.info("Highest non-cve version for %s@%s is %s", self.package, self.version, | |
highest_version) | |
try: | |
input_version_comparable = ComparableVersion(self.version) | |
# latest non-cve is list with only one entry. | |
latest_non_cve_version_comparable = ComparableVersion(latest_non_cve_versions[0]) | |
except TypeError: | |
logger.error("Package %s@%s raised a TypeError", self.package, self.version) | |
else: | |
if latest_non_cve_version_comparable > input_version_comparable: | |
highest_version = latest_non_cve_versions[0] | |
logger.info("Highest non-cve version for %s@%s is %s", self.package, self.version, | |
highest_version) |
tests/test_utils.py
Outdated
@patch('bayesian.utils.init_celery', return_value=None) | ||
@patch('bayesian.utils.run_flow', return_value=1234) | ||
def test_server_run_flow(self, _rf_mock, _cf_mock): | ||
"""Test basic run flow function.""" | ||
assert server_run_flow('RUN_FLOW_NAME', {}) == 1234 | ||
|
||
@patch('bayesian.utils.init_celery', return_value=None) | ||
@patch('bayesian.utils.run_flow', return_value=1234) | ||
def test_create_component_bookkeeping(self, _rf_mock, _cf_mock): | ||
"""Verify create componenet book keeping utility function.""" | ||
assert create_component_bookkeeping('pypi', ['pkg1', 'pkg2'], {}, {}) == 1234 | ||
|
||
@patch('bayesian.utils.init_celery', return_value=None) | ||
@patch('bayesian.utils.run_flow', return_value=1234) | ||
def test_server_create_analysis(self, _rf_mock, _cf_mock): | ||
"""Verify various combinations of create analysis function.""" | ||
assert server_create_analysis('pypi', 'pkg1', '1.5.4', None, False, False, False) == 1234 | ||
|
||
assert server_create_analysis('pypi', 'pkg1', '1.5.4', None, True, False, False) == 1234 | ||
|
||
pkg = 'debug:artifact:0.4.3' | ||
assert server_create_analysis('maven', pkg, '0.4.3', None, False, False, False) == 1234 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the purpose of these tests? why does it only asserts mocked func return values which will be always same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reason was Codecov, after removing existing util functions codecov was failing due to reduction in coverage percentage (it was as low as 18% for utility file). You can see few commit before error on Codecov. To improve unit test coverage I added some of missing test cases for run-flow utility functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is fine as the number of LOCs are reduced. But these tests doesn't make any sense to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@arajkumar what do you suggest? shall we reduce Codecov level from B to C. Currently we fail Codecov if any file has level less than 'B' which is 19% coverage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Fixes issue with version parsing during CA post call. There was version string clean up required for
maven
ecosystem, but this generic code was applied to all ecosystem. Also reduced log level from error to warning as CA response is served with return status 202 including other packages with valid version. Package with invalid version will be returned asunknown_packages
in the response.Fixes jira issue :: https://issues.redhat.com/browse/APPAI-1776
Reduced vulnerabilities from
22
to1
and resolved1
exploits by upgrading dependent package versions in manifest file.Address below sentry errors:
Stage :: https://sentry.stage.devshift.net/sentry/fabric8-analytics-stage/issues/202486/
Prod :: https://sentry.devshift.net/sentry/fabric8-analytics-production/issues/250686/