Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[APPAI-1776] Fixed sentry error due to invalid version #737

Merged
merged 16 commits into from
Mar 22, 2021

Conversation

dgpatelgit
Copy link
Contributor

@dgpatelgit dgpatelgit commented Mar 8, 2021

Fixes issue with version parsing during CA post call. There was version string clean up required for maven ecosystem, but this generic code was applied to all ecosystem. Also reduced log level from error to warning as CA response is served with return status 202 including other packages with valid version. Package with invalid version will be returned as unknown_packages in the response.

Fixes jira issue :: https://issues.redhat.com/browse/APPAI-1776

Reduced vulnerabilities from 22 to 1 and resolved 1 exploits by upgrading dependent package versions in manifest file.

Address below sentry errors:
Stage :: https://sentry.stage.devshift.net/sentry/fabric8-analytics-stage/issues/202486/
Prod :: https://sentry.devshift.net/sentry/fabric8-analytics-production/issues/250686/

if ecosystem == "maven":
"""Needed for maven version like 1.5.2.RELEASE to be converted to
1.5.2 - RELEASE for semantic version to work."""
version = version.replace('.', '-', 3)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what value will be sent for ingestion in case the pkg is unknown? Also, is the version in graph going to be stored with . or - ? While fetching details from graph, will it expect in . or - form?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This functions is used to format the version we report in recommended version field of CA POST call. This is not used for any query graph / ingestion purpose. Also I don't know why maven version are converted from '.' to '-' forms, this generic conversion was creating issue for golang for sure, so moved maven specific code under ecosystem check.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. can you ask the original coder to check why do we need this in the first place?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@deepak1725 Can you add oldie who can provide us reasoning behind this strange version transformation for Maven ecosystem?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code was written 3 years ago, You will hardly found the author of that.

Copy link
Contributor Author

@dgpatelgit dgpatelgit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yzainee You suggest to check on version_comparator to remove this customize version transformation in API server. I did integrated version_comparator, but we cannot use this library directly because of following reasons.

  1. Version comparator is used for version comparison only we cannot transform them to form a tuple with major, minor, patch and build. This tuple is refereed further in complete flow of CA response builder.
  2. Invalid version string validation is missing in version comparator. This functionality is done by existing function in API server, which is a valid use-case.
  3. With golang, we have to support semver and pseudo version format, which is taken care by existing function.

@yzainee-zz
Copy link
Member

@dgpatelgit
1- why do you need major, minor patch etc? all you should care is about the version. how does it matter where it fits?
2- Please specify what type of validations are you referring to
3- FYI, im asking you to use version comparator just to find which is the latest etc. where else do you see a use of tuple and all?

@dgpatelgit
Copy link
Contributor Author

@dgpatelgit
1- why do you need major, minor patch etc? all you should care is about the version. how does it matter where it fits?
2- Please specify what type of validations are you referring to
3- FYI, im asking you to use version comparator just to find which is the latest etc. where else do you see a use of tuple and all?

We need to validate version string for invalid version like 'upstream/1.0.0' or 'kubernetes.1.43.0' that we get in the request. Such validation are not provided by version comparator. I verified the current test cases using version comparator, not every thing can be handled. One such example is if we have 'upstream/1.0.1' (invalid version) and empty version (which is default HIGH version), current code return empty version as other version is invalid, but same code in version comparator will return 'upstream/1.0.1' as high version. This is not correct.

@yzainee-zz
Copy link
Member

version comparator, as the name suggests, is used to compare versions. Its not a version validator. If there is any validation which is needed, it can be done in the code outside its purview. The comment is just to stop using tuple so that comparison can be done easily. how is tuple anyways helping you solve and check version;s validation?

@dgpatelgit
Copy link
Contributor Author

@yzainee The issue is not in version comparison, this bug or sentry error was due to version validation only. So, as part of the investigation, this PR fixes only version parsing / validation part. No modifying any part of version comparison logic. If required we can have separate task for this.

@dgpatelgit
Copy link
Contributor Author

dgpatelgit commented Mar 17, 2021

@yzainee Finally your wish came true, I have removed customer function and use version comparator to get highest non-cve version for recommendation.

highest_version = version
try:
input_version_comparable = ComparableVersion(self.version)
for version in latest_non_cve_versions:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

latest non cve version will never return a list/set. Its always a single value. Why are we using a loop here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the existing flow this is function variable and comes as a list. Below was the code that populates this parameter:
latest_non_cve_versions = result_data[0].get('package', {}).get('latest_non_cve_version', [])

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

latest_non_cve_version = mgmt.makePropertyKey('latest_non_cve_version').dataType(String.class).make();
schema for this value. it will not allow to store a list

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use gremlin to fetch the data, below is batch gremlin query we are using in CA POST flow.

ca_batch_query = """
epv = [];packages.each {g.V().has('pecosystem', ecosystem).has('pname', it.name)
.has('version', it.version).as('version', 'cve').select('version').in('has_version')
.dedup().as('package').select('package', 'version', 'cve')
.by(valueMap()).by(valueMap()).by(out('has_snyk_cve')
.valueMap().fold()).fill(epv);};epv;
"""

Out of this query provides packages with non-cve version as an list, though it has always one value, but technically its a list.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact most of the properties are list as shown in snypet of response below:

"package": {
"ecosystem": [
"npm"
],
"gh_subscribers_count": [
5
],
"gh_contributors_count": [
-1
],
"latest_version_last_updated": [
"20191117"
],
"vertex_label": [
"Package"
],
"libio_dependents_repos": [
"634788"
],
"latest_non_cve_version": [
"7.0.3"
],
"gh_issues_last_year_opened": [
-1
],
"gh_issues_last_month_closed": [
-1
],
"gh_open_issues_count": [
5
],
"libio_dependents_projects": [
"290"
],
"latest_version": [
"7.0.3"
],
"tokens": [
"cliui"
],
"package_relative_used": [
"not used"
],
"gh_stargazers": [
210
],
"gh_forks": [
16
],
"package_dependents_count": [
-1
],
"gh_prs_last_month_opened": [
-1
],
"gh_issues_last_year_closed": [
-1
],
"last_updated": [
1.6046478431252089E9
],

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in that case you can pick [0]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yzainee I would have not touched the existing logical flow, however as per above discussion I have removed latest non-cve into simple comparison check. PTAL

Copy link
Member

@yzainee-zz yzainee-zz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

bayesian/utility/v2/ca_response_builder.py Show resolved Hide resolved
if latest_non_cve_version_comparable > input_version_comparable:
highest_version = latest_non_cve_versions[0]

except Exception as e:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't cache generic exception, Catch the specific one which will be thrown by ComparableVersion.

highest_version = latest_non_cve_versions[0]

except Exception as e:
logger.error(f"Package {self.package} @ {self.version} raised exception {e}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not recommended to use preformatted string in logging. it would expanded regardless of log level which is not good.

Suggested change
logger.error(f"Package {self.package} @ {self.version} raised exception {e}")
logger.exception("Unable to parse version %s@%s", self.package, self.version)

except Exception as e:
logger.error(f"Package {self.package} @ {self.version} raised exception {e}")

logger.info("Highest non-cve version for "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

bayesian/utility/v2/ca_response_builder.py Show resolved Hide resolved
@codecov-io
Copy link

codecov-io commented Mar 18, 2021

Codecov Report

Merging #737 (8bb605a) into master (f4edec1) will increase coverage by 0.50%.
The diff coverage is 77.77%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #737      +/-   ##
==========================================
+ Coverage   84.09%   84.59%   +0.50%     
==========================================
  Files          21       21              
  Lines        1603     1584      -19     
==========================================
- Hits         1348     1340       -8     
+ Misses        255      244      -11     
Impacted Files Coverage Δ
bayesian/utils.py 71.68% <ø> (+6.29%) ⬆️
bayesian/utility/v2/ca_response_builder.py 97.43% <77.77%> (-0.87%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f4edec1...8bb605a. Read the comment docs.

@dgpatelgit
Copy link
Contributor Author

dgpatelgit commented Mar 19, 2021

Edit: ✌️ Image Build Successfull @dgpatelgit, Avaliable at: ghcr.io/fabric8-analytics/fabric8-analytics-server/bayesian-bayesian-api:SNAPSHOT-PR-737

Comment on lines 164 to 176
try:
input_version_comparable = ComparableVersion(self.version)

# latest non-cve is list with only one entry.
latest_non_cve_version_comparable = ComparableVersion(latest_non_cve_versions[0])
except TypeError:
logger.error("Package %s@%s raised a TypeError", self.package, self.version)

if latest_non_cve_version_comparable > input_version_comparable:
highest_version = latest_non_cve_versions[0]

logger.info("Highest non-cve version for %s@%s is %s", self.package, self.version,
highest_version)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've stated that the comparison must go into exception else block, this is incorrect.

Suggested change
try:
input_version_comparable = ComparableVersion(self.version)
# latest non-cve is list with only one entry.
latest_non_cve_version_comparable = ComparableVersion(latest_non_cve_versions[0])
except TypeError:
logger.error("Package %s@%s raised a TypeError", self.package, self.version)
if latest_non_cve_version_comparable > input_version_comparable:
highest_version = latest_non_cve_versions[0]
logger.info("Highest non-cve version for %s@%s is %s", self.package, self.version,
highest_version)
try:
input_version_comparable = ComparableVersion(self.version)
# latest non-cve is list with only one entry.
latest_non_cve_version_comparable = ComparableVersion(latest_non_cve_versions[0])
except TypeError:
logger.error("Package %s@%s raised a TypeError", self.package, self.version)
else:
if latest_non_cve_version_comparable > input_version_comparable:
highest_version = latest_non_cve_versions[0]
logger.info("Highest non-cve version for %s@%s is %s", self.package, self.version,
highest_version)

Comment on lines 218 to 239
@patch('bayesian.utils.init_celery', return_value=None)
@patch('bayesian.utils.run_flow', return_value=1234)
def test_server_run_flow(self, _rf_mock, _cf_mock):
"""Test basic run flow function."""
assert server_run_flow('RUN_FLOW_NAME', {}) == 1234

@patch('bayesian.utils.init_celery', return_value=None)
@patch('bayesian.utils.run_flow', return_value=1234)
def test_create_component_bookkeeping(self, _rf_mock, _cf_mock):
"""Verify create componenet book keeping utility function."""
assert create_component_bookkeeping('pypi', ['pkg1', 'pkg2'], {}, {}) == 1234

@patch('bayesian.utils.init_celery', return_value=None)
@patch('bayesian.utils.run_flow', return_value=1234)
def test_server_create_analysis(self, _rf_mock, _cf_mock):
"""Verify various combinations of create analysis function."""
assert server_create_analysis('pypi', 'pkg1', '1.5.4', None, False, False, False) == 1234

assert server_create_analysis('pypi', 'pkg1', '1.5.4', None, True, False, False) == 1234

pkg = 'debug:artifact:0.4.3'
assert server_create_analysis('maven', pkg, '0.4.3', None, False, False, False) == 1234
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of these tests? why does it only asserts mocked func return values which will be always same?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reason was Codecov, after removing existing util functions codecov was failing due to reduction in coverage percentage (it was as low as 18% for utility file). You can see few commit before error on Codecov. To improve unit test coverage I added some of missing test cases for run-flow utility functions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is fine as the number of LOCs are reduced. But these tests doesn't make any sense to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arajkumar what do you suggest? shall we reduce Codecov level from B to C. Currently we fail Codecov if any file has level less than 'B' which is 19% coverage.

Copy link
Member

@arajkumar arajkumar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@dgpatelgit dgpatelgit merged commit c07d525 into fabric8-analytics:master Mar 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants