Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Two Schemas for workflows/config.yaml, package validation, default package name #2364

Merged
merged 63 commits into from
Oct 14, 2021

Conversation

fiskus
Copy link
Member

@fiskus fiskus commented Sep 28, 2021

  • Base JSON Schema was updated
  • Added catalog specific JSON Schema
  • Added default package name via package_handle in catalog Schema
  • Added package name validation via handle_pattern in base Schema
  • Added entries validation via entries_schema in base Schema
  • Updated Ajv and dependent code

@codecov
Copy link

codecov bot commented Sep 29, 2021

Codecov Report

Merging #2364 (cb212cf) into master (9b7f9c0) will increase coverage by 0.05%.
The diff coverage is 49.09%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2364      +/-   ##
==========================================
+ Coverage   42.96%   43.02%   +0.05%     
==========================================
  Files         496      496              
  Lines       23601    23725     +124     
  Branches     3081     3128      +47     
==========================================
+ Hits        10141    10208      +67     
- Misses      12645    12699      +54     
- Partials      815      818       +3     
Flag Coverage Δ
api-python 89.82% <ø> (ø)
catalog 14.35% <49.09%> (+0.33%) ⬆️
lambda 93.87% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
catalog/app/containers/Bucket/PackageCopyDialog.js 0.00% <0.00%> (ø)
...alog/app/containers/Bucket/PackageCreateDialog.tsx 0.00% <ø> (ø)
...iners/Bucket/PackageDialog/MetaInputErrorHelper.js 35.71% <0.00%> (ø)
...iners/Bucket/PackageDialog/PackageCreationForm.tsx 0.00% <0.00%> (ø)
...g/app/containers/Bucket/PackageDirectoryDialog.tsx 0.00% <0.00%> (ø)
.../containers/Bucket/PackageDialog/PackageDialog.tsx 24.33% <20.00%> (-0.67%) ⬇️
catalog/app/containers/Bucket/requests/package.ts 34.09% <25.00%> (-1.05%) ⬇️
catalog/app/containers/Bucket/errors.tsx 47.82% <66.66%> (+0.57%) ⬆️
catalog/app/utils/workflows.ts 89.61% <81.81%> (-10.39%) ⬇️
catalog/app/utils/json-schema/json-schema.ts 57.65% <90.90%> (+4.65%) ⬆️
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9b7f9c0...cb212cf. Read the comment docs.

@fiskus fiskus requested a review from nl0 October 11, 2021 05:22
Copy link
Member

@nl0 nl0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. package entries validation still doesn't check all the entries, just the new ones;
  2. there's a bug in the config validation/parsing related to presence/absence of the catalog version (more details inline);
  3. probably it'd be cleaner to get the current schema versions from the actual schema objects instead of hardcoding them.

if (!semver.satisfies(semver.coerce(baseVersion) || '0.0.0', baseSchemaVersion)) {
errors.push(new Error(`Catalog supports ${baseSchemaVersion} version of base Schema`))

const invalidBaseVersion = validateConfigVersion(baseVersion, '1.1.0')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably this would've been a little cleaner if validateConfigVersion threw an error (instead of returning it) and this block was wrapped in a try/catch, e.g.

try {
  validateConfigVersion(baseVersion, '1.1.0')
} catch (e) {
  errors.push(e)
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, schema version should be taken directly from the schema object, not hardcoded

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We hardcoded the version in the filename, so I don't see a problem to hardcode it here too. If incrementing versions become frequent operation, we could optimize it

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both validate functions return undefined or error, as validate functions for form. I think, consistency make more sense here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

validateConfig throws error, so consistency doesn't matter here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We hardcoded the version in the filename, so I don't see a problem to hardcode it here too. If incrementing versions become frequent operation, we could optimize it

well, the problem is that you're introducing second source of truth (which should be manually synced with the first one), where it could be easily avoided, but i'm fine with this as long as you're conscious about it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not sure i fully get it about consistency (receiving mixed signals from two messages), but this call is yours too, i've just suggested how to make the code a little more concise

shared/schemas/workflows-config-1.1.0.json Outdated Show resolved Hide resolved
catalog/app/containers/Bucket/PackageDirectoryDialog.tsx Outdated Show resolved Hide resolved
@@ -129,6 +216,7 @@ export function parse(workflowsYaml: string): WorkflowsConfig {
const successors = data.successors || {}
return {
isWorkflowRequired: data.is_workflow_required !== false,
packageName: parsePackageNameTemplates(data.catalog?.package_handle),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a logic bug here: when there's no catalog version, config gets validated against the base schema (w/o catalog extension), but catalog-specific fields get parsed anyways, which can lead to accessing invalid data stored in catalog-specific fields (since they are always considered valid by the base schema). i see two ways to resolve this: either always validate against the catalog schema (assume it's of correct version by default and just check the data) or add some conditions to the config parsing logic that will prevent from reading catalog fields when there's no catalog version provided

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I chose to remove catalog property if user didn't proved catalog version in config

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tho probably it makes sense to tell the user about it, at least a warning in the console would help

shared/schemas/workflows-config-1.1.0.json Outdated Show resolved Hide resolved
shared/schemas/workflows-config-1.1.0.json Outdated Show resolved Hide resolved
shared/schemas/workflows-config-1.1.0.json Outdated Show resolved Hide resolved
shared/schemas/workflows-config-1.1.0.json Outdated Show resolved Hide resolved
@fiskus fiskus requested a review from nl0 October 11, 2021 14:16
Copy link
Member

@nl0 nl0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just two minor notes:
1. added files should take precedence over existing files when validating package contents;
2. some indication (e.g. console message) about dropping catalog-specific fields if catalog version is missing would be helpful.

Copy link
Member

@nl0 nl0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Manual testing report

Catalog issues

  1. [Minor bug, possible security issue] version is not validated before being used, validation happens after that,
    bc a schema needs to be selected first based on that version.
    This allows passing anything into version and it will be used as is --
    this may be unsafe bc it's possible there are some vulnerabilites in semver
    lib that can be exploited by passing it some specially constructed malicious
    string, for example, so probably we should check version format first.

  2. [Minor UX enhancement] When catalog version is missing, catalog-specific fields get removed from
    the config, but it's not indicated to the user in any way -- this can be
    very confusing. We should tell the user about that -- for example, by
    printing a console message if their config has catalog-specific fields and
    doesn't specify catalog version.

  3. [Minor UX enhancement] Error message about incompatible config version doesn't specify which version
    is incompatible -- base or catalog.

  4. [Minor UX enhancement] Error message about invalid workflows config: consider separating errors
    with newlines instead of commas for readability.

  5. [Minor bug] When opening "package from folder" dialog while in the bucket root, "packages"
    handle template is used when present (because directory is an empty string,
    and the condition is mistakenly triggered -- see inline notes).

  6. [Minor UX enhancement] username should be passed to the "files" handle template as well -- this
    can make templates a little more flexible and useful.

  7. [Bug] package_handle overrides existing package name in "revise" and "promote"
    dialogs -- I believe it shouldn't, because we're working with the existing
    package and should keep the name by default.

  8. [Bug] Package name validation against handle_pattern isn't triggered when switching
    workflows -- it's only triggered when editing the name afterwards. This allows
    to bypass name validation.

  9. [Minor UX enhancement] When entries_schema contains invalid id (not pointing to any known schema),
    and also when schema contains invalid JSON (parsing error), validation is
    skipped entirely without a notice -- we should show some error message instead.

  10. [Bug] When schema url points to a non-existent file, and also when a schema file
    contains invalid schema object, UI gets permanently locked because of an unhandled
    exception after validation attempt.

  11. [Minor UX enhancement] Validation messages shown below the files input could use some margin between
    them or a common container.

  12. [Minor UX enhancement] Backend validation messages should be displayed the same way as the frontend
    messages -- below the files input, preferably with the same visual treatment.
    Also, the messages received from the backend are very uninformative, but that
    probably should be addressed in the backend PR.

  13. [Minor UX enhancement] Backend validation errors contain leading quote -- not sure if it's a backend
    or a catalog.

Backend issues (posting here just FYI)

  1. Package contents aren't validated properly: after validation passed in the
    catalog and files were uploaded, backend call returns an error:
    Package entries failed validation: [] is too short. This prevents from
    creating a package even tho the entries are valid.

  2. Validation is skipped when schema is invalid (not valid JSON schema),
    unrecognized fields (possible typos) are ignored -- probably should be more strict
    and throw errors to be consistent with the catalog.

  3. Very unclear message when one of allOf schemas fail, e.g.:
    Package entries failed validation: None of [{'logical_key': '1.md', 'size': 2}, {'logical_key': '2.md', 'size': 2}, {'logical_key': 'README.md', 'size': 14}] are valid under the given schema.,
    when in reality all of the items are valid, but missing one item required via contains.

  4. Backend validation errors are very uninformative, they should at least specify
    paths were errors were encountered (try validating min/max size, for example).

  5. Missing $ref support -- I realize it's a known limitation and lifting it is
    out of scope of the current effort, but it doesn't seem too hard to at least
    support local refs (for example, ajv supports them out of the box AFAIK).

@fiskus
Copy link
Member Author

fiskus commented Oct 14, 2021

[Bug] Package name validation against handle_pattern isn't triggered when switching
workflows -- it's only triggered when editing the name afterwards. This allows
to bypass name validation.

Validation is triggered, but we show error only when:

  • submit failed
  • value was modified
validation.mp4

@nl0
Copy link
Member

nl0 commented Oct 14, 2021

[Bug] Package name validation against handle_pattern isn't triggered when switching
workflows -- it's only triggered when editing the name afterwards. This allows
to bypass name validation.

Validation is triggered, but we show error only when:

* submit failed

* value was modified

it's not

Screen.Recording.2021-10-14.at.13.21.15.mov

@fiskus
Copy link
Member Author

fiskus commented Oct 14, 2021

Reproduced with your example, thanks. Technically, validation is triggered but with the previous workflow, didn't figure out why yet

@fiskus fiskus merged commit b4a1dfb into master Oct 14, 2021
@fiskus fiskus deleted the two-schemas branch October 14, 2021 17:49
This was referenced Oct 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants