Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

6558 validate files on publish #6790

Merged
merged 30 commits into from
Apr 14, 2020
Merged

Conversation

landreev
Copy link
Contributor

@landreev landreev commented Apr 3, 2020

What this PR does / why we need it:
See the issue; and the release note: https://github.com/IQSS/dataverse/blob/6558-validate-files-on-publish/doc/release-notes/6558-validate-files-on-publish.md

Which issue(s) this PR closes:

Closes #6558

Special notes for your reviewer:

Suggestions on how to test this:
Create a dataset, upload some files, mess up one of the files in the dataset directory or in the S3 bucket... try to publish.
Try it with both the synchronous and asynchronous ("too many files") publishing modes.
Does this PR introduce a user interface change?:
There is a new UI error message notifying the user about the failure to publish, explaining to them that some of their files do not pass validation, and telling them that they must contact support in order to fix it, before they can make another attempt to publish.
It uses the standard "red" error banner system at the top of the page.

Is there a release notes update needed for this change?:
yes, above.
Additional documentation:

@coveralls
Copy link

coveralls commented Apr 3, 2020

Coverage Status

Coverage decreased (-0.03%) to 19.626% when pulling a2079d2 on 6558-validate-files-on-publish into 076bc32 on develop.

@sekmiller sekmiller self-assigned this Apr 3, 2020
JH.addMessage(FacesMessage.SEVERITY_ERROR, BundleUtil.getStringFromBundle("dataset.publish.file.validation.error.message"),
BundleUtil.getStringFromBundle("dataset.publish.file.validation.error.contactSupport"));
}
/* and now that we've shown the message to the user - remove the lock? */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we want the admin to intercede here we probably don't want to remove the lock, but should we update the edit data button so that if there is a file validation failure the user cannot add files? We would still allow them to make other edits?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe use the file validation failure lock to disable the publish button?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we don't want to remove that lock automatically - that was an idea I was entertaining during the development. I should remove that comment.
As for the second comment - that's exactly what that "failed validation" lock does. It disables the publish button, and will prevent an execution of the publish command, if called via API. But it shouldn't prevent the contributor from making changes. But yes, this should be documented clearly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I missed that the lock disables publish because it's if it's locked for any reason, not just file validation failure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way that page is dealing with/checking for various types of locks is a little scary.
Added a quick note to the "dataset management"/"publish dataset" section of the user guide.

for (DataFile dataFile : dataset.getFiles()) {
// TODO: Should we validate all the files in the dataset, or only
// the files that haven't been published previously?
logger.log(Level.FINE, "validating DataFile {0}", dataFile.getId());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be desirable to attempt to validate all of the files before throwing an exception. It would make the admins job easier if they knew more than one file failed validation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion. I decided to quit on the first invalid file, so that it's faster. And then I tell the admin in the troubleshooting instruction that they need to verify all the files in the dataset.
I just had this idea: I should probably leave the finalizePublication command as is, stopping on the first file that fails - but then I should provide a one step API for the admin, that will validate all the files in the dataset (using the existing code), and spell out to them which ones need to be fixed - ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, this is done.

@landreev landreev self-assigned this Apr 6, 2020
@kcondon kcondon self-assigned this Apr 8, 2020
landreev added 3 commits April 9, 2020 16:02
... from the troubleshooting guide; also moved around the diag. API and the example sections.
@kcondon
Copy link
Contributor

kcondon commented Apr 10, 2020

@landreev
x1. Turning off validation does not seem to stop validation.
x2. There is an unusual error banner behavior, it displays initially as Error: with expected text about missing file and contact support as a link, then wait a few seconds, it refreshes, says Failed to Publish, then same text as before but says support as text, not link. Just seems like we'e duplicating functionality somewhere, otherwise works.
x 3. When in async mode (20 files for this test), dataset fails to publish with valid files, many logging entries, some regarding storage driver, ends with index error. No logging present on develop branch for same scenario. Update: I have also seen this fail but with index error only at end.
x4. Minor doc issue, preexisting but seems like we should fix since we now reference it 2x, :PIDAsynchRegFileCount does not have it's own entry in config/settings, only mentioned inline as a setting in file pid and now file validation settings.

@landreev
Copy link
Contributor Author

Turning off validation does not seem to stop validation.

OK, will investigate and fix.

@kcondon kcondon assigned kcondon and landreev and unassigned kcondon Apr 10, 2020
@landreev
Copy link
Contributor Author

landreev commented Apr 13, 2020

@kcondon
In item 3.:

No logging present on develop branch for same scenario. Update: I have also seen this fail but with index error only at end.

So does the dataset get published or not, w/ develop branch on dvn-build? (not 100 %sure how to read "same scenario" - so checking...)
As for the "logging", I'm going to assume that those were whitehat scan messages - is that safe?

@kcondon
Copy link
Contributor

kcondon commented Apr 13, 2020

@landreev Yes, develop works

@landreev
Copy link
Contributor Author

Checked in fixes, should be ready to re-test.

@landreev
Copy link
Contributor Author

@kcondon

4. Minor doc issue, preexisting but seems like we should fix since we now reference it 2x, :PIDAsynchRegFileCount does not have it's own entry in config/settings, only mentioned inline as a setting in file pid and now file validation settings.

It was really "preexisting", in that apparently it wasn't documented at all. I added that mention of it, to both the file reg. and the validation options sections. I wasn't sure it needed its own section, but sure, I just added its own paragraph.

@kcondon kcondon self-assigned this Apr 14, 2020
@landreev
Copy link
Contributor Author

There's one more thing - whether we want to validate every file, every time, where I was soliciting opinions on how to handle it. I'm leaning towards skipping it for minor versions; but validating every file, new and old on every major version. But this hasn't been implemented yet.

@landreev
Copy link
Contributor Author

I'm leaning towards skipping it for minor versions; but validating every file, new and old on every major version.

OK, this has been checked in as well.

@kcondon kcondon merged commit 75f3b69 into develop Apr 14, 2020
@kcondon kcondon deleted the 6558-validate-files-on-publish branch April 14, 2020 19:32
@djbrooke djbrooke added this to the Dataverse 5 milestone Apr 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Validate the physical files when publishing a dataset
5 participants