Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Auxiliary File API Enhancements #8237

Merged

Conversation

qqmyers
Copy link
Member

@qqmyers qqmyers commented Nov 10, 2021

What this PR does / why we need it:
This PR adds a few enhancements to the Aux file API:

  • Auxiliary files can now also be associated with non-tabular files
  • Improved error reporting
  • The API will block attempts to create a duplicate auxiliary file
  • Delete and list-by-origin calls have been added

Which issue(s) this PR closes:

Closes #8235

Special notes for your reviewer:

  • w.r.t. stopping duplicates - this is an existing bug - you can add two aux files with the same format params which then causes downloads to fail (as there isn't a single response to the query to find the file.)
  • w.r.t. list-by-origin - this was all that QDR needed and I thought perhaps that requiring the origin to find files might be useful, but one could redesign to just list all aux files and require the caller to filter the results and/or to make the origin param optional (e.g. query param)
  • w.r.t. non-tabular files: QDR is using aux files with document file types (Word, PDF) and I'm not sure there's a good reason to limit aux files to certain mime types. Hopefully not controversial to just remove the restriction
  • w.r.t. error reporting - again hoping there's no controversy - just trying to report the various ways the add call can fail more clearly.

Suggestions on how to test this:
As usual, just following the doc examples should be good. Specific things to try would be

  • try to add an aux file twice (same formatTag/formatVersion - doesn't have to be same exact file) and verify that the second call fails
  • add an aux to a non-tabular file
  • try delete
  • try to list files from the origin you use in uploads

Does this PR introduce a user interface change? If mockups are available, please link/include them here: api only

Is there a release notes update needed for this change?: draft provided - this is a tweak to an experimental API so probably nothing that needs to be prominent

Additional documentation:

@djbrooke djbrooke self-assigned this Nov 15, 2021
@djbrooke djbrooke removed their assignment Nov 15, 2021
@djbrooke
Copy link
Contributor

I added a reminder to expand on use cases at release time (but also so the build would be re-tried automatically :)). I'll unassign myself so someone can review the code.

@pdurbin pdurbin self-assigned this Nov 15, 2021
@raprasad
Copy link
Contributor

thanks @qqmyers, 👍 for "API will block attempts to create a duplicate auxiliary file"

Conflicts (from PR IQSS#8192 it seems):
src/main/java/edu/harvard/iq/dataverse/api/Access.java
@pdurbin
Copy link
Member

pdurbin commented Nov 16, 2021

@qqmyers heads up that I resolved merged conflicts with develop in dedd9f8. As I mentioned in the commit, they seem to be caused by pull request #8192 and only affected src/main/java/edu/harvard/iq/dataverse/api/Access.java

@pdurbin
Copy link
Member

pdurbin commented Nov 16, 2021

That's odd, testForceReplaceAndUpdate failed with a 500 error according to https://jenkins.dataverse.org/job/IQSS-Dataverse-Develop-PR/job/PR-8237/3/testReport/junit/edu.harvard.iq.dataverse.api/FilesIT/testForceReplaceAndUpdate/

Screen Shot 2021-11-16 at 10 26 37 AM

I just ran it locally, individually, and it passed.

It might be a recurrence of this issue:

I'll probably push a small commit, maybe to docs, just to see if I can get the test suite passing.

@qqmyers
Copy link
Member Author

qqmyers commented Nov 16, 2021

FWIW: Looks like the 500 error would be from the UpdateDatasetVersionCommand failing - since the test does a replace file right before the metadata update, it could just be a conflict due to indexing still running on the dataset, etc. - i.e. a random race condition that a delay or checking and waiting for the lastindex timestamp to update would fix.

@pdurbin
Copy link
Member

pdurbin commented Nov 16, 2021

Whoops, when I merged develop I accidentally deleted the "list by origin" API endpoint. I added it back in 59d0cbb.

Copy link
Member

@pdurbin pdurbin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, looks good.

Bugs:

  • "Listing Auxiliary Files for a Datafile by Origin" gives a 500 error and NullPointerException if no type is given. You could consider using NullSafeJsonBuilder but whatever solution is fine.

Would appreciate these:

  • Yes, it would be nice to have a "list all aux files given a file id" endpoint. Users will know the file id. They may not remember the origin they used.
  • I added a few tests in b66fb55 but only for the new "delete" and "list by origin" APIs. It would be nice to have tests for the new "non-tabular files can have aux files" rule and the "duplicate aux files" are prevented rule. Possibly the new error handling could be tested too. Maybe tests around permissions.

Things to think about:

  • Various comments I left in the review.
  • When should we move the aux file API docs from dev/experimental to prod?
  • 404 gives a somewhat confusing "no API endpoint exists" error. This happens for any API endpoint that returns a 404, I think, so it's not unique to this code. It looks like this: {"status":"ERROR","code":404,"message":"API endpoint does not exist on this server. Please check your code for typos, or consult our API guide at http://guides.dataverse.org.","requestUrl":"http://localhost:8080/api/v1/access/datafile/1441/auxiliary/myApp","requestMethod":"GET"}

Comment on lines 148 to 149
} catch(Exception ex) {
return null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we handle the exception from getResultList somehow?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the exceptions for getResultList(); all indicate real server errors, I've removed the try block so they bubble up to be reported via the API. I also changed the existing getSingleResult() call where one exception (NoResultException) is OK and should return null whereas other exceptions represent a real server error.

@qqmyers
Copy link
Member Author

qqmyers commented Nov 19, 2021

re: main review comment:

  • I added an additional API call to list all aux files, regardless of origin. (FWIW: I sorta like the security-by-obscurity aspect of requiring the origin to be known - since these files have meaning to some app, making them available to other apps/users who may not know what constraints there are on aux file existence/contents/etc. seems potentially problematic. That said, it is only by-obscurity so some future mechanism would really be needed to stop apps from messing with each other's files, and perhaps at that point the api to list all origins could become an admin/superuser call.)
  • I added tests for trying to re-upload the same aux file and to test that the list api returns the right number of files. I did not add any test to assure that aux files can be added to non-tabular types - just seems odd to have a test that is mostly about checking that an old constraint has been removed. (If desired, the csv DataFile used in the test could be changed to a json one.)
  • when to move from experimental? - not sure. This PR makes things more robust but I wouldn't be surprised if there are more changes w.r.t. UI and/or organization of aux files, etc. so I think it's fair to stay experimental as a warning that things may still be in flux.
  • re 404: I think this was discussed when the updated error handling classes were added and I thought there was some reason it wasn't changed (no agreement on better wording?). Not sure - and perhaps it was just about merging the original changes without waiting for further updates. In any case, I'd agree that this is confusing, and that a better message - perhaps even just some more RESTful generic message - might help - i.e. the resource represented by this URL does not exist.

Assuming the IT tests pass, I think this is ready to go forward again unless the changes raise further points.

qqmyers added a commit to QualitativeDataRepository/dataverse that referenced this pull request Nov 19, 2021
@pdurbin
Copy link
Member

pdurbin commented Nov 19, 2021

@qqmyers thanks for all the changes. I haven't looked through them all (and probably will) but I wanted to give you a heads up that AuxiliaryFilesIT is failing with a 500 error.

@qqmyers
Copy link
Member Author

qqmyers commented Nov 20, 2021

Fixed the tests. Also discovered that the original code didn't calculate checksums correctly. Code is now fixed, but all existing entries have bad checksum values. They could be recalculated from the files themselves, but it's not something that can be addressed just by a flyway script. Removing/re-adding files via api would probably work, so a script to find/list all entries and then delete/re-add could be made. Since QDR is only using this in testing, I think Harvard is probably the only place with many aux file entries that need updating.

@djbrooke
Copy link
Contributor

Thanks @qqmyers and @pdurbin. I'll move this back over to review.

@coveralls
Copy link

coveralls commented Nov 22, 2021

Coverage Status

Coverage decreased (-0.01%) to 18.939% when pulling 52ee0f6 on QualitativeDataRepository:IQSS/8235-auxfile_enhancements into c91689a on IQSS:develop.

storageIO.saveInputStreamAsAux(fileInputStream, auxExtension);
auxFile.setChecksum(FileUtil.checksumDigestToString(di.getMessageDigest().digest()) );
if (storageIO.isAuxObjectCached(auxExtension)) {
throw new ClientErrorException("Auxiliary file already exists", Response.Status.CONFLICT);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

409 (Conflict) seems slightly exotic for our code base but I guess it's fine.

"For example, you may get a 409 response when uploading a file which is older than the one already on the server resulting in a version control conflict." https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/409

Copy link
Member

@pdurbin pdurbin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I went ahead and merged the latest from develop.

@pdurbin pdurbin removed their assignment Nov 22, 2021
@kcondon kcondon self-assigned this Nov 24, 2021
@kcondon kcondon merged commit 1c08b81 into IQSS:develop Nov 24, 2021
@djbrooke djbrooke added this to the 5.9 milestone Nov 24, 2021
@qqmyers qqmyers deleted the IQSS/8235-auxfile_enhancements branch May 17, 2024 18:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature Request/Idea: Allow the Deletion of Auxiliary Files via API
6 participants