Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

prevent label (filename) from being null #6821 #6824

Merged
merged 1 commit into from
Apr 20, 2020
Merged

prevent label (filename) from being null #6821 #6824

merged 1 commit into from
Apr 20, 2020

Conversation

pdurbin
Copy link
Member

@pdurbin pdurbin commented Apr 15, 2020

What this PR does / why we need it:

Tests are failing, but more importantly, uningest stopped working (500 errors) probably due to 786593e or other commits in pull request #6774.

Which issue(s) this PR closes:

Closes #6821

Special notes for your reviewer:

The getOriginalFileName method only works on tabular files so we need call it early before the file becomes non-tabular. For non-tabular files, null is returned which causes a ConstraintViolationException because label (filename) can't be null.

Suggestions on how to test this:

  • On the "develop" branch, check if uningest works. I predict 500 errors.
  • On this branch, check uningest.

Does this PR introduce a user interface change?:

No.

Is there a release notes update needed for this change?:

No.

Additional documentation:

None.

The getOriginalFileName method only works on tabular files so call it
early before the file becomes non-tabular.
@coveralls
Copy link

Coverage Status

Coverage remained the same at 19.626% when pulling b19353d on 6821-tests into 75f3b69 on develop.

@sekmiller sekmiller self-assigned this Apr 16, 2020
Copy link
Contributor

@sekmiller sekmiller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, Phil. Thank you.

@kcondon kcondon self-assigned this Apr 16, 2020
@kcondon
Copy link
Contributor

kcondon commented Apr 17, 2020

Saw this error on develop:
[2020-04-17T08:26:52.943-0400] [glassfish 4.1] [WARNING] [AS-EJB-00056] [javax.enterprise.ejb.container] [tid: _ThreadID=31 _ThreadName=http-listener-1(2)] [timeMillis: 1587126412943] [levelValue: 900] [[
A system exception occurred during an invocation on EJB MapLayerMetadataServiceBean, method: public edu.harvard.iq.dataverse.MapLayerMetadata edu.harvard.iq.dataverse.MapLayerMetadataServiceBean.findMetadataByDatafile(edu.harvard.iq.dataverse.DataFile)]]

@kcondon
Copy link
Contributor

kcondon commented Apr 17, 2020

@pdurbin This is failing for me with a different error in your branch. From examining the dataset, it appears that it deletes the original file but leaves the tab file in place.

[2020-04-17T08:38:55.682-0400] [glassfish 4.1] [WARNING] [] [edu.harvard.iq.dataverse.engine.command.impl.UningestFileCommand] [tid: _ThreadID=30 _ThreadName=http-listener-1(1)] [timeMillis: 1587127135682] [levelValue: 900] [[
Failed to open StorageIO for file1://1718818cfc2-e272c594e258 attempting to revert tabular ingest aborting. ((Aux file does not exist.)]]

[2020-04-17T08:38:55.686-0400] [glassfish 4.1] [SEVERE] [] [edu.harvard.iq.dataverse.api.AbstractApiBean] [tid: _ThreadID=30 _ThreadName=http-listener-1(1)] [timeMillis: 1587127135686] [levelValue: 1000] [[
Error while executing command edu.harvard.iq.dataverse.engine.command.impl.UningestFileCommand@59112063
edu.harvard.iq.dataverse.engine.command.exception.CommandException: Failed to open StorageIO for file1://1718818cfc2-e272c594e258 attempting to revert tabular ingest aborting. ((Aux file does not exist.)
at edu.harvard.iq.dataverse.engine.command.impl.UningestFileCommand.executeImpl(UningestFileCommand.java:82)
at edu.harvard.iq.dataverse.engine.command.AbstractVoidCommand.execute(AbstractVoidCommand.java:29)
:

@kcondon kcondon assigned pdurbin and unassigned kcondon Apr 17, 2020
@pdurbin
Copy link
Member Author

pdurbin commented Apr 17, 2020

@kcondon I'm not sure what to say. Uningest is working for me.

Here's how my Stata file looks before I try to uningest it:

Screen Shot 2020-04-17 at 3 14 57 PM

Here's how it looks after I uningest it:

Screen Shot 2020-04-17 at 3 15 14 PM

I'm able to download the file just fine:

Screen Shot 2020-04-17 at 3 15 25 PM

When I look at the file system I see a single file and the MD5 matches 50by1000.dta which is the file I'm testing with.

murphy:FK2 pdurbin$ cd 6TVFWB
murphy:6TVFWB pdurbin$ ls
1718988a1ed-d189f5dcfb18
murphy:6TVFWB pdurbin$ file 1718988a1ed-d189f5dcfb18 
1718988a1ed-d189f5dcfb18: data
murphy:6TVFWB pdurbin$ md5 1718988a1ed-d189f5dcfb18 
MD5 (1718988a1ed-d189f5dcfb18) = 003b8c67fbdfa6df31c0e43e65b93f0e
murphy:6TVFWB pdurbin$ md5 ~/Downloads/50by1000.dta 
MD5 (/Users/pdurbin/Downloads/50by1000.dta) = 003b8c67fbdfa6df31c0e43e65b93f0e
murphy:6TVFWB pdurbin$ pwd
/usr/local/payara5/glassfish/domains/domain1/files/10.5072/FK2/6TVFWB
murphy:6TVFWB pdurbin$ 

The only thing that is odd to me is the following message in server.log when I uningest the file:

[2020-04-17T15:14:28.103-0400] [Payara 5.201] [WARNING] [] [edu.harvard.iq.dataverse.api.Files] [tid: _ThreadID=87 _ThreadName=http-thread-pool::http-listener-1(5)] [timeMillis: 1587150868103] [levelValue: 900] [[
Dataset publication finalization: exception while exporting:No released version for dataset doi:10.5072/FK2/6TVFWB]]

But I don't think it's related.

Any tips on how I can reproduce the error you're seeing?

@sekmiller do you have any ideas on this?

@kcondon
Copy link
Contributor

kcondon commented Apr 17, 2020

@pdurbin Do you have a storage name and label that is not the default "file" but something like "file1" ?
I'm wondering whether this section of code does not deal with multi store names well.

@pdurbin
Copy link
Member Author

pdurbin commented Apr 17, 2020

@kcondon here's what I have (just "file"):

<jvm-options>-Ddataverse.files.file.type=file</jvm-options>
<jvm-options>-Ddataverse.files.file.label=file</jvm-options>
<jvm-options>-Ddataverse.files.file.directory=/usr/local/payara5/glassfish/domains/domain1/files</jvm-options>

I'm also attaching my whole domain.xml in case it's helpful: domain.xml.txt

I am not set up for multiple stores. I haven't even played with them yet, to be honest.

@kcondon
Copy link
Contributor

kcondon commented Apr 17, 2020

@pdurbin, you are set up for multi store by adding type and label.

so change them to be this:
-Ddataverse.files.file1.type=file
-Ddataverse.files.file1.label=file1
-Ddataverse.files.file1.directory=/usr/local/payara5/glassfish/domains/domain1/files

and

-Ddataverse.files.storage-driver-id=file
to be
-Ddataverse.files.storage-driver-id=file1

@pdurbin
Copy link
Member Author

pdurbin commented Apr 17, 2020

@kcondon I tried to set it up but I must have done it wrong. I got stuff like this and couldn't download files:

  Could not find storage driver for: file://10.5072/FK2/BTX79J]]
  Could not find storage driver for: file://10.5072/FK2/8E5W0D]]
  Could not find storage driver for: file://171842e4c7e-bd58ae54695e]]
  Could not find storage driver for: file://171842e4c7e-bd58ae54695e]]
  Could not find storage driver for: file://10.5072/FK2/VIYE9K]]
  Could not find storage driver for: file://10.5072/FK2/XYU6MD]]

Do I need to get multistore working in order to move this pull request forward?

@kcondon
Copy link
Contributor

kcondon commented Apr 17, 2020

@pdurbin Yeah, I was also trying on dataverse-internal. It had just file and uningest worked there but when I tried to name it file1 as I told you, I'm seeing the same problem. I think this might be a multistore problem. Essentially we're just changing the name of something and the code assumes a default of file. We'd encountered a number of functions that had that logic and thought we'd got them all.

So this is what I think might be happening:

  1. the uningest endpoint has some multistore issue around storage not being called file, eg. file1.
  2. multistore config, when the default storage driver is of type file, wants to be called file, not file1, which is a separate bug and needs to be confirmed (I believe this was found and fixed in testing but is back).

In summary, you probably have fixed the out of the box issue where folks do not rename their storage driver but the endpoint will not work properly in other configs.

I'm off now but will pick it up again on Monday and try to tease out the above issues.

@kcondon
Copy link
Contributor

kcondon commented Apr 20, 2020

@pdurbin, I was able to get it to work on internal with both file and file1 but it was seemingly intermittent.
For the record, I restored the driver settings to be file from file1 but then duplicated all those settings for file1 except for storage driver. That allowed both store types to work from a file upload perspective.

@pdurbin
Copy link
Member Author

pdurbin commented Apr 20, 2020

@kcondon ok. From my perspective, I picked up this issue to (hopefully) get a test passing but discovered that uningest is broken. I see above that you tested the develop branch. Would it be good for me to create an issue about uningest being broken? It failed every time for me before I made the change in this pull request.

@djbrooke
Copy link
Contributor

@pdurbin @kcondon We can handle it here or in another issue, if the latter I'll be happy to prioritize. Thank you!

@kcondon
Copy link
Contributor

kcondon commented Apr 20, 2020

@pdurbin I'm working on identifying what the issue is and I will open any additional ticket.
Hopefully will be done with testing shortly. Btw, the reason simply renaming the storage driver didn't work in our test case was because root dv had a record of the driver being file so when we renamed it to file1 it couldn't find it. So that is working but maybe not obvious.

@kcondon
Copy link
Contributor

kcondon commented Apr 20, 2020

@pdurbin the good news is that it appears everything is working, including multistore. the better news is I am now merging this branch. Happy Monday!

@kcondon kcondon merged commit 6352b89 into develop Apr 20, 2020
@kcondon kcondon deleted the 6821-tests branch April 20, 2020 16:22
@kcondon kcondon assigned kcondon and unassigned sekmiller Apr 20, 2020
@djbrooke djbrooke added this to the Dataverse 5 milestone Apr 21, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

edu.harvard.iq.dataverse.api.FilesIT.testUningestFileViaApi failing intermittently
5 participants