Auxiliary File API Enhancements #8237

Merged

14 changes: 14 additions & 0 deletions doc/release-notes/8235-auxiliaryfileAPIenhancements.md
@@ -0,0 +1,14 @@
### Auxiliary File API Enhancements

This release includes updates to the Auxiliary File API:
- Auxiliary files can now also be associated with non-tabular files
- Improved error reporting
- The API will block attempts to create a duplicate auxiliary file
- Delete and list-by-origin calls have been added
- Bug fix: the correct checksum is now recorded for auxiliary files

Please note that the auxiliary files feature is experimental and is designed to support integration with tools from the [OpenDP Project](https://opendp.org). If the API endpoints are not needed, they can be blocked.

### Major Use Cases

(note for release time - expand on the items above, as use cases)
40 changes: 36 additions & 4 deletions doc/sphinx-guides/source/developers/aux-file-support.rst
@@ -1,7 +1,7 @@
Auxiliary File Support
======================

Auxiliary file support is experimental and as such, related APIs may be added, changed or removed without standard backward compatibility. Auxiliary files in the Dataverse Software are being added to support depositing and downloading differentially private metadata, as part of the `OpenDP project <https://opendp.org>`_. In future versions, this approach will likely become more broadly used and supported.

Adding an Auxiliary File to a Datafile
--------------------------------------
@@ -16,12 +16,12 @@ To add an auxiliary file, specify the primary key of the datafile (FILE_ID), and
   export FORMAT_VERSION='v1'
   export TYPE='DP'
   export SERVER_URL=https://demo.dataverse.org

   curl -H X-Dataverse-key:$API_TOKEN -X POST -F "file=@$FILENAME" -F 'origin=myApp' -F 'isPublic=true' -F "type=$TYPE" "$SERVER_URL/api/access/datafile/$FILE_ID/auxiliary/$FORMAT_TAG/$FORMAT_VERSION"

You should expect a 200 ("OK") response and JSON with information about your newly uploaded auxiliary file.
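For readers scripting this from Java instead of curl, the upload can be sketched with the JDK's java.net.http client (Java 11 or later). This is a hypothetical illustration, not part of the Dataverse codebase. The class and method names are made up. The multipart body is assembled by hand because java.net.http has no built-in multipart publisher. The form field names (file, origin, isPublic, type), the X-Dataverse-key header and the URL shape are copied from the curl example.

```java
import java.io.ByteArrayOutputStream;
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical client-side sketch of the auxiliary file upload call; not Dataverse code.
final class AuxUploadSketch {

    // Builds a multipart/form-data body: one part per plain form field, then the "file" part.
    static byte[] multipartBody(String boundary, String fileName, byte[] fileBytes,
                                Map<String, String> fields) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            write(out, "--" + boundary + "\r\n");
            write(out, "Content-Disposition: form-data; name=\"" + field.getKey() + "\"\r\n\r\n");
            write(out, field.getValue() + "\r\n");
        }
        write(out, "--" + boundary + "\r\n");
        write(out, "Content-Disposition: form-data; name=\"file\"; filename=\"" + fileName + "\"\r\n");
        write(out, "Content-Type: application/octet-stream\r\n\r\n");
        out.writeBytes(fileBytes);
        // Closing delimiter: the boundary followed by two dashes.
        write(out, "\r\n--" + boundary + "--\r\n");
        return out.toByteArray();
    }

    private static void write(ByteArrayOutputStream out, String s) {
        out.writeBytes(s.getBytes(StandardCharsets.UTF_8));
    }

    // Same URL and API token header as the curl example.
    static HttpRequest uploadRequest(String serverUrl, String apiToken, long fileId,
                                     String formatTag, String formatVersion,
                                     String boundary, byte[] body) {
        URI uri = URI.create(serverUrl + "/api/access/datafile/" + fileId
                + "/auxiliary/" + formatTag + "/" + formatVersion);
        return HttpRequest.newBuilder(uri)
                .header("X-Dataverse-key", apiToken)
                .header("Content-Type", "multipart/form-data; boundary=" + boundary)
                .POST(HttpRequest.BodyPublishers.ofByteArray(body))
                .build();
    }
}
```

The request could then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`. Per the guide, the expected result is a 200 status with a JSON description of the new auxiliary file. Because this PR makes the service reject duplicates, repeating the upload with the same formatTag and formatVersion should be refused with 409 (Conflict).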

Downloading an Auxiliary File that Belongs to a Datafile
--------------------------------------------------------
To download an auxiliary file, use the primary key of the datafile, and the
formatTag and formatVersion (if applicable) associated with the auxiliary file:
@@ -33,5 +33,37 @@ formatTag and formatVersion (if applicable) associated with the auxiliary file:
   export FILE_ID='12345'
   export FORMAT_TAG='dpJson'
   export FORMAT_VERSION='v1'

   curl "$SERVER_URL/api/access/datafile/$FILE_ID/auxiliary/$FORMAT_TAG/$FORMAT_VERSION"

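A Java counterpart of the download call builds the same GET URL and streams the response into a local file. This is a minimal sketch that assumes Java 11 or later and uses hypothetical class and method names. Like the curl example, it sends no API token. Auxiliary files that were not uploaded with isPublic=true may need an X-Dataverse-key header, but that is an assumption this guide does not state.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

// Hypothetical client-side sketch of the auxiliary file download call; not Dataverse code.
final class AuxDownloadSketch {

    // Same URL shape as the curl example: datafile primary key, then formatTag and formatVersion.
    static HttpRequest downloadRequest(String serverUrl, long fileId,
                                       String formatTag, String formatVersion) {
        URI uri = URI.create(serverUrl + "/api/access/datafile/" + fileId
                + "/auxiliary/" + formatTag + "/" + formatVersion);
        return HttpRequest.newBuilder(uri).GET().build();
    }

    // Sends the request and streams the body into target.
    // BodyHandlers.ofFile writes whatever body comes back, so on a non-200 status
    // the file holds the error response and is rejected here instead of returned.
    static Path download(HttpRequest request, Path target) throws Exception {
        HttpResponse<Path> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofFile(target));
        if (response.statusCode() != 200) {
            throw new IllegalStateException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }
}
```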
Listing Auxiliary Files for a Datafile by Origin
------------------------------------------------
To list auxiliary files, specify the primary key of the datafile (FILE_ID), and the origin associated with the auxiliary files to list (the application/entity that created them).

.. code-block:: bash

   export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
   export FILE_ID='12345'
   export SERVER_URL=https://demo.dataverse.org
   export ORIGIN='app1'

   curl "$SERVER_URL/api/access/datafile/$FILE_ID/auxiliary/$ORIGIN"

You should expect a 200 ("OK") response and a JSON array with objects representing the auxiliary files found, or a 404/Not Found response if no auxiliary files exist with that origin.
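The list call differs from the download call only in having a single path segment, the origin, after /auxiliary. The following Java sketch is hedged and uses hypothetical names. It builds that request and maps the two documented outcomes to an enum: 200 with a JSON array, and 404 when no auxiliary file has that origin. It assumes the origin string is already URL-safe, because the origin is concatenated into the path without encoding.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical client-side sketch of the list-by-origin call; not Dataverse code.
final class AuxListSketch {

    enum ListOutcome { FOUND, NONE_WITH_ORIGIN, UNEXPECTED }

    // One segment after /auxiliary (the origin), unlike the two-segment download URL.
    static HttpRequest listRequest(String serverUrl, long fileId, String origin) {
        URI uri = URI.create(serverUrl + "/api/access/datafile/" + fileId + "/auxiliary/" + origin);
        return HttpRequest.newBuilder(uri).GET().build();
    }

    // Interprets the status codes described in the guide.
    static ListOutcome interpret(int statusCode) {
        switch (statusCode) {
            case 200:
                return ListOutcome.FOUND;            // body is a JSON array of auxiliary files
            case 404:
                return ListOutcome.NONE_WITH_ORIGIN; // no auxiliary file with that origin
            default:
                return ListOutcome.UNEXPECTED;
        }
    }
}
```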

Deleting an Auxiliary File that Belongs to a Datafile
-----------------------------------------------------
To delete an auxiliary file, use the primary key of the datafile, and the
formatTag and formatVersion (if applicable) associated with the auxiliary file:

.. code-block:: bash

   export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
   export SERVER_URL=https://demo.dataverse.org
   export FILE_ID='12345'
   export FORMAT_TAG='dpJson'
   export FORMAT_VERSION='v1'

   curl -H X-Dataverse-key:$API_TOKEN -X DELETE "$SERVER_URL/api/access/datafile/$FILE_ID/auxiliary/$FORMAT_TAG/$FORMAT_VERSION"
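For deletion, this Java sketch uses hypothetical names and assumes Java 11 or later. It sends DELETE to the same URL used for downloading. It attaches the API token exported in the example as an X-Dataverse-key header, on the assumption that deleting an auxiliary file requires an authenticated user.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical client-side sketch of the auxiliary file delete call; not Dataverse code.
final class AuxDeleteSketch {

    // Same datafile/formatTag/formatVersion URL as the download call, with the DELETE method
    // and the API token header (assumed to be required for deletion).
    static HttpRequest deleteRequest(String serverUrl, String apiToken, long fileId,
                                     String formatTag, String formatVersion) {
        URI uri = URI.create(serverUrl + "/api/access/datafile/" + fileId
                + "/auxiliary/" + formatTag + "/" + formatVersion);
        return HttpRequest.newBuilder(uri)
                .header("X-Dataverse-key", apiToken)
                .DELETE()
                .build();
    }
}
```

On the server side, deleteAuxiliaryFile in this PR throws FileNotFoundException when no matching auxiliary file exists. A delete request for a formatTag/formatVersion pair that was never uploaded should therefore fail rather than succeed silently. This diff does not state the exact status code.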


5 changes: 4 additions & 1 deletion src/main/java/edu/harvard/iq/dataverse/AuxiliaryFile.java
@@ -29,7 +29,10 @@
    @NamedQuery(name = "AuxiliaryFile.findAuxiliaryFilesByType",
            query = "select object(o) from AuxiliaryFile as o where o.dataFile.id = :dataFileId and o.type = :type"),
    @NamedQuery(name = "AuxiliaryFile.findAuxiliaryFilesWithoutType",
            query = "select object(o) from AuxiliaryFile as o where o.dataFile.id = :dataFileId and o.type is null"),
    @NamedQuery(name = "AuxiliaryFile.findAuxiliaryFilesByOrigin",
            query = "select object(o) from AuxiliaryFile as o where o.dataFile.id = :dataFileId and o.origin = :origin"),
})
@NamedNativeQueries({
    @NamedNativeQuery(name = "AuxiliaryFile.findAuxiliaryFileTypes",
            query = "select distinct type from auxiliaryfile where datafile_id = ?1")
@@ -4,20 +4,29 @@
import edu.harvard.iq.dataverse.dataaccess.StorageIO;
import edu.harvard.iq.dataverse.util.FileUtil;
import edu.harvard.iq.dataverse.util.SystemConfig;

import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;
import javax.ejb.EJB;
import javax.ejb.Stateless;
import javax.inject.Named;
import javax.persistence.EntityManager;
import javax.persistence.NoResultException;
import javax.persistence.PersistenceContext;
import javax.persistence.Query;
import javax.persistence.TypedQuery;
import javax.ws.rs.ClientErrorException;
import javax.ws.rs.InternalServerErrorException;
import javax.ws.rs.ServerErrorException;
import javax.ws.rs.core.Response;

import org.apache.tika.Tika;

/**
@@ -62,8 +71,8 @@ public AuxiliaryFile save(AuxiliaryFile auxiliaryFile) {
     * @return the saved AuxiliaryFile
     */
    public AuxiliaryFile processAuxiliaryFile(InputStream fileInputStream, DataFile dataFile, String formatTag, String formatVersion, String origin, boolean isPublic, String type) {

        StorageIO<DataFile> storageIO = null;
        AuxiliaryFile auxFile = new AuxiliaryFile();
        String auxExtension = formatTag + "_" + formatVersion;
        try {
@@ -73,12 +82,20 @@ public AuxiliaryFile processAuxiliaryFile(InputStream fileInputStream, DataFile
            // If the db fails for any reason, then rollback
            // by removing the auxfile from storage.
            storageIO = dataFile.getStorageIO();
            if (storageIO.isAuxObjectCached(auxExtension)) {
                throw new ClientErrorException("Auxiliary file already exists", Response.Status.CONFLICT);

Review comment (Member): 409 (Conflict) seems slightly exotic for our code base but I guess it's fine.

"For example, you may get a 409 response when uploading a file which is older than the one already on the server resulting in a version control conflict." https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/409

            }
            MessageDigest md;
            try {
                md = MessageDigest.getInstance(systemConfig.getFileFixityChecksumAlgorithm().toString());
            } catch (NoSuchAlgorithmException e) {
                logger.severe("NoSuchAlgorithmException for system fixity algorithm: " + systemConfig.getFileFixityChecksumAlgorithm().toString());
                throw new InternalServerErrorException();
            }
            DigestInputStream di = new DigestInputStream(fileInputStream, md);

            storageIO.saveInputStreamAsAux(di, auxExtension);
            auxFile.setChecksum(FileUtil.checksumDigestToString(di.getMessageDigest().digest()));

            Tika tika = new Tika();
            auxFile.setContentType(tika.detect(storageIO.getAuxFileAsInputStream(auxExtension)));
@@ -87,20 +104,20 @@ public AuxiliaryFile processAuxiliaryFile(InputStream fileInputStream, DataFile
            auxFile.setOrigin(origin);
            auxFile.setIsPublic(isPublic);
            auxFile.setType(type);
            auxFile.setDataFile(dataFile);
            auxFile.setFileSize(storageIO.getAuxObjectSize(auxExtension));
            auxFile = save(auxFile);
        } catch (IOException ioex) {
            logger.severe("IO Exception trying to save auxiliary file: " + ioex.getMessage());
            throw new InternalServerErrorException();
        } catch (ServerErrorException e) {
            // If anything fails during database insert, remove file from storage
            try {
                storageIO.deleteAuxObject(auxExtension);
            } catch (IOException ioex) {
                logger.warning("IO Exception trying to remove auxiliary file in exception handler: " + ioex.getMessage());
            }
            throw e;
        }
        return auxFile;
    }
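The checksum bug fix listed in the release notes comes down to which stream is handed to saveInputStreamAsAux. A DigestInputStream only updates its MessageDigest for bytes that are read through it. The earlier code wrapped fileInputStream but then saved fileInputStream itself, so the recorded value was the digest of empty input. The self-contained sketch that follows reproduces both patterns with an in-memory stand-in for storage. The save helper and the class name are hypothetical. MD5 is assumed as the fixity algorithm purely for illustration, whereas the real code asks systemConfig.getFileFixityChecksumAlgorithm().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical demonstration of the checksum fix; not Dataverse code.
final class DigestWhileSavingSketch {

    // Stand-in for storageIO.saveInputStreamAsAux(...): drains the stream it is handed.
    static void save(InputStream in, OutputStream storage) throws IOException {
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            storage.write(buffer, 0, n);
        }
    }

    static String hex(byte[] digest) {
        StringBuilder sb = new StringBuilder();
        for (byte b : digest) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // Earlier pattern: the wrapper is created, but the raw stream is saved,
    // so the digest never sees a single byte.
    static String checksumSavingRawStream(byte[] content) {
        try {
            InputStream raw = new ByteArrayInputStream(content);
            DigestInputStream di = new DigestInputStream(raw, MessageDigest.getInstance("MD5"));
            save(raw, new ByteArrayOutputStream());
            return hex(di.getMessageDigest().digest());
        } catch (IOException | NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Corrected pattern: the DigestInputStream itself is saved, so every byte
    // written to storage also updates the digest.
    static String checksumSavingDigestStream(byte[] content) {
        try {
            DigestInputStream di = new DigestInputStream(
                    new ByteArrayInputStream(content), MessageDigest.getInstance("MD5"));
            save(di, new ByteArrayOutputStream());
            return hex(di.getMessageDigest().digest());
        } catch (IOException | NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For the bytes of "hello", the corrected pattern yields 5d41402abc4b2a76b9719d911017c592. The earlier pattern yields d41d8cd98f00b204e9800998ecf8427e, which is the MD5 of zero bytes, regardless of the content.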
@@ -115,13 +132,43 @@ public AuxiliaryFile lookupAuxiliaryFile(DataFile dataFile, String formatTag, St
        try {
            AuxiliaryFile retVal = (AuxiliaryFile)query.getSingleResult();
            return retVal;
        } catch (NoResultException nr) {
            return null;
        }
    }


    public List<AuxiliaryFile> findAuxiliaryFiles(DataFile dataFile, String origin) {

        TypedQuery<AuxiliaryFile> query;
        if (origin == null) {
            query = em.createNamedQuery("AuxiliaryFile.findAuxiliaryFiles", AuxiliaryFile.class);
        } else {
            query = em.createNamedQuery("AuxiliaryFile.findAuxiliaryFilesByOrigin", AuxiliaryFile.class);
            query.setParameter("origin", origin);
        }
        query.setParameter("dataFileId", dataFile.getId());

        List<AuxiliaryFile> retVal = query.getResultList();
        return retVal;
    }

    public void deleteAuxiliaryFile(DataFile dataFile, String formatTag, String formatVersion) throws IOException {
        AuxiliaryFile af = lookupAuxiliaryFile(dataFile, formatTag, formatVersion);
        if (af == null) {
            throw new FileNotFoundException();
        }
        em.remove(af);
        StorageIO<?> storageIO;
        storageIO = dataFile.getStorageIO();
        String auxExtension = formatTag + "_" + formatVersion;
        if (storageIO.isAuxObjectCached(auxExtension)) {
            storageIO.deleteAuxObject(auxExtension);
        }
    }

    public List<AuxiliaryFile> findAuxiliaryFiles(DataFile dataFile) {
        TypedQuery<AuxiliaryFile> query = em.createNamedQuery("AuxiliaryFile.findAuxiliaryFiles", AuxiliaryFile.class);
        query.setParameter("dataFileId", dataFile.getId());
        return query.getResultList();
    }
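The lookup, list-by-origin and delete methods in this PR share a few rules. An auxiliary file is identified by its datafile plus formatTag and formatVersion. A null origin means "all auxiliary files of the datafile". A missing file on delete surfaces as FileNotFoundException. processAuxiliaryFile refuses to create a duplicate. The following hypothetical in-memory model of those rules uses no JPA or storage, so the rules can be exercised in isolation. It is not the service bean. The real duplicate check asks the storage layer (isAuxObjectCached) rather than a map, and it reports the conflict as HTTP 409.

```java
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the auxiliary file rules; not Dataverse code.
final class AuxFileStoreSketch {

    // Stand-in for an AuxiliaryFile row, holding only the fields the rules depend on.
    static final class Aux {
        final long dataFileId;
        final String formatTag;
        final String formatVersion;
        final String origin;

        Aux(long dataFileId, String formatTag, String formatVersion, String origin) {
            this.dataFileId = dataFileId;
            this.formatTag = formatTag;
            this.formatVersion = formatVersion;
            this.origin = origin;
        }
    }

    private final Map<String, Aux> rows = new LinkedHashMap<>();

    // Mirrors the storage key used by the service bean (formatTag + "_" + formatVersion), per datafile.
    private static String key(long dataFileId, String formatTag, String formatVersion) {
        return dataFileId + ":" + formatTag + "_" + formatVersion;
    }

    // Like processAuxiliaryFile: refuse a second auxiliary file with the same key.
    void add(Aux aux) {
        String k = key(aux.dataFileId, aux.formatTag, aux.formatVersion);
        if (rows.containsKey(k)) {
            throw new IllegalStateException("Auxiliary file already exists");
        }
        rows.put(k, aux);
    }

    // Like findAuxiliaryFiles(dataFile, origin): a null origin returns every auxiliary file.
    List<Aux> find(long dataFileId, String origin) {
        List<Aux> result = new ArrayList<>();
        for (Aux aux : rows.values()) {
            if (aux.dataFileId == dataFileId && (origin == null || origin.equals(aux.origin))) {
                result.add(aux);
            }
        }
        return result;
    }

    // Like deleteAuxiliaryFile: deleting something that does not exist is an error.
    void delete(long dataFileId, String formatTag, String formatVersion) throws FileNotFoundException {
        String k = key(dataFileId, formatTag, formatVersion);
        if (rows.remove(k) == null) {
            throw new FileNotFoundException(k);
        }
    }
}
```

An empty list from find presumably corresponds to the 404 response that the guide documents for the list-by-origin endpoint.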
@@ -151,13 +198,13 @@ public List<String> findAuxiliaryFileTypes(DataFile dataFile, boolean inBundle)
    }

    public List<String> findAuxiliaryFileTypes(DataFile dataFile) {
        TypedQuery<String> query = em.createNamedQuery("AuxiliaryFile.findAuxiliaryFileTypes", String.class);
        query.setParameter(1, dataFile.getId());
        return query.getResultList();
    }

    public List<AuxiliaryFile> findAuxiliaryFilesByType(DataFile dataFile, String typeString) {
        TypedQuery<AuxiliaryFile> query = em.createNamedQuery("AuxiliaryFile.findAuxiliaryFilesByType", AuxiliaryFile.class);
        query.setParameter("dataFileId", dataFile.getId());
        query.setParameter("type", typeString);
        return query.getResultList();
@@ -167,7 +214,7 @@ public List<AuxiliaryFile> findOtherAuxiliaryFiles(DataFile dataFile) {
        List<AuxiliaryFile> otherAuxFiles = new ArrayList<>();
        List<String> otherTypes = findAuxiliaryFileTypes(dataFile, false);
        for (String typeString : otherTypes) {
            TypedQuery<AuxiliaryFile> query = em.createNamedQuery("AuxiliaryFile.findAuxiliaryFilesByType", AuxiliaryFile.class);
            query.setParameter("dataFileId", dataFile.getId());
            query.setParameter("type", typeString);
            List<AuxiliaryFile> auxFiles = query.getResultList();
@@ -178,7 +225,7 @@ public List<AuxiliaryFile> findOtherAuxiliaryFiles(DataFile dataFile) {
    }

    public List<AuxiliaryFile> findAuxiliaryFilesWithoutType(DataFile dataFile) {
        TypedQuery<AuxiliaryFile> query = em.createNamedQuery("AuxiliaryFile.findAuxiliaryFilesWithoutType", AuxiliaryFile.class);
        query.setParameter("dataFileId", dataFile.getId());
        return query.getResultList();
    }