Merge pull request #8237 from QualitativeDataRepository/IQSS/8235-auxfile_enhancements

Auxiliary File API Enhancements
kcondon authored Nov 24, 2021
2 parents c91689a + 52ee0f6 commit 1c08b81
Showing 8 changed files with 316 additions and 39 deletions.
14 changes: 14 additions & 0 deletions doc/release-notes/8235-auxiliaryfileAPIenhancements.md
@@ -0,0 +1,14 @@
### Auxiliary File API Enhancements

This release includes updates to the Auxiliary File API:
- Auxiliary files can now also be associated with non-tabular files
- Improved error reporting
- The API will block attempts to create a duplicate auxiliary file
- Delete and list-by-original calls have been added
- Bug fix: correct checksum recorded for aux file

Please note that the auxiliary files feature is experimental and is designed to support integration with tools from the [OpenDP Project](https://opendp.org). If the API endpoints are not needed they can be blocked.

### Major Use Cases

(note for release time - expand on the items above, as use cases)
40 changes: 36 additions & 4 deletions doc/sphinx-guides/source/developers/aux-file-support.rst
@@ -1,7 +1,7 @@
Auxiliary File Support
======================

Auxiliary file support is experimental and as such, related APIs may be added, changed or removed without standard backward compatibility. Auxiliary files in the Dataverse Software are being added to support depositing and downloading differentially private metadata, as part of the OpenDP project (opendp.org). In future versions, this approach will likely become more broadly used and supported.
Auxiliary file support is experimental and as such, related APIs may be added, changed or removed without standard backward compatibility. Auxiliary files in the Dataverse Software are being added to support depositing and downloading differentially private metadata, as part of the `OpenDP project <https://opendp.org>`_. In future versions, this approach will likely become more broadly used and supported.

Adding an Auxiliary File to a Datafile
--------------------------------------
@@ -16,12 +16,12 @@ To add an auxiliary file, specify the primary key of the datafile (FILE_ID), and
  export FORMAT_VERSION='v1'
  export TYPE='DP'
  export SERVER_URL=https://demo.dataverse.org
  curl -H X-Dataverse-key:$API_TOKEN -X POST -F "file=@$FILENAME" -F 'origin=myApp' -F 'isPublic=true' -F "type=$TYPE" "$SERVER_URL/api/access/datafile/$FILE_ID/auxiliary/$FORMAT_TAG/$FORMAT_VERSION"

You should expect a 200 ("OK") response and JSON with information about your newly uploaded auxiliary file.

Downloading an Auxiliary File that belongs to a Datafile
Downloading an Auxiliary File that Belongs to a Datafile
--------------------------------------------------------
To download an auxiliary file, use the primary key of the datafile, and the
formatTag and formatVersion (if applicable) associated with the auxiliary file:
@@ -33,5 +33,37 @@ formatTag and formatVersion (if applicable) associated with the auxiliary file:
  export FILE_ID='12345'
  export FORMAT_TAG='dpJson'
  export FORMAT_VERSION='v1'
  curl "$SERVER_URL/api/access/datafile/$FILE_ID/auxiliary/$FORMAT_TAG/$FORMAT_VERSION"

Listing Auxiliary Files for a Datafile by Origin
------------------------------------------------
To list auxiliary files, specify the primary key of the datafile (FILE_ID), and the origin associated with the auxiliary files to list (the application/entity that created them).

.. code-block:: bash

  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  export FILE_ID='12345'
  export SERVER_URL=https://demo.dataverse.org
  export ORIGIN='app1'

  curl "$SERVER_URL/api/access/datafile/$FILE_ID/auxiliary/$ORIGIN"

You should expect a 200 ("OK") response and a JSON array with objects representing the auxiliary files found, or a 404/Not Found response if no auxiliary files exist with that origin.

Deleting an Auxiliary File that Belongs to a Datafile
-----------------------------------------------------
To delete an auxiliary file, use the primary key of the datafile, and the
formatTag and formatVersion (if applicable) associated with the auxiliary file:

.. code-block:: bash

  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  export SERVER_URL=https://demo.dataverse.org
  export FILE_ID='12345'
  export FORMAT_TAG='dpJson'
  export FORMAT_VERSION='v1'

  curl -X DELETE "$SERVER_URL/api/access/datafile/$FILE_ID/auxiliary/$FORMAT_TAG/$FORMAT_VERSION"
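
The list-by-origin and delete calls documented above can also be driven from Java. The following is a minimal sketch, not code from this commit: the class ``AuxFileRequests`` and its method names are hypothetical, and it only builds the requests without sending them. Sending the ``X-Dataverse-key`` header on these two calls is an assumption; the curl examples above export ``API_TOKEN`` but only the "add" example actually passes it.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper (not part of Dataverse): builds, but does not send,
// requests for the list-by-origin and delete calls described in the guide.
final class AuxFileRequests {

    // Header name copied from the "add" curl example in the guide.
    static final String TOKEN_HEADER = "X-Dataverse-key";

    // Shared prefix: {server}/api/access/datafile/{fileId}/auxiliary
    private static String base(String serverUrl, long fileId) {
        return serverUrl + "/api/access/datafile/" + fileId + "/auxiliary";
    }

    // GET .../auxiliary/{origin}. The origin is appended as-is (no URL encoding).
    static HttpRequest listByOrigin(String serverUrl, long fileId, String origin, String apiToken) {
        return HttpRequest.newBuilder(URI.create(base(serverUrl, fileId) + "/" + origin))
                .header(TOKEN_HEADER, apiToken) // assumption: the call is authenticated
                .GET()
                .build();
    }

    // DELETE .../auxiliary/{formatTag}/{formatVersion}
    static HttpRequest delete(String serverUrl, long fileId, String formatTag,
                              String formatVersion, String apiToken) {
        return HttpRequest.newBuilder(
                        URI.create(base(serverUrl, fileId) + "/" + formatTag + "/" + formatVersion))
                .header(TOKEN_HEADER, apiToken) // assumption: the call is authenticated
                .DELETE()
                .build();
    }

    public static void main(String[] args) {
        String token = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx";
        HttpRequest list = listByOrigin("https://demo.dataverse.org", 12345L, "app1", token);
        HttpRequest del = delete("https://demo.dataverse.org", 12345L, "dpJson", "v1", token);
        System.out.println(list.method() + " " + list.uri());
        System.out.println(del.method() + " " + del.uri());
    }
}
```

To execute a request, pass it to ``java.net.http.HttpClient.send``. Per the guide, the list call answers with 404 when no auxiliary files exist for the given origin, so a caller may want to treat that status as an empty result.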
5 changes: 4 additions & 1 deletion src/main/java/edu/harvard/iq/dataverse/AuxiliaryFile.java
@@ -29,7 +29,10 @@
@NamedQuery(name = "AuxiliaryFile.findAuxiliaryFilesByType",
query = "select object(o) from AuxiliaryFile as o where o.dataFile.id = :dataFileId and o.type = :type"),
@NamedQuery(name = "AuxiliaryFile.findAuxiliaryFilesWithoutType",
query = "select object(o) from AuxiliaryFile as o where o.dataFile.id = :dataFileId and o.type is null"),})
query = "select object(o) from AuxiliaryFile as o where o.dataFile.id = :dataFileId and o.type is null"),
@NamedQuery(name = "AuxiliaryFile.findAuxiliaryFilesByOrigin",
query = "select object(o) from AuxiliaryFile as o where o.dataFile.id = :dataFileId and o.origin = :origin"),
})
@NamedNativeQueries({
@NamedNativeQuery(name = "AuxiliaryFile.findAuxiliaryFileTypes",
query = "select distinct type from auxiliaryfile where datafile_id = ?1")
@@ -4,20 +4,29 @@
import edu.harvard.iq.dataverse.dataaccess.StorageIO;
import edu.harvard.iq.dataverse.util.FileUtil;
import edu.harvard.iq.dataverse.util.SystemConfig;

import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;
import javax.ejb.EJB;
import javax.ejb.Stateless;
import javax.inject.Named;
import javax.persistence.EntityManager;
import javax.persistence.NoResultException;
import javax.persistence.PersistenceContext;
import javax.persistence.Query;
import javax.persistence.TypedQuery;
import javax.ws.rs.ClientErrorException;
import javax.ws.rs.InternalServerErrorException;
import javax.ws.rs.ServerErrorException;
import javax.ws.rs.core.Response;

import org.apache.tika.Tika;

/**
@@ -62,8 +71,8 @@ public AuxiliaryFile save(AuxiliaryFile auxiliaryFile) {
* @return success boolean - returns whether the save was successful
*/
public AuxiliaryFile processAuxiliaryFile(InputStream fileInputStream, DataFile dataFile, String formatTag, String formatVersion, String origin, boolean isPublic, String type) {
StorageIO<DataFile> storageIO =null;

StorageIO<DataFile> storageIO = null;
AuxiliaryFile auxFile = new AuxiliaryFile();
String auxExtension = formatTag + "_" + formatVersion;
try {
@@ -73,12 +82,20 @@ public AuxiliaryFile processAuxiliaryFile(InputStream fileInputStream, DataFile
// If the db fails for any reason, then rollback
// by removing the auxfile from storage.
storageIO = dataFile.getStorageIO();
MessageDigest md = MessageDigest.getInstance(systemConfig.getFileFixityChecksumAlgorithm().toString());
DigestInputStream di
= new DigestInputStream(fileInputStream, md);

storageIO.saveInputStreamAsAux(fileInputStream, auxExtension);
auxFile.setChecksum(FileUtil.checksumDigestToString(di.getMessageDigest().digest()) );
if (storageIO.isAuxObjectCached(auxExtension)) {
throw new ClientErrorException("Auxiliary file already exists", Response.Status.CONFLICT);
}
MessageDigest md;
try {
md = MessageDigest.getInstance(systemConfig.getFileFixityChecksumAlgorithm().toString());
} catch (NoSuchAlgorithmException e) {
logger.severe("NoSuchAlgorithmException for system fixity algorithm: " + systemConfig.getFileFixityChecksumAlgorithm().toString());
throw new InternalServerErrorException();
}
DigestInputStream di = new DigestInputStream(fileInputStream, md);

storageIO.saveInputStreamAsAux(di, auxExtension);
auxFile.setChecksum(FileUtil.checksumDigestToString(di.getMessageDigest().digest()));

Tika tika = new Tika();
auxFile.setContentType(tika.detect(storageIO.getAuxFileAsInputStream(auxExtension)));
@@ -87,20 +104,20 @@ public AuxiliaryFile processAuxiliaryFile(InputStream fileInputStream, DataFile
auxFile.setOrigin(origin);
auxFile.setIsPublic(isPublic);
auxFile.setType(type);
auxFile.setDataFile(dataFile);
auxFile.setDataFile(dataFile);
auxFile.setFileSize(storageIO.getAuxObjectSize(auxExtension));
auxFile = save(auxFile);
} catch (IOException ioex) {
logger.info("IO Exception trying to save auxiliary file: " + ioex.getMessage());
return null;
} catch (Exception e) {
logger.severe("IO Exception trying to save auxiliary file: " + ioex.getMessage());
throw new InternalServerErrorException();
} catch (ServerErrorException e) {
// If anything fails during database insert, remove file from storage
try {
storageIO.deleteAuxObject(auxExtension);
} catch(IOException ioex) {
logger.info("IO Exception trying remove auxiliary file in exception handler: " + ioex.getMessage());
return null;
} catch (IOException ioex) {
logger.warning("IO Exception trying remove auxiliary file in exception handler: " + ioex.getMessage());
}
throw e;
}
return auxFile;
}
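
Note on the checksum fix in the hunks above: the change passes `di` rather than `fileInputStream` to `saveInputStreamAsAux`. A `DigestInputStream` only updates its digest for bytes read through the wrapper itself, so reading the underlying stream directly leaves the digest at the value for empty input. The sketch below illustrates this with standard JDK classes only; the class `DigestStreamDemo`, its `drain` helper (standing in for the storage driver), and the choice of MD5 are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustrative only: shows why the consumer must read through the wrapper.
final class DigestStreamDemo {

    // Reads the stream to the end; stands in for a storage driver saving it.
    static void drain(InputStream in) throws IOException {
        byte[] buf = new byte[8192];
        while (in.read(buf) != -1) {
            // discard the bytes
        }
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // Returns the digest recorded by the wrapper after the consumer has read
    // either the wrapper (readThroughWrapper = true) or the raw stream (false).
    static String checksum(byte[] data, String algorithm, boolean readThroughWrapper)
            throws IOException, NoSuchAlgorithmException {
        InputStream raw = new ByteArrayInputStream(data);
        DigestInputStream di = new DigestInputStream(raw, MessageDigest.getInstance(algorithm));
        drain(readThroughWrapper ? di : raw);
        return hex(di.getMessageDigest().digest());
    }

    public static void main(String[] args) throws Exception {
        byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println("through wrapper: " + checksum(data, "MD5", true));
        System.out.println("raw stream:      " + checksum(data, "MD5", false));
    }
}
```

The second call yields the same value for any input, because the wrapper never sees any bytes.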
@@ -115,13 +132,43 @@ public AuxiliaryFile lookupAuxiliaryFile(DataFile dataFile, String formatTag, St
try {
AuxiliaryFile retVal = (AuxiliaryFile)query.getSingleResult();
return retVal;
} catch(Exception ex) {
} catch(NoResultException nr) {
return null;
}
}


public List<AuxiliaryFile> findAuxiliaryFiles(DataFile dataFile, String origin) {

TypedQuery<AuxiliaryFile> query;
if (origin == null) {
query = em.createNamedQuery("AuxiliaryFile.findAuxiliaryFiles", AuxiliaryFile.class);
} else {
query = em.createNamedQuery("AuxiliaryFile.findAuxiliaryFilesByOrigin", AuxiliaryFile.class);
query.setParameter("origin", origin);
}
query.setParameter("dataFileId", dataFile.getId());

List<AuxiliaryFile> retVal = query.getResultList();
return retVal;
}

public void deleteAuxiliaryFile(DataFile dataFile, String formatTag, String formatVersion) throws IOException {
AuxiliaryFile af = lookupAuxiliaryFile(dataFile, formatTag, formatVersion);
if (af == null) {
throw new FileNotFoundException();
}
em.remove(af);
StorageIO<?> storageIO;
storageIO = dataFile.getStorageIO();
String auxExtension = formatTag + "_" + formatVersion;
if (storageIO.isAuxObjectCached(auxExtension)) {
storageIO.deleteAuxObject(auxExtension);
}
}

public List<AuxiliaryFile> findAuxiliaryFiles(DataFile dataFile) {
TypedQuery query = em.createNamedQuery("AuxiliaryFile.findAuxiliaryFiles", AuxiliaryFile.class);
TypedQuery<AuxiliaryFile> query = em.createNamedQuery("AuxiliaryFile.findAuxiliaryFiles", AuxiliaryFile.class);
query.setParameter("dataFileId", dataFile.getId());
return query.getResultList();
}
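
Note on the lookup, list, and delete methods in the hunk above: a null `origin` falls back to listing every auxiliary file for the datafile, and deleting an entry that cannot be found raises `FileNotFoundException`. The following in-memory sketch mirrors those semantics under stated assumptions. `AuxFileRegistrySketch` and `Entry` are hypothetical stand-ins for the service bean and the `AuxiliaryFile` entity; there is no JPA or storage layer, and a plain list replaces the named queries.

```java
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for the service bean; no database or storage.
final class AuxFileRegistrySketch {

    // Hypothetical stand-in for the AuxiliaryFile entity.
    static final class Entry {
        final String formatTag;
        final String formatVersion;
        final String origin;

        Entry(String formatTag, String formatVersion, String origin) {
            this.formatTag = formatTag;
            this.formatVersion = formatVersion;
            this.origin = origin;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    void add(String formatTag, String formatVersion, String origin) {
        entries.add(new Entry(formatTag, formatVersion, origin));
    }

    // Returns the matching entry, or null when nothing matches.
    Entry lookup(String formatTag, String formatVersion) {
        for (Entry e : entries) {
            if (e.formatTag.equals(formatTag) && e.formatVersion.equals(formatVersion)) {
                return e;
            }
        }
        return null;
    }

    // A null origin lists everything; otherwise only entries with that origin.
    List<Entry> find(String origin) {
        List<Entry> result = new ArrayList<>();
        for (Entry e : entries) {
            if (origin == null || origin.equals(e.origin)) {
                result.add(e);
            }
        }
        return result;
    }

    // Deleting a missing entry is reported with FileNotFoundException.
    void delete(String formatTag, String formatVersion) throws FileNotFoundException {
        Entry e = lookup(formatTag, formatVersion);
        if (e == null) {
            throw new FileNotFoundException();
        }
        entries.remove(e);
    }
}
```

For example, after adding `("dpJson", "v1", "app1")` and `("dpPdf", "v1", "app2")`, `find("app1")` returns one entry and `find(null)` returns both; a second `delete("dpJson", "v1")` throws.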
@@ -151,13 +198,13 @@ public List<String> findAuxiliaryFileTypes(DataFile dataFile, boolean inBundle)
}

public List<String> findAuxiliaryFileTypes(DataFile dataFile) {
Query query = em.createNamedQuery("AuxiliaryFile.findAuxiliaryFileTypes");
TypedQuery<String> query = em.createNamedQuery("AuxiliaryFile.findAuxiliaryFileTypes", String.class);
query.setParameter(1, dataFile.getId());
return query.getResultList();
}

public List<AuxiliaryFile> findAuxiliaryFilesByType(DataFile dataFile, String typeString) {
TypedQuery query = em.createNamedQuery("AuxiliaryFile.findAuxiliaryFilesByType", AuxiliaryFile.class);
TypedQuery<AuxiliaryFile> query = em.createNamedQuery("AuxiliaryFile.findAuxiliaryFilesByType", AuxiliaryFile.class);
query.setParameter("dataFileId", dataFile.getId());
query.setParameter("type", typeString);
return query.getResultList();
@@ -167,7 +214,7 @@ public List<AuxiliaryFile> findOtherAuxiliaryFiles(DataFile dataFile) {
List<AuxiliaryFile> otherAuxFiles = new ArrayList<>();
List<String> otherTypes = findAuxiliaryFileTypes(dataFile, false);
for (String typeString : otherTypes) {
TypedQuery query = em.createNamedQuery("AuxiliaryFile.findAuxiliaryFilesByType", AuxiliaryFile.class);
TypedQuery<AuxiliaryFile> query = em.createNamedQuery("AuxiliaryFile.findAuxiliaryFilesByType", AuxiliaryFile.class);
query.setParameter("dataFileId", dataFile.getId());
query.setParameter("type", typeString);
List<AuxiliaryFile> auxFiles = query.getResultList();
@@ -178,7 +225,7 @@ public List<AuxiliaryFile> findOtherAuxiliaryFiles(DataFile dataFile) {
}

public List<AuxiliaryFile> findAuxiliaryFilesWithoutType(DataFile dataFile) {
Query query = em.createNamedQuery("AuxiliaryFile.findAuxiliaryFilesWithoutType", AuxiliaryFile.class);
TypedQuery<AuxiliaryFile> query = em.createNamedQuery("AuxiliaryFile.findAuxiliaryFilesWithoutType", AuxiliaryFile.class);
query.setParameter("dataFileId", dataFile.getId());
return query.getResultList();
}