Commit

Merge remote-tracking branch 'origin/develop'
# Conflicts:
#	README.md
smierz committed Nov 5, 2021
2 parents 184113c + 7ea1f5d commit 00ec451
Showing 62 changed files with 892 additions and 351 deletions.
19 changes: 17 additions & 2 deletions CHANGELOG.md
@@ -3,17 +3,32 @@ All notable changes to this project will be documented in this file.

## [Unreleased]

## [1.2.0] - 2021-11-05
### Added
- add queries for Crossref work and personPlusWorks
- add tests for all controllers
- add new parameter for ROR affiliation to ORCID query

### Changed
- refactor all controllers so GET-path matches resource path
- improve README
- remove Dockerfiles and instead use Spring Docker capabilities
- use Spring profiles to determine how and where data is written


## [1.1.0] - 2021-06-22
### Added
- query ORCID for person and current employees at ROR organization
- check GraphQL queries for response code 200 and additionally if response body contains error messages (no data)

### Changed
- refactor all queries to follow three steps: source, extraction, mapping to vivo-rdf
- make all controllers use HTTP-GET requests
- centralize input validation
- export to VIVO in chunks
- simplify error handling
- document methods in swagger-UI and remove response section


## [Renamed Project] - 2021-05-25
As new data sources were integrated and the name datacitecommons2vivo was not reflecting
14 changes: 0 additions & 14 deletions Dockerfile

This file was deleted.

23 changes: 0 additions & 23 deletions DockerfileBuild

This file was deleted.

99 changes: 43 additions & 56 deletions README.md
@@ -1,79 +1,66 @@
![Java](https://img.shields.io/badge/java-%23ED8B00.svg?style=for-the-badge&logo=java&logoColor=white)
![Spring](https://img.shields.io/badge/spring-%236DB33F.svg?style=for-the-badge&logo=spring&logoColor=white)
![Docker](https://img.shields.io/badge/docker-%230db7ed.svg?style=for-the-badge&logo=docker&logoColor=white)

[![Project Status: Active – The project has reached a stable, usable state and is being actively developed.](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active)
![Open Source Love](https://badges.frapsoft.com/os/v3/open-source.svg?v=102)

## generate2vivo
generate2vivo is an extensible Data Ingest Tool for the open source software VIVO.
It currently queries metadata from Datacite Commons, ROR and ORCID
generate2vivo is an extensible Data Ingest and Transformation Tool for the open source software VIVO.
It currently contains queries for metadata from [Datacite Commons](https://commons.datacite.org/), [Crossref](https://www.crossref.org/), [ROR](https://ror.org/) and [ORCID](https://orcid.org/)
and maps them to the VIVO ontology using [sparql-generate](https://ci.mines-stetienne.fr/sparql-generate/index.html).
The resulting RDF data can be exported to a VIVO instance directly or returned in a HTTP response.
The resulting RDF data can be exported to a VIVO instance (or any SPARQL endpoint) directly or it can be returned in JSON-LD.

<hr style="border:1px solid gray"> </hr>

- [Available queries](#available-queries)
+ [Datacite Commons](#datacite-commons)
+ [ROR](#ror)
+ [ORCID](#orcid)
- [Features](#features)
- [Installation](#installation)
- [Run in Command Line](#run-in-command-line)
- [Extensible](#extensible)
- [Wiki resources](#wiki-resources)


### Features
![generate2vivo features](https://raw.githubusercontent.com/wiki/vivo-community/generate2vivo/images/generate2vivo.png)

### Available queries
The data sources and queries that are currently available are listed below.
The starting point was the sparql-generate library, which we use as the engine for our transformations; the transformations themselves are defined in separate GENERATE queries. Note that code and queries are kept separate, which allows users
* to write or change queries without going into the code
* to reuse the queries (meaning you can drop the code and use only the queries, for example with the command line tool provided on the sparql-generate website)
* to reuse the code (meaning you can drop the queries if the data sources are not relevant for you and use only the code with your own queries)

##### Datacite Commons
For [Datacite Commons](https://commons.datacite.org/) the following queries are available:
* `organization`: This method gets data about an organization by passing a ROR id.
* `organizationPlusPeople`: This method gets data about an organization and its affiliated people by passing a ROR id.
* `organizationPlusPeoplePlusPublications`: This method gets data about an organization, its affiliated people and their respective publications by passing a ROR id.
* `person`: This method gets data about a person by passing an ORCID id.
* `personPlusPublications`: This method gets data about a person and their publications by passing an ORCID id.
* `work`: This method gets data about a work by passing a DOI.

##### ROR
For [ROR](https://ror.org/) two queries are available:
* `organization`: This method gets data about an organization by passing a ROR id.
* `organizationPlusChildren`: This method gets data about an organization and all of its sub-organizations by passing a ROR id.
In addition, we gave the application a REST API so that other programs or services can communicate with it via HTTP requests, which allows generate2vivo to be integrated into an existing data ingest process.

##### ORCID
For [ORCID](https://orcid.org/) the following queries are available:
* `personPlusWorks`: This method gets data about a person and their works by passing an ORCID id.
* `currentEmployeesPlusWorks`: This method gets data about an organization's current employees and their works by passing a ROR id.

We also added output functionality: the generated data can either be exported directly into a VIVO instance via its SPARQL API, or, if you want to check the data before importing or are using a messaging service like Apache Kafka, it can be returned as JSON-LD for post-processing.


### Installation
0. Prerequisites: You need to have maven and a JDK for Java 11 installed.
1. Clone the repository to a local folder using `git clone https://github.com/vivo-community/generate2vivo.git`
2. Change into the folder where the repository has been cloned.
3. Open `src/main/resources/application.properties` and change your VIVO details accordingly.
2. Change into the folder where the repository has been cloned.
3. Open `src/main/resources/application.properties` and change your VIVO details accordingly.
If you don't provide `vivo.url`, `vivo.email` or `vivo.password`, the application will not import the mapped data into VIVO but return the triples as JSON-LD.
3. Run the application:
* If you have maven and a JDK for Java 11 installed, you can run the application directly via `mvn spring-boot:run`.

* Alternatively you can compile & run the application in Docker (with or without Java setup):
```bash
# with Java setup:
mvn package
docker build -t g2v .
docker run -p 9000:9000 -t g2v

# without Java setup:
docker build -f DockerfileBuild -t g2v .
docker run -p 9000:9000 -t g2v
```
4. Run the application:
* You can run the application directly via `mvn spring-boot:run`.
* Or alternatively you can run the application in Docker:
```bash
mvn spring-boot:build-image
docker run -p 9000:9000 generate2vivo:latest
```

5. A minimal swagger-ui will be available at `http://localhost:9000/swagger-ui/`.
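The `application.properties` step above can be sketched as a minimal config fragment. The property names `vivo.url`, `vivo.email` and `vivo.password` are the ones mentioned in the README; the values below are placeholders, not real credentials:

```properties
# Target VIVO instance for direct import.
# Omit all three properties to get the generated triples back as JSON-LD instead.
vivo.url=http://localhost:8080/vivo
vivo.email=vivo_root@mydomain.edu
vivo.password=changeme
```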

### Run in Command Line
Alternatively you can run the queries from the command line using the sparql-generate executable JAR-file.
All queries are placed in folder `src/main/resources/sparqlg` and come with a `sparql-generate-conf.json`.
Its structure and use are explained in detail on the [sparql-generate website](https://ci.mines-stetienne.fr/sparql-generate/language-cli.html).
### Wiki resources
Additional resources are available in the GitHub wiki, e.g.

### Extensible
The software is easily extensible, meaning you can add and remove data sources.
* _[data sources & queries](https://github.com/vivo-community/generate2vivo/wiki/data-sources-&-queries)_ :
A detailed overview of the data sources and their queries.

For example, if you are not interested in using Datacite Commons, just remove the folder from `src/main/resources/sparqlg`
and the respective controller in the package `eu.tib.controller`.
* _[using generate2vivo](https://github.com/vivo-community/generate2vivo/wiki/using-generate2vivo)_: A short user tutorial with screenshots on how to use the swagger-UI to execute a query.

On the other hand, if you would like to add a data source:
* add a folder with your queries under `src/main/resources/sparqlg` and include a `sparql-generate-conf.json`
(its structure is described on the [sparql-generate website](https://ci.mines-stetienne.fr/sparql-generate/language-cli.html)).
* add a controller in `eu.tib.controller` that retrieves your input and calls your query via `responseService.buildResponse(queryid, input)`
* the connection between the controller and the corresponding query is made by the `queryid`: supply the path (within the resources folder) to your `sparql-generate-conf.json`.
* put your input into a Map; every key-value pair will be available in your query as a binding, where ?key is replaced with the value.
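The binding convention described above can be sketched in plain Java. The query id, ROR id and helper method below are hypothetical; only the `Map`-based binding pattern mirrors what the existing controllers do:

```java
import java.util.Map;

// Sketch of the query wiring: a query id pointing at a
// sparql-generate-conf.json plus a Map of input bindings.
public class BindingExample {

    // Hypothetical path (within src/main/resources) to a sparql-generate-conf.json
    static final String QUERY_ID = "sparqlg/mysource/myQuery";

    // Every key-value pair becomes a binding: ?ror is replaced with the value.
    static Map<String, String> buildBindings(String ror) {
        return Map.of("ror", ror);
    }

    public static void main(String[] args) {
        Map<String, String> input = buildBindings("https://ror.org/000000000");
        // In the real application a controller would now call something like:
        // responseService.buildResponse(QUERY_ID, input);
        System.out.println(QUERY_ID + " " + input);
    }
}
```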
* _[run queries in cmd line](https://github.com/vivo-community/generate2vivo/wiki/install-&-run#run-in-command-line)_ :
An alternative way of running the queries via cmd line with the provided Jar from sparql-generate.

* _[dev guide](https://github.com/vivo-community/generate2vivo/wiki/dev-guide)_ : resources specifically for developers, e.g. on how to add a data source or query, how to put variables from code into the query.
16 changes: 9 additions & 7 deletions pom.xml
Expand Up @@ -13,7 +13,7 @@

<groupId>eu.tib</groupId>
<artifactId>generate2vivo</artifactId>
<version>1.1.0</version>
<version>1.2.0</version>
<description>Extensible Data Ingest Tool for VIVO. Contains data sources like Datacite Commons, ORCID and ROR.</description>

<properties>
@@ -79,19 +79,21 @@
<version>6.0.16.Final</version>
</dependency>

<dependency>
<groupId>com.graphql-java</groupId>
<artifactId>graphql-java</artifactId>
<scope>test</scope>
<version>16.2</version>
</dependency>
</dependencies>

<build>
<plugins>
<plugin>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-maven-plugin</artifactId>
<configuration>
<layers>
<enabled>true</enabled>
</layers>
<image>
<name>generate2vivo</name>
</image>
</configuration>
</plugin>
</plugins>
</build>
63 changes: 63 additions & 0 deletions src/main/java/eu/tib/controller/CrossrefController.java
@@ -0,0 +1,63 @@
package eu.tib.controller;

import eu.tib.controller.validation.InputValidator;
import eu.tib.service.WriteResultService;
import io.swagger.annotations.Api;
import io.swagger.annotations.ApiOperation;
import io.swagger.annotations.ApiParam;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.ResponseEntity;
import org.springframework.validation.annotation.Validated;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

import javax.validation.Valid;
import javax.validation.constraints.Pattern;
import java.util.Map;

@Slf4j
@Validated
@RestController
@RequestMapping(value = "/crossref")
@Api(value = "Controller", tags = {"crossref"})
public class CrossrefController {

    @Autowired
    private WriteResultService wrService;

    @ApiOperation(value = "Retrieve data about a person and their works from Crossref", notes = "This method gets data about a person and their works from Crossref by passing an ORCID id.")
    @GetMapping(value = "/personPlusWorks", produces = "application/json")
    public ResponseEntity<String> getPersonPlusWorks(
            @Valid @Pattern(regexp = InputValidator.orcid)
            @ApiParam("Complete Orcid URL consisting of https://orcid.org/ plus id")
            @RequestParam String orcid,
            @ApiParam("email for polite requests")
            @RequestParam(defaultValue = "") String email) {

        final String id = "sparqlg/crossref/personPlusWorks";
        log.info("Incoming Request for " + id + " with orcid: " + orcid);

        final String CURSOR = "*"; // starting cursor for pagination
        return wrService.execute(id, Map.of("orcid", orcid,
                "polite_mail", email,
                "cursor", CURSOR));
    }

    @ApiOperation(value = "Retrieve data about a work from Crossref", notes = "This method gets data about a work from Crossref by passing a DOI.")
    @GetMapping(value = "/work", produces = "application/json")
    public ResponseEntity<String> getWork(
            @Valid @Pattern(regexp = InputValidator.doi)
            @ApiParam("DOI of the publication")
            @RequestParam String doi,
            @ApiParam("email for polite requests")
            @RequestParam(defaultValue = "") String email) {

        final String id = "sparqlg/crossref/work";
        log.info("Incoming Request for " + id + " with doi: " + doi);

        return wrService.execute(id, Map.of("doi", doi, "polite_mail", email));
    }
}
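Assuming the application is running locally on port 9000 (as in the README), the `work` endpoint above could be exercised as sketched below. The DOI is a placeholder, and the exact DOI format accepted depends on the `InputValidator.doi` pattern:

```shell
# Query Crossref metadata for a (placeholder) DOI; the email enables polite requests.
curl -G "http://localhost:9000/crossref/work" \
  --data-urlencode "doi=https://doi.org/10.1000/xyz123" \
  --data-urlencode "email=me@example.org"
```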
