The results of ELFIE are quite promising in that all major issues encountered are well known and subject of ongoing work. Technologies tested, where mature and well used, worked well for the purpose and it seems that the future for an easily adoptable resource-oriented linked web of data resources is bright. The following sections outline issues encountered and describe recommended prospective path forward to overcome them.
OGC service interfaces were not the primary focus of ELFIE. However, along the way, both WFS and SOS services were discussed and tested. The major issues encountered were around querying these services by feature identifier. There is something of a chicken and egg situation in that, in most cases, http feature identifiers have not been published so are not available to use to query a web service, but the OGC services also do not lend themselves to publishing feature identifiers.
If queries to WFS or SOS services are used as un-structured links that return representations of features or results of observations, they can be used readily in linked data systems. But to be used with embedded feature identifiers or with more sophisticated content-negotiated retrievals, changes to the web service APIs will be needed.
The work ongoing with WFS appears promising with regard to the issues outlined above. The ability to query by a specific id that can be an http url is supported and is a clear solution that would solve the problem illustrated here. e.g. something like: http://server.com/wfs/getfeature?id=http://feature.x.y.x
A similar discussion has taken place in sensor-related services. SOS V2.0 specification (OGC 12-006), discusses identifier handling in Annex B and promotes reuse of a global identifier (here gml:identifier) as opposed to local identifier (gml:id) when querying services. The SensorThings API Part 1 implementations use their own internal identifier for REST API operations (ex : http://sensorthings.brgm-rec.fr/SensorThingsGroundWater/v1.0/Observations(1752377)). Both services' endpoints (SOS, SensorThingsAPI) were tested, passing them URIs for featuresOfInterest, observed properties, process, observation identifier, etc. to use them in a linked data context. Resulting queries work but further implementation is required to improve usability of the approach.
A JSON-LD response from these services is not available. While response rewriters (to convert existing encodings to a linked data form) have been tested on top of the SensorThings API (JSON to JSON-LD). It may be useful to include JSON-LD capabilities in the standards themselves.
Apart from the semantic association and serialization of features and observations, assignment of URIs to observational resources remains a system by system and use-case by use-case problem that is not necessarily trivial. Identification of unique observations and maintaining identifiers for them can become very complicated considering that time series subsetting is a complex subject in and of itself. The Research Data Alliance 'Data Citation WG Recommendations' cite:[https://doi.org/10.15497/rda00016] might help guide the community to a useful solution on this front.
Domain feature models, such as GWML2 and SoilML, are defined for encoding and exchange of representations of environmental features. Other models, like HY_Features, are only available in conceptual (UML) form. For use in linked data, best practices and publication workflows for domain feature models as linked data ontologies need to be established. As described in Box 1: HY_Features Ontology Note and [ontology_from_uml] this work has begun and needs to be brought to conclusion for use across the community.
W3C and OGC work on Observations and Measurements and Sensor Ontologies (See https://www.w3.org/TR/vocab-ssn/ cite:[Haller:17:SSN]) is among the most advanced with regard to support for linked data approaches as pursued by ELFIE. While attempting to use the Semantic Sensor Network ontology, the concept of a monitoring network was found to be missing or not handled directly. Ad-hoc collections of monitoring features are a common need. Either as a collection that is meant to characterize a specific regional environmental feature, such as a watershed, or as a collection of monitoring features with common ownership, data offerings, or quality. In any case, further analysis of the nature of collections of monitoring features for use within and among environmental monitoring domains is warranted.
A major issue (which was known and was out of scope for ELFIE) was the lack of published feature identifiers and a linked-data baseline to build on. It is clear that, to grow the linked-data baseline, features and data linked to them need to be published using existing basic practices as a starting point. This starting point could be as trivial as a URL that returns a JSON-LD document with ID, Type, Name, and representative point location. The addition of a URI broker, content negotiation to provide HTML and RDF representations, and additional linked content would be possible given current techniques. However, it should not be seen as a necessary prerequisite to publishing linked data. Without a rudimentary starting point, the community has nothing to build on and could continue to debate the nuances of eventual solutions for ever.
Similar to publication of features and other linked data, URLs that define feature types and relations are not available in many cases. Without these referenceable, resolvable definitions, the linked data practices described here are very difficult to use across multiple systems that need to use common types and associations. Further, even when URIs have been established, the network behavior and content received when dereferencing them is not consistent across the community. For example, https://schema.org/GeoCoordinates returns an html page by default and implements both HTTP "accept" headers (e.g. "Accept: application/ld+json") and URL suffixes (e.g. https://schema.org/GeoCoordinates.jsonld) to request specific content types and http://www.w3.org/2004/02/skos/core#related defaults to an html page and implements accept headers but does not support the same content types as schema.org and does not support URL suffixes. Other linked data URIs, those from GeoSPARQL for example (http://www.opengeospatial.org/standards/geosparql/asWKT), do not resolve anything. If the community is going to start building cross-organization solutions based on linked data, these baseline feature types and associations need to be available and well described in a common way. It should be no surprise that schema.org’s linked data is used very broadly given that they provide a rich and approachable suite of content for their types and associations.
Since the ELFIE project has completed the OGC Naming Authority has embarked upon provision of a consistent, extensive and extensible Linked Data compatible “Definitions Server” to address some of these concerns. Publication of HY_Features has been prioritized and is available at https://www.opengis.net/def/appschema/hy_features/hyf Further experimental work is required to identify the range of different representations that can be used, such as automatic provision of JSON-LD context resources.
Non-information resource identifiers, URIs that are an identifier for a thing that may or may not have information representations, are useful to provide persistent and reusable identifiers for entities such as real-world features. These non-information resource identifiers provide a means for global identification across a distributed system such that multiple members of the system can contain information about and/or in reference to shared entities. There is a basic conundrum with non-information resource identifiers in that there can only be one default response (for a given encoding) when dereferencing a URI and only one member of a distributed system can be the source of that default response. Further, in order to achieve linked system interoperability, dereferencing behavior of non-information resource URIs needs to be in accord with the expectations of a member node of a system. The httpRange-14 and the Cool URIs Technical Report cite:[Sauermann:08:CUS] provide a significant basis for solving this conundrum and W3C working groups continue to focus on it, but many details remain to be agreed upon for use cases such as are being pursued by the ELFIE.
While details of the default dereferencing behavior of non-information resource identifiers and subsequent redirects to information resources was discussed in the ELFIE, the IE has attempted to focus on encoding of linked data information resources rather than the network behavior for discovery and retrieval of non-information resources. Future work and best practice development will be needed to address these issues:
-
HTTP vs HTTPS
-
embedded hints and headers to canonical identifiers vs information resources
-
permanence and bookmarkability of information resource URLs
-
detecting information resource URLs and injecting canonical URIs into systems, so that other resources can be found
-
declaration of sameAs relationships
-
where sameAs relationships are mandatory and optional - where can an agent find these?
-
stricter definition of version vs schema vs content model vs profile
-
use of expected W3C formalisms for describing profiles
-
optimal registration processes and registry provision for identifying, cataloguing, discovering and serving profiles/contexts
-
use of expected W3C/IETF mechanisms for profile negotiation in URI dereferencing
-
Adding additional dimensions of representation choice when dereferencing for geometry type, resolution, CRS etc
As demonstrated in the Watershed Data Index use case JSON-LD document, how to encode a preview geometry is not clear. The only widely used preview is a point geometry defined by schema.org GeoCoordinates. The schema.org polygon and line features have not been implemented widely and, like GeoSPARQL WKT geometries, require interpretation above and beyond what would be required for GeoJSON. Linking to a GeoJSON file works, but is not common in practice and requires additional web-requests to retrieve content, which is undesirable for a basic preview geometry. With the advent of javascript libraries that can handle WKT, it seems likely that a GeoSPARQL WKT geometry could be used effectively, but further experimentation may be needed to confirm this assumption.