-
Notifications
You must be signed in to change notification settings - Fork 1
/
DataLinkImp.tex
285 lines (234 loc) · 15 KB
/
DataLinkImp.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
\documentclass[11pt,a4paper]{ivoa}
\input tthdefs
\title{IVOA DataLink Implementation note}
% see ivoatexDoc for what group names to use here
\ivoagroup{DAL}
\author[http://www.ivoa.net/twiki/bin/view/IVOA/FrancoisBonnarel]
{Fran\c{c}ois Bonnarel}
\author[http://www.ivoa.net/twiki/bin/view/IVOA/MarkusDemleitner]
{Markus Demleitner}
\author[http://www.ivoa.net/twiki/bin/view/IVOA/PatrickDowler]
{Patrick Dowler}
\author[http://www.ivoa.net/twiki/bin/view/IVOA/LaurentMichel]
{Laurent Michel}
\author[http://www.ivoa.net/twiki/bin/view/IVOA/MarkTaylor]
{Mark Taylor}
\editor[http://www.ivoa.net/twiki/bin/view/IVOA/PatrickDowler]
{Fran\c{c}ois Bonnarel}
% \previousversion[????URL????]{????Concise Document Label????}
\previousversion{This is the first public release}
\newcommand{\blinks}{\{links\}}
\newcommand{\attval}[2]{#1={\allowbreak}{"}#2{"}}
\begin{document}
\begin{abstract}
This note gives some hints on how the DataLink specification should be used. The DataLink
specification document \citep{2023ivoa.spec.0617D} describes the linking of data discovery
metadata to access to the data itself, further detailed metadata, related resources, and
to services that perform operations on the data.
\end{abstract}
\section*{Acknowledgments}
The authors would like to thank all the participants in DAL-WG discussions
for their ideas, critical reviews, and contributions to this document.
\section*{Conformance-related definitions}
The words ``MUST'', ``SHALL'', ``SHOULD'', ``MAY'', ``RECOMMENDED'', and
``OPTIONAL'' (in upper or lower case) used in this document are to be
interpreted as described in IETF standard RFC2119 \citep{std:RFC2119}.
The \emph{Virtual Observatory (VO)} is a
general term for a collection of federated resources that can be used
to conduct astronomical research, education, and outreach.
The \href{http://www.ivoa.net}{International
Virtual Observatory Alliance (IVOA)} is a global
collaboration of separately funded projects to develop standards and
infrastructure that enable VO applications.
\section{Introduction}
The DataLink specification \citep{2023ivoa.spec.0617D} defines mechanisms for connecting
data items discovered via one service to related data products, and web services
that can act upon the data.
The \blinks\ web service capability is a web service capability for drilling
down from a discovered data item such as a dataset identifier, a source in a
catalog or any other data item. In the first case (typically an IVOA publisher
dataset identifier) it allows to find details about the data files that can be
downloaded, alternate representations of the data that are available, and
services that can act upon the data (usually without having to download
the entire dataset). In the latter cases it allows to associate metadata, images,
cubes, spectra, timeseries or ancillary data to a source in the sky or other
kind of item. The expected usage is for DAL (Data Access Layer)
data discovery services (e.g.\ a TAP service \citep{2010ivoa.spec.0327D}
with the ObsCore \citep{2017ivoa.spec.0509L} data
model or one of the simple DAL services) to provide an identifier that
can be used to query the associated DataLink capability. The DataLink
capability will respond with a list of links that can be used to access
the data.
The current document is first an implementation note for \blinks\ endpoint recognition
in various contexts (section \ref{sec:reco}). We are reviewing three methods by which
DataLink documents can be associated to resource entries in service responses and suggest
possible choices according to the context.
In addition it documents how to fill the \blinks\ enpoint response FIELDS (section
\ref{sec:fill}).
\section{DataLink \blinks\ endpoint recognition mechanisms}
\label{sec:reco}
\subsection{DataLink and SIAP-2.0 or ObsTAP services}
SIAP-2.0 \citep{2015ivoa.spec.1223D} services are queriable by setting constraints on
the four "axes" of the data (2D-space, spectral, time and polarization) and their properties.
Other archive metadata or data details are also queriable (target, collection, facilities,
etc...). ObsTAP services are TAP services delivering the ivoa.Obscore table queriable
via ADQL. The successfull response is a VOTable containing a mandatory result TABLE and
optional service descriptor resource(s).
\subsubsection{DataLink discovery via format and reference columns}
The result table describes the dataset characterization on the different axes and give
additional operative features. The various columns are mapped from the ObsCore-1.1
data model \citep{2017ivoa.spec.0509L}. The \verb|accessReference| and \verb|AccessFormat|
fields give different solutions to reach the dataset for download or access. This can be
done in two ways:
\begin{itemize}
\item The data discovery step has yielded a result with the \verb|AccessFormat| value
(media type) :
\verb|application/x-votable+xml;content=datalink|
To the client, this indicates that what is given in the access reference (e.g. the
\verb|access_url| column in ObsTAP or SIAP-2.0) is a DataLink document (see below for
DataLink functionalities).
\item In case the media type is different, the access reference is a direct link to the
dataset download.
\end{itemize}
\subsubsection{SIAP-2.0, ObsTAP and service descriptors}
To enable additional DataLink functionalities, SIAP-2.0 or ObsTAP services can then add
a service descriptor in the query response that indicates the availability of a DataLink
service accompanying the DAL discovery service. It references one (or more) field(s) from
the DAL response.
This is explained in detail in DataLink specification, section 4.4. The net result is that
DataLink-enabled clients can find ancillary data and use associated services for data access
or processing by virtue of being able to retrieve DataLink documents.
The service descriptor embedded in an ObsTAP query response is allowed because ObsTAP
is a peculiar TAP service and TAP allows the TAP response to contain additional resources
in addition to the "results" one (See section 2.9 of TAP-1.0 specification)
\footnote{Apparently TAP-1.1 doesn't say explicitely that additional resources to the
"results" one are allowed in the VOTable response}. Hence the inclusion of a service
descriptor service is valid.
\subsection{DataLink in the context of other dataset discovery methods}
Datasets can also be discovered in various other ways. Dataset "discovery" is defined as
the discovery of the IVOA \verb|publisher_did| of a dataset in standard services or by the
ad hoc discovery of proprietary "dataset-ids" in combination with the a priori knowledge of
a DataLink service "aware" of these dataset-ids.
Beside the most standard and modern discovery path via ObsTAP/SIAP-2.0, the following
alternative scenarii are possible:
\begin{itemize}
\item The discovery can be done via an SIAP-1.0 or an SSA service. A DataLink service
descriptor should have been added to the service query response to allow further guiding
to additional resources
\item The discovery can be done via a paper in a journal implementing publication of
datasets associated with articles. The editor should describe or implement a DataLink
service (eg via a service descriptor valid for the journal) for further usage and access
to download DataAccess metadata resources.
\item Logs of observations are dataset descriptions belonging to the catalog category.
They can be exposed in the VO using ConeSearch or TAP. They are in general not consistent
with the ObsCore data model. However they allow some kind of "dataset discovery". A Service
descriptor can be added to the VOTable containing the log file.
\item HiPS is an IVOA standard for a global and hierarchical allsky access to pixel and
catalogue data (reference HIPS). HiPS cells contain ids for original dataset which have
been processed to build the Healpix maps. HiPS metadata file allows to generate a specific
VOTable record for each cell of the HiPS collection including a "progenitor" URL. This can
be a new way to discover datasets and start a DataLink session to find out additional
associated resources.
\end{itemize}
\subsection{DataLink outside Data discovery context}
DataLink 1.1 provides description of resources to be attached to a dataset or to whatever
item in a response table of a VO service. It is possible to use DataLink mechanisms in other
VO contexts than Dataset discovery.
For example it could be useful to attach to a source in a catalog table such resources as
\begin{itemize}
\item data records for the same object contained in another catalog/database
\item a service interface to an external database prepared to query around the object
position.
\item Datasets associated to the source (such as images, spectra, etc..)
\end{itemize}
In addition we could attach links to each measurement in a measurement list for an
individual source formatted in VOTable. Light curves or radial velocity TimeSeries are
good examples of such measurement lists. These links can attach:
\begin{itemize}
\item original dataset from which the measurement is extracted
\item metadata describing the extraction method
\item calibration information
\end{itemize}
We could also attach links to entities in an IVOA provenance metadata service response.
\subsection{Various recognition solutions for use cases introduced in section 3 and 4}
In such contexts, the \blinks\ resource URL attached to a record may be given in various
ways:
\begin{itemize}
\item A DataLink service descriptor (see DataLink 1.1 specification, section 4) could be added
to the main table as long as the content of one of the table FIELDs may be used as the ID
parameter in the \blinks\ endpoint. An exemple is given in subsection 4.5 of the specification.
\item In case the \blinks\ endpoint URL is given in the main table for each row, a LINK
element SHOULD be added to the FIELD containing this URL. The content-type of the LINK will
be set to :
\begin{verbatim}
application/x-votable+xml;content=datalink
\end{verbatim}
In the example below the content of the "Datalink" FIELD is used as an URL to the
\blinks\ endpoint for each row.
\begin{verbatim}
<FIELD name="DataLink" datatype="char" arraysize="*" ucd="meta.ref.url" >
<LINK content-type="application/x-votable+xml;content=datalink" />
</FIELD>
\end{verbatim}
\item In case the main table contains an URL not systematically pointing to \blinks\
endpoint, but may also point to responses with other content types
(e.g.\ for single file download), the ObsCore utype "Access.reference" SHOULD be
added to the FIELD and an additional FIELD with ObsCore utype "Access.format" SHOULD
contain the "application/x-votable+xml;content=datalink" value when appropriate.
A ref to the ID of the FIELD containing the URL SHOULD be added to avoid ambiguities.
In the example below, the Image-Link contains the URL pointing to a \blinks\ response or
the dataset istself depending on the value of the Image-Format FIELD.
\begin{verbatim}
<FIELD ID="ILink" name="Image-Link" datatype="char" arraysize="*"
ucd="meta.ref.url" utype="AccessReference" />
<FIELD name="Image-Format" datatype="char" arraysize="*"
ucd="meta.code.mime" utype="AccessFormat" ref="ILink" />
\end{verbatim}
\end{itemize}
\section{Datalink \blinks\ endpoint response FIELDS usage}
\label{sec:fill}
This section lists the standard FIELDS in the response and
trate how to use them when this seems required
\begin{itemize}
\item \textbf{ID :} The ID can be a \verb|publisher_did| or any proprietary item id common to
the main service and the Datalink service.
\item \textbf{access\_url :} The usage of this FIELD is straightforward from the
specification. Just pay attention to the URL with fragments case. By using this feature
several time for the same root URL in your \blinks\ endpoint response you may distinguish
different semantics, content\_qualifier, and local\_semantics inside the same linked target.
For example this could allow to distinguish science data from calibration or auxiliary
subsets in the same file
\item \textbf{service\_def :} This is intended to refer to a service described by the service
descriptor embedded in the same VOTable document. A specific IVOA implementation note on
service descriptors should be written in the future
\item \textbf{error\_message :} There is nothing to add to the specification text here
\item \textbf{description :} This free text should be specific to each link for each specific
ID value and is intended to be read by a user when looking at the response table of the
\blinks\ endpoint. It is not really useful for the user that data providers fill this field
with a generic text valid for all links sharing the same semantics value
\item \textbf{semantics :} This field is intended mainly to give information to the client
software in order to help it to take some decision. However it is also an important information
for the end user. It is qualifying the relationship between the main item identified by the ID
value and the thing which is retrievable via the acces\_url. For example \#calibration or
\#auxiliary are not intrinsec properties of this thing but relatively to the main item
(probably a science dataset). The intrinsec thing type or quality is given by
content\_qualifier (in the case if a \#auxiliary relationship the thing pointed by the
access\_url can be a table, -an image, a cube...). Although
\item \textbf{content\_type :} The usage of this Field is straightforward. Again it is intended
to help the client software to manage the response of the access\_url
\item \textbf{content\_length :} This field is useful to inform the user or the client
software how long will the transfer last and how much memory is required to host it
\item \textbf{content\_qualifier :} The usage of this field is rather straightforward. At the
time of writing of this note it is recommended to use the ObsCore vocabulary for dataproduct
type instead of the ivoa external vocabulary which is not stabilized. Other standard vocabularies
could be used in the future according to the quality of the link response, for example it could
be the identifier of some facility consistent with the emergent facility specification
\item \textbf{local\_semantics :} local\_semantics is mainly intended to help the software to sort out the links and take decisions about what to do with them when semantics, content\_type and content\_qualifier do not allow to do it. This will contain a proprietary vocabulary useful for the end user but is not some totally free text like the one in description.
\item \textbf{link\_auth :} This field contains an information which relates to the status of the access\_url and is not dependant of the user
\item \textbf{link\_authorized :} This field will be filled by the server after the user authentification process. It is dependant both from the link and the user
\end{itemize}
\section{Changes}
2024-01-18: upgrade to citations of most recent VO specifications and adding a section on usage of response FIELDS\\
2023-12608: This was the initial version of this document, porting on github a previous internal DAL note.
\bibliography{ivoatex/ivoabib,ivoatex/docrepo,localrefs.bib}
\end{document}