This document is currently under development based on the Working Draft of a convention for APIs built on top of IATI data.
The International Aid Transparency Initiative requires that organisations publish XML files (a) about their organisations; and (b) about aid activities that they are involved in. The segmentation policy of IATI means that donors should publish one IATI Activity file per region or country they work in, and that an activity should only ever be included in one file. This means that users cannot be sure they have found all the activities for a given country, or a given sector, without searching through all available IATI Activity files. It also means that there are considerable technical barriers for anyone wishing to use IATI data, requiring them to download, parse and manipulate XML data.
Many uses of IATI data therefore involve placing the data into some form of data store or database, and their retrieving it from there, without primary regard to the separate files it was distributed in. Many such stores of IATI data will provide API access to their contents, and to analytic functions over the data. If each available service doing so takes a different approach to providing access to that data, and serialising the results of queries, then users still face a high barrier when encountering any new platform, and tools built on top of these platforms will need to be customised for each data store they use. A shared convention for building APIs will increase the interoperability of tools for working with IATI data, and will lower the barriers for anyone seeking to use the data.
An API convention can also help developers to address some of the complexity of working with IATI data by recording common patterns and logic for dealing with difficult issues such as versioning, double-counting, JSON and CSV serialisation of XML data, and so-on.
As of February 2013 this convention has been widely discussed in the IATI technical community. It is not a formally agreed standard by the TAG. Discussion of the convention takes place on the IATI Technical mailing list.
The development of any API should be governed by the following principles:
- 1) RESTful design** - APIs should follow a RESTful pattern, using URLs for key services, and querystring parameters for any filters and settings to be applied to responses.
- 2) Fidelity to the IATI standard** - The IATI Schema provides clear definitions of a wide range of elements. Names used in any API should be based closely upon the names in the IATI Standard schemas. The schemas and code lists provide the authoritative definition of terms. The use of nested elements in the IATI XML schema is semantically important. The rendering of element names and structures in different serialisations (e.g. CSV, JSON) should use predictable rule-based transformations of the element names and structures from the schema.
- 3) Query by codes, rather than names** - IATI has been designed (a) to support multilingual data; (b) to enable users to review individual XML files without reference to code list look-up tables. This means there are places where both a code or reference, and a text description can be given against an element. The text descriptions of code list entries can vary across language, but the codes will always stay the same. For this reason, the default behaviour should be to use codes in queries rather than the text of elements.
- 4) Accountability and provenance** - Most APIs will perform some transformation on the raw IATI XML data they consume. It is important for users to be able to discover (a) what IATI data any public API covers; (b) what processes have been applied to data to clean, transform or augment it.
<note>This document uses the terms MUST, MUST NOT, SHOULD, SHOULD NOT, and MAY <a href="http://www.ietf.org/rfc/rfc2119.txt">according to RFC2119</a></note>
<note>The convention is currently in a beta state. Not all issues will be fully addressed, not will it account for all situations. Consult with the IATI Technical mailing list if you encounter issues not handled by the convention.</note>
Any API may provide one or more endpoints for accessing entities extracted from IATI XML files. The endpoint determines the kind of response a client can expect. For example, /activity/ endpoints return full 'iati-activity' records. /transaction/ endpoints return the 'transaction' blocks from within 'iati-activity' records.
Three basic endpoint families are suggested:
* **access** for fetching individual records or collections of records * **aggregate** for fetching aggregated data * **about** for describing the data held in the API
A fourth **provenance** endpoint has been suggested, but no clear specification or use-case yet agreed.
Access endpoints can be provided for activities, transactions, budgets, planned-disbursements, results and other common elements in IATI XML files. In general, these endpoints will return collections, filtered by query parameters, but may also provide access to individual records.
* access/ - for all queries that return lists, collections or individual records * activity/ - for returning a list of activities. * transaction/ - for returning a list of transactions. * organisation/ - for returning a list of organisations based on organisational files
A number of entities occur across activities, and have no single unique representation within IATI Activity or Organisation files. This includes sectors, countries, organisations and classifications applied to activities. Endpoints may be provided which return a general description of these entities based upon code lists and/or upon information derived from the available data. Examples include:
* participating-org/ - providing a list of know participating organisations * (To get just funding organisations use /access/participating-orgs/?role=Funding) * participating-org/{@ref} - information on a specific participating organisation * sector/{@vocabulary} - providing a list of sectors. {@vocabulary} is optional. * sector/{@vocabulary}/{@code} - providing details of a specific sector * recipient-country/{@code} - providing details of a specific country * recipient-region/{@code} - providing details of a specific region
=== Aggregate Endpoint** The aggregate endpoint should be used for all aggregations across activities or organisations.
<note>Note: Some APIs MAY also choose to provide per-activity aggregation: for example, to add up the total value of transactions by type for an activity, and to add this to the data returned against that activity. This type of activity-level aggregation is exposed under the access/ endpoint. ) </note>
Under aggregate/ the following common endpoints may be provided:
* aggregate/ * activity/ - for total numbers of activities in an area * transaction/ - for aggregating transactions * budget/ - for aggregating budget values * planned-disbursement/ - for aggregating planned disbursements
The IATI Standard also include a standard for Organisation files. These is not widely used at present. If planning an API that will return data from organisation files please consult with the IATI Technical mailing list to agree an approach.
* organisation/
provenance/ - this endpoint could be used for machine readable provenance data. Not detailed use case or specification is worked out yet.
about/ - this endpoint should return a human and machine readable description of the API, including information about the creator, the features the API implements, and any special considerations for users of the data. No detailed specification for this is worked out yet.
An endpoint without version numbers should point to the most recent version of the endpoint. Version numbers may be prefixed to address a particular version of an API. For example, if the current version is 1.2 then:
/1.1/access/ talks to version 1.1 /1.2/access/ talks to version 1.2 /access/ talks to version 1.2
Please detail in /about/ and in API responses which version of the IATI standard you are using in rendering output.
<note>The naming conventions have been designed to balance human readability with straightforward rule-based conversion between parameter or field names, and xpath expressions - in order to reflect the structure of IATI XML as closely as possible</note>
The default naming pattern for parameters should be:
{parent-element-name}_{element-name}.{attribute-name}
Where the parent is iati-activity or iati-organisation, then {parent-element-name}_ may be omitted.
This naming convention should be used:
* When accepting parameter for queries; * When outputting data in CSV format or another flat format (e.g. flattened JSON)
When .{attribute-name} is omitted, the API should use a default behaviour based on the following rules:
* For elements where the schema allows @code or @ref values, then search on the value of @code or @ref * For elements without @code or @ref specified, then search on the value of the element
As parameters
* ?sector=60061 searches on //sector[@vocabulary=’DAC’]/@code * ?sector.text =Debt searches on //sector[@vocabulary=’DAC’]/text() * ?participating-org=GB-1-123 searches on //participating-org/@ref * ?participating-org.text=Oxfam searches on //participating-org/text() * ?iati-identifier=GB-1-123 searches on //iati-identifier/text() * ?location_name=Oxford searches on //location/name/text() * ?transaction_value=1000 searches on //transaction/value/text()
Where a default behaviour is provided it MUST be possible to also use the canonical parameter name. For example, ?sector.code and ?sector should both be valid to search //sector[@vocabulary=’DAC’]/@code
It SHOULD be possible to search on the element value with .text.
In some cases a flattened output may split out fields based on attribute values as well as element names. For example, a CSV file may include separate columns for start date and end date rather than just one concatenated field for activity-date. In these cases:
* Where the schema permits a single code-list controlled attribute, the value of this can be placed in parentheses after the field name * e.g. 'activity-date (start-actual)' * Where the schema permits multiple code-list controlled attributes, the relevant value can be placed in paratheses after the field name, preceeded by the attribute name and colon. * e.g. 'sector (vocabulary:DAC)'