Data Package Pipelines processors for CKAN.
# clone the repo and install it with pip
git clone https://github.com/frictionlessdata/datapackage-pipelines-ckan.git
cd datapackage-pipelines-ckan
pip install -e .
datapackage-pipelines-ckan contains several pipeline processors for working with CKAN.
A processor to retrieve metadata about a CKAN resource from a CKAN instance and add it as a datapackage resource.
run: ckan.add_ckan_resource
parameters:
  ckan-host: http://demo.ckan.org
  resource-id: d51c9bd4-8256-4289-bdd7-962f8572efb0
  ckan-api-key: env:CKAN_API_KEY # an env var defining a ckan user api key
ckan-host
: The base url (and scheme) for the CKAN instance (e.g. http://demo.ckan.org).
resource-id
: The id of the CKAN resource.
ckan-api-key
: Either a CKAN user api key or, if in the format env:CKAN_API_KEY_NAME, an env var that defines an api key. Optional, but necessary for private datasets.
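The env: convention and the metadata lookup described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the processor's actual code: the helper names resolve_api_key and build_resource_show_request are hypothetical, and the request is assumed to target CKAN's resource_show action under the standard /api/3/action/ path, with the key sent in the Authorization header.

```python
import os
import urllib.parse
import urllib.request


def resolve_api_key(value, environ=os.environ):
    """Return the API key, reading it from the environment for env:NAME values.

    Hypothetical helper: returns None when no key is configured or when the
    named environment variable is not set.
    """
    if value is None:
        return None  # the key is optional for public datasets
    if value.startswith("env:"):
        var_name = value[len("env:"):]
        return environ.get(var_name)
    return value


def build_resource_show_request(ckan_host, resource_id, api_key=None):
    """Build (but do not send) a request for CKAN's resource_show action."""
    query = urllib.parse.urlencode({"id": resource_id})
    url = f"{ckan_host.rstrip('/')}/api/3/action/resource_show?{query}"
    request = urllib.request.Request(url)
    if api_key:
        # CKAN reads the user's API key from the Authorization header.
        request.add_header("Authorization", api_key)
    return request
```

Passing the returned request to urllib.request.urlopen would perform the lookup; keeping construction separate from sending makes the sketch easy to inspect without a live CKAN instance.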
A processor to save a datapackage and resources to a specified CKAN instance.
run: ckan.dump.to_ckan
parameters:
  ckan-host: http://demo.ckan.org
  ckan-api-key: env:CKAN_API_KEY
  overwrite_existing: true
  push_resources_to_datastore: true
  dataset-properties:
    name: test-dataset-010203
    state: draft
    private: true
    owner_org: my-test-org
ckan-host
: The base url (and scheme) for the CKAN instance (e.g. http://demo.ckan.org).
ckan-api-key
: Either a CKAN user api key or, if in the format env:CKAN_API_KEY_NAME, an env var that defines an api key.
overwrite_existing
: If true and the CKAN dataset already exists, it will be overwritten by the datapackage. Optional, and default is false.
push_resources_to_datastore
: If true, newly created resources will be pushed to the CKAN DataStore. Optional, and default is false.
push_resources_to_datastore_method
: Value is a string, one of 'upsert', 'insert' or 'update'. This will be the method used to add data to the DataStore (see https://ckan.readthedocs.io/en/latest/maintaining/datastore.html#ckanext.datastore.logic.action.datastore_upsert). Optional, the default is 'insert'.
dataset-properties
: An optional object, the properties of which will be used to set properties of the CKAN dataset.
The processor first creates a CKAN dataset from the datapackage specification, using the CKAN API action package_create. If the dataset already exists and the overwrite_existing parameter is true, the processor will attempt to update the CKAN dataset using package_update. All existing resources and dataset properties will be overwritten.
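The create-then-update flow in the paragraph above can be sketched as follows. This is a simplified illustration, not the processor's implementation: call_action is a hypothetical callable that posts a payload to a named CKAN action and returns the decoded JSON response, and the sketch treats any failed package_create as "dataset already exists", which the real processor may check more precisely.

```python
def create_or_update_dataset(call_action, dataset, overwrite_existing=False):
    """Create a CKAN dataset, falling back to package_update when allowed.

    call_action(name, payload) is a hypothetical helper returning the decoded
    CKAN response, a dict with a "success" flag and "result" or "error".
    """
    response = call_action("package_create", dataset)
    if response.get("success"):
        return response["result"]

    if not overwrite_existing:
        raise RuntimeError(f"package_create failed: {response.get('error')}")

    # Assumption: the create failed because the dataset already exists.
    # package_update replaces the existing properties and resources.
    response = call_action("package_update", dataset)
    if not response.get("success"):
        raise RuntimeError(f"package_update failed: {response.get('error')}")
    return response["result"]
```

Injecting call_action keeps the decision logic separate from HTTP details, so the same function can be exercised against a stub or a real client.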
If the CKAN dataset was successfully created or updated, a CKAN resource will be created for each resource in the datapackage, using resource_create. If datapackage resources are marked for streaming (they have the dpp:streamed=True property), resource files will be uploaded to the CKAN filestore. For example, remote resources may be marked for streaming by including the stream_remote_resources processor earlier in the pipeline.
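As a small illustration of the streaming check, the sketch below splits a datapackage descriptor's resources by the dpp:streamed property. The function name split_by_streaming is hypothetical, and the descriptor is assumed to be a plain dict with a "resources" list.

```python
def split_by_streaming(datapackage):
    """Return (streamed, not_streamed) lists of resource descriptors."""
    streamed, not_streamed = [], []
    for resource in datapackage.get("resources", []):
        # Resources marked dpp:streamed=True would have their files uploaded.
        if resource.get("dpp:streamed") is True:
            streamed.append(resource)
        else:
            not_streamed.append(resource)
    return streamed, not_streamed
```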
Additionally, if push_resources_to_datastore is true, the processor will push resources marked for streaming to the CKAN DataStore using datastore_create and datastore_upsert.
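To make the DataStore step more concrete, here is a hedged sketch of the payloads that datastore_create and datastore_upsert accept. The function build_datastore_payloads is hypothetical and only shows the request shape (resource_id, fields, records, method); it does not reflect how the processor batches rows or maps field types. Note that, per the CKAN documentation, the 'upsert' and 'update' methods require a unique key on the DataStore table.

```python
VALID_METHODS = ("upsert", "insert", "update")


def build_datastore_payloads(resource_id, field_names, rows, method="insert"):
    """Return (create_payload, upsert_payload) for the CKAN DataStore API.

    Hypothetical helper: rows is an iterable of dicts keyed by field name.
    """
    if method not in VALID_METHODS:
        raise ValueError(f"method must be one of {VALID_METHODS}, got {method!r}")

    # datastore_create defines the table; field types are left for CKAN to infer.
    create_payload = {
        "resource_id": resource_id,
        "fields": [{"id": name} for name in field_names],
    }
    # datastore_upsert adds the records using the chosen method.
    upsert_payload = {
        "resource_id": resource_id,
        "records": list(rows),
        "method": method,
    }
    return create_payload, upsert_payload
```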