shic is a collection of shims for (automated) workflow composition and general utility, each too small or too trivial to warrant its own repository or entry in a software registry.
What is a "shim"?
In our interpretation (not to be confused with the one in Wikipedia), a shim is a short code fragment that fills the gaps when creating a data processing workflow from separate software tools. Here a shim can be a few lines of code that extract the necessary information from a more complex file, convert between similar but not identical file types, or add redundant information to satisfy input requirements. In other words, shims are small pieces of glue code that gently massage, or shim, data produced by one tool into the form expected by another. Semantically, most shims are format converters, although their inputs and outputs may be different dialects or interpretations of the same format rather than different formats. The shims generally depend on the data type and application domain.
For technical reasons, including automatic annotation in bio.tools and simplifying the master shim interface to the shim collection, we only include shims that take a single file as input and produce a single file as output.
The shims are contained in the shims folder as simple executable scripts in bash, Python or other scripting languages.
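To make the one-file-in, one-file-out contract concrete, here is a minimal sketch of what a shim could look like in Python. It is a hypothetical example, not a shim taken from the collection: the script name tsv_to_csv.py, the function name and the chosen conversion (tab-separated to comma-separated) are all assumptions made for illustration.

```python
#!/usr/bin/env python3
"""Hypothetical shim: convert a tab-separated file to a comma-separated file."""
import csv
import sys


def tsv_to_csv(input_path: str, output_path: str) -> None:
    # Read the single input file row by row and write the single output file.
    with open(input_path, newline="") as src, open(output_path, "w", newline="") as dst:
        reader = csv.reader(src, delimiter="\t")
        writer = csv.writer(dst)
        writer.writerows(reader)


# Run as a command-line shim only when given exactly INPUT and OUTPUT.
if __name__ == "__main__" and len(sys.argv) == 3:
    tsv_to_csv(sys.argv[1], sys.argv[2])
```

Under these assumptions the script would be called as `./tsv_to_csv.py input.tsv output.csv`, which matches the single-input, single-output shape described above.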
For a coarse but standardized description of the functionality, we rely on the EDAM ontology. The shims.md table contains minimum metadata for all shims, including EDAM data type and format for the shim input and output, the operation (typically Conversion), domain and intended use.
shic is developed to support the Workflomics project on automated workflow exploration and benchmarking.
shims.md is automatically parsed to generate the shic bio.tools entry. Changing its format (non-table content, columns, column names, EDAM references, etc.) may require corresponding changes to the parser.
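Why format changes matter to the parser can be seen from a minimal sketch of table parsing. This is not the parser used by the project, which may work differently. The function name and the column names in the sample (ID, Name, Input format, Output format) are hypothetical placeholders, not the actual shims.md columns.

```python
def parse_markdown_table(text: str) -> list[dict[str, str]]:
    """Parse the first pipe-delimited markdown table in text into row dicts."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip non-table content such as headings and blank lines
        inner = line[1:-1] if line.endswith("|") else line[1:]
        rows.append([cell.strip() for cell in inner.split("|")])
    if len(rows) < 2:
        return []
    header, body = rows[0], rows[2:]  # rows[1] is the |---|---| separator line
    return [dict(zip(header, cells)) for cells in body]


# Hypothetical sample; the real table has its own columns and EDAM references.
SAMPLE = """# Shims

| ID | Name | Input format | Output format |
|----|------|--------------|---------------|
| 1  | a2b  | format_A     | format_B      |
| 2  | old  |              |               |
"""

for row in parse_markdown_table(SAMPLE):
    print(row["ID"], row["Name"], repr(row["Input format"]))
```

Because each row is keyed by the header text, renaming a column or adding a second table to the file would change what such a parser returns, which is why edits to the table layout need to be mirrored in the parsing code.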
N.B. The Workflomics project enumerates the shims, so append new shims to the bottom of the table to keep the existing numbers unchanged. If a shim is obsolete or deprecated, do not remove its row; instead, remove its inputs and outputs so that APE does not explore workflows with that shim.
All shims can be executed by calling masterShim.sh with the ID (number) of the shim and the arguments (input and output filenames) to the shim.
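The masterShim.sh call described above can be sketched from Python as follows. The argument order (ID, then input file, then output file) follows the description in the text, but the script location ./masterShim.sh and the helper names are assumptions made for illustration; check the script itself for the exact interface.

```python
import subprocess


def master_shim_command(shim_id: int, input_path: str, output_path: str,
                        script: str = "./masterShim.sh") -> list[str]:
    # Assumed order: shim ID first, then the input and output filenames.
    return [script, str(shim_id), input_path, output_path]


def run_shim(shim_id: int, input_path: str, output_path: str) -> None:
    # check=True raises CalledProcessError if the shim exits with a non-zero status.
    subprocess.run(master_shim_command(shim_id, input_path, output_path), check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when filenames contain spaces.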
Tools such as awk, cut, grep and XPath used by these shims are available on common UseGalaxy servers, and can be used to implement many of these shims directly in Galaxy workflows.
Updating the bio.tools entry
After each push to the main branch, a new bio.tools annotation file is generated and made available as an Artifact in the GitHub action. Steps to update the bio.tools shic entry:
- Go to the GitHub action.
- Select the latest run, scroll to Artifacts (at the bottom) and download the file.
- Open the shic entry in bio.tools (you must be logged in as a maintainer of the tool in bio.tools to access the /edit screen).
- Navigate to the JSON tab and paste the content of the downloaded file into the text box.
- Click Validate to validate the JSON structure.
- Click Save to update the entry.
The script generate_biotools_json.py compiles the complete bio.tools JSON entry from the metadata in the shims.md table. The content of assets/bio.tools_entry.json must then be uploaded manually to the shic entry in bio.tools, using "Update this record" -> "JSON" or the bio.tools API (see steps 4-6 in the Semi-automatic update section).
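Before pasting the generated file into bio.tools, it can be useful to confirm locally that it is well-formed JSON. The sketch below only checks syntax; it does not replace the Validate step in bio.tools, which checks the entry against the registry's own rules. The helper name load_entry is hypothetical.

```python
import json


def load_entry(path: str):
    # json.load raises json.JSONDecodeError if the file is not valid JSON.
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

For example, `load_entry("assets/bio.tools_entry.json")` would return the parsed entry, or raise an error pointing at the offending line and column.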