diff --git a/episodes/02-fair-research-software.md b/episodes/02-fair-research-software.md index 88d34296..2b1d33a9 100644 --- a/episodes/02-fair-research-software.md +++ b/episodes/02-fair-research-software.md @@ -32,60 +32,50 @@ work or attitude may be different afterwards. :::::::::::::::::::::::::::::::::::::::::::::::: - -## FAIR software +## What is FAIR? FAIR stands for Findable, Accessible, Interoperable, and Reusable and comprises a set of principles designed to increase the visibility and usefulness of your research to others. The FAIR data principles, first published [in 2016][fair-data-principles], are widely known and applied today. Similar [FAIR principles for software][fair-principles-research-software] have now been defined too. In general, they mean: -* **Findable** - software and its associated metadata must be easy to discover by humans and machines. -* **Accessible** - in order to reuse software, the software and its metadata must be retrievable by standard protocols, free and legally usable. -* **Interoperable** - when interacting with other software it must be done by exchanging data and/or metadata through +- **Findable** - software and its associated metadata must be easy to discover by humans and machines. +- **Accessible** - in order to reuse software, the software and its metadata must be retrievable by standard protocols, free and legally usable. +- **Interoperable** - when interacting with other software it must be done by exchanging data and/or metadata through standardised protocols and application programming interfaces (APIs). -* **Reusable** - software should be usable (can be executed) and reusable +- **Reusable** - software should be usable (can be executed) and reusable (can be understood, modified, built upon, or incorporated into other software). Each of the above principles can be achieved by a number of practices listed below. 
This is not an exact science, and by all means the list below is not exhaustive, but any of the practices that you employ in your research software workflow will bring you -closer to the gold standard of a fully reproducible research. +closer to the gold standard of fully reproducible research. ### Findable -In order to make your software more findable, you should: - - Create a description of your software to make it discoverable by search engines and other search tools +- Use standards (such as [CodeMeta][codemeta]) to describe interoperable metadata for your software (see [Research Software Metadata Guidelines][rsmd-g1]) - Place your software in a public software repository (and ideally register it in a [general-purpose or domain-specific software registry][software-registries]) - Use a unique and persistent identifier (DOI) for your software (e.g. by depositing your code on [Zenodo][zenodo]), which is also useful for citations - note that depositing your data/code on GitHub and similar software repositories may not be enough as they may change their open access model or disappear completely in the future, so archiving your code means it stands a better chance at being preserved ### Accessible - -In order to make your software more accessible, you should: - -* Make sure people can freely, legally and easily get a copy your software -* Use code style conventions and code structure patterns, use comments and create documentation to make your code -comprehensible by people (once they get a copy of it) - i.e. make your code accessible in the *intelligible* sense +- Make sure people can obtain a copy of your software using standard communication protocols (e.g. HTTP, FTP, etc.)
+- The code and its description (metadata) have to be available even when the software is no longer actively developed (this includes earlier versions of the software) ### Interoperable -In order to make your software more interoperable, you should: - -- Explain the functionality of your software, so others can understand how other tools can interoperate with it -- Use standard formats for inputs and outputs -- Communicate with other software via standard protocols and APIs +- Explain the functionality of your software and protocols for interaction with it +- Use community-agreed standard formats for inputs and outputs of your software and its metadata (e.g. [CodeMeta][codemeta]) +- Communicate with other software and tools via standard protocols and APIs ### Reusable -In order to make your software more reusable, you should: - -- Document your software (including its functionality, and how to install and run it) to make it more understandable by others who may wish to reuse or extend it -- Follow best practices for software development (including code conventions, structure, readability and correctness) -- Test your software and make sure it works on different platforms/operating systems to make it more reusable +- Document your software (including its functionality and how to install and run it) to make it more understandable by + others who may wish to reuse or extend it +- Follow best practices for software development, e.g.
structure your code using common patterns and use coding + conventions to make your code readable and understandable by people +- Test your software and make sure it works on different platforms/operating systems - Give a licence to your software clearly stating how it can be reused - State how to cite your software, so people can give you credit when they reuse it -- Include a contributor policy so that others can contribute to your code and credit for contributions is provided - :::::: callout @@ -228,7 +218,7 @@ We recommend the following resources for some additional reading on the topic of - [CodeRefinery][coderefinery] - training and e-Infrastructure for research software development - A [self-assessment checklist for FAIR research software][fair-rs-checklist], by the Netherlands eScience Center and Australian Research Data Commons -- [Awesome Research Software Registries][awesome-research-software-registries] - a list of research software +- [Awesome Research Software Registries][awesome-rs-registries] - a list of research software registries (by country, organisation, domain and programming language) where research software can be registered to help promote its discovery diff --git a/episodes/03-tools.md b/episodes/03-tools.md index 02215795..12da14b4 100644 --- a/episodes/03-tools.md +++ b/episodes/03-tools.md @@ -20,34 +20,41 @@ After completing this episode, participants should be able to: :::::::::::::::::::::::::::::::::::::::::::::::: -In this course we will introduce you to a number of tools and practices that are commonly used in research to help you -develop software in a FAIR way. -You should already have these tools installed on your machine following the [setup instructions](./index.html#astronaut-data-and-analysis-code). -Here we will give an overview of the tools, how they help you achieve the aims of FAIR research software and how -they work together. -In later episodes we will describe some of these tools in more detail. 
+## Tools and good practices + +There are various tools and practices that support the development of FAIR research software, contributing to each of +the four FAIR principles. +These tools and practices work together, as no single tool or practice will fully address one principle, and conversely +each one can contribute to multiple principles simultaneously. +It is important to note that simply using these tools, without following good practice and guidance on how best to align +their usage with the FAIR principles, is not enough to produce FAIR software. + +You should already have these tools installed on your machine by following the [setup instructions](./index.html#astronaut-data-and-analysis-code). +Here we will give an overview of the tools and good practices and how, when used in combination, they can help you +achieve the aims of FAIR research software. +In later episodes we will describe these tools and practices in more detail. ### Development environments -Virtual and integrated development environments (IDEs), such as VS Code or PyCharm, help with running, testing, and debugging code. -Virtual environments further enable us to share our working environments with others, making it easier to access, reuse and extend our code. +Virtual and integrated development environments (IDEs), such as VS Code or PyCharm, help with reading, running, testing, and debugging code. +Virtual environments further enable us to share our working environments with others, making it easier to reuse and extend our code. IDEs often provide integrations with other tools, e.g. version control and command line terminals, enabling you to do many tasks from a single environment, saving time in switching between different tools. ### Command line terminals Command line terminals (e.g. 
Bash, GitBash) enable us to run and test our code without graphical user interfaces (GUI) afforded to us by IDEs - -this is sometimes needed for accessing and running our code remotely on servers and high-performance systems without a GUI provision, where time, +this is sometimes needed for running our code remotely on servers and high-performance systems without a GUI provision, where time, memory and processing power are expensive or in high demand. Version control systems are typically provided as command line tools, making them often only accessible from command line terminals to enter commands and access remote version control servers to backing up and sharing our work. -Finally, command line tools use standard protocols for passing parameters, inputs and outputs. -This makes it easier to integrate ours with other command line tools, allowing us to chain them and build up complex -and reproducible workflows and analysis pipelines using several programs in different steps. -If we write our software in a way which provides such an interoperable command line interface - we will be able to -integrate it with other command line tools to automate and speed up our work. +Finally, command line tools are interoperable software that use standard protocols for passing parameters, inputs and outputs via the command line terminal. +This makes it easier to integrate with other tools, allowing us to chain command line tools and build up complex and reproducible workflows and analysis pipelines +using several programs in different steps. +If we write our software in a way which provides such an interoperable command line interface - we will be able to integrate it with other command line tools to +automate and speed up our work. 
### Standard input/output formats and communication protocols @@ -61,12 +68,6 @@ When combined with software sharing and collaborative platforms such as GitHub o teamwork and discussions about software and design decisions, provides backup facilities for your code and speeds up collaboration on shared code by allowing edits by more than one person at a time. -### Code style and structure conventions - -Following code style conventions for your programming language and standard code structure patterns that are agreed upon -by the community and other programmers are important practices to ensure that others find it easy to read your code, -reuse or extend it in their own examples and applications. - ### Code testing Testing ensures that your code is correct and does what it is set out to do. @@ -75,11 +76,32 @@ it is very hard to consider all possible edge cases or notice every single typin Testing also gives other people confidence in your code as they can see an example of how it is meant to run and be assured that it does work correctly on their machine - helping with code understanding and reusability. -### Software- and project- level documentation +### Coding conventions + +Following coding conventions and guides for your programming language that are agreed upon by the community and other programmers +is an important practice to ensure that others find it easy to read your code, reuse or extend it in their own examples and applications. + +### Code licensing + +A licence is a legal document which sets down the terms under which the creator of a work (such as written text, +photographs, films, music, software code) is releasing what they have created for others to use, modify, extend or exploit. +It is important to state the terms under which software can be reused - the lack of a licence for your software +implies that no one can legally reuse the software at all.
+ +A common way to declare your copyright of a piece of software and the licence you are distributing it under is to +include a file called **LICENSE** in the root directory of your code repository. -Documentation comes in many forms - from **software-level documentation** including docstrings describing -functions and classes and in-line comments that explain lines of your code, to **project-level documentation** and -**metadata** (including README, LICENCE, CITATION, CONTRIBUTING, etc. files) +### Code citation + +We should add a **CITATION** file to our repository to provide instructions on how and when to cite our code. +A citation file can be a plain text file (CITATION.txt) or a Markdown file (CITATION.md), but there are certain benefits +to using a special file format called the [Citation File Format (CFF)][cff], which provides a way to include richer +metadata about code (or datasets) we want to cite, making it easy for both humans and machines to use this information. + +### Code- and project- level documentation + +Documentation comes in many forms - from **code-level documentation** including descriptive names of variables and functions and +additional comments that explain lines of your code, to **project-level documentation** (including README, LICENCE, CITATION, CONTRIBUTING, etc. files)
You many not need as much documentation as a large commercial software product, but making your code reusable relies on other people being able to understand @@ -94,36 +116,45 @@ You should check the rules or guidelines of your institution, grant or domain on Some examples of commonly used software repositories and registries include: -- general-purpose software repositories, such as [GitHub][github] and [GitLab][gitlab] -- programming language-specific software repositories, such as [PyPi][pypi] (for Python) and [CRAN][cran] (for R) -- software registries, such as [BioTools][biotools] (for biosciences) and [Awesome Research Software Registries][awesome-rs-registries] (providing a list of research software registries by country, organisation, domain and programming language) where research software can be registered to help promote its discovery +- general-purpose software repositories - [GitHub][github] and [GitLab][gitlab] +- programming language-specific software repositories - [PyPi][pypi] (for Python) and [CRAN][cran] (for R) +- software registries - [BioTools][biotools] (for biosciences) and [Awesome Research Software Registries][awesome-rs-registries], providing a list of research software registries (by country, organisation, domain and programming language) where research software can be registered to help promote its discovery ### Persistent identifiers -Unique persistent identifiers, such as Digital Object Identifiers (DOIs) provided by [Zenodo][zenodo], [FigShare][figshare] and similar digital archiving services, and commits/tags/releases used by GitHub and similar code sharing platforms, +Unique persistent identifiers, such as **Digital Object Identifiers** (DOIs) provided by [Zenodo][zenodo], +[FigShare][figshare], etc., or **SoftWare Heritage persistent IDentifiers** ([SWHID](swhid)) provided by [Software Heritage][software-heritage], +and similar digital archiving services, and commits/tags/releases used by GitHub and similar code sharing platforms, 
help with findability and accessibility of your software, and can help you get credit for your work by providing citable references. -### Tools and practices summary +### Tools for assessing FAIRness of software -The table below provides a summary of how different tools and practices help with the FAIR software principles. +Here are some tools that can check your software and provide an assessment of its FAIRness: -| Tools and practices | Findable | Accessible | Interoperable | Reusable | -|---------------------------------------------------------------------------------------------------| -------- | ---------- | ------------- | -------- | -| Virtual development environments | | x | | x | -| Integrated development environments/IDEs | | | | x | -| Command line terminals - automated and reproducible pipelines | | | x | x | -| Standard formats - e.g. for data exchange (CSV, YAML) | | x | x | x | -| Communication protocols - Command Line Interface (CLI) or Application Programming Interface (API) | | x | x | x | -| Version control tools | x | | | | -| Code testing and correctness | | x | | x | -| Code style conventions | | x | x | x | -| Software-level documentation (comments and docstrings, explaining functionality) | | x | x | x | -| Project-level documentation (READMEs, explaining functionality/installation/running) | | x | x | x | -| License - code sharing and reuse | | x | | x | -| Citation - code reuse and credit | x | | | x | -| Software repositories and registries - code sharing | x | x | | | -| Unique persistent identifiers - finding and citing software | x | x | | | +- [FAIRsoft evaluator][fair-rs-evaluator] +- [FAIR software test][fair-rs-test] +- [`How FAIR is your software` - command line tool to evaluate a software repository's compliance with the FAIR principles][howfairis] + +### Tools and practices summary + +The table below provides a summary of how different tools and practices help with the FAIR software principles. 
+| Tools and practices | Findable | Accessible | Interoperable | Reusable | +|------------------------------------------------------------------------------------------------------|----------|------------|---------------|----------| +| Virtual development environments | | | | x | +| Integrated development environments (IDEs) | | | | x | +| Command line terminals - automated and reproducible pipelines | | | x | x | +| Standard data exchange formats - e.g. CSV, YAML | | | x | x | +| Communication protocols - Command Line Interface (CLI) or Application Programming Interface (API) | | | x | x | +| Version control tools | x | | | | +| Code testing & correctness | | | | x | +| Coding conventions | | | | x | +| Code-level documentation (comments and docstrings, explaining functionality) | | | | x | +| Project-level documentation & metadata (README, explaining functionality/installation/running, etc.) | | | x | x | +| Licence - code sharing & reuse | | | | x | +| Citation - code reuse & credit | | | | x | +| Software repositories & registries | x | x | | | +| Unique persistent identifiers | x | x | | | ## Checking your setup @@ -144,28 +175,28 @@ Compare the output with your neighbour and see if you can see any differences. Checking the command line terminal: -1. `date` -2. `echo $SHELL` -3. `pwd` -4. `whoami` +1. `$ date` +2. `$ echo $SHELL` +3. `$ pwd` +4. `$ whoami` Checking Python: -5. `python --version` -6. `python3 --version` -7. `which python` -8. `which python3` +5. `$ python --version` +6. `$ python3 --version` +7. `$ which python` +8. `$ which python3` Checking Git and GitHub: -9. `git --help` -10. `git config --list` -11. `ssh -T git@github.com` +9. `$ git --help` +10. `$ git config --list` +11. `$ ssh -T git@github.com` Checking VS Code: -12. `code` -13.
`$ code --list-extensions` ::: hint diff --git a/episodes/04-version-control.md b/episodes/04-version-control.md index 2d903035..fc2fa69d 100644 --- a/episodes/04-version-control.md +++ b/episodes/04-version-control.md @@ -42,7 +42,7 @@ file changes over time. They keep track of every modification to the files in a special database that allows users to "travel through time" and compare earlier versions of the files with the current state. -## Why use a version control system? +### Why use a version control system? The main motivation as scientists to use version control in our projects is for reproducibility purposes. As hinted to above, by tracking and storing every change diff --git a/episodes/05-code-environment.md b/episodes/05-code-environment.md index 1fc20993..5d527d39 100644 --- a/episodes/05-code-environment.md +++ b/episodes/05-code-environment.md @@ -94,7 +94,7 @@ Virtual environments also enable you to always use the latest available version without specifying it explicitly. They also enable you to use a specific older version of a package for your project, should you need to. -## Managing virtual environments +### Managing virtual environments There are several command line tools used for managing Python virtual environments - we will use `venv`, available by default from the standard `Python` distribution since `Python 3.3`. @@ -107,7 +107,7 @@ it interacts and obtains the packages from the central repository called So, we will use `venv` and `pip` in combination to help us create and share our virtual development environments. -## Creating virtual environments +### Creating virtual environments Creating a virtual environment with `venv` is done by executing the following command: @@ -205,7 +205,7 @@ Note that, since our software project is being tracked by Git, the newly created virtual environment will show up in version control - we will see how to handle it using Git in one of the subsequent episodes. 
-## Installing external packages +### Installing external packages We noticed earlier that our code depends on four **external packages/libraries** - `json`, `csv`, `datetime` and `matplotlib`. @@ -293,7 +293,7 @@ zope.interface 7.0.1 To uninstall a package installed in the virtual environment do: `python -m pip uninstall `. You can also supply a list of packages to uninstall at the same time. -## Sharing virtual environments +### Sharing virtual environments You are collaborating on a project with a team so, naturally, you will want to share your environment with your collaborators diff --git a/episodes/07-code-structure.md b/episodes/07-code-structure.md index 526ebe10..13c42721 100644 --- a/episodes/07-code-structure.md +++ b/episodes/07-code-structure.md @@ -48,7 +48,7 @@ https://github.com/carpentries-incubator/astronaut-data-analysis-not-so-fair/tre :::::: -## Functions for modular and reusable Code +## Functions for modular and reusable code As we have already seen in the previous episode - functions play a key role in creating modular and reusable code. 
We are going to carry on improving our code following these principles: diff --git a/learners/reference.md b/learners/reference.md index 54cb6575..ad94eb76 100644 --- a/learners/reference.md +++ b/learners/reference.md @@ -66,7 +66,7 @@ can be used by the research community to understand how they can make their rese - [Short online courses on various aspects of research software (including FAIR)][nesc-rs-support-courses], by the NeSC Research Software Support -- [Awesome Research Software Registries][awesome-research-software-registries], a list of research software registries +- [Awesome Research Software Registries][awesome-rs-registries], a list of research software registries (by country, organisation, domain and programming language) where research software can be registered to help promote its discovery diff --git a/links.md b/links.md index 0f7ce05b..0fb1dced 100644 --- a/links.md +++ b/links.md @@ -34,7 +34,6 @@ any links that you are not going to use. [fair-cookbook]: https://faircookbook.elixir-europe.org/content/home.html [10-easy-fair-things]: https://librarycarpentry.org/Top-10-FAIR/files/poster_10things_FAIRsoftware.pdf [top-10-fair-things-per-domain]: https://librarycarpentry.org/Top-10-FAIR/ -[awesome-research-software-registries]: https://github.com/NLeSC/awesome-research-software-registries [beginner-guide-reproducible-research]: https://esajournals.onlinelibrary.wiley.com/doi/10.1002/bes2.1801 [swc-git-lesson]: https://swcarpentry.github.io/git-novice [swc-git-lesson-track]: https://swcarpentry.github.io/git-novice/04-changes.html @@ -126,4 +125,13 @@ any links that you are not going to use. 
[mkdocs-deploy]: https://www.mkdocs.org/user-guide/deploying-your-docs/ [opensource-licence-guide]: https://opensource.guide/legal/#which-open-source-license-is-appropriate-for-my-project [choosealicense]: https://choosealicense.com/ - [10-rules-better software]: https://doi.org/10.1371/journal.pcbi.1012410 +[10-rules-better software]: https://doi.org/10.1371/journal.pcbi.1012410 +[rsmd-g1]: https://fair-impact.github.io/RSMD-guidelines/1.General/ +[software-heritage]: https://www.softwareheritage.org/ +[swhid]: https://docs.softwareheritage.org/devel/swh-model/persistent-identifiers.html +[figshare]: https://figshare.com/ +[howfairis]: https://github.com/fair-software/howfairis/ +[fair-rs-evaluator]: https://openebench.bsc.es/observatory/Evaluation +[fair-rs-test]: https://github.com/marioa/fair-test?tab=readme-ov-file +[codemeta]: https://codemeta.github.io/ +
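The reworded episode 03 text says that command line tools pass parameters, inputs and outputs through standard protocols, and that this lets us chain them into reproducible pipelines. The block below is a minimal sketch of such an interface in Python. It assumes CSV input with a header row. The script name `csv_to_json.py` and the function `csv_to_json` are hypothetical and are not part of the lesson's astronaut analysis code. The sketch reads from a file or from standard input, converts rows to JSON, and writes the result to standard output.

```python
# csv_to_json.py - hypothetical example, not part of the lesson's code
import argparse
import csv
import io
import json
import sys
from typing import List, Optional


def csv_to_json(csv_text: str, indent: Optional[int] = None) -> str:
    """Convert CSV text with a header row into a JSON array of objects.

    All values stay as strings, exactly as the csv module reads them.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return json.dumps(list(reader), indent=indent)


def main(argv: Optional[List[str]] = None) -> int:
    """Parse command line arguments, convert the input and print JSON."""
    parser = argparse.ArgumentParser(
        description="Convert CSV (with a header row) to JSON."
    )
    # Optional positional input file; falls back to standard input so the
    # tool can sit in the middle of a pipeline.
    parser.add_argument(
        "input",
        nargs="?",
        type=argparse.FileType("r"),
        default=sys.stdin,
        help="CSV file to read (default: standard input)",
    )
    parser.add_argument(
        "--indent",
        type=int,
        default=None,
        help="number of spaces used to pretty-print the JSON output",
    )
    args = parser.parse_args(argv)
    # Results go to standard output so other tools can consume them.
    print(csv_to_json(args.input.read(), indent=args.indent))
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

For example, `python csv_to_json.py data.csv --indent 2` reads a file, while `cat data.csv | python csv_to_json.py` reads the same data from a pipe. Here `data.csv` is a placeholder filename. The script reads from standard input and writes to standard output, which is what lets another command line tool consume its output without any extra glue code.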