How many names published in IPNI are available open access?
IPNI records DOIs against the publications in which new nomenclatural acts are found. This data element has been recorded since 2012.
- What proportion of IPNI-monitored nomenclatural acts are published open access, and how is this changing over time?
- What OA statuses (green, gold, bronze, hybrid etc.) are reported, and how are these changing over time?
- Do these trends vary with the WCVP distribution of the species?
graph TB
subgraph "Data access "
subgraph subgraph_padding_1 [ ]
style subgraph_padding_1 stroke-dasharray: 0 1
ipnidata["Download name publication <br>data from <b>IPNI</b>; extract DOIs"]
end
end
subgraph "Processing "
subgraph subgraph_padding_2 [ ]
style subgraph_padding_2 stroke-dasharray: 0 1
unpaywall["Lookup DOIs in <b>unpaywall</b>"]
ipnidata-->unpaywall
end
end
subgraph "Reporting "
subgraph subgraph_padding_3 [ ]
style subgraph_padding_3 stroke-dasharray: 0 1
rptoatakeup[Report on OA takeup over time]
rptoastatus[Report on OA statuses over time]
unpaywall-->rptoatakeup
unpaywall-->rptoastatus
end
end
The software is written in Python and execution is managed with the build tool make.
The command used to launch Python is defined in the Makefile as the variable python_launch_cmd; on Windows (the default) the python executable is prefixed with winpty. Comment out this line of the Makefile if you are on Linux.
APIs are used to access IPNI data (via pykew) and unpaywall (via the unpywall package).
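To illustrate what an unpaywall lookup involves, here is a minimal sketch that queries the public Unpaywall REST API (v2) directly with the Python standard library. It is not how this project does it (the project uses the unpywall wrapper); the function names and the example email address are assumptions made for illustration. The is_oa and oa_status fields are the ones used later in the reports.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

UNPAYWALL_API = "https://api.unpaywall.org/v2/"


def build_lookup_url(doi, email):
    """Return the Unpaywall v2 URL for one DOI (an email parameter is required by the API)."""
    return f"{UNPAYWALL_API}{quote(doi, safe='/')}?email={quote(email)}"


def summarise(record):
    """Keep only the fields of an Unpaywall response that the reports rely on."""
    return {
        "doi": record.get("doi"),
        "is_oa": record.get("is_oa"),
        "oa_status": record.get("oa_status"),
    }


def lookup(doi, email):
    """Fetch and summarise one DOI (hypothetical helper; performs a network call)."""
    with urlopen(build_lookup_url(doi, email)) as response:
        return summarise(json.load(response))


# Offline usage: summarise a hand-written response instead of calling the API.
sample_response = {"doi": "10.1234/abc", "is_oa": True, "oa_status": "gold", "title": "Example"}
print(summarise(sample_response))
```

Calling lookup() for every DOI one at a time is slow for large datasets, which is one reason a wrapper with caching is useful.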
Software package dependencies are specified in requirements.txt.
- Create a virtual environment: python -m venv env
- Activate the virtual environment: source env/Scripts/activate (on Linux or macOS the script is at env/bin/activate)
- Install dependencies: pip install -r requirements.txt
The Makefile includes a year_min variable which is passed to the getipninames.py script to select the initial set of records. By default this is set to 2012.
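As a rough illustration of how a value such as year_min can travel from the Makefile into a script, the sketch below parses it from the command line and uses it to filter records. The actual interface of getipninames.py is not shown in this document, so the --year_min flag, the outputfile argument and the select_from_year helper are all hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse a hypothetical command line: --year_min plus an output file path."""
    parser = argparse.ArgumentParser(description="Select records from year_min onwards")
    parser.add_argument("--year_min", type=int, default=2012,
                        help="earliest publication year to include (assumed flag name)")
    parser.add_argument("outputfile", help="path of the CSV file to write")
    return parser.parse_args(argv)


def select_from_year(records, year_min):
    """Keep records whose 'year' field (an assumed field name) is at least year_min."""
    return [record for record in records if record["year"] >= year_min]


# Usage with an explicit argument list, as make might pass it.
args = parse_args(["--year_min", "2015", "downloads/ipninames.csv"])
records = [{"name": "a", "year": 2013}, {"name": "b", "year": 2016}]
print(select_from_year(records, args.year_min))
```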
A report of the actions that will be taken to build a particular Makefile target can be seen by using the --dry-run flag. For example, to see the actions taken to process the reportoa target use make reportoa --dry-run.
The unpaywall lookup will take some time (several hours for datasets of thousands of records). The unpywall utility offers a cache option which stores the results of a lookup and uses this local cache for subsequent requests. See more details here: https://unpywall.readthedocs.io/en/latest/cache.html. The cache file is named unpaywall_cache and is specified in .gitignore.
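The idea behind a lookup cache can be sketched in a few lines. The example below is a generic file-backed cache, not unpywall's implementation: the JsonFileCache class, the fake_fetch function and the cache file name are assumptions made for illustration. It shows that a repeated request, even from a fresh process, is served from the local file rather than triggering another lookup.

```python
import json
import os
import tempfile


class JsonFileCache:
    """Store lookup results in a JSON file and reuse them on later requests."""

    def __init__(self, path, fetch):
        self.path = path
        self.fetch = fetch  # function called only when a DOI is not yet cached
        self.store = {}
        if os.path.exists(path):
            with open(path, encoding="utf-8") as handle:
                self.store = json.load(handle)

    def get(self, doi):
        if doi not in self.store:
            self.store[doi] = self.fetch(doi)
            # Persist after every new lookup so an interrupted run loses nothing.
            with open(self.path, "w", encoding="utf-8") as handle:
                json.dump(self.store, handle)
        return self.store[doi]


calls = []


def fake_fetch(doi):
    """Stand-in for a slow remote lookup; records each call it receives."""
    calls.append(doi)
    return {"doi": doi, "is_oa": True}


cache_path = os.path.join(tempfile.mkdtemp(), "lookup_cache.json")
cache = JsonFileCache(cache_path, fake_fetch)
cache.get("10.1234/abc")
cache.get("10.1234/abc")  # served from memory

reloaded = JsonFileCache(cache_path, fake_fetch)
result = reloaded.get("10.1234/abc")  # served from the file written earlier
print(len(calls))  # 1
```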
A complete run can be initiated with make all, or individual steps can be run as detailed below.
- Download names from IPNI
  - Script: getipninames.py
  - Outputfile: downloads/ipninames.csv
  - Method: Using the time period specified by the year_min variable, IPNI names are downloaded using the pykew API wrapper.
  - How to run: Use the Makefile target: make downloads/ipninames.csv or the shorthand: make getnames
- Look up DOI-labelled literature in unpaywall
  - Script: ipninames2oastatus.py
  - Inputfile(s): downloads/ipninames.csv
  - Outputfile: data/ipniname-oastatus.csv
  - Method: Using the extracted DOI, make a call to unpaywall (https://unpaywall.org/) using the unpywall API wrapper and store the results in CSV format.
  - How to run: Use the Makefile target: make data/ipniname-oastatus.csv or the shorthand: make getoastatus
- Report on OA status over time
  - Script: reportoastatus.py
  - Inputfile(s): data/ipniname-oastatus.csv
  - Outputfile: data/ipniname-oastatus-report.csv
  - Method: Summarise the unpaywall data by grouping on year, whether the literature item has a DOI available (has_doi), whether the literature is available open access (is_oa) and the open access status (oa_status: green, gold, bronze, hybrid etc.), then counting the size of each group.
  - How to run: Use the Makefile target: make data/ipniname-oastatus-report.csv or the shorthand: make reportoa
- Plot OA takeup over time
  - Script: plotoa.py
  - Inputfile(s): data/ipniname-oastatus-report.csv
  - Outputfile: data/oatrend.png
  - Method: Organise the unpaywall data by year and plot a stacked bar graph of OA takeup.
  - How to run: Use the Makefile target: make data/oatrend.png or the shorthand: make plotoa
- Plot OA status over time
  - Script: plotoatype.py
  - Inputfile(s): data/ipniname-oastatus-report.csv
  - Outputfile: data/oastatustrend.png
  - Method: Organise the unpaywall data by year and plot a stacked bar graph of OA status.
  - How to run: Use the Makefile target: make data/oastatustrend.png or the shorthand: make plotoastatus
- Indicate any correlations between WCVP distribution of species and OA status (TBC)
  - Script:
  - Outputfile:
  - Method:
  - How to run:
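The grouping and counting described in the reporting step above can be sketched with the standard library. This is an illustration only, not the code in reportoastatus.py: the sample rows are made up, and the count column name n and the oa_share_by_year helper are assumptions.

```python
from collections import Counter, defaultdict

# Made-up rows standing in for data/ipniname-oastatus.csv.
rows = [
    {"year": 2019, "has_doi": True, "is_oa": True, "oa_status": "gold"},
    {"year": 2019, "has_doi": True, "is_oa": False, "oa_status": "closed"},
    {"year": 2019, "has_doi": False, "is_oa": False, "oa_status": ""},
    {"year": 2020, "has_doi": True, "is_oa": True, "oa_status": "gold"},
    {"year": 2020, "has_doi": True, "is_oa": True, "oa_status": "gold"},
    {"year": 2020, "has_doi": True, "is_oa": True, "oa_status": "bronze"},
    {"year": 2020, "has_doi": True, "is_oa": False, "oa_status": "closed"},
]

GROUP_KEYS = ("year", "has_doi", "is_oa", "oa_status")


def group_and_count(rows):
    """Count rows per (year, has_doi, is_oa, oa_status) combination."""
    counts = Counter(tuple(row[key] for key in GROUP_KEYS) for row in rows)
    report = []
    for group, n in sorted(counts.items()):
        entry = dict(zip(GROUP_KEYS, group))
        entry["n"] = n  # hypothetical name for the group-size column
        report.append(entry)
    return report


def oa_share_by_year(report):
    """Derive the proportion of open access items per year from the grouped report."""
    totals = defaultdict(int)
    open_access = defaultdict(int)
    for entry in report:
        totals[entry["year"]] += entry["n"]
        if entry["is_oa"]:
            open_access[entry["year"]] += entry["n"]
    return {year: open_access[year] / totals[year] for year in totals}


report = group_and_count(rows)
shares = oa_share_by_year(report)
print(shares[2020])  # 0.75
```

A grouped table of this shape is also a convenient input for stacked bar charts, since each (year, status) count becomes one bar segment.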
Two utility make targets are provided for cleaning up:
- make clean - removes all processed files (i.e. the contents of the data directory)
- make sterilise - removes all processed files and all downloaded files (i.e. the contents of both the data and downloads directories)
- Execute a complete analysis using make all
- Archive the inputs and results using make archive
- Tag the software version used using git tag, and push the tag to GitHub
- Create a release in GitHub using the tag created in the previous step and attach the archived file to the GitHub release
Please use the GitHub issue tracker associated with this project to report bugs and make feature requests.
Please link your commit message to an issue which describes what is being implemented or fixed.
Any new dependencies should be added to requirements.txt and committed to git. The env directory is specified in .gitignore; please do not commit this to git.
The data and downloads directories are specified in .gitignore, so please do not commit these, or any outputs such as data files or chart images, to git. Instead you should:
- Develop a script which automates the construction of the output (the datafile or chart image)
- Add a target to the Makefile which will:
  - Define the dependencies of the output (the script used to create the output, and any input files required)
  - Call the script and generate the output
- Update the instructions above
Similarly, the archive directory is specified in .gitignore; please do not commit this or any of its contents to git. Instead follow the process laid out in the "How to archive an analysis" section above.
Nicky Nicolson, RBG Kew ([email protected])