This project collects data from the IUCN RedList website and API and presents it in a dashboard. Any use of the contents of this project should follow the terms of service of the IUCN RedList.
The source code for the project is found in the src/ directory. It is divided into four parts:
-
webscraping subdirectory:
-
pull-website.py: This script uses Selenium WebDriver to collect species IDs. The driver visits URLs of filtered species lists, which you can create by applying filters on the search page and pressing the "Save search" button while logged in, which saves the filtered page to your account. An account is needed only to save the filtered page, not to access it, so the Selenium driver does not need to log in. The script generates a text file of species IDs to be used in the API requests.
-
pull-api.py: This script reads the text file above and requests from the API the data for all assessments of each species listed in it. It uses threading because the API significantly limits the number of consecutive requests you can make, and it requires an authorization token that must be activated at https://api.iucnredlist.org/api-docs/index.html. Each thread outputs a .json file containing multiple JSON objects, which hold the data obtained from the API after some selection and reformatting by the function formatted_json. These files should be combined into one file for use in the script below.
-
json-test.py: This script checks whether all the species in the ID text file are present in the .json file generated by the script above, since some threads may terminate due to errors without completing their jobs. It generates a new text file containing all the IDs not found in the .json file, so that the scraping process can continue from them.
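The thread, combine, and check steps described for pull-api.py and json-test.py can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `fetch_assessments` is a hypothetical stand-in for the real API call, the `species_id` key and the `thread_N.json` file names are made up, and the sketch assumes each thread writes one JSON object per line.

```python
import json
import tempfile
import threading
from pathlib import Path


def fetch_assessments(species_id):
    """Hypothetical stand-in for the real API request; returns a fake record."""
    if species_id == "103":  # simulate one request that fails
        raise RuntimeError("simulated API error")
    return {"species_id": species_id, "assessments": []}


def worker(ids, out_path):
    """Write one JSON object per line; stop early if a request fails."""
    with open(out_path, "w", encoding="utf-8") as fh:
        for species_id in ids:
            try:
                record = fetch_assessments(species_id)
            except RuntimeError:
                return  # mimics a thread that terminates before finishing its job
            fh.write(json.dumps(record) + "\n")


def run_threads(ids, n_threads, out_dir):
    """Split the IDs across threads; each thread writes its own file."""
    chunks = [ids[i::n_threads] for i in range(n_threads)]
    paths = [Path(out_dir) / f"thread_{i}.json" for i in range(n_threads)]
    threads = [
        threading.Thread(target=worker, args=(chunk, path))
        for chunk, path in zip(chunks, paths)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return paths


def combine(paths, combined_path):
    """Concatenate the per-thread files into a single file."""
    with open(combined_path, "w", encoding="utf-8") as out:
        for path in paths:
            out.write(Path(path).read_text(encoding="utf-8"))


def missing_ids(ids, combined_path):
    """Return the IDs that have no record in the combined file."""
    found = set()
    with open(combined_path, encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                found.add(json.loads(line)["species_id"])
    return [species_id for species_id in ids if species_id not in found]


with tempfile.TemporaryDirectory() as tmp:
    all_ids = ["101", "102", "103", "104", "105", "106"]
    thread_files = run_threads(all_ids, n_threads=2, out_dir=tmp)
    combined = Path(tmp) / "assessments.json"
    combine(thread_files, combined)
    remaining = missing_ids(all_ids, combined)
    # IDs still to be fetched, one per line, ready for another run
    Path(tmp, "missing-ids.txt").write_text("\n".join(remaining), encoding="utf-8")
```

In this sketch the thread that hits the simulated error stops before requesting the rest of its IDs, so those IDs end up in the missing-ID file and can be fed back into another run.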
-
clear_assessments.py: This script converts assessments.json into multiple CSV files, which are used in the dashboard scripts.
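As a rough illustration of the JSON-to-CSV step, the sketch below flattens nested assessment records into rows and writes them as one CSV table. The record layout (`species_id`, `assessments`, `year`, `category`) and the sample values are assumptions made up for the example; the real fields depend on formatted_json, and the real script produces several CSV files rather than one.

```python
import csv
import io

# Made-up records for illustration only; not real IUCN data.
records = [
    {"species_id": "101", "assessments": [
        {"year": 2008, "category": "VU"},
        {"year": 2016, "category": "EN"},
    ]},
    {"species_id": "102", "assessments": [
        {"year": 2012, "category": "LC"},
    ]},
]


def assessments_to_rows(records):
    """Produce one flat row per (species, assessment) pair."""
    rows = []
    for record in records:
        for assessment in record["assessments"]:
            rows.append({"species_id": record["species_id"], **assessment})
    return rows


def rows_to_csv(rows, fieldnames):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = assessments_to_rows(records)
csv_text = rows_to_csv(rows, ["species_id", "year", "category"])
print(csv_text)
```

Writing to an in-memory buffer keeps the example self-contained; a script working with files would pass an open file handle to `csv.DictWriter` instead.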
-
chi_test_per_country_proportion_vulnerable_species: This R script contains the analyses presented during phase 5 of the project. The analysis has two parts. The first is a chi-square test for statistically significant differences in the proportions of vulnerable species among the countries that are trade partners with China, both before and after China's accession to the WTO. The second plots these proportions to give a visual representation of the differences.
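For readers unfamiliar with the test, the sketch below shows in Python how a chi-square statistic for a table of counts per country can be computed by hand. It is not a port of the R script: the counts are made up, the country rows are hypothetical, and the function name `chi_square_statistic` is introduced only for this example.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the proportions were identical across rows
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Made-up counts: one row per hypothetical country,
# columns are (vulnerable species, non-vulnerable species).
counts = [
    [30, 70],
    [50, 50],
    [40, 60],
]

statistic, dof = chi_square_statistic(counts)
proportions = [row[0] / sum(row) for row in counts]
print(statistic, dof, proportions)
```

The statistic is then compared against a chi-square distribution with `dof` degrees of freedom to obtain a p-value. Note that R's `chisq.test` applies a continuity correction to 2x2 tables by default, so its result on such tables can differ from this uncorrected formula.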
-
dashboard subdirectory:
-
app.py: This script is the entrypoint for the dashboard and contains all the callbacks for the webpage created by the Dash library.
-
graphing.py: This script contains non-callback functions that create or update the graphs in the dashboard.
-
data_manipulation.py: This script contains dataframe filtering, file reading and other auxiliary functions.
-
To run the dashboard, run app.py. Make sure your data folder is properly set up (check data_manipulation.py).