=================================================================================
Virtual Herbarium - a simple web scraper to collect information on plants from online databases
Created by Dr. Matteo Jucker Riva, BFH-HAFL Bern University of Applied Sciences (Switzerland) [email protected]
Information about plants, especially wild species, is scattered across multiple sources and websites, each presenting unstructured information in its own way and template. The aim of this project is to collect information about plants by scraping different databases and web sources and to combine it in a structured way.
VirtHerb currently scrapes:
| CABI Invasive Species Compendium | https://www.cabi.org/ISC |
| World Flora Online | http://www.worldfloraonline.org/ |
| JSTOR Global Plants | https://plants.jstor.org/ |
Workflow of VirtHerb:
1- Scrape the web using a plant scientific name (provided by the user)
2- Structure the text in sections, optionally together with user-defined notes and pictures
3- Output a simple 1- or 2-page PDF with essential information to be downloaded or shared
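
As an illustration of step 2, here is a minimal Python sketch of how text scraped from several sources might be merged into one structured record. It is not VirtHerb's actual code: the names `PlantSheet`, `build_sheet` and `SECTIONS`, as well as the section names themselves, are hypothetical. Scraping (step 1) is assumed to have already produced a mapping of source to section to text, and the PDF export (step 3) is replaced by a plain-text rendering.

```python
from dataclasses import dataclass, field

# Hypothetical section names: the README does not list the real ones.
SECTIONS = ("description", "distribution", "habitat")


@dataclass
class PlantSheet:
    """Structured record for one species, merged from several sources."""
    scientific_name: str
    sections: dict = field(default_factory=dict)  # section -> [(source, text)]
    notes: list = field(default_factory=list)     # optional user-defined notes

    def add(self, source: str, section: str, text: str) -> None:
        """File a cleaned snippet under a known section; ignore anything else."""
        text = " ".join(text.split())  # collapse whitespace left over from HTML
        if section in SECTIONS and text:
            self.sections.setdefault(section, []).append((source, text))

    def render(self) -> str:
        """Plain-text layout; a real PDF export is out of scope for this sketch."""
        lines = [self.scientific_name]
        for section in SECTIONS:
            for source, text in self.sections.get(section, []):
                lines.append(f"{section.upper()} [{source}]: {text}")
        lines.extend(f"NOTE: {note}" for note in self.notes)
        return "\n".join(lines)


def build_sheet(name: str, scraped: dict, notes=()) -> PlantSheet:
    """Combine per-source results (source -> {section: text}) into one sheet."""
    sheet = PlantSheet(scientific_name=name.strip(), notes=list(notes))
    for source, by_section in scraped.items():
        for section, text in by_section.items():
            sheet.add(source, section, text)
    return sheet
```

Calling `build_sheet("Bellis perennis", scraped, notes=[...]).render()` returns one line per section and source, so every snippet stays traceable to the database it came from. Keeping the source label next to each snippet is a design choice of this sketch, not something the README prescribes.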
11.2020 CURRENTLY UNDER DEVELOPMENT
Planned improvements: By integrating a BERT or DistilBERT machine learning model for natural language processing, we aim to combine the text belonging to each section of the final output automatically. COME BACK SOON TO SEE THE LATEST CHANGES!