Update osmSpecies.txt #676

Open
davidpnewton wants to merge 1 commit into base: modified


@davidpnewton commented Oct 11, 2024

Alphabetise, modernise taxonomy and remove duplicates

Alphabetise, modernise taxonomy and increase number of entries
@Helium314
Owner

osmSpecies is data coming from OSM via taginfo. I think it's misleading to add e.g. Acer × freemanii Autumn Blaze to osmSpecies when, according to taginfo, it's not actually used on OSM at all.
You can update it with more recent data, but it should reflect current tagging (also when removing duplicates).

@davidpnewton
Author

There is still some room for improvement here. I only included the Autumn Blaze cultivar because it was already in the file. Personally, I really don't like putting cultivars into the species tag. There's nothing wrong with putting them into taxon, or indeed using cultivar itself as a key (although that way of doing things seems somewhat unusual).

When it comes to removing duplicates, a lot of it is simply a move to having only correct names in there and not synonyms. There are instances where synonyms are extremely popular tags and yet are utterly wrong. For example, the correct species name Platanus × hispanica has 17,334 instances, whereas the combined total of all the incorrect variants stands at somewhere around 87,000, with two of those incorrect variants each having more than 17,334 entries. That still doesn't make them any more correct as entries. It simply means there's a VERY big job needed to fix the tagging!

It's a very similar story with Tilia × europaea, only more dramatic. The correctly tagged name has only 1,267 instances, while the vast majority of entries are tagged Tilia x europaea (59,057). Again, that simply means there is a big tagging job to do, not that the popular tagging is correct.
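To make the "x" versus "×" point concrete, here is a minimal Python sketch of how the two spellings could be folded into one name and their counts added up. It is only an illustration, not what this PR does: `normalise_hybrid` and `merge_counts` are hypothetical helper names, the regular expression is an assumed heuristic for a standalone ASCII "x" between two words, and the only data used are the two Tilia figures quoted above.

```python
import re
from collections import Counter

# Assumed heuristic: a standalone ASCII "x" or "X" sitting between two words
# is meant as the hybrid sign "×".
HYBRID_X = re.compile(r"(?<=\S)\s+[xX]\s+(?=\S)")


def normalise_hybrid(name):
    """Replace a standalone ASCII x with the multiplication sign."""
    return HYBRID_X.sub(" × ", name.strip())


def merge_counts(counts):
    """Add up the counts of names that become identical after normalisation."""
    merged = Counter()
    for name, n in counts.items():
        merged[normalise_hybrid(name)] += n
    return merged


# The two Tilia figures quoted in the comment above.
tilia = {"Tilia × europaea": 1267, "Tilia x europaea": 59057}
print(merge_counts(tilia))
```

Names where "x" is part of a word, such as Taxus, are left alone because the pattern requires whitespace on both sides.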

There were also quite a few instances of species names with the botanist abbreviation tacked on the end, for example Prunus dulcis D.A.Webb. Although technically correct in taxonomic circles, in normal usage it's extremely rare to see the botanist abbreviation used. Hence I got rid of duplicates that differed only by a trailing abbreviation, and removed the abbreviation even where the entry was not a duplicate.
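A rough Python sketch of that clean-up step follows: strip a trailing author abbreviation, drop the resulting duplicates, and alphabetise. The helper names `strip_author` and `clean_species_list` are hypothetical, and the pattern is an assumed heuristic that treats a trailing token as an author abbreviation only if it starts with a capital letter and contains a full stop. It will miss authors written without a full stop, so the output would still need checking by hand.

```python
import re

# Assumed heuristic: an author abbreviation is a trailing token that starts
# with a capital letter and contains a full stop, e.g. "L." or "D.A.Webb".
AUTHOR_TOKEN = re.compile(r"^\(?[A-Z][\w.\-]*\.[\w.\-]*\)?$")


def strip_author(name):
    """Remove trailing author-like tokens, always keeping at least two words."""
    tokens = name.split()
    while len(tokens) > 2 and AUTHOR_TOKEN.match(tokens[-1]):
        tokens.pop()
    return " ".join(tokens)


def clean_species_list(names):
    """Strip author abbreviations, remove duplicates, and sort alphabetically."""
    return sorted({strip_author(n) for n in names}, key=str.casefold)


sample = ["Prunus dulcis D.A.Webb", "Prunus dulcis", "Acer × freemanii Autumn Blaze"]
for entry in clean_species_list(sample):
    print(entry)
```

Cultivar words such as Autumn Blaze contain no full stop, so this heuristic leaves them in place. Whether cultivars belong in the file at all is a separate question.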

The single weirdest one in there was Ciconia ciconia. That's the Latin binomial for the White Stork! Quite how the Latin name for a bird ended up in a tree data file is anyone's guess.
