Classification as data frame #4

Closed
tomjwebb opened this issue Jun 13, 2017 · 6 comments

@tomjwebb

tomjwebb commented Jun 13, 2017

Hi @sckott - I've been using this package and wanted to run wm_classification() over a list of species and get the result as a data frame. Its output differs somewhat from what wm_record() returns; in particular, the classification function returns 'non-standard' taxonomic ranks (superfamily etc.), which I happen to need for this application. I've written a couple of functions, which I thought I'd upload here in case you feel they are of more general use. The first turns the output of wm_classification() into a data frame (and attempts to do something sensible with errors). The second runs this over a list of species and binds the resulting data frames into a single tbl_df of classifications for the whole species list:

get_sp_classif <- function(sp){
	
	# try to get WoRMS aphia ID from name
	aphia <- try(wm_name2id(sp), silent = TRUE)
	# check if this worked (catches unrecognised names and instances where AphiaID of -999 is returned)
	# inherits() plus short-circuit || avoids comparing an error object with 0
	if(inherits(aphia, "try-error") || aphia < 0){
		# try using genus
		aphia <- try(wm_name2id(stringr::word(sp, 1)), silent = TRUE)
		if(inherits(aphia, "try-error") || aphia < 0){
			# no AphiaID found: return a data frame holding only the name
			classif_df <- data.frame(sciname = sp, stringsAsFactors = FALSE)
			aphia <- NA
		}
	}
	if(!is.na(aphia)){
		# if aphia ID was found, get full classification
		classif <- wm_classification(aphia)
		# convert into data frame
		classif_df <- read.csv(text = "",
			col.names = c("sciname", "AphiaID", classif$rank),
			colClasses = c("character", "numeric", rep("character", length(classif$rank))), stringsAsFactors = FALSE)
		if("Species" %in% classif$rank){
			classif_df[1,] <- cbind(sp, classif$AphiaID[classif$rank == "Species"], t(classif$scientificname))
		} else {
			classif_df[1,] <- cbind(sp, NA, t(classif$scientificname))
		}
	}

	classif_df
}

sp_list_classif <- function(sp_list){
	
	# run the classification function over the whole list
	# (lapply always returns a list, which bind_rows needs; sapply may simplify it)
	classifs <- lapply(sp_list, get_sp_classif)
	
	# return as dataframe
	classifs <- dplyr::bind_rows(classifs)
	
	dplyr::tbl_df(classifs)		
}
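
For readers who want to see the reshaping step without calling the WoRMS API, here is a minimal sketch on mock data. It assumes wm_classification() returns one row per rank with AphiaID, rank and scientificname columns, as the code above implies. The helper names classif_to_row and pad are hypothetical, the values in mock_classif are placeholders, and the base R padding step is only a stand-in for what dplyr::bind_rows does.

```r
# Mock of the long-format table assumed to come back from wm_classification():
# one row per rank. IDs are placeholders, not real AphiaIDs.
mock_classif <- data.frame(
  AphiaID = c(1L, 2L, 3L),
  rank = c("Kingdom", "Phylum", "Class"),
  scientificname = c("Animalia", "Chordata", "Mammalia"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: reshape one long classification into a one-row data frame
classif_to_row <- function(sp, classif) {
  # one named list element per rank, e.g. Kingdom = "Animalia"
  ranks <- as.list(setNames(classif$scientificname, classif$rank))
  as.data.frame(c(list(sciname = sp), ranks), stringsAsFactors = FALSE)
}

row1 <- classif_to_row("Platanista gangetica", mock_classif)
# second input with fewer ranks, to mimic classifications of different depth
row2 <- classif_to_row("Some genus", mock_classif[1:2, ])

# Pad missing rank columns with NA so the rows can be stacked
all_cols <- union(names(row1), names(row2))
pad <- function(df) {
  for (col in setdiff(all_cols, names(df))) df[[col]] <- NA_character_
  df[all_cols]
}
result <- rbind(pad(row1), pad(row2))
print(result)
```

The point of the padding step is that species can have different sets of ranks, so rows cannot be stacked until they share the same columns; dplyr::bind_rows handles this automatically in the functions above.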
@sckott
Contributor

sckott commented Jun 13, 2017

thanks for the issue @tomjwebb 😸

edited your code above just a bit so it can run without errors (namespace calls to dplyr fxns)

Might make sense to add a function like this where you can input >1 taxonomic names and get classifications in a data.frame

Note that this is in taxize

library(taxize)
xx <- get_wormsid(c('Platanista gangetica', 'Leucophaeus scoresbii'))
dplyr::tbl_df(cbind(classification(xx)))
# A tibble: 2 x 29
   kingdom   phylum  subphylum    superclass superclass.1    class subclass           order     suborder infraorder
     <chr>    <chr>      <chr>         <chr>        <chr>    <chr>    <chr>           <chr>        <chr>      <chr>
1 Animalia Chordata Vertebrata Gnathostomata    Tetrapoda Mammalia   Theria Cetartiodactyla Cetancodonta    Cetacea
2 Animalia Chordata Vertebrata Gnathostomata    Tetrapoda     Aves     <NA> Charadriiformes         <NA>       <NA>
# ... with 19 more variables: superfamily <chr>, family <chr>, genus <chr>, species <chr>, kingdom_id <chr>,
#   phylum_id <chr>, subphylum_id <chr>, superclass_id <chr>, superclass_id.1 <chr>, class_id <chr>, subclass_id <chr>,
#   order_id <chr>, suborder_id <chr>, infraorder_id <chr>, superfamily_id <chr>, family_id <chr>, genus_id <chr>,
#   species_id <chr>, query <chr>

even though that's in taxize, maybe there is still reason to include similar functionality here

thoughts?

@tomjwebb
Author

Ah thanks @sckott - I should have checked taxize first! I should probably switch to using that, have just found worrms convenient. I also added a bit more to the first function (edited above), which now tries to get the classification of a genus if it can't find an AphiaID for the species (this is useful for what I'm doing at the moment).

Anyway if you envisage others using worrms standalone then I think this functionality is useful - both returning the classification as a data frame, and being able to run it over a list of species. But if you want to point people to taxize instead that works too. Feel free to close this issue anyway!

@sckott
Contributor

sckott commented Jun 14, 2017

thinking about this

sckott added a commit that referenced this issue Jun 14, 2017
add some helper fxns to zzz.r and add underscore versions to children and classification with egs
@sckott
Contributor

sckott commented Jun 14, 2017

@tomjwebb okay, reinstall like devtools::install_github("ropensci/worrms@changes")

and look at docs for wm_children and wm_classification - new fxns for those two (just to demo the concepts, then pkg wide later maybe) - and egs added

thoughts?

i don't want to break current functionality of fxns in pkg, so this makes it so that new functionality will be easy to find as they are on the same man pages as their sister fxn

@sckott
Contributor

sckott commented Aug 24, 2017

@tomjwebb thoughts?

@sckott
Contributor

sckott commented Aug 24, 2017

done, we can reopen or open new issue to discuss anything further related to this
