Home

Notes on design:

Function names:

camelCase (insert debate here, but camelCase and period.separator appear popular: http://stackoverflow.com/questions/1944910/what-is-your-preferred-style-for-naming-variables-in-r)
Prefix API endpoints with "idig", e.g. "idigSearch". This gives a 1:1 correspondence between the API and the R methods.
Add methods that are language-specific by appending a suffix to the endpoint name, e.g. "idigSearchGetAll".

Query Objects:

Let's do objects, but let's make people generate valid iDigBio query JSON for now. That way everyone's code will already be making query objects, and if we want to make building query objects easier later, we won't have to re-spec the idigSearch method to take a query object instead of a list/JSON. For example:

query <- idigQuery(json='{"family": ["asteraceae","fagaceae"]}')
query2 <- idigQuery(list=list(family=list("asteraceae","fagaceae")))
results <- idigSearch(query)

It's clunkier than I'd like, but it keeps things tied directly to the API, it is simple for us to implement (just call jsonlite::toJSON on the list), and if we want to add convenience methods for building queries, no existing user code will need to change. An alternative would be to skip the object for now and go with named params on the idigSearch method, which would also require no user code changes in the future:

results <- idigSearch(rq='json stuff')
# later let people write this
# results <- idigSearch(query=idigQuery(stuff))
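
A minimal sketch of what the idigQuery constructor described above might look like. It assumes the jsonlite package is installed; the "rq" slot name, the "idigQuery" class name, and the error messages are illustrative assumptions, not a settled design.

```r
library(jsonlite)

# Hypothetical constructor: accepts either raw JSON text or an R list and
# stores the query as a JSON string (the "rq" slot name is an assumption).
idigQuery <- function(json = NULL, list = NULL) {
  # Exactly one of the two inputs must be supplied.
  if (is.null(json) == is.null(list)) {
    stop("supply exactly one of 'json' or 'list'")
  }
  if (!is.null(list)) {
    # auto_unbox = TRUE turns length-1 vectors into JSON scalars, so
    # list("asteraceae", "fagaceae") serializes as an array of strings.
    json <- as.character(toJSON(list, auto_unbox = TRUE))
  } else if (!validate(json)) {
    stop("'json' is not valid JSON")
  }
  structure(base::list(rq = json), class = "idigQuery")
}

query <- idigQuery(json = '{"family": ["asteraceae","fagaceae"]}')
query2 <- idigQuery(list = list(family = list("asteraceae", "fagaceae")))
```

Because both inputs end up as the same JSON string inside the object, convenience builders could be added later without changing code that already passes a query object around.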

I wrote the below before changing my mind: I think I want to do a query object. Nested lists are just going to be trouble. The object can have a "fromJson" method that just takes JSON text, for those who are chaining APIs or who just want to write JSON. Otherwise, a 1:1 matching of https://github.com/iDigBio/idigbio-search-api/blob/master/app/lib/query-shim.js is probably best. Also update the iDigBio Query format documentation to include the query shim methods, so that there is a 1:1:1 correspondence between a query format snippet, a query shim method, and an R query object method.

Results returned:

Rows named "1", "2", etc
Place UUID in row as a column, always present, user can't turn off
Allow users to specify column names, e.g. pull dwc:country from the data list and county from the indexTerms list. Why? I imagine at some point indexTerms will hold cleaned data and data will hold the original values, so this lets people choose. It also means the API returns don't have to be modified now with fancy logic to drop some but not all namespace prefixes in the data list. It is more work on my side to concatenate indexTerms and data terms, though. I assume indexed terms will continue to be un-namespaced.
Alter API to take parameter "fields" which is just a flat list of terms either from indexTerms list or data list and return only those fields.
No "fields" parameter means return a default set. Proposed: occurrence ID, institution code, collection code, catalog number, genus, species, scientific name, date collected, lat, long. That is enough for most people and skips the verbatim and text fields, which are chunky. (Only slightly afraid of people thinking this means that this is all iDigBio has in it...)
Support fields=all to return everything known.
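
The result layout above could be sketched roughly as follows. The record shape (a list with uuid, indexTerms, and data), the sample values, and the helper names getField and recordsToDataFrame are assumptions made for illustration. The sketch also does the field selection client-side, whereas the notes propose doing it in the API.

```r
# Hypothetical records: un-namespaced terms under indexTerms,
# namespaced original terms under data. Values are made up.
records <- list(
  list(uuid = "a1",
       indexTerms = list(genus = "quercus", county = "alachua"),
       data = list(`dwc:country` = "United States", `dwc:genus` = "Quercus")),
  list(uuid = "b2",
       indexTerms = list(genus = "acer"),
       data = list(`dwc:country` = "Canada"))
)

# Look a field up in indexTerms first, then in data; NA if absent.
# Assumes every stored value is a single scalar.
getField <- function(record, field) {
  if (!is.null(record$indexTerms[[field]])) return(record$indexTerms[[field]])
  if (!is.null(record$data[[field]])) return(record$data[[field]])
  NA
}

# Build a data frame with uuid always present, followed by the
# requested fields in the order given.
recordsToDataFrame <- function(records, fields) {
  df <- data.frame(
    uuid = vapply(records, function(r) r$uuid, character(1)),
    stringsAsFactors = FALSE
  )
  for (field in setdiff(fields, "uuid")) {
    # Assigning with [[<- keeps names such as "dwc:country" unmangled.
    df[[field]] <- vapply(
      records,
      function(r) as.character(getField(r, field)),
      character(1)
    )
  }
  df
}

df <- recordsToDataFrame(records, c("dwc:country", "county"))
```

Since namespaced names only occur in the data list and un-namespaced names only in indexTerms, the field name alone decides which list a column comes from.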

Tests:

Yes. Look at Francois's stuff for JSON definitions between Python and R.
