Include flags as request? #26

Closed
seltmann opened this issue Oct 31, 2015 · 4 comments

Comments

@seltmann
Hi,
I've been working with the iDigBio recordset data, comparing the corrected occurrences file against the raw file to clean up our georeferencing. I now have a new use case, where I am getting loads of plant data via ridigbio for modeling. It would be very helpful if I could get the recordset flags associated with each record. One of the first tasks when getting data together for modeling is to check that the lat/longs fall in the states and countries they should. You guys are already doing that!

Thanks again,
Katja

@mjcollin
Contributor

mjcollin commented Nov 2, 2015

I think the last release of ridigbio came out before the flags were in the API. Regardless, the flags are multivalued, and there's code that skips multivalued fields when building the returned data frame.

What kind of data structure would you want to see? A column in the df that contained character vectors? Is the indexing syntax for that intuitive for people? For instance I decided that the dot syntax R uses is confusing but I still made two columns for "geopoint.lat" and "geopoint.lon" from the nested JSON structure because that is the delimiter we use in our API.
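To make the "column of character vectors" option concrete, here is a minimal base R sketch of what indexing such a column looks like. The uuids and the `example_flag` value are made-up placeholders, and the data frame is built by hand rather than returned by ridigbio.

```r
# Hypothetical records; uuids and "example_flag" are placeholders for illustration.
df <- data.frame(uuid = c("rec-1", "rec-2", "rec-3"), stringsAsFactors = FALSE)

# Assigning a list with one element per row creates a list column:
# each row holds its own character vector of flags.
df$flags <- list(
  c("example_flag", "rev_geocode_mismatch"),
  character(0),
  "example_flag"
)

first_flags <- df$flags[[1]]       # all flags for the first record
second_flag <- df$flags[[1]][[2]]  # second flag of the first record
n_flags     <- lengths(df$flags)   # number of flags per record
```

The double-bracket indexing is the part that may be unfamiliar: the first `[[ ]]` selects the row's vector, the second selects a flag within it.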

@seltmann
Author

seltmann commented Nov 2, 2015

I think the dot syntax is understandable, although it is not apparent why these fields don't use the dwc names (decimalLatitude, decimalLongitude) like the other fields. I understand that it has to do with the iDigBio data structure, but that's as far as my understanding goes (yet perhaps that is far enough?). I also think that having them as separate columns is better than nested JSON, especially since those are very important, commonly used fields.

For error information, I am not certain what would be best for the return. Here is what I was thinking:

It would be important to know clearly 1) whether a flag exists, and 2) the corrected value for that record.
It would also be important to be able to include or exclude flags in the result set, based on passed parameters.

@mjcollin
Contributor

It looks like I already fixed up the returned data.frame to support multivalued fields. If I understand your use case, you want the indexed lat/lon, whether there is a flag for a fix to the lat/lon, and then the original lat/lon.

In discussing this with Alex, he said that the flag rev_geocode_mismatch would tell you whether we decided there was a problem with the geocoding. The criterion for deciding that there is a problem is whether reverse geocoding matches the given country. So this would give you what you are looking for:

df <- idig_search_records(rq=list("genus"="acer", "flags"="rev_geocode_mismatch"), fields=c("uuid", "flags", "geopoint", "data.dwc:decimalLongitude", "data.dwc:decimalLatitude"), limit=10)

You can then look at flags with some syntax like df$flags[[1]][[1]]. The flags field contains a list of character vectors.
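As a rough sketch of working with the result, the snippet below turns the list column into a per-record logical and compares the indexed latitude against the raw one. It uses a hand-built data frame rather than a live API call; the values, uuids, and `example_flag` are made up, and the column names (`geopoint.lat`, `data.dwc:decimalLatitude`) are assumed to match what `idig_search_records` returns for the fields requested above.

```r
# Hypothetical stand-in for the search result; all values are made up.
df <- data.frame(
  uuid = c("rec-1", "rec-2"),
  geopoint.lat = c(35.1, -35.1),
  geopoint.lon = c(-80.2, -80.2),
  "data.dwc:decimalLatitude" = c("35.1", "35.1"),
  "data.dwc:decimalLongitude" = c("-80.2", "-80.2"),
  stringsAsFactors = FALSE,
  check.names = FALSE  # keep the colon in the raw field names
)
df$flags <- list("example_flag", c("rev_geocode_mismatch", "example_flag"))

# TRUE where the record carries the rev_geocode_mismatch flag.
df$has_mismatch <- vapply(
  df$flags,
  function(f) "rev_geocode_mismatch" %in% f,
  logical(1)
)

# Raw values are assumed to arrive as strings, so convert before comparing.
raw_lat <- as.numeric(df[["data.dwc:decimalLatitude"]])
df$lat_changed <- df$geopoint.lat != raw_lat

# Flagged records with indexed and raw latitude side by side.
flagged <- df[df$has_mismatch, c("uuid", "geopoint.lat", "data.dwc:decimalLatitude")]
```

Using `vapply` with `logical(1)` keeps the result a plain logical vector even when a record has no flags at all.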

Also, beware that we are working on iDigBio/idigbio-search-api#13

Please let me know if this doesn't meet your needs.

@mjcollin
Contributor

Alex has pushed a new flag (either a boolean or a meta flag) to indicate that the georeference has been "fixed" by us, which is more intuitive than "rev_geocode_mismatch". It will be in the beta API for a few days and then in production, maybe by Thanksgiving.
