what are best practices to designate multiple (living) animals observed at a place and time? #131

Open
naknomum opened this issue Mar 12, 2019 · 24 comments
Labels: answered; term - Organism (pertaining to a term organized in the Organism class)

Comments

@naknomum

example: herd of 5 zebras

critical here is that each animal has a known identity. would this be considered a single Occurrence, but represented by five Occurrence entries (sharing a common OccurrenceID) ... and would individualCount = 5 for each of these Occurrences?

it seems like this would introduce a great deal of redundancy (e.g. geo data, habitat, date/time, etc); but maybe that is just the cost of this level of detail.

or: does associatedOccurrences play into this, and each zebra should have their own Occurrence (with 5 unique OccurrenceIDs).

bonus question: does OrganismID refer to a specific instance of the animal? i.e. would each zebra above have its own OrganismID which could be referenced across multiple Occurrences over its lifetime?

thanks!

@Jegelewicz

bonus question: does OrganismID refer to a specific instance of the animal? i.e. would each zebra above have its own OrganismID which could be referenced across multiple Occurrences over its lifetime?

YES. We are currently wrangling with this related to Mexican wolves. These wolves have individual IDs - The Mexican Wolf Studbook number. This is currently an Other ID, with a url that will find all of the cataloged instances that include the ID.

image

If you click the Mexican Wolf Studbook Number, you get all instances of that wolf:

image

What we would like to see is that Studbook Number BE the organism ID. We have a meeting tomorrow here at MSB, but I was going to propose this in an issue later this week.....

@Jegelewicz

Jegelewicz commented Mar 12, 2019

example: herd of 5 zebras

critical here is that each animal has a known identity. would this be considered a single Occurrence, but represented by five Occurrence entries (sharing a common OccurrenceID) ... and would individualCount = 5 for each of these Occurrences?

it seems like this would introduce a great deal of redundancy (e.g. geo data, habitat, date/time, etc); but maybe that is just the cost of this level of detail.

or: does associatedOccurrences play into this, and each zebra should have their own Occurrence (with 5 unique OccurrenceIDs).

My take on this is that you would have 5 occurrences (each with a count of 1 and each should also include the organism ID - see above). They would be linked by a single collecting event, but they are 5 individual occurrences and each should be recorded separately.

IF these animals did not have individual IDs, my answer would change....
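
A minimal sketch of the layout described above, assuming plain Python dictionaries stand in for Darwin Core records. The term names (eventID, occurrenceID, organismID, individualCount, eventDate, scientificName) are Darwin Core terms; the identifier values, the date, the species name, and the helper `build_occurrences` are hypothetical and only illustrate the shape of the data.

```python
import uuid


def build_occurrences(event, organism_ids, scientific_name):
    """Return one occurrence per identified animal, all linked to the same event."""
    return [
        {
            "occurrenceID": str(uuid.uuid4()),  # unique per occurrence
            "eventID": event["eventID"],        # shared link to the single event
            "organismID": organism_id,          # identity of the individual animal
            "individualCount": 1,
            "scientificName": scientific_name,
        }
        for organism_id in organism_ids
    ]


# Shared details (date, place) are recorded once on the event (hypothetical values).
event = {
    "eventID": "survey-stop-001",
    "eventDate": "2019-03-12",
    "locality": "hypothetical survey stop",
}

# Hypothetical identifiers for the five known zebras.
zebra_ids = [f"zebra-{n}" for n in range(1, 6)]

occurrences = build_occurrences(event, zebra_ids, "Equus quagga")
```

In this arrangement the date and location are stored once on the event rather than repeated on each of the five occurrences, which may ease the redundancy concern raised in the original question.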

@naknomum
Author

thanks for the great insights, teresa. i am working on refactoring the GBIF support on our project Wildbook, so am trying to wrap my head around the best way to map to darwin core. wildbook centers around computer-vision to detect and identify animals in images. thus we have a lot of nesting and clustering to our data -- for example 10 photos might be taken of the same 50 zebras on a single stop during a survey. i had not considered connecting this via a collection event, but that makes sense. definitely will check out your mexican wolf project!

@MattBlissett
Member

GBIF will recognize the organismID. When it's present, the occurrence page will add a link to a search for other occurrences with the same organismID. Example near the bottom of https://www.gbif.org/occurrence/1494129928

That search is restricted to the same dataset, but it doesn't have to be. I think that's currently done because we have a lot of organismIDs like 10 that would otherwise cause likely-incorrect matches. (Or even 1216 like the wolf above, which matches 15 plants). Some form of id that's unlikely to be used for other occurrences is best, such as UUIDs or identifiers with namespaces.
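
As a rough illustration of the two suggestions above (UUIDs, or identifiers with namespaces), here is a sketch using Python's standard `uuid` module. The namespace URL, the prefix, and the function names are assumptions made up for the example, not anything GBIF prescribes.

```python
import uuid

# Hypothetical namespace for one dataset; any stable name the publisher controls would do.
DATASET_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.org/datasets/studbook")


def random_organism_id():
    """Random (version 4) UUID: unlikely to collide, but must be stored once minted."""
    return str(uuid.uuid4())


def namespaced_organism_id(local_id):
    """Deterministic (version 5) UUID derived from the dataset namespace and a local id."""
    return str(uuid.uuid5(DATASET_NAMESPACE, str(local_id)))


def prefixed_organism_id(prefix, local_id):
    """Human-readable alternative: keep the local id but qualify it with a prefix."""
    return f"{prefix}:{local_id}"


# A bare local id such as 1216 becomes distinct from the same number used elsewhere.
wolf_uuid = namespaced_organism_id(1216)
wolf_prefixed = prefixed_organism_id("example-studbook", 1216)
```

The deterministic variant has the practical advantage that the same local id always yields the same identifier, so it can be regenerated rather than looked up.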

@baskaufs

The intention of creating the Organism class was primarily to allow for re-sampling of the same biological organism over time. It allowed for multiple biological organisms to be included in a single organism instance primarily to handle cases where it wasn't possible to know whether what was being observed was a single biological organism (i.e. a coral head or clump of moss), or where it was convenient to track a taxonomically homogeneous multi-organism entity over time. The example of wolf packs or herds was given because those were taxonomically homogeneous entities where there was a precedent for assigning them identifiers and tracking them over time.

I don't think the intention was as a way to simplify record-keeping when multiple biological organisms were observed at once and were distinguishable. Whether to track many similar occurrences separately is a practical matter. In theory, a radio-tracked flying bird could have one occurrence (or more) recorded per second, but it probably wouldn't make sense to report all of those occurrences to GBIF. I would say that what is done in practice depends on the data creator and aggregator.

If you can distinguish between individual biological organisms, I'd assign them separate organismIDs and track them separately. I'm assuming that you wouldn't bother distinguishing among them unless there were some benefit to maintaining separate records for them. Whether you report every occurrence of every organism is a practical matter that would depend on what you want to do and what kind of information GBIF or other aggregators want to receive.

@Jegelewicz

That search is restricted to the same dataset, but it doesn't have to be. I think that's currently done because we have a lot of organismIDs like 10 that would otherwise cause likely-incorrect matches. (Or even 1216 like the wolf above, which matches 15 plants). Some form of id that's unlikely to be used for other occurrences is best, such as UUIDs or identifiers with namespaces.

Where is the community discussion on minting globally unique biological organism identifiers? I am in desperate need of this...

@dagendresen

dagendresen commented Mar 13, 2019

@albenson-usgs

@Jegelewicz I use the R package UUID to create globally unique identifiers.

@Jegelewicz

Jegelewicz commented Mar 13, 2019

So the stable URIs are fine and dandy (we use them at Arctos), except when the skin of an organism resides at one institution and the skeleton at another. Whose "GUID" wins? But more importantly, how does anyone know they are related?

And then, what happens when an object moves from one institution to another?

Sorry - I've dragged this thread off its original topic. I'll create a new issue soon.

@tucotuco
Member

tucotuco commented Mar 13, 2019 via email

@Jegelewicz

Jegelewicz commented Mar 13, 2019

What about a service layer that sits above currently published data that allows one to register organisms, mint compliant IDs for them, and associate published records with them?

Yes, please! Wanna start a business?

@tucotuco
Member

Have one just for that sort of thing. Just need someone(s) to write the check(s).

@peterdesmet
Member

Pitching in to say that a group of us have started to write guidelines for how to express biologging data in Darwin Core at https://github.com/tdwg/dwc-for-biologging/wiki Those data rely on organismID to string occurrences of the tracked animal together. In the GBIF page referenced by @MattBlissett I’ve used the metal ring code of the animal, which is the closest to a unique identifier we have (across datasets), but our guidelines currently suggest something in the form of “urn:catalog:otn:Dalhousie:NSBS:Brandy” to make it globally unique. That is in the absence of a global registry of course.
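
To illustrate how organismID can string occurrences of a tracked animal together, here is a small sketch that groups occurrence records by organismID and orders each group by eventDate. The first identifier is the example quoted above; the second identifier, the dates, the coordinates, and the function name `tracks_by_organism` are hypothetical.

```python
from collections import defaultdict


def tracks_by_organism(occurrences):
    """Group occurrence records by organismID and sort each group by eventDate.

    Assumes eventDate values are ISO 8601 strings in one consistent format,
    so that plain string sorting matches chronological order.
    """
    tracks = defaultdict(list)
    for occurrence in occurrences:
        tracks[occurrence["organismID"]].append(occurrence)
    for track in tracks.values():
        track.sort(key=lambda occurrence: occurrence["eventDate"])
    return dict(tracks)


BRANDY = "urn:catalog:otn:Dalhousie:NSBS:Brandy"
OTHER = "urn:catalog:example:other-animal"  # hypothetical second animal

occurrences = [
    {"organismID": BRANDY, "eventDate": "2019-03-02T10:00:00Z", "decimalLatitude": 44.6, "decimalLongitude": -63.5},
    {"organismID": OTHER, "eventDate": "2019-03-01T09:30:00Z", "decimalLatitude": 44.7, "decimalLongitude": -63.6},
    {"organismID": BRANDY, "eventDate": "2019-03-01T08:00:00Z", "decimalLatitude": 44.5, "decimalLongitude": -63.4},
]

tracks = tracks_by_organism(occurrences)
```

Each value in `tracks` is then the time-ordered sequence of positions for one animal, which only works across datasets if the organismID is unique beyond its source dataset.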

@naknomum
Author

personally, dont mind at all that this thread has turned (a bit) to the problem of persistent identifiers for individual organisms. i have not (yet!) discovered the community discussing this, but maybe we are close right here, ha.

i have been wrestling with this problem for several years now (luckily back-burner; but it needs a solution eventually). i started to brainstorm how one might allow for registering (and, perhaps more importantly resolving overlap) animal ids, based roughly on revision control using github, and even got so bold to register a domain for it a few years ago! ha. https://ioreg.id/

i wonder: where should we take this discussion next?

@Jegelewicz

WOW! @naknomum good thinking! Take it to the iDigBio conference at Yale? SPNHC in Chicago? Or work with @tucotuco and let's get this registry started!

Sometimes all it takes is for the tool to exist when it is something people really need.

I was actually wondering how your data would easily link up with the data in a museum collection if one of your study zebras wound up as a specimen....

@naknomum
Author

thanks for the enthusiasm, @Jegelewicz ... i actually went to the very first idigbio (ann arbor); am considering the yale one. didnt go for this concept, but just my work with wildbook in general.

hypothetically, if a zebra ended up as a specimen, the id could follow the zebra. my personal proposal (very much work-in-progress, if i havent made that clear, heh) is that the id registry would be agnostic to how matches were made, and would mostly be a way to reference (outside) documentation establishing identity. this would necessarily allow for many "merges" (at least thats what we refer to them internally -- splits and merges). that is to say, if your (hypothetical) db was referencing zebra (internal ID) Z-123 and mine had a zebra called M-890, lets say we both registered these zebras independently as two different IDs via ioreg.id ... later, if one of us would discover we were talking about the same zebra, we could make a note of this and propagate it to ioreg.id -- at this point, the other group could (should) be notified and adjust their (external) id accordingly -- two ids merge to one. my intention was to use git (and github specifically) to do this: (a) for its ability to track revisions to large data structures; (b) it would effectively be a free home to store this (relatively slow-changing) info. thats the elevator pitch, if not too confusing. maybe i need to update my technical document someday? haha https://github.com/IOReg/root
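
The merge idea described above can be sketched in a few lines. This is only an in-memory illustration under stated assumptions, not the IOReg design (which the comment describes as git-based): the class name `IdRegistry` and its methods are hypothetical, and splits are not handled.

```python
class IdRegistry:
    """Toy registry in which two independently registered ids can later be merged."""

    def __init__(self):
        self._parent = {}      # id -> id it was merged into (or itself if canonical)
        self.merge_log = []    # (retired_id, kept_id, evidence) for each merge

    def register(self, organism_id):
        """Register an id; registering the same id twice has no effect."""
        self._parent.setdefault(organism_id, organism_id)

    def resolve(self, organism_id):
        """Follow merge links until the canonical id is reached."""
        current = organism_id
        while self._parent[current] != current:
            current = self._parent[current]
        return current

    def merge(self, keep_id, retire_id, evidence):
        """Record that two ids refer to the same animal, pointing to outside evidence."""
        keep_root = self.resolve(keep_id)
        retire_root = self.resolve(retire_id)
        if keep_root != retire_root:
            self._parent[retire_root] = keep_root
            self.merge_log.append((retire_root, keep_root, evidence))
        return keep_root


registry = IdRegistry()
registry.register("Z-123")
registry.register("M-890")

# Later, one group discovers both ids describe the same zebra.
registry.merge("Z-123", "M-890", evidence="reference to outside documentation")
canonical = registry.resolve("M-890")
```

The registry stays agnostic about how the match was made: it stores only the link and a pointer to the evidence, and anyone holding the retired id can still resolve it to the surviving one.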

@naknomum
Author

incidentally, on the topic of conferences, i am currently at the citizen science conference, and there is no doubt some audience for this here -- hoping to find them. i will be at the data & metadata working group, which always is interesting, and inevitably brings up gbif.

@baskaufs

You might be interested in Baskauf, S and CO Webb (2016) Darwin-SW: Darwin Core-based terms for expressing biodiversity data as RDF. Semantic Web Journal 7:629-643.
http://dx.doi.org/10.3233/SW-150203 (open access at http://bit.ly/2dG85b5). Darwin-SW was specifically designed to handle the kind of complicated situations involving multiple specimens derived from the same organism, "duplicates", repeated observations, etc. However, it presupposes using Linked Data, which hasn't really gotten much traction yet in our community. Darwin-SW uses Darwin Core terms, but has no official status with TDWG - it's just something Cam Webb and I dreamed up after some extensive discussion on the tdwg-content email list about ten years ago. It's been successfully implemented using real data and there are about a million RDF triples at https://sparql.vanderbilt.edu/ where one can run queries on the data. See also the blog posts http://baskauf.blogspot.com/2016/11/guid-o-matic-meets-dwc-rdf-octopus.html and http://baskauf.blogspot.com/2016/11/fixing-octopus.html which talk specifically about connecting derived resources. We can talk more off-list if this interests you.

@naknomum
Author

wow, fantastic info @baskaufs -- thanks for the links. i had no idea this original question would yield such interesting leads....

@naknomum
Author

naknomum commented Mar 14, 2019

fwiw, i have updated the README on my IOReg repo. i definitely need to do my homework and read a lot of the suggested links on this thread, so i can rethink some of my ideas there.

@debpaul
Contributor

debpaul commented Mar 14, 2019

Hello all, hm. I do see that the original thread from @naknomum seemed to be about tracking living organisms (am I right)? Then @Jegelewicz added the Wolf example (which I think are now museum specimens). Is this correct? Thanks @baskaufs for explaining OrganismID purpose and intended use. And yes to all - we still need a way to do coordinated-streamlined specimen identification in support of linked-data. Do you see the ID needs for the living specimens as a separate issue from the museum specimens? Parallel? Identical? Different?

@Jegelewicz

Do you see the ID needs for the living specimens as a separate issue from the museum specimens? Parallel? Identical? Different?

Definitely NOT separate, perhaps not identical, so maybe parallel?

In some cases, zoos sort of do this with studbooks, although they aren't GUIDs, they are pretty stable, zoo people know what studbook numbers refer to, and studbooks are managed over the entire course of any recovery program. The data isn't so public and possibly gets lost at the end of a program? We need a zoo person in this conversation....

@naknomum
Author

i happen to be in raleigh for the citsci conference i had previously mentioned. having seen her in person, i can vouch that stumpy kept on being stumpy after she died and was moved there. 😃

but these are great questions. and what of an organism which is divided up to multiple exhibits?

@baskaufs

From 2009 to 2011 there was a somewhat punishing tdwg-content thread that hashed over a number of the issues that have come up again here in this issue. At the time, there was a complaint that such email threads were counterproductive because they were forgotten and never summarized. So I summarized it for posterity. Here's that summary: https://code.google.com/archive/p/darwin-sw/wikis/TdwgContentEmailSummary.wiki

It is not for the faint of heart, but it includes discussions about whether members of a proposed Organism class (which did not yet exist at that time and was referred to as the Individual class in the discussion) could be dead, what was the scope of an Organism/Individual, how they are related to Occurrences, what they are for, etc., etc.

There were several outcomes from that discussion. One was the chartering of an RDF Task Group, which eventually produced the DwC RDF Guide. Another was changes to the definitions of the classes in Darwin Core, clarifying them and deprecating the old DwC Type terms which somewhat duplicated the class terms. (You can see the changes in the definition of Occurrence by checking out http://rs-test.tdwg.org/dwc/terms/version/Occurrence-2014-10-23 and following the "Replaces" links.) Another outcome was the development of Darwin-SW, which was something that Cam Webb and I just decided to do to see if it could be done.

That discussion shaped a significant part of what is currently Darwin Core, so anyone who is crazy enough can read through the whole thing.

@tucotuco added the labels "answered" and "term - Organism" (pertaining to a term organized in the Organism class) on Jul 10, 2019