what are best practices to designate multiple (living) animals observed at a place and time? #131
Comments
YES. We are currently wrestling with this for Mexican wolves. These wolves have individual IDs: the Mexican Wolf Studbook number. This is currently an Other ID, with a URL that will find all of the cataloged instances that include the ID. If you click the Mexican Wolf Studbook Number, you get all instances of that wolf. What we would like to see is that the Studbook Number BE the organism ID. We have a meeting tomorrow here at MSB, but I was going to propose this in an issue later this week... |
My take on this is that you would have 5 occurrences (each with a count of 1 and each should also include the organism ID - see above). They would be linked by a single collecting event, but they are 5 individual occurrences and each should be recorded separately. IF these animals did not have individual IDs, my answer would change.... |
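A minimal sketch of the layout described above, assuming a flat CSV-style export: five Occurrence rows, each with individualCount of 1 and its own organismID, all sharing one eventID. The Darwin Core term names are real; every value (the event ID, the date, the coordinates, the zebra IDs) and the helper name build_occurrences are made up for illustration.

```python
import csv
import io

# Fields shared by every animal seen at this stop (values are invented).
event = {
    "eventID": "event-001",
    "eventDate": "2019-03-12",
    "decimalLatitude": "0.5",
    "decimalLongitude": "36.9",
}

# Hypothetical per-animal identifiers; a studbook number would go here.
organism_ids = ["zebra-A", "zebra-B", "zebra-C", "zebra-D", "zebra-E"]


def build_occurrences(event, organism_ids):
    """Return one Occurrence row per identified animal, linked by eventID."""
    rows = []
    for n, organism_id in enumerate(organism_ids, start=1):
        row = {
            "occurrenceID": f"{event['eventID']}-occ-{n}",
            "organismID": organism_id,
            "individualCount": 1,
        }
        row.update(event)  # repeat the shared event fields on every row
        rows.append(row)
    return rows


def to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


occurrences = build_occurrences(event, organism_ids)
print(to_csv(occurrences))
```

The repeated date and coordinates are the redundancy raised in the original question; in this sketch the shared eventID is what ties the five rows back to one stop.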
thanks for the great insights, teresa. i am working on refactoring the GBIF support on our project Wildbook, so am trying to wrap my head around the best way to map to darwin core. wildbook centers around computer-vision to detect and identify animals in images. thus we have a lot of nesting and clustering to our data -- for example 10 photos might be taken of the same 50 zebras on a single stop during a survey. i had not considered connecting this via a collection event, but that makes sense. definitely will check out your mexican wolf project! |
GBIF will recognize the That search is restricted to the same dataset, but it doesn't have to be. I think that's currently done because we have a lot of organismIDs like |
The intention of creating the Organism class was primarily to allow for re-sampling of the same biological organism over time. It allowed for multiple biological organisms to be included in a single organism instance primarily to handle cases where it wasn't possible to know whether what was being observed was a single biological organism (i.e. a coral head or clump of moss), or where it was convenient to track a taxonomically homogeneous multi-organism entity over time. The example of wolf packs or herds was given because those were taxonomically homogeneous entities where there was a precedent for assigning them identifiers and tracking them over time. I don't think the intention was as a way to simplify record-keeping when multiple biological organisms were observed at once and were distinguishable. Whether to track many similar occurrences separately is a practical matter. In theory, a radio-tracked flying bird could have one occurrence (or more) recorded per second, but it probably wouldn't make sense to report all of those occurrences to GBIF. I would say that what is done in practice depends on the data creator and aggregator. If you can distinguish between individual biological organisms, I'd assign them separate organismIDs and track them separately. I'm assuming that you wouldn't bother distinguishing among them unless there were some benefit to maintaining separate records for them. Whether you report every occurrence of every organism is a practical matter that would depend on what you want to do and what kind of information GBIF or other aggregators want to receive. |
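To make the re-sampling idea concrete, here is a small sketch that groups Occurrence rows by organismID and orders each group by date, so one organism's sightings read as a history. The rows and the function name sightings_by_organism are invented for illustration, and it assumes eventDate holds plain ISO 8601 dates.

```python
from collections import defaultdict

# Invented rows: zebra-A is seen twice, zebra-B once.
occurrence_rows = [
    {"occurrenceID": "occ-1", "organismID": "zebra-A", "eventDate": "2019-03-12"},
    {"occurrenceID": "occ-2", "organismID": "zebra-B", "eventDate": "2019-03-12"},
    {"occurrenceID": "occ-3", "organismID": "zebra-A", "eventDate": "2018-07-01"},
]


def sightings_by_organism(rows):
    """Map each organismID to its occurrences, oldest first."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["organismID"]].append(row)
    # ISO 8601 dates of the same precision sort chronologically as strings.
    return {
        organism_id: sorted(group, key=lambda r: r["eventDate"])
        for organism_id, group in grouped.items()
    }


history = sightings_by_organism(occurrence_rows)
for organism_id, sightings in history.items():
    print(organism_id, [s["eventDate"] for s in sightings])
```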
Where is the community discussion on minting globally unique biological organism identifiers? I am in desperate need of this... |
One interesting discussion on minting persisting identifiers is happening here: |
@Jegelewicz I use the R package UUID to create globally unique identifiers. |
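For readers not working in R, roughly the same idea can be sketched with Python's standard uuid module. The function name mint_organism_id is hypothetical, and choosing the URN form of the UUID is an assumption rather than anything recommended in this thread.

```python
import uuid


def mint_organism_id():
    """Return a random (version 4) UUID in URN form, e.g. 'urn:uuid:...'."""
    return uuid.uuid4().urn


organism_id = mint_organism_id()
print(organism_id)
```

A random UUID is unique but carries no information on its own, so a separate record still has to say which animal it was assigned to.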
So the stable URIs are fine and dandy (we use them at Arctos), except when the skin of an organism resides at one institution and the skeleton at another. Whose "GUID" wins? But more importantly, how does anyone know they are related? And then, what happens when an object moves from one institution to another? Sorry - I've dragged this thread off its original topic. I'll create a new issue soon. |
Such an interesting discussion, getting at the fundamentals of the Organism
concept, identifiers, relationships, and how to deal with them at a global
level.
Maybe one blessing is that we don't really use organismIDs (in the sense of
persistent resolvable globally unique identifiers) as a community yet, so
that if we get on with incredibly useful solutions soon, we can maybe avoid
huge problems later. What about a service layer that sits above currently
published data that allows one to register organisms, mint compliant IDs for
them, and associate published records with them?
…On Wed, Mar 13, 2019 at 5:23 PM Teresa Mayfield-Meyer < ***@***.***> wrote:
So the stable URIs are fine and dandy (we use them at Arctos), except when
the skin of an organism resides at one institution and the skeleton at
another. Whose "GUID" wins? But more importantly, how does anyone know they
are related?
And then, what happens when an object moves from one institution to
another?
|
Yes, please! Wanna start a business? |
Have one just for that sort of thing. Just need someone(s) to write the check(s). |
Pitching in to say that a group of us have started to write guidelines for how to express biologging data in Darwin Core at https://github.com/tdwg/dwc-for-biologging/wiki Those data rely on organismID to string occurrences of the tracked animal together. In the GBIF page referenced by @MattBlissett I’ve used the metal ring code of the animal, which is the closest to a unique identifier we have (across datasets), but our guidelines currently suggest something in the form of “urn:catalog:otn:Dalhousie:NSBS:Brandy” to make it globally unique. That is in the absence of a global registry of course. |
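As a rough illustration of the identifier form quoted above, the sketch below joins segments under a "urn:catalog:" prefix. The helper make_catalog_urn is hypothetical, and what each segment stands for is not defined here; the biologging guidelines linked above are the place to check that.

```python
def make_catalog_urn(*parts):
    """Join non-empty segments into a 'urn:catalog:...' style identifier."""
    cleaned = [part.strip() for part in parts]
    if not cleaned or any(not part or ":" in part for part in cleaned):
        # A colon inside a segment would make the identifier ambiguous.
        raise ValueError("each segment must be non-empty and contain no ':'")
    return "urn:catalog:" + ":".join(cleaned)


organism_id = make_catalog_urn("otn", "Dalhousie", "NSBS", "Brandy")
print(organism_id)
```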
personally, dont mind at all that this thread has turned (a bit) to the problem of persistent identifiers for individual organisms. i have not (yet!) discovered the community discussing this, but maybe we are close right here, ha. i have been wrestling with this problem for several years now (luckily back-burner; but it needs a solution eventually). i started to brainstorm how one might allow for registering (and, perhaps more importantly resolving overlap) animal ids, based roughly on revision control using github, and even got so bold to register a domain for it a few years ago! ha. https://ioreg.id/ i wonder: where should we take this discussion next? |
WOW! @naknomum good thinking! Take it to the iDigBio conference at Yale? SPNHC in Chicago? Or work with @tucotuco and let's get this registry started! Sometimes all it takes is for the tool to exist when it is something people really need. I was actually wondering how your data would easily link up with the data in a museum collection if one of your study zebras wound up as a specimen.... |
thanks for the enthusiasm, @Jegelewicz ... i actually went to the very first idigbio (ann arbor); am considering the yale one. didnt go for this concept, but just my work with wildbook in general. hypothetically, if a zebra ended up as a specimen, the id could follow the zebra. my personal proposal (very much work-in-progress, if i havent made that clear, heh) is that the id registry would be agnostic to how matches were made, and would mostly be a way to reference (outside) documentation establishing identity. this would necessarily allow for many "merges" (at least thats what we refer to them internally -- splits and merges). that is to say, if your (hypothetical) db was referencing zebra (internal ID) Z-123 and mine had a zebra called M-890, lets say we both registered these zebras independently as two different IDs via ioreg.id ... later, if one of us would discover we were talking about the same zebra, we could make a note of this and propagate it to ioreg.id -- at this point, the other group could (should) be notified and adjust their (external) id accordingly -- two ids merge to one. my intention was to use git (and github specifically) to do this: (a) for its ability to track revisions to large data structures; (b) it would effectively be a free home to store this (relatively slow-changing) info. thats the elevator pitch, if not too confusing. maybe i need to update my technical document someday? haha https://github.com/IOReg/root |
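The merge step described above can be sketched in a few lines. This is only an in-memory illustration under assumed names (IdRegistry, register, merge, resolve); it is not how ioreg.id is implemented, it does not use git, and it leaves out splits and notifying the other group.

```python
class IdRegistry:
    """Toy registry in which two registered IDs can later be merged."""

    def __init__(self):
        self._parent = {}   # id -> id it was merged into (itself if current)
        self.merge_log = []  # (retired id, surviving id, evidence note)

    def register(self, identifier):
        """Add an ID; registering the same ID twice is harmless."""
        self._parent.setdefault(identifier, identifier)

    def resolve(self, identifier):
        """Follow merges until reaching the ID currently in use."""
        current = identifier
        while self._parent[current] != current:
            current = self._parent[current]
        return current

    def merge(self, keep, retire, evidence):
        """Record that 'retire' is the same animal as 'keep'."""
        keep_root = self.resolve(keep)
        retire_root = self.resolve(retire)
        if keep_root != retire_root:
            self._parent[retire_root] = keep_root
            self.merge_log.append((retire_root, keep_root, evidence))


registry = IdRegistry()
registry.register("Z-123")
registry.register("M-890")
# The evidence string stands in for a pointer to outside documentation.
registry.merge("Z-123", "M-890", evidence="hypothetical photo match")
print(registry.resolve("M-890"))
```

Keeping the retired ID resolvable, rather than deleting it, means records published under M-890 can still be traced to the surviving ID.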
incidentally, on the topic of conferences, i am currently at the citizen science conference, and there is no doubt some audience for this here -- hoping to find them. i will be at the data & metadata working group, which always is interesting, and inevitably brings up gbif. |
You might be interested in Baskauf, S and CO Webb (2016) Darwin-SW: Darwin Core-based terms for expressing biodiversity data as RDF. Semantic Web Journal 7:629-643. |
wow, fantastic info @baskaufs -- thanks for the links. i had no idea this original question would yield such interesting leads.... |
fwiw, i have updated the README on my IOReg repo. i definitely need to do my homework and read a lot of the suggested links on this thread, so i can rethink some of my ideas there. |
Hello all, hm. I do see that the original thread from @naknomum seemed to be about tracking living organisms (am I right)? Then @Jegelewicz added the Wolf example (which I think are now museum specimens). Is this correct? Thanks @baskaufs for explaining OrganismID purpose and intended use. And yes to all - we still need a way to do coordinated-streamlined specimen identification in support of linked-data. Do you see the ID needs for the living specimens as a separate issue from the museum specimens? Parallel? Identical? Different? |
Definitely NOT separate, perhaps not identical, so maybe parallel? In some cases, zoos sort of do this with studbooks, although they aren't GUIDs, they are pretty stable, zoo people know what studbook numbers refer to, and studbooks are managed over the entire course of any recovery program. The data isn't so public and possibly gets lost at the end of a program? We need a zoo person in this conversation.... |
i happen to be in raleigh for the citsci conference i had previously mentioned. having seen her in person, i can vouch that stumpy kept on being stumpy after she died and was moved there. 😃 but these are great questions. and what of an organism which is divided up to multiple exhibits? |
From 2009 to 2011 there was a somewhat punishing tdwg-content thread that hashed over a number of the issues that have come up again here in this issue. At the time, there was a complaint that such email threads were counterproductive because they were forgotten and never summarized. So I summarized it for posterity. Here's that summary: https://code.google.com/archive/p/darwin-sw/wikis/TdwgContentEmailSummary.wiki It is not for the faint of heart, but it includes discussions about whether members of a proposed Organism class (which did not yet exist at that time and was referred to as the Individual class in the discussion) could be dead, what was the scope of an Organism/Individual, how they are related to Occurrences, what they are for, etc. There were several outcomes from that discussion. One was the chartering of an RDF Task Group, which eventually produced the DwC RDF Guide. Another was changes to the definitions of the classes in Darwin Core, clarifying them and deprecating the old DwC Type terms which somewhat duplicated the class terms. (You can see the changes in the definition of Occurrence by checking out http://rs-test.tdwg.org/dwc/terms/version/Occurrence-2014-10-23 and following the "Replaces" links.) Another outcome was the development of Darwin-SW, which was something that Cam Webb and I just decided to do to see if it could be done. That discussion shaped a significant part of what is currently Darwin Core, so anyone who is crazy enough can read through the whole thing. |
example: herd of 5 zebras
critical here is that each animal has a known identity. would this be considered a single Occurrence, but represented by five Occurrence entries (sharing a common OccurrenceID) ... and would individualCount = 5 for each of these Occurrences?
it seems like this would introduce a great deal of redundancy (e.g. geo data, habitat, date/time, etc); but maybe that is just the cost of this level of detail.
or: does associatedOccurrences play into this, and each zebra should have their own Occurrence (with 5 unique OccurrenceIDs).
bonus question: does OrganismID refer to a specific instance of the animal? i.e. would each zebra above have its own OrganismID which could be referenced across multiple Occurrences over its lifetime?
thanks!