Do we need directly-attached stable part identifiers? #3630

Jegelewicz · 2021-06-02T16:13:07Z

Picking a specific part out of the pile is a lot of work. Once that happens, you can give it a unique identifier so you don't ever have to deal with any ambiguity again, or don't, and dig it out again the next time you need it.

Barcodes, real or otherwise, serve nicely as unique identifiers. In this case, giving everything a barcode would mean you can add the attributes later, gives you a super-easy way to eventually add to the loan, and probably provides a pathway to whatever you mean by "loan subsamples in the same virtual container."

(I've been wondering if we need directly-attached stable part IDs for a while, and maybe we do - new Issue - but they're not available NOW and barcodes are.)

Originally posted by @dustymc in #3627 (comment)

dustymc · 2021-06-02T16:24:40Z

Thanks @Jegelewicz

Yay: every part could have a way of being uniquely identified, I could avoid some semi-expensive joins, having multiple parts in a "base" container wouldn't necessarily mean they can't still be individually identified.

Maybe not-so-yay: Barcodes are used for lots of things in addition to part IDs, this could be confusing when those are separated (or unstable - so useful only at limited scale - if that's somehow synchronized/maintained).

The "workaround" is containers which exist only for the purposes of serving as part identifiers (for which I'd recommend a dedicated container type). That's nice because it fits into all existing workflows and requires no development, but it also requires some setup (getting the parts into the containers). Perhaps some of that could somehow be automated.

campmlc · 2021-06-02T16:44:25Z

So we could auto assign all parts to a virtual container with an autogenerated stable identifier?


dustymc · 2021-06-02T16:52:12Z

So we could auto assign all parts to a virtual container

Yes.

autogenerated

No, but maybe we could fake it by semi-automating containering uncontainerized parts in your collection(s) or something.

stable

That's up to you, but nothing in Arctos would change your containers (same as any other containers).

Jegelewicz · 2021-06-02T17:10:57Z

autogenerated
No, but maybe we could fake it by semi-automating containering uncontainerized parts in your collection(s) or something.

stable
That's up to you, but nothing in Arctos would change your containers (same as any other containers).

I think we should lean toward autogeneration and stability. Isn't this a path to the ultimate "material sample" that GGBN seeks?

dustymc · 2021-06-02T17:33:14Z

autogeneration and stability.

That's probably an argument for something more specialized than containers. I think my concerns all center around usability, as above. Nothing good documentation can't bridge....

If we're going there, some sort of resolvable ID - URLs, ARKs, some short Arctos alternate URL that we could buy (and which I'd use for things like JSON), or whatever - would be cool.

material sample

The "DWC community" seems to remain at least partially convinced that institution_acronym + collection_cde can do something (it can't) so I'm not really holding my breath, but there is a materialSampleID (https://dwc.tdwg.org/terms/#materialSampleID) with a sane definition in the "core" (extension?? IDK, and IDK how to know!).

GGBN

GGBN apparently has their own thing (https://terms.tdwg.org/wiki/GGBN_Material_Sample_Vocabulary), it does NOT carry an ID (that I can find).

In either case, I believe there's at least the presumption of dependence - I don't think it could ever be "correct" to show DWC:MaterialSample data without also showing DWC:Occurrence data (but I'm not a DWCologist, maybe I'm not understanding something).

Arctos has no such inherent limitations, and it's common (at least in entomology) to just "cite" whatever's scribbled on the tube/pin/part no matter what else has been specified or agreed upon. This could be an opportunity for us to make "whatever's scribbled on the tube" something that browsers can use to get to the catalog record (or a subset of it). That comes back to the usability question - are CM's going to be able to use barcodes up to some point and then switch to "part IDs," or can we find a way to sync those so they don't have to (and what's that do for the possibility of buying pre-printed containers if so), or ???????????????????

campmlc · 2021-06-02T17:50:13Z

We have an incoming collection that wants to assign guids and separate part identifiers in the field at time of collection. They want to know if they can use their tissue identifiers for part barcodes. It would be ideal if we could somehow incorporate this, giving a stable material sample ID at collection, associated with a guid/url organism ID and occurrence ID . . . Right now the closest thing we have for this is barcodes, and they mostly work. But they are not / cannot be universally applied due to cost and resources. If Arctos could provide a list of stable part identifiers that could be downloaded and made into labels in advance and applied in the field, and linked to an organism ID, maybe we could bypass NK numbers and externally supplied barcodes?


dustymc · 2021-06-02T18:01:29Z

assign guids and separate part identifiers in the field at time of collection.

I haven't seen that work yet, but as long as they can keep their numbers straight it's not a problem for Arctos.

use their tissue identifiers for part barcodes

Sure, that's always been possible/recommended, and it (along with good procedures) might be an actual way to "keep their numbers straight."

they are not / cannot be universally applied due to cost and resources

You've lost me and that seems to conflict with above - explain, please.

If Arctos could provide a list of stable part identifiers

I'm still missing a big piece of the puzzle, but ARKs might be an easy way to get those.

If you'll settle for a bit less stable, grab a series of barcodes of whatever format you want and do WHATEVER with them - get them printed, print them yourself, attempt to transcribe them, ....

Jegelewicz · 2021-06-25T16:43:10Z

OK, this is probably crazy-talk because it occurred to me in the middle of the night, but Arctos is cataloging occurrences (identification at a place and time) NOT collections (I have this thing from this occurrence). @dustymc is always hammering home that we are not cataloging the "item of interest" and this is absolutely true. Almost every prospective institution asks about catalog numbers like 12345.1 so that they can track the various parts associated with some thing (usually a plant or animal, but other stuff too with catalog number 12345) that they are managing. Because we are so focused on the event, the parts are secondary in the system.

The problem comes from the fact that for the majority of our collection managers, the parts are really the focus, but we don't number/track them well. We have now created the "part identifier" attribute to get around this, but it only creates more work for collections.

Barcodes are great - but they apply to containers, not parts and I think we need to keep that distinction.

I think we need to look at MaterialSampleID:

In the absence of a persistent global unique identifier, construct one from a combination of identifiers in the record that will most closely make the materialSampleID globally unique.

Would it be possible to construct such an ID with the object url + Arctos part number? Or should parts be assigned a "GUID" equal to the Catalog record "GUID" + part number? (Perhaps we need both, one for humans, the other for machines)

If Arctos could do this for us in a way that makes it easy for us, that would be GREAT. I think the thing we need to figure out is what this "part number" should be. While the part number assigned in the parts code table is nice, it isn't known until the part is entered. How can we best accomplish this?
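The two options floated above (catalog record GUID + part number for humans, record URL + part number for machines) could be sketched roughly as follows. This is only an illustration: the function names, the `/guid/` URL path, and the `#part-N` fragment are assumptions made for the example, not anything Arctos currently provides.

```python
def part_guid(catalog_guid: str, part_number: int) -> str:
    """Human-readable form: catalog record GUID plus a part number."""
    return f"{catalog_guid}.{part_number}"


def part_url(base_url: str, catalog_guid: str, part_number: int) -> str:
    """Machine-oriented form: record URL plus a (hypothetical) part fragment."""
    return f"{base_url.rstrip('/')}/guid/{catalog_guid}#part-{part_number}"


human_id = part_guid("MSB:Mamm:5000", 1)
machine_id = part_url("https://arctos.database.museum/", "MSB:Mamm:5000", 1)
print(human_id)
print(machine_id)
```

Either form still depends on knowing the part number, so it does not answer the question of how to assign one before the part is entered.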

dustymc · 2021-06-25T17:32:30Z

Because we are so focused on the event

That's mostly "just UI" - Arctos is truly normalized, seeing it as a part management system (with catalog records as metadata) is a valid viewpoint. (So is seeing it as an event system, if you want to go there.)

but they apply to containers, not parts

Again, parts are 100% containers. The current level of container that can have an exposed identifier isn't in a 1:1 relationship with parts so I'm not suggesting what we have fully does what we should be doing, but there is always and inevitably a container that is in the correct relationship with parts, and it might serve this purpose (depending on what precisely that turns out to be).

We have now created the "part identifier" attribute

The origins of that are a case study in how to not do science. Strongly suggest just avoiding that situation in exploring how to move forward.

one for humans, the other for machines

I don't think there's anything exactly wrong with that, but it will inevitably get used in the wrong context so I'd rather avoid it.

what this "part number" should be.

We need to figure out what it DOES before we think about what it looks like. Eg deleting catalog records (==destroying GUIDs) is fairly difficult (it would be impossible if I had my way) because those are "citable" - minting them comes with some implicit (it would be explicit in my little fantasy world) promise that they'll be suitable for certain purposes, and that demands certain behavior from the creators. "Minting" UUIDs (or internal keys, etc.) is an act of convenience - once they've served whatever purpose they've been created to serve they can be deleted and nobody cares. I think the first question is, which of those situations is more analogous to what should be done here?

If that answer turns out to be what I think, the second question involves our ability to live without (or with limited access to) 'delete part' buttons.

KyndallH · 2021-06-25T21:25:15Z

@Jegelewicz Not crazy talk!

The problem comes from the fact that for the majority of our collection managers, the parts are really the focus, but we don't number/track them well.

YES!!!!!

what this "part number" should be.

We need to figure out what it DOES before we think about what it looks like.

It helps keep track of how parts are used! I want to see parts tied to the outside identifiers. Liver part --> Loan --> Project --> Publication --> GenBank etc. It wasn't the skull or postcranial or the kidney that led to all that extra data about that occurrence.

Jegelewicz · 2021-06-25T21:32:38Z

I think we are both pretty sure what the answer is and given @campmlc's comment

tie the GenBank number to the actual tissue part that was sampled - which means cataloging MaterialSamples

I think she does too. This also ties in with the Mexican Wolf scenarios and having events tied to the parts they came from.

In my mind right now the answer is that we are cataloging the wrong way. A basic catalog record only requires an identification and a locality but NO PART. How does that make sense when we are managing PARTS? It should be the other way around - I should be able to catalog a part with absolutely no other information because the most important thing in that moment is that I can find the part and match it up with all of the other information. OK, before anyone jumps on me, I realize that I can put unknown everywhere (even for part name) but it feels wrong. Not saying we can't train people to do it though.

Anyway, I think our problems mostly stem from putting too many parts in a single catalog record. If a part is important enough to have an associated GenBank sequence, maybe it needs its own catalog number. Because all the parts from an event can share an event, we should not be afraid to do this. And yes, it will require a new pricing strategy....

As @dustymc says - catalog the item of interest and apparently that is not Andalgalomys pearsoni dorbignyi but one of these -

And by the way, which of these ended up as these?

In case anyone is interested - this ties in with tdwg/dwc#314 (comment)

Jegelewicz · 2021-06-25T21:34:22Z

FWIW - our new entity module could help here...all the "organism" type attributes could go there and would not need to be re-created in every catalog record.

KyndallH · 2021-06-25T21:37:23Z

@Jegelewicz Another example! In a paper, the Arctos interns found two UAM no data bison cited. Yeah, they had no data but data has now been generated about those parts. Unfortunately, they were not cataloged and we don't know which is which. The part has the data but also continues to generate MORE data.

think our problems mostly stem from putting too many parts in a single catalog record

Why is this a problem?

part is important enough to have an associated GenBank sequence, maybe it needs its own catalog number. Because all the parts from an event can share an event, we should not be afraid to do this. And yes, it will require a new pricing strategy

That is going to go over like a lead balloon. GGBN has a very similar model where all parts are separated out.

dustymc · 2021-06-25T22:08:36Z

I want to see parts tied to the outside identifiers.

I think there are two components of that:

1. Something curatorial - we can't let students randomly delete parts, and also treat them as more than metadata of catalog records. I think that's about 99% social issue, but good implementations would certainly have a technical/UI/data aspect.
2. The identifier itself. I'm sure we can come up with something and find a place for it fairly easily. ARKs might be nice as values but they're not the only possibility. I think this is "details" - (1) is where I expect to find difficulties, but maybe nobody would complain about setting disposition rather than deleting.

A basic catalog record only requires an identification and a locality but NO PART. How does that make sense when we are managing PARTS?

That's but one use case. Catalog records are and always have been "whatever someone felt like cataloging." I don't see any realistic possibility of that changing, and I don't see much reason to attempt to change it. There are usability implications to cataloging a bucket of guppies or each of the 47 slices of liver, but sometimes reality (or tradition) ends up in strange places anyway. Mostly I'm just not sure why you'd want to juggle more data than you have to - this just doesn't make any sense to me.

If a part is important enough to have an associated GenBank sequence, maybe it needs its own catalog number.

Catalog numbers are special only because Curators have decided to treat them that way. A part identifier (assuming some decent design and curatorial commitment and all that jazz) can do ~everything a catalog number can do (and some other stuff), just issue them and change your loan agreements.

catalog the item of interest and apparently that is not Andalgalomys pearsoni dorbignyi

I think that's almost always the biological individual (where that's easy to define, anyway), and I don't think this one's any different - the focus is population-level stuff, the individual is representative (everyone hopes!) of that, the sample is just a way to get at characteristics of the individual. If nothing else, it's a lot easier to see that 27 methods all fail to reject that critter being a member of Andalgalomys pearsoni when those data are attached to a single data object.

our new entity module

....doesn't make any sense as a replacement for catalog records; it just doesn't have the structure to stand like that. It's not too late to ditch the thing and just let some new value in https://arctos.database.museum/info/ctDocumentation.cfm?table=ctcataloged_item_type define entities. (#1966 (comment))

Jegelewicz · 2021-06-25T22:20:51Z

That is going to go over like a lead balloon.

You've heard of these guys, right? https://en.wikipedia.org/wiki/Led_Zeppelin

Jegelewicz · 2021-06-25T22:24:53Z

Well I guess my answer to the question is YES

dustymc · 2021-06-25T23:26:26Z

heard of these guys

Some Plant guy, right? Must be botanists....

YES

I think I want to revise what I said earlier - there are two social issues:

1. Are Curators really willing to direct users to cite some part identifier; will anyone actually use this?
2. How do we add stability?

It looks like we can't get enough ARKs after all, so maybe instead of preemptively giving all parts an ID it's some sort of on-demand mechanism, and getting the IDs prevents deletion. That might also set up a path to various kinds of IDs, which could either

help mitigate (1) (eg by getting IDs that contain the catalog number) or
add a great deal of confusion (by introducing an infinite number of formats and moving control to whoever creates them)
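A minimal sketch of the "on-demand ID, and getting the ID prevents deletion" idea, assuming a simple in-memory model. The `Part` and `PartStore` classes, their method names, and the example identifier are all hypothetical and only illustrate the behavior being discussed, not any existing Arctos structure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Part:
    name: str
    stable_id: Optional[str] = None  # stays None until someone requests an ID

    def assign_id(self, identifier: str) -> str:
        """Attach an identifier on demand; once set it is never replaced."""
        if self.stable_id is None:
            self.stable_id = identifier
        return self.stable_id


class PartStore:
    def __init__(self) -> None:
        self.parts: list[Part] = []

    def add(self, part: Part) -> Part:
        self.parts.append(part)
        return part

    def delete(self, part: Part) -> None:
        """Refuse to delete a part once an ID has been issued for it."""
        if part.stable_id is not None:
            raise PermissionError("part has a stable ID; set a disposition instead")
        self.parts.remove(part)


store = PartStore()
liver = store.add(Part("liver"))
skull = store.add(Part("skull"))
liver.assign_id("MSB:Mamm:5000.1")  # example value only

store.delete(skull)  # fine: no ID was ever issued
try:
    store.delete(liver)
except PermissionError as err:
    print(err)
```

Because the identifier is passed in rather than generated, the same mechanism would accept whatever format a collection chooses, which is exactly where the format-proliferation risk comes from.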

Jegelewicz · 2021-06-25T23:44:27Z

Are Curators really willing to direct users to cite some part identifier; will anyone actually use this?

This is exactly why I said what I said. As long as a bunch of parts are floating around with a common "catalog number" we will never get stuff lined up (part used to create GenBank ID). Maybe nobody cares, but if two researchers borrow parts from the same "catalog number" and get different identifications, I would think it would start to matter.

Jegelewicz · 2021-06-25T23:46:00Z

help mitigate (1) (eg by getting IDs that contain the catalog number) or
add a great deal of confusion (by introducing an infinite number of formats and moving control to whoever creates them)

I mean, there is always the old standby:

MSB:Mamm:5000.1 for the first "request" and so on....but will it get cited correctly?

dustymc · 2021-06-26T00:16:03Z

if two researchers borrow parts from the same "catalog number" and get different identifications, I would think it would start to matter

There's plenty of that now, it's just tracked internally - given semi-sane loan policies/procedures I don't think it's a huge deal, but there's definitely room to improve.

create GenBank ID

#1257 - that might be this plus decent procedures.

MSB:Mamm:5000.1

"MSB:Mamm:5000" is the first problem - if we're starting with nonresolvable identifiers then the rest of this can only go so far; we're mostly not going to change anything of note.

If you want unambiguous citable parts today,

1. Get some decent IDs, point them to the catalog record
2. Stuff them into https://arctos.database.museum/info/ctDocumentation.cfm?table=ctspecpart_attribute_type#part_identifier
3. Change your loan agreement

and tomorrow we can worry about a more refined approach to (2) (then using them to add parts to loans &etc.).

campmlc · 2021-06-27T18:38:01Z

We already link loans to parts and part subsamples. I think Arctos does a good job of this, especially with the tools we have for tracking parent/child relationships. And Arctos assigns part identifiers, see below. And barcodes work great as internal identifiers, although not as external ones. I suggest we leave aside the broader issue of citations by external users, because that is a can of worms that will not be resolved any time soon. We could make these part identifiers permanent, in the same way we make locality and event nicknames permanent, on an as needed basis. So when I issue a loan, I would click to subsample a parent part, and click to assign a part "nickname" or whatever "part ID" to both the parent (if unbarcoded) and the child (if unbarcoded). I would like to avoid having to go reserve a bunch of virtual part barcodes and deal with all the work and tracking that involves - which is significant. Arctos already gives a part ID - just permanentify it as a virtual "barcode nickname"?

dustymc · 2021-06-28T13:29:38Z

Arctos assigns part identifiers, see below

And they're useful for some purposes - I probably won't change them without some notice, but someone else with access to your collection might, for example - is that enough to do whatever you want to do? (They might be good enough to bulkload loan items, just don't expect them to be there in a week.)

barcodes work great as internal identifiers, although not as external ones.

That's up to you - there are plenty of values of barcodes which can do all kinds of things. They're just identifiers.

just permanentify it as a virtual "barcode nickname"?

That's not how place names work - there's a user-supplied (with the option to autogenerate) value which permanentify-s.

ANYWAY - an on-demand dedicated ID is about where I ended up as well (#3630 (comment)), but I can't see any reason not to add the extra (small, I think) bit of effort to make it citable if there's an actual use case.

The use case part of this would make a useful AWG topic, lest we build something that nobody will use.

The "someone gave this an ID so now you can't delete it" part is a necessary AWG topic.

dustymc · 2021-06-28T18:05:12Z

I should be able to catalog a part

I think this (and similar) deserves some further exploration. This is probably another discussion, but I'm not sure enough of that to move it. Minimally it's relevant to this discussion.

Short version: https://handbook.arctosdb.org/documentation/catalog.html#understanding-cataloged-items, you can catalog whatever you want. But...

Much of that's reactionary (we have to accommodate whatever's been cataloged for whatever reason), and much of the discussion centers around our (Gordon's, maybe) "catalog the item of interest" mantra which presumes there is a "THE item," or one correct answer. Maybe there's not.

Entities ultimately come back to choosing what to catalog, and that is again at least partially based on the idea that cataloging should somehow be limited to one correct THING. Existing (and discussed) Entities are all just things that might get cataloged in other circumstances; we're just making the data less accessible by introducing a new arbitrarily-used data object. Cataloged items can do everything Entities can do, and #3630 (comment) suggests they can't do some stuff that might be desirable.

I can't quite wrap my head around how cataloging a part could be Research Grade, but it probably happens and it's ultimately the same situation as Entities, just in a different direction - some arbitrary thing gets a catalog number because of some reason that may or may not make much sense in various contexts.

#1966 (comment) (almost certainly including more, or more refined, values in https://arctos.database.museum/info/ctDocumentation.cfm?table=ctcataloged_item_type) unifies all of that, resulting in one kind of data object discoverable in one place through one set of authorities. Researchers don't have to guess at what we thought the item of scientific interest might be (or what was traditional in the discipline at the time the item was cataloged, or any other arbitrary thing), and "we" don't have to guess at what those researchers will want in order to choose what to catalog - we can just catalog what we have, or what someone asks for.

In the most basic use case, a wolf sampled twice at two different Events will result in two cataloged items.

If that wolf is known to be a member of a pack, an additional cataloged item (representing the pack) can be created and linked. (As always, a lack of this would indicate a lack of resources or knowledge, not an assertion that there is no pack/colony/hive/population/"super-individual").

If there's a reason to do so (and the resources to act are available), another cataloged item representing the wolf as a DWC:Organism could be created and linked. (This is basically the core of what we've done with Entities.)

If there's some reason to catalog a sample of one of the original items, it's just more cataloged items.

Etc. It's likely that something like all of those situations exists, so I'm simply suggesting we build on that rather than adding another way of doing the same thing.
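The wolf scenario above can be pictured as one kind of object linked to itself in different ways. The sketch below is an assumption-laden illustration: the `CatalogedItem` class, the GUIDs, the type values, and the "member of" relationship are invented for the example ("same individual as" is the only relationship named elsewhere in this thread).

```python
from dataclasses import dataclass, field


@dataclass
class CatalogedItem:
    guid: str
    item_type: str  # stand-in for a cataloged_item_type value
    # (relationship, other item's GUID) pairs, standing in for Other IDs
    related: list[tuple[str, str]] = field(default_factory=list)

    def link(self, relationship: str, other: "CatalogedItem") -> None:
        self.related.append((relationship, other.guid))


# One wolf sampled at two Events: two cataloged items.
sample_a = CatalogedItem("EX:Mamm:1", "specimen")
sample_b = CatalogedItem("EX:Mamm:2", "specimen")

# Optional extra items, created only if there is a reason and the resources.
wolf = CatalogedItem("EX:Mamm:3", "organism")
pack = CatalogedItem("EX:Mamm:4", "pack")

for sample in (sample_a, sample_b):
    sample.link("same individual as", wolf)
wolf.link("member of", pack)

print(sample_a.related)
print(wolf.related)
```

Everything here is the same data object, so any search or tool that works on one item works on the others.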

Doing more (UI styling, perhaps more search options, DWC mapping) with cataloged_item_type is probably necessary, but we should probably be doing that now - I think that's more "improvement" than "change."

Adding more metadata to Other IDs (which are also relationships) would be necessary for some situations - eg an individual wolf might be a member of multiple packs at various times - but that's also an improvement that's come up a few times.

Any tools (eg to create an Organism from multiple Occurrences) would be broadly applicable rather than limited to one way of doing things; I think this unified approach would ultimately result in a more usable system.

I don't think any of that provides compelling reason not to add "citable" part identifiers, but perhaps it provides a citable alternative (eg just catalog the part) that allows this Issue to focus on more practical shorter-term usage (such as adding items to loans).

Jegelewicz · 2022-05-03T18:04:49Z

If anyone is currently requiring even GUIDs in the places where this could be used, I'm not aware of it

We just discussed this today in planning the parasite webinar.

There is a part in a mammal record = ectoparasite

which is a vial of mixed ectos.

These get split into a bunch of actual parasite records (fleas, ticks, etc). One way to relate the individual catalog records to the original part as well as to each other is to use a stable part ID for the part in the mammal record as the lot number for all of the parasite records.

Jegelewicz · 2022-05-03T18:10:50Z

I could also see using this in the meteorite collection (see #4638):

I have a catalog record with a meteorite part. Somebody prepares a thin section from it.

1. Subsample it in the existing record.
2. Give the subsample a stable part ID.
3. Create the new record for the subsample and add the stable part ID of the subsample in the original record as an other identifier in the new record with relationship "same individual as".
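The three steps above, sketched as data. This assumes a toy model; the `Part` and `Record` classes, the GUIDs, and the part ID format are hypothetical and only show which value ends up where.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Part:
    name: str
    stable_id: Optional[str] = None
    parent: Optional["Part"] = None  # set when the part is a subsample


@dataclass
class Record:
    guid: str
    parts: list[Part] = field(default_factory=list)
    other_ids: list[tuple[str, str]] = field(default_factory=list)


original = Record("EX:Met:1", [Part("meteorite")])

# Step 1: subsample within the existing record.
thin_section = Part("thin section", parent=original.parts[0])
original.parts.append(thin_section)

# Step 2: give the subsample a stable part ID (format is made up here).
thin_section.stable_id = "EX:Met:1.1"

# Step 3: new record for the subsample, pointing back via the part ID.
new_record = Record("EX:Met:2", [Part("thin section")])
new_record.other_ids.append(("same individual as", thin_section.stable_id))

print(new_record.other_ids)
```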

dustymc · 2022-05-03T18:15:29Z

@DerekSikes has a defensible approach for this: Send out the vial of random junk along with clear instructions, catalog whatever gets sorted out, and cite that. That completely avoids needing to care what might have been in the jar of bug-like bits; "something from some jar of random junk" never gets published, the fact that it exists (or used to) is entirely an internal issue.

If you're cataloging parts then you have no need for this - it's just redundant with the record's GUID. (That's its own flavor of mess, but not relevant to this.)

campmlc · 2022-05-03T18:23:13Z

We need a way to keep track of the parent-child relationships of containers prior to cataloging - a lot ID, ideally part ID url, that will link all derivative parts/containers and allow tracking and discovery for the cataloging process. When a lot of ectoparasites get split into ticks, mites, lice, and fleas and sent on multiple loans to different researchers who return items over many years, we need something to link all these derivatives back to the original vial and host that is not subject to the transcription error that occurs with something like an NK. A part ID url would do that.
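To illustrate what "link all these derivatives back to the original vial" means in practice, here is a small sketch that follows parent links from any derivative up to its source. The identifiers and the `parent_of` mapping are invented for the example; in a real system the keys would be whatever stable part IDs or barcodes are in use.

```python
def trace_to_origin(container_id: str, parent_of: dict[str, str]) -> list[str]:
    """Follow parent links from a derivative up to the original container."""
    path = [container_id]
    seen = {container_id}
    while path[-1] in parent_of:
        parent = parent_of[path[-1]]
        if parent in seen:
            raise ValueError(f"cycle in parent links at {parent!r}")
        seen.add(parent)
        path.append(parent)
    return path


# child -> parent; the original ectoparasite lot has no parent entry
parent_of = {
    "ticks-vial": "ecto-lot",
    "fleas-vial": "ecto-lot",
    "tick-slide": "ticks-vial",
}

lineage = trace_to_origin("tick-slide", parent_of)
print(lineage)
```

The lookup only works if every derivative carries an identifier that is copied exactly, which is the transcription-error problem a machine-readable ID is meant to avoid.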


DerekSikes · 2022-05-03T18:28:13Z

Or a collection event nickname?


dustymc · 2022-05-03T18:31:03Z

prior to cataloging

Arctos cannot help you there.

who return items over many years

You can (and should!) catalog those things before they get cited; "returned" is not related to that in any way I can see.

campmlc · 2022-05-03T18:57:14Z

This should not be a controversial request. I'm happy to discuss workflows of what goes on in actual museums, vs in theory. This tool will help solve multiple real world problems, and it was previously proposed as a solution to some of them. It makes sense here because IDs can be minted as needed going forward, not retroactively applied to legacy problem records. Yes, quality control checks are necessary prior to assigning them, and yes, that should be integrated into the workflow. If no permanent ID url is provided, we'll just have to use barcodes as IDs, which will not work as well as a URL would.

campmlc · 2022-05-03T19:02:29Z

In this case, it is the parts that need to be linked via a parent child relationship across cataloged items. We can use a parent part ID from the source part or container (part ID for "ectoparasite" vial in mammal record) as the lot ID for all individual ectos taken out of that vial and placed in other vials and cataloged as parts in other catalog records.

dustymc · 2022-05-03T19:41:23Z

parts that need to be linked via a parent child relationship across cataloged items

This does not do that.

Anyway, at some level I suppose I don't have to understand how this might get used or even misused, I just need some sort of commitment that it will get used and some indication that whoever uses it understands the inherent implications (laid out above) of a truly persistent identifier at this level. FYI I read #3630 (comment) as a refusal to accept that commitment; please clarify if I'm not interpreting something correctly. (And, also as above, http://arctos-test.tacc.utexas.edu/info/ctDocumentation.cfm?table=ctspecpart_attribute_type#part_identifier provides a mechanism to introduce an identifier without the commitments this Issue would require.)

Jegelewicz · 2022-05-03T20:25:52Z

a refusal to accept that commitment

On a global scale yes, but on a part-by-part basis most of that is acceptable.

BUT - why couldn't you encumber a record with a stable part ID? How would that be different from encumbering any record in Arctos right now? It isn't any different than the resolvable (with a password) identifiers from the Zoo community, the record is there, you just don't have appropriate permissions....

Jegelewicz · 2022-05-03T20:27:45Z

http://arctos-test.tacc.utexas.edu/info/ctDocumentation.cfm?table=ctspecpart_attribute_type#part_identifier provides a mechanism to introduce an identifier without the commitments this Issue would require.)

Sure, but that is not actionable? I mean I add identifiers to parts a lot, but they don't give me the ability to link that part to anything.

Jegelewicz · 2022-05-03T20:30:58Z

Also a wise person said this in the comments above:

We have now created the "part identifier" attribute
The origins of that are a case study in how to not do science. Strongly suggest just avoiding that situation in exploring how to move forward.

dustymc · 2022-05-03T20:44:29Z

On a global scale

I've never proposed that?!?

encumbering any record

I don't think emulating our past mistakes is a great model.

with a password

"It might be there but we're not telling you" doesn't seem worthy of investment.

that is not actionable

Depends on which identifier you use.

how to not do science

My opinions on that haven't changed! We can do something awesome here, but it will require a curatorial commitment ("pre-commitment" might be a better way to view it?), or we can use existing tools to do less-awesome stuff. (Which could still be pretty awesome, but it's not structurally constrained to awesomeness.)

don't give me the ability to link that part to anything

There's no difference, minus the "structurally constrained" bits. Grab an ARK-or-whatever, stuff it in part attributes, demand your loan recipients use it, be careful not to hide it, and you've done exactly what we're proposing here. This would just make "be careful not to hide it" something you don't need to worry about (and maybe make the "grab..." step a bit easier, but we could do that without this).
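
As a rough illustration of the "grab an identifier, stuff it in part attributes" step described above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Part` class, the `RESOLVER_BASE` URL, and the use of a random UUID in place of a real ARK are hypothetical and are not Arctos code or an Arctos API.

```python
import uuid
from dataclasses import dataclass, field

# Hypothetical resolver URL; a real setup would use an ARK-style service.
RESOLVER_BASE = "https://example.org/part/"


@dataclass
class Part:
    """Hypothetical stand-in for a part and its attributes."""
    name: str
    attributes: dict = field(default_factory=dict)


def mint_part_identifier() -> str:
    """Return a new opaque, URL-shaped identifier (a random UUID, not a real ARK)."""
    return RESOLVER_BASE + str(uuid.uuid4())


def assign_identifier(part: Part) -> str:
    """Store an identifier in the part's attributes once and never overwrite it."""
    if "part identifier" not in part.attributes:
        part.attributes["part identifier"] = mint_part_identifier()
    return part.attributes["part identifier"]


vial = Part("ectoparasite")
first = assign_identifier(vial)
second = assign_identifier(vial)  # a repeat call returns the same identifier
```

The sketch only covers minting and storing the value. Nothing in it prevents the part from being deleted or hidden later, which is the "structurally constrained" piece the comment above says existing tools do not provide.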

Jegelewicz · 2022-05-03T20:46:43Z

encumbering any record
I don't think emulating our past mistakes is a great model.

There are legitimate reasons to encumber information that we cannot ignore.

dustymc · 2022-05-03T20:59:47Z

legitimate reasons to encumber information

I've just proposed prohibiting mask record (and I still don't think this is worth doing without that) - other current or future types of encumbrances would not be affected, as long as they leave SOMETHING behind.

A "most everything but still there" encumbrance might even help atone for past sins, although that would of course ultimately be up to the collections.


arctosprod@arctos>> select count(*) from flat
arctos-> inner join coll_object_encumbrance on flat.collection_object_id=coll_object_encumbrance.collection_object_id
arctos-> inner join encumbrance on coll_object_encumbrance.encumbrance_id=encumbrance.encumbrance_id and encumbrance_action='mask record'
arctos-> inner join citation on flat.collection_object_id=citation.collection_object_id
arctos-> ;
 count 
-------
 37462

campmlc · 2022-05-03T21:08:23Z

Could we encumber identification, higher geog, locality, collector etc - the whole shebang- but leave the record shell with URL?

campmlc · 2022-08-12T23:40:19Z

Found an example of why we don't want to assign permanent PIDs to parts without validating legacy parts first - see all the false duplicate parts associated with this record (anything without a part location path, which may not be visible to those not logged in): https://arctos.database.museum/guid/MSB:Mamm:83457
BUT - I'd like to be able to clean up this record and validate the actual parts that exist, and assign PIDs as needed, especially to parts that have been subsampled for loans. Make sense?

campmlc · 2022-08-12T23:43:17Z

Quite a few of these are real parts - and once I validate them, I'd like to be able to assign a permanent ID to confirm their validity. I'd rather not be forced to slap an actual barcode onto the vial to do this - that is the point of having the part ID. Possible?

dustymc · 2022-08-12T23:55:33Z

Technical: #3630 (comment) (very restricted, heavily documented bulkloader) still looks like the only plausible path to implementation; maybe you think I suggested something else??

Social: This is still pointless until someone commits to demanding citations by partID - there are lots of easier paths (for all of us, in all directions) to "confirm validity."

campmlc · 2022-08-13T00:27:32Z

I guess my question is: is the current part ID stable within Arctos? If it is, I could envision gradually shifting over to using Arctos assigned PIDs as barcodes, even minting url-based PIDs. SOMEONE needs to start doing this before we can start asking users to cite them - we have to be the horse before the cart. If it is not, and the PID may randomly change for no reason . . . then that won't work.

Jegelewicz · 2022-08-13T13:41:53Z

@campmlc I believe it is completely up to you. You can MAKE the part ID stable. See #3630 (comment)

Until someone experiments and DOES this, we are going to keep having circular conversations.

I was going to try this in test, but I get this:

Secure Connection Failed

An error occurred during a connection to test.arctos.database.museum. Peer using unsupported version of security protocol.

Error code: SSL_ERROR_UNSUPPORTED_VERSION
The page you are trying to view cannot be shown because the authenticity of the received data could not be verified.
Please contact the website owners to inform them of this problem.
This website might not support the TLS 1.2 protocol, which is the minimum version supported by Firefox.

Learn more…

dustymc · 2022-08-13T17:53:00Z

is the current part ID stable within Arctos?

The "core" of that is functional in a few places, we've discussed the stability (lack thereof) of it many times.

shifting over to using Arctos assigned PIDs as barcodes

We're having this conversation because y'all convinced me barcodes are not suitable. (And you're right - they wear out, hold lots of parts, hold nothing, hold things that aren't parts, aren't used at all for political reasons, etc.)

(Some sort of resolvable PID would still make fabulous barcodes, would seamlessly deal with the 'someone cited the barcode' scenario, let anyone get to at least where parts used to be, etc., but as they're used now they're not quite interchangeable.)

SOMEONE needs to start doing this before we can start asking users to cite them - we have to be the horse before the cart.

It's going to be a lot of work - but not much innovation - to make them do what they need to do to be stable, and there's a huge curatorial commitment involved. This is maybe closer to buying a horse and cart today (except I'm going to wave my wand and the horse won't conveniently keel over about the time the kids leave for college, your great grandkids will still need to feed it) - if you're not SURE you're going to use it then it'll just hang around and take space and consume resources and maybe make a huge mess from time to time all without really giving anything back.

If it is not

It is not. This is discussed above; maybe I could work up a summary or something if that's useful, but I think it would just end up being a bad representation of this whole thread, and this whole thread needs to be read, carefully, before making any decisions.

randomly change

Yes but no. I'm _probably_ not going to mess with them, but you delete parts (even those that claim to be used) and 'mask record' encumber and such about every day. As is, part IDs are not suitable for citation. They are (usually) suitable for local timely things.

experiments and DOES this

There's not much to experiment with. You say "hey borrower, use this OR ELSE", let me know about that and I make sure the identifier never changes, you go on to make sure you have policies and documentation so you don't toss it out and just not bother deleting it from the DB or reuse the identifier for something else and etc., and now there is in a very real way a physical item attached to a publication. Then we all swoon because that's sciencey.

I don't think there's a less-rigorous yet still defensible approach, and maybe that's simply more than you can commit to, maybe even if the current CM, Curator, and Director think it's a fabulous idea. If that's the case then https://arctos.database.museum/info/ctDocumentation.cfm?table=ctspecpart_attribute_type#part_identifier is still available, and I could mint PIDs or get ARKs or something to go there. Those identifiers would be stable (as in, I'm not going to delete them), but there'd be no technology keeping them attached to things that exist and such, you (or your successor) can break a million publication<-->part links with a handwave and I can't stop it. Maybe that's still a decent babystep - if we start seeing those things pop up in GenBank and that changes our view then they could be elevated to some more "forever" structure, if it turns out nobody's going to USE them they can be quietly "sent to the farm" without making the world (rightly) think that Arctos itself is broken.
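
The commitment described above (the identifier never changes, is never reused, and still resolves after the part is gone) can be sketched as a small registry. This is a hedged Python illustration only: `PartRegistry`, `Entry`, and the `pid-1` value are hypothetical names invented here, and the in-memory dictionary stands in for whatever storage a real implementation would use.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Entry:
    description: str
    exists: bool = True  # set to False once the physical part is gone


class PartRegistry:
    """Hypothetical registry: identifiers are never deleted and never reused."""

    def __init__(self) -> None:
        self._entries: Dict[str, Entry] = {}

    def register(self, pid: str, description: str) -> None:
        """Issue an identifier; refuse to issue the same one twice."""
        if pid in self._entries:
            raise ValueError(f"{pid} already issued; identifiers are not reused")
        self._entries[pid] = Entry(description)

    def retire(self, pid: str) -> None:
        """Mark the part as gone but keep the identifier resolvable."""
        self._entries[pid].exists = False

    def resolve(self, pid: str) -> Optional[str]:
        """Return a short status for a known identifier, or None if never issued."""
        entry = self._entries.get(pid)
        if entry is None:
            return None
        status = "available" if entry.exists else "no longer exists"
        return f"{entry.description} ({status})"


registry = PartRegistry()
registry.register("pid-1", "liver, frozen")
registry.retire("pid-1")
status = registry.resolve("pid-1")
```

The design choice is that there is deliberately no delete method: a used-up or discarded part leaves a resolvable record behind, so a citation still leads somewhere. The curatorial policies and documentation mentioned above are outside what code like this can enforce.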

Jegelewicz added this to the Needs Discussion milestone Jun 2, 2021

This was referenced Jun 25, 2021

Code Table Request - Genome ID #3652

Closed

Change term - MaterialSample tdwg/dwc#314

Closed

campmlc mentioned this issue Jun 27, 2021

Help needed to bulkload subsamples of subsamples with part attributes #3627

Closed

dustymc mentioned this issue May 13, 2022

copy barcodes of UAM:Ento to other id #2685

Closed

Jegelewicz added the dwc terms label Jun 4, 2022

dustymc removed the Priority-Critical (Arctos is broken) label Jun 24, 2022

dustymc mentioned this issue Aug 9, 2022

Add Parent Part ID to Part Bulkload Tool ArctosDB/dev#38

Closed

dustymc mentioned this issue Sep 27, 2022

Code Table Request - new catalog record attribute: pathology #5101

Closed

dustymc mentioned this issue Oct 10, 2022

Feature Request - Accomodate pathogen screening documentation #5147

Closed

ArctosDB locked and limited conversation to collaborators Nov 1, 2022

dustymc converted this issue into discussion #5231 Nov 1, 2022

This issue was moved to a discussion.
