Maui Nui thoughts #3

sunray1 · 2025-02-20T22:00:44Z

I've been chatting about the state of the current vocabularies with some folks I'm working with out in Maui Nui and getting a sense of why they need to redact data and when. Here are some thoughts from them:

Generally, it needs to be possible to redact any field (CE: I think this is possible already, but just a confirmation)
Coordinates can be fuzzed per usual, but also location may need to be redacted
Often landholders don't want folks to know what is on their land or even that their land has been surveyed
Habitat type often also needs to be redacted (i.e., it's pretty easy to work out where the one bog on Kauai is)
Redacting common names as well as scientific names (CE: I don't remember if there's a term for common names in DwC)
Collector names often need to be redacted - they've had concerns with being targeted by some groups (think cat colonies)
Redacting based on dates is useful (e.g., a data embargo on tracking live organisms - think introduced species)
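As a rough illustration of the date-based case, here is a minimal sketch of an embargo check. The function name `apply_embargo`, the record layout (a plain dict keyed by Darwin Core term names), and the choice to null the embargoed fields are all assumptions made for illustration, not part of any proposed vocabulary.

```python
from datetime import date
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def apply_embargo(record: Record, embargo_until: date,
                  today: date, fields: List[str]) -> Record:
    """Return a copy of the record with `fields` nulled while the embargo is active.

    Hypothetical helper: the embargo is treated as active strictly before
    `embargo_until` and lifted on that date.
    """
    out = dict(record)  # never modify the caller's record
    if today < embargo_until:
        for field in fields:
            if field in out:
                out[field] = None
    return out
```

Once the embargo date is reached, the same call returns the record unchanged, so the check could be rerun at each publication cycle.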

And some questions:

Is it possible to reduce the resolution of the reason? i.e., not publish subcategories but just the categories?
Is it possible to fuzz extension data or just core? You'd be able to figure out what a species is based on DNA or something.
Is there a difference between extreme/high or high/moderate and how you treat the data? Or is that just a division for internal use?

ben-norton · 2025-02-20T23:26:44Z

@sunray1
A common traditional workflow for redacting information for sensitive species in a DWC dataset is as follows. Note that this is done prior to publication by the publishing institution.

A set of sensitive species is identified by scientific name and stored as a list.
The sensitive species list is checked against a DWC compliant dataset prior to publication.
DWC records with scientific names that match an item on the sensitive species list are flagged.
For flagged records, information in a specific set of fields is set to null.
A notification is added to the informationWithheld field that states something similar to the following: "Location information for this record has been removed to protect a sensitive species. Please contact xxx for location information".
The dataset is then published via an IPT.

Which fields are set to null is up to the individual institution. Null is not a requirement. It's just a common practice. For coordinates, fuzzy generalization is a common course of action. Ultimately, it's up to the provider.

For datasets that I've published, we would delete decimalLatitude, decimalLongitude, county, locality, localityRemarks, verbatimLocality, and, sometimes, stateProvince for sensitive species.
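The workflow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: records are plain dicts keyed by Darwin Core term names, matching is an exact string comparison on scientificName, the helper name `redact_sensitive` is made up, and the field list is a subset of the fields mentioned above. Each institution would choose its own list and matching rules.

```python
from typing import Dict, Iterable, List, Optional, Set

Record = Dict[str, Optional[str]]

# A subset of the location fields mentioned above; the choice is up to the provider.
LOCATION_FIELDS = (
    "decimalLatitude", "decimalLongitude", "county",
    "locality", "verbatimLocality",
)

WITHHELD_NOTE = (
    "Location information for this record has been removed to protect "
    "a sensitive species. Please contact xxx for location information."
)


def redact_sensitive(records: Iterable[Record], sensitive_names: Set[str],
                     fields: Iterable[str] = LOCATION_FIELDS) -> List[Record]:
    """Null the given fields on records whose scientificName is on the list."""
    fields = tuple(fields)
    redacted = []
    for record in records:
        out = dict(record)  # leave the source record untouched
        if out.get("scientificName") in sensitive_names:
            for field in fields:
                if field in out:
                    out[field] = None
            out["informationWithheld"] = WITHHELD_NOTE
        redacted.append(out)
    return redacted
```

Exact name matching is the weakest part of this sketch: synonyms, authorship strings, and spelling variants would all slip through, which is one reason the list check is usually done by the publishing institution, which knows its own data.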

Determining which species are deemed "sensitive" is trickier than it may seem. Many default to IUCN, but that often omits local sensitivity. Again, this is ultimately up to the provider. For this reason (and a few others) global automated redaction has proven problematic.

It's also worth noting that redaction has different implications for different types of datasets. Camera trap data without coordinates is often not suitable for ecological modelling, which is its primary use. Coordinates are fuzzed instead of removed for this reason. Also, images (and the associated metadata) of park rangers are never published. It can be a matter of life or death for those individuals.

TaniaGLaity · 2025-02-20T23:27:22Z

Hi Chandra,
Thanks very much for passing this info onto us!

Multiple attributes in a record will be able to be obfuscated / withheld using the proposed model. I think that answers a few of the questions / use cases above.

DWC uses vernacularName for common name

Regarding the questions:

For various reasons we don't want to allow users to select only the category by default, as we may get fewer detailed reasons that way. A workaround would be to add an extra subcategory, e.g. "species regarded as sensitive because of threat but taxa not individually assessed" (i.e. a catch-all subcategory for the category). This has been brought up by other task group members - will discuss at the March meeting.
I believe so - another one for discussion, but I don't think it should be an issue. In theory we could apply this to any attribute in DWC and extensions.
Generally yes in Australia - Extreme would be withheld, High would be obfuscated to 1 decimal place / ~10km and Medium would be obfuscated to 2 decimal places / ~1km
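To make the category-to-treatment mapping concrete, here is a minimal sketch. The function name `generalize`, the lowercase category keys, and the use of rounding (rather than truncation or snapping to a grid) are assumptions for illustration. One tenth of a degree of latitude is about 11 km, which roughly matches the ~10km figure.

```python
from typing import Optional, Tuple

# Decimal places kept per sensitivity category; None means withheld entirely.
DECIMAL_PLACES = {"extreme": None, "high": 1, "medium": 2}


def generalize(lat: float, lon: float,
               category: str) -> Optional[Tuple[float, float]]:
    """Return generalized coordinates, or None if they must be withheld."""
    key = category.lower()
    if key not in DECIMAL_PLACES:
        raise ValueError(f"unknown sensitivity category: {category!r}")
    places = DECIMAL_PLACES[key]
    if places is None:
        return None
    return (round(lat, places), round(lon, places))


print(generalize(20.7984, -156.3319, "High"))     # (20.8, -156.3)
print(generalize(20.7984, -156.3319, "Medium"))   # (20.8, -156.33)
print(generalize(20.7984, -156.3319, "Extreme"))  # None
```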

TaniaGLaity · 2025-02-20T23:35:48Z

Thanks Ben.
For context, we put out a call for case studies to test our draft sensitivity treatments and reasons at our next Task Group meeting. Thanks for outlining the traditional workflow - that's useful for the Task Group to understand!

sunray1 · 2025-02-21T00:12:26Z

> Hi Chandra,
> Thanks very much for passing this info onto us!
>
> Multiple attributes in a record will be able to be obfuscated / withheld using the proposed model. I think that answers a few of the questions / use cases above.

That's what I thought, perfect! Definitely allows for all of the use cases, I think.

> DWC uses vernacularName for common name
>
> Regarding the questions:
>
> For various reasons we don't want to allow users to select only the category by default, as we may get fewer detailed reasons that way. A workaround would be to add an extra subcategory, e.g. "species regarded as sensitive because of threat but taxa not individually assessed" (i.e. a catch-all subcategory for the category). This has been brought up by other task group members - will discuss at the March meeting.

I think the concern here was more that adding a more specific reason might unintentionally draw more attention to those records, rather than about not having enough information to distinguish a subcategory.

> I believe so - another one for discussion, but I don't think it should be an issue. In theory we could apply this to any attribute in DWC and extensions.

Perfect

> Generally yes in Australia - Extreme would be withheld, High would be obfuscated to 1 decimal place / ~10km and Medium would be obfuscated to 2 decimal places / ~1km

Interesting, did not know that! Perhaps this could be noted somewhere in the docs.

ArthurChapman · 2025-02-21T02:56:10Z

Of course, just using a taxonomic name alone for a sensitive species is problematic. It must also include a region of sensitivity. Some taxa may be highly sensitive in one area (e.g. a species of Hakea in Australia) but may be an invasive species in another (e.g. South Africa). In the area where it is invasive, one would need to know precise localities, etc.
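One way to picture this is a sensitivity list keyed on name and region together. The sketch below is illustrative only: it assumes the region is expressed as a two-letter country code (as in dwc:countryCode), whereas real lists may use states, protected areas, or polygons, and the entry "Hakea sp." is a placeholder rather than a real list entry.

```python
from typing import FrozenSet, Tuple

# Hypothetical list of (scientificName, countryCode) pairs.
SENSITIVE_IN_REGION: FrozenSet[Tuple[str, str]] = frozenset({
    ("Hakea sp.", "AU"),
})


def is_sensitive(scientific_name: str, country_code: str,
                 sensitive: FrozenSet[Tuple[str, str]] = SENSITIVE_IN_REGION) -> bool:
    """A record is sensitive only if the name AND the region both match."""
    return (scientific_name, country_code.upper()) in sensitive


print(is_sensitive("Hakea sp.", "AU"))  # True: redact or generalize
print(is_sensitive("Hakea sp.", "ZA"))  # False: precise localities stay available
```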

TaniaGLaity · 2025-02-21T03:47:52Z

@ArthurChapman agreed. I'm guessing that when we get to the discussion about how we represent what changes have been made to the data, we will have to indicate that the record is sensitive according to XX list. That's kind of how we do it in the ALA, but a standard set of words or examples would be a good thing to have, I think.

tucotuco · 2025-02-21T04:02:38Z

I haven't heard mention of dwc:dataGeneralizations in the conversation so far, but it is relevant.

TaniaGLaity · 2025-02-21T04:20:55Z

> I haven't heard mention of dwc:dataGeneralizations in the conversation so far, but it is relevant.

Definitely agree - we haven't got to the actual implementation yet. We're just trying to start to define the vocabularies first and test that they meet most scenarios using case studies. We might need to enlist your help once we get to that John!

tucotuco · 2025-02-21T04:40:22Z

Ready and willing to be helpful when needed.
