fix: keyword parsing #12164

Merged
merged 1 commit into from
Aug 30, 2024

jrasm91 (Contributor) commented Aug 30, 2024

Found a case where keywords returned a list that included a number (year).
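
A minimal sketch of the kind of normalization this implies, in TypeScript. It assumes the metadata reader may return a keyword field as a single string, a single number, or a mixed array (for example, a year parsed as a number). `RawKeywords` and `normalizeKeywords` are hypothetical names used for illustration; this is not the code from this PR.

```typescript
// Hypothetical types and helper, for illustration only.
// Assumption: a keyword field may arrive as one value or as a mixed array,
// and numeric-looking keywords such as a year may arrive as numbers.
type RawKeywords = string | number | Array<string | number> | undefined;

function normalizeKeywords(raw: RawKeywords): string[] {
  if (raw === undefined) {
    return [];
  }
  // Wrap a single value so that both shapes are handled the same way.
  const items = Array.isArray(raw) ? raw : [raw];
  return items
    .map((item) => String(item).trim()) // coerce numbers such as 2024 to "2024"
    .filter((item) => item.length > 0); // drop empty entries
}

// Illustrative input: a keyword list that contains a year as a number.
const normalized = normalizeKeywords(['Vacation', 2024]); // ['Vacation', '2024']
```

Coercing every entry with `String(...)` keeps downstream code free to treat tags uniformly as strings, whatever type the reader produced.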

jrasm91 enabled auto-merge (squash) August 30, 2024 21:28
jrasm91 merged commit d18bc70 into main Aug 30, 2024
24 checks passed
jrasm91 deleted the fix/keyword-parsing branch August 30, 2024 21:33

mattsteg commented Sep 3, 2024

Thank you for the fix; I was just about to submit a bug report.

It appears that the fix doesn't support hierarchical keywords properly. I have a tag hierarchy with a number (year) as the 2nd tier and the hierarchy was not imported.

jrasm91 (Contributor, Author) commented Sep 3, 2024

> Thank you for the fix; I was just about to submit a bug report.
>
> It appears that the fix doesn't support hierarchical keywords properly. I have a tag hierarchy with a number (year) as the 2nd tier and the hierarchy was not imported.

Hierarchical implies / as a delimiter. Are you using something else?

mattsteg commented Sep 3, 2024

I'm not directly using anything: the files are generated on output by the commercial software Capture One (C1), and (absent additional workflow steps) I get what I get. Short version: C1 uses | in "HierarchicalSubject".

Looking at the Exif, I see the following:

  1. Hierarchical Subject uses | as a delimiter
  2. Tags List uses / as a delimiter, but does not include this hierarchy
  3. The XMP on the input files (used for syncing between C1 and digikam - this file is not within view of immich) did not have this hierarchy applied
  4. After initiating an XMP sync, the XMP on the input file has added this hierarchy to "Hierarchical Subject" with | delimiters

So the implication is that Capture One writes hierarchical keywords to Hierarchical Subject using | as delimiter.

Given that you're getting other tags correct (which are saved in Tags List with a / delimiter by digikam), I think you're probably reading those in correctly and either not reading the Hierarchical Subject ones or discarding them.

Tag data is available in the following places in these files:

  1. XMP-acdsee "Categories": added by digikam for compatibility (I believe) and, as far as I know, not used. XML, non-hierarchical
  2. XMP-microsoft "LastKeyWordXMP": added by digikam (I believe) for compatibility. / delimited hierarchy
  3. XMP-mediapro "CatalogSets": again added by digikam for compatibility, I believe. | delimited hierarchy
  4. XMP-dc "Subject": individual keywords (non-hierarchical) separated by *. This is updated by Capture One
  5. XMP-lr "HierarchicalSubject": keyword hierarchy delimited by |. This is what Capture One updates for the hierarchy
  6. XMP-digiKam "TagsList": keyword hierarchy delimited by /, added by digikam when roundtripping
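
To illustrate the delimiter mismatch in the list above, here is a hedged TypeScript sketch that rewrites a |-delimited "HierarchicalSubject" value into the /-delimited form used by "TagsList". `toTagPath` is a hypothetical helper, not part of Immich, and it assumes that no individual keyword contains | or / itself. The sample value is made up.

```typescript
// Hypothetical helper, for illustration only.
// Assumption: '|' separates hierarchy levels and never appears inside a
// keyword, and no keyword contains '/' (otherwise the result is ambiguous).
function toTagPath(hierarchicalSubject: string | number): string {
  return String(hierarchicalSubject) // a bare year may arrive as a number
    .split('|')
    .map((part) => part.trim())
    .filter((part) => part.length > 0) // ignore empty levels such as "a||b"
    .join('/');
}

// Illustrative value with a year as the second tier.
const tagPath = toTagPath('Trips|2024|Iceland'); // 'Trips/2024/Iceland'
```

A real importer would also need to decide how to handle keywords that legitimately contain a slash, which this sketch does not attempt.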

jrasm91 (Contributor, Author) commented Sep 3, 2024

Correct. At the moment we are only importing data for "tags" from Keywords and TagsList.
