AniDB vs Anilist - add support for Movies and wo/o naming differences #126

Open
karpik123 opened this issue Mar 30, 2022 · 1 comment

karpik123 commented Mar 30, 2022

I went through my library and synced everything. I use x-jat names from AniDB and I noticed two naming patterns that should be straightforward to cover, saving a lot of work on custom mappings.

First Pattern - 'Movie'

AniDB Name | Anilist Name
Gekijouban Blood-C: The Last Dark | BLOOD-C: The Last Dark
Gekijouban Mahouka Koukou no Rettousei: Hoshi o Yobu Shoujo | Mahouka Koukou no Rettousei: Hoshi wo Yobu Shoujo
Gekijouban xxxHOLiC: Manatsu no Yo no Yume | xxxHOLiC: Manatsu no Yoru no Yume
Gekijouban Dungeon ni Deai o Motomeru no wa Machigatte Iru Darouka: Orion no Ya | Dungeon ni Deai o Motomeru no wa Machigatte Iru Darouka: Orion no Ya

PlexAniSync could recognise this word and make an extra attempt to match the title after removing Gekijouban<space> from the string.

Another similar example is 'Eiga':

AniDB Name | Anilist Name
Eiga Crayon Shin-chan: Mononoke Ninja Chinfuuden | Crayon Shin-chan: Mononoke Ninja Chinfuuden
Eiga Doraemon: Nobita no Little Star Wars 2021 | Doraemon: Nobita no Little Star Wars 2021
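
A minimal sketch of the extra attempt described above, assuming the marker is only removed when it starts the title. The helper name `strip_movie_prefix` is hypothetical and is not taken from PlexAniSync's code.

```python
from typing import Optional

# Movie markers seen in the AniDB x-jat titles above (trailing space included).
MOVIE_PREFIXES = ("Gekijouban ", "Eiga ")


def strip_movie_prefix(title: str) -> Optional[str]:
    """Return the title without a leading movie marker, or None if it has none.

    Hypothetical helper for illustration only.
    """
    for prefix in MOVIE_PREFIXES:
        if title.startswith(prefix):
            return title[len(prefix):]
    return None


print(strip_movie_prefix("Gekijouban Blood-C: The Last Dark"))
print(strip_movie_prefix("Eiga Doraemon: Nobita no Little Star Wars 2021"))
```

The first row of the Gekijouban table also differs in case (Blood-C vs BLOOD-C), so the comparison after stripping would presumably need to be case-insensitive.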

Second Pattern - wo vs o

AniDB Name | Anilist Name
Hige o Soru. Soshite Joshikousei o Hirou. | Hige wo Soru. Soshite Joshikousei wo Hirou.
Sono Bisque Doll wa Koi o Suru | Sono Bisque Doll wa Koi wo Suru
Seishun Buta Yarou wa Yumemiru Shoujo no Yume o Minai | Seishun Buta Yarou wa Yumemiru Shoujo no Yume wo Minai
Nakitai Watashi wa Neko o Kaburu | Nakitai Watashi wa Neko wo Kaburu
Fune o Amu | Fune wo Amu

AniDB is almost universally done as o, while Anilist uses wo in titles. I don't know Japanese well enough to understand why...
PlexAniSync could catch <space>o<space> in the string and make an extra attempt to match the title after converting o into wo. Note that the top example in the table even has o twice.
While some titles might genuinely use o, I don't expect converting an innocent o into wo to make them match a completely different title.
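
A rough sketch of the o to wo conversion, assuming a standalone o is one with whitespace on both sides. The function name `o_to_wo` is made up for this example. Lookarounds are used instead of `str.replace(" o ", " wo ")` so that two particles in a row (`o o`) are both converted; a plain replace would consume the shared space and skip the second one.

```python
import re

# A lone "o" with whitespace on both sides; the lookarounds do not consume
# the whitespace, so adjacent particles are each matched.
_O_PARTICLE = re.compile(r"(?<=\s)o(?=\s)")


def o_to_wo(title: str) -> str:
    """Replace every standalone 'o' with 'wo' (hypothetical helper)."""
    return _O_PARTICLE.sub("wo", title)


print(o_to_wo("Hige o Soru. Soshite Joshikousei o Hirou."))
print(o_to_wo("Fune o Amu"))
```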


karpik123 commented Mar 31, 2022

I got my hands on the AniDB title .xml.gz file and did some top-level counting. I discarded all lines from the XML except those with lang="x-jat" and type="main".
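
For anyone who wants to repeat the filtering, here is a sketch using only the standard library. It assumes the dump stores each name in a `<title>` element with an `xml:lang` attribute and a `type` attribute; that layout, the inline `SAMPLE` data, and the function names are assumptions for illustration and should be checked against the real file.

```python
import gzip
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def main_xjat_titles(root):
    """Collect the text of every <title> that is x-jat and type="main"."""
    return [
        t.text
        for t in root.iter("title")
        if t.get(XML_LANG) == "x-jat" and t.get("type") == "main" and t.text
    ]


def load_titles(path):
    """Read a gzipped dump from disk (the path is supplied by the caller)."""
    with gzip.open(path, "rb") as fh:
        return main_xjat_titles(ET.parse(fh).getroot())


# Made-up sample in the assumed layout; the aid values and the non-main
# titles are placeholders.
SAMPLE = """<animetitles>
  <anime aid="1">
    <title xml:lang="x-jat" type="main">Fune o Amu</title>
    <title xml:lang="en" type="official">Placeholder English Title</title>
  </anime>
  <anime aid="2">
    <title xml:lang="x-jat" type="main">Komadori Eiga Komaneko</title>
    <title xml:lang="x-jat" type="syn">Placeholder Synonym</title>
  </anime>
</animetitles>"""

print(main_xjat_titles(ET.fromstring(SAMPLE)))
```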

I was left with 593 titles:

  • 211 titles with 'Gekijouban '
  • 261 titles with ' o ' (266 instances, so a few titles contain more than one o)
  • 142 titles with 'Eiga '

The numbers don't add up to 593 because some titles combine Eiga or Gekijouban with o.
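
The three counts sum to 211 + 261 + 142 = 614, which is 21 more than 593, consistent with some titles matching more than one pattern. A small sketch of that kind of counting, using plain substring checks (an assumption about how the counts were produced) and a handful of titles from the tables above; `count_patterns` is a hypothetical name:

```python
PATTERNS = ("Gekijouban ", " o ", "Eiga ")


def count_patterns(titles):
    """Count how many titles contain each pattern; a title may count twice."""
    counts = {pattern: 0 for pattern in PATTERNS}
    for title in titles:
        for pattern in PATTERNS:
            if pattern in title:
                counts[pattern] += 1
    return counts


titles = [
    "Gekijouban Mahouka Koukou no Rettousei: Hoshi o Yobu Shoujo",  # two patterns
    "Gekijouban Blood-C: The Last Dark",
    "Eiga Doraemon: Nobita no Little Star Wars 2021",
    "Fune o Amu",
]
counts = count_patterns(titles)
print(counts, "sum:", sum(counts.values()), "titles:", len(titles))
```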

I did this to do more data checks and to confirm the logic won't be harmful. I spotted some odd cases, please read on.

The o->wo rule

The overwhelming majority of examples would match perfectly if o became wo.

Some oddities:

Gekijouban rule

Some medium disappointment here, I have to go back on my initial assumption.

Here are examples where the Gekijouban-less title will match the TV show of the same name:

Funny outlier:
Gekijouban Idol Bu Show (AniDB 17230) is https://anilist.co/anime/145916/IDOL-bu-SHOW-Movie/ on Anilist, but there's no TV show with that name.

Eiga rule

Not as many as in the Gekijouban case, but I can find similar issues.

Here are examples where the Eiga-less title will match the TV show of the same name:

Other oddities:
Komadori Eiga Komaneko (AniDB 7306) shows that Eiga must only be matched at the beginning of the string.


Summary

Wo-ing the titles seems safe and desirable.

While all previous examples from my own library would match the correct Anilist title (after de-gekijoubaning or de-eigaing), there seem to be too many cases where it would cause problems.

Instead, I think it's safer to attempt the following treatment:

  • de-gekijouban or de-eiga the title
  • add Movie and (Movie)
  • try to match
  • give up if nothing found
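
The steps above could be sketched roughly as follows. This assumes "add Movie and (Movie)" means appending those words as suffixes, and it uses a plain list (`known_titles`) as a stand-in for whatever Anilist lookup PlexAniSync actually performs. All names and the example titles are hypothetical.

```python
from typing import List, Optional

MOVIE_PREFIXES = ("Gekijouban ", "Eiga ")


def movie_candidates(title: str) -> List[str]:
    """De-gekijouban or de-eiga the title, then add the two movie suffixes."""
    for prefix in MOVIE_PREFIXES:
        if title.startswith(prefix):  # only a leading marker counts
            base = title[len(prefix):]
            return [f"{base} Movie", f"{base} (Movie)"]
    return []


def match_movie(title: str, known_titles: List[str]) -> Optional[str]:
    """Try each candidate (case-insensitively); give up with None otherwise."""
    lookup = {known.casefold(): known for known in known_titles}
    for candidate in movie_candidates(title):
        hit = lookup.get(candidate.casefold())
        if hit is not None:
            return hit
    return None


# Placeholder titles, not real Anilist entries.
known_titles = ["Example Show", "EXAMPLE SHOW Movie"]
print(match_movie("Gekijouban Example Show", known_titles))
print(match_movie("Gekijouban Unknown Title", known_titles))
```

Because the bare stripped title is never tried, the movie cannot be matched to a TV show that happens to share its name, which is the failure mode described above.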

I attach the file with the cleaned titles I used for the above research: https://gist.github.com/karpik123/760774de1a0a90156567d794a704e71a
