Subliminal score system with conceptual error. #656

Closed
a-nunes opened this issue Nov 3, 2019 · 22 comments

@a-nunes

a-nunes commented Nov 3, 2019

After using subliminal for some time, I noticed a conceptual error in the score system that gives me a lot of false positives. Since then I've been using it with a new scoring that is more effective, so I'd like to share it with you and discuss changing it. Here is an example:

Imagine this is my release: New.Amsterdam.2018.S02E06.1080p.WEB-DL.x264-KILLERS

With the standard subliminal score system, I get these two real subtitles:

  1. New.Amsterdam.2018.S02E06.1080p.WEB-DL.x264-NTb
  2. New.Amsterdam.2018.S02E06.720p.HDTV.x264-KILLERS

For humans, it's clear that the first one is the right match, but the subliminal score system rates them like this:

New.Amsterdam.2018.S02E06.1080p.WEB-DL.x264-NTb
=180(series)+90(year)+30(season)+30(episode)+7(format)+2(resolution)+2(video_codec)=341 (94.9% match)

New.Amsterdam.2018.S02E06.720p.HDTV.x264-KILLERS
=180(series)+90(year)+30(season)+30(episode)+15(release_group)+2(video_codec)=347 (96.6% match)

This happens because release_group alone (15) scores higher than format, resolution and video_codec together (7+2+2=11).
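
The arithmetic above can be reproduced with a minimal sketch. The weights are only the ones quoted in this comment, and `total_score` is a hypothetical helper written for illustration, not subliminal's API.

```python
# Standard weights as quoted in the comment above
# (only the properties that appear in the example).
STANDARD_WEIGHTS = {
    "series": 180, "year": 90, "season": 30, "episode": 30,
    "release_group": 15, "format": 7, "resolution": 2, "video_codec": 2,
}


def total_score(matches, weights):
    """Sum the weight of every matched property (hypothetical helper)."""
    return sum(weights[name] for name in matches)


# Properties both subtitles share with the KILLERS 1080p WEB-DL release.
common = {"series", "year", "season", "episode", "video_codec"}
ntb = common | {"format", "resolution"}   # 1080p.WEB-DL.x264-NTb
killers = common | {"release_group"}      # 720p.HDTV.x264-KILLERS

ntb_score = total_score(ntb, STANDARD_WEIGHTS)
killers_score = total_score(killers, STANDARD_WEIGHTS)
print(ntb_score, killers_score)  # 341 347
```

The subtitle that matches only on release group ends up 6 points ahead of the one that matches on format and resolution.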

Here is how my new scoring works; it seems more realistic to me:

New.Amsterdam.2018.S02E06.1080p.WEB-DL.x264-NTb
=180(series)+90(year)+30(season)+30(episode)+17(format)+2(resolution)+8(video_codec)=357 (98.6% match)

New.Amsterdam.2018.S02E06.720p.HDTV.x264-KILLERS
=180(series)+90(year)+30(season)+30(episode)+2(release_group)+8(video_codec)=340 (93.9% match)

Here is how I changed the code in score.py:

#: Scores for episodes
episode_scores = {'hash': 362, 'series': 180, 'year': 90, 'season': 30, 'episode': 30, 'release_group': 2,
                  'format': 17, 'audio_codec': 3, 'resolution': 2, 'video_codec': 8, 'hearing_impaired': 1}

#: Scores for movies
movie_scores = {'hash': 120, 'title': 60, 'year': 30, 'release_group': 2,
                'format': 17, 'audio_codec': 3, 'resolution': 2, 'video_codec': 8, 'hearing_impaired': 1}
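
As a rough check of the proposed episode weights, the sketch below scores the same two subtitles again. The weight table is copied from the comment; `total_score` and the match sets are illustrative assumptions, not code from subliminal.

```python
# Proposed episode weights, copied from the comment above.
PROPOSED_EPISODE_SCORES = {
    "hash": 362, "series": 180, "year": 90, "season": 30, "episode": 30,
    "release_group": 2, "format": 17, "audio_codec": 3, "resolution": 2,
    "video_codec": 8, "hearing_impaired": 1,
}


def total_score(matches, weights):
    """Sum the weight of every matched property (hypothetical helper)."""
    return sum(weights[name] for name in matches)


common = {"series", "year", "season", "episode", "video_codec"}
candidates = {
    "New.Amsterdam.2018.S02E06.1080p.WEB-DL.x264-NTb": common | {"format", "resolution"},
    "New.Amsterdam.2018.S02E06.720p.HDTV.x264-KILLERS": common | {"release_group"},
}

scores = {name: total_score(m, PROPOSED_EPISODE_SCORES) for name, m in candidates.items()}
best = max(scores, key=scores.get)

for name, score in scores.items():
    share = score / PROPOSED_EPISODE_SCORES["hash"]
    print(f"{name}: {score} ({share:.1%} of the hash score)")
print("best:", best)
```

With these weights the NTb subtitle (357) ranks above the KILLERS one (340), which is the ordering a human would pick.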

@morpheus65535
Owner

@pannal What do you think about that?

@defcon84

defcon84 commented Nov 4, 2019

I like it.
I disabled subscene because of too many false hits. Format is way more important for timings.

@pannal
Collaborator

pannal commented Jan 18, 2020

Well yeah, I agree that this is bad. I think the release group match should not be counted if the format mismatches.

I'm thinking about any repercussions this might have. @Diaoul any ideas?

I imagine something like HDDVD and BluRay would then mismatch, which might not be good.

release_group having a super high score is still something viable, as there aren't many release groups covering different formats.

@pannal pannal pinned this issue Jan 18, 2020
@Diaoul

Diaoul commented Jan 18, 2020 via email

@a-nunes
Author

a-nunes commented Jan 18, 2020

Well, I still have some problems, especially when some episodes are WEB-DL vs WEBRip: they mismatch every time. But I think this is part of guessit. In the development version of subliminal, there are more fields to compare, like streaming service and others. It may clear things up more if we try to use this more modern version. What do you think?

@morpheus65535 morpheus65535 unpinned this issue Jan 18, 2020
@pannal
Collaborator

pannal commented Jan 18, 2020

@a-nunes I wasn't aware that there's new development on subliminal. I thought only guessit got a major upgrade.

@a-nunes
Author

a-nunes commented Jan 19, 2020

@pannal yeah, there's a development branch in the subliminal GitHub repo.

Here are the scores from this branch:

#: Scores for episodes
episode_scores = {'hash': 809, 'series': 405, 'year': 135, 'country': 135, 'season': 45, 'episode': 45,
                  'release_group': 15, 'streaming_service': 15, 'source': 7, 'audio_codec': 3, 'resolution': 2,
                  'video_codec': 2, 'hearing_impaired': 1}

#: Scores for movies
movie_scores = {'hash': 269, 'title': 135, 'year': 45, 'country': 45, 'release_group': 15, 'streaming_service': 15,
                'source': 7, 'audio_codec': 3, 'resolution': 2, 'video_codec': 2, 'hearing_impaired': 1}

It includes country and streaming service.
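
A quick check using only the numbers quoted above suggests the imbalance from the original report is still present in this table. The sketch assumes that `source` plays the role `format` has in the older table; the variable names are illustrative.

```python
# Episode weights from the development branch, copied from the comment above.
DEV_EPISODE_SCORES = {
    "hash": 809, "series": 405, "year": 135, "country": 135, "season": 45,
    "episode": 45, "release_group": 15, "streaming_service": 15, "source": 7,
    "audio_codec": 3, "resolution": 2, "video_codec": 2, "hearing_impaired": 1,
}

# Assumption: 'source' corresponds to 'format' in the older score table.
technical = ("source", "resolution", "video_codec")
technical_total = sum(DEV_EPISODE_SCORES[name] for name in technical)
release_group_wins = DEV_EPISODE_SCORES["release_group"] > technical_total

print(technical_total, release_group_wins)  # 11 True
```

So a release group match (15) would still outweigh source, resolution and video codec combined (11), unless another field such as streaming_service breaks the tie.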

I'll try some combinations to find the best scores in both versions. Maybe we should update guessit too, because it has some improvements over the older version.

@pannal
Collaborator

pannal commented Jan 19, 2020

Can you link the branch? I can't find any official branch doing that.

@morpheus65535
Owner

@pannal
Collaborator

pannal commented Jan 19, 2020

@morpheus65535 interesting. I'm currently adding a more generic matching approach for format to subliminal_patch. I'd like to try that out before moving to a much different scoring.

@pannal
Collaborator

pannal commented Jan 19, 2020

    "TV": ("HDTV", "SDTV", "AHDTV", "UHDTV", "SATRip", "DVB", "PPV"),
    "Disk": ("DVD", "HD-DVD", "BluRay")

in particular. If none of those match, but release_group does, ignore release group.
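
A minimal sketch of this rule, assuming matches are kept as a set of property names. Only the two format groups come from the snippet above; `formats_compatible` and `drop_unreliable_release_group` are hypothetical names, not the actual subliminal_patch code.

```python
# Format groups as quoted above; everything else here is illustrative.
FORMAT_GROUPS = {
    "TV": ("HDTV", "SDTV", "AHDTV", "UHDTV", "SATRip", "DVB", "PPV"),
    "Disk": ("DVD", "HD-DVD", "BluRay"),
}


def formats_compatible(video_format, subtitle_format):
    """True when both formats are equal or fall in the same group."""
    if not video_format or not subtitle_format:
        return False
    a, b = video_format.lower(), subtitle_format.lower()
    if a == b:
        return True
    for members in FORMAT_GROUPS.values():
        lowered = {m.lower() for m in members}
        if a in lowered and b in lowered:
            return True
    return False


def drop_unreliable_release_group(matches, video_format, subtitle_format):
    """Return a copy of matches without release_group if the formats disagree."""
    result = set(matches)
    if not formats_compatible(video_format, subtitle_format):
        result.discard("release_group")
    return result


matches = {"series", "season", "episode", "release_group", "video_codec"}
print(drop_unreliable_release_group(matches, "WEB-DL", "HDTV"))  # release_group dropped
print(drop_unreliable_release_group(matches, "HDTV", "SDTV"))    # release_group kept
```

In the New Amsterdam example, the HDTV subtitle would lose its release_group points against a WEB-DL video, so the format match decides the ranking.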

@pannal
Collaborator

pannal commented Jan 19, 2020

SATRip, DVB and PPV might be too much here, just a proof of concept.

pannal added a commit to pannal/Sub-Zero.bundle that referenced this issue Jan 19, 2020
@morpheus65535
Owner

Seems totally fine to me! :-)

@a-nunes
Author

a-nunes commented Jan 20, 2020

    "TV": ("HDTV", "SDTV", "AHDTV", "UHDTV", "SATRip", "DVB", "PPV"),
    "Disk": ("DVD", "HD-DVD", "BluRay")

in particular. If none of those match, but release_group does, ignore release group.

@pannal Could a WEB group be added?
Web, WEBRip and WEB-DL would be treated as the same format; I think that would be nice.

BTW, I liked your approach of using release group only when the format matches.

@pannal
Collaborator

pannal commented Jan 20, 2020

Most definitely not. WEB-DL is not the same as WEBRip necessarily. A WEBRip can have massively different timings depending on how it was performed.

@a-nunes
Author

a-nunes commented Jan 23, 2020

@pannal is it possible to use the updated version of subliminal, with country and streaming service?

@morpheus65535
Owner

@a-nunes it would be like moving from a Tesla to a Renault 5 because they now have power steering...

@a-nunes
Author

a-nunes commented Jan 24, 2020

@morpheus65535 I don't see how. If it has more filters, it'll reduce the false positives and bring more reliable subtitles, don't you agree? Even more so with the patch that only uses release group if the formats match...

@morpheus65535
Owner

@a-nunes subliminal_patch has a multitude of upgrades that make up the current experience and that aren't even considered in base subliminal.

@pannal
Collaborator

pannal commented Jan 25, 2020

Rebasing on the "latest" changes by rato on subliminal might be something to consider, though.
Although, it might be hard as there have been many changes to subliminal_patch that'd have to be checked and backported when necessary.

I doubt that those changes will do much compared to what I recently added to the scoring system - did you try the latest changes from last weekend @a-nunes?

@a-nunes
Author

a-nunes commented Jan 25, 2020

Yes, I did, but I had to roll back; I don't know why, but python3 started using 100% CPU and crashing my machine. I like your solution and will try it on a test machine to see how it behaves. I still think that the score system needs a redesign. I'll let you know once I've properly tested it and checked its accuracy.

@morpheus65535
Owner

I'll close this one. Please reopen it if you feel that something still needs to be done.
