Removing non-alphanumeric characters from all searches doesn't work for some indexers #1225
Comments
goes with: #542 ? |
I think this one just needed a better name, we're using a modified version of the scene names, which works for a lot of indexers, but some are special 😄 We just need a way to modify them later in the process to allow for customization for certain indexers. |
We should move the cleanup logic out of the searchcriteria and into the RequestGenerator. |
Similar issue with |
Would it be possible to let users change the name that's being searched? |
I think it would be great if it was possible to manually modify how the search is done. I know for some torrent trackers you sometimes need to include double quotes around the search term to get a proper result if the show name contains spaces for example. |
Knight's & Magic
|
markus101 can there be an exception for cases like these, so that shows with an apostrophe (') in the title are also searched without it, and titles with an ampersand (&) are searched with it, without it, and with it replaced by "and"? For example, The Handmaid's Tale and Will & Grace would also be searched as The Handmaids Tale, Will Grace, and Will and Grace. RSS caught The Handmaid's Tale, but manual search doesn't. |
Running into the same issue as kat with The Handmaid's Tale -- Jackett catches both, but manual search only catches versions without the apostrophe. Any way we could have it search for both, or perhaps specify which on a per-show basis? |
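The variants requested in the two comments above can be sketched as follows. This is a minimal illustration in Python, not Sonarr's code; `title_variants` is a hypothetical helper, and it only handles the straight apostrophe and the ampersand mentioned in this thread.

```python
import re


def title_variants(title: str) -> list[str]:
    """Return de-duplicated search-term variants for a series title.

    Hypothetical helper for illustration only; not part of Sonarr.
    """
    candidates = [
        title,                                       # unmodified title
        title.replace("'", ""),                      # apostrophe dropped
        title.replace("&", "and"),                   # ampersand spelled out
        title.replace("&", " "),                     # ampersand dropped
        title.replace("'", "").replace("&", "and"),  # both, "and" form
        title.replace("'", "").replace("&", " "),    # both, dropped form
    ]
    variants: list[str] = []
    for candidate in candidates:
        # Collapse the double spaces left behind by removed characters.
        normalized = re.sub(r"\s+", " ", candidate).strip()
        if normalized not in variants:
            variants.append(normalized)
    return variants


if __name__ == "__main__":
    for name in ("The Handmaid's Tale", "Will & Grace"):
        print(name, "->", title_variants(name))
```

Titles without either character yield a single variant, so only the affected shows would cost extra queries.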
This happens for me using Torznab through Jackett. Debug log:
Manual search on Jackett for "Handmaid's Tale" works but not "Handmaids Tale", so the above causes the indexer to return no results. |
roman-22 So Jackett is part of the problem... Didn't even think of that part. |
I think each indexer's search handles it differently. TL recognises both "Handmaids Tale" and "Handmaid's Tale" whereas PHD (used in above log) needs the apostrophe. Because Jackett is not given the apostrophe and is only passed "Handmaids Tale" I don't think there's a way for Jackett to solve the problem. Either Sonarr needs to pass the apostrophe to Jackett, or the indexer needs to adapt their search engine to allow looser matches to be found. |
I opened another issue (#2644) with a similar problem. Because mine was closed immediately and this one has more attention, I'll add my opinion here. In my case the problem isn't only Sonarr's automatic removal of the single quote; it also removes "the" from any series like The Handmaid's Tale, leaving it as "handmaids tale", which is far from correct and complicates the way indexers work. @markus101 said that they cannot let indexers sanitize because they don't always do it, but I don't think that is a reason to do things wrong. If indexers are not sanitizing, that is not Sonarr's problem; it is the indexer's problem. I don't understand why one application should do things it shouldn't because external applications don't work otherwise. I wrote one indexer for Jackett and fixed another one to make it Sonarr compliant, and in my case I face the problem that Sonarr is "making up" titles that don't match reality, so I cannot know the real one. Removing apostrophes or removing "the" from titles before sending them to the indexer is out of Sonarr's scope. |
This is NOT the fault or problem of indexers. Sonarr picks up results IN RSS but not in searches. The implementation could be added with tweaks to the search algorithm for titles by also searching for results with special characters stripped in Sonarr. In fact, releases are actually meant to be untouched (including their filenames) due to standards of release groups and the Scene which make sure files do not contain special characters for the sake of compatibility and consistency. As per removing "the" from titles and adding them to search results, this occurs but is seemingly harmless in and of itself. It does not appear to be injuring RSS snatches. |
@kat953162 I think you didn't understand my point. I'm saying that sanitizing the title is the indexer's problem, not Sonarr's. Obviously RSS works, because Sonarr doesn't send any query. The origin of the problem is Sonarr: it is removing characters from titles that it shouldn't. They say they do it because indexers don't sanitize, so they have to. Wrong. If indexers are badly implemented, that is the indexer's problem, not Sonarr's. Sonarr should do things right, because otherwise it is far more complicated to implement an indexer that needs the removed parts. I think there would be two harmless solutions to this problem that don't affect any implemented indexer:
|
The indexers have protocols to follow. Higher-level indexers are not going to rename releases if a group uses a certain title. It's not "sanitization" if the indexer adds extra special characters to a title. The Scene will not suddenly start allowing characters other than A-Z, a-z, 0-9, periods, and dashes as per the rules, and other release groups generally follow these standards as well, but with flexibility. Sonarr needs to perform queries with special characters removed in order to capture a full set of results. It will then pick up items that have had an apostrophe or other special symbol (such as an ampersand) removed from the title. It will only be slightly slower for titles with symbols, which are not too common, and that is far better than not getting the results from the query at all. Instead of making one API request, it would be making two for titles with symbols, which isn't a big deal. |
The majority of indexers with newznab use sphinx indexing and are usually configured to strip special characters like that during the indexing process. It would be nice if they did the same for the api query, but they often don't. It doesn't make sense to demand Sonarr queries by unmodified titles simply because you desire it so and break it for all the sphinx indexers in the process. Currently there is no way for newznab/torznab indexers to convey their keyword format in the t=caps capabilities, otherwise we could use that, so it's not possible to implement different behavior depending on the site. At least, not at this time. |
I didn't say break anything. None of my proposals will break even one indexer currently working. |
Maybe since |
A hard coded list of sites / indexers in the code will require unnecessary administrative overhead on the code base. Trying to idiot proof the program by removing user choice is infuriating to me and if I do write a patch it will include a toggle option and possibly an option to specify what characters can remain. Some trackers allow a few but not all special characters in a search. I've been on the opposing end no less than a few dozen times and I hate when a developer tries to single handedly be smarter than the user. If it's an advanced option it's on the user to mess with it. If the user misconfigures it, that's their problem. With adequate documentation and UI design the user should be able to figure out exactly what it is they're tinkering with.
I disagree. It's avoidable and for the same reasoning above, it should be up to the user since it's their account on the indexer the additional load will show up under. |
@lps-rocks I think you did not understand what @Taloth tried to explain. There's just no place where a user-editable setting would make sense. The indexers are already hardcoded in Sonarr. The only option you have to add other indexers is via custom newznab or custom torznab. If someone has a newznab or torznab api, they can provide the capabilities of their api to Sonarr. Then Sonarr can adjust its queries accordingly. Once such capabilities are in Sonarr, it simply needs to be added to the few indexers that are integrated. And since Jackett is just another torznab indexer, if the user sets up custom indexers through Jackett, they can just set the api capabilities from there. |
It would need to be an option under the custom torznab feed in Sonarr. Sonarr is the responsible party that’s modifying the name to be ’safe’.
… On May 20, 2019, at 1:44 PM, xelra ***@***.***> wrote:
I think you did not understand what @Taloth <https://github.com/Taloth> tried to explain.
There's just no place where a user-editable settings would make sense. The indexers are already hardcoded in Sonarr. The only options you have to add other indexers is via custom newznab or custom torznab.
If someone has a newznab or torznab api, they can provide the capabilities of their api to Sonarr. Then Sonarr can adjust its queries accordingly.
So where exactly is the user supposed to meddle with this?
Once such capabilities are in Sonarr, it simply needs to be added to the few indexers that are integrated. And since Jackett is just another torznab indexer, if the user sets up custom indexers through Jackett, they can just set the api capabilities from there.
|
Newznab indexers mostly use sphinx as search engine, and thus the query titles were formatted for that. Back in the day we proposed and succeeded in getting the supportedParams attribute added to the newznab-specification caps response, allowing indexers to specify which query parameters they supported. This was first introduced in torznab specifically for Jackett. And after tvrage disappeared, it was successfully proposed to the actual newznab specification and their codebase. It was an improvement because it allowed the indexers to specify what they supported and clients to act accordingly. All without requiring the user to fiddle with configuration. So the decision logic that determines which QueryTitle cleanups are needed has to be moved to RequestGenerator so that the indexer capabilities can be taken into account. I discussed this with markus and he also does not want to add a user setting for this behavior. The correct format should be automatically determined, but if that is not possible or inconclusive then both titles should be queried instead. |
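The decision described above (use the format the indexer reports, and query both titles when that is unknown) could look roughly like this. This is a sketch in Python, not the actual RequestGenerator; `IndexerCaps`, its `supports_raw_search` field, and `clean_title` are hypothetical names, and the ASCII-only cleanup is an assumption standing in for Sonarr's real QueryTitle logic.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class IndexerCaps:
    """Hypothetical stand-in for what an indexer reports in its caps response."""
    # True: wants the raw title. False: wants the cleaned title. None: unknown.
    supports_raw_search: Optional[bool] = None


def clean_title(title: str) -> str:
    """Strip everything except ASCII letters, digits and spaces (assumed cleanup)."""
    cleaned = re.sub(r"[^A-Za-z0-9 ]", "", title)
    return re.sub(r"\s+", " ", cleaned).strip()


def query_titles(title: str, caps: IndexerCaps) -> list[str]:
    """Pick the title(s) to query for one indexer."""
    cleaned = clean_title(title)
    if cleaned == title:
        return [title]           # nothing to strip, one request is enough
    if caps.supports_raw_search is True:
        return [title]           # indexer asked for the unmodified title
    if caps.supports_raw_search is False:
        return [cleaned]         # indexer expects the cleaned form
    return [title, cleaned]      # inconclusive: query both


if __name__ == "__main__":
    print(query_titles("The Handmaid's Tale", IndexerCaps()))
    print(query_titles("The Handmaid's Tale", IndexerCaps(supports_raw_search=True)))
```

Keeping the choice per indexer means no user setting is involved, which matches the preference stated in the comment.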
Feels like a short-term fix is to query both the sanitised and non-sanitized titles, and that makes a lot of sense. I also agree that it would be worth asking the Jackett devs to support a "sanitized" field - they could have that stored in the Jackett DB and then it would solve the problem for all feeds everywhere, but reduce the need to double-query on Sonarr. This change would solve/fix a lot of manual matches/searches that I have to do. |
Hi guys, |
How has this not been fixed yet? Is there really no workaround for sites like nyaa.si that require the apostrophe in shows with titles containing apostrophes? |
As far as I understand, there is no tangible movement in Jackett on this issue. |
I will add one more case to the piggy bank. |
Ref - Radarr/Radarr#4502 1cbcad6 helped lay the groundwork for some of this, and once trackers indicate they support/need RawSearch that should significantly alleviate this issue. YGG Torrents in Prowlarr has marked the tracker as supporting RawSearch. Once Jackett supports the parameter they can update their definitions as well. Other indexer definitions will need similar updates. Doesn't fully resolve it, but should help for some. |
Same issue for
Manual search using correct title from prowlarr works fine:
|
Jackett will now have RawSearch support shortly
Indexers that require RawSearch simply need to be reported to Jackett, and the change will then be pulled into Prowlarr when updated, or they can be reported directly to Prowlarr. I believe this should effectively resolve this issue. Prowlarr RuTracker commit - Prowlarr/Prowlarr@bc50fd9 |
Indexers through Jackett and Prowlarr that report raw search capabilities are handled correctly, which solves this issue in the majority of cases; other newznab/torznab indexers can do the same if required. |
When searching, especially for anime, cleanTitle is not what is needed.
It should (maybe additionally to not break other search APIs) search for the exact scene title. This would be especially helpful for anime.
Nyaa fixed their search API and now properly returns results for '. It needs to be substituted with %27 though.
Here is the now successful search for JoJo's Bizarre Adventure:
http://www.nyaa.se/?page=search&cats=1_37&filter=1&term=JoJo%27s+Bizarre+Adventure
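For reference, the %27 substitution is what standard URL encoding produces. A minimal Python sketch, assuming the query parameters are exactly those in the URL above (`nyaa_search_url` is a hypothetical helper, not Sonarr code):

```python
from urllib.parse import urlencode


def nyaa_search_url(term: str) -> str:
    """Build the search URL shown above for an arbitrary term.

    urlencode percent-encodes the apostrophe as %27 and turns spaces into '+'.
    """
    params = {"page": "search", "cats": "1_37", "filter": "1", "term": term}
    return "http://www.nyaa.se/?" + urlencode(params)


if __name__ == "__main__":
    print(nyaa_search_url("JoJo's Bizarre Adventure"))
```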
Here is the hastebin that Taloth made about how Sonarr currently searches:
http://hastebin.com/qotubuxeme.vhdl