Fix EvilAngel performerByURL -> Refactor Algolia scraping #2177
base: master
Conversation
I've done a first pass of implementing all the scrapers for EvilAngel.yml with the new AlgoliaAPI.py. There are some TODOs for handling multiple sites, and for doing some form of result-score matching (e.g. galleryByFragment) when the operation finds multiple API results but can only return a single scraper result.
@Maista6969 I think a long time ago there was a discussion about refactoring the existing Algolia.py, and you were part of that discussion? Sorry if I'm mistaken, it was quite some time ago... Anyway, what I have here in this PR is working, albeit with some functionality yet to port over, e.g.
I may have overlooked some stuff, but I was wondering if you (or anyone else) has any input, suggestions, requests, etc. at this point?
Implementing markers would be nice.
I've been wanting to rewrite Algolia for a long time, thank you for contributing this! I certainly agree that the old Algolia has become pretty crufty and I think this is an excellent start on the Road to Refactor 😁
The next step will be creating an EvilAngel.py that uses this API module, so we have an extra layer of indirection where we can handle the special cases for this site, like the studio remappings that are currently such a mess in the old Algolia.py
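To make that idea concrete, here is a minimal sketch of what such a wrapper layer could look like. The function name `fix_studio`, the `STUDIO_MAP` entries, and the scene dict shape are all assumptions for illustration, not the actual AlgoliaAPI.py interface:

```python
# EvilAngel.py -- hypothetical thin wrapper over the shared API module.
# The studio names below are made up; a real remapping table would be
# ported from the special cases in the old Algolia.py.

# Per-site studio remapping: the kind of special case this extra layer
# of indirection is meant to absorb.
STUDIO_MAP = {
    "Evil Angel Films": "Evil Angel",
}

def fix_studio(scene: dict) -> dict:
    """Apply site-specific studio remapping to a scraped scene dict."""
    studio = scene.get("studio", {})
    if studio.get("name") in STUDIO_MAP:
        studio["name"] = STUDIO_MAP[studio["name"]]
    return scene

if __name__ == "__main__":
    # Usage sketch: post-process a result the shared module returned.
    scene = {"title": "Example", "studio": {"name": "Evil Angel Films"}}
    print(fix_studio(scene)["studio"]["name"])  # Evil Angel
```

The wrapper only touches site-specific quirks; everything generic stays in the shared module.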
I did wonder what else would be needed, apart from the studio remapping stuff... I thought that subclassing or importing functions from a "base" script (similar to the Aylo implementation) might be more complex than it's worth (to give flexibility that ultimately isn't needed). With that in mind, I had a think about how the scraper configuration YAML could be used for something like the studio name mapping, and came up with a possible solution like this:

```python
import ast

# the API hit dictionary
api_hit = {
    'studio_name': 'Enid Blyton',
    'serie_name': 'Groovy Gang',
    'channel_name': 'Happy Joy',
    'sitename': 'thisthing',
    'segment': 'something',
}

# these could come in via the `args["extra"]` list of strings
conditions_and_values_to_assign = [
    "api_hit['studio_name'] == 'Enid Blyton' => api_hit['channel_name']",
    "api_hit['segment'] == 'something' => 'a fixed value'",
    "api_hit['studio_name'] == 'Not A Match' => api_hit['serie_name']",
]

for condition_and_value_to_assign in conditions_and_values_to_assign:
    condition, value_to_assign = condition_and_value_to_assign.split(' => ')
    # parsing and evaluating the condition
    parsed_condition = ast.parse(condition, mode='eval')
    if eval(compile(parsed_condition, filename="", mode="eval")):
        new_variable = eval(value_to_assign)
    else:
        new_variable = 'default_value'
    print(condition_and_value_to_assign)
    print(new_variable)
    print()
```

When run, this outputs:

```
api_hit['studio_name'] == 'Enid Blyton' => api_hit['channel_name']
Happy Joy

api_hit['segment'] == 'something' => 'a fixed value'
a fixed value

api_hit['studio_name'] == 'Not A Match' => api_hit['serie_name']
default_value
```
I'm not super excited about the use of `eval`, though.
I see what you mean here, but I feel like dynamically evaluating code from a YAML file is even more complex than just having one Python script call another Python script 😅 For most sites we might not even need any special handling: see for example the True Amateurs scraper, which can just use the general API results and so doesn't have a separate Python script 🙂

edit: whoops, originally linked to the wrong scraper here; not Trans Angels but True Amateurs
You'll have to enlighten me, as in:
I can't see how any of the scraped models provide any marker feature.
Stash does not currently support scraping markers, but several scrapers have hacked it in because of user demand. It breaks the model of scrapers: instead of just returning results to Stash (where users can decide whether or not they'd like to keep the results), it makes the scraper call the GraphQL API to mutate the scene as it's being scraped. It's currently a feature in the Vixen Network scraper as well as the Aylo API, but I'd much prefer to lobby for native support before hacking it into any more scrapers. I think it's a moot point here though, as far as I can tell these sites don't have marker data in their APIs.
If you look at the Adult Time JSON, markers are there under "action_tags". It might be that not all the studios have them; the same thing happens with Aylo studios. I get your desire to make the support more native, I was just throwing the suggestion out there.
Thanks, I wasn't aware that they provided these :) I'll make a note of it for when we expand the use of this to AdultTime and the other sites that can use this API 👍
Yeah, to grab markers with a scrape you really want to be sure you matched correctly when you pull them. Ideally they'd be integrated as something we can pass to Stash like any other scraped metadata, but if you want to use a scraper with a confirmation dialog, the next best option would probably be an on-update hook that looks for a custom flag added by the scraper, removes the flag, and adds the markers. This would happen after the user confirms the scrape and the scene is updated. Even better would be the ability to hook a post-scrape update.
Scrapers can't register hooks, but I see your point in that we could maintain a separate plugin for this 👍
Implemented studio name determination for:
File metadata (duration, file size) is now used in the match scoring.
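As an illustration of that kind of scoring, here is a small sketch that prefers the API hit whose reported duration is closest to the local file's. The tolerance and weighting are invented for the example, not the actual values the scraper uses:

```python
# Hypothetical match-scoring helper: score API hits by how closely their
# reported duration matches the local file's duration.
def duration_score(file_seconds: float, api_seconds: float, tolerance: float = 10.0) -> float:
    """Return 1.0 for an exact duration match, falling linearly to 0.0 at `tolerance` seconds."""
    delta = abs(file_seconds - api_seconds)
    if delta >= tolerance:
        return 0.0
    return 1.0 - delta / tolerance

def best_match(file_seconds: float, hits: list) -> dict:
    """Pick the API hit whose duration scores highest against the file."""
    return max(hits, key=lambda hit: duration_score(file_seconds, hit["duration"]))

hits = [
    {"title": "Scene A", "duration": 1800},
    {"title": "Scene B", "duration": 1795},
]
print(best_match(1796, hits)["title"])  # Scene B
```

A real implementation would combine this with other signals (file size, title similarity) into one composite score.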
Putting back to draft while I move some of the Adult Time studios from their own scraper (e.g. All Girl Massage, Fantasy Massage, etc.) to the new AdultTime scraper.
Overview
I started this as an attempt to fix the performerByURL scraping for EvilAngel, but it turned into the long overdue effort to overhaul the Algolia script.

Scraper type(s)
Outstanding tasks
Examples to test
- performerByURL
- performerByName
  - Create Performer > Scrape with... > EvilAngel > Performer Name = Ariel
  - select Ariel Demure from the results
- performerByFragment
  - do the Create Performer search action above
  - go to an existing performer that has scenes at Evil Angel > Edit > Scrape with... > EvilAngel
- sceneByURL
- movieByURL
- galleryByURL
Short description
Problem
Recently, many Algolia-based sites have closed off free access to pages like:
This means you can no longer browse videos, performers, and movies on sites like evilangel.com, genderxfilms.com, and a whole load of other sites. This is especially annoying for performer scraping, as that is not implemented in the current Algolia.py.
Solution
There is actually a full Python client for the Algolia API, and all that's needed is fetching the appId and apiKey and setting the host and referer headers. By referring to that client's docs, the current Algolia.py, and the Aylo API script, I've cobbled together a working:
The current Algolia.py is a whole load of jank taped together and is long overdue for an overhaul. Rather than trying to refactor it in place, I've decided to make a new script called AlgoliaAPI.py, so that each site scraper can be migrated over individually.
The good parts of the existing Algolia.py should be included now.
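As a rough sketch of the credential-fetching step described above: the regex, the embedded-JSON key names, and the sample values below are assumptions for illustration (the real pages may embed the appId/apiKey differently), but the multi-index query endpoint format is Algolia's standard one:

```python
import re

# Hypothetical pattern for an Algolia appId/apiKey pair embedded as JSON
# in a site's HTML. The key names here are assumed for illustration.
CREDS_PATTERN = re.compile(
    r'"api_key"\s*:\s*"(?P<api_key>[^"]+)"[^}]*'
    r'"application_id"\s*:\s*"(?P<app_id>[^"]+)"'
)

def extract_algolia_credentials(html: str):
    """Pull an (app_id, api_key) pair out of a page's embedded JSON, if present."""
    match = CREDS_PATTERN.search(html)
    if match is None:
        return None
    return match.group("app_id"), match.group("api_key")

def search_endpoint(app_id: str) -> str:
    """Algolia's standard multi-index query endpoint for a given application."""
    return f"https://{app_id.lower()}-dsn.algolia.net/1/indexes/*/queries"

# Usage with a made-up page snippet and application id:
sample_html = '<script>var env = {"api_key":"abc123","application_id":"FAKEAPPID1"};</script>'
app_id, api_key = extract_algolia_credentials(sample_html)
print(app_id, api_key)          # FAKEAPPID1 abc123
print(search_endpoint(app_id))  # https://fakeappid1-dsn.algolia.net/1/indexes/*/queries
```

The official Algolia Python client wraps the same endpoint; once the appId/apiKey pair is extracted, the client (plus the host/referer headers mentioned above) does the rest.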