Add donna hay #1153
Added Functionality to Scrape Instructions
Add donna hay
Added Functionality to Scrape Ingredients
Thanks all! And apologies for taking a while here; I plan to review this within the next 24h or so.
This is looking pretty good! I have two requests after reading through the code:
- Could we try retrieving the recipe `title` from one of the other elements on the page, and filtering out the pipe (`|`) and subsequent content from that? I think that would make for more readable recipe titles.
- To confirm that the `ingredient_groups` functionality works as expected, could we add another test case for a recipe that involves ingredient groupings?
Can do, I'll use https://www.donnahay.com.au/recipes/snacks-and-sides/smoky-eggplant-dip-with-hand-cut-potato-chips as the target if that works?
Sounds good - thanks, @a1831319!
Update based on feedback
Additional tests for testing the ingredient groups
Resolved, thank you @a1831319 @mlduff!
This isn't completely resolved yet - could we use the HTML |
Retrieve recipe names from title element
@@ -38,7 +38,7 @@ def site_name(self):
         return "Donna Hay"

     def title(self):
-        return self.soup.find("h1", class_="recipe-title__mobile").text
+        return self.soup.find("title").text.split("|")[0].strip()
-        return self.soup.find("title").text.split("|")[0].strip()
+        html_title = self.soup.find("title")
+        recipe_title, _, _ = html_title.text.partition("|")
+        return recipe_title.strip()
Edit: call `str.partition` instead of `str.rpartition`
Ah, nope.. not quite correct. `rpartition` would return an empty result when `|` is not found in the string.
(updated/fixed to use `str.partition` instead)
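For anyone following along, here is a minimal sketch of why `str.partition` is the safer choice here. The helper name `title_from_html_title` and the sample title strings are hypothetical and only for illustration; the scraper itself reads the text from the page's `title` element.

```python
def title_from_html_title(html_title: str) -> str:
    """Return the text before the first pipe, with surrounding whitespace removed."""
    # partition() returns (before, separator, after); when "|" is absent,
    # the whole string lands in the first element.
    recipe_title, _, _ = html_title.partition("|")
    return recipe_title.strip()


# Hypothetical title strings, not taken from the real site.
with_pipe = "Smoky Eggplant Dip | Donna Hay"
without_pipe = "Smoky Eggplant Dip"

print(title_from_html_title(with_pipe))
print(title_from_html_title(without_pipe))

# rpartition() puts the whole string in the LAST element when "|" is absent,
# so unpacking its first element would give an empty title.
print(without_pipe.rpartition("|"))
```

Both approaches behave the same when a single pipe is present; they only differ when the separator is missing or appears more than once.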
Looks good to me! Thank you @a1831319 @heathrampazis @mlduff @Mooree003!
Ready to merge once the merge conflict in `__init__.py` is resolved; the `str.partition` usage suggestion is optional.
# mypy: allow-untyped-defs
This pull request is generally ready I think - just some merge conflicts to resolve.
There's a small cleanup opportunity here too - after #1174 we don't need these `allow-untyped-defs` `mypy` directives, so this can be removed from the file header.
@@ -0,0 +1,62 @@
+{
Please note: we've begun checking for a preferred ordering of the JSON key names (not alphabetical; more like priority/review-aid based).
After merging recent changes into your branch, one of the unit tests may begin complaining about the JSON files because of that. There is however a script provided that can automatically fix them -- running `python scripts/reorder_json_keys.py` should do that for you.
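As a rough illustration of what priority-based key reordering means, here is a small sketch. It is not the project's script: the `PREFERRED_ORDER` list, the `reorder_keys` helper, and the sample data are all hypothetical, and the real ordering is whatever `scripts/reorder_json_keys.py` enforces.

```python
import json

# Hypothetical priority list, for illustration only; the project's actual
# preferred ordering is defined by its own script and tests.
PREFERRED_ORDER = ["author", "site_name", "title", "ingredients", "instructions"]


def reorder_keys(data: dict, preferred: list[str]) -> dict:
    """Place preferred keys first, in priority order, then any remaining keys."""
    ranked = [key for key in preferred if key in data]
    # Keys not named in the priority list keep their original relative order.
    rest = [key for key in data if key not in preferred]
    return {key: data[key] for key in ranked + rest}


# Made-up expected-output data for a test case.
example = {
    "title": "Example Recipe",
    "extra_field": "kept after the prioritised keys",
    "author": "Example Author",
}

print(json.dumps(reorder_keys(example, PREFERRED_ORDER), indent=2))
```

This relies on Python dictionaries preserving insertion order, which `json.dumps` then follows when writing the file.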
Resolves #1150
No schema support. Most functions are supported (except for times).
Worked on collaboratively by myself, @heathrampazis, @a1831319 and @Mooree003.