From a19a4d76a891885748ffe9c392b53bbac213fbfa Mon Sep 17 00:00:00 2001 From: Hristo Harsev Date: Sat, 22 Feb 2025 06:58:17 +0200 Subject: [PATCH] docs: Update the index page and remove v14-v15 migration guide --- docs/getting-started/examples.md | 4 +- docs/index.md | 100 +++++++++++++++---------------- docs/misc/migrating-from-v14.md | 98 ------------------------------ mkdocs.yaml | 12 ++-- 4 files changed, 55 insertions(+), 159 deletions(-) delete mode 100644 docs/misc/migrating-from-v14.md diff --git a/docs/getting-started/examples.md b/docs/getting-started/examples.md index 3f18682ac..831af487e 100644 --- a/docs/getting-started/examples.md +++ b/docs/getting-started/examples.md @@ -1,10 +1,10 @@ # Examples -!!! important +!!! warning `recipe-scrapers` is designed to focus **exclusively on HTML parsing**. This core principle guides our development and support. You'll need to implement your own solution - for fetching recipe HTML files and managing network requests. The library works best when you + for fetching recipe HTML and managing network requests. The library works best when you provide both the HTML content and its source domain.
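The warning added above tells users to fetch HTML themselves and hand the library both the content and its source domain. A minimal stdlib sketch of that split follows; the `scrape_html` call is left commented so the snippet runs without the package installed, and the helper names (`fetch_html`, `source_domain`) are illustrative, not part of the library:

```python
from urllib.parse import urlparse
from urllib.request import urlopen


def fetch_html(url: str) -> str:
    """Networking stays on the caller's side; recipe-scrapers only parses."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def source_domain(url: str) -> str:
    """The domain that scrape_html's org_url argument conveys."""
    return urlparse(url).netloc


url = "https://www.allrecipes.com/recipe/158968/spinach-and-feta-turkey-burgers/"
# html = fetch_html(url)
# from recipe_scrapers import scrape_html
# scraper = scrape_html(html, org_url=url)
print(source_domain(url))  # www.allrecipes.com
```

Any HTTP client works in place of `urllib` (requests, httpx, aiohttp); the library only ever sees the HTML string and the URL it came from.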
diff --git a/docs/index.md b/docs/index.md index 3da44e045..cdc51e1c3 100644 --- a/docs/index.md +++ b/docs/index.md @@ -4,34 +4,29 @@ [![Version](https://img.shields.io/pypi/v/recipe-scrapers.svg)](https://pypi.org/project/recipe-scrapers/) [![Python Version](https://img.shields.io/pypi/pyversions/recipe-scrapers)](https://pypi.org/project/recipe-scrapers/) [![Downloads](https://pepy.tech/badge/recipe-scrapers)](https://pepy.tech/project/recipe-scrapers) -[![GitHub Actions Unittests](https://github.com/hhursev/recipe-scrapers/workflows/unittests/badge.svg?branch=main)](https://github.com/hhursev/recipe-scrapers/actions/) +[![GitHub Actions Unittests](https://github.com/hhursev/recipe-scrapers/actions/workflows/unittests.yaml/badge.svg?branch=main)](https://github.com/hhursev/recipe-scrapers/actions/) [![Coveralls](https://coveralls.io/repos/hhursev/recipe-scraper/badge.svg?branch=main&service=github)](https://coveralls.io/github/hhursev/recipe-scraper?branch=main) [![License](https://img.shields.io/github/license/hhursev/recipe-scrapers)](https://github.com/hhursev/recipe-scrapers/blob/main/LICENSE) --- -`recipe-scrapers` is a Python package designed to extract recipe data from HTM content of -cooking websites. It parses the HTML structure of recipe pages to provide a simple and consistent -API for retrieving structured data like ingredients, instructions, cooking time, and more. -Works with the python versions listed above. +`recipe-scrapers` is a [Python](https://www.python.org/) package for extracting recipe data from +cooking websites. It parses recipe information from standard +[HTML](https://developer.mozilla.org/en-US/docs/Web/HTML) structure, [Schema](https://schema.org/) +markup (including JSON-LD, Microdata, and RDFa formats), or [OpenGraph](https://ogp.me/) metadata found on the page.
-## Installation +The package provides a simple and consistent API for retrieving data such as ingredients, instructions, +cooking times, and more. + +Compatible with the Python versions listed above. This package does not circumvent or bypass any +bot protection measures implemented by websites. -You can install `recipe-scrapers` using pip or your preferred Python package manager: !!! tip "Install" ``` console pip install recipe-scrapers ``` -!!! note - - This should produce output about the installation process, with the final line reading: - `Successfully installed recipe-scrapers-`. - - -## Overview - ```python exec="on" import sys sys.path.insert(0, ".") @@ -42,47 +37,15 @@ print(f"There are **{len(SCRAPERS.keys())}** cooking websites currently supporte For a full list check our [Supported Sites](./getting-started/supported-sites.md) section. -With `recipe-scrapers`, you should easily extract structured recipe data such as: - -- title -- ingredients -- instructions -- cooking and preparation times -- yields -- image -- and many more... - -Check out our [Examples](./getting-started/examples.md) section to see how to get started with the library. - -## Core Functionality - -`recipe-scrapers` long term aim is to focus **solely on HTML parsing** and not to handle -networking operations. This design choice provides flexibility in how you retrieve HTML content -and allows you to: - -- Implement your own networking logic -- Handle rate limiting -- Manage caching -- Control error handling -- Use your preferred HTTP client - ## Getting Started -👋 We suspect you've missed a few key links, such as us mentioning the [Examples](./getting-started/examples.md) and -the [Supported Sites](./getting-started/supported-sites.md) section. 
- -Thus, we drop this tiny Python snippet for you (using Python's built-in `urllib`) to showcase how -you can use this package: +Parsing recipe information can be as simple as: ```python -from urllib.request import urlopen - -from recipe_scrapers import scrape_html +from recipe_scrapers import scrape_me -url = "https://www.allrecipes.com/recipe/158968/spinach-and-feta-turkey-burgers/" -html = urlopen(url).read().decode("utf-8") # retrieves the recipe webpage HTML -scraper = scrape_html(html, org_url=url) +scraper = scrape_me("https://www.allrecipes.com/recipe/158968/spinach-and-feta-turkey-burgers/") scraper.title() scraper.instructions() scraper.to_json() @@ -90,6 +53,37 @@ scraper.to_json() # help(scraper) ``` +!!! warning + `recipe-scrapers` is designed to focus **exclusively on HTML parsing**. + + This core principle guides our development and support. You'll need to implement your own solution + for fetching recipe HTML and managing network requests. The library works best when you + provide both the HTML content and its source domain. + + For more advanced implementations, we recommend using: + + ```python + from recipe_scrapers import scrape_html + ``` + + Check out our [Examples](./getting-started/examples.md) section. + + +## Overview + +With `recipe-scrapers`, you can easily extract structured recipe data such as: + +- title +- ingredients +- instructions +- cooking and preparation times +- yields +- image +- and many more... + +Check out our [Examples](./getting-started/examples.md) section to see how to get started with the library. + + ## Why recipe-scrapers Exists Born from late-night coding sessions and a love for both food and programming, `recipe-scrapers` @@ -110,6 +104,6 @@ Today, our library helps power diverse projects across the cooking landscape: We're excited to see what you'll create!
Feel free to share your project in our [community showcase](https://github.com/hhursev/recipe-scrapers/issues/9) - we love seeing what others build with the library. -!!! tip "Happy cooking with code!" - While building awesome stuff, remember to be mindful of websites' terms and fair usage - - our [Copyright and Usage Guidelines](copyright-and-usage.md) will help you stay on track. + +While building awesome stuff, remember to be mindful of websites' terms and fair usage - +our [Copyright and Usage](copyright-and-usage.md) will help you stay on track. diff --git a/docs/misc/migrating-from-v14.md b/docs/misc/migrating-from-v14.md deleted file mode 100644 index df93a5904..000000000 --- a/docs/misc/migrating-from-v14.md +++ /dev/null @@ -1,98 +0,0 @@ -# Migrating from v14 to v15 - -!!! warning "Under Construction" - This documentation section is currently being updated and improved. - -## Overview - -Version 15 introduces important changes to the core API of recipe-scrapers, particularly regarding -how recipes are scraped from websites. The main change is the deprecation of the `scrape_me` -function in favor of more explicit HTML parsing methods. - -## Key Changes - -### 1. Deprecation of `scrape_me` - -The `scrape_me` function, which was the primary method for scraping recipes in v14, is being -deprecated. While it still works in v15, you'll receive deprecation warnings when using it: - -```python -# Old v14 approach (deprecated) -from recipe_scrapers import scrape_me -scraper = scrape_me('https://example.com/recipe') # Will show deprecation warning -``` - -### 2. 
New Recommended Approach - -The new approach separates HTML fetching from parsing: - -```python -# New v15 approach -from urllib.request import urlopen -from recipe_scrapers import scrape_html - -# Fetch HTML (you can use any HTTP client) -url = "https://example.com/recipe" -html = urlopen(url).read().decode("utf-8") - -# Parse HTML -scraper = scrape_html(html, org_url=url) -``` - -## Why This Change? - -1. **Better Separation of Concerns**: The library now focuses solely on HTML parsing, letting -you handle HTTP requests as you see fit -2. **More Flexibility**: You can use your preferred HTTP client (requests, httpx, aiohttp, etc.) -3. **Better Error Handling**: Separate networking issues from parsing issues - -## Migration Steps - -1. Replace `scrape_me` imports: - ```python - # Before - from recipe_scrapers import scrape_me - - # After - from recipe_scrapers import scrape_html - ``` - -2. Update scraping code: - ```python - # Before - scraper = scrape_me('https://example.com/recipe') - - # After - from urllib.request import urlopen - - url = 'https://example.com/recipe' - html = urlopen(url).read().decode("utf-8") - scraper = scrape_html(html, org_url=url) - ``` - -3. If you're using a web framework or need to handle many requests, consider using a more -robust HTTP client: - ```python - # Example with requests - import requests - from recipe_scrapers import scrape_html - - def get_recipe(url): - response = requests.get(url) - response.raise_for_status() - return scrape_html(response.text, org_url=url) - ``` - -## Timeline - -- v14.x.x: Still supported but will only receive critical bug fixes -- v15.x.x: Current stable version with new API -- Future versions: Will build upon the v15 API structure - -## Getting Help - -If you encounter issues during migration: - -1. Check the [GitHub issues](https://github.com/hhursev/recipe-scrapers/issues) for similar problems -2. Open a new issue if you find a bug -3. 
Join the community discussions for migration-related questions diff --git a/mkdocs.yaml b/mkdocs.yaml index 7383ce498..02e94ad24 100644 --- a/mkdocs.yaml +++ b/mkdocs.yaml @@ -9,15 +9,15 @@ theme: palette: - scheme: default primary: teal - accent: teal + accent: orange toggle: - icon: material/brightness-7 + icon: material/weather-night name: Switch to dark mode - scheme: slate - primary: teal - accent: teal + primary: indigo + accent: lime toggle: - icon: material/brightness-4 + icon: material/weather-sunny name: Switch to light mode features: - navigation.instant @@ -30,6 +30,7 @@ theme: - navigation.top - navigation.tabs - navigation.tabs.sticky + - content.code.copy nav: - Getting Started: @@ -37,7 +38,6 @@ nav: - Examples: getting-started/examples.md - Supported Sites: getting-started/supported-sites.md - Advanced Usage: getting-started/advanced-usage.md - - Migrating from v14: misc/migrating-from-v14.md - Releases & License: getting-started/releases-and-license.md - Contributing: - Contributing: contributing/home.md
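The migration guide deleted above justified the `scrape_html` API partly on error handling: with networking on the caller's side, network failures and parse failures become distinct. A small stdlib sketch of that separation, under the assumption that the caller wants to classify exceptions; the function names and User-Agent string are illustrative, not part of the library:

```python
from urllib.error import URLError
from urllib.request import Request, urlopen


def fetch(url: str, timeout: float = 10.0) -> str:
    # Networking failures (DNS errors, timeouts, HTTP error statuses)
    # surface here, before any parsing is attempted.
    request = Request(url, headers={"User-Agent": "my-recipe-app/0.1"})
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


def failure_kind(exc: Exception) -> str:
    """Crude split between network-side and parse-side failures."""
    if isinstance(exc, (URLError, TimeoutError)):
        return "network"
    return "parsing"


print(failure_kind(URLError("name resolution failed")))  # network
print(failure_kind(ValueError("no recipe markup found")))  # parsing
```

This keeps retry, caching, and rate-limiting decisions entirely in the caller's hands, which is the design rationale the removed guide described.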