
misc: udpate readme.rst, keep it shorter. remove downloads count
hhursev committed Feb 22, 2025
1 parent 7a02926 commit dc8b2ff
Showing 3 changed files with 48 additions and 52 deletions.
74 changes: 28 additions & 46 deletions README.rst
@@ -11,9 +11,6 @@ recipe-scrapers
 .. image:: https://img.shields.io/pypi/pyversions/recipe-scrapers
    :target: https://pypi.org/project/recipe-scrapers/
    :alt: PyPI - Python Version
-.. image:: https://pepy.tech/badge/recipe-scrapers
-   :target: https://pepy.tech/project/recipe-scrapers
-   :alt: Downloads
 .. image:: https://github.com/hhursev/recipe-scrapers/actions/workflows/unittests.yaml/badge.svg?branch=main
    :target: unittests
    :alt: GitHub Actions Unittests
@@ -24,12 +21,6 @@ recipe-scrapers
    :target: https://github.com/hhursev/recipe-scrapers/blob/main/LICENSE
    :alt: License
 
--------
-
-A reliable python tool for scraping recipe data from popular cooking websites. Extract structured
-recipe information including ingredients, instructions, cooking times, and nutritional data
-with ease. Supports 400+ major recipe websites out of the box.
-

Quick Links
-----------
@@ -40,8 +31,20 @@ Quick Links
 - `Share Project Ideas <https://github.com/hhursev/recipe-scrapers/issues/9>`_
 
 
-Installing
-----------
+A Python package for extracting recipe data from cooking websites. Parses recipe information from
+either standard `HTML <https://developer.mozilla.org/en-US/docs/Web/HTML>`_ structure,
+`Schema <https://schema.org/>`_ markup (including JSON-LD, Microdata, and RDFa formats) or
+`OpenGraph <https://ogp.me/>`_ metadata.
+
+The package provides a simple and consistent API for retrieving data such as ingredients, instructions,
+cooking times, and more.
+
+Compatible with the Python versions listed above. This package does not circumvent or bypass any
+bot protection measures implemented by websites.
+
+
+Installation
+------------
 
 .. code:: shell
 
     pip install recipe-scrapers
@@ -51,42 +54,27 @@ Basic Usage
 -----------
 
 .. code:: python
 
+    from urllib.request import urlopen
     from recipe_scrapers import scrape_html
-    from recipe_scrapers import scrape_me
 
-    scraper = scrape_me("https://www.allrecipes.com/recipe/158968/spinach-and-feta-turkey-burgers/")
+    # Example recipe URL
+    url = "https://www.allrecipes.com/recipe/158968/spinach-and-feta-turkey-burgers/"
 
-    scraper.title()
-    scraper.instructions()
-    scraper.to_json()
-    # for a complete list of methods:
-    # help(scraper)
+    # retrieve the recipe webpage HTML
+    html = urlopen(url).read().decode("utf-8")
 
-This package is focused **exclusively on HTML parsing**.
+    # pass the html alongside the url to our scrape_html function
+    scraper = scrape_html(html, org_url=url)
 
-For advanced implementations, you'll need to implement your own solution for fetching recipe HTMLs
-and managing network requests. The library works best when you provide both the HTML content and
-its source domain.
-
-You are encouraged to use our *scrape_html* method:
-
-.. code:: python
-
-    from recipe_scrapers import scrape_html
-
-HTTP Clients
-------------
-
-Some Python HTTP clients you can use to retrieve HTML include:
-
-- `requests`_: Popular and feature-rich
-- `httpx`_: Modern, supports async/await
-- `urllib.request`_: Included in Python's standard library
-
-Please refer to their documentation to find out what options (timeout configuration, proxy
-support, etc) are available.
-
-.. _requests: https://pypi.org/project/requests/
-.. _httpx: https://pypi.org/project/httpx/
-.. _urllib.request: https://docs.python.org/3/library/urllib.request.html
+    # Extract recipe information
+    print(scraper.title())         # "Spinach and Feta Turkey Burgers"
+    print(scraper.total_time())    # 35
+    print(scraper.yields())        # "4 servings"
+    print(scraper.ingredients())   # ['1 pound ground turkey', '1 cup fresh spinach...']
+    print(scraper.instructions())  # 'Step 1: In a large bowl...'
 
+    # For a complete list of available methods:
+    help(scraper)
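The added README text says the package parses Schema markup such as JSON-LD. The idea behind that path can be sketched with the standard library alone; this is a hypothetical minimal illustration, not the package's actual implementation (which also handles Microdata, RDFa, OpenGraph, and per-site HTML fallbacks):

```python
import json
import re

# Recipe sites commonly embed Schema.org data in a JSON-LD script tag.
# Sample HTML standing in for a fetched page (values are illustrative):
html = """
<html><head>
<script type="application/ld+json">
{"@type": "Recipe", "name": "Spinach and Feta Turkey Burgers",
 "totalTime": "PT35M", "recipeYield": "4 servings"}
</script>
</head></html>
"""

# Pull out the script body and parse it as JSON.
match = re.search(
    r'<script type="application/ld\+json">(.*?)</script>', html, re.DOTALL
)
data = json.loads(match.group(1))

assert data["@type"] == "Recipe"
print(data["name"])         # Spinach and Feta Turkey Burgers
print(data["recipeYield"])  # 4 servings
```

Once the structured block is located, fields like name, yield, and timings are direct dictionary lookups, which is why Schema-based extraction is so much more robust than scraping visible HTML.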
 Supported Sites
@@ -104,12 +92,6 @@ You can also get the full list programmatically with:
 
     SCRAPERS.keys()
 
-Documentation
--------------
-
-For detailed usage instructions, examples, and API reference, visit our
-`documentation <https://docs.recipe-scrapers.com>`_.
-
 
 Contributing
 ------------
 We welcome contributions! Please read our
21 changes: 18 additions & 3 deletions docs/getting-started/examples.md
@@ -8,9 +8,24 @@
 provide both the HTML content and its source domain.
 
 
-## Basic Usage
+### HTTP Clients
 
-Here's a simple example of how to use the library:
+Some Python HTTP clients you can use to retrieve HTML include:
+
+- [requests](https://docs.python-requests.org/en/): Popular and feature-rich
+- [httpx](https://www.python-httpx.org/): A fully featured HTTP client for Python 3
+- [aiohttp](https://docs.aiohttp.org/en/): Asynchronous HTTP Client/Server
+- [urllib.request](https://docs.python.org/3/library/urllib.request.html): Included in Python's standard library
+
+Please refer to their documentation to find out what options (headers, timeout configuration, proxy
+support, etc.) are available.
+
+We use the built-in [urllib.request](https://docs.python.org/3/library/urllib.request.html)
+in our examples and assume HTML has been fetched successfully.
+
+## Usage
+
+Example of how to use the library:

```python title="Basic Usage Example" linenums="1"
from urllib.request import urlopen
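The docs above standardize on the built-in `urllib.request` for fetching. A hedged sketch of a slightly hardened fetch follows; the User-Agent string and timeout value are illustrative assumptions (some sites reject urllib's default agent), and the live network call is left commented out:

```python
from urllib.request import Request, urlopen

# Example recipe URL from the docs; the header value is hypothetical.
url = "https://www.allrecipes.com/recipe/158968/spinach-and-feta-turkey-burgers/"
request = Request(url, headers={"User-Agent": "my-recipe-app/1.0"})

# The actual fetch, disabled here to avoid a live network call:
# html = urlopen(request, timeout=10).read().decode("utf-8")

# urllib normalizes header names to capitalized form internally.
print(request.get_header("User-agent"))  # my-recipe-app/1.0
```

The resulting `html` string is what you would hand to `scrape_html` together with the source URL, as the usage example below shows.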
@@ -42,7 +57,7 @@ not.

### Core Methods

-These methods must be available for all supported websites:
+These methods are available for all the supported websites:

!!! warning "Under Construction"
This documentation section is currently being updated and improved.
5 changes: 2 additions & 3 deletions docs/index.md
@@ -3,17 +3,16 @@
 [![Github](https://img.shields.io/github/stars/hhursev/recipe-scrapers?style=social)](https://github.com/hhursev/recipe-scrapers/)
 [![Version](https://img.shields.io/pypi/v/recipe-scrapers.svg)](https://pypi.org/project/recipe-scrapers/)
 [![Python Version](https://img.shields.io/pypi/pyversions/recipe-scrapers)](https://pypi.org/project/recipe-scrapers/)
-[![Downloads](https://pepy.tech/badge/recipe-scrapers)](https://pepy.tech/project/recipe-scrapers)
 [![GitHub Actions Unittests](https://github.com/hhursev/recipe-scrapers/actions/workflows/unittests.yaml/badge.svg?branch=main)](https://github.com/hhursev/recipe-scrapers/actions/)
 [![Coveralls](https://coveralls.io/repos/hhursev/recipe-scraper/badge.svg?branch=main&service=github)](https://coveralls.io/github/hhursev/recipe-scraper?branch=main)
 [![License](https://img.shields.io/github/license/hhursev/recipe-scrapers)](https://github.com/hhursev/recipe-scrapers/blob/main/LICENSE)

 ---
 
-`recipe-scrapers` is a [python](https://www.python.org/) package for extracting recipe data from
+`recipe-scrapers` is a [Python](https://www.python.org/) package for extracting recipe data from
 cooking websites. It parses recipe information from either standard
 [HTML](https://developer.mozilla.org/en-US/docs/Web/HTML) structure, [Schema](https://schema.org/)
-markup (including JSON-LD, Microdata, and RDFa formats) or [OpenGraph](https://ogp.me/) metadata found.
+markup (including JSON-LD, Microdata, and RDFa formats) or [OpenGraph](https://ogp.me/) metadata.
 
 The package provides a simple and consistent API for retrieving data such as ingredients, instructions,
 cooking times, and more.
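The docs/index.md text also names OpenGraph metadata as a fallback source. A hypothetical stdlib-only sketch of that mechanism, using `html.parser` to collect `og:` meta tags (sample HTML and values are illustrative, not the package's code):

```python
from html.parser import HTMLParser

class OpenGraphParser(HTMLParser):
    """Collect og:* properties from <meta> tags into a dict."""

    def __init__(self):
        super().__init__()
        self.og = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        prop = attrs.get("property", "")
        if prop.startswith("og:"):
            self.og[prop] = attrs.get("content")

parser = OpenGraphParser()
parser.feed('<meta property="og:title" content="Turkey Burgers">'
            '<meta property="og:site_name" content="Allrecipes">')
print(parser.og["og:title"])  # Turkey Burgers
```

OpenGraph carries far less detail than Schema.org recipe markup (no ingredients or timings), which is why it serves as a fallback rather than the primary extraction path.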
