Paraphrases content fetched from an API or scraped from a website with BeautifulSoup, using the Pegasus model.
- BeautifulSoup
- requests
- urllib
- torch
- requests_html
- transformers
- Pegasus
- Python 3
Fetching Headlines: The script fetches headlines from any category of a website you choose and displays each one with its index so that you can select it.
Fetching Article Content: You select a headline by its index, and the script fetches the content of the corresponding article, then paraphrases or summarizes it.
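The two features above could be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function names `extract_headlines` and `select_headline`, the `h2 a` CSS selector, and the sample HTML are all assumptions, and a real site will need its own selector.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_headlines(html, base_url, selector="h2 a"):
    """Return (index, title, absolute_url) tuples for each headline link.

    `selector` is a hypothetical CSS selector; adjust it per site.
    """
    soup = BeautifulSoup(html, "html.parser")
    headlines = []
    for link in soup.select(selector):
        title = link.get_text(strip=True)
        href = link.get("href")
        # Skip links that have no visible text or no target.
        if title and href:
            headlines.append((len(headlines), title, urljoin(base_url, href)))
    return headlines


def select_headline(headlines, index):
    """Return the (title, url) pair for the chosen index."""
    if not 0 <= index < len(headlines):
        raise IndexError(f"headline index {index} out of range")
    _, title, url = headlines[index]
    return title, url


# Sample markup standing in for a downloaded category page.
SAMPLE_HTML = """
<html><body>
  <h2><a href="/tech/first-story">First story</a></h2>
  <h2><a href="/tech/second-story">Second story</a></h2>
</body></html>
"""

headlines = extract_headlines(SAMPLE_HTML, "https://example.com")
for index, title, url in headlines:
    print(index, title, url)
```

In practice the HTML would come from a `requests.get(...)` call on the category page, and the URL returned by `select_headline` would be fetched the same way to get the article body.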
Install the required libraries using pip:
- pip install beautifulsoup4 requests requests-html torch transformers pegasus

Pegasus may depend on additional packages; install those as well.

Choose the website you want and set it as the base_url parameter, then select the category and set it as the relative_url parameter. Choose a headline by its index, and the script will fetch and display the article content.
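As an illustration of how the base_url and relative_url parameters might combine into the page that gets fetched, here is a small sketch. The example values and the helper name `build_category_url` are assumptions; the script itself may join them differently.

```python
from urllib.parse import urljoin

base_url = "https://example.com"  # hypothetical site
relative_url = "/technology"      # hypothetical category path


def build_category_url(base_url, relative_url):
    """Join the two parts, tolerating missing or duplicate slashes."""
    return urljoin(base_url.rstrip("/") + "/", relative_url.lstrip("/"))


category_url = build_category_url(base_url, relative_url)
print(category_url)
```

Normalizing the slashes before joining means `https://example.com/` with `technology` and `https://example.com` with `/technology` both resolve to the same category URL.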
- The paraphrased content might not always be perfect, and manual review may be necessary, depending on the element you extract data from.
- You need to provide the API of the site to which your blog will be uploaded.
- This script is for educational and demonstration purposes only.
- Ensure compliance with each website's terms of service when using its content.
- The Pegasus model used for paraphrasing needs to be fine-tuned for better results in production scenarios.

In this project I had to split the data into chunks and further divide it into paragraphs to ensure maximum accuracy and reduce the chance of the model hallucinating.
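The chunking idea mentioned in the notes above can be sketched as below. This is only one possible reading: the order of splitting (paragraphs first, then word-limited chunks), the `max_words` limit, and all function names are assumptions, and `paraphrase_fn` is a stand-in for whatever function wraps the Pegasus model call.

```python
def split_into_paragraphs(text):
    """Split on blank lines and drop empty paragraphs."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def chunk_paragraph(paragraph, max_words=60):
    """Break one paragraph into chunks of at most `max_words` words."""
    words = paragraph.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def paraphrase_article(text, paraphrase_fn, max_words=60):
    """Paraphrase chunk by chunk, keeping the original paragraph breaks."""
    paragraphs = []
    for paragraph in split_into_paragraphs(text):
        chunks = chunk_paragraph(paragraph, max_words)
        paragraphs.append(" ".join(paraphrase_fn(chunk) for chunk in chunks))
    return "\n\n".join(paragraphs)


# str.upper stands in for the model so the sketch runs without Pegasus.
article = "First paragraph of the article.\n\nSecond paragraph, a bit longer."
result = paraphrase_article(article, paraphrase_fn=str.upper, max_words=3)
print(result)
```

Keeping each model input short and bounded by paragraph limits how much context the model can mix together, which is the motivation the note gives for chunking.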