
Voltaic314/Website-Text-Extraction


Website-Text-Extraction

This extracts the text from a website. Ideal for passing it into a language model for processing and summarization.

This repository is a collection of scripts that extract the body text of a web page (an article, for example) and write it to a text file.
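The repository's own scripts are not reproduced here, so the following is only a minimal sketch of the idea, assuming a standard-library approach. The names `BodyTextParser`, `extract_text`, and `save_page_text` are hypothetical and chosen for illustration. The sketch fetches a page, keeps the visible text while skipping `script`, `style`, `noscript`, and `head` content, and writes the result to a text file.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class BodyTextParser(HTMLParser):
    """Collects visible text, skipping tags whose content is never displayed."""

    # Assumption: these tags never contain readable article text.
    SKIPPED_TAGS = {"script", "style", "noscript", "head"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside any skipped tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html: str) -> str:
    """Return the visible text of an HTML document, one chunk per line."""
    parser = BodyTextParser()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def save_page_text(url: str, out_path: str) -> None:
    """Download a page and write its extracted text to out_path (hypothetical helper)."""
    # Some sites reject requests without a User-Agent header.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(extract_text(html))
```

A call such as `save_page_text("https://example.com/article", "article.txt")` (placeholder URL) would write the page text to `article.txt`. This sketch keeps navigation and footer text along with the article body, so real pages would likely need extra filtering.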

This text can then be passed to an LLM of your choosing for summarization, training, or anything else you want. I intended to use this for finance-related purposes, such as summarizing an article posted for investors or a newly published news article to help with trading algorithms.

In a perfect world, you'd get real-world news from a news API of some kind, which would be much faster and more efficient than parsing and summarizing all of the HTML body text of a website.
