A Ruby library for data extraction that can be used to make RSS feeds from webpages

pyrmont/feedstock

Feedstock

Feedstock is a Ruby library for extracting information from an HTML/XML document and inserting it into an ERB template. Its primary purpose is to create a feed for a webpage that doesn't offer one.

Rationale

I love RSS feeds.

That's why I think it's a shame that not every website has a feed. However, even when a website does have a feed, sometimes it doesn't include quite the mix of information that I want. I made Feedstock to solve those two problems.

Feedstock is a Ruby library that you can use to create an Atom or RSS feed. It requires a URL to a document and a hash of rules. The rules tell Feedstock how to extract and transform the data found on the webpage. That data is stuffed into a hash and then run through an ERB template. Feedstock comes with a template but you can use your own, too.
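
To illustrate the last step described above (a hash of extracted data run through an ERB template), here is a minimal sketch that uses only Ruby's standard erb library. The shape of the data hash, the sample values and the tiny template are all assumptions made for illustration; they are not Feedstock's internal data structure or its bundled template.

```ruby
require "erb"

# Hypothetical data, shaped like the rules shown in the Usage section.
# This is an assumption, not Feedstock's documented internal structure.
data = {
  info: { id: "https://example.org",
          title: "Example News",
          updated: "2024-01-01T00:00:00Z" },
  entries: [
    { id: "https://example.org/a", title: "First story",
      updated: "2024-01-01T00:00:00Z" },
    { id: "https://example.org/b", title: "Second & final story",
      updated: "2023-12-31T00:00:00Z" }
  ]
}

# A deliberately small Atom-like template (illustrative only).
# Titles are escaped so characters such as "&" stay valid in XML.
template = <<~XML
  <feed xmlns="http://www.w3.org/2005/Atom">
    <id><%= info[:id] %></id>
    <title><%= ERB::Util.html_escape(info[:title]) %></title>
    <updated><%= info[:updated] %></updated>
  <% entries.each do |entry| -%>
    <entry>
      <id><%= entry[:id] %></id>
      <title><%= ERB::Util.html_escape(entry[:title]) %></title>
      <updated><%= entry[:updated] %></updated>
    </entry>
  <% end -%>
  </feed>
XML

# Each top-level key of the hash becomes a local variable in the template.
feed = ERB.new(template, trim_mode: "-").result_with_hash(data)
puts feed
```

In this sketch the template only ever sees plain Ruby values, so swapping in a different template changes the output format without touching the extraction step.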

Example

The feeds.inqk.net repository includes an example of how the Feedstock library can be used in practice.

Installation

Feedstock is available as a gem:

$ gem install feedstock

Usage

Feedstock extracts information from a document at a given URL using a collection of rules. The feed is generated by calling Feedstock.feed, as shown below:

# Define the URL
url = "https://example.org"

# Define the rules
rules = { info: { id: url,
                  title: Feedstock::Extract.new(selector: "div.title"),
                  updated: Feedstock::Extract.new(selector: "span.date") },

          entry: { id: Feedstock::Extract.new(selector: "a", content: { attribute: "href" }),
                   title: Feedstock::Extract.new(selector: "h2"),
                   updated: Feedstock::Extract.new(selector: "span.date"),
                   author: Feedstock::Extract.new(selector: "span.byline"),
                   link: Feedstock::Extract.new(selector: "a", content: { attribute: "href" }),
                   summary: Feedstock::Extract.new(selector: "div.summary") },

          entries: Feedstock::Extract.new(selector: "div.story") }

# Using the default format and template
Feedstock.feed url, rules

# Using the XML format and a user-specified template
Feedstock.feed url, rules, :xml, "podcast.xml"

More information is available in api.md.

Bugs

Found a bug? I'd love to know about it. The best way is to report it in the Issues section on GitHub.

Versioning

Feedstock uses Semantic Versioning 2.0.0.

Licence

Feedstock is released into the public domain. See LICENSE for more details.