openGraphScraper

A simple Node.js module (with TypeScript declarations) for scraping Open Graph, Twitter Card, and other metadata from a site.

Note: open-graph-scraper doesn't support browser usage at this time, but you can use open-graph-scraper-lite if you already have the HTML and can't use Node's Fetch API.

Installation

npm install open-graph-scraper --save

Usage

const ogs = require('open-graph-scraper');
const options = { url: 'http://ogp.me/' };
ogs(options)
  .then((data) => {
    const { error, html, result, response } = data;
    console.log('error:', error);  // This returns true or false. True if there was an error. The error itself is inside the result object.
    console.log('html:', html); // This contains the HTML of the page
    console.log('result:', result); // This contains all of the Open Graph results
    console.log('response:', response); // This contains response from the Fetch API
  })
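
The comment above notes that the error details live inside the result object. The sketch below shows one way to handle the success and failure paths in a single place. It assumes that a failed scrape rejects the promise with an object of the same shape (`error`, `result`), which is worth confirming against the version you have installed. `scrapeSafely` is a hypothetical helper name, and `ogs` is passed in as an argument so the helper can also be exercised with a stub.

```javascript
// Hypothetical helper (not part of open-graph-scraper).
// Assumption: on failure, ogs rejects with an object shaped like { error: true, result: {...} }.
async function scrapeSafely(ogs, options) {
  try {
    const { result } = await ogs(options);
    return { ok: true, result };
  } catch (failure) {
    // Fall back to the raw rejection value if it does not carry a result object.
    const details = failure && failure.result ? failure.result : failure;
    return { ok: false, result: details };
  }
}

// Usage, assuming the package is installed:
// const ogs = require('open-graph-scraper');
// scrapeSafely(ogs, { url: 'http://ogp.me/' }).then((outcome) => console.log(outcome));
```

Returning a plain `{ ok, result }` object keeps the calling code free of separate `.then` and `.catch` branches, at the cost of hiding the original rejection.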

Results JSON

Check the result for a success flag. If success is true, the url input was valid; otherwise it is false. The example above returns something like:

{
  ogTitle: 'Open Graph protocol',
  ogType: 'website',
  ogUrl: 'https://ogp.me/',
  ogDescription: 'The Open Graph protocol enables any web page to become a rich object in a social graph.',
  ogImage: [
    {
      height: '300',
      type: 'image/png',
      url: 'https://ogp.me/logo.png',
      width: '300'
    }
  ],
  charset: 'utf-8',
  requestUrl: 'http://ogp.me/',
  success: true
}
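
Because `ogImage` is an array in the example above, and a page may omit it entirely, reading the first image benefits from a guard. A minimal sketch, where `firstImageUrl` is a hypothetical helper rather than part of the library:

```javascript
// Hypothetical helper: returns the URL of the first Open Graph image, or null when none is present.
function firstImageUrl(result) {
  const images = Array.isArray(result && result.ogImage) ? result.ogImage : [];
  return images.length > 0 && images[0].url ? images[0].url : null;
}

// Sample shaped like the result shown above.
const sample = {
  ogTitle: 'Open Graph protocol',
  ogImage: [{ height: '300', type: 'image/png', url: 'https://ogp.me/logo.png', width: '300' }],
  success: true,
};

console.log(firstImageUrl(sample)); // https://ogp.me/logo.png
console.log(firstImageUrl({ success: true })); // null
```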

Options

Name	Info	Default Value	Required
url	URL of the site.		x
html	An HTML string to run ogs on (use without options.url).
fetchOptions	Options passed to the Fetch API.	{}
timeout	Request timeout for Fetch, in seconds.	10
blacklist	An array of sites you don't want ogs to run on.	[]
onlyGetOpenGraphInfo	Only fetch Open Graph info and don't fall back on anything else. Also accepts an array of properties for which no fallback should be used.	false
customMetaTags	Custom meta tags you want to scrape.	[]
urlValidatorSettings	Options used by validator.js for testing the URL.	(see library defaults)

Note: open-graph-scraper uses the Fetch API for requests, so most Fetch options should work when passed through fetchOptions.
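
To show how the options in the table combine, here is a sketch of an options object that sets a shorter timeout, a blacklist, and a custom request header. The option names come from the table; the values (`example.com`, the `accept-language` header, the 5-second timeout) are illustrative assumptions, and `buildOptions` is a hypothetical helper.

```javascript
// Hypothetical helper that merges per-call overrides into a shared set of options.
// Option names come from the table above; the values are illustrative only.
function buildOptions(url, overrides = {}) {
  const base = {
    url,
    timeout: 5, // seconds (the table lists 10 as the default)
    blacklist: ['example.com'], // sites ogs should not run on
    onlyGetOpenGraphInfo: false, // keep fallbacks enabled
    fetchOptions: { headers: { 'accept-language': 'en' } }, // passed through to fetch
  };
  return { ...base, ...overrides };
}

const options = buildOptions('http://ogp.me/', { onlyGetOpenGraphInfo: true });
console.log(options);

// Usage, assuming the package is installed:
// const ogs = require('open-graph-scraper');
// ogs(options).then(({ result }) => console.log(result));
```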

Types And Import Example

// example of how to get types
import type { SuccessResult } from 'open-graph-scraper/types';
const example: SuccessResult = {
  result: { ogTitle: 'this is a title' },
  error: false,
  response: {},
  html: '<html></html>'
}

// import example
import ogs from 'open-graph-scraper';
const options = { url: 'http://ogp.me/' };
ogs(options)
  .then((data) => {
    const { error, html, result, response } = data;
    console.log('error:', error);  // This returns true or false. True if there was an error. The error itself is inside the result object.
    console.log('html:', html); // This contains the HTML of the page
    console.log('result:', result); // This contains all of the Open Graph results
    console.log('response:', response); // This contains response from the Fetch API
  });

Custom Meta Tag Example

const ogs = require('open-graph-scraper');
const options = {
  url: 'https://github.com/jshemas/openGraphScraper',
  customMetaTags: [{
    multiple: false, // is there more than one of these tags on a page (normally this is false)
    property: 'hostname', // meta tag name/property attribute
    fieldName: 'hostnameMetaTag', // name of the result variable
  }],
};
ogs(options)
  .then((data) => {
    const { result } = data;
    console.log('hostnameMetaTag:', result.customMetaTags.hostnameMetaTag); // hostnameMetaTag: github.com
  })
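
When several custom tags are needed, the `customMetaTags` array can be generated from a simple mapping. A sketch, assuming each entry uses the three fields shown above; `toCustomMetaTags` and the `theme-color` tag are illustrative choices, not part of the library.

```javascript
// Hypothetical helper: turns { fieldName: metaTagProperty } into customMetaTags entries.
function toCustomMetaTags(mapping) {
  return Object.entries(mapping).map(([fieldName, property]) => ({
    multiple: false, // one tag of each kind is expected per page
    property, // meta tag name/property attribute
    fieldName, // name of the result variable
  }));
}

const customMetaTags = toCustomMetaTags({
  hostnameMetaTag: 'hostname',
  themeColor: 'theme-color',
});
console.log(customMetaTags);
// Pass it along as: { url: 'https://github.com/jshemas/openGraphScraper', customMetaTags }
```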

HTML Example

const ogs = require('open-graph-scraper');
const options = {
  html: `<html><head>
  <link rel="icon" type="image/png" href="https://bar.com/foo.png" />
  <meta charset="utf-8" />
  <meta property="og:description" name="og:description" content="html description example" />
  <meta property="og:image" name="og:image" content="https://www.foo.com/bar.jpg" />
  <meta property="og:title" name="og:title" content="foobar" />
  <meta property="og:type" name="og:type" content="website" />
  </head></html>`
};
ogs(options)
  .then((data) => {
    const { result } = data;
    console.log('result:', result);
    // result: {
    //   ogDescription: 'html description example',
    //   ogTitle: 'foobar',
    //   ogType: 'website',
    //   ogImage: [ { url: 'https://www.foo.com/bar.jpg', type: 'jpg' } ],
    //   favicon: 'https://bar.com/foo.png',
    //   charset: 'utf-8',
    //   success: true
    // }
  })

User Agent Example

The user-agent request header is set to undici by default. Some sites might block this, and changing the userAgent might work. If not, you can try using a proxy for the request and then pass the HTML into open-graph-scraper.

const ogs = require("open-graph-scraper");
const userAgent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36';
ogs({ url: 'https://www.wikipedia.org/', fetchOptions: { headers: { 'user-agent': userAgent } } })
  .then((data) => {
    const { error, html, result, response } = data;
    console.log('error:', error);  // This returns true or false. True if there was an error. The error itself is inside the result object.
    console.log('html:', html); // This contains the HTML of the page
    console.log('result:', result); // This contains all of the Open Graph results
    console.log('response:', response); // This contains response from the Fetch API
  })
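
If changing the user agent is not enough, the fallback described above (fetch the page some other way, then pass the HTML in) can be sketched as follows. Both `scrapeFromHtml` and `fetchHtml` are hypothetical names: `fetchHtml` stands for whatever proxy or HTTP client you use, and `ogs` is injected so the flow can be tried with stubs.

```javascript
// Hypothetical helper: obtains HTML through a caller-supplied function, then hands it to ogs
// through the html option (used without options.url, as the options table describes).
async function scrapeFromHtml(ogs, fetchHtml, url) {
  const html = await fetchHtml(url); // e.g. a request routed through your proxy
  const { result } = await ogs({ html });
  return result;
}

// Usage, assuming the package is installed and Node's global fetch is available:
// const ogs = require('open-graph-scraper');
// const viaFetch = async (url) => (await fetch(url)).text();
// scrapeFromHtml(ogs, viaFetch, 'https://www.wikipedia.org/').then(console.log);
```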

Running the example app

The example folder contains a simple Express app, which you can spin up by running npm ci && npm run start. Once the app is running, open a web browser and go to http://localhost:3000/scraper?url=http://ogp.me/ to test it out. There is also a Dockerfile if you want to run the example app in a Docker container.
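
When the target URL itself contains query parameters, it is safer to percent-encode it before placing it in the `url` query string. A small sketch using the standard `URL` class; `exampleAppUrl` is a hypothetical helper, and the host, port, and path are taken from the address above.

```javascript
// Hypothetical helper: builds a request URL for the example app with the target safely encoded.
function exampleAppUrl(target, base = 'http://localhost:3000') {
  const requestUrl = new URL('/scraper', base);
  requestUrl.searchParams.set('url', target);
  return requestUrl.toString();
}

console.log(exampleAppUrl('http://ogp.me/'));
// http://localhost:3000/scraper?url=http%3A%2F%2Fogp.me%2F
```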
