
Lexrank sentence salience ranking algorithm for the @retextjs natural language processor


gorango/retext-lexrank


Retext Lexrank


Retext plugin for unsupervised, extractive text summarization: it ranks sentences by salience using the LexRank algorithm.
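For readers new to LexRank, here is a minimal, dependency-free sketch of the general technique (the continuous variant described by Erkan and Radev). It is not the retext-lexrank source. Each sentence becomes a term-frequency vector, sentences are linked by cosine similarity, and a damped power iteration (as in PageRank) finds each sentence's stationary score. The tokenizer, the damping factor of 0.85, dropping self-similarity, and using plain term frequencies instead of the paper's idf-modified cosine are all illustrative assumptions.

```javascript
// Sketch of continuous LexRank; NOT the retext-lexrank implementation.
function tokenize(sentence) {
  return sentence.toLowerCase().match(/[a-z']+/g) ?? []
}

function termFrequencies(tokens) {
  const tf = new Map()
  for (const token of tokens) tf.set(token, (tf.get(token) ?? 0) + 1)
  return tf
}

function cosine(a, b) {
  let dot = 0
  for (const [term, count] of a) dot += count * (b.get(term) ?? 0)
  const squaredNorm = v => [...v.values()].reduce((sum, w) => sum + w * w, 0)
  const denominator = Math.sqrt(squaredNorm(a) * squaredNorm(b))
  return denominator === 0 ? 0 : dot / denominator
}

function lexrank(sentences, damping = 0.85, maxIterations = 100, tolerance = 1e-8) {
  const n = sentences.length
  if (n === 0) return []
  const vectors = sentences.map(s => termFrequencies(tokenize(s)))

  // Similarity graph without self-loops, row-normalized into transition probabilities.
  const transition = vectors.map((vi, i) => {
    const row = vectors.map((vj, j) => (i === j ? 0 : cosine(vi, vj)))
    const sum = row.reduce((s, v) => s + v, 0)
    // A sentence sharing no words with any other sentence jumps uniformly.
    return sum === 0 ? row.map(() => 1 / n) : row.map(v => v / sum)
  })

  // Damped power iteration: scores always sum to 1.
  let scores = new Array(n).fill(1 / n)
  for (let iter = 0; iter < maxIterations; iter++) {
    const next = new Array(n).fill((1 - damping) / n)
    for (let i = 0; i < n; i++) {
      for (let j = 0; j < n; j++) next[j] += damping * scores[i] * transition[i][j]
    }
    const delta = next.reduce((s, v, k) => s + Math.abs(v - scores[k]), 0)
    scores = next
    if (delta < tolerance) break
  }
  return scores
}

const scores = lexrank([
  'The cat sat on the mat.',
  'A cat sat on a mat.',
  'Stock prices fell sharply today.',
])
console.log(scores)
```

The first two sentences share vocabulary and reinforce each other, so they outrank the unrelated third sentence.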

Install

npm i --save retext-lexrank

Use

import { unified } from 'unified'
import latin from 'retext-latin'
import lexrank from 'retext-lexrank'

const processor = unified()
  .use(latin)
  .use(lexrank)

const file = '...' // vfile or text string
const tree = processor.parse(file)

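// run() returns a promise; once it resolves, the sentence nodes in tree
// carry their scores in data.lexrank
await processor.run(tree, file)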
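The scores end up on the sentence nodes themselves. The sketch below is a hypothetical illustration of that step, not the plugin's code: it walks a hand-built, nlcst-shaped tree and stores a score in each SentenceNode's data.lexrank. The helpers collectSentences and attachScores are made up for this example, and scaling by the maximum so the top sentence gets 1 is an assumption based on the top score of 1.00 in the sample output further down.

```javascript
// Hypothetical sketch: attaching per-sentence scores to an nlcst-shaped tree.
function collectSentences(node, out = []) {
  if (node.type === 'SentenceNode') out.push(node)
  for (const child of node.children ?? []) collectSentences(child, out)
  return out
}

function attachScores(tree, rawScores) {
  const sentences = collectSentences(tree)
  if (sentences.length !== rawScores.length) {
    throw new Error('one score per sentence required')
  }
  const max = Math.max(...rawScores)
  sentences.forEach((sentence, i) => {
    // Assumption: scores are scaled so the highest-ranked sentence gets 1.
    sentence.data = { ...sentence.data, lexrank: max > 0 ? rawScores[i] / max : 0 }
  })
  return tree
}

const tree = {
  type: 'RootNode',
  children: [{
    type: 'ParagraphNode',
    children: [
      { type: 'SentenceNode', children: [] },
      { type: 'SentenceNode', children: [] },
    ],
  }],
}

attachScores(tree, [0.2, 0.4])
const [first, second] = collectSentences(tree)
console.log(first.data.lexrank, second.data.lexrank) // 0.5 1
```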

Adding the part-of-speech (retext-pos) and keyword (retext-keywords) plugins to the pipeline, ahead of retext-lexrank, yields more polarized scores:

import { unified } from 'unified'
import latin from 'retext-latin'
import pos from 'retext-pos'
import keywords from 'retext-keywords'
import lexrank from 'retext-lexrank'

const processor = unified()
  .use(latin)
  .use(pos)
  .use(keywords)
  .use(lexrank)
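How retext-lexrank consumes the retext-pos and retext-keywords annotations is not spelled out here, so the following is only one plausible mechanism, shown as a hypothetical sketch: give keyword terms extra weight in the sentence vectors. Two sentences that share a keyword then look much more similar, which would tend to concentrate LexRank's score on keyword-heavy sentences. The keyword set and the boost factor of 3 are hand-picked assumptions.

```javascript
// Hypothetical illustration only: boosting keyword terms in bag-of-words vectors.
function vectorize(tokens, keywords = new Set(), boost = 1) {
  const vector = new Map()
  for (const token of tokens) {
    const weight = keywords.has(token) ? boost : 1
    vector.set(token, (vector.get(token) ?? 0) + weight)
  }
  return vector
}

function cosine(a, b) {
  let dot = 0
  for (const [term, weight] of a) dot += weight * (b.get(term) ?? 0)
  const squaredNorm = v => [...v.values()].reduce((sum, w) => sum + w * w, 0)
  const denominator = Math.sqrt(squaredNorm(a) * squaredNorm(b))
  return denominator === 0 ? 0 : dot / denominator
}

const firstTokens = ['i', 'create', 'music']
const secondTokens = ['write', 'music']
const keywords = new Set(['music']) // assumed output of a keyword extractor

const plain = cosine(vectorize(firstTokens), vectorize(secondTokens))
const boosted = cosine(
  vectorize(firstTokens, keywords, 3),
  vectorize(secondTokens, keywords, 3),
)
console.log(plain.toFixed(2), boosted.toFixed(2))
```

Here the similarity rises from about 0.41 to about 0.86 once the shared keyword is weighted.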

Example

Note

The retext-lexrank plugin works best on medium-to-long samples of text, like web articles, blogs, and essays. The following is a simple example.

Using the classic write-music sample from the unifiedjs use-cases:

Write Music (by Gary Provost)

This sentence has five words. Here are five more words.
Five word sentences are fine. But several together
become monotonous. Listen to what is happening. The
writing is getting boring. The sound of it drones. It's
like a stuck record. The ear demands some variety.

Now listen. I vary the sentence length, and I create
music. Music. The writing sings. It has a pleasant
rhythm, a lilt, a harmony. I use short sentences. And I
use sentences of medium length. And sometimes when I am
certain the reader is rested, I will engage him with a
sentence of considerable length, a sentence that burns
with energy and builds with all the impetus of a
crescendo, the roll of the drums, the crash of the
cymbals—sounds that say listen to this, it is important.

So write with a combination of short, medium, and long
sentences. Create a sound that pleases the reader's ear.
Don't just write words. Write music.

After running the processor on the text above, we can select the top-ranked sentences:

import { selectAll } from 'unist-util-select'
import { toString } from 'nlcst-to-string'

// Collect every sentence, sort by LexRank score (highest first), print the top three
selectAll('SentenceNode', tree)
  .sort(({ data: { lexrank: a } }, { data: { lexrank: b } }) => b - a)
  .slice(0, 3)
  .forEach(sentence => {
    const score = sentence.data.lexrank.toFixed(2)
    console.log(`[${score}]: ${toString(sentence)}`)
  })

Running the above yields:

[1.00]: I vary the sentence length, and I create music.
[0.85]: And I use sentences of medium length.
[0.71]: So write with a combination of short, medium, and long sentences.
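The output is ordered by rank. To read the top sentences as a summary, a common approach in extractive summarization is to keep the k best sentences but put them back in document order. The summarize helper below is a hypothetical sketch of that idea; it works on plain { text, data: { lexrank } } objects with made-up scores rather than on real nlcst nodes (for those you would use toString(sentence) in place of sentence.text).

```javascript
// Hypothetical helper: take the k highest-scoring sentences, then restore
// their original order so the summary reads naturally.
function summarize(sentences, k) {
  return sentences
    .map((sentence, index) => ({ sentence, index }))
    .sort((a, b) => b.sentence.data.lexrank - a.sentence.data.lexrank)
    .slice(0, k)
    .sort((a, b) => a.index - b.index)
    .map(({ sentence }) => sentence.text)
    .join(' ')
}

// Illustrative input with made-up scores, not output from retext-lexrank.
const ranked = [
  { text: 'First sentence.', data: { lexrank: 0.3 } },
  { text: 'Second sentence.', data: { lexrank: 0.9 } },
  { text: 'Third sentence.', data: { lexrank: 0.1 } },
  { text: 'Fourth sentence.', data: { lexrank: 1.0 } },
]

console.log(summarize(ranked, 2)) // Second sentence. Fourth sentence.
```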

Tests

Run npm test to run the tests.

Run npm run coverage to produce a test coverage report.

License

MIT © Goran Spasojevic
