Lexentity: A context-aware, medium-neutral entity maker

by Sean Coates

Let's face it--this sentence is much "uglier" than the one below it.
Let’s face it–this sentence is much “prettier” than the one above it.

Lexentity is a simple piece of software that takes HTML as input and outputs a context-aware, medium-neutral representation of that HTML, with apostrophes, quotes, em dashes, ellipses, accents, etc., replaced with their respective numeric XML/Unicode entities.

Context-aware

Context is important. It is especially important when considering a piece of HTML like this:

<p>…and here's the example code:</p>
<pre><code>echo "watermelon!\n";</code></pre>

Contextually, you'd want here's to become here’s, but you certainly don't want the code to read echo “watermelon!\n”;.

A fancy/smart/curly apostrophe is appropriate in the prose, but curly quotes in the code are likely to cause a parse error.

Lexentity understands its context and acts appropriately by means of lexical analysis, turning tokens into text, rather than through a mostly naive and overly complicated regular expression.
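To make the idea concrete, here is a minimal sketch in Python. It is not Lexentity's own code: it borrows the standard library's `html.parser` as the tokenizer, and the names `ContextAwareCurler`, `VERBATIM_TAGS`, and `curl_apostrophes` are made up for illustration. It also simplifies heavily by curling every straight apostrophe it finds in ordinary text, which a real implementation would need to refine.

```python
from html.parser import HTMLParser

# Elements whose text is left untouched (an assumption for this sketch).
VERBATIM_TAGS = {"pre", "code", "script", "style"}


class ContextAwareCurler(HTMLParser):
    """Re-emit HTML token by token, curling apostrophes only in prose."""

    def __init__(self):
        # convert_charrefs=False keeps existing entities as separate tokens,
        # so they can be passed through unchanged.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.verbatim_depth = 0  # > 0 while inside <pre>, <code>, etc.

    def handle_starttag(self, tag, attrs):
        if tag in VERBATIM_TAGS:
            self.verbatim_depth += 1
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied as written.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in VERBATIM_TAGS and self.verbatim_depth > 0:
            self.verbatim_depth -= 1
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Only text tokens outside verbatim elements are rewritten.
        if self.verbatim_depth == 0:
            data = data.replace("'", "\u2019")
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")


def curl_apostrophes(html):
    """Return html with apostrophes curled everywhere except verbatim text."""
    parser = ContextAwareCurler()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Calling `curl_apostrophes` on the example above would curl the apostrophe in `here's` while leaving the quotes inside the `<pre><code>` block alone, because the decision is made per token rather than per pattern match.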

Medium-neutral

My friend and colleague Jon Gibbins said it best in [this piece on his blog](http://dotjay.co.uk/2006/sep/named-html-entities-in-rss). In modern systems, you can't count on your HTML to always be represented as HTML. It's often (poorly) embedded in RSS or other HTML-like media, as XML.

Therefore, it is important to avoid HTML-specific entities like `&rdquo;` and `&hellip;`, and instead use their Unicode code point to form numeric entities such as `&#8230;`. This ensures proper display on any terminal that can properly render Unicode XML, and avoids missing entity errors.
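The conversion itself is small enough to sketch. The Python below is an illustration under stated assumptions, not Lexentity's implementation: the function names `to_numeric_entities` and `named_to_numeric` are hypothetical, decimal (rather than hexadecimal) references are an arbitrary choice, and the name table comes from the standard library's `html.entities.name2codepoint`.

```python
import re
from html.entities import name2codepoint

# The five entities XML itself predefines; these are safe to leave named.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def to_numeric_entities(text):
    """Replace every non-ASCII character with a decimal numeric entity."""
    return "".join(
        ch if ord(ch) < 128 else f"&#{ord(ch)};"
        for ch in text
    )


def named_to_numeric(markup):
    """Rewrite HTML-only named entities as numeric entities."""

    def swap(match):
        name = match.group(1)
        # Leave XML's own entities, and any name we do not recognise, as-is.
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return f"&#{name2codepoint[name]};"

    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", swap, markup)
```

For example, `named_to_numeric("more&hellip;")` yields `more&#8230;`, which an XML parser can resolve without knowing HTML's entity table.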

Demo

Try a demo at http://files.seancoates.com/lexentity/.
