Unescaping HTML entities #78

Open
ajhsu opened this issue Oct 3, 2018 · 0 comments

ajhsu commented Oct 3, 2018

Unescaping HTML entities

Understanding HTML entities

The definition of a character entity, from Wikipedia:

In SGML, HTML and XML documents, the logical constructs known as character data and attribute values consist of sequences of characters, in which each character can manifest directly (representing itself), or can be represented by a series of characters called a character reference, of which there are two types: a numeric character reference (&#nnnn; or &#xhhhh;) and a character entity reference (&name;).

Lists of HTML entities

Related tools

Potential unescaping solutions

Unescaping numeric HTML entities

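Unescaping numeric HTML entities is relatively easy: parse the character code (decimal for &#nnnn;, hexadecimal for &#xhhhh;) and pass it to the String.fromCharCode() method to get the raw character. Note that String.fromCharCode() only handles 16-bit code units, so code points above U+FFFF (most emoji, for example) need String.fromCodePoint() instead.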
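A minimal sketch of the numeric case might look like the following. The function name `unescapeNumericEntities` is made up for illustration, and the sketch uses String.fromCodePoint() rather than String.fromCharCode() so that code points above U+FFFF also decode. It does not apply the special cases the HTML specification defines for some values (such as `&#0;`), and it leaves named entities untouched.

```javascript
// Hypothetical helper: decodes numeric character references only,
// e.g. &#65; (decimal) or &#x41; (hexadecimal).
function unescapeNumericEntities(text) {
  return text.replace(/&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/g, (match, hex, dec) => {
    const codePoint = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    // String.fromCodePoint() throws a RangeError above U+10FFFF,
    // so leave such references as they are.
    if (codePoint > 0x10ffff) return match;
    return String.fromCodePoint(codePoint);
  });
}

console.log(unescapeNumericEntities('&#65;&#x42;C'));
```

Named references such as `&amp;` pass through this function unchanged, which is why the numeric approach alone is not enough.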

Comprehensively unescaping HTML entities

To comprehensively unescape HTML entities, both numeric and named, we need solutions beyond the String.fromCharCode method, because named entities such as &amp; can only be resolved by looking the name up in a table.
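To illustrate why named entities need a lookup, here is a rough sketch that handles both kinds of reference. The `NAMED_ENTITIES` table and the `unescapeEntities` function are hypothetical names, and the table holds only a handful of entries for demonstration; the full HTML list contains more than two thousand names, which is the main reason to reach for the browser or a dedicated library instead.

```javascript
// Deliberately tiny, illustrative table (an assumption of this sketch);
// a complete decoder needs the full named-entity list.
const NAMED_ENTITIES = new Map([
  ['amp', '&'],
  ['lt', '<'],
  ['gt', '>'],
  ['quot', '"'],
  ['apos', "'"],
  ['nbsp', '\u00a0'],
  ['copy', '\u00a9'],
]);

function unescapeEntities(text) {
  const pattern = /&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);/g;
  return text.replace(pattern, (match, body) => {
    if (body[0] === '#') {
      // Numeric reference: hexadecimal if it starts with #x, else decimal.
      const isHex = body[1] === 'x' || body[1] === 'X';
      const codePoint = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
    }
    // Named reference: unknown names are left as they are.
    return NAMED_ENTITIES.has(body) ? NAMED_ENTITIES.get(body) : match;
  });
}

console.log(unescapeEntities('&lt;b&gt; &amp; &#169; &copy; &unknown;'));
```

Because the replacement runs in a single pass, already-decoded output is not scanned again, so `&amp;lt;` becomes `&lt;` rather than `<`.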

.innerHTML

  • Pros
  • Cons
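
One common way to use .innerHTML for this is sketched below. It is an assumption that this matches what the issue has in mind: the sketch assigns the escaped string to a detached textarea element and reads back its value, so the browser's own parser does the decoding. The function name `unescapeWithDom` and its optional `doc` parameter are made up for illustration, and the code needs a DOM, so it runs in a browser but not in plain Node.js unless a document object is supplied.

```javascript
// Hypothetical helper: lets the browser's HTML parser decode entities.
// `doc` defaults to the global document; pass another one if needed.
function unescapeWithDom(html, doc = globalThis.document) {
  const textarea = doc.createElement('textarea');
  // The element is never attached to the page; it only serves as a parser.
  textarea.innerHTML = html;
  return textarea.value;
}

// In a browser console:
// unescapeWithDom('&lt;p&gt; &amp; &copy;')
```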

The he package

  • Pros
  • Cons