Fix issues with unused namespace detection
Addresses issue #1. Fixes a bug which caused unused namespaces to not be
removed if no other namespace was used. Fixes a bug which caused
namespaces to be removed that were in use by attributes.

Adds a comment to the README.md that namespace removal will *not*
consider namespaces which are only used for a certain sub-tree of the
document.
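
The two fixes can be illustrated with a minimal standalone sketch. It is not the code from `index.js`: the helper names `collectGroup` and `findUnusedPrefixes` are hypothetical, and the attribute check is deliberately simplified (it looks for `prefix:name=` anywhere in the text instead of only inside tags).

```javascript
// Hypothetical helper, not part of minify-xml: first capture group of every match.
function collectGroup(text, regexp) {
  return Array.from(text.matchAll(regexp), match => match[1]);
}

// Simplified sketch: returns the declared prefixes that no tag or attribute name uses.
function findUnusedPrefixes(xml) {
  const declared = collectGroup(xml, /\sxmlns:([^\s\/=]+)=/g);
  // prefixes used by element names, e.g. <used:Tag> or </used:Tag>
  const usedByTags = collectGroup(xml, /<\/?([^\s\/>:]+):/g);
  // prefixes used by attribute names, e.g. used:attribute="..."
  // (simplification: not restricted to the inside of tags; xmlns itself is ignored)
  const usedByAttributes = collectGroup(xml, /\s([^\s=<>:"']+):[^\s=<>:"']+\s*=/g)
    .filter(prefix => prefix !== "xmlns");
  const used = new Set([...usedByTags, ...usedByAttributes]);
  return declared.filter(prefix => !used.has(prefix));
}

// No namespace is used at all: the declaration must still be reported as unused.
console.log(findUnusedPrefixes(`<a xmlns:unused="u"><b/></a>`)); // [ 'unused' ]

// The prefix is used only by an attribute: it must not be reported.
console.log(findUnusedPrefixes(`<a xmlns:p="u" p:attr = "x"/>`)); // []
```

The first call corresponds to the "no other namespace was used" case, the second to the "in use by attributes" case described above.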
kristian committed Aug 6, 2020
1 parent c22ea1e commit 197305a
Showing 4 changed files with 30 additions and 27 deletions.
10 changes: 5 additions & 5 deletions README.md
@@ -2,7 +2,7 @@

`minify-xml` is a lightweight and fast XML minifier for NodeJS with a command line.

Existing XML minifiers commonly only remove comments and whitespace between tags. This minifier also includes minification of tags, e.g. by collapsing the whitespace between multiple attributes. Additionally the minifier is able to remove any unused namespace declarations. `minify-xml` is based on regular expressions and thus executes blazingly fast.
Existing XML minifiers, such as `pretty-data`, often do a pretty (*phun intended*) bad job minifying XML, usually only removing comments and whitespace between tags. `minify-xml`, on the other hand, also minifies tags, e.g. by collapsing the whitespace between multiple attributes, and performs further minifications such as the removal of unused namespace declarations. `minify-xml` is based on regular expressions and thus executes blazingly fast.

## Installation

@@ -20,10 +20,10 @@ const xml = `<Tag xmlns:used="used_ns" xmlns:unused="unused_ns">
With the default options all comments will be removed and whitespace
in tags, like spaces between attributes, will be collapsed / removed
-->
<AnotherTag attributeA = "..." attributeB = "..." />
<AnotherTag attributeA = "..." attributeB = "..." />
<!-- By default any unused namespaces will be removed from the tags: -->
<used:NamespaceTag>
<used:NamespaceTag used:attribute = "...">
any valid element content is left unaffected (strangely enough = " ... "
and even > are valid characters in XML, only &lt; must always be encoded)
</used:NamespaceTag>
@@ -35,7 +35,7 @@ console.log(minifyXML(code));
This outputs the minified XML:

```xml
<Tag xmlns:used="used_ns"><AnotherTag attributeA="..." attributeB="..."/><used:NamespaceTag>
<Tag xmlns:used="used_ns"><AnotherTag attributeA="..." attributeB="..."/><used:NamespaceTag used:attribute="...">
any valid element content is left unaffected (strangely enough = " ... "
and even > are valid characters in XML, only &lt; must always be encoded)
</used:NamespaceTag></Tag>
@@ -55,7 +55,7 @@ require("minify-xml").minify(`<tag/>`, { ... });

- `collapseWhitespaceInTags` (default: `true`): Collapse whitespace in tags like `<anyTag attributeA = "..." attributeB = "..." />`.

- `removeUnusedNamespaces` (default: `true`): Removes any namespaces from tags, which are not used anywhere in the document, like `<tag xmlns:unused="any_url" />`.
- `removeUnusedNamespaces` (default: `true`): Removes any namespaces from tags, which are not used anywhere in the document, like `<tag xmlns:unused="any_url" />`. Notice the word *anywhere* here: the minifier does not consider the structure of the XML document, thus a namespace declaration on an element whose sub-tree never uses it is still kept if the namespace is used elsewhere in the document.
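
A small sketch of that limitation, using a made-up document and a simplified document-wide check (the regular expressions below are illustrative assumptions, not the ones used by the minifier):

```javascript
// Hypothetical document: prefix "p" is declared on both <a> and <b>,
// but only the sub-tree of <a> actually uses it.
const xml = `<root><a xmlns:p="ns"><p:child/></a><b xmlns:p="ns"><plain/></b></root>`;

// Document-wide view: which prefixes are declared, and which are used by any tag.
const declared = Array.from(xml.matchAll(/\sxmlns:([^\s\/=]+)=/g), m => m[1]);
const usedByTags = new Set(Array.from(xml.matchAll(/<([^\s\/>:]+):/g), m => m[1]));

// "p" is used somewhere in the document, so neither declaration counts as unused,
// including the one on <b>, whose sub-tree never uses it.
const unused = declared.filter(prefix => !usedByTags.has(prefix));
console.log(declared); // [ 'p', 'p' ]
console.log(unused);   // []
```

Removing the declaration on `<b>` would require tracking element nesting, which a purely regular-expression-based approach does not do.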

## CLI

41 changes: 22 additions & 19 deletions index.js
@@ -3,16 +3,16 @@ function escapeRegExp(string) {
}
function findAllMatches(string, regexp, group) {
var match, matches = [];
while ((match = regexp.exec(string))) {
while ((match = regexp.exec(string))) { if (match[group]) {
matches.push(typeof group === 'number' ? match[group] : match);
} return matches;
} } return matches;
}

// note: this funky looking positive backward reference regular expression is necessary to match contents inside of tags <...>.
// this is due to that literally any character except <&" is allowed to be put next to everywhere in XML. as even > is a allowed
// note: this funky looking positive lookbehind regular expression is necessary to match contents inside of tags <...>. this
// is because almost any character except <&" may appear nearly anywhere in XML. as even > is an allowed
// character, simply checking for (?<=<[^>]*) would not do the trick if e.g. > is used inside of a tag attribute.
const emptyRegexp = new RegExp(), inTagPattern = /(?<=<[^=\s>]+(?:\s+[^=\s>]+\s*=\s*(?:"[^"]*"|'[^']*'))*\1)/;
function replaceInTag(xml, regexp, lookbehind, replacement) {
const emptyRegexp = new RegExp(), inTagPattern = /(?<=<[^\s>]+(?:\s+[^=\s>]+\s*=\s*(?:"[^"]*"|'[^']*'))*\1)/;
function replaceInTags(xml, regexp, lookbehind, replacement) {
if (!replacement) {
replacement = lookbehind;
lookbehind = emptyRegexp;
@@ -29,16 +29,16 @@ const defaultOptions = {
};

module.exports = {
minify: function(xml, userOptions) {
// mix in the user options
const options = {
minify: function(xml, options) {
// apply the default options
options = {
...defaultOptions,
...(userOptions || {})
...(options || {})
};

// remove XML comments <!-- ... -->
if (options.removeComments) {
xml = xml.replace(/<![ \r\n\t]*(--([^\-]|[\r\n]|-[^\-])*--[ \r\n\t]*)>/g, String());
xml = xml.replace(/<![ \r\n\t]*(?:--(?:[^\-]|[\r\n]|-[^\-])*--[ \r\n\t]*)>/g, String());
}

// remove whitespace between tags <anyTag /> <anyOtherTag />
@@ -48,20 +48,23 @@ module.exports = {

// remove / collapse multiple whitespace in tags <anyTag attributeA = "..." attributeB = "..." />
if (options.collapseWhitespaceInTags) {
xml = replaceInTag(xml, /\s*=\s*/, /\s+[^=\s>]+/, "="); // remove leading / trailing whitespace around = "..."
xml = replaceInTag(xml, /\s+/, " "); // collapse whitespace between attributes
xml = replaceInTag(xml, /\s*(?=\/>)/, String()); // remove whitespace before closing > /> of tags
xml = replaceInTags(xml, /\s*=\s*/, /\s+[^=\s>]+/, "="); // remove leading / trailing whitespace around = "..."
xml = replaceInTags(xml, /\s+/, " "); // collapse whitespace between attributes
xml = replaceInTags(xml, /\s*(?=\/>)/, String()); // remove whitespace before closing > /> of tags
}

// remove namespace declarations which are not used anywhere in the document
// remove namespace declarations which are not used anywhere in the document (limitation: the approach taken here will not consider the structure of the XML document
// thus namespaces which might be only used in a certain sub-tree of elements might not be removed, even though they are not used in that sub-tree)
if (options.removeUnusedNamespaces) {
// the search for all xml namespaces could result in some "fake" namespaces (e.g. if a xmlns:... string is found inside the content of an element), as we do not
// limit the search to the inside of tags. this however comes with no major drawback as we then replace only inside of tags and thus it simplifies the search
var all = findAllMatches(xml, /\sxmlns:([^\s\/]+)=/g, 1), used = findAllMatches(xml, /<([^\s\/]+):/g, 1),
unused = all.filter(ns => !used.includes(ns));
var all = findAllMatches(xml, /\sxmlns:([^\s\/]+)=/g, 1), used = [
...findAllMatches(xml, /<([^\s\/]+):/g, 1), // look for all tags with namespaces
...findAllMatches(xml, /<[^\s>]+(?:\s+(?:([^=\s>]+):[^=\s>]+)\s*=\s*(?:"[^"]*"|'[^']*'))*/g, 1) // look for all attributes with namespaces
], unused = all.filter(ns => !used.includes(ns));

if (used.length) {
xml = replaceInTag(xml, new RegExp(`\\s+xmlns:(?:${ unused.map(escapeRegExp).join("|") })=(?:"[^"]*"|'[^']*')`), String());
if (unused.length) {
xml = replaceInTags(xml, new RegExp(`\\s+xmlns:(?:${ unused.map(escapeRegExp).join("|") })=(?:"[^"]*"|'[^']*')`), String());
}
}

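
The changed guard (`unused.length` instead of `used.length`) can be sketched in isolation as follows. This is a simplified illustration, not the code above: `removeDeclarations` is a hypothetical helper that applies the pattern to the whole string rather than only inside tags, and the body of `escapeRegExp` is an assumption, since the diff only shows the end of the real function.

```javascript
// Assumed implementation; the diff does not show the body of the real escapeRegExp.
function escapeRegExp(string) {
  return string.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: strips the xmlns declarations of the given prefixes.
function removeDeclarations(xml, unused) {
  if (!unused.length) {
    return xml; // nothing to remove, so no pattern is built at all
  }
  // one alternation over all unused prefixes, each escaped for use in a RegExp
  const pattern = new RegExp(
    `\\s+xmlns:(?:${unused.map(escapeRegExp).join("|")})=(?:"[^"]*"|'[^']*')`, "g");
  return xml.replace(pattern, "");
}

console.log(removeDeclarations(`<a xmlns:x="1" xmlns:y.z='2'><x:b/></a>`, ["y.z"]));
// <a xmlns:x="1"><x:b/></a>
```

Guarding on the list of unused prefixes means the removal runs whenever there is something to remove, regardless of whether any other namespace is in use.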
2 changes: 1 addition & 1 deletion package.json
@@ -1,6 +1,6 @@
{
"name": "minify-xml",
"version": "2.0.0",
"version": "2.0.1",
"description": "Fast XML minifier / compressor / uglifier with a command-line",
"keywords": [
"XML",
4 changes: 2 additions & 2 deletions test.js
@@ -5,10 +5,10 @@ const xml = `<Tag xmlns:used="used_ns" xmlns:unused="unused_ns">
With the default options all comments will be removed and whitespace
in tags, like spaces between attributes, will be collapsed / removed
-->
<AnotherTag attributeA = "..." attributeB = "..." />
<AnotherTag attributeA = "..." attributeB = "..." />
<!-- By default any unused namespaces will be removed from the tags: -->
<used:NamespaceTag>
<used:NamespaceTag used:attribute = "...">
any valid element content is left unaffected (strangely enough = " ... "
and even > are valid characters in XML, only &lt; must always be encoded)
</used:NamespaceTag>