
Unable to add html and head tags into the whitelist. #1520

Open
extempl opened this issue Apr 23, 2021 · 5 comments

Comments

@extempl

extempl commented Apr 23, 2021

I found the following comment in the Javadoc of the Whitelist class:

The cleaner and these whitelists assume that you want to clean a <code>body</code> fragment of HTML (to add user
 supplied HTML into a templated page), and not to clean a full HTML document. If the latter is the case, either wrap the
 document HTML around the cleaned body HTML, or create a whitelist that allows <code>html</code> and <code>head</code>
 elements as appropriate.

I then tried to keep the html, head and meta tags in the result with code like this:

Whitelist.relaxed()
        .addTags("!DOCTYPE html", "html", "head", "body", "meta", "style")
        .addAttributes("meta", "charset");

Unfortunately, what I get from this is the code wrapped in a <body> tag, with tags like <style> moved inside the body.
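For reference, a minimal sketch of the first workaround the quoted Javadoc suggests (clean the body, then wrap document HTML around it) might look like this. It assumes jsoup 1.14.1 or later, where `Whitelist` was renamed `Safelist`. The class and method names are hypothetical, and keeping only `<meta charset>` and `<style>` from the head is an assumption based on the tags mentioned in this issue. Passing `<style>` through means untrusted CSS reaches the page unchanged, so only do this if that is acceptable for your use case.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.safety.Safelist;

// Hypothetical helper: cleans head and body separately, then reassembles them.
// Assumes jsoup 1.14.1+ (Safelist was called Whitelist in earlier versions).
class FullDocumentCleaner {

    static String cleanFullDocument(String html) {
        // Parse as a full document so <head> and <body> content stay apart.
        Document dirty = Jsoup.parse(html);

        // Clean the body as usual; the relaxed safelist drops <script>.
        String cleanBody = Jsoup.clean(dirty.body().html(), Safelist.relaxed());

        // Assumption: only <meta charset> and <style> are wanted from the head.
        Safelist headSafelist = Safelist.none()
                .addTags("meta", "style")
                .addAttributes("meta", "charset");
        String cleanHead = Jsoup.clean(dirty.head().html(), headSafelist);

        // Wrap the cleaned pieces in a fresh html/head/body shell.
        Document out = Document.createShell(dirty.baseUri());
        out.head().html(cleanHead);
        out.body().html(cleanBody);
        return "<!DOCTYPE html>\n" + out.outerHtml();
    }

    public static void main(String[] args) {
        String value = "<html><head><meta charset=\"utf-8\"><style>.some {color: red}</style></head>"
                + "<body>3<script>alert('pwned')</script>4</body></html>";
        System.out.println(cleanFullDocument(value));
    }
}
```

Using two separate safelists means head-only tags such as `<meta>` never have to be allowed in user-supplied body content.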

@extempl
Author

extempl commented Apr 23, 2021

It looks like this comment is 10 years old and obsolete. So is there no way to include tags outside the body in the cleaning process? This is pretty critical in my case.

@RyderCRD
Contributor

RyderCRD commented Apr 25, 2021

I was trying to reproduce your issue. Could you please share the URL of the HTML?

@extempl
Author

extempl commented Apr 26, 2021

import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;

Whitelist whitelist = Whitelist.relaxed()
        .addTags("!DOCTYPE html", "html", "head", "body", "meta", "style")
        .addAttributes("meta", "charset");
String value = "<html><head><style>.some {color: red}</style></head><body>3<script>alert('pwned')</script>4</body></html>";
String cleaned = Jsoup.clean(value, whitelist);
// cleaned: <body><style>.some {color: red}</style>34</body>
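The merging happens before any cleaning. `Jsoup.clean` parses its input with `parseBodyFragment`, which treats everything as body content: the `<html>`, `<head>` and `<body>` start tags are ignored and `<style>` is placed in the body. A small sketch comparing that with a full `Jsoup.parse` (the class and method names are hypothetical):

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical demo class: reports where the <style> element ends up after parsing.
class StyleLocationDemo {

    // Returns "head", "body" or "missing" depending on where a <style> element was placed.
    static String styleLocation(Document doc) {
        if (!doc.head().select("style").isEmpty()) return "head";
        if (!doc.body().select("style").isEmpty()) return "body";
        return "missing";
    }

    public static void main(String[] args) {
        String value = "<html><head><style>.some {color: red}</style></head>"
                + "<body>3<script>alert('pwned')</script>4</body></html>";
        // How Jsoup.clean parses its input: everything is treated as body content.
        System.out.println("parseBodyFragment: " + styleLocation(Jsoup.parseBodyFragment(value)));
        // A full-document parse keeps the <head> separate.
        System.out.println("parse:             " + styleLocation(Jsoup.parse(value)));
    }
}
```

Because `body` itself is in the whitelist above, the Cleaner likely also copies the fragment's root `<body>` element, which would explain the extra `<body>` wrapper in the cleaned output.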

@Ruefors

Ruefors commented May 22, 2021

Hello, I just made some modifications to the static method clean to solve the problem you mentioned. Here is my solution for your reference.
Replace

Document dirty = parseBodyFragment(bodyHtml, baseUri);

with

Document dirty = parse(bodyHtml, baseUri);

That call is what causes the head and body to be merged together.
Then check whether the head and body tags are each in the whitelist.
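One way to check whether swapping the parse call is sufficient on its own is to do the same thing from outside the library: parse with `Jsoup.parse` and hand the document to `Cleaner`. The sketch below assumes jsoup 1.14.1+ (`Safelist`) and drops the `"!DOCTYPE html"` entry, which is not a tag name; the class and method names are hypothetical. According to its Javadoc, `Cleaner.clean(Document)` only uses elements from the dirty document's body, so the `<style>` that `Jsoup.parse` correctly puts in the head is still lost, and the head would need separate handling as well.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.safety.Cleaner;
import org.jsoup.safety.Safelist;

// Hypothetical demo: full parse + Cleaner, mirroring the suggested change.
class FullParseCleanerDemo {

    static Document cleanFullParse(String html, Safelist safelist) {
        Document dirty = Jsoup.parse(html);        // <style> lands in <head> here
        return new Cleaner(safelist).clean(dirty); // copies safe nodes from the body only
    }

    public static void main(String[] args) {
        // The safelist from the issue, minus "!DOCTYPE html" (not a tag name).
        Safelist safelist = Safelist.relaxed()
                .addTags("html", "head", "body", "meta", "style")
                .addAttributes("meta", "charset");
        String value = "<html><head><style>.some {color: red}</style></head>"
                + "<body>3<script>alert('pwned')</script>4</body></html>";
        Document clean = cleanFullParse(value, safelist);
        System.out.println("head children: " + clean.head().childNodeSize());
        System.out.println("style elements: " + clean.select("style").size());
        System.out.println(clean.body().html());
    }
}
```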

Ruefors added a commit to Ruefors/jsoup that referenced this issue May 22, 2021
@extempl
Author

extempl commented May 22, 2021

Thanks, I'll check it later, but as far as I remember, that alone is not enough. You can check my PR for reference.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants