-
-
Notifications
You must be signed in to change notification settings - Fork 593
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Addition of ElementTree changes HTML output #558
Comments
I understand this is a breaking change (if not reverted, it should be documented). However I think that security-wise, it's a much better default behavior than the previous one. Maybe we should add an option to the format so that the client code can specify that it already cared for escaping the content (and thus avoid the double-escaping)? |
As long as the escaping is done we can handle it ok (so no need for an additional option), however it would be ideal if we could opt into the new ElementTree format when we are ready. Perhaps to do that you could add a deprecation path using a settings flag so that clients can opt into ElementTree? I think without this you could well get complaints that exports have changed when the next release goes out. Then it could be documented and the change explained (it took me a little while to figure out why our CI build was broken). |
Hi Claude - are you ok with me submitting a PR to try to make this new dependency on ElementTree optional? Then we can plan the deprecation in django-import-export. Please let me know and I'll pull it together. |
Submitting PRs doesn't require any permission 😄 |
Understood... Just wanted to double-check it was something you would consider before I started. |
I looked into whether we could control escaping of text elements using a parameter or similar, but this is not possible (certain characters always get escaped). |
Thanks for suggesting the pull request. However, I'm not thrilled at all by generating HTML by hand as a string (both performance and security-wise). Hopefully other contributors will chime in and tell us their feelings. Did you try on the django-import-export side to conditionally escape based on tablib version? Would it be doable? |
Thanks Claude, I did think there must be a good reason why the HTML has not been generated by hand previously, as it seems like the simplest approach considering we are outputting a table. Perhaps escaping the values by default (i.e. with I cannot see performance being an issue except maybe for very large datasets, and even then I don't think it would be too dissimilar to existing versions.
It would not be doable because the current version of tablib will always escape (because it uses ElementTree). We'd have to find a way to conditionally load a previous version of tablib (i.e. with MarkupPy) to ensure we didn't break any existing implementations. It might be that we have to release our new version with the ElementTree dependency, and see if anyone complains. |
I see django-import-export supports html export without escaping content. Do you think that there are valid use cases for that capability? Is deprecating this an option for you? |
I think there is a valid use case, which is if users already have escaped content in their db, and they want to export without double escaping. However, I don't know if anyone does this, and I think it is unlikely that they do.
Yes, I think this is the way to go. I will pin the latest release of tablib in import-export, and then add a note to advise that escaping will be mandatory in a future release. If anyone complains, I can think again, but otherwise I would then be free to upgrade to the latest version of tablib in future. Thanks for your help with this. Feel free to close the PR if you wish. |
For reference, the import-export PR is here |
I think it might be easier all around to pin to 3.5.0 until you're ready to upgrade, and that the next Tablib should be 4.0.0, and mention the change clearly in the release notes. Thank you though for testing the |
I noticed that this change breaks the CI build in django-import-export. This is for cases when we are escaping harmful characters such as script tags.
If there is already an escaped character in content, then the use of ElementTree will double escape, which will not format correctly. This test (added to
HTMLTests
) will reproduce the issue:(this test passes in v3.3.0)
This test fails because the '&' string is exported as '&' So if someone has 'J & J' in their content (i.e. rendered in escape
& amp;
), they will now get "double escaping" 'J & amp;amp; J'.The text was updated successfully, but these errors were encountered: