Newlines in attributes are not escaped #152

Open
jwoudenberg opened this issue Feb 2, 2020 · 1 comment


jwoudenberg commented Feb 2, 2020

Hey there! Thank you everyone involved with creating and maintaining this library!

I'm trying to use this library to generate JUnit-style XML reports, which our CI system (Jenkins) then parses and presents in a pretty way to users. I've been running into an issue, though, which I believe has to do with how this library encodes newlines in attributes (it doesn't).

Given the following XML document description:

{-# LANGUAGE OverloadedStrings #-}
import Data.Map (fromList)
import Text.XML

doc :: Document
doc =
  Document
    (Prologue [] Nothing [])
    (Element "root" (fromList [("attr", "line\nline")]) [])
    []

renderText produces the following XML:

<?xml version="1.0" encoding="UTF-8"?><root attr="line\nline"/>

Parsing this back using parseText recovers the original value. That sounds good, but it differs from the behavior defined in the XML spec, which requires an attribute-value normalization step that turns literal newlines in attribute values into spaces.
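To make the normalization rule concrete, here is a minimal sketch of what a conforming parser roughly does to a CDATA attribute value. It is plain Haskell with no dependency on xml-conduit, and `normalizeAttrValue` and `charRef` are hypothetical names invented for this illustration. It only models literal whitespace and numeric character references; named entities, line-end handling, and code point validation are left out.

```haskell
import Data.Char (chr, isDigit, isHexDigit)
import Numeric (readHex)

-- | Hypothetical helper, not part of xml-conduit: literal tab, LF and CR
-- become a space, while numeric character references are decoded and the
-- resulting character is kept unchanged.
normalizeAttrValue :: String -> String
normalizeAttrValue [] = []
normalizeAttrValue ('&' : '#' : rest)
  | Just (c, rest') <- charRef rest = c : normalizeAttrValue rest'
normalizeAttrValue (c : rest)
  | c `elem` "\t\n\r" = ' ' : normalizeAttrValue rest
  | otherwise = c : normalizeAttrValue rest

-- | Parse what follows "&#" in a character reference, e.g. "10;" or "xA;".
-- Assumption: the code point is small enough for 'chr'; no range checks.
charRef :: String -> Maybe (Char, String)
charRef ('x' : s) =
  case span isHexDigit s of
    (ds@(_ : _), ';' : rest) | [(n, "")] <- readHex ds -> Just (chr n, rest)
    _ -> Nothing
charRef s =
  case span isDigit s of
    (ds@(_ : _), ';' : rest) -> Just (chr (read ds), rest)
    _ -> Nothing
```

Under this sketch a literal newline in the attribute comes back as a space, while a newline written as the reference &#10; comes back as a newline, which matches the difference seen between xml-conduit and the Jenkins parser.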

The XML parser in our Jenkins CI system does seem to perform this normalization step and so turns newlines in attributes into spaces. If we have something like a GHC compiler error, use this library to encode it into the attribute of an XML document, and then have Jenkins parse it back, then what gets displayed in the UI isn't particularly readable.

The JUnit XML format specifies that the main error message goes into an attribute. We've been able to work around this by storing it in an element intended to contain the stack trace of an error, which Jenkins displays ungarbled under a 'Stacktrace' header. It works but isn't great.

This Stack Overflow answer claims that, to avoid this problem, the XML renderer should escape newlines as &#10;. Making the replacement myself in the string before passing it to xml-conduit as an attribute value doesn't help, because xml-conduit escapes the & character. I checked that if I manually construct an XML string containing &#10;, xml-conduit does parse it back into a newline.

Would a patch to escape newlines into &#10; during rendering be accepted? I imagine there's a risk of breaking people relying on the current behavior.
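For illustration, the proposed escaping could look roughly like the sketch below. This is plain Haskell over `String` and is not xml-conduit's actual renderer code; `escapeAttrValue` is a hypothetical name, and escaping tab and carriage return alongside the newline is an assumption based on the same normalization rule.

```haskell
-- | Hypothetical attribute-value escaper, not xml-conduit's implementation.
-- Whitespace that a parser would normalize to a space is written as a
-- numeric character reference so it survives a round trip.
escapeAttrValue :: String -> String
escapeAttrValue = concatMap escapeChar
  where
    escapeChar '&' = "&amp;"
    escapeChar '<' = "&lt;"
    escapeChar '>' = "&gt;"
    escapeChar '"' = "&quot;"
    escapeChar '\n' = "&#10;"
    escapeChar '\r' = "&#13;"
    escapeChar '\t' = "&#9;"
    escapeChar c = [c]
```

With this, the value "line\nline" would be rendered as line&#10;line inside the attribute, and the & in user-supplied text is still escaped as before.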

Collaborator

k0ral commented Feb 3, 2020

I would gladly accept a pull-request that makes xml-conduit more standard-compliant.
This warrants the usual backward-incompatible-change precautions:

  • change must be mentioned in the changelog
  • major version bump: A.B+1.C.D
  • old release will be set as preferred version in Hackage
