Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HtmlAttribute.Value should use HtmlEntity.DeEntitize when setting _value #19

Closed
drbsoftware opened this issue Jun 17, 2017 · 12 comments
Closed
Assignees

Comments

@drbsoftware
Copy link

The Value of attribute data-foo=""Hello"" should be "Hello", not "Hello"
I think the fix is:

        {
            get
            {
                if (_value == null)
                {
                    _value = HtmlEntity.DeEntitize(ownerdocument.Text.Substring(_valuestartindex, _valuelength));
                }
                return _value;
            }
            set
            {
                _value = HtmlEntity.DeEntitize(value);
                if (_ownernode != null)
                {
                    _ownernode._innerchanged = true;
                    _ownernode._outerchanged = true;
                }
            }
        }
@JonathanMagnan JonathanMagnan self-assigned this Jun 18, 2017
@JonathanMagnan
Copy link
Member

JonathanMagnan commented Jun 18, 2017

Hello @drbsoftware ,

Thank you, my developer will look at this issue during the week.

Best Regards,

Jonathan

@JonathanMagnan
Copy link
Member

Hello @drbsoftware ,

The v1.5.0-beta92 now use your fix.

Let me know if that's correctly working.

Best Regards,

Jonathan

@yfayerman
Copy link

HtmlEntity.DeEntitize method should not be called when setting HtmlAttribute.Value as it must be properly encoded, see this.

As for getting the value, the property should, IMHO, return its original encoded value, and it's up to the caller whether to decode it or not.

@JonathanMagnan
Copy link
Member

Hello @yfayerman ,

Thank you for reporting. I will have my developer look at it and check if we must do it differently.

Best Regards,

Jonathan

@drbsoftware
Copy link
Author

8.1.2.3 Attributes in https://www.w3.org/TR/html5/syntax.html would seem to suggest that the parsing is the same for attributes and for HTML text.

In our case, we are using custom data-* attributes. The data is stored by JavaScript using the Chromium DOM. We retrieve the attribute contents using HTML Agility Pack. We expect the data retrieved to be the same as the data stored.

Thanks
David

@JonathanMagnan
Copy link
Member

Hello @yfayerman ,

The setter does not longer use DeEntitize.

The getter however still use it in order to solve @drbsoftware first issue. To mimic what a browser does. By example data-foo=""Hello"" are parsed as data-foo="\"Hello\"" in the code source of a browser.

Please @yfayerman , @drbsoftware , let me know if we can close this issue or there is much work to do on it.

Best Regards,

Jonathan

@drbsoftware
Copy link
Author

Jonathan, I think that looks right.
Thanks
David

@yfayerman
Copy link

@JonathanMagnan, that's fine with me.

@JonathanMagnan
Copy link
Member

Closing Comment: Fix confirmed

@JonathanMagnan
Copy link
Member

Hello @drbsoftware , @yfayerman ,

Finally, we had to rollback this fix in v1.5.1. @yfayerman is right, it should be up to the caller instead to DeEntitize or not the value. We cannot automatically DeEntitize it otherwise we will break some existing code like in the issue #32.

However, to make it easier, we added a new property DeEntitizeValue which return the expected value "Hello" if you want it decoded.

Best Regards,

Jonathan

@drbsoftware
Copy link
Author

Hi Jonathan,
It seems to me that the issues here and in #32 are:

  • What is written out should be what is read in (unless modified by options or by external software ).
  • What is accessed by .NET software should be equivalent to what is accessed by JavaScript on the DOM.

I think that the problem encountered by #32 is simply that the fix to #19 was incomplete. HtmlNode.WriteAttribute also needs to be modified to use Entitize.

David
BTW: Thank you for your efforts in keeping this library maintained.

@JonathanMagnan
Copy link
Member

Hello @drbsoftware ,

Thank you ;) We try to do our best!

Using the WebBrowser and some other HTML parser, you are right. The value should return "Hello"

However, the SetAttribute (using a WebBrowser) doesn't DeEntitize the value. " will be instead modified as " instead in the HTML so the value return "

Until we rewrite this library, I believe the best is to stay with the current behavior since a DeEntitizeValue property has been added. If you need, we can, however, add an option.

Way too much people are currently using the library with this behavior, so staying backward compatible is maybe more important in this case than doing the right thing.

Best Regards,

Jonathan

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

3 participants