Changes to the data format #160

Open
annevk opened this issue Apr 2, 2023 · 7 comments

Comments

@annevk
Contributor

annevk commented Apr 2, 2023

I want to make two changes to the way attributes are serialized to ensure better test coverage:

  • They are no longer sorted. We enforce insertion order as the specification does.
  • We serialize their qualified name, including prefix, if any.

As an example, https://github.com/html5lib/html5lib-tests/blob/master/tree-construction/tests10.dat#L388-L401 looks like

#data
<!DOCTYPE html><body xlink:href=foo xml:lang=en><svg><g xml:lang=en xlink:href=foo></g></svg>
#errors
#document
| <!DOCTYPE html>
| <html>
|   <head>
|   <body>
|     xlink:href="foo"
|     xml:lang="en"
|     <svg svg>
|       <svg g>
|         xlink href="foo"
|         xml lang="en"

today and the last part would change to

|       <svg g>
|         xml xml:lang="en"
|         xlink xlink:href="foo"

to account for this. This should improve coverage a bit.
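
A minimal Python sketch of the two serializations may make the difference concrete. The `Attr` tuple and both function names are hypothetical and do not come from any existing harness. The sort in the current format is a plain Python string sort, which is assumed to be adequate for the ASCII names used here.

```python
from typing import List, NamedTuple, Optional


class Attr(NamedTuple):
    """Hypothetical attribute record, listed in insertion order by the caller."""
    namespace: Optional[str]  # short designator such as "xlink" or "xml"; None if no namespace
    prefix: Optional[str]     # namespace prefix, if any
    local_name: str
    value: str


def serialize_current(attrs: List[Attr], indent: int) -> List[str]:
    # Current format: "<namespace> <local name>" for namespaced attributes,
    # with the lines sorted by name.
    named = []
    for a in attrs:
        name = f"{a.namespace} {a.local_name}" if a.namespace else a.local_name
        named.append((name, a.value))
    named.sort(key=lambda pair: pair[0])
    return [f'| {" " * indent}{name}="{value}"' for name, value in named]


def serialize_proposed(attrs: List[Attr], indent: int) -> List[str]:
    # Proposed format: keep insertion order and emit the qualified name
    # (prefix plus local name) after the namespace designator.
    out = []
    for a in attrs:
        qualified = f"{a.prefix}:{a.local_name}" if a.prefix else a.local_name
        name = f"{a.namespace} {qualified}" if a.namespace else qualified
        out.append(f'| {" " * indent}{name}="{a.value}"')
    return out


# The <g> element from the test above, attributes in source order.
g_attrs = [
    Attr("xml", "xml", "lang", "en"),
    Attr("xlink", "xlink", "href", "foo"),
]

print("\n".join(serialize_current(g_attrs, 8)))
print("\n".join(serialize_proposed(g_attrs, 8)))
```

With these inputs the first call reproduces the two attribute lines under `<svg g>` as they are today, and the second reproduces the proposed lines.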

@gsnedders
Member

Per #127 (comment), @hsivonen said:

Since the non-browser test harnesses for the Validator.nu HTML Parser use the present input formats and are more sensitive to format changes than, as I understand it, the html5lib harness, I'd prefer to avoid format changes and I'd like to keep the non-scripted tree construction tests clearly separate from the scripted ones.

I'm not totally sure what that was specifically about; in principle we've had a format change in e1f5573 which means the scripted/not-scripted distinction is normatively within the test.

That said, I think it is fair to say that we should be relatively conservative with making format changes—it incurs work for quite a lot of people, which means we might want to have a discussion about whether there are other format changes we should make at the same time.

@hsivonen
Member

hsivonen commented Apr 3, 2023

The Validator.nu harness has been pretty sensitive to the order of the hash-prefixed sections, since the Validator.nu harness reads the test files as a stream instead of treating them as one big random-access thing.

Changes in serialization (like proposed here) are easier to deal with than having the hash-prefixed sections in variable order.
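
To illustrate why section order matters to a streaming reader, here is a rough Python sketch of a single-pass reader. It is not the Validator.nu harness. The function names are hypothetical, and the section order in `SECTION_RANK` is an assumption about the tree-construction format. The sketch also ignores the case where a line of test input happens to look like a section header.

```python
from typing import Dict, Iterable, Iterator, List, Optional

# Assumed section order; "#script-off" and "#script-on" share one slot.
SECTION_RANK = {
    "#data": 0,
    "#errors": 1,
    "#new-errors": 2,
    "#document-fragment": 3,
    "#script-off": 4,
    "#script-on": 4,
    "#document": 5,
}


def _trim(test: Dict[str, List[str]]) -> Dict[str, List[str]]:
    # Drop the blank separator lines that trail the last section of a test.
    for body in test.values():
        while body and body[-1] == "":
            body.pop()
    return test


def stream_tests(lines: Iterable[str]) -> Iterator[Dict[str, List[str]]]:
    """Yield one test at a time, reading each line exactly once."""
    test: Optional[Dict[str, List[str]]] = None
    section = ""
    last_rank = -1
    for raw in lines:
        line = raw.rstrip("\n")
        rank = SECTION_RANK.get(line)
        if rank is None:
            # Ordinary content line: it belongs to the section being read.
            if test is not None:
                test[section].append(line)
            continue
        if line == "#data":
            # A new test starts; hand the previous one to the caller.
            if test is not None:
                yield _trim(test)
            test, last_rank = {}, -1
        elif test is None:
            raise ValueError(f"{line} before any #data")
        if rank <= last_rank:
            # A reader that never looks back cannot recover from this.
            raise ValueError(f"{line} is out of order")
        last_rank, section = rank, line
        test[section] = []
    if test is not None:
        yield _trim(test)
```

Because the reader never seeks backwards, a change to how the lines inside `#document` are written only affects the comparison step, whereas a reordered header breaks the read loop itself.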

@gsnedders
Member

Changes in serialization (like proposed here) are easier to deal with than having the hash-prefixed sections in variable order.

FWIW, this is another thing I was trying to sort out in #83 years ago, adding linting to assert the order is what it's meant to be.

@not-my-profile

How does serializing attributes in insertion order improve test coverage? I'd think that the order doesn't matter.

@annevk
Contributor Author

annevk commented Sep 26, 2023

The order matters.

@not-my-profile

How does it matter?

@annevk
Contributor Author

annevk commented Sep 26, 2023

I'm not sure what you mean. An element's attributes are defined to be in insertion order, and that order is observable through various APIs.
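
As a small illustration of the coverage point, the Python sketch below compares a sorted dump with an insertion-order dump. In browsers the order shows up through DOM APIs such as `Element.getAttributeNames()` and `Element.attributes`. The helper names and the "buggy parser" output are hypothetical and are used only to show what each dump can and cannot detect.

```python
from typing import List, Tuple

AttrList = List[Tuple[str, str]]


def dump_sorted(attrs: AttrList) -> List[str]:
    # Sorting discards whatever order the parser produced.
    return [f'{name}="{value}"' for name, value in sorted(attrs)]


def dump_insertion(attrs: AttrList) -> List[str]:
    # Keeping the list as-is preserves the parser's order.
    return [f'{name}="{value}"' for name, value in attrs]


# Order in the source markup: <g xml:lang=en xlink:href=foo>
expected: AttrList = [("xml:lang", "en"), ("xlink:href", "foo")]

# Hypothetical output of a parser that reverses the attribute order.
buggy: AttrList = list(reversed(expected))

# A sorted dump cannot tell the two apart, so the bug goes unnoticed.
assert dump_sorted(expected) == dump_sorted(buggy)

# An insertion-order dump differs, so a test would fail as it should.
assert dump_insertion(expected) != dump_insertion(buggy)
```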
