Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(aria_metadata): generate ARIA metadata from specification #4055

Merged
merged 1 commit into from
Sep 24, 2024

Conversation

Conaclos
Copy link
Member

@Conaclos Conaclos commented Sep 23, 2024

Summary

Our ARIA metadata are currently handwritten and mostly based on ARIA 1.1 (some of them are in ARIA 1.2).
Some of these metadata are erroneous or incomplete.
I first started to update them manually. However, this took too much time and it is error-prone.

The aria-query NPM package provides a JSON file with the ARIA metadata.
This seems to be manually updated.
I also noticed several issues and some lack of fidelity with the specification (they mix related and base concepts for example).

Thus, I decided to write a script that extracts ARIA metadata directly from the specification.
The script is written in JavaScript and relies on happy-dom NPM package that parse the HTML page of the ARIA specification.
I chose happy-dom because it has less dependencies than jsdom and linkedom. It is also more used than linkedom.
This is a best effort approach, however the generated data seems pretty good (I tried for the three available versions of ARIA).
I placed the script in packages/aria-data.

I updated the biome_aria_metadata build.rs script to generate its code from aria-data.json.
For now, aria-data.json is symlinked to packages/aria-data/aria-data-1-2.json.
I used the prettyplease crate (widely used in the community) to format the output of build.rs.
I think it is a better alternative to starting a process to run rustfmt.

Now that we have a pretty reliable ARIA metadata source, we will be able to transfer more stuff from biome_aria to biome_aria_metadata.

Test Plan

CI must pass.

@Conaclos Conaclos marked this pull request as ready for review September 23, 2024 16:01
@Conaclos Conaclos requested review from a team September 23, 2024 16:01
@github-actions github-actions bot added the A-Tooling Area: internal tools label Sep 23, 2024
Copy link
Contributor

Parser conformance results on

js/262

Test result main count This PR count Difference
Total 48515 48515 0
Passed 47324 47324 0
Failed 1191 1191 0
Panics 0 0 0
Coverage 97.55% 97.55% 0.00%

jsx/babel

Test result main count This PR count Difference
Total 40 40 0
Passed 37 37 0
Failed 3 3 0
Panics 0 0 0
Coverage 92.50% 92.50% 0.00%

symbols/microsoft

Test result main count This PR count Difference
Total 6570 6570 0
Passed 2203 2203 0
Failed 4367 4367 0
Panics 0 0 0
Coverage 33.53% 33.53% 0.00%

ts/babel

Test result main count This PR count Difference
Total 680 680 0
Passed 608 608 0
Failed 72 72 0
Panics 0 0 0
Coverage 89.41% 89.41% 0.00%

ts/microsoft

Test result main count This PR count Difference
Total 18431 18431 0
Passed 14114 14114 0
Failed 4317 4317 0
Panics 0 0 0
Coverage 76.58% 76.58% 0.00%

Copy link

codspeed-hq bot commented Sep 23, 2024

CodSpeed Performance Report

Merging #4055 will improve performances by 6.28%

Comparing conaclos/aria-data (1b975f4) with main (a3483e4)

Summary

⚡ 1 improvements
✅ 106 untouched benchmarks

Benchmarks breakdown

Benchmark main conaclos/aria-data Change
db_17847247775464589309.json[cached] 13.9 ms 13.1 ms +6.28%

xtask/codegen/Cargo.toml Outdated Show resolved Hide resolved
@Conaclos
Copy link
Member Author

@ematipico I chose happy-dom because it has less dependency than linkedom and has more NPM downloads. Thanks for the suggestion!

@Conaclos Conaclos force-pushed the conaclos/aria-data branch 9 times, most recently from 24b0bc0 to a3aeba5 Compare September 24, 2024 00:23
@DaniGuardiola
Copy link
Contributor

A DOM library like happy-dom or jsdom is perfectly fine and gets the job done, but just wanted to make a note that for most scraping needs of mostly static content you don't really need a DOM, and you can get by with a simple XML/HTML parser like cheerio. This is normally at least some orders of magnitude faster, and there are some tools that simplify this further with a scraping-focused API like surgeon.

All of this said, if this is something that just needs to run manually every once in a while, a DOM library is perfectly fine. However, if there are plans to run this frequently (e.g. in CI for every PR), then I'd strongly suggest switching to a parser approach.

@Conaclos Conaclos merged commit 93c08b9 into main Sep 24, 2024
14 checks passed
@Conaclos Conaclos deleted the conaclos/aria-data branch September 24, 2024 11:43
@ematipico
Copy link
Member

We will rarely run this job, considering that it's computed from source.

@Conaclos
Copy link
Member Author

@DaniGuardiola

Thanks for the pointers.
Indeed, cheerio could be a better option.
I decided to keep happy-dom because it is good enough and as you said it is rarely run.
The script is there for reproducibility and to be able to extract data once ARIA 1.3 is stable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants