refactor(aria_metadata): generate ARIA metadata from specification #4055

Conaclos · 2024-09-23T16:01:00Z

Summary

Our ARIA metadata are currently handwritten and mostly based on ARIA 1.1 (some of them are in ARIA 1.2).
Some of these metadata are erroneous or incomplete.
I first started to update them manually. However, this took too much time and it is error-prone.

The aria-query NPM package provides a JSON file with the ARIA metadata.
This seems to be manually updated.
I also noticed several issues and some lack of fidelity with the specification (they mix related and base concepts for example).

Thus, I decided to write a script that extracts ARIA metadata directly from the specification.
The script is written in JavaScript and relies on happy-dom NPM package that parse the HTML page of the ARIA specification.
I chose happy-dom because it has less dependencies than jsdom and linkedom. It is also more used than linkedom.
This is a best effort approach, however the generated data seems pretty good (I tried for the three available versions of ARIA).
I placed the script in packages/aria-data.

I updated the biome_aria_metadata build.rs script to generate its code from aria-data.json.
For now, aria-data.json is symlinked to packages/aria-data/aria-data-1-2.json.
I used the prettyplease crate (widely used in the community) to format the output of build.rs.
I think it is a better alternative to starting a process to run rustfmt.

Now that we have a pretty reliable ARIA metadata source, we will be able to transfer more stuff from biome_aria to biome_aria_metadata.

Test Plan

CI must pass.

github-actions · 2024-09-23T16:26:55Z

Parser conformance results on

js/262

Test result	`main` count	This PR count	Difference
Total	48515	48515	0
Passed	47324	47324	0
Failed	1191	1191	0
Panics	0	0	0
Coverage	97.55%	97.55%	0.00%

jsx/babel

Test result	`main` count	This PR count	Difference
Total	40	40	0
Passed	37	37	0
Failed	3	3	0
Panics	0	0	0
Coverage	92.50%	92.50%	0.00%

symbols/microsoft

Test result	`main` count	This PR count	Difference
Total	6570	6570	0
Passed	2203	2203	0
Failed	4367	4367	0
Panics	0	0	0
Coverage	33.53%	33.53%	0.00%

ts/babel

Test result	`main` count	This PR count	Difference
Total	680	680	0
Passed	608	608	0
Failed	72	72	0
Panics	0	0	0
Coverage	89.41%	89.41%	0.00%

ts/microsoft

Test result	`main` count	This PR count	Difference
Total	18431	18431	0
Passed	14114	14114	0
Failed	4317	4317	0
Panics	0	0	0
Coverage	76.58%	76.58%	0.00%

codspeed-hq · 2024-09-23T16:56:35Z

CodSpeed Performance Report

Merging #4055 will improve performances by 6.28%

_{Comparing conaclos/aria-data (1b975f4) with main (a3483e4)}

Summary

⚡ 1 improvements
✅ 106 untouched benchmarks

Benchmarks breakdown

	Benchmark	`main`	`conaclos/aria-data`	Change
⚡	`db_17847247775464589309.json[cached]`	13.9 ms	13.1 ms	+6.28%

xtask/codegen/Cargo.toml

Conaclos · 2024-09-23T22:51:46Z

@ematipico I chose happy-dom because it has less dependency than linkedom and has more NPM downloads. Thanks for the suggestion!

DaniGuardiola · 2024-09-24T10:03:10Z

A DOM library like happy-dom or jsdom is perfectly fine and gets the job done, but just wanted to make a note that for most scraping needs of mostly static content you don't really need a DOM, and you can get by with a simple XML/HTML parser like cheerio. This is normally at least some orders of magnitude faster, and there are some tools that simplify this further with a scraping-focused API like surgeon.

All of this said, if this is something that just needs to run manually every once in a while, a DOM library is perfectly fine. However, if there are plans to run this frequently (e.g. in CI for every PR), then I'd strongly suggest switching to a parser approach.

ematipico · 2024-09-24T11:44:10Z

We will rarely run this job, considering that it's computed from source.

Conaclos · 2024-09-24T11:49:49Z

@DaniGuardiola

Thanks for the pointers.
Indeed, cheerio could be a better option.
I decided to keep happy-dom because it is good enough and as you said it is rarely run.
The script is there for reproducibility and to be able to extract data once ARIA 1.3 is stable.

Conaclos marked this pull request as ready for review September 23, 2024 16:01

Conaclos requested review from a team September 23, 2024 16:01

github-actions bot added the A-Tooling Area: internal tools label Sep 23, 2024

dyc3 reviewed Sep 23, 2024

View reviewed changes

xtask/codegen/Cargo.toml Outdated Show resolved Hide resolved

Conaclos force-pushed the conaclos/aria-data branch from fa546d0 to a290a03 Compare September 23, 2024 19:38

github-actions bot removed the A-Tooling Area: internal tools label Sep 23, 2024

Conaclos force-pushed the conaclos/aria-data branch from a290a03 to 61061da Compare September 23, 2024 22:48

Conaclos force-pushed the conaclos/aria-data branch 9 times, most recently from 24b0bc0 to a3aeba5 Compare September 24, 2024 00:23

Conaclos force-pushed the conaclos/aria-data branch from a3aeba5 to 1b975f4 Compare September 24, 2024 11:24

refactor(aria_metadata): generate ARIA metadata from specification

1b975f4

Conaclos merged commit 93c08b9 into main Sep 24, 2024
14 checks passed

Conaclos deleted the conaclos/aria-data branch September 24, 2024 11:43

Conaclos mentioned this pull request Oct 10, 2024

📎 Generate ARIA metadata #4241

Closed

WomB0ComB0 mentioned this pull request Nov 1, 2024

[Snyk] Upgrade @biomejs/biome from 1.9.2 to 1.9.3 WomB0ComB0/portfolio#157

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(aria_metadata): generate ARIA metadata from specification #4055

refactor(aria_metadata): generate ARIA metadata from specification #4055

Conaclos commented Sep 23, 2024 •

edited

Loading

github-actions bot commented Sep 23, 2024

codspeed-hq bot commented Sep 23, 2024 •

edited

Loading

Conaclos commented Sep 23, 2024

DaniGuardiola commented Sep 24, 2024

ematipico commented Sep 24, 2024

Conaclos commented Sep 24, 2024

refactor(aria_metadata): generate ARIA metadata from specification #4055

refactor(aria_metadata): generate ARIA metadata from specification #4055

Conversation

Conaclos commented Sep 23, 2024 • edited Loading

Summary

Test Plan

github-actions bot commented Sep 23, 2024

Parser conformance results on

js/262

jsx/babel

symbols/microsoft

ts/babel

ts/microsoft

codspeed-hq bot commented Sep 23, 2024 • edited Loading

CodSpeed Performance Report

Merging #4055 will improve performances by 6.28%

Summary

Benchmarks breakdown

Conaclos commented Sep 23, 2024

DaniGuardiola commented Sep 24, 2024

ematipico commented Sep 24, 2024

Conaclos commented Sep 24, 2024

Conaclos commented Sep 23, 2024 •

edited

Loading

codspeed-hq bot commented Sep 23, 2024 •

edited

Loading