Metric computation #43

Closed · foolip opened this issue Jan 10, 2022 · 4 comments
Labels: meta (Process and/or repo issues)

@foolip (Member) commented Jan 10, 2022

This is a proposal for how the per-test, per-area and total Interop 2022 metric will work. Compat 2021 scoring worked like this:

  • Every test is scored between 0 and 1. This avoids tests with many subtests being given much larger weight than, for example, reftests.
  • Every area is scored between 0 and 1, by adding up the test scores and dividing by the number of tests.
  • The overall score is the sum of area scores divided by the number of areas (5).
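
To make that concrete, here is a minimal sketch of the computation; the types and names are hypothetical, not the actual wpt-results-analysis code:

```ts
// Hypothetical shapes, not the real wpt-results-analysis data model.
type TestResult = { passingSubtests: number; totalSubtests: number };

// Per-test score in [0, 1], so a test with 500 subtests weighs the
// same as a single-result reftest.
function testScore(t: TestResult): number {
  return t.passingSubtests / t.totalSubtests;
}

// Per-area score in [0, 1]: sum of test scores divided by test count.
function areaScore(tests: TestResult[]): number {
  return tests.reduce((sum, t) => sum + testScore(t), 0) / tests.length;
}

// Overall score: sum of area scores divided by the number of areas.
function overallScore(areas: TestResult[][]): number {
  return areas.reduce((sum, a) => sum + areaScore(a), 0) / areas.length;
}
```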

I suggest that we do basically the same for Interop 2022. There are 10 new areas in Interop 2022 (see #42) and 5 areas from Interop 2021 to include. Since that works out to a pretty reasonable 1/3 vs. 2/3 weighting for the previous year vs. current year, I propose that we simply treat it as 15 areas, each of which gets equal weight.
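
Worked out: each of the 15 areas gets weight 1/15, so the 5 Interop 2021 areas together contribute 5/15 = 1/3 of the total and the 10 new areas contribute 10/15 = 2/3.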

P.S. For Interop 2023 we probably won't add 30 areas to get the same 1/3 vs. 2/3 ratio. My idea is that we would then give the new areas 2/3 of the weight and all of the old tests 1/3 of the weight, regardless of the number of areas. But we don't need to decide on that now.

@foolip (Member, Author) commented Jan 10, 2022

One minor tweak I'd like to experiment with is using only integer math, and scoring everything between 0 and 1000. This will make the order of summation irrelevant, and will reduce decimal noise in some diffs, which I've found to be a bit of a nuisance for understanding Compat 2021 score changes. I'd suggest 1000 instead of 100 so that we have the option of showing a decimal point.
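
A minimal sketch of the idea, with hypothetical names; the key property is that integer addition is exact, so reordering the sum cannot change the result the way it can with floating point:

```ts
// Per-test score as an integer in [0, 1000]; rounding happens once
// per test, and everything downstream stays an integer.
function testScoreInt(passing: number, total: number): number {
  return Math.floor((1000 * passing) / total);
}

// Integer addition is exact, so the order of summation is irrelevant.
function areaScoreInt(scores: number[]): number {
  const sum = scores.reduce((acc, s) => acc + s, 0);
  return Math.floor(sum / scores.length);
}

// 1000 rather than 100 leaves room for one decimal place in display:
// an area score of 875 can be rendered as "87.5".
function display(score: number): string {
  return `${Math.floor(score / 10)}.${score % 10}`;
}
```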

@foolip (Member, Author) commented Jan 13, 2022

I have prototyped this in Ecosystem-Infra/wpt-results-analysis#73 with a preview in https://gist.github.com/foolip/25c9ed482a0dd802f9bf2eea4544ccac.

There may be bugs, and the code needs to be moved to another repo entirely and cleaned up, but it gives some idea of what the starting point for Interop 2022 will be.

@gsnedders (Member) commented:

> One minor tweak I'd like to experiment with is using only integer math, and scoring everything between 0 and 1000.

Is there any reason to do this versus using something like fraction.js?
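
For comparison, exact rational arithmetic with fraction.js might look roughly like this (illustrative only, not code from the analysis repo):

```ts
import Fraction from 'fraction.js';

// Exact per-test scores: passing subtests over total subtests.
const testScores = [new Fraction(2, 3), new Fraction(1, 1), new Fraction(5, 7)];

// Rational arithmetic is exact, so summation order is irrelevant here too.
const areaScore = testScores
  .reduce((acc, s) => acc.add(s), new Fraction(0))
  .div(testScores.length);

console.log(areaScore.toFraction()); // "50/63"
console.log(areaScore.valueOf());    // ≈ 0.7936...
```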

@foolip (Member, Author) commented Feb 23, 2022

Metrics computation was implemented in Ecosystem-Infra/wpt-results-analysis#73.

web-platform-tests/rfcs#106 tracks moving this code into the WPT org.

@gsnedders @jgraham and I have discussed integer math vs. other approaches in the Matrix chat without a clear conclusion, but Ecosystem-Infra/wpt-results-analysis#90 serves as a reminder that we haven't fully resolved this.

foolip closed this as completed Feb 23, 2022
gsnedders added the meta (Process and/or repo issues) label Sep 16, 2022