Metric computation #43

Closed · foolip opened this issue Jan 10, 2022 · 4 comments
Labels: meta (Process and/or repo issues)

@foolip (Member) commented Jan 10, 2022

This is a proposal for how the per-test, per-area and total Interop 2022 metric will work. Compat 2021 scoring worked like this:

  • Every test is scored between 0 and 1. This avoids tests with many subtests being given much larger weight than, for example, reftests.
  • Every area is scored between 0 and 1, by adding up the test scores and dividing by the number of tests.
  • The overall score is the sum of area scores divided by the number of areas (5).
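
To make that concrete, here is a minimal sketch of the computation; the types and names are hypothetical, not the actual wpt-results-analysis code:

```ts
// Hypothetical shapes, not the real wpt-results-analysis data model.
type TestResult = { passingSubtests: number; totalSubtests: number };

// Per-test score in [0, 1], so a test with 500 subtests weighs the
// same as a single-result reftest.
function testScore(t: TestResult): number {
  return t.passingSubtests / t.totalSubtests;
}

// Per-area score in [0, 1]: sum of test scores divided by test count.
function areaScore(tests: TestResult[]): number {
  return tests.reduce((sum, t) => sum + testScore(t), 0) / tests.length;
}

// Overall score: sum of area scores divided by the number of areas.
function overallScore(areas: TestResult[][]): number {
  return areas.reduce((sum, a) => sum + areaScore(a), 0) / areas.length;
}
```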

I suggest that we do basically the same for Interop 2022. There are 10 new areas in Interop 2022 (see #42) and 5 areas from Interop 2021 to include. Since that works out to a pretty reasonable 1/3 vs. 2/3 weighting for the previous year vs. current year, I propose that we simply treat it as 15 areas, each of which gets equal weight.
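
Worked out: each of the 15 areas gets weight 1/15, so the 5 Interop 2021 areas together contribute 5/15 = 1/3 of the total and the 10 new areas contribute 10/15 = 2/3.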

P.S. For Interop 2023 we probably won't add 30 areas to get the same 1/3 vs. 2/3 ratio. My idea is that we would then give the new areas 2/3 of the weight and all of the old tests 1/3 of the weight, regardless of the number of areas. But we don't need to decide on that now.

@foolip (Member, Author) commented Jan 10, 2022

One minor tweak I'd like to experiment with is using only integer math, and scoring everything between 0 and 1000. This will make the order of summation irrelevant, and will reduce decimal noise in some diffs, which I've found to be a bit of a nuisance for understanding Compat 2021 score changes. I'd suggest 1000 instead of 100 so that we have the option of showing a decimal point.
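
A minimal sketch of the idea, with hypothetical names; the key property is that integer addition is exact, so reordering the sum cannot change the result the way it can with floating point:

```ts
// Per-test score as an integer in [0, 1000]; rounding happens once
// per test, and everything downstream stays an integer.
function testScoreInt(passing: number, total: number): number {
  return Math.floor((1000 * passing) / total);
}

// Integer addition is exact, so the order of summation is irrelevant.
function areaScoreInt(scores: number[]): number {
  const sum = scores.reduce((acc, s) => acc + s, 0);
  return Math.floor(sum / scores.length);
}

// 1000 rather than 100 leaves room for one decimal place in display:
// an area score of 875 can be rendered as "87.5".
function display(score: number): string {
  return `${Math.floor(score / 10)}.${score % 10}`;
}
```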

@foolip (Member, Author) commented Jan 13, 2022

I have prototyped this in Ecosystem-Infra/wpt-results-analysis#73 with a preview in https://gist.github.com/foolip/25c9ed482a0dd802f9bf2eea4544ccac.

There may be bugs, and the code needs to be moved to another repo entirely and cleaned up, but it gives some idea of what the starting point for Interop 2022 will be.

@gsnedders (Member) commented:

> One minor tweak I'd like to experiment with is using only integer math, and scoring everything between 0 and 1000.

Is there any reason to do this versus using something like fraction.js?
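
For comparison, exact rational arithmetic with fraction.js might look roughly like this (illustrative only, not code from the analysis repo):

```ts
import Fraction from 'fraction.js';

// Exact per-test scores: passing subtests over total subtests.
const testScores = [new Fraction(2, 3), new Fraction(1, 1), new Fraction(5, 7)];

// Rational arithmetic is exact, so summation order is irrelevant here too.
const areaScore = testScores
  .reduce((acc, s) => acc.add(s), new Fraction(0))
  .div(testScores.length);

console.log(areaScore.toFraction()); // "50/63"
console.log(areaScore.valueOf());    // ≈ 0.7936...
```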

@foolip (Member, Author) commented Feb 23, 2022

Metrics computation was implemented in Ecosystem-Infra/wpt-results-analysis#73.

web-platform-tests/rfcs#106 tracks moving this code into the WPT org.

@gsnedders @jgraham and I have discussed integer math vs. other approaches in the Matrix chat without a clear conclusion, but Ecosystem-Infra/wpt-results-analysis#90 serves as a reminder that we haven't fully resolved this.

foolip closed this as completed Feb 23, 2022
gsnedders added the meta (Process and/or repo issues) label Sep 16, 2022