Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fail the test if the test redirected to a cross-origin page #102

Merged
merged 2 commits into from
Dec 5, 2023

Conversation

pmeenan
Copy link
Member

@pmeenan pmeenan commented Dec 4, 2023

This uses a new agent feature where the test result can be overridden by a custom metric named test_result to fail any test that ends up on a different origin from where the test started.

Sample test of https://www.google.com/ (successful) - here
Sample test of http://www.google.com/ (failed) - here

Screenshot 2023-12-04 at 4 54 35 PM

@pmeenan pmeenan requested a review from rviscomi December 4, 2023 21:55
@rviscomi
Copy link
Member

rviscomi commented Dec 4, 2023

Are all cross-origin redirects necessarily bad? For example, if the only origin we have for example.com is http and during testing it redirects to https, I think it's worth keeping it. However, if we have both http and https versions, and one redirects to the other, I think it's better to dedupe and drop the one that redirects. Similarly, if foo.com redirects to bar.com, that might be ok if bar.com isn't already in the corpus.

@pmeenan
Copy link
Member Author

pmeenan commented Dec 4, 2023

We would already have https://example.com/ in the corpus if it got sufficient traffic and the https page construction and stats aren't representative of the non-https case. Are there any cases you can think of where there IS a redirect but we don't already have the redirect target in the corpus?

@tunetheweb
Copy link
Member

Yeah I think it either should have enough traffic to be in the CrUX dataset independently anyway or it's a redirect we should be ignoring.

The only example I can think of is if they migrate a site between end of the month and the crawl starting. But even then that should be a once off hit and then back in again next month with the correct URL.

@rviscomi
Copy link
Member

rviscomi commented Dec 5, 2023

Yeah I think you're both right that the number of cases this would matter would be insignificantly small.

WITH cross_origin AS (
  SELECT
    httparchive.fn.GET_ORIGIN(page) AS origin
  FROM
    `httparchive.all.pages`
  WHERE
    date = '2023-11-01' AND
    client = 'mobile' AND
    NOT is_root_page AND
    httparchive.fn.GET_ORIGIN(root_page) != httparchive.fn.GET_ORIGIN(page)
),

root_pages AS (
  SELECT
    httparchive.fn.GET_ORIGIN(root_page) AS origin
  FROM
    `httparchive.all.pages`
  WHERE
    date = '2023-11-01' AND
    client = 'mobile' AND
    is_root_page
)

SELECT
  COUNTIF(root_pages.origin IS NOT NULL) AS origin_exists,
  COUNT(0) AS total,
  COUNTIF(root_pages.origin IS NOT NULL) / COUNT(0) AS pct
FROM
  cross_origin
LEFT JOIN
  root_pages
USING
  (origin)

This shows that there are 55,113 cross-origin redirects in the mobile dataset, using cross-origin secondary pages as a heuristic. And of those, 40,415 (73%) redirect to an origin that already exists in the corpus (google.com, etc).

dist/test_result.js Outdated Show resolved Hide resolved
Co-authored-by: Rick Viscomi <[email protected]>
Copy link

github-actions bot commented Dec 5, 2023

Custom metrics for https://almanac.httparchive.org/en/2022/

WPT test run results: http://webpagetest.httparchive.org/results.php?test=231205_K8_1

@pmeenan pmeenan merged commit ace4a3f into main Dec 5, 2023
@pmeenan pmeenan deleted the test-result branch December 5, 2023 15:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants