-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance suite: track Time To First Byte in the front-end #47037
Conversation
Size Change: +9.78 kB (+1%) Total Size: 1.33 MB
ℹ️ View Unchanged
|
Flaky tests detected in ebb4a1b. 🔍 Workflow run URL: https://github.com/WordPress/gutenberg/actions/runs/4009066654
|
What page to load? I don't have a strong opinion. How a given template is set up or what template do we load would affect the test, though. In my PRs, I've been testing by loading a "Hello World" post using TwentyTwentyThree (block theme with Who visits the page? I think we should test a logged-out user, as that's the real use case we want to improve (most of the visits to a given site are by logged-out users). |
bin/plugin/commands/performance.js
Outdated
* @param {WPRawPerformanceResults} results | ||
* | ||
* @return {WPPerformanceResults} Curated Performance results. | ||
*/ | ||
function curateResults( results ) { | ||
function curateResults( testSuite, results ) { | ||
if ( testSuite === 'front-end' ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The WPPerformanceResults
and WPRawPerformanceResults
data structures expect the results to be the same shape. The front-end metrics will be different from the editor ones, so this will no longer be true. I'm not very keen of stretching it this way, though it's simple and works. See c647035
An alternative I considered was to make the tests themselves return a single value. The test would be responsible for curating the value it wants to report, not the performance command. I'm tempted to do it. The only benefit of having an array of values here would be to aggregate them all across the different runs, which is something we don't do at the moment anyway. FWIW, we only do 1 run at the moment (though we did 3 in the past).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another advantage of asking the test to curate the results themselves is that adding a new metric doesn't involve modifying the performance command, only the test itself.
}, | ||
median | ||
); | ||
/** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is just a way of making the command independent from the test suite data points. 859e453
*/ | ||
import { activateTheme, createURL, logout } from '@wordpress/e2e-test-utils'; | ||
|
||
describe( 'Front End Performance', () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding separate files per theme makes it simple with the way the performance report is set up.
I considered an alternative: having a single test for all themes and keys such as timeToFirstByteTT1
, timeToFirstByteTT3
, etc. That could also work. The performance command could even have some code to extract those keys into separate tables. However, at that point, the performance command is not independent from the actual tests and data. I thought the current approach was simpler to understand and navigate.
); | ||
const [ navigationTiming ] = JSON.parse( navigationTimingJson ); | ||
results.timeToFirstByte.push( | ||
navigationTiming.responseStart - navigationTiming.requestStart |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TTFB, by definition, is the time from startTime
to responseStart
. I guess we can just do responseStart
here? Or should we use a different metric or a different name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update: We should change the name to something like requestTime
. And maybe also include TTFB additionally if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some updates on what we could do next:
|
} ); | ||
|
||
it( 'Time To First Byte (TTFB)', async () => { | ||
let i = 10; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason why we're testing it 10 times here but only 5 times in TT1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the catch! Modified both to use 16 runs. The reason for choosing 16 is that calculating the 75th percentile means that we discard the 4 worse results, which gives us some room for avoiding out of the ordinary platform-specific numbers. 12 could also do: 9 is the 75th percentile, so we'd discard the worse 3 results.
bin/plugin/commands/performance.js
Outdated
'front-end-twentytwentyone', | ||
'front-end-twentytwentythree', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it makes sense to change these names into more semantic ones. Like front-end-classic-theme
and front-end-block-theme
?
I've attempted this in dfbff02, waiting for the tests to run to see if this works... |
Pushed changes to clarify this: it reports the TTFB as median and as 75th percentile. I'd be interested in adding specific timings for the server, that's why I thought of using responseStart-requestStart. However, it sounds like using an approach like this could be more interesting, and it'll help us to add more WordPress specific hooks as desired. So thought we should stick to TTFB for the moment. |
Where can I find the wp-env configuration for CI? I want to make sure the values we use for those variables in this test and production like (false, essentially). |
I'm interested in shipping a first version of this test. We can do this in a follow-up. |
run: | | ||
npm run wp-env start | ||
npm run wp-env -- run tests-cli "wp theme update twentytwentyone --version=1.7" | ||
npm run wp-env -- run tests-cli "wp theme update twentytwentythree --version=1.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will these downgrade the theme if it has a more recent version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the idea was to keep the theme version consistent across the tests so the themes couldn't influence the results. It's unlikely, but I think it's still good to keep things as consistent as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. The rationale for this is to be able to control the environment so that only the Gutenberg PR affects the metrics. At some point, we should manually update the themes, but at least this will be an intentional change and won't affect any unrelated PR.
If they're not being modified anywhere, I assume we're using the default values, which are described here in the docs. It looks like they're set to true on the development instance, and false on the test instance. |
const navigationTimingJson = await page.evaluate( () => | ||
JSON.stringify( performance.getEntriesByType( 'navigation' ) ) | ||
); | ||
const [ navigationTiming ] = JSON.parse( navigationTimingJson ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit:
evaluate
should automatically serialize/deserialize JSON data, so we should be able to just do:
const [ navigationTimeing ] = await page.evaluate( () =>
performance.getEntriesByType( 'navigation' )
);
In addition, we already have some performance utils available, and this probably belongs in the getLoadingDurations
util. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Follow-up for LCP at #47938 |
closes #33645
What?
This PR aims to add the basic structure to have performance tests executed for the front-end, in addition to the existing test for post & site editor.
Why?
By defining some measures and track them over time, regressions will be caught up earlier in the cycle and there will be clarity about the impact of any given PR.
How?
The test measures two themes: TwentyTwentyOne as a classic theme and TwentyTwentyThree as a block theme. It loads the homepage of each with no posts. These gives use a baseline. In follow-ups, we can add more tests (load a specific post, etc.).
It reports the Time To First Byte, defined as the time it took since the browser requested the resource until the response started. It's reported as a median and as the 75th percentile.
Testing Instructions
Verify that the results for front-end test suite are reported in the Performance Test job:
TODO
Follow ups