Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delay Single User Journey Finish #129991

Closed
wants to merge 4 commits into from
Closed

Delay Single User Journey Finish #129991

wants to merge 4 commits into from

Conversation

suchcodemuchwow
Copy link
Contributor

@suchcodemuchwow suchcodemuchwow commented Apr 12, 2022

We have seen some missing events from the single user performance tests and as it's also mentioned in this #129990 issue we want to delay journey finish to guarantee all events reported properly to APM.

@suchcodemuchwow suchcodemuchwow self-assigned this Apr 12, 2022
@suchcodemuchwow suchcodemuchwow added performance release_note:skip Skip the PR/issue when compiling release notes backport:skip This commit does not require backporting v8.2.0 labels Apr 12, 2022
@@ -166,6 +168,7 @@ export class PerformanceTestingService extends FtrService {

private async tearDown(page: Page, client: CDPSession, context: BrowserContext) {
if (page) {
await sleep(5000);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think we can simply use await page.waitForTimeout(5000); ?

@@ -166,6 +168,7 @@ export class PerformanceTestingService extends FtrService {

private async tearDown(page: Page, client: CDPSession, context: BrowserContext) {
if (page) {
await sleep(5000);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Could you add a comment why it's necessary? With a link to flushInterval config option.
  2. Let's maybe configure flushInterval explicitly to avoid relying on the default value? Having it configured, we can set the delay slightly above the flushInterval value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are not using rum agent here and we don't have flushInterval on node agent. Ref: https://www.elastic.co/guide/en/apm/agent/nodejs/current/agent-api.html#apm-flush. But I will add relevant comment to explain why do we use flush

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don’t use the RUM agent in this file, but we can configure flushInterval for the RUM agent on Kibana client-side

env: {
ELASTIC_APM_ACTIVE: process.env.ELASTIC_APM_ACTIVE,
ELASTIC_APM_CONTEXT_PROPAGATION_ONLY: 'false',
ELASTIC_APM_ENVIRONMENT: process.env.CI ? 'ci' : 'development',
ELASTIC_APM_TRANSACTION_SAMPLE_RATE: '1.0',
ELASTIC_APM_SERVER_URL: APM_SERVER_URL,
ELASTIC_APM_SECRET_TOKEN: APM_PUBLIC_TOKEN,
// capture request body for both errors and request transactions
// https://www.elastic.co/guide/en/apm/agent/nodejs/current/configuration.html#capture-body
ELASTIC_APM_CAPTURE_BODY: 'all',
// capture request headers
// https://www.elastic.co/guide/en/apm/agent/nodejs/current/configuration.html#capture-headers
ELASTIC_APM_CAPTURE_HEADERS: true,
// request body with bigger size will be trimmed.
// 300_000 is the default of the APM server.
// for a body with larger size, we might need to reconfigure the APM server to increase the limit.
// https://www.elastic.co/guide/en/apm/agent/nodejs/current/configuration.html#long-field-max-length
ELASTIC_APM_LONG_FIELD_MAX_LENGTH: 300_000,
ELASTIC_APM_GLOBAL_LABELS: Object.entries({
ftrConfig: `x-pack/test/performance/tests/config.playwright`,
performancePhase: process.env.TEST_PERFORMANCE_PHASE,
journeyName: process.env.JOURNEY_NAME,
testJobId,

and then use the same / a bit longer delay in the tearDown step.
Did I miss anything?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. That's still the test runners config and I don't know if we are able to provide env variable to change flush interval.
  2. I also liked that idea however but I'm not sure about this scenario: let's say there are multiple apm events are being sent it's going to close the page when it gets the first response which is not we want ? @mshustov @vigneshshanmugam maybe it doesn't work like that ?

Copy link
Contributor

@mshustov mshustov Apr 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also liked that idea however but I'm not sure about this scenario: let's say there are multiple apm events are being sent it's going to close the page when it gets the first response which is not we want ?

It might happen in theory. We can extend this logic to count the number of outcoming and responded requests.

this.rumRequestsInFlight = []

await page.route('**', async (route, request) => {
   if(isRumReportingRequest(route)) {
   rumRequestsInFlight.push(request.response);
});

function tearDown() {
  // wait for the one more reporting call
  await page.waitForResponse((response) => response.url().includes('/rum/events'));
  // make sure all the requests were finished
  await Promise.all(rumRequestsInFlight);

@vigneshshanmugam is there API to disable metrics collection by the RUM agent and enforce it to report already collected events?

maybe it doesn't work like that ?

@suchcodemuchwow how would you suggest this problem be solved?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's say there are multiple apm events are being sent it's going to close the page when it gets the first response which is not we want ?
Agreed, if there are multiple transactions in the page within the flush interval, we would need extra guard to count for the no of possible APM server calls from the agent.

The better way to fix this would be

  1. (Recommended) - Update the RUM agent to the latest version which makes sure the events queue in the agent is flushed before the users leaves the page and we dont need any workarounds here. elastic/apm-agent-rum-js@2429814
  2. For the time being, I think counting strategy based on the api server calls could work.
  3. (Hacky workaround) - Have not tested it though.
page.evaluate(() => {
  const apmServer = window.elasticApm.serviceFactory.getService('ApmServer')
  apmServer.queue.flush()
});

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Recommended) - Update the RUM agent to the latest version

Opened #130765

@suchcodemuchwow suchcodemuchwow requested a review from spalger April 19, 2022 12:02
@suchcodemuchwow suchcodemuchwow marked this pull request as ready for review April 19, 2022 12:04
@kibana-ci
Copy link
Collaborator

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @suchcodemuchwow

Copy link
Member

@dmlemeshko dmlemeshko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice change, but probably makes sense to have other suggested from APM team in this PR?

Comment on lines 171 to 172
apm.flush();
await client.detach();
Copy link
Member

@dmlemeshko dmlemeshko Apr 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vigneshshanmugam suggested to add
await page.waitForResponse((response) => response.url().includes('/rum/events')); after flush() to make sure the RUM agent response is sent to APM server before browser is killed.

@suchcodemuchwow
Copy link
Contributor Author

Closing this since we don't need any workaround after updating new rum agent.

cc: @mshustov @vigneshshanmugam

1 similar comment
@suchcodemuchwow
Copy link
Contributor Author

Closing this since we don't need any workaround after updating new rum agent.

cc: @mshustov @vigneshshanmugam

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip This commit does not require backporting performance release_note:skip Skip the PR/issue when compiling release notes v8.2.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants