-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[APM] Optimize services overview #69648
[APM] Optimize services overview #69648
Conversation
a65f141
to
c5fe710
Compare
bool: { | ||
filter: projection.body.query.bool.filter.concat({ | ||
term: { | ||
[PROCESSOR_EVENT]: 'transaction', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[PROCESSOR_EVENT]: 'transaction', | |
[PROCESSOR_EVENT]: ProcessorEvent.transaction, |
@@ -19,72 +20,315 @@ import { | |||
} from '../../helpers/setup_request'; | |||
import { getServicesProjection } from '../../../../common/projections/services'; | |||
|
|||
const MAX_NUMBER_OF_SERVICES = 500; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we have a settings option instead of hardcoded?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sqren thoughts here? We didn't have one before AFAIK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there are arguments for making it configurable but no-one ever asked for this and I'd rather not make it configurable if it's not needed so... let's leave it for now, and make it configurable if someone asks for it.
...projection.body.query.bool.filter, | ||
{ | ||
terms: { | ||
[PROCESSOR_EVENT]: ['metric', 'error', 'transaction'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[PROCESSOR_EVENT]: ['metric', 'error', 'transaction'], | |
[PROCESSOR_EVENT]: [ProcessorEvent.metric, ProcessorEvent.error, ProcessorEvent.transaction], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think there is value in having a lightweight APM client.
In this case the PROCESSOR_EVENT
would be added automatically based on the indicies.
query: { | ||
bool: { | ||
filter: [ | ||
...projection.body.query.bool.filter, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the getTransactionDurationAvg
you used concat:
filter: projection.body.query.bool.filter.concat
and here you used spread syntax.
...projection.body.query.bool.filter,
I'm not against neither implementations, but maybe we could pick one so we have consistency?
|
||
return aggregations.services.buckets.map((bucket) => ({ | ||
name: bucket.key as string, | ||
value: (bucket.agent_name.hits.hits[0]?._source as { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if it's possible, but can't we define the response type we expect when calling setup.client.search<T>
? Then we could avoid this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can, but then we have to create a separate variable for params and use client.search<{ agent: { name: string } }, typeof params>
. Which means we lose type checks and autocompletion when we create the request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(As soon as you start defining generic arguments TS will no longer infer others, AFAIK).
...projection.body.query.bool.filter, | ||
{ | ||
term: { | ||
[PROCESSOR_EVENT]: 'transaction', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[PROCESSOR_EVENT]: 'transaction', | |
[PROCESSOR_EVENT]: ProcessorEvent.transaction, |
|
||
const serviceBuckets = aggs?.services.buckets || []; | ||
const deltaAsMinutes = (setup.end - setup.start) / 1000 / 60; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe you could extract deltaAsMinutes
to a function and use it on getTransactionRate
too.
|
||
const projection = getServicesProjection({ setup }); | ||
const arrayUnionToCallable = <T extends any[]>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kind of don't get why it's needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A clarifying comment would help
}; | ||
|
||
const getAgentName = async ({ setup, projection }: AggregationParams) => { | ||
const response = await setup.client.search( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WDYT about breaking moving each query into a separate file to improve readability?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opted for all queries in one separate file. LMK what you think.
|
||
const getAgentName = async ({ setup, projection }: AggregationParams) => { | ||
const response = await setup.client.search( | ||
mergeProjection(projection, { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we able to get rid of the projections if we remove the counts? I think that would make all queries 10x more readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would make it a little easier, as the contextual filters don't have to care about the aggregation needed to calculate a count. But there's a bunch of other stuff that a projection takes care of:
- range filter on
@timestamp
- type-checking projection-specific parameters like
service.name
ortransaction.type
- querying the appropriate index
I think the lightweight APM client you are proposing is trying to solve at least some of these things. I think we should also take latency metrics into account which will complicate things a bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Projections creates an inversion of control which makes it very hard to see what's actually going on.
range filter on @timestamp
We already have a helper for this?
querying the appropriate index
We can make a helper for this, or like you say, we can use a lightweight client.
type-checking projection-specific parameters like service.name or transaction.type
Not sure about what this is but it would be great if we can avoid IOC.
c5fe710
to
43f55a5
Compare
{ | ||
terms: { | ||
[PROCESSOR_EVENT]: ['transaction', 'error', 'metric'], | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another note on the inversion-of-control problem introduced by projections:
PROCESSOR_EVENT
should be aligned with the queried indices. In this case the indicies are specified in the projection. It is therefore possible for the indicies and processor event filter to go out of sync without us noticing.
Pinging @elastic/apm-ui (Team:apm) |
const totalTransactions = transactions?.doc_count || 0; | ||
export async function getServicesItems(setup: ServicesItemsSetup) { | ||
const params = { | ||
projection: getServicesProjection({ setup, noEvents: true }), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does noEvents
mean? Is this to simulate a no-op?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not a no-op, but with noEvents: true
it doesn't make a decision about what events should be queried. It's not great, but the alternatives (having to define indices + processor.event term filters in a lot of other places) are worse IMO. It made me think though, our client and our projections could support { apm: { events: ProcessorEvent[] }
as a top-level prop, and prohibit the use of index
. The client's search function would then unpack apm.events
to an index and a terms query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It made me think though, our client and our projections could support
{ apm: { events: ProcessorEvent[] }
as a top-level prop, and prohibit the use ofindex
. The client's search function would then unpackapm.events
to an index and a terms query.
I think it's a good idea to prohibit the use of index and instead use apm.events
to specify both the indicies to query and the processor.events
to filter by.
This is similar to what I tried to suggest in #67397 - although I called it docTypes
and was probably not super clear. I've renamed it to apmEvents
now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sqren I think I understand the need better now 👍 and IMO a top-level prop lends itself better to composition than a second function argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition to the client idea, we should move away from the mega object setup
has become and I'd like to move away from mergeProjections
and the overuse of spreading: #70157
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
getTransactionRate, | ||
getErrorRate, | ||
getEnvironments, | ||
} from './get_metrics'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless this is related to the metric based ui or agent runtime metrics we should find another term.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed this to get_services_items_stats
. Also pluralised the function names.
?.value ?? null, | ||
environments: | ||
environments.find((service) => service.name === serviceName)?.value ?? | ||
[], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This suddenly got a lot more complex. Any reason why transactionRate
and the others are now arrays instead of scalar values?
Shouldn't transactionRate
be a numeric value in the first place, and not an array?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now using joinByKey
to make this code easier to understand.
] = await Promise.all([ | ||
getTransactionDurationAvg(params), | ||
getAgentName(params), | ||
getTransactionRate(params), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we are retrieving more than one transaction rate
getTransactionRate(params), | |
getTransactionRates(params), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
transactionsPerMinute: | ||
transactionRate.find((service) => service.name === serviceName) | ||
?.value ?? 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
transactionsPerMinute: | |
transactionRate.find((service) => service.name === serviceName) | |
?.value ?? 0, | |
transactionsPerMinute: transactionRates?.[serviceName] ?? 0, |
return arrayUnionToCallable(aggregations.services.buckets).map((bucket) => { | ||
const transactionsPerMinute = bucket.doc_count / deltaAsMinutes; | ||
return { | ||
name: bucket.key as string, | ||
value: transactionsPerMinute, | ||
}; | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return arrayUnionToCallable(aggregations.services.buckets).map((bucket) => { | |
const transactionsPerMinute = bucket.doc_count / deltaAsMinutes; | |
return { | |
name: bucket.key as string, | |
value: transactionsPerMinute, | |
}; | |
}); | |
return arrayUnionToCallable(aggregations.services.buckets).reduce< | |
Record<string, number> | |
>((acc, bucket) => { | |
const transactionsPerMinute = bucket.doc_count / deltaAsMinutes; | |
const serviceName = bucket.key as string; | |
return { ...acc, [serviceName]: transactionsPerMinute }; | |
}, {}); |
Have you talked to the ES team about this? The changes in this PR replaces simplicity and readability with performance. So we should make sure the performance gains are worth it. |
@elasticmachine merge upstream |
indices['apm_oss.errorIndices'], | ||
indices['apm_oss.transactionIndices'], | ||
], | ||
}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will this be changed by your PR for new client?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep! there will no longer be a need for noEvents
.
[...agentNames, ...transactionRates], | ||
'serviceName' | ||
); | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the description and tests ❤️
U extends UnionToIntersection<T>, | ||
V extends keyof T & keyof U | ||
>(items: T[], key: V): JoinedReturnType<T, U, V> { | ||
return items.reduce<JoinedReturnType<T, U, V>>((prev, current) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I find it useful to distinguish between the accumulator and the items that are being iterated.
return items.reduce<JoinedReturnType<T, U, V>>((prev, current) => { | |
return items.reduce<JoinedReturnType<T, U, V>>((acc, item) => { |
const response = await client.search( | ||
mergeProjection(projection, { | ||
size: 0, | ||
index: indices['apm_oss.transactionIndices'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A comment more on projections: In this case I don't know how to see that the projection won't override the index
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'm misunderstanding - but this is a merge, so values are applied from left to right. ie, indices['apm_oss.transactionIndices']
will override whatever the projection has defined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, that makes good sense
aggs: { | ||
services: { | ||
terms: { | ||
...projection.body.aggs.services.terms, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not related to this PR specifically but I want to invite you into my head and how I try to reason about the projections :)
I wonder what these terms are so I try to search for them. If I search for projection.body.aggs.services.terms
, body.aggs.services.terms
, aggs.services.terms
or services.terms
I don't get any useful results.
Then I search for projection
and get a lot results:
So I give up searching for terms
and instead go into manual mode and search for getTransactionDurationAverages
which leads me to getServicesItems
which is calling getServicesProjection
where I eventually find:
I can now see that the terms
I'm looking for are on line 46. Great! But I notice it's a ternary so I have to back up a bit and figure out how getServicesProjection
is called. I now see that getServicesProjection
is called with noEvents=true
so the actual line I'm looking for is 42 : []
. 🏅
It's definitely possible to find these things but it requires patience and focus and not super easy on Github (would have been easier in an editor).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haha, nice. Wish I could do that on Github - for these things it'll be better to check out for sure.
And I now realise that the value I found was wrong (looking in the query instead of the agg 😳)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
VSCode's github plugin might help btw, have you tried it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have - a while ago. I should try it again
💛 Build succeeded, but was flaky
Test FailuresKibana Pipeline / kibana-xpack-agent / Chrome X-Pack UI Functional Tests.x-pack/test/functional/apps/lens/smokescreen·ts.lens app lens smokescreen tests should allow creation of lens visualizationsStandard Out
Stack Trace
Build metrics
History
To update your PR or re-run it, just comment with: |
return []; | ||
} | ||
|
||
return aggregations.services.buckets.map((bucket) => ({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've seen this in a few places and would be really nice if we could do something like:
const buckets = aggregations.services.buckets as Buckets<string, {
agent: {
name: string;
};
}>
return buckets.map(bucket => {
serviceName: bucket.key,
agentName: bucket.agent_name.hits.hits[0]?._source.agent.name
})
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively:
type Bucket = <string, { agent: { name: string; }; }>;
return buckets.map(bucket: Bucket => {
serviceName: bucket.key,
agentName: bucket.agent_name.hits.hits[0]?._source.agent.name
})
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if it's that easy, but I might be missing something. You'd also need to define agent_name
, the name of the sub-aggregation, and the fact that it's a top_hits
aggregation. Let me see if we still need it anyway if we resolve the document types from apm.events
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Beautiful! Any way to narrow the source down even more? I seem to recall we could do that with the client's first generic argument: ApmEventsClient<Transaction>()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's based on whatever you specify in apm.events
. In this case, it searches on error, metric and transaction indices, that's why it's a union type. If it were just ProcessorEvent.transaction
, the type would be Transaction
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, wow. I guess that's perfect then 👍
* master: (199 commits) [Telemetry] Add documentation about Application Usage (elastic#70624) [Ingest Manager] Improve agent unenrollment with unenroll action (elastic#70031) Handle timeouts on creating templates (elastic#70635) [Lens] Add ability to set colors for y-axis series (elastic#70311) [Uptime] Use elastic charts donut (elastic#70364) [Ingest Manager] Update registry URL to point to snapshot registry (elastic#70687) [Composable template] Create / Edit wizard (elastic#70220) [APM] Optimize services overview (elastic#69648) [Ingest Pipelines] Load from json (elastic#70297) [Rum Dashbaord] Rum selected service view (elastic#70579) [Uptime] Prevent duplicate requests on load for index status (elastic#70585) [ML] Changing shared module setup function parameters (elastic#70589) [Ingest Manager] Add ability to sort to agent configs and package configs (elastic#70676) [Alerting] document requirements for developing new action types (elastic#69164) Fixed adding an extra space character on selecting alert variable in action text fields (elastic#70028) [Maps] show vector tile labels on top (elastic#69444) chore(NA): upgrade to lodash@4 (elastic#69868) Add Snapshot Restore README with quick-testing steps. (elastic#70494) [EPM] Use higher priority than default templates (elastic#70640) [Maps] Fix cannot select Solid fill-color when removing fields (elastic#70621) ...
* master: [Lens] Fitting functions (elastic#69820) [Telemetry] Add documentation about Application Usage (elastic#70624) [Ingest Manager] Improve agent unenrollment with unenroll action (elastic#70031) Handle timeouts on creating templates (elastic#70635) [Lens] Add ability to set colors for y-axis series (elastic#70311) [Uptime] Use elastic charts donut (elastic#70364) [Ingest Manager] Update registry URL to point to snapshot registry (elastic#70687) [Composable template] Create / Edit wizard (elastic#70220) [APM] Optimize services overview (elastic#69648)
Friendly reminder: Looks like this PR hasn’t been backported yet. |
Co-authored-by: Elastic Machine <[email protected]>
Co-authored-by: Elastic Machine <[email protected]> Co-authored-by: Elastic Machine <[email protected]>
(Keeping it in draft for now as I need #69384 to test on a specific cluster).This decreases the loading time of the services overview by splitting up the requests to Elasticsearch.
On the cluster used for testing, time for
/api/apm/services
goes down from about ~55s to ~29s. I think that's worth the increase in complexity but open to other opinions of course.A couple of changes here:
track_total_hits
rather than a terms aggregation which is more costly.Tested by comparing data in master: