[APM] Service Map data background task and API endpoint #50844
a07a4f8 to 16b45f0
16b45f0 to 2ea4358
if (fetchedTasks.docs.length) {
  await taskManager.remove('servicemap-processor');
}
await taskManager.schedule({
I'm not sure if the point of this is to prevent this task being scheduled more than once, but if that is the case, then you can now use `ensureScheduled`: #50232
x-pack/legacy/plugins/apm/index.ts
Outdated
if (config.get('xpack.apm.serviceMapEnabled')) {
  initializeServiceMaps(server).catch(error => {
    throw error;
This will result in an unhandled promise rejection: rethrowing inside `.catch` only creates a new rejected promise that nothing awaits.
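A minimal sketch of one way to avoid this, assuming the goal is to log the failure and carry on. `initializeSafely` and `Logger` are hypothetical names for illustration, not Kibana APIs; the real plugin would pass `initializeServiceMaps(server)` and `server.log`.

```typescript
// Hypothetical logger shape, loosely modeled on server.log('error', ...).
type Logger = (level: 'error', message: string) => void;

// Runs an async initializer and reports failure instead of rethrowing,
// so no rejected promise is left without a handler.
async function initializeSafely(
  init: () => Promise<void>,
  log: Logger
): Promise<boolean> {
  try {
    await init();
    return true;
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    log('error', `Service map initialization failed: ${message}`);
    return false;
  }
}
```

The returned boolean lets the caller decide whether to skip later setup steps when initialization fails.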
 * or more contributor license agreements. Licensed under the Elastic License;
 * you may not use this file except in compliance with the Elastic License.
 */
import { idx } from '@kbn/elastic-idx';
idx doing a comeback??? :D
if (start && end) {
  return callApmApi({
    pathname: '/api/apm/service-map',
    params: { query: { start, end } }
    pathname: '/api/apm/service-map/all',
What if a service is called "all"?
});
}
// @ts-ignore
Is this still needed?
@@ -38,18 +38,24 @@ export default function apmOss(kibana) {
spanIndices: Joi.string().default('apm-*'),
metricsIndices: Joi.string().default('apm-*'),
onboardingIndices: Joi.string().default('apm-*'),
apmAgentConfigurationIndex: Joi.string().default('.apm-agent-configuration')
Was it intentional to re-add this?
start,
end,
environment,
uiFilters: JSON.stringify(uiFiltersOmitEnv)
We normally don't send `environment` as a separate query param but just keep it in `uiFilters`. Can't we do the same here?
export const serviceMapRoute = createRoute(() => ({
  path: '/api/apm/service-map',
  path: '/api/apm/service-map/{serviceName}',
As we talked about, `/api/apm/service-map/{serviceName}` and `/api/apm/service-map/all` can be combined into a single route if `serviceName` is converted to an optional query param.
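The suggestion above could look roughly like the following sketch on the client side. `buildServiceMapRequest` and `ServiceMapQuery` are hypothetical names; only the pathname and the `start`, `end`, and `serviceName` parameters come from the discussion. Because the service name travels in the query string, a service that happens to be called "all" no longer collides with a path segment.

```typescript
// Hypothetical query shape: serviceName is optional, so one route serves
// both the global map and the per-service map.
interface ServiceMapQuery {
  start: string;
  end: string;
  serviceName?: string;
}

function buildServiceMapRequest(
  start: string,
  end: string,
  serviceName?: string
): { pathname: string; params: { query: ServiceMapQuery } } {
  const query: ServiceMapQuery = { start, end };
  if (serviceName !== undefined) {
    // Only include the param when a specific service is requested.
    query.serviceName = serviceName;
  }
  return { pathname: '/api/apm/service-map', params: { query } };
}
```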
return getServiceMap();
handler: async ({ context, request }) => {
  if (!context.config['xpack.apm.serviceMapEnabled']) {
    return new Boom('Not found', { statusCode: 404 });
`serviceMapEnabled` should be removed
... or, are there any reasons to let users disable it? We don't allow this for any other features.
yeah we can safely remove this config.
after talking with folks, the consensus was to keep the config until FF
what's the reason to wait until FF if we decided to do this?
};
}>;
};
}> = idx(aggregations, _ => _.conns.buckets) || [];
@dgieselaar is this something our magic agg types should be able to infer instead of them being spelled out?
Looks like this is no longer an issue, correct?
'Unable to set up required scripts for APM Service Maps:'
);
server.log('error', error);
});
Is it the intention to continue initialization if `setupRequiredScripts` fails?
that was the idea, the result will be some error messages in kibana logs & service maps UI not displaying any nodes
How's that useful? What about just terminating initialization as soon as something goes wrong?
Is this still being worked on?
this file has undergone a metamorphosis since then. It's now responsible for initializing the endpoint that a user hits to kick off the task in that user's scope. This endpoint had to be created here since it needs access to the task manager, so it wasn't able to live with the other APM APIs.
term: {
  'task.taskType': 'serviceMap'
}
}
Nit: what do you think about single-lining terms like this:
{ term: { _id: "task:servicemap-processor" } },
{ term: { "task.taskType": "serviceMap" } }
function interestingTransactions(since?: string, afterKey?: any) {
  if (!since) {
    since = 'now-1h';
How was this value chosen? Should it be a constant?
Btw what about adding it to the function signature?
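Both suggestions in this thread (a named constant and a default in the signature) can be sketched together. `DEFAULT_LOOKBACK` and `timestampRangeFilter` are hypothetical names for illustration; the `'now-1h'` value and the range clause shape are taken from the snippets under review.

```typescript
// Hypothetical constant: names the lookback window instead of inlining it.
const DEFAULT_LOOKBACK = 'now-1h';

// A default parameter replaces the `if (!since) { since = ... }` reassignment.
function timestampRangeFilter(since: string = DEFAULT_LOOKBACK) {
  return { range: { '@timestamp': { gt: since } } };
}
```

One behavioral difference worth noting: a default parameter only applies when the argument is `undefined`, whereas the original `!since` check also replaces an empty string.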
query: {
  bool: {
    filter: [
      { exists: { field: 'destination.address' } },
Use constants
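As a sketch of what this might look like: the field names below appear in the filter under review, while the constant names and the `requiredFieldFilters` helper are assumptions made for illustration (the codebase may already define equivalents under different names).

```typescript
// Hypothetical constant names; the string values come from the reviewed query.
const DESTINATION_ADDRESS = 'destination.address';
const TRACE_ID = 'trace.id';
const SPAN_DURATION = 'span.duration.us';

// Builds one `exists` clause per required field.
function requiredFieldFilters(): Array<{ exists: { field: string } }> {
  return [DESTINATION_ADDRESS, TRACE_ID, SPAN_DURATION].map(field => ({
    exists: { field }
  }));
}
```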
{ exists: { field: 'destination.address' } },
{ exists: { field: 'trace.id' } },
{ exists: { field: 'span.duration.us' } },
{ range: { '@timestamp': { gt: since } } }
What about using `start` and `end` like in the rest of the codebase?
Am I correct in understanding that `end` is currently ignored?
Ah, nvm. This is not the function called by the end-user but the task manager.
body
});
}
}
As we talked about, I think this file would benefit from being refactored into smaller functions, having better types, and being less mutable (if possible).
return new ArrayList(conns)
}
return [];
Do we have tests for this? I think that's quite important since we have no static safety - not even syntax highlighting - to notify us of any issues.
Btw ran it through a formatter to fix some minor indentation and spacing issues:
void extractChildren(def caller, def spans, def upstream, def conns, def count) {
  // TODO: simplify this
  if (spans.containsKey(caller.id)) {
    for (s in spans[caller.id]) {
      if (caller.span_type == 'external') {
        upstream.add(caller.service_name + "/" + caller.environment);
        def conn = new HashMap();
        conn.caller = caller;
        conn.callee = s;
        conn.upstream = new ArrayList(upstream);
        conns.add(conn);
        extractChildren(s, spans, upstream, conns, count);
        upstream.remove(upstream.size() - 1);
      } else {
        extractChildren(s, spans, upstream, conns, count);
      }
    }
  } else {
    // no connection found
    def conn = new HashMap();
    conn.caller = caller;
    conn.upstream = new ArrayList(upstream);
    conn.upstream.add(caller.service_name + "/" + caller.environment);
    conns.add(conn);
  }
}
def conns = new HashSet();
def spans = new HashMap();
// merge results from shards
for (state in states) {
  for (s in state.entrySet()) {
    def v = s.getValue();
    def k = s.getKey();
    if (!spans.containsKey(k)) {
      spans[k] = v;
    } else {
      for (p in v) {
        spans[k].add(p);
      }
    }
  }
}
if (spans.containsKey(null) && spans[null].size() > 0) {
  def node = spans[null][0];
  def upstream = new ArrayList();
  extractChildren(node, spans, upstream, conns, 0);
  return new ArrayList(conns)
}
return [];
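Since the script itself has no static checking, one possible way to get test coverage is a TypeScript reference port of the traversal that fixtures can be run against. This is a sketch, not the PR's code: `SpanDoc`, `Connection`, `extractConnections`, and `walk` are hypothetical names, the unused `count` argument is dropped, the path is copied per branch instead of mutated, and results are collected in an array rather than a `HashSet`, so duplicates are not removed.

```typescript
interface SpanDoc {
  id: string;
  parentId: string | null; // null marks the trace root
  spanType: string;
  serviceName: string;
  environment: string;
}

interface Connection {
  caller: SpanDoc;
  callee?: SpanDoc; // absent when the caller is a leaf
  upstream: string[];
}

function walk(
  caller: SpanDoc,
  childrenByParent: Map<string | null, SpanDoc[]>,
  upstream: string[],
  conns: Connection[]
): void {
  const self = `${caller.serviceName}/${caller.environment}`;
  const children = childrenByParent.get(caller.id);
  if (!children) {
    // Leaf: record the caller with the full upstream path, itself included.
    conns.push({ caller, upstream: [...upstream, self] });
    return;
  }
  for (const callee of children) {
    if (caller.spanType === 'external') {
      // An external span marks a hop between services.
      const path = [...upstream, self];
      conns.push({ caller, callee, upstream: path });
      walk(callee, childrenByParent, path, conns);
    } else {
      walk(callee, childrenByParent, upstream, conns);
    }
  }
}

function extractConnections(docs: SpanDoc[]): Connection[] {
  // Group documents by parent id, mirroring the merged per-shard state.
  const childrenByParent = new Map<string | null, SpanDoc[]>();
  for (const doc of docs) {
    const siblings = childrenByParent.get(doc.parentId) || [];
    siblings.push(doc);
    childrenByParent.set(doc.parentId, siblings);
  }
  const roots = childrenByParent.get(null) || [];
  if (roots.length === 0) {
    return [];
  }
  const conns: Connection[] = [];
  walk(roots[0], childrenByParent, [], conns);
  return conns;
}
```

Like the script, the port only follows the first root document it finds.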
source: `
void extractChildren(def caller, def spans, def upstream, def conns, def count) {
  // TODO: simplify this
  if (spans.containsKey(caller.id)) {
Is it still the plan to simplify this?
// TODO: simplify this
if (spans.containsKey(caller.id)) {
  for(s in spans[caller.id]) {
    if (caller.span_type=='external') {
I see `external` being used in a couple of places. Should it be a constant like we have for transaction types?
if: "ctx.span != null && ctx.span.type == 'ext'",
field: 'span.type',
value: 'external'
}
Looks like we rewrite `ext` to `external`. Why is that?
At the time, the value was not consistent, so this normalized everything to `external`.
should also note that this ingest pipeline will not be merged.
// get current apm ingest pipeline
apmIngestPipeline = await getIngestPipelineApm(callCluster);
} catch (error) {
  if (error.statusCode !== 404) {
Maybe add a comment about why we are swallowing 404 errors
nvm. comment below is sufficient 👍
x-pack/plugins/apm/server/index.ts
Outdated
@@ -33,6 +36,9 @@ export function mergeConfigs(apmOssConfig: APMOSSConfig, apmConfig: APMXPackConf
'xpack.apm.ui.maxTraceItems': apmConfig['ui.maxTraceItems'],
'xpack.apm.ui.transactionGroupBucketSize': apmConfig['ui.transactionGroupBucketSize'],
'xpack.apm.autocreateApmIndexPattern': apmConfig.autocreateApmIndexPattern,
'xpack.apm.serviceMapIndexPattern': apmConfig.serviceMapIndexPattern,
`serviceMapIndexPattern` is misleading. It's not an index pattern (in the Kibana sense). This should follow the names of the other keys: `serviceMapIndices`.
Even though this is an xpack feature I'm contemplating whether it'll make more sense to keep this in `apm_oss` where all other indices are configured.
I don't think it belongs in the apm_oss set of configs, since it's a premium feature. It will not be accessed if the user doesn't pass a license check.
Yeah, I know, but imo it feels weird as an xpack user that you have to configure your indices differently. Let's get a second opinion on this.
@bmorelli25 I'm interested in your take on this. Currently all of the APM indices are namespaced under `apm_oss`. Examples: `apm_oss.errorIndices`, `apm_oss.transactionIndices` and `apm_oss.spanIndices`. Now we are also going to add a setting for Service Maps indices.
From a docs perspective, which do you think makes more sense:
- `apm_oss.serviceMapIndices` (consistency with other index configuration option names)
- `xpack.apm.serviceMapIndices` (Service Maps are only available in basic so is not used for anything in oss)
> (Service Maps are only available in basic...)
should only be available in platinum
My vote is for `xpack.apm.serviceMapIndices`.
> it feels weird as an xpack user that you have to configure your indices differently.
I agree, it does feel a little weird, but I think it's less weird than namespacing a platinum feature under the `oss` prefix.
Makes sense. Then let's stick to `xpack.apm.serviceMapIndices`.
590ceba to 0287001
params.body.query.bool.filter.push({
  wildcard: {
    [CONNECTION_UPSTREAM_LIST]: `${upstreamServiceName}/${upstreamEnvironment}`
Maybe this discussion has come up before, but how much time have we spent looking into having `service.name` and `service.environment` in separate fields?
If the problem is due to querying (because it's an array of objects), could we use nested fields instead?
That would make querying faster and imo simpler (although nested fields do introduce a slightly different syntax).
@dgieselaar any opinion on this?
perhaps do both? `service.name`, `service.environment`, and `service.identifier` (or something more appropriately named). I'm not sure if we should use nested: aggregations become more complicated and it doesn't work with index sorting, which we will probably not use in practice, but I think the use case for nested needs to be really compelling as there are some drawbacks.
My main concern with the current approach is that it is no longer ECS compliant, so we can't apply ui filters like normally. Additionally, `wildcard` can be terribly slow.
I'm fine with having both for possible future use cases. Although currently I don't think we want to query on CONNECTION_UPSTREAM_LIST
I'll have a proper look now, might be missing some context
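To make the separate-fields idea in this thread concrete, here is a minimal sketch that swaps the `wildcard` clause for exact `term` clauses. The field names `upstream.service.name` and `upstream.service.environment` and the `upstreamFilters` helper are hypothetical, chosen only for illustration. It also assumes each document stores a single upstream service (or that the fields are mapped as nested), because a flattened array of objects would lose the pairing between a name and its environment.

```typescript
// Hypothetical field names; not taken from the PR.
const UPSTREAM_SERVICE_NAME = 'upstream.service.name';
const UPSTREAM_SERVICE_ENVIRONMENT = 'upstream.service.environment';

type TermFilter = { term: Record<string, string> };

// Builds exact-match filters instead of a wildcard on a combined
// "name/environment" string.
function upstreamFilters(
  serviceName: string,
  environment?: string
): TermFilter[] {
  const filters: TermFilter[] = [
    { term: { [UPSTREAM_SERVICE_NAME]: serviceName } }
  ];
  if (environment) {
    filters.push({ term: { [UPSTREAM_SERVICE_ENVIRONMENT]: environment } });
  }
  return filters;
}
```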
)
);
return traceConnections.flatMap(
  mapTraceToBulkServiceConnection(apmIndices, servicesInTrace)
nit: Making `mapTraceToBulkServiceConnection` a higher-order function seems slightly unnecessary. Perhaps just:
return traceConnections.flatMap(traceConnection => mapTraceToBulkServiceConnection(apmIndices, servicesInTrace, traceConnection));
esClient: ESClient;
}) {
  const params = {
    index: targetApmIndices,
It's not immediately possible to see which indices are being queried. Everywhere else `apmIndices` is passed in. Can we do the same here?
- index: targetApmIndices,
+ index: [
+   apmIndices['apm_oss.transactionIndices'],
+   apmIndices['apm_oss.spanIndices']
+ ]
esClient: ESClient;
}) {
  const params = {
    index: targetApmIndices,
Same as above
}
},
aggs: {
  tracesample: {
- tracesample: {
+ trace_sample: {
'span.subtype',
'service.name',
'service.environment',
'destination.address'
What do you think about using constants here?
I considered it but felt it was best to avoid interpolation of the script source. I didn't want it to look like I was generating dynamic scripts from our code.
source: `return state.mappedDocs`
};
export const reduceServiceConnsScript = {
Can you add a comment for each of these scripts about what they are doing?
- creates required scripts in Elasticsearch - adds ingest pipeline that sets destination.address from span names - schedules periodic task that creates connection documents - implements endpoints that query/filter connections to render the service map
- replace this.kbnServer.afterPluginsInit and task object fetch with taskManager.ensureScheduled - simplify routes: move serviceName to query param and consolidate endpoints - clean up code in get_service_map.ts, constants, typescript fixes - add typescript support for composite aggregations
- broke up the task into more files for better organization and testing
….address from span names
- conform to ECS fields - performance improvements to get service connections - add checks for empty bulk requests
default security settings. The kibana user is responsible for creating and indexing to the `apm-service-connections` data index, while the apm user is responsible for kicking off the scheduled task and reading from `apm-*` indices.
0287001 to f0aec81
closing this in favor of the runtime implementation #54027
Addresses #48996.
Adds data layer for the service maps feature in APM:
apm-service-connections
apm-service-connections
index
/api/apm/service-map
that query/filters service connections to render the service map
In order to test this out without agents setting `destination.address`, you can install an ingest pipeline that approximates agent support:
When you're done testing, you can uninstall the dev pipeline with: