Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[APM] Service Map data background task and API endpoint #50844

Closed
wants to merge 15 commits into from

Conversation

ogupte
Copy link
Contributor

@ogupte ogupte commented Nov 16, 2019

Addresses #48996.

Adds data later for the service maps feature in APM:

  • creates new index apm-service-connections
  • schedules periodic task that stores service connection documents in the apm-service-connections index
  • implements endpoint /api/apm/service-map that query/filters service connections to render the service map

Screen Shot 2019-11-26 at 12 13 28 AM

In order to test this out without agents setting destination.address, you can install an ingest pipeline that approximates agent support:

git clone https://github.com/ogupte/apm-ui-seed.git
cd apm-ui-seed
yarn
bin/apm-ui-service-map-ingest --host='http://localhost:9200' --auth='elastic:password' install

when you're done testing, you can uninstall the dev pipeline with:

bin/apm-ui-service-map-ingest --host='http://localhost:9200' --auth='elastic:password' uninstall

@ogupte ogupte added the v7.6.0 label Nov 16, 2019
@ogupte ogupte self-assigned this Nov 16, 2019
@elasticmachine
Copy link
Contributor

💔 Build Failed

@ogupte ogupte force-pushed the apm-48996-service-map-data branch from a07a4f8 to 16b45f0 Compare November 22, 2019 10:25
@elasticmachine
Copy link
Contributor

💔 Build Failed

@ogupte ogupte force-pushed the apm-48996-service-map-data branch 2 times, most recently from 16b45f0 to 2ea4358 Compare November 26, 2019 08:12
@elasticmachine
Copy link
Contributor

💔 Build Failed

if (fetchedTasks.docs.length) {
await taskManager.remove('servicemap-processor');
}
await taskManager.schedule({
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if the point of this is to prevent this task being scheduled more than once, but if that is the case, then you can now use ensureScheduled: #50232


if (config.get('xpack.apm.serviceMapEnabled')) {
initializeServiceMaps(server).catch(error => {
throw error;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will result in an unhandled promise exception

* or more contributor license agreements. Licensed under the Elastic License;
* you may not use this file except in compliance with the Elastic License.
*/
import { idx } from '@kbn/elastic-idx';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idx doing a comeback??? :D

if (start && end) {
return callApmApi({
pathname: '/api/apm/service-map',
params: { query: { start, end } }
pathname: '/api/apm/service-map/all',
Copy link
Member

@sorenlouv sorenlouv Dec 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if a service is called "all"?

});
}

// @ts-ignore
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still needed?

@@ -38,18 +38,24 @@ export default function apmOss(kibana) {
spanIndices: Joi.string().default('apm-*'),
metricsIndices: Joi.string().default('apm-*'),
onboardingIndices: Joi.string().default('apm-*'),
apmAgentConfigurationIndex: Joi.string().default('.apm-agent-configuration')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was it intentional to re-add this?

start,
end,
environment,
uiFilters: JSON.stringify(uiFiltersOmitEnv)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We normally don't send environment like a separate query params but just keep it in uiFilters. Can't we do the same here?

export const serviceMapRoute = createRoute(() => ({
path: '/api/apm/service-map',
path: '/api/apm/service-map/{serviceName}',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we talked about /api/apm/service-map/{serviceName} and /api/apm/service-map/all can be combined to a single route if serviceName is converted to an optional query param

return getServiceMap();
handler: async ({ context, request }) => {
if (!context.config['xpack.apm.serviceMapEnabled']) {
return new Boom('Not found', { statusCode: 404 });
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

serviceMapEnabled should be removed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... or, are there any reasons to let users disable it? We don't allow this for any other features.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah we can safely remove this config.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after talking with folks, the consensus was to keep the config until FF

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the reason to wait until FF if we decided to do this?

};
}>;
};
}> = idx(aggregations, _ => _.conns.buckets) || [];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dgieselaar is this something our magic agg types should be able to infer instead of them being spelled out?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this is no longer an issue, correct?

'Unable to set up required scripts for APM Service Maps:'
);
server.log('error', error);
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it the intention to continue initialization if setupRequiredScripts fails?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that was the idea, the result will be some error messages in kibana logs & service maps UI not displaying any nodes

Copy link
Member

@sorenlouv sorenlouv Dec 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How's that useful? What about just terminating initialization as soon as something goes wrong?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still being worked on?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this file has undergone a metamorphosis since then. It's not responsible for initializing the endpoint that a user hits to kick off the task in that user's scope. This endpoint had to be created here since it needs access to the task manager, so it wasn't able to live with the other APM APIs.

term: {
'task.taskType': 'serviceMap'
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: what do you think about single-lining terms like this:

  { term: { _id: "task:servicemap-processor" } },
  { term: { "task.taskType": "serviceMap" } }


function interestingTransactions(since?: string, afterKey?: any) {
if (!since) {
since = 'now-1h';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How was this value chosen? Should it be a constant?

Copy link
Member

@sorenlouv sorenlouv Dec 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw what about adding it to the function signature?

query: {
bool: {
filter: [
{ exists: { field: 'destination.address' } },
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use constants

{ exists: { field: 'destination.address' } },
{ exists: { field: 'trace.id' } },
{ exists: { field: 'span.duration.us' } },
{ range: { '@timestamp': { gt: since } } }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about using start and end like in the rest of the codebase?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I correct in understanding that end is currently ignored?

Copy link
Member

@sorenlouv sorenlouv Dec 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, nvm. This is not the function called by the end-user but the task manager.

body
});
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we talked about, I think this file would benefit from being refactored into smaller functions, having better types, and being less mutable (if possible).


return new ArrayList(conns)
}
return [];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have tests for this? I think that's quiet important since we have no static safety - not even syntax highlighting - to notify us of any issues.

Copy link
Member

@sorenlouv sorenlouv Dec 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw ran it through a formatter to fix some minor indentation and spacing issues:`

void extractChildren(def caller, def spans, def upstream, def conns, def count) {
  // TODO: simplify this
  if (spans.containsKey(caller.id)) {
    for (s in spans[caller.id]) {
      if (caller.span_type == 'external') {
        upstream.add(caller.service_name + "/" + caller.environment);
        def conn = new HashMap();
        conn.caller = caller;
        conn.callee = s;
        conn.upstream = new ArrayList(upstream);
        conns.add(conn);
        extractChildren(s, spans, upstream, conns, count);
        upstream.remove(upstream.size() - 1);
      } else {
        extractChildren(s, spans, upstream, conns, count);
      }
    }
  } else {
    // no connection found
    def conn = new HashMap();
    conn.caller = caller;
    conn.upstream = new ArrayList(upstream);
    conn.upstream.add(caller.service_name + "/" + caller.environment);
    conns.add(conn);
  }
}

def conns = new HashSet();
def spans = new HashMap();

// merge results from shards
for (state in states) {
  for (s in state.entrySet()) {
    def v = s.getValue();
    def k = s.getKey();
    if (!spans.containsKey(k)) {
      spans[k] = v;
    } else {
      for (p in v) {
        spans[k].add(p);
      }
    }
  }
}

if (spans.containsKey(null) && spans[null].size() > 0) {
  def node = spans[null][0];
  def upstream = new ArrayList();
  extractChildren(node, spans, upstream, conns, 0);
  return new ArrayList(conns)
}

return [];

source: `
void extractChildren(def caller, def spans, def upstream, def conns, def count) {
// TODO: simplify this
if (spans.containsKey(caller.id)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it still the plan to simplify this?

// TODO: simplify this
if (spans.containsKey(caller.id)) {
for(s in spans[caller.id]) {
if (caller.span_type=='external') {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see external being used a couple of places. Should it be a constant like we have for transaction types?

if: "ctx.span != null && ctx.span.type == 'ext'",
field: 'span.type',
value: 'external'
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we rewrite ext to external. Why is that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the time, the value was not consistent, so this normalized everything to external.

see elastic/apm-agent-nodejs#1225 (comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should also note that this ingest pipeline will not be merged.

// get current apm ingest pipeline
apmIngestPipeline = await getIngestPipelineApm(callCluster);
} catch (error) {
if (error.statusCode !== 404) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a comment about why we are swallowing 404 errors

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nvm. comment below is sufficient 👍

@@ -33,6 +36,9 @@ export function mergeConfigs(apmOssConfig: APMOSSConfig, apmConfig: APMXPackConf
'xpack.apm.ui.maxTraceItems': apmConfig['ui.maxTraceItems'],
'xpack.apm.ui.transactionGroupBucketSize': apmConfig['ui.transactionGroupBucketSize'],
'xpack.apm.autocreateApmIndexPattern': apmConfig.autocreateApmIndexPattern,
'xpack.apm.serviceMapIndexPattern': apmConfig.serviceMapIndexPattern,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

serviceMapIndexPattern is misleading. It's not an index pattern (in the Kibana sense). This should follow the names of the other keys: serviceMapIndices.
Even though this is an xpack feature I'm contemplating whether it'll make more sense to keep this in apm_oss where all other indices are configured.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it belongs in the apm_oss set of configs, since it's a premium feature. it will not be accessed if the use doesn't pass a license check.

Copy link
Member

@sorenlouv sorenlouv Dec 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I know, but imo it feels weird as an xpack user that you have to configure your indices differently. Let's get a second opinion on this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bmorelli25 i'm interested in your take on this. Currently all of the apm indicies are namespaced under apm.oss. Examples: apm_oss.errorIndices, apm_oss.transactionIndices and apm_oss.spanIndices. Now we are also going to add a setting for Service Maps indices.

From a docs perspective what do you think makes more sense to:

  • apm_oss.serviceMapIndices (consistency with other index configuration option names)
  • xpack.apm.serviceMapIndices (Service Maps are only available in basic so is not used for anything in oss)

Copy link
Contributor Author

@ogupte ogupte Dec 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Service Maps are only available in basic...)

should only be available in platinum

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My vote is for xpack.apm.serviceMapIndices.

it feels weird as an xpack user that you have to configure your indices differently.

I agree, it does feel a little weird, but I think it's less weird than namespacing a platinum feature under the oss prefix.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Then let's stick to xpack.apm.serviceMapIndices.

@ogupte ogupte force-pushed the apm-48996-service-map-data branch from 590ceba to 0287001 Compare December 14, 2019 13:17
@ogupte ogupte marked this pull request as ready for review December 16, 2019 17:24
@ogupte ogupte requested a review from a team as a code owner December 16, 2019 17:24

params.body.query.bool.filter.push({
wildcard: {
[CONNECTION_UPSTREAM_LIST]: `${upstreamServiceName}/${upstreamEnvironment}`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this discussion has come up before but how much time have we spent looking into having service.name and service.environment in separate fields?
If the problem is due to querying (because it's an array of objects) could we used nested fields instead?

That would make querying faster and imo simpler (although nested fields does introduce a slightly different syntax)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dgieselaar any opinion on this?

Copy link
Member

@dgieselaar dgieselaar Dec 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps do both? service.name, service.environment, and service.identifier (or something more appropriately named). I'm not sure if we should use nested, aggregations become more complicated and it doesn't work with index sorting, which we will probably not use in practice but I think the use case for nested needs to be really compelling as there are some drawbacks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main concern with the current approach is that it is not longer ECS compliant so we can't apply ui filters like normally. Additionally wildcard can be terribly slow.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with having both for possible future use cases. Although currently I don't think we want to query on CONNECTION_UPSTREAM_LIST

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll have a proper look now, might be missing some context

)
);
return traceConnections.flatMap(
mapTraceToBulkServiceConnection(apmIndices, servicesInTrace)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Making mapTraceToBulkServiceConnection a higher-order function seems slightly unnecessary. Perhaps just:

return traceConnections.flatMap(traceConnection => mapTraceToBulkServiceConnection(apmIndices, servicesInTrace, traceConnection)

esClient: ESClient;
}) {
const params = {
index: targetApmIndices,
Copy link
Member

@sorenlouv sorenlouv Dec 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not immediately possible to see which indices are being queried. Everywhere else apmIndices is passed in. Can we do the same here?

Suggested change
index: targetApmIndices,
index: [
apmIndices['apm_oss.transactionIndices'],
apmIndices['apm_oss.spanIndices']
]

esClient: ESClient;
}) {
const params = {
index: targetApmIndices,
Copy link
Member

@sorenlouv sorenlouv Dec 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

}
},
aggs: {
tracesample: {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
tracesample: {
trace_sample: {

'span.subtype',
'service.name',
'service.environment',
'destination.address'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about using constants here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered it but felt it was best to avoid interpolation of the script source. I didn't want it to look like i was generating dynamic scripts from our code.

source: `return state.mappedDocs`
};

export const reduceServiceConnsScript = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment for each of these scripts about what they are doing?

- creates required scripts in elasticsearch
- adds ingest pipeline that sets destination.address from span names
- schedules periodic task that create connection documents
- implements endpoints that query/filters connections to render the service map
- replace this.kbnServer.afterPluginsInit and task object fetch with taskManager.ensureScheduled
- simplify routes: move serviceName to query param and consolidate endpoints
- clean up code in get_service_map.ts, constants, typescript fixes
- add typescript support for composite aggregations
- broke up the task into more files for better organization and testing
- conform to ESC fields
- performance improvments to get service connections
- add checks for empty bulk requests
default security settings. The kibana user is responsible for creating
and index to the `apm-service-connections` data index, while the apm
user is resposible for kicking off the scheduled task and reading from
`apm-*` indices.
@ogupte ogupte force-pushed the apm-48996-service-map-data branch from 0287001 to f0aec81 Compare December 21, 2019 09:48
@kibanamachine
Copy link
Contributor

💔 Build Failed

History

  • 💔 Build #15584 failed 02870016934cfb3ef73f2fcbb63a63c20af5005f
  • 💔 Build #14704 failed 590ceba4a828e42bf68fce7ff53062fba5bf8dca
  • 💔 Build #14475 failed a5edd893e54fc7de40768b42e604e850a602fb77
  • 💔 Build #14454 failed 06b07ee2a7baf5d62fb7813a154a45adbb7f3793
  • 💔 Build #14417 failed ec845675f89142b7e7b4a1fb5b1e90a27b139916

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@elastic elastic deleted a comment from elasticmachine Dec 21, 2019
@ogupte
Copy link
Contributor Author

ogupte commented Jan 11, 2020

closing this in favor of the runtime implementation #54027

@ogupte ogupte closed this Jan 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants