Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add filtering support for metadata payload #1916

Closed
joshdover opened this issue Dec 15, 2020 · 7 comments · Fixed by #2045
Closed

Add filtering support for metadata payload #1916

joshdover opened this issue Dec 15, 2020 · 7 comments · Fixed by #2045
Assignees
Labels
agent-nodejs Make available for APM Agents project planning. kibana
Milestone

Comments

@joshdover
Copy link

In Kibana's APM instrumentation, we are now capturing stats from developer's local environments. One issue is that we cannot filter some PII such as the developer's username from the path that is contained in the process.args array in the metadata that is sent along with APM traces:

image

Currently, we're including a notice that some personal data may be captured with an option for disabling this. Ideally, we'd be able filter this out similar to how we can filter out transaction data.

@github-actions github-actions bot added the agent-nodejs Make available for APM Agents project planning. label Dec 15, 2020
@trentm
Copy link
Member

trentm commented Dec 15, 2020

Some quick notes and a first hack:

  1. patch to apm-nodejs-http-client.git
diff --git a/index.js b/index.js
index c27fb71..2bb3a0d 100644
--- a/index.js
+++ b/index.js
@@ -495,7 +495,9 @@ function onStream (client, onerror) {

     // All requests to the APM Server must start with a metadata object
     if (!client._encodedMetadata) {
-      client._encodedMetadata = client._encode({ metadata: client._conf.metadata }, Client.encoding.METADATA)
+      // XXX HACK: This should be done *once*, but as late as possible.
+      var filteredMetadata = client._conf.metadataFilters.process(client._conf.metadata)
+      client._encodedMetadata = client._encode({ metadata: filteredMetadata }, Client.encoding.METADATA)
     }
     stream.write(client._encodedMetadata)
   }
  1. patch to apm-agent-nodejs.git
diff --git a/lib/agent.js b/lib/agent.js
index cd75056..56038c1 100644
--- a/lib/agent.js
+++ b/lib/agent.js
@@ -42,6 +42,7 @@ function Agent () {
   this._errorFilters = new Filters()
   this._transactionFilters = new Filters()
   this._spanFilters = new Filters()
+  this._metadataFilters = new Filters()
   this._transport = null

   this.lambda = lambda(this)
@@ -221,6 +224,12 @@ Agent.prototype.addFilter = function (fn) {
   this.addErrorFilter(fn)
   this.addTransactionFilter(fn)
   this.addSpanFilter(fn)
+  // XXX Decide if this would be a breaking change. For example the *default
+  // filter example in the docs* will break because it assumes
+  // `payload.context`. However that should *already* break for filtering
+  // error payloads.
+  // https://www.elastic.co/guide/en/apm/agent/nodejs/current/agent-api.html#apm-add-filter
+  this.addMetadataFilter(fn)
 }

 Agent.prototype.addErrorFilter = function (fn) {
@@ -250,6 +259,15 @@ Agent.prototype.addSpanFilter = function (fn) {
   this._spanFilters.push(fn)
 }

+Agent.prototype.addMetadataFilter = function (fn) {
+  if (typeof fn !== 'function') {
+    this.logger.error('Can\'t add filter of type %s', typeof fn)
+    return
+  }
+
+  this._metadataFilters.push(fn)
+}
+
 Agent.prototype.captureError = function (err, opts, cb) {
   if (typeof opts === 'function') return this.captureError(err, null, opts)

diff --git a/lib/config.js b/lib/config.js
index 2702181..e69e6f5 100644
--- a/lib/config.js
+++ b/lib/config.js
@@ -253,6 +253,7 @@ class Config {
           globalLabels: maybePairsToObject(conf.globalLabels),
           hostname: conf.hostname,
           environment: conf.environment,
+          metadataFilters: agent._metadataFilters,

           // Sanitize conf
           truncateKeywordsAt: config.INTAKE_STRING_MAX_SIZE,
  1. which allows me to write a (poor) filter somewhat for your case like this:
apm.addMetadataFilter(function myFilt(payload) {
  if (payload.process && payload.process.argv) {
    const user = new RegExp(process.env.USER, 'g')
    payload.process.argv = payload.process.argv.map((arg) => {
      return arg.replace(user, '[REDACTED]')
    })
  }
  return payload
})

and then observe that filtering in traffic to apm-server:

[2020-12-15T22:33:03.579Z]  INFO: mockapmserver/26715 on pink.local: request (req.remoteAddress=::ffff:127.0.0.1, req.remotePort=60675)
    POST /intake/v2/events HTTP/1.1
    accept: application/json
    user-agent: elasticapm-node/3.9.0 elastic-apm-http-client/9.4.2 node/10.23.0
    content-type: application/x-ndjson
    content-encoding: gzip
    host: localhost:8200
    connection: keep-alive
    transfer-encoding: chunked

    {"metadata":{"service":{"name":"esapp","environment":"development","runtime":{"name":"node","version":"10.23.0"},"language":{"name":"javascript"},"agent":{"name":"nodejs","version":"3.9.0"},"framework":{"name":"express","version":"4.17.1"},"version":"1.0.1"},"process":{"pid":28461,"ppid":27276,"title":"node","argv":["/Users/[REDACTED]/.nvm/versions/node/v10.23.0/bin/node","/Users/[REDACTED]/tm/play/esapp.js"]},"system":{"hostname":"pink.local","architecture":"x64","platform":"darwin"}}}
    {"span":{"name":...

@trentm trentm self-assigned this Dec 15, 2020
@trentm
Copy link
Member

trentm commented Dec 15, 2020

Notes:

  • I think probably not include this in "addFilters" because it really is a different category of data to filter: data that doesn't change frequently.
  • This change should consider the soon-to-come cloud metadata work (Collect cloud metadata #1815) to ensure that the filter runs after that lazily collected cloud metadata.

@AlexanderWert AlexanderWert added this to the 7.12 milestone Feb 2, 2021
@trentm
Copy link
Member

trentm commented Feb 22, 2021

@joshdover Would your use case be handled with a config option to fully disable including process args in the "metadata"? I ask because the Java APM agent has a include_process_args config options for this. In fact, the Java agent does not include process args by default -- mainly because it can be large and overkill to include in the metadata that is sent for every payload to APM server.

If this handles your use case, it would be faster as well.

@tylersmalley I noticed you +1'd this above. Do you have a different use case that would be better served with a apm.addMetadataFilter(...)?

Currently my preference is to:

  • add a matching include_process_args-like config option to support excluding process args from metadata
  • add a ticket for the next major release of the node.js agent to exclude process args by default
  • only add apm.addMetadataFilter(...) once we have a decent use case for it from some user(s).

@tylersmalley
Copy link

I don't think the process args are the only place that the full path is included, which is the issue since it includes the username.

@trentm
Copy link
Member

trentm commented Feb 25, 2021

@tylersmalley Thanks! You are right. process.title is the other one that can hold the full path. I'm sold on the use case then. I'll get a PR moving for adding a metadata filter, then.

@tylersmalley
Copy link

Much appreciated, thanks!!

@trentm
Copy link
Member

trentm commented Apr 15, 2021

This will be in v3.14.0 of the agent, which we hope to release early next week.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
agent-nodejs Make available for APM Agents project planning. kibana
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants