Add filtering support for metadata payload #1916

joshdover · 2020-12-15T20:29:14Z

In Kibana's APM instrumentation, we are now capturing stats from developer's local environments. One issue is that we cannot filter some PII such as the developer's username from the path that is contained in the process.args array in the metadata that is sent along with APM traces:

Currently, we're including a notice that some personal data may be captured with an option for disabling this. Ideally, we'd be able filter this out similar to how we can filter out transaction data.

The text was updated successfully, but these errors were encountered:

trentm · 2020-12-15T22:41:22Z

Some quick notes and a first hack:

patch to apm-nodejs-http-client.git

diff --git a/index.js b/index.js
index c27fb71..2bb3a0d 100644
--- a/index.js
+++ b/index.js
@@ -495,7 +495,9 @@ function onStream (client, onerror) {

     // All requests to the APM Server must start with a metadata object
     if (!client._encodedMetadata) {
-      client._encodedMetadata = client._encode({ metadata: client._conf.metadata }, Client.encoding.METADATA)
+      // XXX HACK: This should be done *once*, but as late as possible.
+      var filteredMetadata = client._conf.metadataFilters.process(client._conf.metadata)
+      client._encodedMetadata = client._encode({ metadata: filteredMetadata }, Client.encoding.METADATA)
     }
     stream.write(client._encodedMetadata)
   }

patch to apm-agent-nodejs.git

diff --git a/lib/agent.js b/lib/agent.js
index cd75056..56038c1 100644
--- a/lib/agent.js
+++ b/lib/agent.js
@@ -42,6 +42,7 @@ function Agent () {
   this._errorFilters = new Filters()
   this._transactionFilters = new Filters()
   this._spanFilters = new Filters()
+  this._metadataFilters = new Filters()
   this._transport = null

   this.lambda = lambda(this)
@@ -221,6 +224,12 @@ Agent.prototype.addFilter = function (fn) {
   this.addErrorFilter(fn)
   this.addTransactionFilter(fn)
   this.addSpanFilter(fn)
+  // XXX Decide if this would be a breaking change. For example the *default
+  // filter example in the docs* will break because it assumes
+  // `payload.context`. However that should *already* break for filtering
+  // error payloads.
+  // https://www.elastic.co/guide/en/apm/agent/nodejs/current/agent-api.html#apm-add-filter
+  this.addMetadataFilter(fn)
 }

 Agent.prototype.addErrorFilter = function (fn) {
@@ -250,6 +259,15 @@ Agent.prototype.addSpanFilter = function (fn) {
   this._spanFilters.push(fn)
 }

+Agent.prototype.addMetadataFilter = function (fn) {
+  if (typeof fn !== 'function') {
+    this.logger.error('Can\'t add filter of type %s', typeof fn)
+    return
+  }
+
+  this._metadataFilters.push(fn)
+}
+
 Agent.prototype.captureError = function (err, opts, cb) {
   if (typeof opts === 'function') return this.captureError(err, null, opts)

diff --git a/lib/config.js b/lib/config.js
index 2702181..e69e6f5 100644
--- a/lib/config.js
+++ b/lib/config.js
@@ -253,6 +253,7 @@ class Config {
           globalLabels: maybePairsToObject(conf.globalLabels),
           hostname: conf.hostname,
           environment: conf.environment,
+          metadataFilters: agent._metadataFilters,

           // Sanitize conf
           truncateKeywordsAt: config.INTAKE_STRING_MAX_SIZE,

which allows me to write a (poor) filter somewhat for your case like this:

apm.addMetadataFilter(function myFilt(payload) {
  if (payload.process && payload.process.argv) {
    const user = new RegExp(process.env.USER, 'g')
    payload.process.argv = payload.process.argv.map((arg) => {
      return arg.replace(user, '[REDACTED]')
    })
  }
  return payload
})

and then observe that filtering in traffic to apm-server:

[2020-12-15T22:33:03.579Z]  INFO: mockapmserver/26715 on pink.local: request (req.remoteAddress=::ffff:127.0.0.1, req.remotePort=60675)
    POST /intake/v2/events HTTP/1.1
    accept: application/json
    user-agent: elasticapm-node/3.9.0 elastic-apm-http-client/9.4.2 node/10.23.0
    content-type: application/x-ndjson
    content-encoding: gzip
    host: localhost:8200
    connection: keep-alive
    transfer-encoding: chunked

    {"metadata":{"service":{"name":"esapp","environment":"development","runtime":{"name":"node","version":"10.23.0"},"language":{"name":"javascript"},"agent":{"name":"nodejs","version":"3.9.0"},"framework":{"name":"express","version":"4.17.1"},"version":"1.0.1"},"process":{"pid":28461,"ppid":27276,"title":"node","argv":["/Users/[REDACTED]/.nvm/versions/node/v10.23.0/bin/node","/Users/[REDACTED]/tm/play/esapp.js"]},"system":{"hostname":"pink.local","architecture":"x64","platform":"darwin"}}}
    {"span":{"name":...

trentm · 2020-12-15T22:48:49Z

Notes:

I think probably not include this in "addFilters" because it really is a different category of data to filter: data that doesn't change frequently.
This change should consider the soon-to-come cloud metadata work (Collect cloud metadata #1815) to ensure that the filter runs after that lazily collected cloud metadata.

trentm · 2021-02-22T19:47:16Z

@joshdover Would your use case be handled with a config option to fully disable including process args in the "metadata"? I ask because the Java APM agent has a include_process_args config options for this. In fact, the Java agent does not include process args by default -- mainly because it can be large and overkill to include in the metadata that is sent for every payload to APM server.

If this handles your use case, it would be faster as well.

@tylersmalley I noticed you +1'd this above. Do you have a different use case that would be better served with a apm.addMetadataFilter(...)?

Currently my preference is to:

add a matching include_process_args-like config option to support excluding process args from metadata
add a ticket for the next major release of the node.js agent to exclude process args by default
only add apm.addMetadataFilter(...) once we have a decent use case for it from some user(s).

tylersmalley · 2021-02-25T19:13:23Z

I don't think the process args are the only place that the full path is included, which is the issue since it includes the username.

trentm · 2021-02-25T21:01:10Z

@tylersmalley Thanks! You are right. process.title is the other one that can hold the full path. I'm sold on the use case then. I'll get a PR moving for adding a metadata filter, then.

tylersmalley · 2021-02-26T01:37:57Z

Much appreciated, thanks!!

Fixes: #1916

trentm · 2021-04-15T22:45:26Z

This will be in v3.14.0 of the agent, which we hope to release early next week.

github-actions bot added the agent-nodejs Make available for APM Agents project planning. label Dec 15, 2020

trentm self-assigned this Dec 15, 2020

AlexanderWert added the kibana label Feb 2, 2021

AlexanderWert added this to the 7.12 milestone Feb 2, 2021

AlexanderWert modified the milestones: 7.12, 7.13 Mar 29, 2021

trentm mentioned this issue Mar 31, 2021

fix: blocking behaviour under load elastic/apm-nodejs-http-client#144

Merged

trentm mentioned this issue Apr 15, 2021

feat: client.addMetadataFilter(fn) elastic/apm-nodejs-http-client#154

Merged

trentm added a commit that referenced this issue Apr 15, 2021

feat: add apm.addMetadataFilter(fn)

a85b3c3

Fixes: #1916

trentm mentioned this issue Apr 15, 2021

feat: add apm.addMetadataFilter(fn) #2045

Merged

6 tasks

trentm closed this as completed in #2045 Apr 15, 2021

trentm added a commit that referenced this issue Apr 15, 2021

feat: add apm.addMetadataFilter(fn) (#2045)

df84561

Fixes: #1916

trentm mentioned this issue Apr 19, 2021

Bump elastic-apm-node from v3.10.0 to v3.14.0 elastic/kibana#97509

Merged

1 task

trentm mentioned this issue May 4, 2021

redact PII from self-instrumentation APM metadata elastic/kibana#99153

Open

trentm mentioned this issue Nov 15, 2022

Specify agent logging preamble elastic/apm#713

Merged

6 tasks

david-luna mentioned this issue May 26, 2023

chore: add logging preamble #3383

Merged

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add filtering support for metadata payload #1916

Add filtering support for metadata payload #1916

joshdover commented Dec 15, 2020

trentm commented Dec 15, 2020 •

edited

Loading

trentm commented Dec 15, 2020

trentm commented Feb 22, 2021

tylersmalley commented Feb 25, 2021

trentm commented Feb 25, 2021

tylersmalley commented Feb 26, 2021

trentm commented Apr 15, 2021

Add filtering support for metadata payload #1916

Add filtering support for metadata payload #1916

Comments

joshdover commented Dec 15, 2020

trentm commented Dec 15, 2020 • edited Loading

trentm commented Dec 15, 2020

trentm commented Feb 22, 2021

tylersmalley commented Feb 25, 2021

trentm commented Feb 25, 2021

tylersmalley commented Feb 26, 2021

trentm commented Apr 15, 2021

trentm commented Dec 15, 2020 •

edited

Loading