feat: add config[:obfuscation_limit] to pg and mysql2 #224

Merged

Conversation

@reid-rigo (Contributor) commented Dec 23, 2022

Enable the following:

c.use 'OpenTelemetry::Instrumentation::PG', db_statement: :obfuscate, obfuscation_limit: 1000

to allow users to control obfuscation performance via a configurable limit.

Additionally, when SQL is over the limit, truncate it to the first regex match to make output more useful for analysis:

SELECT * from users where users.id =... 
SQL truncated (> 1000 characters)
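
A minimal sketch of the intended limit-plus-truncate behavior (the method name, the stand-in regexp, and the message format here are illustrative, not necessarily the merged code):

# Stand-in for the generated component regexp used by the instrumentation.
OBFUSCATION_REGEX = /'(?:[^']|'')*'/

def obfuscate_sql(sql, limit: 1000)
  # Under the limit: obfuscate every literal as usual.
  return sql.gsub(OBFUSCATION_REGEX, '?') if sql.size <= limit

  # Over the limit: truncate at the first literal rather than obfuscating everything.
  first_match = sql.index(OBFUSCATION_REGEX)
  notice = "SQL truncated (> #{limit} characters)"
  first_match ? "#{sql[0...first_match]}...\n#{notice}" : notice
end

puts obfuscate_sql("SELECT * FROM users WHERE name = '#{'x' * 2000}'")
# => SELECT * FROM users WHERE name = ...
#    SQL truncated (> 1000 characters)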

@linux-foundation-easycla bot commented Dec 23, 2022

CLA Signed

The committers listed above are authorized under a signed CLA.

@reid-rigo (Contributor, Author)

I'm working with my employer on the CLA and will leave this as a draft until I'm authorized.

@ahayworth (Contributor)

@reid-rigo Thanks for the PR, and let us know once the CLA is signed.


That said, I'm a little conflicted about this PR:

  • If you (personally) are looking for a higher limit, doesn't that imply that we are perhaps being too conservative here and should instead just raise the limit for everyone?
  • If you (personally) are looking for a lower limit, doesn't that imply that we are not being conservative enough and should instead lower the limit for everyone?

It may help to know that while span size is a concern in general, the main motivation behind the truncation limit is to avoid running a regular expression that might take a long time in a hot code path. Truncating after the first match implies running that regular expression, and I think that defeats one of the purposes of the check to begin with.

With all that in mind - can you elaborate on your goals with this PR? That might help us understand what the right course of action should be.


Also worth mentioning: we might consider conditionally enabling or disabling this check based on Ruby version. The latest Ruby release has greatly improved the speed of regular expressions and also introduced a "timeout" that could help keep this code path speedy.

ref: https://bugs.ruby-lang.org/issues/19104 https://bugs.ruby-lang.org/issues/17837
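
For reference, the Ruby 3.2 features linked above can be used roughly like this (a sketch; the pattern and timeout values here are arbitrary):

# Ruby 3.2+: global fallback timeout for all Regexp matching.
Regexp.timeout = 1.0 # seconds

# Or a per-pattern timeout, which takes precedence over the global setting.
pattern = Regexp.new("'(?:[^']|'')*'", timeout: 0.1)

sql = "SELECT * FROM users WHERE name = 'example'"
begin
  pattern.match?(sql)
rescue Regexp::TimeoutError
  # Fall back to truncation (or skip obfuscation) when matching is too slow.
end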

@reid-rigo (Contributor, Author) commented Jan 21, 2023

@ahayworth, in my opinion the current limit is far too low. I really want to understand the performance of big queries, and the milliseconds spent obfuscating the SQL is likely to be dwarfed by the query itself. As a reference, New Relic's limit is 16384.

As for the use of index(regex): in my testing it's dramatically faster than running the full gsub. And of course the truncated query still gives me a lot to look at.

The Ruby regex development is really interesting, and I saw it shortly after working on this. I haven't tested it yet, but increasing the default limit based on Ruby version could make a lot of sense.

Edit: Also I think we'll have the CLA signed soon.

@arielvalentin (Collaborator)

cc: #32

@ahayworth (Contributor)

> As for the use of index(regex): in my testing it's dramatically faster than running the full gsub. And of course the truncated query still gives me a lot to look at.

Interesting, that's not what I would have expected. Ruby always has interesting surprises for me, even after many years with it. 😄 It's a good surprise, though!

> in my opinion the current limit is far too low. I really want to understand the performance of big queries, and the milliseconds spent obfuscating the SQL is likely to be dwarfed by the query itself. As a reference, New Relic's limit is 16384.
> ...
> but increasing the default limit based on Ruby version could make a lot of sense

I would support increasing it regardless of Ruby version, actually, based on New Relic's defaults. I am curious whether @robertlaurin has any thoughts here; I recall he did some benchmarking on sanitization performance last year.

@arielvalentin (Collaborator)

Anecdotally: sanitizing SQL queries is very slow in our prod environments. I do not have any profiles to share at the moment, but I will do my best to share something with y'all.

I am still interested in finding a good way to optimize or offload the SQL sanitization from the user's request path, but I have not invested enough time in that recently. Technically we should not be doing any scrubbing in the SDK, but there isn't anything in the collector that does this for us AFAIK.

I think that is why #32 will be critical for our users, so we can optimize as much as possible in a single module/package.

@reid-rigo (Contributor, Author)

@ahayworth, the CLA is signed. I'd be happy to separate out the truncation part of this to keep this PR purely about the obfuscation limit.

@arielvalentin, would you be able to benchmark truncation with one of your big SQL queries? Here's what I used to test:

require "benchmark"

COMPONENTS_REGEX_MAP = {
  single_quotes: /'(?:[^']|'')*?(?:\\'.*|'(?!'))/,
  dollar_quotes: /(\$(?!\d)[^$]*?\$).*?(?:\1|$)/,
  uuids: /\{?(?:[0-9a-fA-F]\-*){32}\}?/,
  numeric_literals: /-?\b(?:[0-9]+\.)?[0-9]+([eE][+-]?[0-9]+)?\b/,
  boolean_literals: /\b(?:true|false|null)\b/i,
  comments: /(?:#|--).*?(?=\r|\n|$)/i,
  multi_line_comments: /\/\*(?:[^\/]|\/[^*])*?(?:\*\/|\/\*.*)/
}.freeze

POSTGRES_COMPONENTS = %i[
  single_quotes
  dollar_quotes
  uuids
  numeric_literals
  boolean_literals
  comments
  multi_line_comments
].freeze

generated_postgres_regex = Regexp.union(POSTGRES_COMPONENTS.map { |component| COMPONENTS_REGEX_MAP[component] })
sql = <<SQL
  ...
SQL

puts "SQL length: #{sql.length}"

times = 1000
Benchmark.bm(16) do |x|
  x.report("full gsub") do
    times.times do
      sql.gsub(generated_postgres_regex, '?')
    end
  end
  x.report("first index") do
    times.times do
      sql[..sql.index(generated_postgres_regex) - 1]
    end
  end
end
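
(For intuition on the gap: String#index stops scanning at the first match and the slice allocates a single substring, whereas gsub must scan the whole string and build a new copy with every literal replaced, so the difference grows with query size and literal count. Benchmark.bm prints user/system/total/real timings for each labeled report.)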

@arielvalentin (Collaborator)

Very sorry about this falling off of my radar. I did not try the proposed changes out against some of our more gnarly production queries but I will schedule time to do this later in the week of 2023-04-03.

I will also add that we use trilogy and not mysql2 since that gem has not had active maintenance for quite some time.

@ahayworth would you mind re-engaging here? I would like to see about resolving your concerns.

@github-actions bot commented May 4, 2023

👋 This pull request has been marked as stale because it has been open with no activity. You can: comment on the issue or remove the stale label to hold stale off for a while, add the keep label to hold stale off permanently, or do nothing. If you do nothing, this pull request will eventually be closed by the stale bot.

@github-actions github-actions bot added the stale Marks an issue/PR stale label May 4, 2023
@github-actions github-actions bot closed this May 19, 2023
@ericmustin (Contributor)

This should stay open, I think.

@ericmustin ericmustin reopened this May 19, 2023
@arielvalentin (Collaborator)

I agree!

I think we should review this again and see about getting it merged.

@arielvalentin arielvalentin added keep Ensures stale-bot keeps this issue/PR open and removed stale Marks an issue/PR stale labels May 19, 2023
@arielvalentin arielvalentin merged commit b369020 into open-telemetry:main May 25, 2023
arielvalentin added a commit to arielvalentin/opentelemetry-ruby-contrib that referenced this pull request May 25, 2023
arielvalentin added a commit that referenced this pull request May 25, 2023
* feat: Add Obfuscation Limit to Trilogy

Follow up from #224

* fix: copy-paste error