feat: add config[:obfuscation_limit] to pg and mysql2 #224
Conversation
I'm working with my employer on the CLA and will leave this as a draft until I'm authorized.
@reid-rigo Thanks for the PR, and let us know once the CLA is signed. That said, I'm a little conflicted about this PR:
It may help to know that while span size is a concern in general, the main motivation behind the truncation limit is to avoid running a regular expression that might take a long time in a hot code path. Truncating it after the first match implies running that regular expression, and I think that defeats one of the purposes of the check to begin with. With all that in mind, can you elaborate on your goals with this PR? That might help us understand what the right course of action should be. Also worth mentioning: we might consider conditionally enabling or disabling this based on Ruby version. The latest release has greatly improved the speed of regular expressions and also introduced a "timeout" that could help keep this code path speedy. ref: https://bugs.ruby-lang.org/issues/19104 https://bugs.ruby-lang.org/issues/17837
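For context on the "timeout" idea mentioned above, here is a minimal sketch (not the instrumentation's actual implementation) of how Ruby 3.2's `Regexp.timeout` could bound a pathological match in a hot path. The 0.1s budget, the `obfuscate` helper name, and the fallback string are illustrative assumptions:

```ruby
# Guard obfuscation with Ruby 3.2's Regexp.timeout (see
# https://bugs.ruby-lang.org/issues/17837). On older Rubies the
# respond_to? check makes this a no-op.
Regexp.timeout = 0.1 if Regexp.respond_to?(:timeout=) # seconds; process-wide

def obfuscate(sql, pattern)
  sql.gsub(pattern, '?')
rescue Regexp::TimeoutError
  # Fall back to a placeholder rather than stalling the request path.
  'SQL obfuscation timed out'
end

puts obfuscate("SELECT * FROM users WHERE id = 42", /\d+/)
# => SELECT * FROM users WHERE id = ?
```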
@ahayworth, in my opinion the current limit is far too low. I really want to understand the performance of big queries, and the milliseconds spent obfuscating the SQL are likely to be dwarfed by the query itself. As a reference, New Relic's limit is 16384. The Ruby regex development is really interesting, and I saw it shortly after working on this. I haven't tested it yet, but increasing the default limit based on Ruby version could make a lot of sense. Edit: Also, I think we'll have the CLA signed soon.
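For readers following along, the option this PR adds would presumably be set like any other instrumentation option. A hedged sketch, assuming the `obfuscation_limit` key proposed here and using New Relic's 16384 limit mentioned above purely as an illustrative value:

```ruby
require 'opentelemetry/sdk'

OpenTelemetry::SDK.configure do |c|
  c.use 'OpenTelemetry::Instrumentation::PG', {
    db_statement: :obfuscate,
    obfuscation_limit: 16_384 # value shown for comparison only
  }
end
```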
cc: #32
Interesting, that's not what I would have expected. Ruby always has interesting surprises for me, even after many years with it. 😄 It's a good surprise, though!
I would support increasing it regardless of Ruby version, based on New Relic's defaults actually. I am curious whether @robertlaurin has any thoughts here; I recall he did some benchmarking on sanitization performance last year.
Anecdotes: Sanitizing SQL queries is very slow in our prod environments. I do not have any profiles to share at the moment, but I will do my best to share something with y'all. I am still interested in finding a good way to optimize or offload the SQL sanitization from the user's request path, but have not invested enough time in that recently. Technically we should not be doing any scrubbing in the SDK, but there isn't anything in the collector that does this for us, AFAIK. I think that is why #32 will be critical for our users, so we may optimize as much as we can in a single module/package.
@ahayworth, the CLA is signed. I'd be happy to separate out the truncation part of this to keep this PR purely about the obfuscation limit. @arielvalentin, would you be able to benchmark truncation with one of your big SQL queries? Here's what I used to test:

```ruby
require "benchmark"

COMPONENTS_REGEX_MAP = {
  single_quotes: /'(?:[^']|'')*?(?:\\'.*|'(?!'))/,
  dollar_quotes: /(\$(?!\d)[^$]*?\$).*?(?:\1|$)/,
  uuids: /\{?(?:[0-9a-fA-F]\-*){32}\}?/,
  numeric_literals: /-?\b(?:[0-9]+\.)?[0-9]+([eE][+-]?[0-9]+)?\b/,
  boolean_literals: /\b(?:true|false|null)\b/i,
  comments: /(?:#|--).*?(?=\r|\n|$)/i,
  multi_line_comments: /\/\*(?:[^\/]|\/[^*])*?(?:\*\/|\/\*.*)/
}.freeze

POSTGRES_COMPONENTS = %i[
  single_quotes
  dollar_quotes
  uuids
  numeric_literals
  boolean_literals
  comments
  multi_line_comments
].freeze

generated_postgres_regex = Regexp.union(POSTGRES_COMPONENTS.map { |component| COMPONENTS_REGEX_MAP[component] })

sql = <<SQL
...
SQL

puts "SQL length: #{sql.length}"

times = 1000
Benchmark.bm(16) do |x|
  x.report("full gsub") do
    times.times do
      sql.gsub(generated_postgres_regex, '?')
    end
  end

  x.report("first index") do
    times.times do
      sql[..sql.index(generated_postgres_regex) - 1]
    end
  end
end
```
Very sorry about this falling off my radar. I did not try the proposed changes against some of our more gnarly production queries, but I will schedule time to do this later in the week of 2023-04-03. @ahayworth, would you mind re-engaging here? I would like to see about resolving your concerns.
👋 This pull request has been marked as stale because it has been open with no activity. You can comment on the issue or remove the stale label to hold stale off for a while.
this should stay open i think |
I agree! I think we should review this again and see about getting it merged. |
instrumentation/mysql2/test/opentelemetry/instrumentation/mysql2/instrumentation_test.rb
Follow up from open-telemetry#224
* feat: Add Obfuscation Limit to Trilogy (follow-up from #224)
* fix: copy-paste error
Enable the following to allow users to control obfuscation performance via a configurable limit. Additionally, when SQL is over the limit, truncate it to the first regex match to make the output more useful for analysis:
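The truncate-to-first-match behavior described above can be sketched as follows. This is a hedged illustration, not the PR's actual code: the method name, the hardcoded limit (configurable via `config[:obfuscation_limit]` in the PR), and the trailing ellipsis marker are assumptions.

```ruby
# Illustrative limit; the PR exposes this as config[:obfuscation_limit].
OBFUSCATION_LIMIT = 32

def obfuscate_sql(sql, regex)
  # Under the limit: obfuscate the whole statement as usual.
  return sql.gsub(regex, '?') if sql.length <= OBFUSCATION_LIMIT

  # Over the limit: cut at the first literal match so the visible
  # prefix contains no sensitive values, then mark the truncation.
  first = sql.index(regex)
  truncated = first ? sql[0...first] : sql[0...OBFUSCATION_LIMIT]
  "#{truncated}..."
end
```

Cutting at the first match keeps the leading `SELECT ... FROM ... WHERE` shape of the query visible for analysis while guaranteeing the retained prefix holds no matched literals.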