Performance issues #87

mcmire · 2020-06-03T20:11:06Z

I have recently noticed some performance issues with the gem when dealing with large diffs with deeply nested structures, which I've confirmed by temporarily switching off require "super_diff/rspec" at work. I don't have anything that points to what would be causing this, but for context the test looked like:

expect(some_record).to have_attributes(
  # ...
)

where some_record is an instance of a god model. I should look into this to see if there's any low-hanging fruit. It's also probably worth adding a benchmark as a test to ensure that this doesn't become a problem in the future once a fix is implemented.

The text was updated successfully, but these errors were encountered:

mcmire · 2020-09-17T22:43:01Z

Upon closer inspection, the culprit seems to be attribute_names in SuperDiff::RSpec::OperationTreeBuilders::ObjectHavingAttributes. When used in the context of an ActiveRecord object this method could be called thousands of times. Caching this helps a ton.

We should add a performance test to ensure that this isn't a problem going forward (a test that consists of several large ActiveRecord objects, ideally).

dbackeus · 2021-05-19T15:04:56Z

On this note I just tried this gem for the first time and we have specs using graphql-ruby looking like:

results = OurSchema.execute(...)
expect(results).to match hash_including(...)

For just a single one of these, super_diff spent about 20 seconds producing 10MB of wall of text output via the GraphQL::Query::Result class. Just adding .to_h to the results object solves the issue but its quite the gotcha.

Is there any option to simply disable super_diff for unknown types? Or could we configure it to automatically run .to_h when encountering GraphQL::Query::Result objects?

mcmire · 2021-05-19T16:51:41Z

Yeah I can see how that would be annoying. There is a way to configure super_diff to process objects in a custom way, but it depends on which matcher you're using to match that object and which objects are being compared to each other. However, this customization works best if you're using eq in your expectation to compare one thing to another. If anywhere in your expectation you're using a "partial match" matcher (match hash_including(...), match an_object_having_attributes(...), include(...), have_attributes(...), etc.) then that will take priority and there isn't a way to tell super_diff to do something different when it sees a different type of object.

Regardless, it seems like you ran into the slowdown you mentioned because you were trying to use hash_including against something that wasn't a hash (GraphQL::Query::Result), and because of this, the logic that super_diff was running to do the comparison was more granular than intended. What I imagine happened is that the gem fell back to displaying the full contents of the Result object when it generated the diff. It probably iterated through all of the instance variables and spat out their values, so because Result has a @query variable it might have tried to print out the full contents of this object (and I imagine that is very large). I think you could have written the expectation the following way to narrow the logic super_diff was performing so that it was only concerned with printing out the result of to_h instead of the rest of the Result object:

expect(results).to have_attributes(
  to_h: a_hash_including(
    # ...
  )
)

Or, using something that was closer to what you went with:

expect(results.to_h).to include(
  # ...
)

In the future, you might have to be more careful about which matchers you use, so as to prevent super_diff from doing more work than is needed to generate a diff. That said, I would be curious to know what output you got to confirm the assumption I'm making above and see how I could make super_diff better for you. I would not be surprised if the way super_diff inspects full objects is slow itself and needs some fine-tuning to make it more performant. So if there's a way you can go into more detail about what sort of output super_diff generated then that would help me!

dbackeus · 2021-05-19T19:43:43Z

Thanks for the reply!

Yeah, I already spent some time combing through our graphql specs adding .to_h to a few hundred schema executions. However I'm not sure that's a solution that scales across a large development department where most developers would expect rspec to "just work" without being intimately familiar with super_diff. Principle of least surprise and all that.

Possibly worth the tradeoff though. After all I have been thinking about having something like this for more than a decade 😅

mcmire · 2021-05-19T20:18:22Z

However I'm not sure that's a solution that scales across a large development department where most developers would expect rspec to "just work" without being intimately familiar with super_diff. Principle of least surprise and all that.

Ah... okay. I'm all for that too, I definitely don't want people rewriting their tests to accommodate this tool. So now I'm curious — if you turn off super_diff, and you run a GraphQL test that you know fails, what does RSpec spit out for the diff?

abachman · 2021-11-16T15:43:09Z

Hey, just hit this same issue in a fairly large GraphQL codebase within a federated graph spanning 10+ projects. Had a simple seeming test that I expected to fail and it completely locked rspec generating the many megabytes of colored stdout text. Pegged a CPU on a 2021 macbook pro and printed for more than a minute :P

Basically:

  let(:query) do 
    execute_query(query: "query { things { id } }")
    # returning: <GraphQL::Query::Result @to_h={"data" => {"things" => [{"id" => "1"}]}}>
  end

  let(:expectation) do
    {"data" => {"things" => [{"id" => "1"}, {"id" => "2"}]}}
  end

  it "matches expectation" do 
    expect(query).to eq(expectation)
  end

One really really simple dumb fix is to add an attributes_for_super_diff method to the query result in a spec support file:

class GraphQL::Query::Result
  def attributes_for_super_diff
    to_h
  end
end

But that doesn't give the nice diff'd output:

1) GraphQL matches expectation
     Failure/Error: expect(query).to eq(expectation)

                       
       Expected #<GraphQL::Query::Result "data" => { "things" => [ { "id" => "1" }] }>
               #^ this is all yellow
          to eq { "data" => { "things" => [{ "id" => "1"}, { "id" => "2" }] } }
               #^ this is all purple
     # ./spec/graphql/queries/example_spec.rb:12:in `block (2 levels) in <top (required)>'

I've tried reverse engineering the custom differ + object tree builder + object tree + object tree flattener code in the library and ended up with the same output but way more code.

I feel like all I really want to say is "replace this object with its to_h method before doing super diffing". Is that possible?

abachman · 2021-11-16T16:15:01Z

Ahhhhh, answering my own question after writing ~100 lines of code and then deleting ~80 of 'em.

This seems to be the minimal custom SuperDiff code for treating a GraphQL::Query::Result like a plain-old-ruby-hash.

My spec/support/super_diff.rb file looks like this:

class GraphQL::Query::Result
  def attributes_for_super_diff
    to_h
  end
end

module GraphQL
  class Differ < ::SuperDiff::Differs::Base
    def self.applies_to?(expected, actual)
      expected.is_a?(::Hash) &&
        actual.is_a?(::GraphQL::Query::Result)
    end

    def operation_tree_builder_class
      GraphQL::OperationTreeBuilder
    end
  end

  class OperationTreeBuilder < SuperDiff::OperationTreeBuilders::Hash
    def self.applies_to?(value)
      value.is_a?(::GraphQL::Query::Result)
    end

    def initialize(actual:, **rest)
      @actual = actual.to_h
      super
    end
  end
end

SuperDiff.configure do |config|
  config.add_extra_differ_class(GraphQL::Differ)
end

Which produces this output:

1) GraphQL matches expectation
     Failure/Error: expect(result).to eq(expectation)

       Expected #<GraphQL::Query::Result "data" => { "things" => [ { "id" => "1" }] }>
          to eq { "data" => { "things" => [{ "id" => "1"}, { "id" => "2" }] } }

       Diff:

       ┌ (Key) ──────────────────────────┐
       │ ‹-› in expected, not in actual  │
       │ ‹+› in actual, not in expected  │
       │ ‹ › in both expected and actual │
       └─────────────────────────────────┘

         {
           "data" => {
             "things" => [
               {
                 "id" => "1"
               }
       -       {
       -         "id" => "2"
       -       }
             ]
           }
         }
     # ./spec/graphql/queries/example_spec.rb:12:in `block (2 levels) in <top (required)>'

Which is exactly what I wanted!

mcmire · 2021-11-16T16:53:33Z

Awesome, I'm happy you were able to figure it out @abachman!

mcmire added the 🐛 Bug label Feb 11, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance issues #87

Performance issues #87

mcmire commented Jun 3, 2020 •

edited

Loading

mcmire commented Sep 17, 2020 •

edited

Loading

dbackeus commented May 19, 2021 •

edited

Loading

mcmire commented May 19, 2021 •

edited

Loading

dbackeus commented May 19, 2021 •

edited

Loading

mcmire commented May 19, 2021

abachman commented Nov 16, 2021 •

edited

Loading

abachman commented Nov 16, 2021

mcmire commented Nov 16, 2021

Performance issues #87

Performance issues #87

Comments

mcmire commented Jun 3, 2020 • edited Loading

mcmire commented Sep 17, 2020 • edited Loading

dbackeus commented May 19, 2021 • edited Loading

mcmire commented May 19, 2021 • edited Loading

dbackeus commented May 19, 2021 • edited Loading

mcmire commented May 19, 2021

abachman commented Nov 16, 2021 • edited Loading

abachman commented Nov 16, 2021

mcmire commented Nov 16, 2021

mcmire commented Jun 3, 2020 •

edited

Loading

mcmire commented Sep 17, 2020 •

edited

Loading

dbackeus commented May 19, 2021 •

edited

Loading

mcmire commented May 19, 2021 •

edited

Loading

dbackeus commented May 19, 2021 •

edited

Loading

abachman commented Nov 16, 2021 •

edited

Loading