Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance issues #87

Open
mcmire opened this issue Jun 3, 2020 · 8 comments
Open

Performance issues #87

mcmire opened this issue Jun 3, 2020 · 8 comments
Labels

Comments

@mcmire
Copy link
Collaborator

mcmire commented Jun 3, 2020

I have recently noticed some performance issues with the gem when dealing with large diffs with deeply nested structures, which I've confirmed by temporarily switching off require "super_diff/rspec" at work. I don't have anything that points to what would be causing this, but for context the test looked like:

expect(some_record).to have_attributes(
  # ...
)

where some_record is an instance of a god model. I should look into this to see if there's any low-hanging fruit. It's also probably worth adding a benchmark as a test to ensure that this doesn't become a problem in the future once a fix is implemented.

@mcmire
Copy link
Collaborator Author

mcmire commented Sep 17, 2020

Upon closer inspection, the culprit seems to be attribute_names in SuperDiff::RSpec::OperationTreeBuilders::ObjectHavingAttributes. When used in the context of an ActiveRecord object this method could be called thousands of times. Caching this helps a ton.

We should add a performance test to ensure that this isn't a problem going forward (a test that consists of several large ActiveRecord objects, ideally).

@dbackeus
Copy link

dbackeus commented May 19, 2021

On this note I just tried this gem for the first time and we have specs using graphql-ruby looking like:

results = OurSchema.execute(...)
expect(results).to match hash_including(...)

For just a single one of these, super_diff spent about 20 seconds producing 10MB of wall of text output via the GraphQL::Query::Result class. Just adding .to_h to the results object solves the issue but its quite the gotcha.

Is there any option to simply disable super_diff for unknown types? Or could we configure it to automatically run .to_h when encountering GraphQL::Query::Result objects?

@mcmire
Copy link
Collaborator Author

mcmire commented May 19, 2021

Yeah I can see how that would be annoying. There is a way to configure super_diff to process objects in a custom way, but it depends on which matcher you're using to match that object and which objects are being compared to each other. However, this customization works best if you're using eq in your expectation to compare one thing to another. If anywhere in your expectation you're using a "partial match" matcher (match hash_including(...), match an_object_having_attributes(...), include(...), have_attributes(...), etc.) then that will take priority and there isn't a way to tell super_diff to do something different when it sees a different type of object.

Regardless, it seems like you ran into the slowdown you mentioned because you were trying to use hash_including against something that wasn't a hash (GraphQL::Query::Result), and because of this, the logic that super_diff was running to do the comparison was more granular than intended. What I imagine happened is that the gem fell back to displaying the full contents of the Result object when it generated the diff. It probably iterated through all of the instance variables and spat out their values, so because Result has a @query variable it might have tried to print out the full contents of this object (and I imagine that is very large). I think you could have written the expectation the following way to narrow the logic super_diff was performing so that it was only concerned with printing out the result of to_h instead of the rest of the Result object:

expect(results).to have_attributes(
  to_h: a_hash_including(
    # ...
  )
)

Or, using something that was closer to what you went with:

expect(results.to_h).to include(
  # ...
)

In the future, you might have to be more careful about which matchers you use, so as to prevent super_diff from doing more work than is needed to generate a diff. That said, I would be curious to know what output you got to confirm the assumption I'm making above and see how I could make super_diff better for you. I would not be surprised if the way super_diff inspects full objects is slow itself and needs some fine-tuning to make it more performant. So if there's a way you can go into more detail about what sort of output super_diff generated then that would help me!

@dbackeus
Copy link

dbackeus commented May 19, 2021

Thanks for the reply!

Yeah, I already spent some time combing through our graphql specs adding .to_h to a few hundred schema executions. However I'm not sure that's a solution that scales across a large development department where most developers would expect rspec to "just work" without being intimately familiar with super_diff. Principle of least surprise and all that.

Possibly worth the tradeoff though. After all I have been thinking about having something like this for more than a decade 😅

@mcmire
Copy link
Collaborator Author

mcmire commented May 19, 2021

However I'm not sure that's a solution that scales across a large development department where most developers would expect rspec to "just work" without being intimately familiar with super_diff. Principle of least surprise and all that.

Ah... okay. I'm all for that too, I definitely don't want people rewriting their tests to accommodate this tool. So now I'm curious — if you turn off super_diff, and you run a GraphQL test that you know fails, what does RSpec spit out for the diff?

@abachman
Copy link

abachman commented Nov 16, 2021

Hey, just hit this same issue in a fairly large GraphQL codebase within a federated graph spanning 10+ projects. Had a simple seeming test that I expected to fail and it completely locked rspec generating the many megabytes of colored stdout text. Pegged a CPU on a 2021 macbook pro and printed for more than a minute :P

Basically:

  let(:query) do 
    execute_query(query: "query { things { id } }")
    # returning: <GraphQL::Query::Result @to_h={"data" => {"things" => [{"id" => "1"}]}}>
  end

  let(:expectation) do
    {"data" => {"things" => [{"id" => "1"}, {"id" => "2"}]}}
  end

  it "matches expectation" do 
    expect(query).to eq(expectation)
  end

One really really simple dumb fix is to add an attributes_for_super_diff method to the query result in a spec support file:

class GraphQL::Query::Result
  def attributes_for_super_diff
    to_h
  end
end

But that doesn't give the nice diff'd output:

1) GraphQL matches expectation
     Failure/Error: expect(query).to eq(expectation)

                       
       Expected #<GraphQL::Query::Result "data" => { "things" => [ { "id" => "1" }] }>
               #^ this is all yellow
          to eq { "data" => { "things" => [{ "id" => "1"}, { "id" => "2" }] } }
               #^ this is all purple
     # ./spec/graphql/queries/example_spec.rb:12:in `block (2 levels) in <top (required)>'

I've tried reverse engineering the custom differ + object tree builder + object tree + object tree flattener code in the library and ended up with the same output but way more code.

I feel like all I really want to say is "replace this object with its to_h method before doing super diffing". Is that possible?

@abachman
Copy link

Ahhhhh, answering my own question after writing ~100 lines of code and then deleting ~80 of 'em.

This seems to be the minimal custom SuperDiff code for treating a GraphQL::Query::Result like a plain-old-ruby-hash.

My spec/support/super_diff.rb file looks like this:

class GraphQL::Query::Result
  def attributes_for_super_diff
    to_h
  end
end

module GraphQL
  class Differ < ::SuperDiff::Differs::Base
    def self.applies_to?(expected, actual)
      expected.is_a?(::Hash) &&
        actual.is_a?(::GraphQL::Query::Result)
    end

    def operation_tree_builder_class
      GraphQL::OperationTreeBuilder
    end
  end

  class OperationTreeBuilder < SuperDiff::OperationTreeBuilders::Hash
    def self.applies_to?(value)
      value.is_a?(::GraphQL::Query::Result)
    end

    def initialize(actual:, **rest)
      @actual = actual.to_h
      super
    end
  end
end

SuperDiff.configure do |config|
  config.add_extra_differ_class(GraphQL::Differ)
end

Which produces this output:

1) GraphQL matches expectation
     Failure/Error: expect(result).to eq(expectation)

       Expected #<GraphQL::Query::Result "data" => { "things" => [ { "id" => "1" }] }>
          to eq { "data" => { "things" => [{ "id" => "1"}, { "id" => "2" }] } }

       Diff:

       ┌ (Key) ──────────────────────────┐
       │ ‹-› in expected, not in actual  │
       │ ‹+› in actual, not in expected  │
       │ ‹ › in both expected and actual │
       └─────────────────────────────────┘

         {
           "data" => {
             "things" => [
               {
                 "id" => "1"
               }
       -       {
       -         "id" => "2"
       -       }
             ]
           }
         }
     # ./spec/graphql/queries/example_spec.rb:12:in `block (2 levels) in <top (required)>'

Which is exactly what I wanted!

@mcmire
Copy link
Collaborator Author

mcmire commented Nov 16, 2021

Awesome, I'm happy you were able to figure it out @abachman!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants