Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove N+1 obj creation in flatten_arranged_rels #17325

Merged

Conversation

NickLaMuro
Copy link
Member

The recursive nature of the original code, this meant that a new array was created for every call of the method.

This new form now only creates 2 arrays, and speeds up the process to flatten the rels by 2x.

Benchmarks

With 2.7k records

BEFORE
======

Warming up --------------------------------------
flatten_arranged_rels
run 1                    2.239k i/100ms
run 2                    2.208k i/100ms

Calculating -------------------------------------
flatten_arranged_rels
run 1                    23.328k (±17.4%) i/s -    114.189k in   5.050293s
run 2                    23.520k (±17.6%) i/s -    114.816k in   5.043277s


AFTER
=====

Warming up --------------------------------------
flatten_arranged_rels
run 1                    4.787k i/100ms
run 2                    4.771k i/100ms
Calculating -------------------------------------
flatten_arranged_rels
run 1                    48.839k (± 7.8%) i/s -    244.137k in   5.033849s
run 2                    49.114k (± 9.3%) i/s -    248.092k in   5.101038s

With 8.3k records

BEFORE
======

Warming up --------------------------------------
flatten_arranged_rels
run 1                     370.000  i/100ms
run 2                     353.000  i/100ms
Calculating -------------------------------------
flatten_arranged_rels
run 1                     3.902k (±16.6%) i/s -     19.240k in   5.078666s
run 2                     3.589k (±17.7%) i/s -     17.650k in   5.089621s


AFTER
=====

Warming up --------------------------------------
flatten_arranged_rels
run 1                     859.000  i/100ms
run 2                     859.000  i/100ms
Calculating -------------------------------------
flatten_arranged_rels
run 1                     8.521k (±12.7%) i/s -     41.232k in   5.018254s
run 2                     8.525k (± 8.5%) i/s -     42.950k in   5.077203s

The recursive nature of the original code, this meant that a new array
was created for every call of the method.

This new form now only creates 2 arrays, and speeds up the process to
flatten the rels by 2x in most cases.
@NickLaMuro NickLaMuro force-pushed the flatten_arranged_rels_optimization branch from 1f91343 to fcfd86d Compare April 22, 2018 14:35
@miq-bot
Copy link
Member

miq-bot commented Apr 22, 2018

Checked commit NickLaMuro@fcfd86d with ruby 2.3.3, rubocop 0.52.1, haml-lint 0.20.0, and yamllint 1.10.0
1 file checked, 0 offenses detected
Everything looks fine. 🍰

@NickLaMuro
Copy link
Member Author

@miq-bot assign @kbrock
@miq-bot add_labels performance, enhancement

@kbrock kbrock merged commit 32fadeb into ManageIQ:master May 4, 2018
@Fryguy
Copy link
Member

Fryguy commented May 8, 2018

@NickLaMuro Doesn't this end up modifying the caller (i.e. modifying the relationships hash by manipulating the values)? I was trying to avoid that originally.

@NickLaMuro
Copy link
Member Author

@Fryguy No (at least I don't think), because I am pretty sure both .keys and .values create new Arrays that reference the keys and values respectively, if I am not mistaken.

@Fryguy
Copy link
Member

Fryguy commented May 8, 2018

Yes, the .values part does but the iteration here https://github.com/ManageIQ/manageiq/pull/17325/files#diff-e652e9eb772d8d9d7e873e5d891c0c9bR82 doesn't, I don't think.

@NickLaMuro
Copy link
Member Author

NickLaMuro commented May 8, 2018

@Fryguy okay, I see where it looks like it does that, but in reality, it doesn't pop it from the kids hash, but the Array that is created when .values is called. For example:

irb:001> hash = {:foo => [{:bar => []}, {:baz => [{:qux => []}]}]}
=> {:foo=>[{:bar=>[]}, {:baz=>[{:qux=>[]}]}]}
irb:002> remaining_children = hash.values
=> [[{:bar=>[]}, {:baz=>[{:qux=>[]}]}]]

You will notice that the array that is created is a "doubly nested" at the top level. This means that when .pop is called on it it will return:

[{:bar=>[]}, {:baz=>[{:qux=>[]}]}]

which is the value of the :foo key. When << is called on the kids of :bar, the remaining_children array will then become:

[[]]

So we are only mutating the .values array that we created within this method, and passing around a bunch of object references otherwise. Make sense?

@NickLaMuro
Copy link
Member Author

NickLaMuro commented May 8, 2018

I think another thing to note is that we are doing a .pop and iterating directly on the result (meaning the result is what we are iterating on), instead of doing a .pop inside of the iteration which is what is more commonly done (the "iteration" being the each in this case, not the surrounding while/until).

@Fryguy
Copy link
Member

Fryguy commented May 9, 2018

This is good 👍

I think your example Hash there threw me a bit, because it's not accurate. So, I instrumented and got the following

[8] pry(main)> Relationship.flatten_arranged_rels({:foo => {:bar => {}, :baz => {:qux => {}}}})
relationships: 70297681056560
relationships[:foo]: 70297681056600
relationships[:foo][:bar]: 70297681056680
relationships[:foo][:baz]: 70297681056620
relationships[:foo][:baz][:qux]: 70297681056660
remaining_children: [{:bar=>{}, :baz=>{:qux=>{}}}] [70297681056600]
rel: bar 4149468
kids: {} 70297681056680
rel: baz 14284188
kids: {:qux=>{}} 70297681056620
remaining_children: [{}, {:qux=>{}}] [70297681056680, 70297681056620]
rel: qux 14284508
kids: {} 70297681056660
remaining_children: [{}, {}] [70297681056680, 70297681056660]
remaining_children: [{}] [70297681056680]
=> [:foo, :bar, :baz, :qux]

Those numbers are object_ids, and you can see that while the "real" values are being used directly as I expected, they are ultimately only being shuffled around, and not modified directly. As you said, the .pop is only called on the outer array only, which is a copy anyway.

@NickLaMuro
Copy link
Member Author

NickLaMuro commented May 9, 2018

@Fryguy oh crap, you are right, the values are just nested hashes, not nested arrays of hashes... derp 😩

(would also explain why I was having issues when trying this in IRB...).

Cool, glad it made sense and I wasn't mutating anything I shouldn't have. I was conscious of making sure I didn't do that when I wrote it, but kinda forgot how I ended up pulling it off after not looking at it for a week... Pretty sure I had some existing specs in there that failed when I did it wrong, but thanks for making sure I wasn't crazy and doing some due diligence. Very much appreciated!

@jrafanie jrafanie added this to the Sprint 85 Ending May 7, 2018 milestone May 22, 2018
@simaishi
Copy link
Contributor

Added fine/yes and gaprindashvili/yes as per https://bugzilla.redhat.com/show_bug.cgi?id=1569626

simaishi pushed a commit that referenced this pull request Jun 21, 2018
…ization

Remove N+1 obj creation in flatten_arranged_rels
(cherry picked from commit 32fadeb)

https://bugzilla.redhat.com/show_bug.cgi?id=1593798
@simaishi
Copy link
Contributor

Fine backport details:

$ git log -1
commit a6dc75ee3d0eb686d8c3dab9c37b8a4a4865a6d9
Author: Keenan Brock <[email protected]>
Date:   Fri May 4 10:40:36 2018 -0400

    Merge pull request #17325 from NickLaMuro/flatten_arranged_rels_optimization
    
    Remove N+1 obj creation in flatten_arranged_rels
    (cherry picked from commit 32fadeb91c0e63e34efa4c9321ec52a6bdb176ae)
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1593798

simaishi pushed a commit that referenced this pull request Jun 21, 2018
…ization

Remove N+1 obj creation in flatten_arranged_rels
(cherry picked from commit 32fadeb)

https://bugzilla.redhat.com/show_bug.cgi?id=1593797
@simaishi
Copy link
Contributor

Gaprindashvili backport details:

$ git log -1
commit 245963eed00817208e34ae869f25f90ddf383216
Author: Keenan Brock <[email protected]>
Date:   Fri May 4 10:40:36 2018 -0400

    Merge pull request #17325 from NickLaMuro/flatten_arranged_rels_optimization
    
    Remove N+1 obj creation in flatten_arranged_rels
    (cherry picked from commit 32fadeb91c0e63e34efa4c9321ec52a6bdb176ae)
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1593797

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants