Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #36715 - Speed up host fact retrieval when you only need a subset of facts #9819

Merged
merged 1 commit into from
Sep 27, 2023

Conversation

jeremylenz
Copy link
Contributor

@jeremylenz jeremylenz commented Sep 1, 2023

A user with about 3k hosts has a report that runs host.facts on each host multiple times. This was causing the report generation time to be over an hour, since host.facts runs a few includes which result in quite a few SQL queries:

fact_values.includes(:fact_name).collect do |fact|
        hash[fact.fact_name.name] = fact.value
      end

This PR adds a new argument to host.facts so you can pass specific fact names. It also changes collect to each in the above code because we don't use the resulting arrays, we just iterate and build the final hash.

The report template can look something like this:

<%- load_hosts(search: input('Hosts filter')).each_record do |host| -%>
<%- host_facts = host.facts(['net::interface::eth0::ipv4_address', 'network::hostname']) -%>
<%-   report_row(
        'Hostname': host_facts['network::hostname'],
        'IP': host_facts['net::interface::eth0::ipv4_address']
    ) -%>
<%- end -%>

<%= report_render -%>

@theforeman-bot
Copy link
Member

Issues: #36715

example '@host.facts_with_names([\'dmi::memory::size\', \'net::interface::eth0\']) # => { "dmi::memory::size"=>"16 GB", "net::interface::eth0"=>"00:50:56:8a:5c:3c" }', desc: 'Getting specific facts'
end
def facts_with_names(names_query = [], *args)
filtered_names = Set.new([names_query, *args].flatten.compact)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some reason the report renderer splats the first argument, turning

host.facts_with_names(['fact1', 'fact2'])

into

host.facts_with_names('fact1', 'fact2')

resulting in an ArgumentError if I don't handle it here.

@jeremylenz jeremylenz marked this pull request as ready for review September 8, 2023 18:53
@parthaa
Copy link
Contributor

parthaa commented Sep 14, 2023

I 'd recommend this approach instead of facts_with_names. In my benchmark tests it was way faster.

diff --git a/app/models/host/base.rb b/app/models/host/base.rb
index 36fbe72..4f1d980 100644
--- a/app/models/host/base.rb
+++ b/app/models/host/base.rb
@@ -221,13 +221,16 @@ module Host
       example '@host.facts["uptime"] # => "30 days"', desc: 'Getting specific fact value, +uptime+ in this case'
       aliases :facts
     end
-    def facts_hash
+    def facts_hash(fact_names = [])
       hash = {}
-      fact_values.includes(:fact_name).collect do |fact|
+      query = fact_values.includes(:fact_name)
+      query = query.where(fact_names: {name: fact_names}) if fact_names.present?
+      query.collect do |fact|
         hash[fact.fact_name.name] = fact.value
       end
       hash
     end

For example with my change

fact_names = [
    'ansible_local::gls_ansible_ctd_posture::_ctd',
    'ansible_local::gls_ansible_timestemp::_timestemp',
    'ansible_local::gls_ansible_model::_model',
    'ansible_local::gls_ansible_mac::_mac_address',
    'ansible_local::gls_ansible_ipaddress::_gls_ansible_ipaddress',
    'ansible_local::gls_ansible_owner::_owner',
    'ansible_local::gls_ansible_lrt::_lab_lrt',
    'ansible_local::gls_ansible_hostname::_gls_ansible_hostname'
  ]

hashes = []
Benchmark.measure do 
  hashes = Host.all.map {|host| host.facts(fact_names)  }
end
=> #<Benchmark::Tms:0x000055fc793fe1b8 @label="", @real=4.964442664990202, @cstime=0.0, @cutime=0.0, @stime=0.068133, @utime=3.543512999999999, @total=3.611645999999999>

About 3.6 seconds for 2851 Hosts.

vs

hashes1 = []
Benchmark.measure do 
  hashes1 = Host.limit(500).map {|host| host.facts_with_names(fact_names) }
end
=> #<Benchmark::Tms:0x000055fcf3c900b8 @label="", @real=39.416546951048076, @cstime=0.0, @cutime=0.0, @stime=0.7837529999999999, @utime=30.203799, @total=30.987552>

30 seconds for 500 host records

@jeremylenz jeremylenz force-pushed the 36715-speed-up-host-facts branch 2 times, most recently from dae49ec to 6ed24f7 Compare September 15, 2023 16:59
@jeremylenz
Copy link
Contributor Author

@parthaa updated 👍

@jeremylenz jeremylenz changed the title Fixes #36715 - Add methods to speed up host fact retrieval Fixes #36715 - Speed up host fact retrieval when you only need a subset of facts Sep 15, 2023
Copy link
Contributor

@parthaa parthaa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to work well for me. APJ

@jeremylenz
Copy link
Contributor Author

jeremylenz commented Sep 15, 2023

In our tests on a system with ~2900 hosts, this new approach has improved report generation time from over 1 hour to about 11-13 seconds.

image

app/models/host/base.rb Outdated Show resolved Hide resolved
@jeremylenz jeremylenz force-pushed the 36715-speed-up-host-facts branch from 6ed24f7 to 525fd78 Compare September 15, 2023 19:27
app/models/host/base.rb Outdated Show resolved Hide resolved
app/models/host/base.rb Outdated Show resolved Hide resolved
app/models/host/base.rb Outdated Show resolved Hide resolved
@@ -214,17 +214,20 @@ def set_comment(parser)
desc 'Note that available facts depend on what facts have been uploaded to Foreman,
typical sources are Puppet facter, subscription manager etc.
The facts can be out of date, this macro only provides access to the value stored in the database.'
optional :fact_names, Array, desc: 'A list of fact names to return. If empty all facts are returned'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ofedoren is this also correct documentation for *args parameters, or is there a specific tag for that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agh, my bad for not noticing this earlier :/

Well, it's not as wrong. It's still quite optional :) But for *args params, I'd use one of list/splat/rest instead of optional.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you submit a follow up PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure: #9850

@jeremylenz jeremylenz force-pushed the 36715-speed-up-host-facts branch from f7e012c to 51eebec Compare September 18, 2023 16:38
ekohl
ekohl previously approved these changes Sep 18, 2023
Copy link
Member

@ekohl ekohl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be interested to see some benchmarks of the new code in the latest iteration.

@jeremylenz
Copy link
Contributor Author

I ran it again and report generation time (on our system with ~2900 hosts) is down to 9 seconds, not bad!
image

@jeremylenz jeremylenz force-pushed the 36715-speed-up-host-facts branch from 51eebec to 73fb064 Compare September 18, 2023 17:07
@jeremylenz
Copy link
Contributor Author

Squashed

@ekohl
Copy link
Member

ekohl commented Sep 18, 2023

[test katello]

Copy link
Member

@ekohl ekohl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the improvement (looks like join & pluck shaved off another few seconds), but I wonder if there is test coverage for this new functionality. Can we also assert that it only performs a single DB query?

app/models/host/base.rb Show resolved Hide resolved
@jeremylenz
Copy link
Contributor Author

test coverage for this new functionality. Can we also assert that it only performs a single DB query?

Did a bit of research and it looks like there is a "bullet" gem that helps with this, but I didn't want to add any new gems. So I added a helper method in test_helper; see what you think.

Turns out .pluck was adding an additional SQL query being on a separate line, so I fixed that as well.

I also improved facts to cache the results for a single host. This is because a lot of reports call host.facts[xxx] repeatedly, and not all users may immediately switch to the new facts(['fact_name1', 'fact_name2']) syntax:

'CTD': host.facts['ansible_local::gls_ansible_ctd_posture::_ctd'],
 'TimeStemp': host.facts['ansible_local::gls_ansible_timestemp::_timestemp'],
 'Model': host.facts['ansible_local::gls_ansible_model::_model'],

Added tests to cover all the new functionality.

def assert_sql_queries(num_of_queries)
queries = []
ActiveSupport::Notifications.subscribe("sql.active_record") do |name, start, finish, id, payload|
queries << payload[:sql] if payload[:sql] =~ /SELECT/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this also catch UPDATE and DROP and INSERT statements?

(but probably out of scope right now)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be argued that those are SQL "statements" and not "queries" :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given SELECT can change data (hello triggers!), I see no difference. But as said, this is probably more than was asked in the original comment by Ewoud :)

@evgeni
Copy link
Member

evgeni commented Sep 20, 2023

I had promised you a review when I am back, but it seems Ewoud did beat me, so we're good :)

@@ -119,6 +119,16 @@ def after_teardown
Foreman::Plugin.send(:clear, @plugins_backup, @registries_backup)
@clear_plugins = nil
end

def assert_sql_queries(num_of_queries)
Copy link
Contributor

@parthaa parthaa Sep 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def assert_sql_queries(num_of_queries)
def assert_sql_queries(num_of_queries, match = /SELECT/)

followed by

 queries << payload[:sql] if payload[:sql] =~ match

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. Updated 👍

@jeremylenz jeremylenz force-pushed the 36715-speed-up-host-facts branch from eeba8cf to cc6ece6 Compare September 20, 2023 15:17
@jeremylenz
Copy link
Contributor Author

Seems this will need another approving review since I added those tests.

@jeremylenz
Copy link
Contributor Author

@evgeni @ekohl any more comments?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants