Stats endpoint #1384

Merged
merged 1 commit into from
Sep 22, 2017

@jaym jaym commented Aug 24, 2017

No description provided.

@jaym jaym requested a review from a team August 24, 2017 15:23
@jaym jaym mentioned this pull request Aug 24, 2017

monitored_pools() ->
% TODO(jaym) 08-23-17: Move this out to configuration
[sqerl, oc_chef_authz_http, chef_index_http, chef_depsolver].


I'm not 100% sure, but I suspect that keygen, chef_objects and data_collector may also have pools to monitor

Contributor Author

Tried those, didn't work :(

Contributor Author

You may have been thinking of this:

(oc_erchef@api)1> chef_keygen_cache:status().
[{keys,10},
 {max,10},
 {max_workers,2},
 {cur_max_workers,2},
 {inflight,[]},
 {avail_workers,2},
 {start_size,0}]

@@ -56,6 +56,12 @@
{error_logger_hwm, <%= @log_rotation['max_messages_per_second'] %>}
]},

{prometheus, [{collectors, [default,
<% if node['private_chef']['postgresql']['enable'] && !node['private_chef']['postgresql']['external'] -%>
Contributor

Is there anything about the stats we collect that would make it hard to collect for external DBs?

Contributor Author

Not hard, but I was thinking more about who owns the service. The idea was, if the service is not on the box, it's not our problem to monitor. I could be convinced otherwise; however, there might be weirdness if, say, multiple frontends were to report a service that they didn't own.

Contributor

@srenatus srenatus left a comment

Read too late that this was WIP -- So, for what it's worth, some in-progress-nitpicks...

-module(chef_wm_stats).

-ifdef(TEST).
-compile(export_all).
Contributor

[nit] This doesn't export more than -export() already exports, does it?


It should export everything in the file (for any tests to use them).

{"text/plain", to_text}
], Req, State}.

to_json(Req, State) ->
Contributor

Is this meant for later inclusion? I wonder if this should provide application/json if it doesn't actually provide it.

Contributor Author

Yep, I'll be working on this today. Prometheus is a secondary format and json will be default

-define(A2B(X), erlang:atom_to_binary(X, utf8)).

init(_Any) ->
{ok, <<"{}">>}.
Contributor

What's the use of passing <<"{}">> around as State? (Am I missing something?)

Contributor Author

Not sure, I just copy pasted from elsewhere. Will take a closer look here


Cleaned up.

to_text/2]).

-include_lib("webmachine/include/webmachine.hrl").
-define(A2B(X), erlang:atom_to_binary(X, utf8)).
Contributor

[nit] Is this used?

Callback = fun (MF) ->
Data = get(?PROCESS_DICT_STORAGE),
put(?PROCESS_DICT_STORAGE, [mf_to_erl(MF) | Data])
end,
Contributor

It is a bit of a bummer that the API here doesn't let us pass around an accumulator instead.

Contributor Author

@jaym jaym Aug 30, 2017

yea :(
Would have been a lot cleaner

" SUM(heap_blks_read) as heap_blks_read, SUM(heap_blks_hit) as heap_blks_hit,"
" SUM(idx_blks_read) as idx_blks_read, SUM(idx_blks_hit) as idx_blks_hit, SUM(toast_blks_read) as toast_blks_read,"
" SUM(toast_blks_hit) as toast_blks_hit, SUM(tidx_blks_read) as tidx_blks_read,"
" SUM(tidx_blks_hit) as tidx_blks_hit FROM pg_stat_all_tables, pg_statio_all_tables">>}.

I think this is inaccurate. The "," does a Cartesian join and inflates the number of final rows. I'm still trying to figure out whether there was a join we wanted here or whether we just wanted a composite query, and how to do that (because reading PostgreSQL to me is like a Spaniard trying to read Brazilian Portuguese... so close and yet so far).

Contributor Author

There's no join, just a composite of two queries. Perhaps I was trying to be too clever.
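
To make the row-inflation concern in this thread concrete, here is a small Ruby sketch. The two arrays are made-up stand-ins for the per-table rows of pg_stat_all_tables and pg_statio_all_tables (three rows each, invented numbers). It only illustrates that summing over `FROM a, b` multiplies each sum by the row count of the other table; it is not the query used in the PR.

```ruby
# Hypothetical per-table rows; the numbers are invented for illustration.
stat_rows   = [{ seq_scan: 10 }, { seq_scan: 20 }, { seq_scan: 30 }]
statio_rows = [{ heap_blks_read: 1 }, { heap_blks_read: 2 }, { heap_blks_read: 3 }]

# "FROM a, b" with no join condition pairs every row of a with every row of b.
cross = stat_rows.product(statio_rows)

# Summing over the cross product counts each row once per row of the other table.
inflated_seq_scan = cross.sum { |stat, _io| stat[:seq_scan] }
inflated_heap     = cross.sum { |_stat, io| io[:heap_blks_read] }

# Summing each table on its own (two independent aggregates) gives the real totals.
true_seq_scan = stat_rows.sum { |row| row[:seq_scan] }
true_heap     = statio_rows.sum { |row| row[:heap_blks_read] }

puts "seq_scan:       cross=#{inflated_seq_scan} separate=#{true_seq_scan}"
puts "heap_blks_read: cross=#{inflated_heap} separate=#{true_heap}"
```

One way to avoid the inflation would be to aggregate each view in its own subquery and select from the two single-row results.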

Contributor

@stevendanna stevendanna left a comment

I've left some comments regarding the password generation here. Other than that it looks good to me.

I'm a bit nervous about the use of the process dictionary, but I don't have a lot of reasoning to back up that nervousness so we can probably roll with it.

command "openssl passwd -apr1 '#{stats_api_passwd}' >> #{stats_passwd_file}"
action :nothing
sensitive true
end
Contributor

We need to store this password back into the secrets store somewhere. Perhaps rather than doing this as an execute we can move this into a library and then generate the password when we generate the other passwords in the initial configuration parsing.

@ksubrama ksubrama Sep 8, 2017

Wait I get this password from the secret store (unless I don't understand how this works). I do chef-server-ctl add-secret opscode_erchef stats_password "12345". This derived password is just a salted hash of that password.


I refactored this so that it autogenerates a password if not provided and also just generates the file in one go (instead of 2 steps). This way the nginx server can get restarted properly only when the password file actually changes.

group OmnibusHelper.new(node).ownership['group']
notifies :run, 'execute[stats_api_append_password]', :immediately
end
end
Contributor

As of the veil work, we try not to put any passwords outside of the veil store when the user has insecure_addon_compat false. We've never really talked about how strict we want to be about that moving forward since that work was done to address some particular customer concerns.

Unfortunately, from the docs it looks like it might be hard or impossible to use the basic_auth module without sticking the credential in that file. I suppose we could change over to using access_by_lua, but that seems a bit complex for this.


The password is salted and hashed from my understanding.

cat /var/opt/opscode/nginx/stats_htpasswd
foo:$apr1$S.a9NOEq$YiWDrTTJb68HjyFjm9CmH/

So it might be ok?
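
As a rough way to check the "salted and hashed" reading of that file, the sketch below splits an htpasswd line into user, scheme, salt, and digest. The helper name `parse_htpasswd_line` is hypothetical, and it assumes the `user:$scheme$salt$digest` layout seen in the sample line above; a plain-text entry would not match and returns nil.

```ruby
# Split an htpasswd line of the form "user:$scheme$salt$digest" into its parts.
# Returns nil when the hash part does not follow that layout (e.g. plain text).
def parse_htpasswd_line(line)
  user, hash = line.strip.split(":", 2)
  return nil if user.nil? || user.empty? || hash.nil?

  match = hash.match(/\A\$([^$]+)\$([^$]+)\$([^$]+)\z/)
  return nil unless match

  { user: user, scheme: match[1], salt: match[2], digest: match[3] }
end

# The sample line quoted in this thread.
entry = parse_htpasswd_line('foo:$apr1$S.a9NOEq$YiWDrTTJb68HjyFjm9CmH/')
puts entry.inspect
```

The `apr1` scheme is the Apache MD5-based variant, which is why the follow-up concern about MD5 still applies even though the password is not stored in plain text.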


http://nginx.org/en/docs/http/ngx_http_auth_basic_module.html does seem to mention that the actual algorithms used might be... weak? I get suspicious when I see MD5 in there. But there seems to be no other nice way to do this.

Contributor

@ksubrama Ah, yeah, somehow I overlooked the fact that this is in fact NOT a "plain text password" in that file. I think that means we are OK.


-spec format(Registry :: prometheus_registry:registry()) -> binary().
format(Registry) ->
put(?PROCESS_DICT_STORAGE, []),
Contributor

My only concern with using the process dictionary here is out of ignorance. I don't know if putting what could be a lot of data in the process dictionary will cause us any problems down the road. The prometheus code uses the ram_file module for all of its formatters; however, I have no idea if that is better or worse. Perhaps for now this is just something we live with but keep an eye on.

Contributor Author

@jaym jaym Sep 8, 2017

Yea, I wanted to use ram file, but I couldn't figure out how to do it without rolling my own json serializer. That being said, there shouldn't be a lot of data. The other option I could think of was creating a gen server to store the state.

If it makes you feel better, webmachine already makes fairly heavy use of the process dictionary: https://github.com/webmachine/webmachine/blob/82472528d8a735d28fa92a7c6e03fc97cb4d1913/src/webmachine_decision_core.erl#L28

Contributor

I'm +1 on leaving this as is for now. Thanks for following up.

@@ -647,6 +648,14 @@ def gen_redundant(node_name, topology)
end
end

def ensure_stats_password
Contributor

The RFC for this feature says:

A username and password will be generated if it is not provided.

So, I think we should probably be treating this like the other non-optional passwords where we generate the username and password if they are not provided.


Ah I missed that. Ok - I need to figure out how to save a generated password.


👍
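
The "generate it if it is not provided, then save it" behaviour discussed in this thread could look roughly like the sketch below. The method name `ensure_secret`, the plain Hash standing in for the secrets store, and the default length are all assumptions for illustration; this is not the body of `ensure_stats_password` from the PR.

```ruby
require "securerandom"

# Return the stored secret, generating and saving one first when it is missing.
# `store` is any Hash-like object; here it stands in for the real secrets store.
def ensure_secret(store, key, length: 50)
  existing = store[key]
  return existing unless existing.nil? || existing.empty?

  store[key] = SecureRandom.hex(length) # hex(n) yields 2 * n characters
end

secrets = {}
generated = ensure_secret(secrets, "stats_password")
again = ensure_secret(secrets, "stats_password")
puts generated == again
```

Because the generated value is written back to the store, a second call returns the same password, so a derived file such as the htpasswd entry would only change when the secret itself changes.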

{<<"seq_tup_read">>, {pg_stat_seq_tup_read, counter, "Number of live rows fetched by sequential scans", fun erlang:binary_to_integer/1}},
{<<"idx_scan">>, {pg_stat_idx_scan, counter, "Number of index scans initiated", fun erlang:binary_to_integer/1}},
{<<"idx_tup_fetch">>, {pg_stat_tup_fetch, counter, "Number of live rows fetched by index scans", fun erlang:binary_to_integer/1}},
{<<"n_tup_ins">>, {pg_stat_n_tup_ins, counter, "Number of rows inserted", fun erlang:binary_to_integer/1}},
Contributor

We should make it clear in these descriptions that these are sums across all tables perhaps.


👍

Contributor Author

jaym commented Sep 11, 2017

@ksubrama you mentioned you had some pedant tests. If so, could you push them up.

@ksubrama ksubrama force-pushed the jdm/SUSTAIN-673 branch 2 times, most recently from 9c33b3f to 9675732 Compare September 12, 2017 21:25
stat["metrics"].each do |metric|
expect(metric).to have_key("value")
if type == "GAUGE" || type == "COUNTER"
expect(Integer(metric["value"])).to be_a(Integer)
Contributor Author

I think gauges can be floats as well
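
A check that allows float gauges while keeping counters as integers might be sketched like this. The helper names are hypothetical, and it assumes values arrive as strings, as in the JSON output of this endpoint; it is plain Ruby rather than the pedant expectation itself.

```ruby
# True when raw is a base-10 integer string such as "42".
def integer_string?(raw)
  Integer(raw.to_s, 10)
  true
rescue ArgumentError
  false
end

# True when raw parses as a float; integer strings such as "42" also pass.
def float_string?(raw)
  Float(raw.to_s)
  true
rescue ArgumentError
  false
end

# Counters must be integers; gauges may be integers or floats.
def valid_metric_value?(type, raw)
  case type
  when "COUNTER" then integer_string?(raw)
  when "GAUGE"   then float_string?(raw)
  else true # UNTYPED and anything else is not checked in this sketch
  end
end

puts valid_metric_value?("GAUGE", "0.75")
puts valid_metric_value?("COUNTER", "0.75")
```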

response_type_map.each do |name, type|
stat = response.find { |s| s["name"] == name }
expect(stat["metrics"]).not_to be_empty
stat["metrics"].each do |metric|
Contributor Author

Instead of iterating inside the it, is it possible to create an it for each entry, so that if there is a failure you know what is broken?


% I think there is a bug in webmachine where it won't allow us to use
% 'text/plain; version=0.0.4'.
% TODO: Understand https://github.com/basho/webmachine/blob/develop/src/webmachine_util.erl#L140-L158
{{"text/plain",[{"version","0.0.4"}]}, to_text},
Contributor Author

So to follow the RFC, we're supposed to accept &format=.
Seeing as we don't do that, we should probably remove text/plain for now. Thoughts?

Contributor

I'd rather us just quickly implement a format parameter, but I understand that there is probably some desire to get this PR wrapped up.


Done :)

@@ -17,6 +17,15 @@ def ownership
{"owner" => owner, "group" => group}
end

def apr1_password(password)
cmd = Mixlib::ShellOut.new("openssl passwd -apr1 '#{password}'")
Contributor

Should we use a full path to our embedded openssl install here?

@jaym Do we need to do anything here for FIPS?

Contributor Author

OPENSSL_FIPS=1

Contributor Author

I highly doubt that will work with FIPS. It might be safer to use crypt() instead. Not sure you can have a working system where crypt() doesn't work.

Contributor

@stevendanna stevendanna left a comment

I've left a couple of comments but am marking this as approved. Feel free to address the comments as you see fit.

We should err on the side of getting this in and then iterating on it since it isn't in the critical path of any requests.

@jaym jaym requested a review from a team September 13, 2017 17:53
{"text/plain", to_text}
], Req, State}.
{"text/plain", to_text}],
case wrq:get_qs_value("format", Req) of
Contributor Author

cool

# Run command once and cache the value.
STATS_PASSWORD =
begin
cmd = Mixlib::ShellOut.new("chef-server-ctl show-secret opscode_erchef stats_password")
Contributor

As an alternative to this, we can assume the password is in the environment:

https://github.com/chef/chef-server/blob/master/oc-chef-pedant/lib/pedant.rb#L79-L80

And then do something like this to feed it into the env at runtime:

https://github.com/chef/chef-server/blob/master/omnibus/files/private-chef-ctl-commands/test.rb#L8-L9

Perhaps also putting a guard around the tests to disable them if the secret doesn't exist.


Contributor Author

jaym commented Sep 22, 2017

👍

Currently returns some basic Erlang VM statistics in the Prometheus text
format. This is the first step towards RFC #93.

Basic auth support is present out of the box.  The user can edit the
['opscode-erchef']['stats_auth_enable'] flag to change this.
The actual password is stored in opscode_erchef.stats_password in
chef veil.  A default password is generated if the user does not
specify one before running chef-server-ctl reconfigure.

The endpoint is accessible at /_stats.  You can use ?format=json or
?format=text to access the available metrics in JSON or Prometheus
formats.
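
As a hedged illustration of calling the endpoint from a client, the Ruby sketch below builds (but does not send) a basic-auth GET request for /_stats with a format parameter. The host name, user name, and password are placeholders, and `stats_request` is a hypothetical helper, not part of this change.

```ruby
require "net/http"
require "uri"

# Build (but do not send) a GET request for the stats endpoint.
def stats_request(base_url, format:, user:, password:)
  uri = URI.join(base_url, "/_stats")
  uri.query = URI.encode_www_form(format: format)
  request = Net::HTTP::Get.new(uri)
  request.basic_auth(user, password) # sets the Authorization header
  [uri, request]
end

# Placeholder host and credentials, for illustration only.
uri, request = stats_request("https://chef.example.com", format: "json",
                             user: "statsuser", password: "12345")
puts uri
```

Sending it would then be a matter of `Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(request) }`.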

The stats returned are documented at
#1385.

```
  # TYPE erlang_vm_memory_atom_bytes_total gauge
  # HELP erlang_vm_memory_atom_bytes_total The total amount of memory currently allocated for atoms. This memory is part of the memory presented as system memory.
  erlang_vm_memory_atom_bytes_total{usage="used"} 1455697
  erlang_vm_memory_atom_bytes_total{usage="free"} 12848
  # TYPE erlang_vm_memory_bytes_total gauge
  # HELP erlang_vm_memory_bytes_total The total amount of memory currently allocated. This is the same as the sum of the memory size for processes and system.
  erlang_vm_memory_bytes_total{kind="system"} 45984768
  erlang_vm_memory_bytes_total{kind="processes"} 35098912
  # TYPE erlang_vm_dets_tables gauge
  # HELP erlang_vm_dets_tables Erlang VM DETS Tables count
  erlang_vm_dets_tables 0
  # TYPE erlang_vm_ets_tables gauge
  # HELP erlang_vm_ets_tables Erlang VM ETS Tables count
  erlang_vm_ets_tables 64
  # TYPE erlang_vm_memory_processes_bytes_total gauge
  # HELP erlang_vm_memory_processes_bytes_total The total amount of memory currently allocated for the Erlang processes.
  erlang_vm_memory_processes_bytes_total{usage="used"} 35091312
  erlang_vm_memory_processes_bytes_total{usage="free"} 7600
  # TYPE erlang_vm_memory_system_bytes_total gauge
  # HELP erlang_vm_memory_system_bytes_total The total amount of memory currently allocated for the emulator that is not directly related to any Erlang process. Memory presented as processes is not included in this memory.
  erlang_vm_memory_system_bytes_total{usage="atom"} 1468545
  erlang_vm_memory_system_bytes_total{usage="binary"} 803360
  erlang_vm_memory_system_bytes_total{usage="code"} 33479913
  erlang_vm_memory_system_bytes_total{usage="ets"} 2372400
  erlang_vm_memory_system_bytes_total{usage="other"} 7860550
  # TYPE erlang_vm_statistics_context_switches counter
  # HELP erlang_vm_statistics_context_switches Total number of context switches since the system started
  erlang_vm_statistics_context_switches 29549971
  # TYPE erlang_vm_statistics_garbage_collection_number_of_gcs counter
  # HELP erlang_vm_statistics_garbage_collection_number_of_gcs Garbage collection: number of GCs
  erlang_vm_statistics_garbage_collection_number_of_gcs 1632986
  # TYPE erlang_vm_statistics_garbage_collection_words_reclaimed counter
  # HELP erlang_vm_statistics_garbage_collection_words_reclaimed Garbage collection: words reclaimed
  erlang_vm_statistics_garbage_collection_words_reclaimed 12664754817
  # TYPE erlang_vm_statistics_garbage_collection_bytes_reclaimed counter
  # HELP erlang_vm_statistics_garbage_collection_bytes_reclaimed Garbage collection: bytes reclaimed
  erlang_vm_statistics_garbage_collection_bytes_reclaimed 101318038536
  # TYPE erlang_vm_statistics_bytes_received_total counter
  # HELP erlang_vm_statistics_bytes_received_total Total number of bytes received through ports
  erlang_vm_statistics_bytes_received_total 795724439
  # TYPE erlang_vm_statistics_bytes_output_total counter
  # HELP erlang_vm_statistics_bytes_output_total Total number of bytes output to ports
  erlang_vm_statistics_bytes_output_total 604157373
  # TYPE erlang_vm_statistics_reductions_total counter
  # HELP erlang_vm_statistics_reductions_total Total reductions
  erlang_vm_statistics_reductions_total 4946085703
  # TYPE erlang_vm_statistics_run_queues_length_total gauge
  # HELP erlang_vm_statistics_run_queues_length_total Total length of the run-queues
  erlang_vm_statistics_run_queues_length_total 0
  # TYPE erlang_vm_statistics_runtime_milliseconds counter
  # HELP erlang_vm_statistics_runtime_milliseconds The sum of the runtime for all threads in the Erlang runtime system. Can be greater than wall clock time
  erlang_vm_statistics_runtime_milliseconds 573750
  # TYPE erlang_vm_statistics_wallclock_time_milliseconds counter
  # HELP erlang_vm_statistics_wallclock_time_milliseconds Information about wall clock. Same as erlang_vm_statistics_runtime_milliseconds except that real time is measured
  erlang_vm_statistics_wallclock_time_milliseconds 5495986
  # TYPE erlang_vm_ets_limit gauge
  # HELP erlang_vm_ets_limit The maximum number of ETS tables allowed.
  erlang_vm_ets_limit 2053
  # TYPE erlang_vm_logical_processors gauge
  # HELP erlang_vm_logical_processors The detected number of logical processors configured in the system.
  erlang_vm_logical_processors 4
  # TYPE erlang_vm_logical_processors_available gauge
  # HELP erlang_vm_logical_processors_available The detected number of logical processors available to the Erlang runtime system.
  erlang_vm_logical_processors_available 4
  # TYPE erlang_vm_logical_processors_online gauge
  # HELP erlang_vm_logical_processors_online The detected number of logical processors online on the system.
  erlang_vm_logical_processors_online 4
  # TYPE erlang_vm_port_count gauge
  # HELP erlang_vm_port_count The number of ports currently existing at the local node.
  erlang_vm_port_count 61
  # TYPE erlang_vm_port_limit gauge
  # HELP erlang_vm_port_limit The maximum number of simultaneously existing ports at the local node.
  erlang_vm_port_limit 65536
  # TYPE erlang_vm_process_count gauge
  # HELP erlang_vm_process_count The number of processes currently existing at the local node.
  erlang_vm_process_count 395
  # TYPE erlang_vm_process_limit gauge
  # HELP erlang_vm_process_limit The maximum number of simultaneously existing processes at the local node.
  erlang_vm_process_limit 262144
  # TYPE erlang_vm_schedulers gauge
  # HELP erlang_vm_schedulers The number of scheduler threads used by the emulator.
  erlang_vm_schedulers 4
  # TYPE erlang_vm_schedulers_online gauge
  # HELP erlang_vm_schedulers_online The number of schedulers online.
  erlang_vm_schedulers_online 4
  # TYPE erlang_vm_smp_support untyped
  # HELP erlang_vm_smp_support 1 if the emulator has been compiled with SMP support, otherwise 0.
  erlang_vm_smp_support 1
  # TYPE erlang_vm_threads untyped
  # HELP erlang_vm_threads 1 if the emulator has been compiled with thread support, otherwise 0.
  erlang_vm_threads 1
  # TYPE erlang_vm_thread_pool_size gauge
  # HELP erlang_vm_thread_pool_size The number of async threads in the async thread pool used for asynchronous driver calls.
  erlang_vm_thread_pool_size 10
  # TYPE erlang_vm_time_correction untyped
  # HELP erlang_vm_time_correction 1 if time correction is enabled, otherwise 0.
  erlang_vm_time_correction 1

  # TYPE erchef_pooler_members_in_use gauge
  # HELP erchef_pooler_members_in_use Number of pool members currently being used.
  erchef_pooler_members_in_use{pool_name="sqerl"} 0
  erchef_pooler_members_in_use{pool_name="oc_chef_authz_http"} 0
  erchef_pooler_members_in_use{pool_name="chef_index_http"} 0
  erchef_pooler_members_in_use{pool_name="chef_depsolver"} 0
  # TYPE erchef_pooler_members_free gauge
  # HELP erchef_pooler_members_free Number of pool members currently available.
  erchef_pooler_members_free{pool_name="sqerl"} 20
  erchef_pooler_members_free{pool_name="oc_chef_authz_http"} 25
  erchef_pooler_members_free{pool_name="chef_index_http"} 25
  erchef_pooler_members_free{pool_name="chef_depsolver"} 5
  # TYPE erchef_pooler_members_max gauge
  # HELP erchef_pooler_members_max Max number of pool members allowed in the pool.
  erchef_pooler_members_max{pool_name="sqerl"} 20
  erchef_pooler_members_max{pool_name="oc_chef_authz_http"} 100
  erchef_pooler_members_max{pool_name="chef_index_http"} 100
  erchef_pooler_members_max{pool_name="chef_depsolver"} 5
  # TYPE erchef_pooler_queued_requestors gauge
  # HELP erchef_pooler_queued_requestors Number of requestors blocking to take a pool member.
  erchef_pooler_queued_requestors{pool_name="sqerl"} 0
  erchef_pooler_queued_requestors{pool_name="oc_chef_authz_http"} 0
  erchef_pooler_queued_requestors{pool_name="chef_index_http"} 0
  erchef_pooler_queued_requestors{pool_name="chef_depsolver"} 0
  # TYPE erchef_pooler_queued_requestors_max gauge
  # HELP erchef_pooler_queued_requestors_max Max number of requestors allowed to block on taking pool member.
  erchef_pooler_queued_requestors_max{pool_name="sqerl"} 20
  erchef_pooler_queued_requestors_max{pool_name="oc_chef_authz_http"} 50
  erchef_pooler_queued_requestors_max{pool_name="chef_index_http"} 50
  erchef_pooler_queued_requestors_max{pool_name="chef_depsolver"} 50

  # TYPE pg_stat_seq_scan counter
  # HELP pg_stat_seq_scan Number of sequential scans initiated
  pg_stat_seq_scan 5047095
  # TYPE pg_stat_seq_tup_read counter
  # HELP pg_stat_seq_tup_read Number of live rows fetched by sequential scans
  pg_stat_seq_tup_read 415028001
  # TYPE pg_stat_idx_scan counter
  # HELP pg_stat_idx_scan Number of index scans initiated
  pg_stat_idx_scan 23751732
  # TYPE pg_stat_tup_fetch counter
  # HELP pg_stat_tup_fetch Number of live rows fetched by index scans
  pg_stat_tup_fetch 25953870
  # TYPE pg_stat_n_tup_ins counter
  # HELP pg_stat_n_tup_ins Number of rows inserted
  pg_stat_n_tup_ins 52452
  # TYPE pg_stat_n_tup_upd counter
  # HELP pg_stat_n_tup_upd Number of rows updated
  pg_stat_n_tup_upd 73461
  # TYPE pg_stat_n_tup_del counter
  # HELP pg_stat_n_tup_del Number of rows deleted
  pg_stat_n_tup_del 31020
  # TYPE pg_stat_n_live_tup gauge
  # HELP pg_stat_n_live_tup Estimated number of live rows
  pg_stat_n_live_tup 92355
  # TYPE pg_stat_n_dead_tup gauge
  # HELP pg_stat_n_dead_tup Estimated number of dead rows
  pg_stat_n_dead_tup 29046
  # TYPE pg_stat_heap_blocks_read counter
  # HELP pg_stat_heap_blocks_read Number of disk blocks read
  pg_stat_heap_blocks_read 69231
  # TYPE pg_stat_heap_blocks_hit counter
  # HELP pg_stat_heap_blocks_hit Number of buffer hits
  pg_stat_heap_blocks_hit 38834925
  # TYPE pg_stat_idx_blks_read counter
  # HELP pg_stat_idx_blks_read Number of disk blocks read from all indexes
  pg_stat_idx_blks_read 49632
  # TYPE pg_stat_idx_blks_hit counter
  # HELP pg_stat_idx_blks_hit Number of buffer hits in all indexes
  pg_stat_idx_blks_hit 35615613
  # TYPE pg_stat_toast_blks_read counter
  # HELP pg_stat_toast_blks_read Number of disk blocks read from TOAST tables
  pg_stat_toast_blks_read 1410
  # TYPE pg_stat_toast_blks_hit counter
  # HELP pg_stat_toast_blks_hit Number of buffer hits in TOAST tables
  pg_stat_toast_blks_hit 2397
  # TYPE pg_stat_tidx_blks_read counter
  # HELP pg_stat_tidx_blks_read Number of disk blocks read from TOAST tables
  pg_stat_tidx_blks_read 1410
  # TYPE pg_stat_tidx_blks_hit counter
  # HELP pg_stat_tidx_blks_hit Number of buffer hits in TOAST table indexes
  pg_stat_tidx_blks_hit 3948
```
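
For anyone consuming the text output above from a script, a minimal parser for the simple sample lines shown there might look like the following. It is only a sketch: `parse_samples` is a hypothetical helper, and it assumes lines of the form `name{label="value"} number` or `name number`, skipping comments and ignoring anything with timestamps, escaped quotes, or non-numeric values.

```ruby
SAMPLE_LINE = /\A([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)\z/

# Parse Prometheus text-format output into an array of sample hashes.
# Comment lines (# TYPE / # HELP) and blank lines are skipped.
def parse_samples(text)
  lines = text.each_line.map(&:strip).reject { |l| l.empty? || l.start_with?("#") }
  lines.map do |line|
    match = SAMPLE_LINE.match(line)
    next unless match # lines outside the simple shape are ignored

    labels = (match[2] || "").scan(/(\w+)="([^"]*)"/).to_h
    { name: match[1], labels: labels, value: Float(match[3]) }
  end.compact
end

text = <<~TEXT
  # TYPE erchef_pooler_members_in_use gauge
  erchef_pooler_members_in_use{pool_name="sqerl"} 0
  erchef_pooler_members_max{pool_name="sqerl"} 20
  erlang_vm_ets_tables 64
TEXT

samples = parse_samples(text)
samples.each { |s| puts "#{s[:name]} #{s[:labels]} #{s[:value]}" }
```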

Same stats in JSON format:

```
[
   {
      "metrics" : [
         {
            "value" : "1"
         }
      ],
      "type" : "UNTYPED",
      "name" : "erlang_vm_time_correction",
      "help" : "1 if time correction is enabled, otherwise 0."
   },
   {
      "metrics" : [
         {
            "value" : "10"
         }
      ],
      "type" : "GAUGE",
      "help" : "The number of async threads in the async thread pool used for asynchronous driver calls.",
      "name" : "erlang_vm_thread_pool_size"
   },
   {
      "type" : "UNTYPED",
      "metrics" : [
         {
            "value" : "1"
         }
      ],
      "help" : "1 if the emulator has been compiled with thread support, otherwise 0.",
      "name" : "erlang_vm_threads"
   },
   {
      "help" : "1 if the emulator has been compiled with SMP support, otherwise 0.",
      "name" : "erlang_vm_smp_support",
      "type" : "UNTYPED",
      "metrics" : [
         {
            "value" : "1"
         }
      ]
   },
   {
      "help" : "The number of schedulers online.",
      "name" : "erlang_vm_schedulers_online",
      "metrics" : [
         {
            "value" : "4"
         }
      ],
      "type" : "GAUGE"
   },
   {
      "type" : "GAUGE",
      "metrics" : [
         {
            "value" : "4"
         }
      ],
      "name" : "erlang_vm_schedulers",
      "help" : "The number of scheduler threads used by the emulator."
   },
   {
      "name" : "erlang_vm_process_limit",
      "help" : "The maximum number of simultaneously existing processes at the local node.",
      "type" : "GAUGE",
      "metrics" : [
         {
            "value" : "262144"
         }
      ]
   },
   {
      "metrics" : [
         {
            "value" : "395"
         }
      ],
      "type" : "GAUGE",
      "help" : "The number of processes currently existing at the local node.",
      "name" : "erlang_vm_process_count"
   },
   {
      "metrics" : [
         {
            "value" : "65536"
         }
      ],
      "type" : "GAUGE",
      "name" : "erlang_vm_port_limit",
      "help" : "The maximum number of simultaneously existing ports at the local node."
   },
   {
      "type" : "GAUGE",
      "metrics" : [
         {
            "value" : "60"
         }
      ],
      "name" : "erlang_vm_port_count",
      "help" : "The number of ports currently existing at the local node."
   },
   {
      "metrics" : [
         {
            "value" : "4"
         }
      ],
      "type" : "GAUGE",
      "help" : "The detected number of logical processors online on the system.",
      "name" : "erlang_vm_logical_processors_online"
   },
   {
      "type" : "GAUGE",
      "metrics" : [
         {
            "value" : "4"
         }
      ],
      "help" : "The detected number of logical processors available to the Erlang runtime system.",
      "name" : "erlang_vm_logical_processors_available"
   },
   {
      "help" : "The detected number of logical processors configured in the system.",
      "name" : "erlang_vm_logical_processors",
      "type" : "GAUGE",
      "metrics" : [
         {
            "value" : "4"
         }
      ]
   },
   {
      "metrics" : [
         {
            "value" : "2053"
         }
      ],
      "type" : "GAUGE",
      "name" : "erlang_vm_ets_limit",
      "help" : "The maximum number of ETS tables allowed."
   },
   {
      "metrics" : [
         {
            "value" : "21716981"
         }
      ],
      "type" : "COUNTER",
      "help" : "Information about wall clock. Same as erlang_vm_statistics_runtime_milliseconds except that real time is measured",
      "name" : "erlang_vm_statistics_wallclock_time_milliseconds"
   },
   {
      "type" : "COUNTER",
      "metrics" : [
         {
            "value" : "2089510"
         }
      ],
      "name" : "erlang_vm_statistics_runtime_milliseconds",
      "help" : "The sum of the runtime for all threads in the Erlang runtime system. Can be greater than wall clock time"
   },
   {
      "type" : "GAUGE",
      "metrics" : [
         {
            "value" : "0"
         }
      ],
      "help" : "Total length of the run-queues",
      "name" : "erlang_vm_statistics_run_queues_length_total"
   },
   {
      "type" : "COUNTER",
      "metrics" : [
         {
            "value" : "16208617229"
         }
      ],
      "name" : "erlang_vm_statistics_reductions_total",
      "help" : "Total reductions"
   },
   {
      "name" : "erlang_vm_statistics_bytes_output_total",
      "help" : "Total number of bytes output to ports",
      "metrics" : [
         {
            "value" : "2374560576"
         }
      ],
      "type" : "COUNTER"
   },
   {
      "metrics" : [
         {
            "value" : "2375163718"
         }
      ],
      "type" : "COUNTER",
      "help" : "Total number of bytes received through ports",
      "name" : "erlang_vm_statistics_bytes_received_total"
   },
   {
      "metrics" : [
         {
            "value" : "359118308568"
         }
      ],
      "type" : "COUNTER",
      "help" : "Garbage collection: bytes reclaimed",
      "name" : "erlang_vm_statistics_garbage_collection_bytes_reclaimed"
   },
   {
      "help" : "Garbage collection: words reclaimed",
      "name" : "erlang_vm_statistics_garbage_collection_words_reclaimed",
      "type" : "COUNTER",
      "metrics" : [
         {
            "value" : "44889788571"
         }
      ]
   },
   {
      "help" : "Garbage collection: number of GCs",
      "name" : "erlang_vm_statistics_garbage_collection_number_of_gcs",
      "type" : "COUNTER",
      "metrics" : [
         {
            "value" : "5263551"
         }
      ]
   },
   {
      "name" : "erlang_vm_statistics_context_switches",
      "help" : "Total number of context switches since the system started",
      "type" : "COUNTER",
      "metrics" : [
         {
            "value" : "109981973"
         }
      ]
   },
   {
      "help" : "The total amount of memory currently allocated for the emulator that is not directly related to any Erlang process. Memory presented as processes is not included in this memory.",
      "name" : "erlang_vm_memory_system_bytes_total",
      "type" : "GAUGE",
      "metrics" : [
         {
            "value" : "1468545",
            "labels" : {
               "usage" : "atom"
            }
         },
         {
            "labels" : {
               "usage" : "binary"
            },
            "value" : "942728"
         },
         {
            "value" : "33503265",
            "labels" : {
               "usage" : "code"
            }
         },
         {
            "value" : "2496320",
            "labels" : {
               "usage" : "ets"
            }
         },
         {
            "value" : "8025046",
            "labels" : {
               "usage" : "other"
            }
         }
      ]
   },
   {
      "help" : "The total amount of memory currently allocated for the Erlang processes.",
      "name" : "erlang_vm_memory_processes_bytes_total",
      "type" : "GAUGE",
      "metrics" : [
         {
            "value" : "24208976",
            "labels" : {
               "usage" : "used"
            }
         },
         {
            "labels" : {
               "usage" : "free"
            },
            "value" : "21304"
         }
      ]
   },
   {
      "metrics" : [
         {
            "value" : "66"
         }
      ],
      "type" : "GAUGE",
      "help" : "Erlang VM ETS Tables count",
      "name" : "erlang_vm_ets_tables"
   },
   {
      "name" : "erlang_vm_dets_tables",
      "help" : "Erlang VM DETS Tables count",
      "type" : "GAUGE",
      "metrics" : [
         {
            "value" : "0"
         }
      ]
   },
   {
      "metrics" : [
         {
            "labels" : {
               "kind" : "system"
            },
            "value" : "46435904"
         },
         {
            "value" : "24230280",
            "labels" : {
               "kind" : "processes"
            }
         }
      ],
      "type" : "GAUGE",
      "help" : "The total amount of memory currently allocated. This is the same as the sum of the memory size for processes and system.",
      "name" : "erlang_vm_memory_bytes_total"
   },
   {
      "name" : "erlang_vm_memory_atom_bytes_total",
      "help" : "The total amount of memory currently allocated for atoms. This memory is part of the memory presented as system memory.",
      "type" : "GAUGE",
      "metrics" : [
         {
            "labels" : {
               "usage" : "used"
            },
            "value" : "1455537"
         },
         {
            "value" : "13008",
            "labels" : {
               "usage" : "free"
            }
         }
      ]
   },
   {
      "help" : "Max number of requestors allowed to block on taking pool member.",
      "name" : "erchef_pooler_queued_requestors_max",
      "type" : "GAUGE",
      "metrics" : [
         {
            "value" : "20",
            "labels" : {
               "pool_name" : "sqerl"
            }
         },
         {
            "value" : "50",
            "labels" : {
               "pool_name" : "oc_chef_authz_http"
            }
         },
         {
            "labels" : {
               "pool_name" : "chef_index_http"
            },
            "value" : "50"
         },
         {
            "value" : "50",
            "labels" : {
               "pool_name" : "chef_depsolver"
            }
         }
      ]
   },
   {
      "metrics" : [
         {
            "value" : "0",
            "labels" : {
               "pool_name" : "sqerl"
            }
         },
         {
            "value" : "0",
            "labels" : {
               "pool_name" : "oc_chef_authz_http"
            }
         },
         {
            "labels" : {
               "pool_name" : "chef_index_http"
            },
            "value" : "0"
         },
         {
            "value" : "0",
            "labels" : {
               "pool_name" : "chef_depsolver"
            }
         }
      ],
      "type" : "GAUGE",
      "help" : "Number of requestors blocking to take a pool member.",
      "name" : "erchef_pooler_queued_requestors"
   },
   {
      "name" : "erchef_pooler_members_max",
      "help" : "Max number of pool members allowed in the pool.",
      "type" : "GAUGE",
      "metrics" : [
         {
            "labels" : {
               "pool_name" : "sqerl"
            },
            "value" : "20"
         },
         {
            "value" : "100",
            "labels" : {
               "pool_name" : "oc_chef_authz_http"
            }
         },
         {
            "labels" : {
               "pool_name" : "chef_index_http"
            },
            "value" : "100"
         },
         {
            "labels" : {
               "pool_name" : "chef_depsolver"
            },
            "value" : "5"
         }
      ]
   },
   {
      "name" : "erchef_pooler_members_free",
      "help" : "Number of pool members currently available.",
      "type" : "GAUGE",
      "metrics" : [
         {
            "value" : "20",
            "labels" : {
               "pool_name" : "sqerl"
            }
         },
         {
            "value" : "25",
            "labels" : {
               "pool_name" : "oc_chef_authz_http"
            }
         },
         {
            "value" : "25",
            "labels" : {
               "pool_name" : "chef_index_http"
            }
         },
         {
            "labels" : {
               "pool_name" : "chef_depsolver"
            },
            "value" : "5"
         }
      ]
   },
   {
      "metrics" : [
         {
            "value" : "0",
            "labels" : {
               "pool_name" : "sqerl"
            }
         },
         {
            "value" : "0",
            "labels" : {
               "pool_name" : "oc_chef_authz_http"
            }
         },
         {
            "value" : "0",
            "labels" : {
               "pool_name" : "chef_index_http"
            }
         },
         {
            "value" : "0",
            "labels" : {
               "pool_name" : "chef_depsolver"
            }
         }
      ],
      "type" : "GAUGE",
      "name" : "erchef_pooler_members_in_use",
      "help" : "Number of pool members currently being used."
   },
   {
      "name" : "pg_stat_tidx_blks_hit",
      "help" : "Number of buffer hits in TOAST table indexes",
      "type" : "COUNTER",
      "metrics" : [
         {
            "value" : "6486"
         }
      ]
   },
   {
      "help" : "Number of disk blocks read from TOAST tables",
      "name" : "pg_stat_tidx_blks_read",
      "type" : "COUNTER",
      "metrics" : [
         {
            "value" : "1410"
         }
      ]
   },
   {
      "help" : "Number of buffer hits in TOAST tables",
      "name" : "pg_stat_toast_blks_hit",
      "metrics" : [
         {
            "value" : "3666"
         }
      ],
      "type" : "COUNTER"
   },
   {
      "help" : "Number of disk blocks read from TOAST tables",
      "name" : "pg_stat_toast_blks_read",
      "type" : "COUNTER",
      "metrics" : [
         {
            "value" : "1410"
         }
      ]
   },
   {
      "name" : "pg_stat_idx_blks_hit",
      "help" : "Number of buffer hits in all indexes",
      "type" : "COUNTER",
      "metrics" : [
         {
            "value" : "72120936"
         }
      ]
   },
   {
      "metrics" : [
         {
            "value" : "49632"
         }
      ],
      "type" : "COUNTER",
      "name" : "pg_stat_idx_blks_read",
      "help" : "Number of disk blocks read from all indexes"
   },
   {
      "metrics" : [
         {
            "value" : "85697544"
         }
      ],
      "type" : "COUNTER",
      "name" : "pg_stat_heap_blocks_hit",
      "help" : "Number of buffer hits"
   },
   {
      "help" : "Number of disk blocks read",
      "name" : "pg_stat_heap_blocks_read",
      "type" : "COUNTER",
      "metrics" : [
         {
            "value" : "69513"
         }
      ]
   },
   {
      "metrics" : [
         {
            "value" : "29610"
         }
      ],
      "type" : "GAUGE",
      "help" : "Estimated number of dead rows",
      "name" : "pg_stat_n_dead_tup"
   },
   {
      "type" : "GAUGE",
      "metrics" : [
         {
            "value" : "92355"
         }
      ],
      "help" : "Estimated number of live rows",
      "name" : "pg_stat_n_live_tup"
   },
   {
      "type" : "COUNTER",
      "metrics" : [
         {
            "value" : "31020"
         }
      ],
      "name" : "pg_stat_n_tup_del",
      "help" : "Number of rows deleted"
   },
   {
      "help" : "Number of rows updated",
      "name" : "pg_stat_n_tup_upd",
      "metrics" : [
         {
            "value" : "87702"
         }
      ],
      "type" : "COUNTER"
   },
   {
      "metrics" : [
         {
            "value" : "52452"
         }
      ],
      "type" : "COUNTER",
      "help" : "Number of rows inserted",
      "name" : "pg_stat_n_tup_ins"
   },
   {
      "help" : "Number of live rows fetched by index scans",
      "name" : "pg_stat_tup_fetch",
      "type" : "COUNTER",
      "metrics" : [
         {
            "value" : "44669364"
         }
      ]
   },
   {
      "type" : "COUNTER",
      "metrics" : [
         {
            "value" : "48594381"
         }
      ],
      "name" : "pg_stat_idx_scan",
      "help" : "Number of index scans initiated"
   },
   {
      "name" : "pg_stat_seq_tup_read",
      "help" : "Number of live rows fetched by sequential scans",
      "type" : "COUNTER",
      "metrics" : [
         {
            "value" : "1167619731"
         }
      ]
   },
   {
      "help" : "Number of sequential scans initiated",
      "name" : "pg_stat_seq_scan",
      "metrics" : [
         {
            "value" : "9793719"
         }
      ],
      "type" : "COUNTER"
   }
]
```
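As a rough illustration of how a client might consume this output, the sketch below indexes the metric families by name and labels and derives two numbers from them: per-pool utilization and the PostgreSQL heap buffer hit ratio. It is not part of this change. The helper names (`index_metrics`, `pool_utilization`, `heap_hit_ratio`) are hypothetical, and the embedded sample is just a few entries copied from the output above. It assumes the response is a JSON array shaped like the sample, with values encoded as strings.

```python
import json

# A few entries copied from the sample output above. The real response
# is assumed to be a JSON array of metric families with this same shape.
SAMPLE = """
[
  {"name": "erchef_pooler_members_max", "type": "GAUGE",
   "metrics": [{"value": "20", "labels": {"pool_name": "sqerl"}},
               {"value": "5", "labels": {"pool_name": "chef_depsolver"}}]},
  {"name": "erchef_pooler_members_in_use", "type": "GAUGE",
   "metrics": [{"value": "0", "labels": {"pool_name": "sqerl"}},
               {"value": "0", "labels": {"pool_name": "chef_depsolver"}}]},
  {"name": "pg_stat_heap_blocks_hit", "type": "COUNTER",
   "metrics": [{"value": "85697544"}]},
  {"name": "pg_stat_heap_blocks_read", "type": "COUNTER",
   "metrics": [{"value": "69513"}]}
]
"""


def index_metrics(families):
    """Map (metric name, sorted label pairs) to a float value."""
    table = {}
    for family in families:
        for sample in family["metrics"]:
            # Unlabelled samples have no "labels" key, so they index as ().
            labels = tuple(sorted(sample.get("labels", {}).items()))
            # Values arrive as strings in the sample, so convert them here.
            table[(family["name"], labels)] = float(sample["value"])
    return table


def pool_utilization(table, pool_name):
    """Fraction of a pool's maximum members that are currently in use."""
    labels = (("pool_name", pool_name),)
    in_use = table[("erchef_pooler_members_in_use", labels)]
    maximum = table[("erchef_pooler_members_max", labels)]
    return in_use / maximum


def heap_hit_ratio(table):
    """Share of heap block requests served from the buffer cache."""
    hit = table[("pg_stat_heap_blocks_hit", ())]
    read = table[("pg_stat_heap_blocks_read", ())]
    return hit / (hit + read)


table = index_metrics(json.loads(SAMPLE))
print("sqerl utilization:", pool_utilization(table, "sqerl"))
print("heap hit ratio:", heap_hit_ratio(table))
```

Keying on the sorted label pairs keeps labelled and unlabelled samples in one lookup table. The `COUNTER` values are cumulative, so a monitoring system would more likely compute the hit ratio from the difference between two scrapes than from the raw totals used here.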

Signed-off-by: Kartik Null Cating-Subramanian <[email protected]>
@ksubrama ksubrama merged commit 1b3153a into master Sep 22, 2017
@ksubrama ksubrama deleted the jdm/SUSTAIN-673 branch September 22, 2017 17:39