Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Add basic performance statistics to phone home #3044

Merged
merged 3 commits into from
Apr 4, 2018
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 12 additions & 0 deletions UPGRADE.rst
Original file line number Diff line number Diff line change
Expand Up @@ -48,6 +48,18 @@ returned by the Client-Server API:
# configured on port 443.
curl -kv https://<host.name>/_matrix/client/versions 2>&1 | grep "Server:"

Upgrading to $NEXT_VERSION
====================

This release expands the anonymous usage stats sent if the opt-in
``report_stats`` configuration is set to ``true``. We now capture RSS memory
and cpu use at a very coarse level. This requires administrators to install
the optional ``psutil`` python module.

We would appreciate it if you could assist by ensuring this module is available
and ``report_stats`` is enabled. This will let us see if performance changes to
synapse are having an impact to the general community.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably go in the changelog IMO, otherwise a) we'll forget to update $NEXT_VERSION and b) I think most people look at CHANGELOG

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably then move the historical information into the changelog and leave UPDATING as just the top level howto, referring to the changelog for any specific requirements on upgrade? Just so we don't split the idea of "upgrade notes" across multiple places.

Upgrading to v0.15.0
====================

Expand Down
33 changes: 33 additions & 0 deletions synapse/app/homeserver.py
Original file line number Diff line number Diff line change
Expand Up @@ -402,6 +402,10 @@ def profiled(*args, **kargs):

stats = {}

# Contains the list of processes we will be monitoring
# currently either 0 or 1
stats_process = []

@defer.inlineCallbacks
def phone_stats_home():
logger.info("Gathering stats for reporting")
Expand All @@ -428,6 +432,13 @@ def phone_stats_home():
daily_sent_messages = yield hs.get_datastore().count_daily_sent_messages()
stats["daily_sent_messages"] = daily_sent_messages

if len(stats_process) > 0:
stats["memory_rss"] = 0
stats["cpu_average"] = 0
for process in stats_process:
stats["memory_rss"] += process.memory_info().rss
stats["cpu_average"] += int(process.cpu_percent(interval=None))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to include the time since the last call?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, nvm


logger.info("Reporting stats to matrix.org: %s" % (stats,))
try:
yield hs.get_simple_http_client().put_json(
Expand All @@ -437,10 +448,32 @@ def phone_stats_home():
except Exception as e:
logger.warn("Error reporting stats: %s", e)

def performance_stats_init():
try:
import psutil
process = psutil.Process()
# Ensure we can fetch both, and make the initial request for cpu_percent
# so the next request will use this as the initial point.
process.memory_info().rss
process.cpu_percent(interval=None)
logger.info("report_stats can use psutil")
stats_process.append(process)
except (ImportError, AttributeError):
logger.warn(
"report_stats enabled but psutil is not installed or incorrect version."
" Disabling reporting of memory/cpu stats."
" Ensuring psutil is available will help matrix.org track performance"
" changes across releases."
)

if hs.config.report_stats:
logger.info("Scheduling stats reporting for 3 hour intervals")
clock.looping_call(phone_stats_home, 3 * 60 * 60 * 1000)

# We need to defer this init for the cases that we daemonize
# otherwise the process ID we get is that of the non-daemon process
clock.call_later(0, performance_stats_init)

# We wait 5 minutes to send the first set of stats as the server can
# be quite busy the first few minutes
clock.call_later(5 * 60, phone_stats_home)
Expand Down