Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

APPSERV-47 Adds Custom Watches to Monitoring Console #4463

Merged
merged 15 commits into from
Feb 5, 2020

Conversation

jbee
Copy link
Contributor

@jbee jbee commented Jan 30, 2020

Summary

Adds custom watches to client. In contrast to watches collected from MonitoringWatchSources a custom watch is a user defined Watch instance. The monitoring client is used to compose such a watch. This can be done in the new page Watches that also lists all watches and allows to edit, delete or disable and enable them.

A disabled watch is simply not actively evaluating data, it is ignored but it will show in the graphs as a outline box instead of a filled one. All watches can be disabled and enabled. This is stored in the added configuration so that the disabled state can be recreated on server startup.

Only custom watches can be delete. Collected (or programmatic) watches cannot be deleted as they originate from the server's code.

Custom watches can be edited. Programmatic watches can be duplicated, which changes the name of the duplicate to Copy of {Name}. So this becomes another watch but allows to use the programmatic watch as a template.

To recreate the custom watches on server-startup the watches need to be stored in some form.
I decided to write custom to and from JSON methods for the affected classes. This will allow to put in the proper code that can handle changes to the data structure. The strategy here is to be able to handle missing fields (that would be added) and ignore fields (that would be removed).
The conversion is covered with unit tests.

On server side:

  • adds monitoring console configuration to store disabled watches as well as custom watch data
  • REST and service API to enable/disable watches
  • distinction between custom watches and programmatic watches
  • watch states are shared when deriving a watch
  • REST and service API to create custom watches
  • REST and service API to delete custom watches
  • WatchData REST API class is not writable to receive custom watch data send by client

On client side:

  • adds a Watches configuration page that lists existing watches, allows to edit, delete or disable/enable them.
  • extracts model constants to a separate file for better readability and clarity
  • extracts a controller file that encapsulation the client-server communication on client side for better readability and clarity
  • fixes/updates formatting of watch conditions (e.g. in alert tables) and watch lists
  • uses payara design system colors with better contrast
  • fixes alignment of indicator (text message above graph)
  • uses configured colours for status messages
  • now also triggers status messages from alerts (red=critical, amber=alerting)
  • adds new colour defaults for error and missing status
  • use of const instead of let (TLDR; const should be default, let only when variable is actually changed, var is basically never needed or better suited in our "non-legacy" codebase)
  • watch indicator at right side of the graph now shows the indicator as outline if the watch is disabled
  • Setting input with unit now can use a function to provide a changing unit.

WriteableView and ProtectedList

Tried to improve it to make it more usable. Code also could do with some cleanup.

  • added generics
  • added missing @Override annoations
  • removed unnecessary else nesting
  • made static helpers static
  • added some List methods that would otherwise throw a UnsupportedOperationException on the basis of existing method (so no actual transaction management was needed)

As mentioned some of the existing methods of the ProtectedList still show some odd behaviour.
When using set(int, E) the value at the index would be null next time I loaded the list.

Tests

Unit tests were added to verify the correctness of the to/from JSON conversion of the Watch class and all classes involved.

General information can be taken from the documentation https://github.com/payara/Payara-Server-Documentation/pull/705.

The custom watches feature was tested manually following the steps below:

General Setup:

  1. build, install and start the server
  2. use set-monitoring-console-configuration --enabled=true to deploy MC
  3. open MC at http://localhost:8080/monitoring-console/
  4. make sure browser cache for JS/CSS is cleared for MC's domain
  5. check that following pages do exist: Watches (if not most likely a browser cache issue - or get in touch)

Testing Disabling Existing Watches

  1. Open page Watches in MC
  2. click checkbox of one of the listed watches to disable it, e.g. Metric Collection Duration
  3. for Metric Collection Duration watch open page Monitoring and check the watch indicator on right side of the graph is now drawn as outline instead of filled
  4. restart the server, e.g. using asadmin commands
  5. refresh page Watches in MC and check disabled watch is still disabled
  6. enable the watch again
  7. for Metric Collection Duration watch open page Monitoring again and check the indicator is now drawn filled

Testing Disabling Custom Watches

  1. Open page Watches in MC
  2. create a new watch named Heap Check by checking Unhealthy, filling in: If ns:jvm HeapUsage in Percent is > (some value clearly below your current heap usage, see Core page), and press Save or Update
  3. goto Core page and check that the graph shows an alert because heap usage is above the watch level
  4. go back to Watches page and disable the created watch by clicking the checkbox on the left or using the cogs menu
  5. go back to Core and check that alert is gone (and that indicator is now as outline)
  6. go to Alerts page and check the alert is shown in Past Unacknowledged Alerts
  7. go to Watches page again an enable the watch once more, check on Core page that an alert is triggered again

Testing Custom Watches

  1. create a custom watch as given in steps for Testing Disabling Custom Watches
  2. restart the server
  3. open Watches page and check the custom watch created is still there

Testing Listing of collected Watches

  1. open Watches page of MC, with default settings two watches should be listed
  2. open admin console and browse to Configurations => server-config => HealthCheck_ tab CPU Usage and check Enabled
  3. go back to Watches page and check a new watch CPU Usage is listed now

Testing new admin command parameters are hidden

run:

asadmin> set-monitoring-console-configuration --help
NAME
     set-monitoring-console-configuration

SYNOPSIS
     Usage: set-monitoring-console-configuration [--enabled=true|
     false] 

OPTIONS
     --enabled

Command set-monitoring-console-configuration executed successfully.

@jbee jbee self-assigned this Jan 30, 2020

@Param(optional = true, alias = "remove-watch")
private String removeWatch;

Copy link
Contributor Author

@jbee jbee Jan 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NB. I opted for a parameter for each action. This could also be done by adding a general parameter watch for the name and data for the JSON and another parameter action or so. But I found this easier to use. On the other hand these are not really intended to be used manually. They mainly exist so that the service can run them to update the config. Manual use is possible but will take a restart to take effect.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If they're not for manual use you can prepend them with an underscore (or two underscores, can't remember) to hide them as options on the command line (but not from REST).


if (constraintViolations != null && !constraintViolations.isEmpty()) {
Iterator<ConstraintViolation<ConfigBeanProxy>> it = constraintViolations.iterator();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NB. Filling in generics the ConfigBeanProxy has shown to be not correct. While this might be the actual type the generics along the path to this point do not actually provide this guarantee.

appliedChanges.add(event);
}
for (ProtectedList<?> entry : changedCollections.values()) {
commitListChanges(entry, appliedChanges);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NB. The loop body got extracted to the commitListChanges method because I needed a bindable type variable to use generics.

@@ -472,32 +479,37 @@ public synchronized void abort(Transaction t) {
throw new TransactionFailure("Not part of a transaction", null);
}
ConfigBean newBean = bean.allocate(type);
WriteableView writeableView = bean.getHabitat().<ConfigSupport>getService(ConfigSupport.class).getWriteableView(newBean.getProxy(type), newBean);
bean.getHabitat().<ConfigSupport>getService(ConfigSupport.class);
WriteableView writeableView = ConfigSupport.getWriteableView(newBean.getProxy(type), newBean);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NB. Changed to static access or the static method getWriteableView.

@Override
public String toString() {
return proxied.toString();
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NB. These actually work now 🕺

}

@Override
public E set(int index, E object) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NB. While the above two now work I think set is not behaving properly. I looked into that but decided to not mess with it and avoid using it.

@jbee
Copy link
Contributor Author

jbee commented Jan 30, 2020

jenkins test please

…removed watches are also removed from disabled list
@jbee
Copy link
Contributor Author

jbee commented Jan 30, 2020

jenkins test please

@jbee jbee requested a review from Pandrex247 January 31, 2020 10:06
Copy link
Member

@Pandrex247 Pandrex247 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't find these in the code so I'll place some comments here:

Not a fan of the "circle" for active - it looks like a radio button to me which was extra confusing with my next point 😂
Also not a fan of having the circle / pause simply be an icon with the actual enable/disable button being next to it - can they not be one and the same?

Not sure about the dropdown for x and ms: it seems a bit odd to me.
Selecting ms simply appends ms to the value, whereas x leaves it as an unknown default (without looking at documentation)?
To me it would be more intuitive to either drop the unit box and just ask people to enter the unit themselves (similar to some of the Micro command-line options), or add validation that prevents them from adding units themselves to the duration field and have a drop down with accepted values (similar to admin console). This seems like a really awkward in-between - particularly since there isn't anything stopping me from setting the value to 1000 and the unit to ms, and then removing the m to make it 1000 seconds.

You have it mentioned in your testing guidelines but I don't think a server restart is required for custom watches to take affect from my own testing - don't know if there's some documentation you need to edit around that.

I'm not a JS person and it's a big PR so might have missed something, but aside from above points seems to work quite nicely 😊


@Param(optional = true, alias = "remove-watch")
private String removeWatch;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If they're not for manual use you can prepend them with an underscore (or two underscores, can't remember) to hide them as options on the command line (but not from REST).

@@ -84,6 +84,10 @@
public boolean isLessSevereThan(Level other) {
return ordinal() > other.ordinal();
}

public static fish.payara.monitoring.alert.Alert.Level parse(String level) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the full canonical name necessary here? I can't see a conflicting import, and neither does my IDE

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope. good catch.

* This is first of all the data and watch collection and evaluation.
*
* @author Jan Bernitt
* @since 5.204
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think you mean 201 😉

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, was again confused.

count: 'Count',
ms: 'Milliseconds',
ns: 'Nanoseconds',
bytes: 'Bytes',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is bytes necessarily the best unit vs. kilo or mega?
An concrete example in our monitoring doesn't jump immediately to mind but just inciting the thought.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its more a dimension or base unit. KB or MB are multiples of this base unit. Which is expressed further up in the table of conversions. Also metrics themselves are actually given in bytes not kilo or mega. For the user this has no significance as values are automatically shown in the appropriate unit(s). For example 5KB or 10MB300Kb. Actual metrics such as used memory are in bytes and are only converted to something more fitting in the UI.

@@ -37,16 +37,31 @@
only if the new code is made subject to such option by the copyright
holder.
*/
:root {
/* Color RGB values to be used in other variable definitions */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Color
(╯°□°)╯︵ ┻━┻

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to Colour ;)

@@ -137,4 +141,30 @@ public String toString() {
}
return str.toString();
}

public JsonValue toJSON() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this class already exists, but a description of what it's used for wouldn't go amiss - circumstance isn't an intuitive name to me in the context of alerts (in comparison to something like event).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added javadoc.

@jbee
Copy link
Contributor Author

jbee commented Feb 3, 2020

You have it mentioned in your testing guidelines but I don't think a server restart is required for custom watches to take affect from my own testing - don't know if there's some documentation you need to edit around that.

That was referring to a change made via the asadmin command. If you use the monitoring console webapp the change takes effect immediately.

@jbee
Copy link
Contributor Author

jbee commented Feb 4, 2020

jenkins test please

@jbee
Copy link
Contributor Author

jbee commented Feb 4, 2020

@Pandrex247 Addressed your comments.

Changed the input+dropdown to a single input field where any number without unit is understood as count while any number with unit is understood as time.

Removed the icon in the list. Also never liked it but wanted to have some more clear indicator. Now the checkbox indicates the disabled state. Should a watch be stopped (which usually not occurs) it is also greyed out and the title will say (stopped) in addition to the name to make that clear as it could be checked but not effective when stopped.

Parameter for command are now hidden ones (one underscore).

Other small comments were addressed as well.

@Pandrex247
Copy link
Member

Changed the input+dropdown to a single input field where any number without unit is understood as count while any number with unit is understood as time.

Just checking that I understand this correctly, but what exactly is meant by Count? Is it "checks" / "updates"? So dependent on how frequently you have the data refresh polling?

@jbee
Copy link
Contributor Author

jbee commented Feb 4, 2020

It is the number of points considered. Those are always the most recent n points available. So it has some connection to the frequency of data collection (not the client polling) on the server which is once per second. For example a count of 5 means take the last 5 points and check if the condition holds for them whereas 5sec would mean take any number of points in the last 5 seconds (usually that are 6) compare those. This can make a bigger difference since not all metrics do collect once per second as the @MonitoringData annotation can be used to increase this to any number of seconds.

@jbee jbee merged commit 52a7791 into payara:master Feb 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants