This RFC is to introduce a new metrics source to consume metrics from Java Virtual Machines (JVM) using the JMX protocol. The high level plan is to implement one source that collects metrics from JVM servers.
Background reading on JVM/JMX monitoring:
This RFC will cover:
- A new source for JVM-based metrics using JMX.
Users want to collect, transform, and forward metrics to better observe how their JVM-based applications are performing.
Build a single source called jmx_metrics
(name to be confirmed) to collect JVM metrics.
The recommended implementation is to use a/the Rust JMX client to connect to a JVM server by an address specified in configuration.
This will require the end user to configure JMX on their JVM instances bound to an external port that can be queried by Vector.
Both the Telegraf and Prometheus JMX exporter provide a Java-based agent. Prometheus uses a self-maintained agent and Telegraf uses the Jolokia agent through a collector.
The Prometheus JMX exporter states:
This exporter is intended to be run as a Java Agent, exposing a HTTP server and serving metrics of the local JVM. It can be also run as an independent HTTP server and scrape remote JMX targets, but this has various disadvantages, such as being harder to configure and being unable to expose process metrics (e.g., memory and CPU usage). Running the exporter as a Java Agent is thus strongly encouraged.
This may require some testing as we proceed.
The query will return these metrics by parsing the query results and converting them into metrics:
jmx_up
Used as an uptime metric (0/1)jmx_config_reload_success_total
(counter)jmx_process_cpu_seconds_total
(counter)jmx_process_start_time_seconds
(gauge)jmx_process_open_fds
(gauge)jmx_process_max_fds
(gauge)jmx_jvm_threads_current
(gauge)jmx_jvm_threads_daemon
(gauge)jmx_jvm_threads_peak
(gauge)jmx_jvm_threads_started_total
(counter)jmx_jvm_threads_deadlocked
(gauge)jmx_jvm_threads_deadlocked_monitor
(gauge)jmx_jvm_threads_state
tagged withstate
(gauge)jmx_config_reload_failure_total
(counter)jmx_jvm_buffer_pool_used_bytes
tagged withpool
(gauge)jmx_jvm_buffer_pool_capacity_bytes
tagged withpool
(gauge)jmx_jvm_buffer_pool_used_buffers
tagged withpool
(gauge)jmx_jvm_classes_loaded
(gauge)jmx_jvm_classes_loaded_total
(counter)jmx_jvm_classes_unloaded_total
(counter)jmx_java_lang_MemoryPool_UsageThresholdSupported
tagged withname
(gauge)jmx_java_lang_Threading_ThreadContentionMonitoringEnabled
(gauge)jmx_java_lang_OperatingSystem_CommittedVirtualMemorySize
(gauge)jmx_java_lang_GarbageCollector_LastGcInfo_memoryUsageAfterGc_used
tagged withname
,key
(counter?)jmx_java_lang_Threading_ThreadContentionMonitoringSupported
(gauge)jmx_jvm_gc_collection_seconds
tagged withgc
(summary)jmx_jvm_memory_bytes_used
tagged witharea
(gauge)jmx_jvm_memory_bytes_committed
tagged witharea
(gauge)jmx_jvm_memory_bytes_max
tagged witharea
(gauge)jmx_jvm_memory_bytes_init
tagged witharea
(gauge)jmx_jvm_memory_pool_bytes_used
tagged withpool
(gauge)jmx_jvm_memory_pool_bytes_committed
tagged withpool
(gauge)jmx_jvm_memory_pool_bytes_max
tagged withpool
(gauge)jmx_jvm_memory_pool_bytes_init
tagged withpool
(gauge)jmx_jvm_info
tagged with version, vendor, runtime (gauge)jmx_jvm_memory_pool_allocated_bytes_total
tagged withpool
(counter)jmx_java_lang_Memory_HeapMemoryUsage_committed
maybe gauge? TBDjmx_java_lang_OperatingSystem_TotalSwapSpaceSize
maybe gauge? TBDjmx_java_lang_MemoryPool_CollectionUsage_max
tagged withname
(untyped)jmx_java_lang_Runtime_StartTime
(untyped)jmx_java_lang_GarbageCollector_LastGcInfo_endTime
tagged withname
(untyped)jmx_java_lang_Memory_HeapMemoryUsage_max
(untyped)jmx_java_lang_MemoryPool_UsageThreshold
tagged withname
(untyped)jmx_java_lang_MemoryPool_CollectionUsageThresholdCount
tagged withname
(untyped)jmx_java_lang_Memory_NonHeapMemoryUsage_used
(untyped)jmx_java_lang_Threading_PeakThreadCount
(untyped)jmx_java_lang_MemoryPool_PeakUsage_used
tagged withname
(untyped)jmx_java_lang_ClassLoading_TotalLoadedClassCount
(untyped)jmx_java_lang_OperatingSystem_MaxFileDescriptorCount
(untyped)jmx_java_lang_ClassLoading_Verbose
(untyped)jmx_java_lang_GarbageCollector_LastGcInfo_id
tagged withname
(untyped)jmx_java_lang_Threading_CurrentThreadUserTime
(untyped)jmx_java_lang_GarbageCollector_LastGcInfo_memoryUsageAfterGc_committed
tagged withname
andkey
(untyped)jmx_java_lang_Threading_ThreadCount
(untyped)jmx_java_lang_MemoryPool_PeakUsage_committed
tagged withname
(untyped)jmx_java_lang_Memory_ObjectPendingFinalizationCount
(untyped)jmx_java_lang_MemoryPool_Usage_used
tagged withname
(untyped)jmx_java_lang_GarbageCollector_CollectionCount
tagged withname
(untyped)jmx_java_lang_Threading_SynchronizerUsageSupported
(untyped)jmx_java_lang_Runtime_BootClassPathSupported
(untyped)jmx_java_nio_BufferPool_Count
tagged withname
(untyped)jmx_java_lang_GarbageCollector_LastGcInfo_GcThreadCount
tagged withname
(untyped)jmx_java_lang_GarbageCollector_LastGcInfo_memoryUsageBeforeGc_committed
tagged withname
andkey
(untyped)jmx_java_lang_Threading_CurrentThreadCpuTimeSupported
(untyped)jmx_java_lang_ClassLoading_LoadedClassCount
(untyped)jmx_java_lang_MemoryPool_CollectionUsage_init
tagged withname
(untyped)jmx_java_lang_MemoryPool_PeakUsage_max
tagged withname
(untyped)jmx_java_lang_MemoryPool_Usage_max
tagged withname
(untyped)jmx_java_lang_GarbageCollector_Valid
tagged withname
(untyped)jmx_java_lang_GarbageCollector_LastGcInfo_memoryUsageBeforeGc_used
tagged withname
andkey
(untyped)jmx_java_lang_Threading_ThreadAllocatedMemoryEnabled
(untyped)jmx_java_lang_MemoryManager_Valid
tagged withname
(untyped)jmx_java_lang_MemoryPool_Usage_init
tagged withname
(untyped)jmx_java_lang_OperatingSystem_ProcessCpuLoad
(untyped)jmx_java_lang_MemoryPool_CollectionUsage_committed
tagged withname
(untyped)jmx_java_lang_OperatingSystem_TotalPhysicalMemorySize
(untyped)jmx_java_lang_Memory_NonHeapMemoryUsage_committed
(untyped)jmx_java_lang_Compilation_TotalCompilationTime
(untyped)jmx_java_lang_Memory_Verbose
(untyped)jmx_java_lang_MemoryPool_Valid
tagged withname
(untyped)jmx_java_lang_OperatingSystem_FreeSwapSpaceSize
(untyped)jmx_java_lang_MemoryPool_UsageThresholdExceeded
tagged withname
(untyped)jmx_java_lang_Threading_CurrentThreadCpuTime
(untyped)jmx_java_lang_MemoryPool_CollectionUsageThreshold
tagged withname
(untyped)jmx_java_lang_GarbageCollector_CollectionTime
tagged withname
(untyped)jmx_java_lang_Compilation_CompilationTimeMonitoringSupported
(untyped)jmx_java_lang_MemoryPool_Usage_committed
tagged withname
(untyped)jmx_java_lang_Memory_NonHeapMemoryUsage_init
(untyped)jmx_java_lang_MemoryPool_PeakUsage_init
tagged withname
(untyped)jmx_java_lang_GarbageCollector_LastGcInfo_startTime
tagged withname
(untyped)jmx_java_lang_OperatingSystem_AvailableProcessors
(untyped)jmx_java_lang_MemoryPool_CollectionUsageThresholdSupported
tagged withname
(untyped)jmx_java_lang_GarbageCollector_LastGcInfo_memoryUsageBeforeGc_max
tagged withname
andkey
(untyped)jmx_java_lang_ClassLoading_UnloadedClassCount
(untyped)jmx_java_nio_BufferPool_MemoryUsed
tagged withname
(untyped)jmx_java_nio_BufferPool_TotalCapacity
tagged withname
(untyped)jmx_java_lang_Memory_HeapMemoryUsage_used
(untyped)jmx_java_lang_MemoryPool_CollectionUsage_used
tagged withname
(untyped)jmx_java_lang_Memory_HeapMemoryUsage_init
(untyped)jmx_java_lang_OperatingSystem_SystemCpuLoad
(untyped)jmx_java_lang_GarbageCollector_LastGcInfo_memoryUsageAfterGc_init
tagged withname
andkeys
(untyped)jmx_java_lang_GarbageCollector_LastGcInfo_memoryUsageBeforeGc_init
tagged withname
andkey
(untyped)jmx_java_lang_Threading_ThreadAllocatedMemorySupported
(untyped)jmx_java_lang_Memory_NonHeapMemoryUsage_max
(untyped)jmx_java_lang_Threading_DaemonThreadCount
(untyped)jmx_java_lang_Threading_ThreadCpuTimeSupported
(untyped)jmx_java_lang_OperatingSystem_SystemLoadAverage
(untyped)jmx_java_lang_Threading_TotalStartedThreadCount
(untyped)jmx_java_lang_OperatingSystem_ProcessCpuTime
(untyped)jmx_java_lang_OperatingSystem_FreePhysicalMemorySize
(untyped)jmx_java_lang_Runtime_Uptime
(untyped)jmx_java_lang_MemoryPool_CollectionUsageThresholdExceeded
tagged withname
(untyped)jmx_java_lang_GarbageCollector_LastGcInfo_duration
tagged withname
(untyped)jmx_java_lang_Threading_ObjectMonitorUsageSupported
(untyped)jmx_java_lang_GarbageCollector_LastGcInfo_memoryUsageAfterGc_max
tagged withname
andkey
(untyped)jmx_java_lang_MemoryPool_UsageThresholdCount
tagged withname
(untyped)jmx_java_lang_Threading_ThreadCpuTimeEnabled
(untyped)jmx_java_lang_OperatingSystem_OpenFileDescriptorCount
(untyped)
The type of metric for any metric marked untyped
is unclear but likely gauges. We'll need to do deeper research to determine:
- Whether we want to keep them.
- What type they are if we retain them.
Naming of metrics is determined via:
jmx _ metric_name
This is in line with JMX's internal naming.
All metrics will be tagged with the endpoint
and host
.
This list is the default JVM metrics available via JMX. Individual applications (Kafka, Cassandra, Tomcat, etc) will expose their own MBeans and metrics and we can walk each of them and parse their output to construct and return the metrics as the Prometheus jmx_exporter does.
The Prometheus exporter contains accept/deny patterns for specific MBean objects and a rule-based rewrite engine to match and construct metrics from specific objects. Both of these seems like reasonable future work items, albeit we may want to look at whether a transform might be more effective?
The following additional source configuration will be added:
[sources.my_source_id]
type = "jmx_metrics" # required
endpoint = "service:jmx:rmi:///jndi/rmi://127.0.0.1:1234/jmxrmi" # required - address of the JMX webserver to scrape.
username = "user" # optional - username for any JMX authentication.
password = "password" # optional - password for any JMX authentication.
scrape_interval_secs = 15 # optional, default, seconds
namespace = "jmx" # optional, default is "jmx", namespace to put metrics under
- We'd also add a guide for doing this with authentication.
The JVM is a popular platform for running applications. Additionally, it is the basis for running a number of other infrastructure tools, including Kafka and Cassandra. User frequently want to understand it's performance.
Additionally, as part of Vector's vision to be the "one tool" for ingesting and shipping observability data, it makes sense to add as many sources as possible to reduce the likelihood that a user will not be able to ingest metrics from their tools.
- https://github.com/prometheus/jmx_exporter
- https://github.com/influxdata/telegraf/tree/master/plugins/inputs/jolokia
- https://github.com/influxdata/telegraf/tree/master/plugins/inputs/jolokia2
- https://collectd.org/wiki/index.php/Plugin:GenericJMX
- https://collectd.org/documentation/manpages/collectd-java.5.shtml#genericjmx_plugin
- https://github.com/ScalaConsultants/panopticon-tui
- https://github.com/replicante-io/agents/tree/c62821b45b1c44bc3e5f2bfec6ebfe4454c694f1/agents/kafka
- Additional maintenance and integration testing burden of a new source
-
Having users run telegraf or Prom node exporter and using Vector's Prometheus source to scrape it. We could not add the source directly to Vector and instead instruct users to run Prometheus'
jmx_exporter
and point Vector at the resulting data. -
Or we could use something like jmxtrans and add a Vector OutputWriter.
I decided against both of these this as they would be in conflict with one of the listed principles of Vector:
One Tool. All Data. - One simple tool gets your logs, metrics, and traces (coming soon) from A to B.
If users are already running Prometheus though, they could opt for the Prometheus path.
- Java agent or not.
Incremental steps that execute this change. Generally this is in the form of:
- Submit a PR with the initial source implementation
- Accept/Deny list for MBean object patterns.
- Rule-based engine for matching and constructing specific metrics.