Merge pull request #5997 from IntersectMBO/mkarg/tracer-prometheus
cardano-tracer: OpenMetrics compliance for Prometheus; fix `forHuman` output in journald
disassembler authored Sep 27, 2024
2 parents ea98b81 + fbe6675 commit a3096c3
Showing 34 changed files with 545 additions and 217 deletions.
4 changes: 2 additions & 2 deletions cardano-node/cardano-node.cabal
@@ -206,8 +206,8 @@ library
, strict-sop-core
, strict-stm
, time
, trace-dispatcher ^>= 2.6.0
, trace-forward ^>= 2.2.6
, trace-dispatcher ^>= 2.7.0
, trace-forward ^>= 2.2.7
, trace-resources ^>= 0.2.2
, tracer-transformers
, transformers
35 changes: 24 additions & 11 deletions cardano-node/src/Cardano/Node/Tracing/Documentation.hs
@@ -96,6 +96,7 @@ import Ouroboros.Network.TxSubmission.Inbound (TraceTxSubmissionInboun
import Ouroboros.Network.TxSubmission.Outbound (TraceTxSubmissionOutbound)

import Control.Exception (SomeException)
import Control.Monad (forM_)
import Data.Aeson.Types (ToJSON)
import Data.Proxy (Proxy (..))
import qualified Data.Text.IO as T
@@ -110,6 +111,7 @@ data TraceDocumentationCmd
= TraceDocumentationCmd
{ tdcConfigFile :: FilePath
, tdcOutput :: FilePath
, tdMetricsHelp :: Maybe FilePath
}

parseTraceDocumentationCmd :: Opt.Parser TraceDocumentationCmd
@@ -124,14 +126,20 @@ parseTraceDocumentationCmd =
(TraceDocumentationCmd
<$> Opt.strOption
( Opt.long "config"
<> Opt.metavar "NODE-CONFIGURATION"
<> Opt.metavar "FILE"
<> Opt.help "Configuration file for the cardano-node"
)
<*> Opt.strOption
( Opt.long "output-file"
<> Opt.metavar "FILE"
<> Opt.help "Generated documentation output file"
<> Opt.help "Generated documentation output file (Markdown)"
)
<*> Opt.optional (Opt.strOption
( Opt.long "output-metric-help"
<> Opt.metavar "FILE"
<> Opt.help "Metrics helptext file for cardano-tracer (JSON)"
)
)
Opt.<**> Opt.helper)
$ mconcat [ Opt.progDesc "Generate the trace documentation" ]
]
@@ -147,18 +155,19 @@ runTraceDocumentationCmd
:: TraceDocumentationCmd
-> IO ()
runTraceDocumentationCmd TraceDocumentationCmd{..} = do
docTracers tdcConfigFile tdcOutput
docTracers tdcConfigFile tdcOutput tdMetricsHelp

-- Have to repeat the construction of the tracers here,
-- as the tracers are behind old tracer interface after construction in mkDispatchTracers.
-- Can be changed, when old tracers have gone
docTracers ::
FilePath
-> FilePath
-> Maybe FilePath
-> IO ()
docTracers configFileName outputFileName = do
docTracers configFileName outputFileName mbMetricsHelpFilename = do
(bl, trConfig) <- docTracersFirstPhase (Just configFileName)
docTracersSecondPhase outputFileName trConfig bl
docTracersSecondPhase outputFileName mbMetricsHelpFilename trConfig bl


-- Have to repeat the construction of the tracers here,
@@ -761,12 +770,16 @@ docTracersFirstPhase condConfigFileName = do

docTracersSecondPhase ::
FilePath
-> Maybe FilePath
-> TraceConfig
-> DocTracer
-> IO ()
docTracersSecondPhase outputFileName trConfig bl = do
content <- docuResultsToText bl trConfig
handle <- openFile outputFileName WriteMode
hSetEncoding handle utf8
T.hPutStr handle content
hClose handle
docTracersSecondPhase outputFileName mbMetricsHelpFilename trConfig bl = do
docuResultsToText bl trConfig
>>= doWrite outputFileName
forM_ mbMetricsHelpFilename $ \f ->
doWrite f (docuResultsToMetricsHelptext bl)
where
doWrite outfile text =
withFile outfile WriteMode $ \handle ->
hSetEncoding handle utf8 >> T.hPutStr handle text
9 changes: 8 additions & 1 deletion cardano-tracer/CHANGELOG.md
@@ -1,14 +1,17 @@
# ChangeLog

## 0.3 (September 20, 2024)
## 0.3 (September 26, 2024)

* Abandon `snap` webserver in favour of `wai`/`warp` for Prometheus and EKG Monitoring.
* Add dynamic routing to EKG stores of all connected nodes.
* Derive URL compliant routes from connected node names (instead of plain node names).
* Remove the requirement of two distinct ports for the EKG backend (changing `hasEKG` config type).
* Improved OpenMetrics compliance of Prometheus exposition; also addresses [issue#5140][i5140].
* Prometheus help annotations can be provided via the new optional config value `metricsHelp`.
* For optional RTView component only: Disable SSL/https connections. Force `snap-server`
dependency to build with `-flag -openssl`.
* Add JSON responses when listing connected nodes for both Prometheus and EKG Monitoring.
* Fix: actually send `forHuman` rendering output to journald when specified.
* Add consistency check for redundant port values in the config.

## 0.2.4 (August 13, 2024)
@@ -48,3 +51,7 @@
## 0.1.0

Initial version.



[i5140]: https://github.com/IntersectMBO/cardano-node/issues/5140
2 changes: 2 additions & 0 deletions cardano-tracer/bench/cardano-tracer-bench.hs
@@ -75,6 +75,7 @@ main = do
, teReforwardTraceObjects = \_-> pure ()
, teRegistry = handleRegistry
, teStateDir = Nothing
, teMetricsHelp = []
}

tracerEnvRTView :: TracerEnvRTView
@@ -148,6 +149,7 @@ main = do
, rotation = Nothing
, verbosity = Nothing
, metricsComp = Nothing
, metricsHelp = Nothing
, hasForwarding = Nothing
, resourceFreq = Nothing
}
137 changes: 137 additions & 0 deletions cardano-tracer/configuration/metrics_help.json
@@ -0,0 +1,137 @@
{
"Forge.DelegMapSize": "Delegation map size",
"Forge.UtxoSize": "UTxO set size",
"Mem.resident": "Kernel-reported RSS (resident set size)",
"RTS.alloc": "RTS-reported bytes allocated",
"RTS.gcHeapBytes": "RTS-reported heap bytes",
"RTS.gcLiveBytes": "RTS-reported live bytes",
"RTS.gcMajorNum": "Major GCs",
"RTS.gcMinorNum": "Minor GCs",
"RTS.gcticks": "RTS-reported CPU ticks spent on GC",
"RTS.mutticks": "RTS-reported CPU ticks spent on mutator",
"RTS.threads": "RTS green thread count",
"Stat.cputicks": "Kernel-reported CPU ticks (1/100th of a second), since process start",
"Stat.fsRd": "FS bytes read",
"Stat.fsWr": "FS bytes written",
"Stat.netRd": "IP packet bytes read",
"Stat.netWr": "IP packet bytes written",
"SuppressedMessages...": "",
"aboutToLeadSlotLast": "",
"adoptedOwnBlockSlotLast": "",
"adoptionThreadDied": "",
"blockContext": "",
"blockFromFuture": "",
"blockNum": "Number of blocks in this chain fragment.",
"blockReplayProgress": "Progress in percent",
"blockfetchclient.blockdelay": "",
"blockfetchclient.blockdelay.cdfFive": "",
"blockfetchclient.blockdelay.cdfOne": "",
"blockfetchclient.blockdelay.cdfThree": "",
"blockfetchclient.blocksize": "",
"blockfetchclient.lateblocks": "",
"blocksForged": "How many blocks did this node forge?",
"cardano_build_info": "Cardano node build info",
"cardano_version_major": "Cardano node version information",
"cardano_version_minor": "Cardano node version information",
"cardano_version_patch": "Cardano node version information",
"connectedPeers": "Number of connected peers",
"connectionManager.duplexConns": "",
"connectionManager.fullDuplexConns": "",
"connectionManager.inboundConns": "",
"connectionManager.outboundConns": "",
"connectionManager.unidirectionalConns": "",
"couldNotForgeSlotLast": "",
"currentKESPeriod": "",
"density": "The actual number of blocks created over the maximum expected number of blocks that could be created over the span of the last @k@ blocks.",
"epoch": "In which epoch is the tip of the current chain.",
"forgedInvalidSlotLast": "",
"forgedSlotLast": "",
"forging_enabled": "Can this node forge blocks? (Is it provided with block forging credentials) 0 = no, 1 = yes",
"haskell_compiler_major": "Cardano compiler version information",
"haskell_compiler_minor": "Cardano compiler version information",
"headersServed": "A counter triggered on any header event",
"headersServed.falling": "A counter triggered only on header event with falling edge",
"inboundGovernor.Cold": "",
"inboundGovernor.Hot": "",
"inboundGovernor.Idle": "",
"inboundGovernor.Warm": "",
"ledgerState": "",
"ledgerView": "",
"localInboundGovernor.cold": "",
"localInboundGovernor.hot": "",
"localInboundGovernor.idle": "",
"localInboundGovernor.warm": "",
"mempoolBytes": "Byte size of the mempool",
"nodeCannotForge": "How many times was this node unable to forge [a block]?",
"nodeIsLeader": "How many times was this node slot leader?",
"nodeNotLeader": "",
"notAdoptedSlotLast": "",
"operationalCertificateExpiryKESPeriod": "",
"operationalCertificateStartKESPeriod": "",
"peerSelection.ActiveBigLedgerPeers": "Number of active big ledger peers",
"peerSelection.ActiveBigLedgerPeersDemotions": "Number of active big ledger peers demotions",
"peerSelection.ActiveBootstrapPeers": "Number of active bootstrap peers",
"peerSelection.ActiveBootstrapPeersDemotions": "Number of active bootstrap peers demotions",
"peerSelection.ActiveLocalRootPeers": "Number of active local root peers",
"peerSelection.ActiveLocalRootPeersDemotions": "Number of active local root peers demotions",
"peerSelection.ActiveNonRootPeers": "Number of active non root peers",
"peerSelection.ActiveNonRootPeersDemotions": "Number of active non root peers demotions",
"peerSelection.ActivePeers": "Number of active peers",
"peerSelection.ActivePeersDemotions": "Number of active peers demotions",
"peerSelection.Cold": "Number of cold peers",
"peerSelection.ColdBigLedgerPeers": "Number of cold big ledger peers",
"peerSelection.ColdBigLedgerPeersPromotions": "Number of cold big ledger peers promotions",
"peerSelection.ColdBootstrapPeersPromotions": "Number of cold bootstrap peers promotions",
"peerSelection.ColdNonRootPeersPromotions": "Number of cold non root peers promotions",
"peerSelection.ColdPeersPromotions": "Number of cold peers promotions",
"peerSelection.EstablishedBigLedgerPeers": "Number of established big ledger peers",
"peerSelection.EstablishedBootstrapPeers": "Number of established bootstrap peers",
"peerSelection.EstablishedLocalRootPeers": "Number of established local root peers",
"peerSelection.EstablishedNonRootPeers": "Number of established non root peers",
"peerSelection.EstablishedPeers": "Number of established peers",
"peerSelection.Hot": "Number of hot peers",
"peerSelection.HotBigLedgerPeers": "Number of hot big ledger peers",
"peerSelection.KnownBigLedgerPeers": "Number of known big ledger peers",
"peerSelection.KnownBootstrapPeers": "Number of known bootstrap peers",
"peerSelection.KnownLocalRootPeers": "Number of known local root peers",
"peerSelection.KnownNonRootPeers": "Number of known non root peers",
"peerSelection.KnownPeers": "Number of known peers",
"peerSelection.LocalRoots": "Numbers of warm & hot local roots",
"peerSelection.RootPeers": "Number of root peers",
"peerSelection.Warm": "Number of warm peers",
"peerSelection.WarmBigLedgerPeers": "Number of warm big ledger peers",
"peerSelection.WarmBigLedgerPeersDemotions": "Number of warm big ledger peers demotions",
"peerSelection.WarmBigLedgerPeersPromotions": "Number of warm big ledger peers promotions",
"peerSelection.WarmBootstrapPeersDemotions": "Number of warm bootstrap peers demotions",
"peerSelection.WarmBootstrapPeersPromotions": "Number of warm bootstrap peers promotions",
"peerSelection.WarmLocalRootPeersPromotions": "Number of warm local root peers promotions",
"peerSelection.WarmNonRootPeersDemotions": "Number of warm non root peers demotions",
"peerSelection.WarmNonRootPeersPromotions": "Number of warm non root peers promotions",
"peerSelection.WarmPeersDemotions": "Number of warm peers demotions",
"peerSelection.WarmPeersPromotions": "Number of warm peers promotions",
"peerSelection.churn.DecreasedActiveBigLedgerPeers": "number of decreased active big ledger peers",
"peerSelection.churn.DecreasedActivePeers": "number of decreased active peers",
"peerSelection.churn.DecreasedEstablishedBigLedgerPeers": "number of decreased established big ledger peers",
"peerSelection.churn.DecreasedEstablishedPeers": "number of decreased established peers",
"peerSelection.churn.DecreasedKnownBigLedgerPeers": "number of decreased known big ledger peers",
"peerSelection.churn.DecreasedKnownPeers": "number of decreased known peers",
"peerSelection.churn.IncreasedActiveBigLedgerPeers": "number of increased active big ledger peers",
"peerSelection.churn.IncreasedActivePeers": "number of increased active peers",
"peerSelection.churn.IncreasedEstablishedBigLedgerPeers": "number of increased established big ledger peers",
"peerSelection.churn.IncreasedEstablishedPeers": "number of increased established peers",
"peerSelection.churn.IncreasedKnownBigLedgerPeers": "number of increased known big ledger peers",
"peerSelection.churn.IncreasedKnownPeers": "number of increased known peers",
"peersFromNodeKernel": "",
"remainingKESPeriods": "",
"served.block": "",
"slotInEpoch": "Relative slot number of the tip of the current chain within the epoch.",
"slotIsImmutable": "",
"slotNum": "Number of slots in this chain fragment.",
"slotsMissed": "How many slots did this node miss?",
"submissions.accepted": "",
"submissions.rejected": "",
"submissions.submitted": "",
"systemStartTime": "The UTC time this node was started.",
"txsInMempool": "Transactions in mempool",
"txsProcessedNum": ""
}
82 changes: 58 additions & 24 deletions cardano-tracer/docs/cardano-tracer.md
@@ -4,21 +4,24 @@

# Contents

1. [Introduction](#Introduction)
1. [Motivation](#Motivation)
3. [Overview](#Overview)
2. [Build and run](#Build-and-run)
3. [Configuration](#Configuration)
1. [Distributed Scenario](#Distributed-scenario)
2. [Local Scenario](#Local-scenario)
3. [Network Magic](#Network-magic)
4. [Requests](#Requests)
5. [Logging](#Logging)
6. [Logs Rotation](#Logs-rotation)
7. [Prometheus](#Prometheus)
8. [EKG Monitoring](#EKG-monitoring)
9. [Verbosity](#Verbosity)
10. [RTView](#RTView)
- [Cardano Tracer](#cardano-tracer)
- [Contents](#contents)
- [Introduction](#introduction)
- [Motivation](#motivation)
- [Overview](#overview)
- [Build and run](#build-and-run)
- [Configuration](#configuration)
- [Distributed Scenario](#distributed-scenario)
- [Important](#important)
- [Local Scenario](#local-scenario)
- [Network Magic](#network-magic)
- [Requests](#requests)
- [Logging](#logging)
- [Logs Rotation](#logs-rotation)
- [Prometheus](#prometheus)
- [EKG Monitoring](#ekg-monitoring)
- [Verbosity](#verbosity)
- [RTView](#rtview)

# Introduction

@@ -390,20 +393,51 @@ $ curl --silent -H "Accept: application/json" '127.0.0.1:3200' | jq '.'
}
```

The Prometheus output is a map from Prometheus metric to value:
Prometheus uses the text-based exposition format, complete with `# TYPE` and `# HELP` annotations. The latter must be supplied via the `metricsHelp` config value (see below).

The output should be [OpenMetrics](https://github.com/OpenObservability/OpenMetrics/blob/main/specification/OpenMetrics.md#text-format) compliant. Example snippet:

```
$ curl '127.0.0.1:3200/12700130004'
blockNum_int 35
rts_gc_init_cpu_ms 5
rts_gc_par_tot_bytes_copied 0
served_block_counter 31
submissions_accepted_counter 2771
density_real 5.7692307692307696e-2
blocksForged_int 6
# TYPE Mem_resident_int gauge
# HELP Mem_resident_int Kernel-reported RSS (resident set size)
Mem_resident_int 103792640
# TYPE rts_gc_max_bytes_used gauge
rts_gc_max_bytes_used 5811512
# TYPE rts_gc_gc_cpu_ms counter
rts_gc_gc_cpu_ms 50
# TYPE RTS_gcMajorNum_int gauge
# HELP RTS_gcMajorNum_int Major GCs
RTS_gcMajorNum_int 4
# TYPE rts_gc_par_avg_bytes_copied gauge
rts_gc_par_avg_bytes_copied 0
# TYPE rts_gc_num_bytes_usage_samples counter
rts_gc_num_bytes_usage_samples 4
# TYPE remainingKESPeriods_int gauge
remainingKESPeriods_int 62
# TYPE rts_gc_bytes_copied counter
rts_gc_bytes_copied 17114384
# TYPE nodeCannotForge_int gauge
# HELP nodeCannotForge_int How many times was this node unable to forge [a block]?
# EOF
```
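
The layout of the snippet above can be reproduced with a few lines of Haskell. This is only a minimal sketch, not the code `cardano-tracer` uses: the names `MetricType`, `sanitise` and `renderMetric` are hypothetical, the replacement of `.` by `_` and the `_int` suffix are inferred from the example output alone, and escaping of special characters in help texts is not handled.

```haskell
module Main (main) where

import Data.Char (isAlphaNum, isAscii)
import Data.Maybe (maybeToList)

-- | The two metric kinds that appear in the example output (hypothetical type).
data MetricType = Gauge | Counter

typeKeyword :: MetricType -> String
typeKeyword Gauge   = "gauge"
typeKeyword Counter = "counter"

-- | Assumed sanitisation: every character outside [A-Za-z0-9_] becomes '_',
-- so that an internal name like "Mem.resident" turns into "Mem_resident".
sanitise :: String -> String
sanitise = map replace
  where
    replace c
      | (isAscii c && isAlphaNum c) || c == '_' = c
      | otherwise                               = '_'

-- | Render one metric as exposition lines:
-- a TYPE line, a HELP line only if a help text is known, then the sample.
renderMetric :: MetricType -> String -> Maybe String -> String -> [String]
renderMetric ty name mHelp value =
     [ "# TYPE " ++ name ++ " " ++ typeKeyword ty ]
  ++ [ "# HELP " ++ name ++ " " ++ h | h <- maybeToList mHelp ]
  ++ [ name ++ " " ++ value ]

main :: IO ()
main = mapM_ putStrLn $
     renderMetric Gauge (sanitise "Mem.resident" ++ "_int")
                  (Just "Kernel-reported RSS (resident set size)") "103792640"
  ++ renderMetric Counter "rts_gc_gc_cpu_ms" Nothing "50"
  ++ [ "# EOF" ]
```

Metrics without a known help text simply omit the `# HELP` line, which matches entries such as `rts_gc_gc_cpu_ms` in the snippet.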

Metric help annotations can be passed to the service in the config file, either as a key-value map from metric name to help text, or as the path to a separate JSON file containing such a map.
The system's internal metric names must be used as keys (cf. [metrics documentation](https://github.com/input-output-hk/cardano-node-wiki/blob/main/docs/new-tracing/tracers_doc_generated.md#metrics)).
```
"metricsHelp": "path/to/key-value-map.json"
```
or
```
"metricsHelp": {
"Mem.resident": "Kernel-reported RSS (resident set size)",
"RTS.gcMajorNum": "Major GCs",
"nodeCannotForge": "How many times was this node unable to forge [a block]?"
}
```
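
A config value that accepts either a file path or an inline map could be decoded along the following lines. This is a hedged sketch using the `aeson` library, not the parser from this commit: the type `MetricsHelpSource` and its constructors are hypothetical names chosen for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import Control.Applicative ((<|>))
import Data.Aeson (FromJSON (..), eitherDecode)
import qualified Data.Map.Strict as Map
import Data.Text (Text)

-- | Hypothetical representation of the two accepted forms of `metricsHelp`.
data MetricsHelpSource
  = HelpFromFile FilePath             -- ^ a JSON string: path to a file holding the map
  | HelpInline (Map.Map Text Text)    -- ^ a JSON object: metric name to help text
  deriving (Eq, Show)

instance FromJSON MetricsHelpSource where
  -- Try the string form first, then fall back to the object form.
  parseJSON v =
        (HelpFromFile <$> parseJSON v)
    <|> (HelpInline   <$> parseJSON v)

main :: IO ()
main = do
  print (eitherDecode "\"path/to/key-value-map.json\""
           :: Either String MetricsHelpSource)
  print (eitherDecode "{\"RTS.gcMajorNum\": \"Major GCs\"}"
           :: Either String MetricsHelpSource)
```

Modelling the two forms as one sum type keeps the decision between "load from disk" and "use as given" in a single place, after parsing.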



## EKG Monitoring

At top-level route `/` EKG gives a list of connected nodes.