
Monitoring

Minimum viable monitoring for tidecoind checks five things: the process is running, the node is on the expected chain, block height is advancing, peer count is healthy, and disk space is not near exhaustion. Everything else builds on those signals.

This page is the operator runbook. It is not the external metrics integration guide; Prometheus, Grafana, log shipping, and exporter patterns belong in Monitoring Integration.

Quick health check

Run these checks first when a node looks unhealthy:

```shell
tidecoin-cli getblockchaininfo
tidecoin-cli getnetworkinfo
tidecoin-cli getmempoolinfo
tidecoin-cli uptime
```

For a wallet-capable node, also check wallet loading separately:

```shell
tidecoin-cli listwallets
```
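The checks above can be wrapped in a small probe script. This is a sketch, not project tooling: it assumes `tidecoin-cli` is on `PATH` with working RPC credentials, and uses a crude `sed`-based field extractor to avoid a `jq` dependency.

```shell
#!/bin/sh
# Quick health probe sketch. Assumes tidecoin-cli is on PATH with working
# RPC credentials; add -datadir/-conf flags as your deployment requires.

# field NAME JSON: crude scalar extraction from JSON, avoiding jq.
field() {
  clean=$(printf '%s' "$2" | tr -d '[:space:]')
  printf '%s\n' "$clean" | sed -n "s/.*\"$1\":\([^,}]*\).*/\1/p"
}

if command -v tidecoin-cli >/dev/null 2>&1; then
  chain_info=$(tidecoin-cli getblockchaininfo) || { echo "RPC probe failed"; exit 1; }
  net_info=$(tidecoin-cli getnetworkinfo) || { echo "RPC probe failed"; exit 1; }

  echo "chain:  $(field chain "$chain_info")"
  echo "blocks: $(field blocks "$chain_info")"
  echo "ibd:    $(field initialblockdownload "$chain_info")"
  echo "peers:  $(field connections "$net_info")"
fi
```

The extractor is deliberately naive; for anything beyond a quick probe, parse the JSON properly.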

Core signals

| Signal | RPC or source | Healthy state |
|---|---|---|
| Process | systemd, container runtime, or process supervisor | Running and restarting only intentionally. |
| Chain | `getblockchaininfo.chain` | Matches the expected network. |
| IBD | `getblockchaininfo.initialblockdownload` | False after initial sync. |
| Block progress | `getblockchaininfo.blocks`, `headers`, `verificationprogress` | Blocks advance and do not lag headers unexpectedly. |
| Best hash | `getblockchaininfo.bestblockhash` | Agrees across your own nodes. |
| Peers | `getnetworkinfo.connections` | Above your operational threshold. |
| Network active | `getnetworkinfo.networkactive` | True. |
| Mempool | `getmempoolinfo.size`, `bytes`, `mempoolminfee` | Within expected range for the node role. |
| Disk | Filesystem and `getblockchaininfo.size_on_disk` | Enough free space for chain growth and logs. |
| Warnings | `getblockchaininfo.warnings`, `getnetworkinfo.warnings` | Empty. |

Do not treat one node’s height as authoritative. Compare at least two nodes you control, or compare your node against a known-good external source during manual incident response.
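A minimal fleet-comparison sketch, assuming the standard `getbestblockhash` RPC and two controlled nodes reachable at the placeholder hostnames `node-a` and `node-b` (RPC auth flags omitted):

```shell
#!/bin/sh
# Compare the chain tip across two nodes you control.

# tips_agree HASH_A HASH_B: succeed only when both hashes are non-empty and equal.
tips_agree() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}

# node-a / node-b are placeholders; substitute your hosts and RPC auth flags.
if command -v tidecoin-cli >/dev/null 2>&1; then
  a=$(tidecoin-cli -rpcconnect=node-a getbestblockhash)
  b=$(tidecoin-cli -rpcconnect=node-b getbestblockhash)
  if tips_agree "$a" "$b"; then
    echo "tips agree: $a"
  else
    echo "ALERT: fleet split or probe failure: '$a' vs '$b'"
  fi
fi
```

Treat an empty hash the same as a mismatch: a failed probe should page, not pass.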

Alert starting points

Tune exact thresholds to the deployment, but start with these classes:

| Alert | Initial condition |
|---|---|
| Node down | RPC probe fails or supervisor reports the process stopped. |
| Stuck sync | `blocks` does not advance while peers are connected and headers are ahead. |
| IBD too long | `initialblockdownload` remains true after the planned sync window. |
| Low peers | `connections` stays below the node-role threshold. |
| Network disabled | `networkactive` is false. |
| Chain mismatch | `chain` is not the configured network. |
| Fleet split | Two controlled nodes report different best block hashes past a tolerance window. |
| Disk pressure | Free disk crosses the filesystem threshold. |
| Warnings | RPC `warnings` field becomes non-empty. |
| Deep reorg | A reorg exceeds the service’s configured tolerance. |

Exchange and explorer nodes should alert on fleet disagreement, not just local process health. A node can be up and still be on the wrong side of a network split.
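The stuck-sync alert can be approximated by a cron-driven probe that compares the current height against the height recorded on the previous run. A sketch, assuming the standard `getblockcount` RPC, a hypothetical state-file path, and a probe interval long enough for at least one expected block:

```shell
#!/bin/sh
# Stuck-sync probe: alert when height has not advanced since the previous run.
# The STATE path and a cron cadence of one run per expected-block interval
# are assumptions; adjust both to the deployment.
STATE=${STATE:-/var/lib/monitoring/tidecoin.height}

# check_progress PREV CUR: succeed only if the tip advanced.
check_progress() {
  [ "$2" -gt "$1" ]
}

if command -v tidecoin-cli >/dev/null 2>&1; then
  cur=$(tidecoin-cli getblockcount) || exit 1
  prev=$(cat "$STATE" 2>/dev/null || echo 0)
  check_progress "$prev" "$cur" || echo "ALERT: height stuck at $cur"
  echo "$cur" > "$STATE"
fi
```

Pair this with the peer and header checks before paging: a tip that holds still with zero peers is a connectivity problem, not a validation problem.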

Logs

The primary log is debug.log in the network-specific data directory. Use logs for diagnosis, but alert from structured probes where possible.

Useful log areas:

| Category | How to enable | Use |
|---|---|---|
| Network | `-debug=net` | Peer discovery, connection, and message flow. |
| RPC | `-debug=rpc` | RPC request handling and failures. |
| HTTP/REST | `-debug=http` | REST and HTTP server behavior. |
| ZMQ | `-debug=zmq` | ZMQ notifier setup and publishing issues. |
| Mempool | `-debug=mempool` | Mempool admission and eviction behavior. |
| Validation | `-debug=validation` | Block and transaction validation. |
| Reindex | `-debug=reindex` | Reindex progress and failures. |
| Block storage | `-debug=blockstorage` | Block file and pruning issues. |
| Coin DB | `-debug=coindb` | Chainstate database issues. |
| Wallet DB | `-debug=walletdb` | Wallet database issues. |
| Tor/I2P/proxy | `-debug=tor`, `-debug=i2p`, `-debug=proxy` | Privacy network connectivity. |

Enable narrow categories during incident response. Avoid leaving broad -debug=all logging enabled on production nodes unless disk retention is sized for it.
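On Bitcoin Core-lineage nodes, log categories can usually be toggled at runtime with the `logging` RPC, which avoids a restart mid-incident; confirm the RPC exists on your tidecoind build. A sketch with illustrative helper names (not project tooling):

```shell
#!/bin/sh
# Toggle debug log categories at runtime via the logging RPC instead of
# restarting with -debug flags. CLI is overridable for remote nodes/testing.
CLI=${CLI:-tidecoin-cli}

enable_debug()  { $CLI logging "[\"$1\"]"; }        # add one category
disable_debug() { $CLI logging '[]' "[\"$1\"]"; }   # remove it again

# Example during an incident:
#   enable_debug net      # start verbose peer logging
#   disable_debug net     # stop it once diagnosis is done
```

Turning a category off again when the incident closes matters as much as turning it on; verbose categories can fill disks quickly.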

Disk and data growth

Monitor both filesystem free space and the node’s reported chain data size. `size_on_disk` does not account for wallet backups, logs, container overlays, or other operational files, so filesystem-level alerts still matter.

Nodes with txindex=1, block filters, or other indexes need more space and can take longer to recover after unclean shutdowns. Pruned nodes reduce block-file storage but are not suitable for every explorer, exchange, or indexer workload.
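A filesystem-level probe can be as simple as comparing `df` output against a floor. A sketch; the data-directory default and the 50 GB threshold are assumptions to size per deployment:

```shell
#!/bin/sh
# Disk-pressure probe. DATADIR default and the 50 GB floor are assumptions;
# size both to your chain growth and log retention.
DATADIR=${DATADIR:-$HOME/.tidecoin}
MIN_FREE_KB=$((50 * 1024 * 1024))   # 50 GB, expressed in 1K blocks

# free_kb PATH: available kilobytes on the filesystem holding PATH.
free_kb() {
  df -Pk "$1" | awk 'NR > 1 { avail = $4 } END { print avail }'
}

# check_free AVAIL MIN: succeed while free space is at or above the floor.
check_free() {
  [ "$1" -ge "$2" ]
}

if [ -d "$DATADIR" ]; then
  avail=$(free_kb "$DATADIR")
  check_free "$avail" "$MIN_FREE_KB" || echo "ALERT: ${avail} KB free under $DATADIR"
fi
```

`df -P` forces the POSIX single-line output format, which keeps the `awk` column extraction stable across platforms.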

Incident triage

When a node alerts:

  1. Check process state and recent restarts.
  2. Check getblockchaininfo and getnetworkinfo.
  3. Compare best block hash with another controlled node.
  4. Check free disk and recent debug.log errors.
  5. Identify whether the issue is local process health, peer connectivity, disk, wallet state, or chain disagreement.
  6. Pause dependent services if the node is used for deposits, withdrawals, or explorer indexing.

Do not start with -reindex unless logs or database errors point there. Reindexing is a recovery action with real downtime.
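Most of the first-pass triage evidence can be collected into a single report for the incident channel. A sketch; the systemd unit name `tidecoind` and the default data directory are assumptions:

```shell
#!/bin/sh
# Collect first-pass triage evidence in one place. The unit name "tidecoind"
# and the default datadir path are assumptions; adjust to your deployment.
triage_report() {
  echo "== process =="
  systemctl is-active tidecoind 2>/dev/null || echo "supervisor query failed"
  echo "== chain =="
  tidecoin-cli getblockchaininfo 2>/dev/null || echo "RPC probe failed"
  echo "== network =="
  tidecoin-cli getnetworkinfo 2>/dev/null || echo "RPC probe failed"
  echo "== disk =="
  df -Pk "${DATADIR:-$HOME/.tidecoin}" 2>/dev/null || echo "datadir not found"
}
```

Capture the report before taking any recovery action, so the pre-intervention state is preserved.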

Source of truth

| Topic | Source |
|---|---|
| RPC probes | `getblockchaininfo`, `getnetworkinfo`, `getmempoolinfo`, `uptime` |
| Log categories | `../tidecoin/src/logging.cpp`, `../tidecoin/src/init/common.cpp` |
| REST and ZMQ operation | REST and ZMQ |
| External metrics integration | Monitoring Integration |

See also: Troubleshooting, Reindex & Recovery, RPC Security.
