Monitoring
Minimum viable monitoring for tidecoind checks five things: the process is
running, the node is on the expected chain, block height is advancing, peer count
is healthy, and disk space is not near exhaustion. Everything else builds on
those signals.
This page is the operator runbook. It is not the external metrics integration guide; Prometheus, Grafana, log shipping, and exporter patterns belong in Monitoring Integration.
Quick health check
Run these checks first when a node looks unhealthy:
tidecoin-cli getblockchaininfo
tidecoin-cli getnetworkinfo
tidecoin-cli getmempoolinfo
tidecoin-cli uptime
For a wallet-capable node, also check wallet loading separately:
tidecoin-cli listwallets
Core signals
| Signal | RPC or source | Healthy state |
|---|---|---|
| Process | systemd, container runtime, or process supervisor | Running, with no unplanned restarts. |
| Chain | getblockchaininfo.chain | Matches the expected network. |
| IBD | getblockchaininfo.initialblockdownload | False after initial sync. |
| Block progress | getblockchaininfo.blocks, headers, verificationprogress | Blocks advance and do not lag headers unexpectedly. |
| Best hash | getblockchaininfo.bestblockhash | Agrees across your own nodes. |
| Peers | getnetworkinfo.connections | Above your operational threshold. |
| Network active | getnetworkinfo.networkactive | True. |
| Mempool | getmempoolinfo.size, bytes, mempoolminfee | Within expected range for the node role. |
| Disk | Filesystem and getblockchaininfo.size_on_disk | Enough free space for chain growth and logs. |
| Warnings | getblockchaininfo.warnings, getnetworkinfo.warnings | Empty. |
Do not treat one node’s height as authoritative. Compare at least two nodes you control, or compare your node against a known-good external source during manual incident response.
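The table above can be turned into a single probe. The following is a minimal Python sketch, not a reference implementation: it assumes `tidecoin-cli` is on the PATH and already configured for RPC access, and the `rpc` and `evaluate_core_signals` helpers, the `min_peers` default, and the `max_header_lag` default are all illustrative names and values chosen for this example.

```python
import json
import subprocess


def rpc(method):
    """Run `tidecoin-cli <method>` and parse its JSON output."""
    out = subprocess.run(
        ["tidecoin-cli", method],
        check=True, capture_output=True, text=True, timeout=30,
    ).stdout
    return json.loads(out)


def evaluate_core_signals(chain_info, net_info, expected_chain,
                          min_peers=8, max_header_lag=3):
    """Return a list of problems; an empty list means the probes look healthy.

    min_peers and max_header_lag are placeholder thresholds, not recommendations.
    """
    problems = []
    if chain_info["chain"] != expected_chain:
        problems.append(
            f"chain is {chain_info['chain']!r}, expected {expected_chain!r}")
    if chain_info["initialblockdownload"]:
        problems.append("node is still in initial block download")
    lag = chain_info["headers"] - chain_info["blocks"]
    if lag > max_header_lag:
        problems.append(f"blocks lag headers by {lag}")
    if not net_info["networkactive"]:
        problems.append("network activity is disabled")
    if net_info["connections"] < min_peers:
        problems.append(f"only {net_info['connections']} peers connected")
    # The type of the warnings field is not stated on this page, so an empty
    # string and an empty list are both treated as "no warnings".
    for name, info in (("getblockchaininfo", chain_info),
                       ("getnetworkinfo", net_info)):
        if info.get("warnings"):
            problems.append(f"{name} warnings: {info['warnings']}")
    return problems
```

A probe would call `evaluate_core_signals(rpc("getblockchaininfo"), rpc("getnetworkinfo"), expected_chain=...)` with whatever chain name your network reports. Keeping the evaluation separate from the RPC call lets the same checks run against captured JSON during a post-incident review.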
Alert starting points
Tune exact thresholds to the deployment, but start with these classes:
| Alert | Initial condition |
|---|---|
| Node down | RPC probe fails or supervisor reports the process stopped. |
| Stuck sync | blocks does not advance while peers are connected and headers are ahead. |
| IBD too long | initialblockdownload remains true after the planned sync window. |
| Low peers | connections stays below the node-role threshold. |
| Network disabled | networkactive is false. |
| Chain mismatch | chain is not the configured network. |
| Fleet split | Two controlled nodes report different best block hashes past a tolerance window. |
| Disk pressure | Free disk crosses the filesystem threshold. |
| Warnings | RPC warnings field becomes non-empty. |
| Deep reorg | A reorg exceeds the service’s configured tolerance. |
Exchange and explorer nodes should alert on fleet disagreement, not just local process health. A node can be up and still be on the wrong side of a network split.
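One way to express the fleet split alert is to track how long controlled nodes have disagreed, because nodes can briefly report different best hashes while a new block propagates. The sketch below is an assumption-laden illustration in Python: `FleetSplitDetector`, its `observe` method, and the 120-second default are hypothetical, and the caller is expected to collect each node's `bestblockhash` from `getblockchaininfo`.

```python
import time


class FleetSplitDetector:
    """Alert only when controlled nodes disagree for longer than a tolerance."""

    def __init__(self, tolerance_seconds=120.0):
        # Placeholder default; tune to the deployment.
        self.tolerance_seconds = tolerance_seconds
        self._disagreeing_since = None

    def observe(self, best_hashes, now=None):
        """best_hashes maps node name -> bestblockhash. Returns True to alert."""
        now = time.monotonic() if now is None else now
        if len(set(best_hashes.values())) <= 1:
            # All nodes agree (or there is only one): reset the timer.
            self._disagreeing_since = None
            return False
        if self._disagreeing_since is None:
            self._disagreeing_since = now
        return now - self._disagreeing_since >= self.tolerance_seconds
```

Call `observe` on every polling cycle. A single disagreeing sample never alerts on its own; only disagreement that persists past the window does.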
Logs
The primary log is debug.log in the network-specific data directory. Use logs
for diagnosis, but alert from structured probes where possible.
Useful log areas:
| Category | How to enable | Use |
|---|---|---|
| Network | -debug=net | Peer discovery, connection, and message flow. |
| RPC | -debug=rpc | RPC request handling and failures. |
| HTTP/REST | -debug=http | REST and HTTP server behavior. |
| ZMQ | -debug=zmq | ZMQ notifier setup and publishing issues. |
| Mempool | -debug=mempool | Mempool admission and eviction behavior. |
| Validation | -debug=validation | Block and transaction validation. |
| Reindex | -debug=reindex | Reindex progress and failures. |
| Block storage | -debug=blockstorage | Block file and pruning issues. |
| Coin DB | -debug=coindb | Chainstate database issues. |
| Wallet DB | -debug=walletdb | Wallet database issues. |
| Tor/I2P/proxy | -debug=tor, -debug=i2p, -debug=proxy | Privacy network connectivity. |
Enable narrow categories during incident response. Avoid leaving broad
-debug=all logging enabled on production nodes unless disk retention is sized
for it.
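To keep incident-time logging narrow, tooling that restarts a node with extra logging can validate the requested categories first. This Python sketch assumes only the category names listed in the table above; `debug_args`, `KNOWN_CATEGORIES`, and the `max_categories` limit are hypothetical helpers for illustration, not part of tidecoind.

```python
# Category names copied from the table above.
KNOWN_CATEGORIES = {
    "net", "rpc", "http", "zmq", "mempool", "validation", "reindex",
    "blockstorage", "coindb", "walletdb", "tor", "i2p", "proxy",
}


def debug_args(categories, max_categories=3):
    """Build -debug=<category> arguments, refusing broad or unknown values."""
    cats = list(dict.fromkeys(categories))  # de-duplicate, keep order
    unknown = [c for c in cats if c not in KNOWN_CATEGORIES]
    if unknown:
        # "all" lands here on purpose: broad logging needs a deliberate override.
        raise ValueError(f"unsupported debug categories: {unknown}")
    if len(cats) > max_categories:
        raise ValueError(f"too many categories ({len(cats)} > {max_categories})")
    return [f"-debug={c}" for c in cats]
```

For example, `debug_args(["net", "mempool"])` yields the two arguments to append to the node's command line for the duration of the investigation.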
Disk and data growth
Monitor both filesystem free space and the node’s reported chain data size.
size_on_disk does not include every operational file, wallet backup, log, or
container overlay, so filesystem alerts still matter.
Nodes with txindex=1, block filters, or other indexes need more space and can
take longer to recover after unclean shutdowns. Pruned nodes reduce block-file
storage but are not suitable for every explorer, exchange, or indexer workload.
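A disk probe can combine both views: free space on the filesystem and headroom relative to the reported chain size. The following Python sketch uses the standard library's `shutil.disk_usage`; the function names and both thresholds (`min_free_fraction`, `growth_headroom`) are assumptions for illustration and should be tuned per deployment.

```python
import shutil


def disk_problems(free_bytes, total_bytes, size_on_disk,
                  min_free_fraction=0.15, growth_headroom=0.25):
    """Return a list of disk-pressure findings; empty means no finding."""
    problems = []
    free_fraction = free_bytes / total_bytes
    if free_fraction < min_free_fraction:
        problems.append(f"only {free_fraction:.0%} of the filesystem is free")
    # Require room for the chain data to grow by growth_headroom of its
    # current size (a placeholder planning figure, not a measured rate).
    needed = size_on_disk * growth_headroom
    if free_bytes < needed:
        problems.append("free space is below the chain growth headroom")
    return problems


def check_datadir(path, size_on_disk):
    """Probe the filesystem holding `path` (for example, the data directory)."""
    usage = shutil.disk_usage(path)
    return disk_problems(usage.free, usage.total, size_on_disk)
```

Pass `size_on_disk` from `getblockchaininfo`. The filesystem check still fires when logs or backups fill the volume even though the chain data itself has not grown.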
Incident triage
When a node alerts:
- Check process state and recent restarts.
- Check getblockchaininfo and getnetworkinfo.
- Compare best block hash with another controlled node.
- Check free disk and recent debug.log errors.
- Identify whether the issue is local process health, peer connectivity, disk, wallet state, or chain disagreement.
- Pause dependent services if the node is used for deposits, withdrawals, or explorer indexing.
Do not start with -reindex unless logs or database errors point there.
Reindexing is a recovery action with real downtime.
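The classification step of the checklist can be sketched as a small decision function. This Python example is hypothetical throughout: `classify_incident`, its parameters, and its thresholds are illustrative, and the evaluation order is a design assumption. It checks disk and peers before chain disagreement because a node that is out of disk or has no peers will usually disagree with the fleet as a symptom rather than a cause.

```python
def classify_incident(process_running, chain_info, net_info, peer_best_hash,
                      free_fraction, expected_wallets=(), loaded_wallets=(),
                      min_peers=8, min_free_fraction=0.15):
    """Map probe results to one of the triage categories.

    chain_info / net_info are parsed getblockchaininfo / getnetworkinfo
    results, or None when the RPC probe failed. peer_best_hash comes from
    another controlled node. Thresholds are placeholders.
    """
    if not process_running or chain_info is None or net_info is None:
        return "process"
    if free_fraction < min_free_fraction:
        return "disk"
    if not net_info["networkactive"] or net_info["connections"] < min_peers:
        return "peers"
    if chain_info["bestblockhash"] != peer_best_hash:
        return "chain disagreement"
    if set(expected_wallets) - set(loaded_wallets):
        return "wallet"
    return "unclassified"
```

An "unclassified" result means the structured probes look normal, which is the point to turn to debug.log rather than to recovery actions.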
Source of truth
| Topic | Source |
|---|---|
| RPC probes | getblockchaininfo, getnetworkinfo, getmempoolinfo, uptime |
| Log categories | ../tidecoin/src/logging.cpp, ../tidecoin/src/init/common.cpp |
| REST and ZMQ operation | REST and ZMQ |
| External metrics integration | Monitoring Integration |
See also: Troubleshooting, Reindex & Recovery, RPC Security.