Monitoring

Minimum viable monitoring for tidecoind checks five things: the process is running, the node is on the expected chain, block height is advancing, peer count is healthy, and disk space is not near exhaustion. Everything else builds on those signals.

This page is the operator runbook. It is not the external metrics integration guide; Prometheus, Grafana, log shipping, and exporter patterns belong in Monitoring Integration.

Quick health check

Run these checks first when a node looks unhealthy:

```shell
tidecoin-cli getblockchaininfo
tidecoin-cli getnetworkinfo
tidecoin-cli getmempoolinfo
tidecoin-cli uptime
```
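
A minimal Python sketch of the same quick check, assuming `tidecoin-cli` is on `PATH` and reaches the node with its default configuration. The helper names (`call_rpc`, `health_snapshot`) are hypothetical. Pass a different `cli` tuple if your deployment needs extra command-line options.

```python
import json
import subprocess

# Assumed CLI invocation; extend the tuple with any options your deployment needs.
DEFAULT_CLI = ("tidecoin-cli",)

# The four quick-check RPCs from this runbook.
HEALTH_RPCS = ("getblockchaininfo", "getnetworkinfo", "getmempoolinfo", "uptime")


def call_rpc(method, cli=DEFAULT_CLI, timeout=30):
    """Run one CLI call and decode its JSON output.

    Raises RuntimeError on a non-zero exit so a failed probe is reported
    instead of silently producing partial data.
    """
    proc = subprocess.run(
        list(cli) + [method],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode != 0:
        raise RuntimeError(f"{method} failed: {proc.stderr.strip()}")
    # `uptime` prints a bare integer, which json.loads also accepts.
    return json.loads(proc.stdout)


def health_snapshot(cli=DEFAULT_CLI):
    """Collect all quick-check results into one dict keyed by RPC name."""
    return {method: call_rpc(method, cli) for method in HEALTH_RPCS}
```

Treat a RuntimeError or timeout from `health_snapshot()` as the "Node down" condition rather than as missing data.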

For a wallet-capable node, also confirm separately that the expected wallets are loaded:

```shell
tidecoin-cli listwallets
```
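
A small sketch of that wallet check. It assumes you keep your own list of wallet names each node should load; `expected` is a hypothetical deployment value, not something tidecoind reports.

```python
def missing_wallets(expected, loaded):
    """Return expected wallet names that `listwallets` did not report.

    `loaded` is the decoded JSON array from `tidecoin-cli listwallets`;
    `expected` comes from your own (hypothetical) deployment config.
    """
    loaded_set = set(loaded)
    return sorted(name for name in expected if name not in loaded_set)


# Example: two wallets expected, only one loaded.
print(missing_wallets(["hot", "cold"], ["hot"]))  # prints ['cold']
```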

Core signals

Signal	RPC or source	Healthy state
Process	systemd, container runtime, or process supervisor	Running and restarting only intentionally.
Chain	`getblockchaininfo.chain`	Matches the expected network.
IBD	`getblockchaininfo.initialblockdownload`	False after initial sync.
Block progress	`getblockchaininfo.blocks`, `headers`, `verificationprogress`	Blocks advance and do not lag headers unexpectedly.
Best hash	`getblockchaininfo.bestblockhash`	Agrees across your own nodes.
Peers	`getnetworkinfo.connections`	Above your operational threshold.
Network active	`getnetworkinfo.networkactive`	True.
Mempool	`getmempoolinfo.size`, `bytes`, `mempoolminfee`	Within expected range for the node role.
Disk	Filesystem and `getblockchaininfo.size_on_disk`	Enough free space for chain growth and logs.
Warnings	`getblockchaininfo.warnings`, `getnetworkinfo.warnings`	Empty.
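
The RPC-backed rows of this table can be evaluated together. This is a sketch, not a reference implementation: `expected_chain`, `min_peers`, and the illustrative `max_header_lag` tolerance are deployment assumptions, and the chain name used in practice depends on how your tidecoind build reports networks.

```python
def check_core_signals(chain_info, net_info, expected_chain, min_peers,
                       max_header_lag=1):
    """Return a list of problems; an empty list means these signals look healthy.

    chain_info / net_info: decoded getblockchaininfo / getnetworkinfo results.
    expected_chain, min_peers, max_header_lag: assumed deployment thresholds.
    """
    problems = []
    if chain_info["chain"] != expected_chain:
        problems.append(
            f"chain is {chain_info['chain']!r}, expected {expected_chain!r}")
    if chain_info["initialblockdownload"]:
        problems.append("still in initial block download")
    lag = chain_info["headers"] - chain_info["blocks"]
    if lag > max_header_lag:
        problems.append(f"blocks lag headers by {lag}")
    if not net_info["networkactive"]:
        problems.append("network activity is disabled")
    if net_info["connections"] < min_peers:
        problems.append(
            f"{net_info['connections']} peers, below threshold {min_peers}")
    # `warnings` may be a string or a list depending on the node version;
    # both are falsy when empty.
    for name, info in (("getblockchaininfo", chain_info),
                       ("getnetworkinfo", net_info)):
        if info.get("warnings"):
            problems.append(f"{name} warnings: {info['warnings']}")
    return problems
```

Process state and disk are deliberately left out. They come from the supervisor and the filesystem, not from these RPCs.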

Do not treat one node’s height as authoritative. Compare at least two nodes you control, or compare your node against a known-good external source during manual incident response.
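
A sketch of that comparison across nodes you control. It assumes you have already collected `blocks` and `bestblockhash` from each node's getblockchaininfo. The node labels and the result strings are hypothetical.

```python
from collections import defaultdict


def compare_tips(tips):
    """Classify agreement between controlled nodes.

    tips: node label -> (blocks, bestblockhash).
    Returns "agree", "conflict" (same height, different hashes: possible
    split), or "diverged" (different heights: a lagging node or a split,
    so re-check after the next block).
    """
    if len({best_hash for _, best_hash in tips.values()}) == 1:
        return "agree"
    hashes_at_height = defaultdict(set)
    for height, best_hash in tips.values():
        hashes_at_height[height].add(best_hash)
    if any(len(hashes) > 1 for hashes in hashes_at_height.values()):
        return "conflict"
    return "diverged"
```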

Alert starting points

Tune exact thresholds to the deployment, but start with these classes:

Alert	Initial condition
Node down	RPC probe fails or supervisor reports the process stopped.
Stuck sync	`blocks` does not advance while peers are connected and headers are ahead.
IBD too long	`initialblockdownload` remains true after the planned sync window.
Low peers	`connections` stays below the node-role threshold.
Network disabled	`networkactive` is false.
Chain mismatch	`chain` is not the configured network.
Fleet split	Two controlled nodes report different best block hashes past a tolerance window.
Disk pressure	Free disk crosses the filesystem threshold.
Warnings	RPC `warnings` field becomes non-empty.
Deep reorg	A reorg exceeds the service’s configured tolerance.
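
Of these alerts, "Stuck sync" needs state across probes. The sketch below assumes you sample `blocks`, `headers`, and `connections` periodically. The 30-minute default window is illustrative, and the class name is hypothetical.

```python
import time


class StuckSyncDetector:
    """Flag a stuck sync: `blocks` unchanged for `window_seconds` while
    peers are connected and headers are ahead.

    The default window is an illustrative assumption; tune it per deployment.
    """

    def __init__(self, window_seconds=1800):
        self.window_seconds = window_seconds
        self.last_blocks = None
        self.last_progress_at = None

    def observe(self, blocks, headers, connections, now=None):
        now = time.monotonic() if now is None else now
        if self.last_blocks is None or blocks != self.last_blocks:
            # First sample, or the height changed: restart the timer.
            self.last_blocks = blocks
            self.last_progress_at = now
            return False
        stalled_for = now - self.last_progress_at
        return (connections > 0
                and headers > blocks
                and stalled_for >= self.window_seconds)
```

Requiring connected peers and headers ahead separates a stuck sync from a node that is simply isolated, which the "Low peers" alert already covers.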

Exchange and explorer nodes should alert on fleet disagreement, not just local process health. A node can be up and still be on the wrong side of a network split.
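
The "past a tolerance window" part of the fleet-split alert matters because nodes briefly disagree while a new block propagates. A minimal sketch, assuming you poll `bestblockhash` from each controlled node. The 5-minute default is an assumption, and the class name is hypothetical.

```python
import time


class FleetSplitAlarm:
    """Fire only when controlled nodes disagree on the best block hash for
    at least `tolerance_seconds` (an assumed, deployment-specific value)."""

    def __init__(self, tolerance_seconds=300):
        self.tolerance_seconds = tolerance_seconds
        self.disagree_since = None

    def observe(self, best_hashes, now=None):
        now = time.monotonic() if now is None else now
        if len(set(best_hashes)) <= 1:
            # All nodes agree: clear any pending disagreement.
            self.disagree_since = None
            return False
        if self.disagree_since is None:
            self.disagree_since = now
        return now - self.disagree_since >= self.tolerance_seconds
```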

Logs

The primary log is debug.log in the network-specific data directory. Use logs for diagnosis, but alert from structured probes where possible.
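
For the "recent debug.log errors" step of triage, a sketch that scans only the tail of the log. The path must point at debug.log in your network-specific data directory, and the search strings are illustrative assumptions. They do not describe tidecoind's actual message formats.

```python
from collections import deque
from pathlib import Path


def recent_log_errors(log_path, max_lines=2000, needles=("error", "corrupt")):
    """Return lines among the last `max_lines` of the log containing any
    needle (case-insensitive). The needle list is an assumed starting point."""
    tail = deque(maxlen=max_lines)  # keeps only the most recent lines
    with Path(log_path).open("r", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            tail.append(line.rstrip("\n"))
    lowered = [n.lower() for n in needles]
    return [line for line in tail if any(n in line.lower() for n in lowered)]
```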

Useful log areas:

Category	How to enable	Use
Network	`-debug=net`	Peer discovery, connection, and message flow.
RPC	`-debug=rpc`	RPC request handling and failures.
HTTP/REST	`-debug=http`	REST and HTTP server behavior.
ZMQ	`-debug=zmq`	ZMQ notifier setup and publishing issues.
Mempool	`-debug=mempool`	Mempool admission and eviction behavior.
Validation	`-debug=validation`	Block and transaction validation.
Reindex	`-debug=reindex`	Reindex progress and failures.
Block storage	`-debug=blockstorage`	Block file and pruning issues.
Coin DB	`-debug=coindb`	Chainstate database issues.
Wallet DB	`-debug=walletdb`	Wallet database issues.
Tor/I2P/proxy	`-debug=tor`, `-debug=i2p`, `-debug=proxy`	Privacy network connectivity.

Enable narrow categories during incident response. Avoid leaving broad -debug=all logging enabled on production nodes unless disk retention is sized for it.
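
A sketch that builds narrow `-debug=` arguments for an incident and refuses `all` unless explicitly allowed. The category set mirrors the table in this section and may not be every category tidecoind accepts. Rejecting other names is this sketch's typo guard, so extend the set if you rely on other categories.

```python
# Categories listed in this section's table; assumed, possibly incomplete.
KNOWN_CATEGORIES = {
    "net", "rpc", "http", "zmq", "mempool", "validation", "reindex",
    "blockstorage", "coindb", "walletdb", "tor", "i2p", "proxy",
}


def debug_flags(categories, allow_all=False):
    """Return `-debug=<category>` arguments, rejecting broad or unknown ones."""
    flags = []
    for cat in categories:
        if cat == "all":
            if not allow_all:
                raise ValueError("-debug=all is broad; size disk retention first")
        elif cat not in KNOWN_CATEGORIES:
            raise ValueError(f"unknown debug category: {cat}")
        flags.append(f"-debug={cat}")
    return flags
```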

Disk and data growth

Monitor both filesystem free space and the node’s reported chain data size. The size_on_disk field does not include every operational file, wallet backup, log, or container overlay, so filesystem alerts still matter.
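
A sketch combining the two measurements. It assumes `data_dir` is on the volume holding the node's data and that `size_on_disk` comes from getblockchaininfo. The 10% free-space threshold is illustrative. The alert flag is derived only from the filesystem, because size_on_disk does not cover everything on the volume.

```python
import shutil


def disk_report(data_dir, size_on_disk, min_free_fraction=0.10):
    """Report filesystem free space next to the node's size_on_disk.

    min_free_fraction is an assumed threshold; `low_disk` is based on the
    filesystem only.
    """
    usage = shutil.disk_usage(data_dir)
    free_fraction = usage.free / usage.total
    return {
        "free_bytes": usage.free,
        "free_fraction": free_fraction,
        "reported_chain_bytes": size_on_disk,
        "low_disk": free_fraction < min_free_fraction,
    }
```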

Nodes with txindex=1, block filters, or other indexes need more space and can take longer to recover after unclean shutdowns. Pruned nodes reduce block-file storage but are not suitable for every explorer, exchange, or indexer workload.

Incident triage

When a node alerts:

1. Check process state and recent restarts.
2. Check getblockchaininfo and getnetworkinfo.
3. Compare best block hash with another controlled node.
4. Check free disk and recent debug.log errors.
5. Identify whether the issue is local process health, peer connectivity, disk, wallet state, or chain disagreement.
6. Pause dependent services if the node is used for deposits, withdrawals, or explorer indexing.
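
The classification step can be sketched as a function over the observations gathered in the earlier steps. The argument names and the check order (most local causes first) are assumptions of this sketch. Its output only decides where to look next. It does not replace reading the logs.

```python
def classify_incident(process_up, connections, network_active, low_disk,
                      wallets_missing, tips_agree, min_peers=1):
    """Map triage observations to the first matching issue class.

    Order is an assumption: local causes are checked before network-wide ones.
    """
    if not process_up:
        return "local process health"
    if low_disk:
        return "disk"
    if not network_active or connections < min_peers:
        return "peer connectivity"
    if not tips_agree:
        return "chain disagreement"
    if wallets_missing:
        return "wallet state"
    return "unclassified: review debug.log"
```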

Do not start with -reindex unless logs or database errors call for it. Reindexing is a recovery action with real downtime.

Source of truth

Topic	Source
RPC probes	`getblockchaininfo`, `getnetworkinfo`, `getmempoolinfo`, `uptime`
Log categories	`../tidecoin/src/logging.cpp`, `../tidecoin/src/init/common.cpp`
REST and ZMQ operation	REST and ZMQ
External metrics integration	Monitoring Integration