Infrastructure and node monitoring
When you use Torq as middleware, you can get detailed insight into how well your Lightning node is functioning by
exposing Prometheus metrics and OpenTelemetry traces.
To enable Prometheus, add the following to your Torq configuration:
torq.prometheus.path = "localhost:7070"
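Before wiring up Prometheus, you can check by hand that Torq is exposing metrics. The sketch below assumes the metrics are served under /metrics on the address from torq.prometheus.path (the same path the scrape config in ~/prometheus.yml uses). parse_metrics and fetch_metrics are hypothetical helper names, and the parser only handles the simple form of the Prometheus text format (no spaces inside label values).

```python
import urllib.request

# Assumption: Torq serves metrics under /metrics on the torq.prometheus.path address.
TORQ_METRICS_URL = "http://localhost:7070/metrics"


def parse_metrics(text: str) -> dict[str, float]:
    """Parse Prometheus text exposition lines into {series: value}.

    Simplified: # HELP / # TYPE comment lines are skipped, the series name
    (including any {labels}) is kept as the key, and optional timestamps
    are ignored. Label values containing spaces are not supported.
    """
    samples: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        # parts[0] is the series, parts[1] the value, parts[2] an optional timestamp
        samples[parts[0]] = float(parts[1])
    return samples


def fetch_metrics(url: str = TORQ_METRICS_URL) -> dict[str, float]:
    """Download and parse the metrics page (raises on connection errors)."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

Calling fetch_metrics() on the Torq host returns a dictionary of series names to values; a connection error means the Prometheus endpoint is not enabled or is listening on a different address.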
To enable OpenTelemetry, several options are required. Here is an example for Jaeger over HTTP:
otel.exporter.type = "otlpHttp"
otel.exporter.endpoint = "http://localhost:4318"
sampler.fraction = 0.1
Jaeger over gRPC:
otel.exporter.type = "otlpGrpc"
otel.exporter.endpoint = "http://localhost:4317"
sampler.fraction = 0.1
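Note: make sure OTLP is enabled in Jaeger with --collector.otlp.enabled=true; the Jaeger container below does this through the COLLECTOR_OTLP_ENABLED=true environment variable.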
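sampler.fraction controls what share of traces is kept. Assuming Torq maps it onto a trace-ID-ratio sampler such as TraceIDRatioBased in the OpenTelemetry Go SDK (not confirmed here), a value of 0.1 keeps roughly 1 in 10 traces. The sketch below mirrors that decision rule: the lower 8 bytes of the 16-byte trace ID are read as a big-endian integer, shifted right by one bit and compared with fraction × 2^63. should_sample and observed_rate are illustrative names.

```python
import os


def should_sample(trace_id: bytes, fraction: float) -> bool:
    """Trace-ID ratio sampling in the style of the OpenTelemetry Go SDK."""
    if len(trace_id) != 16:
        raise ValueError("trace IDs are 16 bytes")
    if fraction >= 1:
        return True  # always sample
    fraction = max(fraction, 0.0)  # negative fractions behave like 0
    upper_bound = int(fraction * (1 << 63))
    # Lower 8 bytes of the trace ID as a big-endian integer, shifted right one bit
    x = int.from_bytes(trace_id[8:16], "big") >> 1
    return x < upper_bound


def observed_rate(fraction: float, n: int = 100_000) -> float:
    """Share of n random trace IDs that the sampler would keep."""
    kept = sum(should_sample(os.urandom(16), fraction) for _ in range(n))
    return kept / n
```

Because the decision depends only on the trace ID, every component that sees the same trace makes the same choice. Raising the fraction gives more detail in Jaeger at the cost of more export and storage overhead.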
Prometheus is for real-time statistics.
OpenTelemetry (Jaeger) is for tracing, which is more about looking back at how things behaved at a certain point in time:
how many executions there were and how long they took to complete.
Grafana is for creating insightful graphs.
Prometheus node exporter is for OS metrics such as memory, CPU and disk space.
Below is an example configuration for monitoring your entire stack, including the OS. The example setup uses podman
with host networking. If you use host networking, make sure you understand the consequence: every port the containers
listen on is opened directly on the host's network interfaces, so you need a firewall!
File ~/prometheus.yml
global:
  scrape_interval: 5s
  external_labels:
    monitor: 'monitoring-torq-stack'
scrape_configs:
  - job_name: 'prometheus-torq'
    metrics_path: '/metrics'
    static_configs:
      - targets:
          - 'localhost:7070'
  - job_name: 'prometheus-jaeger'
    metrics_path: '/metrics'
    static_configs:
      - targets:
          - 'localhost:14269'
  - job_name: 'prometheus-node'
    static_configs:
      - targets: ['localhost:9100']
rule_files:
  - '/alert.rules'
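Once Prometheus is running, its HTTP API reports whether each scrape target above is healthy. This sketch assumes Prometheus listens on its default port 9090 and that the /prometheus prefix from --web.route-prefix in the container command also applies to the API; unhealthy_targets and fetch_targets are hypothetical helper names.

```python
import json
import urllib.request

# Default Prometheus port plus the route prefix set with --web.route-prefix
TARGETS_URL = "http://localhost:9090/prometheus/api/v1/targets"


def unhealthy_targets(payload: dict) -> list[tuple[str, str, str]]:
    """Return (job, instance, health) for every active target that is not up."""
    result = []
    for target in payload["data"]["activeTargets"]:
        if target["health"] != "up":  # health is "up", "down" or "unknown"
            labels = target["labels"]
            result.append(
                (labels.get("job", "?"), labels.get("instance", "?"), target["health"])
            )
    return result


def fetch_targets(url: str = TARGETS_URL) -> dict:
    """Fetch the raw /api/v1/targets response as a dictionary."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)
```

An empty list from unhealthy_targets(fetch_targets()) means Torq, Jaeger and the node exporter are all being scraped successfully.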
File ~/alert.rules
groups:
  - name: generic.service_down
    rules:
      - alert: service_down
        expr: up == 0
        for: 30s
        annotations:
          summary: Instance is down
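Prometheus evaluates rules every evaluation_interval, which defaults to 1m because ~/prometheus.yml only sets scrape_interval. With that default, for: 30s in practice means the alert fires on the second consecutive failing evaluation, roughly a minute after a target goes down. No Alertmanager is configured here, so firing alerts are visible on the Prometheus Alerts page only. The sketch below simulates the pending/firing states for one target; alert_states is an illustrative name, and it ignores keep_firing_for and scrape timing.

```python
def alert_states(up_samples: list[int], eval_interval_s: float, for_s: float) -> list[str]:
    """Simulate Prometheus alert states for `expr: up == 0` on a single target.

    up_samples[i] is the value of `up` seen at the i-th rule evaluation.
    """
    states = []
    active_since = None  # evaluation time at which up == 0 first held
    for i, up in enumerate(up_samples):
        now = i * eval_interval_s
        if up == 0:
            if active_since is None:
                active_since = now
            # The alert fires once the condition has held for at least `for`
            states.append("firing" if now - active_since >= for_s else "pending")
        else:
            active_since = None  # condition cleared: alert resets
            states.append("inactive")
    return states
```

For example, alert_states([1, 0, 0], 60, 30) gives ["inactive", "pending", "firing"], whereas a 5s evaluation interval would need seven consecutive failing evaluations (0s to 30s) before firing.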
File ~/grafana.ini
[paths]
logs = /log
[server]
root_url = http://localhost/grafana
serve_from_sub_path = true
router_logging = true
[auth.anonymous]
enabled = true
;org_name = torq.co
;org_role = Viewer
Start the Grafana container:
podman run -d --name grafana -h grafana --network=host --restart=always -v /etc/localtime:/etc/localtime:ro -v grafanaVolume:/var/lib/grafana -v logVolume:/log -e "GF_SECURITY_ADMIN_PASSWORD=YOURSECUREPASSWORDGOESINHERE" -e "GF_INSTALL_PLUGINS=grafana-clock-panel,grafana-worldmap-panel,grafana-piechart-panel,briangann-datatable-panel" -v ~/grafana.ini:/etc/grafana/grafana.ini:z docker.io/grafana/grafana
Start the Prometheus container:
podman run -d --name prometheus -h prometheus --network=host --restart=always -v /etc/localtime:/etc/localtime:ro -v ~/prometheus.yml:/prometheus.yml:z -v ~/alert.rules:/alert.rules:z -v logVolume:/log docker.io/prom/prometheus --config.file=/prometheus.yml --web.route-prefix=/prometheus --web.external-url=http://localhost/prometheus
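Grafana still needs Prometheus as a data source. Assuming Grafana listens on its default port 3000 and serves under /grafana because of serve_from_sub_path = true, and that Prometheus uses its default port 9090 with the /prometheus route prefix from the command above, the following sketch builds a request for Grafana's data source HTTP API (POST /api/datasources). datasource_request is a hypothetical helper; adding the data source through the Grafana UI works just as well.

```python
import base64
import json
import urllib.request

# Assumed URLs: Grafana default port under the /grafana sub path,
# Prometheus default port under its --web.route-prefix.
GRAFANA_API = "http://localhost:3000/grafana/api/datasources"
PROMETHEUS_URL = "http://localhost:9090/prometheus"


def datasource_request(password: str, user: str = "admin") -> urllib.request.Request:
    """Build a POST request registering Prometheus as Grafana's default data source."""
    body = json.dumps({
        "name": "Prometheus",
        "type": "prometheus",
        "url": PROMETHEUS_URL,
        "access": "proxy",  # Grafana's backend queries Prometheus, not the browser
        "isDefault": True,
    }).encode("utf-8")
    # Basic auth with the admin user and GF_SECURITY_ADMIN_PASSWORD
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        GRAFANA_API,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
    )
```

Send it with urllib.request.urlopen(datasource_request("YOURSECUREPASSWORDGOESINHERE")) once both containers are running. Because Grafana also uses host networking, localhost in the data source URL reaches Prometheus directly.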
Start the Jaeger container (OpenTelemetry). With host networking, Jaeger listens directly on the host's ports, including 16686 (UI), 4317/4318 (OTLP gRPC/HTTP), 14269 (admin and metrics) and 9411 (Zipkin), so -p port mappings are not needed; podman discards them in host network mode:
podman run -d --name jaeger -h jaeger --network=host --restart=always -v /etc/localtime:/etc/localtime:ro -e COLLECTOR_ZIPKIN_HOST_PORT=:9411 -e COLLECTOR_OTLP_ENABLED=true -v logVolume:/log docker.io/jaegertracing/all-in-one:1.45
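Once Torq is exporting traces, its service should appear in the Jaeger UI at http://localhost:16686. The same list can be read from the JSON endpoint the UI itself uses, /api/services on the query port. That endpoint is internal to the Jaeger UI rather than a documented stable API, so treat this sketch as a debugging aid; service_names and fetch_services are hypothetical helpers, and the service name Torq reports is not stated here, so inspect the returned list rather than checking for a hard-coded name.

```python
import json
import urllib.request

# JSON endpoint used by the Jaeger UI on the query port (internal, may change)
SERVICES_URL = "http://localhost:16686/api/services"


def service_names(payload: dict) -> list[str]:
    """Extract the sorted list of services that have reported spans."""
    return sorted(payload.get("data") or [])


def fetch_services(url: str = SERVICES_URL) -> dict:
    """Fetch the raw services response as a dictionary."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)
```

If service_names(fetch_services()) never lists anything besides Jaeger's own services, check the otel.exporter settings in Torq and the OTLP note above.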
Start the Prometheus node exporter (OS metrics, podman edition):
podman run -d --name prometheus-node-exporter -h prometheus-node-exporter --network=host --restart=always -v /etc/localtime:/etc/localtime:ro --pid="host" -v "/:/host:ro,rslave" quay.io/prometheus/node-exporter:latest --path.rootfs=/host
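Because every container runs with host networking, it is worth checking which of the stack's ports are reachable. This sketch tries a TCP connection to each port; the port list combines the ports named in this section with the Grafana (3000) and Prometheus (9090) defaults, and is_open, scan and STACK_PORTS are illustrative names.

```python
import socket

# Ports used by the example stack (Grafana and Prometheus at their defaults)
STACK_PORTS = {
    "torq-metrics": 7070,
    "grafana": 3000,
    "prometheus": 9090,
    "node-exporter": 9100,
    "jaeger-ui": 16686,
    "jaeger-admin": 14269,
    "otlp-grpc": 4317,
    "otlp-http": 4318,
    "zipkin": 9411,
}


def is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable or timed out
        return False


def scan(host: str, ports: dict[str, int] = STACK_PORTS) -> dict[str, bool]:
    """Map each service name to whether its port accepts connections."""
    return {name: is_open(host, port) for name, port in ports.items()}
```

Run scan("localhost") on the server to confirm every service is listening, then run scan with the server's address from another machine: anything reported as open there is reachable from outside and should be blocked by your firewall unless you intend to expose it.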