Infrastructure and node monitoring

Last updated on Jan 19, 2024

When you use Torq as middleware, you can get detailed insight into how well your Lightning node is functioning by exposing Prometheus metrics and OpenTelemetry traces.

To enable Prometheus, you need to provide extra configuration:

torq.prometheus.path = "localhost:7070"
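
One way to confirm that the endpoint is serving metrics is to fetch the page and parse it. The following is a minimal Python sketch, assuming the metrics are served at http://localhost:7070/metrics (the /metrics path is taken from the scrape configuration later on this page). The function names are hypothetical, and the parser is simplified: it skips comment lines and does not handle optional timestamps.

```python
import urllib.request


def parse_metrics(text):
    """Parse Prometheus text exposition lines into {series: value}.

    Simplified: comment lines (# HELP / # TYPE) are skipped and lines
    with an optional trailing timestamp are not supported.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated field; the series name
        # (with optional {labels}) is everything before it.
        series, _, value = line.rpartition(" ")
        samples[series.strip()] = float(value)
    return samples


def fetch_metrics(url="http://localhost:7070/metrics", timeout=5):
    """Fetch and parse the metrics page (the URL is an assumption)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_metrics(response.read().decode("utf-8"))
```

Calling fetch_metrics() against a running Torq instance should return a non-empty dictionary; an exception or an empty result suggests the endpoint is not enabled or is listening elsewhere.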

Enabling OpenTelemetry requires several options. Here is an example for Jaeger-HTTP:

otel.exporter.type = "otlpHttp"
otel.exporter.endpoint = "http://localhost:4318"
sampler.fraction = 0.1

Jaeger-gRPC:

otel.exporter.type = "otlpGrpc"
otel.exporter.endpoint = "http://localhost:4317"
sampler.fraction = 0.1

Note: Make sure OTLP is enabled in Jaeger with --collector.otlp.enabled=true
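
The sampler.fraction value of 0.1 in the examples above suggests that roughly one in ten traces is exported. The Python sketch below illustrates how ratio-based sampling can work, assuming the setting behaves like OpenTelemetry's trace-ID-ratio sampler. That is an assumption about Torq's behavior, not something this page confirms, and should_sample is a hypothetical helper rather than part of any library.

```python
import random


def should_sample(trace_id, fraction):
    """Keep a trace if its ID falls in the lowest `fraction` of the ID space.

    trace_id is treated as a 64-bit unsigned integer. The decision is
    deterministic, so every span of the same trace gets the same answer.
    """
    if fraction >= 1.0:
        return True
    if fraction <= 0.0:
        return False
    upper_bound = int(fraction * (1 << 64))
    return trace_id < upper_bound


# Estimate the kept share over many random trace IDs.
rng = random.Random(42)  # fixed seed so the run is repeatable
ids = [rng.getrandbits(64) for _ in range(100_000)]
kept = sum(should_sample(trace_id, 0.1) for trace_id in ids)
print(f"kept {kept / len(ids):.1%} of traces")
```

A lower fraction reduces tracing overhead and storage at the cost of seeing fewer individual executions in Jaeger.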

Prometheus is for real-time statistics.

OpenTelemetry (Jaeger) is for tracing: looking back at how things behaved at a certain point in time, such as how many executions took place and how long they took to complete.

Grafana is for creating insightful graphs.

Prometheus node exporter is for OS metrics such as memory, CPU, and disk space.

Below is an example configuration showing how you could monitor your entire stack, including the OS. The example setup uses Podman with host networking. Whenever you use host networking, make sure you understand that you need a firewall!

File ~/prometheus.yml

global:
  scrape_interval: 5s
  external_labels:
    monitor: 'monitoring-torq-stack'
scrape_configs:
  - job_name: 'prometheus-torq'
    metrics_path: '/metrics'
    static_configs:
      - targets:
        - 'localhost:7070'
  - job_name: 'prometheus-jaeger'
    metrics_path: '/metrics'
    static_configs:
      - targets:
        - 'localhost:14269'
  - job_name: 'prometheus-node'
    static_configs:
      - targets: ['localhost:9100']
rule_files:
    - '/alert.rules'
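
If a job shows up as down in Prometheus, a quick first check is whether anything is listening on the target's port. The following is a minimal Python sketch using only the standard library; the host and port pairs are copied from the scrape configuration above, and is_listening is a hypothetical helper.

```python
import socket

# Host/port pairs copied from the scrape_configs above.
TARGETS = {
    "prometheus-torq": ("localhost", 7070),
    "prometheus-jaeger": ("localhost", 14269),
    "prometheus-node": ("localhost", 9100),
}


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for job, (host, port) in TARGETS.items():
        state = "up" if is_listening(host, port) else "DOWN"
        print(f"{job:20s} {host}:{port} {state}")
```

A successful TCP connection only shows that the port is open; it does not prove that the service behind it returns valid metrics.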

File ~/alert.rules

groups:
- name: generic.service_down
  rules:
  - alert: service_down
    expr: up == 0
    for: 30s
    annotations:
      summary: Instance is down

File ~/grafana.ini

[paths]
logs = /log

[server]
root_url = http://localhost/grafana
serve_from_sub_path = true
router_logging = true

[auth.anonymous]
enabled = true
;org_name = torq.co
;org_role = Viewer

Boot grafana container

podman run -d --name grafana -h grafana --network=host --restart=always -v /etc/localtime:/etc/localtime:ro -v grafanaVolume:/var/lib/grafana -v logVolume:/log -e "GF_SECURITY_ADMIN_PASSWORD=YOURSECUREPASSWORDGOESINHERE" -e "GF_INSTALL_PLUGINS=grafana-clock-panel,grafana-worldmap-panel,grafana-piechart-panel,briangann-datatable-panel" -v ~/grafana.ini:/etc/grafana/grafana.ini:z docker.io/grafana/grafana

Boot prometheus container

podman run -d --name prometheus -h prometheus --network=host --restart=always -v /etc/localtime:/etc/localtime:ro -v ~/prometheus.yml:/prometheus.yml:z -v ~/alert.rules:/alert.rules:z -v logVolume:/log docker.io/prom/prometheus --config.file=/prometheus.yml --web.route-prefix=/prometheus --web.external-url=http://localhost/prometheus

Boot Jaeger container (OpenTelemetry)

podman run -d --name jaeger -h jaeger --network=host --restart=always -v /etc/localtime:/etc/localtime:ro -e COLLECTOR_ZIPKIN_HOST_PORT=:9411 -e COLLECTOR_OTLP_ENABLED=true -p 6831:6831/udp -p 6832:6832/udp -p 5778:5778 -p 16686:16686 -p 4317:4317 -p 4318:4318 -p 14250:14250 -p 14268:14268 -p 14269:14269 -p 9411:9411 -v logVolume:/log docker.io/jaegertracing/all-in-one:1.45

Boot prometheus node exporter (OS metrics podman edition)

podman run -d --name prometheus-node-exporter -h prometheus-node-exporter --network=host --restart=always -v /etc/localtime:/etc/localtime:ro --pid="host" -v "/:/host:ro,rslave" quay.io/prometheus/node-exporter:latest --path.rootfs=/host