Docker Monitoring using Prometheus, cAdvisor, Node Exporter and Grafana


Introduction

As companies adopt Docker containers to achieve improved efficiency and velocity of application deployment, monitoring and observability become increasingly critical for running containers in production. Monitoring provides valuable metrics, logs and insights into how both applications and infrastructure are performing. This enables teams to troubleshoot issues proactively before they cause downstream impacts, as well as optimize usage and spending on containerized resources.

In this comprehensive guide, we will set up an integrated open-source monitoring stack for visualizing Docker host and container metrics using:

  • Prometheus: A popular open-source time series database for storing and querying numeric metrics
  • cAdvisor: A utility which collects resource usage and performance data from running containers
  • Node Exporter: Exposes hardware and OS metrics from physical and virtual Docker server hosts
  • Grafana: Feature-rich dashboards and graphs for analytics and visualization

Collectively these tools provide end-to-end observability into Dockerized environments, from physical infrastructure up through running applications. We will deploy them with Docker and set up metrics ingestion pipelines, storage, querying and dashboarding to understand exactly how containers utilize host resources over time.

Why Monitor Containers and Hosts?

First, why is monitoring so important for container infrastructure?

As applications are packaged into portable, isolated containers, they become distributed across fluid pools of container hosts, such as nodes in a cluster.

This ephemeral architecture introduces visibility challenges including:

  • Understanding how containers utilize physical resources like CPU, memory, disk and network
  • Mapping which containers run on which hosts over time
  • Correlating application performance with lower-level resource metrics
  • Identifying trends and spikes in utilization to prevent resource exhaustion
  • Tuning container configurations and placements based on behavior
  • Establishing alerts to detect anomalies or problems

While containerization brings architectural benefits through loose coupling and portability, the environment grows increasingly complex from an operations view.

By collecting, storing and charting detailed time series metrics we regain visibility including:

  • Live resource usage monitoring with breakdowns per host, container, namespace
  • Historical trend analysis for capacity planning and optimization
  • Metrics-driven alerting when thresholds are crossed
  • Correlation between application response times and lower level characteristics
  • Dashboarding for understanding infrastructure at a glance

Understanding exactly how applications consume resources empowers more efficient operation, automated responses, and helps prevent instability from overutilization or resource bottlenecks.

Prerequisites

To follow along with all components, you will need:

  • A Linux server with a recent version of Docker installed (Docker Compose is not required for this guide; we use plain docker run commands)
  • Root or sudo access, so the exporter containers can bind-mount host paths such as /, /sys and /var/run read-only
  • Ports 9090 (Prometheus), 8080 (cAdvisor), 9100 (Node Exporter) and 3000 (Grafana) reachable from wherever you will access the monitoring tools
  • Basic Linux administration and Docker familiarity

For convenience, we will use an Ubuntu 20.04 system. But any modern Linux distribution should work well.
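
Before starting, you can quickly confirm that Docker is installed and the daemon is reachable (the exact version output will vary):

$ docker --version
$ docker info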

Now let’s explore how to wire up Prometheus metrics ingestion and visualized dashboards for container environments!

Step 1 – Create Isolated Docker Monitoring Network

First, create a user-defined bridge network for communication between the monitoring services using:

$ docker network create monitoring-net

Containers attached to a user-defined bridge network can reach each other by container name through Docker's embedded DNS, so the services can be addressed by consistent hostnames instead of IP addresses that change between restarts:

prometheus
grafana 
cadvisor
node-exporter

Now run each service attached to this network for simplified connectivity.
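
You can verify that the network exists and, once the services are running, see which containers are attached to it:

$ docker network ls
$ docker network inspect monitoring-net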

Step 2 – Set Up Prometheus Time Series Database

Prometheus is a specialized time series database optimized for ingesting and querying numeric metrics such as counters, gauges and histograms, even at high scale and cardinality. It scrapes and stores metrics at regular intervals over time.

Prometheus runs well in containers and integrates with Docker environments using exporters to provide metrics about containers, images, volumes and more.

Pull the latest Prometheus server image:

$ docker pull prom/prometheus:latest

Next, create a directory on the host to hold the configuration and to persist metric data across container restarts, and make it writable by the container user (the official image runs as the unprivileged nobody user, UID 65534):

$ sudo mkdir -p /prometheus-data
$ sudo chown -R 65534:65534 /prometheus-data

We need to define a Prometheus configuration file to discover what to monitor. Create this at /prometheus-data/prometheus.yml on your host with initial job definitions:

global:
  scrape_interval: 10s
  external_labels:
    monitor: production-01
scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'cadvisor'
    scrape_interval: 5s
    static_configs:
      - targets: ['cadvisor:8080']
  - job_name: 'node'
    scrape_interval: 5s 
    static_configs:
      - targets: ['node-exporter:9100']

This configures connectivity to:

  • Prometheus self-metrics on port 9090
  • cAdvisor exposing container metrics on 8080
  • Node Exporter for host OS metrics on 9100

We will define these services next.
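
Before starting the server, you can optionally validate the configuration with promtool, which ships inside the Prometheus image (adjust the path if you stored the file elsewhere):

$ docker run --rm \
    -v /prometheus-data:/prometheus-data \
    --entrypoint=/bin/promtool \
    prom/prometheus:latest \
    check config /prometheus-data/prometheus.yml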

Finally, run the Prometheus server in detached mode:

$ docker run -d --name=prometheus \
    --network=monitoring-net \
    -p 9090:9090 \
    -v /prometheus-data:/prometheus-data \
    prom/prometheus:latest \
    --config.file=/prometheus-data/prometheus.yml \
    --storage.tsdb.path=/prometheus-data

This launches Prometheus attached to the monitoring network, loading its configuration from the mounted directory and writing its time series data there so it survives container restarts.

You can access the Prometheus UI at http://<server-ip>:9090. We will integrate Grafana shortly for dashboards.
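
Once the cAdvisor and Node Exporter services from the next steps are running, you can confirm that Prometheus sees all three scrape targets either on its Status → Targets page or via the HTTP API:

$ curl http://localhost:9090/api/v1/targets
$ curl 'http://localhost:9090/api/v1/query?query=up'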

Step 3 – Install cAdvisor for Container Metrics

cAdvisor (Container Advisor) is a utility for collecting, aggregating, processing, and exporting performance and resource usage metrics from running containers.

For example, cAdvisor exposes CPU, memory, filesystem, and network usage statistics per container. This allows understanding how much resources containers utilize relative to their host machines over time.

In Kubernetes clusters, cAdvisor runs on every node (it is built into the kubelet) to monitor resource usage. Here we will run it standalone with Docker.

Pull the latest cAdvisor image:

$ docker pull gcr.io/cadvisor/cadvisor:latest

Then launch the containerized cAdvisor agent:

$ docker run \
  --name=cadvisor \
  --network=monitoring-net \
  --volume=/:/rootfs:ro \
  --volume=/var/run:/var/run:rw \
  --volume=/sys:/sys:ro \
  --volume=/var/lib/docker/:/var/lib/docker:ro \
  --publish=8080:8080 \
  --detach=true \
  gcr.io/cadvisor/cadvisor:latest

This runs cAdvisor with read-only access to:

  • The host root filesystem (mounted at /rootfs) for per-container storage usage
  • /var/run, which includes the Docker socket, for detailed metrics about running containers
  • Host information such as CPU and memory via /sys, plus container metadata under /var/lib/docker

cAdvisor reads these sources and exposes aggregated metrics on port 8080.

Prometheus reaches cAdvisor at cadvisor:8080 over the monitoring network, as defined in the cadvisor scrape job earlier.
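
To verify that container metrics are being exposed, query the endpoint directly from the host (exact metric names can vary slightly between cAdvisor versions):

$ curl -s http://localhost:8080/metrics | grep container_cpu_usage_seconds_total | head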

Step 4 – Installing Node Exporter on Docker Hosts

While cAdvisor exposes metrics about running containers, Node Exporter gathers OS and hardware metrics from the Docker hosts themselves, such as CPU, memory, disk utilization, network traffic, systemd services, and more.

This reveals performance and saturation issues at the operating system or hardware level that could impact the applications running on top.

Launch Node Exporter with read-only access to the host root filesystem so that it reports metrics for the host rather than for its own container:

$ docker run -d \
  --name=node-exporter \
  --network=monitoring-net \
  --pid=host \
  -v /:/host:ro,rslave \
  -p 9100:9100 \
  prom/node-exporter:latest \
  --path.rootfs=/host

This exposes host metrics on port 9100. Prometheus scrapes them through the node job defined earlier, producing metrics such as:

node_cpu_seconds_total{mode="idle"}
node_memory_MemAvailable_bytes 
node_network_transmit_bytes_total

We now have two pipelines feeding host and container metrics into Prometheus.
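
With both exporters being scraped, you can already try a few PromQL expressions in the Prometheus UI at http://<server-ip>:9090/graph. For example, assuming the default metric names shown above, overall host CPU usage and per-container memory usage look roughly like this:

100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

sum by (name) (container_memory_usage_bytes{name!=""})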

Step 5 – Install Grafana for Beautiful Data Visualization

Raw metric values in Prometheus can be hard to interpret through its command line or expression browser alone. For richer analysis, Grafana provides flexible dashboards with graphs, gauges and breakdowns that combine multiple metrics together.

Pull and run the official Grafana image:

$ docker run -d \
  --name=grafana \
  --network=monitoring-net \
  -p 3000:3000 \
  grafana/grafana:latest

This runs Grafana connected to our monitoring services on port 3000.

Navigate to http://<server-ip>:3000 and log into Grafana using the default credentials (you will be prompted to set a new password on first login):

  • Username: admin
  • Password: admin

Let’s set up our Prometheus data source next…

Step 6 – Configure Prometheus Data Source in Grafana

From the Grafana sidebar menu, click on “Configuration” and then “Data Sources” (newer Grafana versions label this “Connections” → “Data sources”).

Here you can manage connections to monitoring databases like Prometheus, Graphite, InfluxDB and more.

Select “Add data source”, choose Prometheus, and set the following fields:

  • Name: Prometheus
  • URL: http://prometheus:9090
  • Access: Server (proxy)

Then click “Save and Test”. Grafana now has access to all metrics stored in Prometheus!
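
If you prefer configuration as code, Grafana can also load the data source from a provisioning file mounted into the container under /etc/grafana/provisioning/datasources/. A minimal sketch (the file name, e.g. prometheus.yml, is arbitrary):

apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true

Mount the directory containing this file with -v when starting the Grafana container and the data source is created automatically at startup.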

Step 7 – Import Dashboard Templates

Rather than creating dashboards completely from scratch, we can leverage the Grafana community dashboard ecosystem with pre-built templates.

Hover over the “+” icon on the left menu and select “Import”. Then enter Dashboard ID 893 for a container-level view built on cAdvisor metrics, or 1860 (“Node Exporter Full”) for the Docker host view.

Grafana imports these templates pre-populated with graphs and breakdowns for our new Prometheus data source. Customize them or extend with more focused dashboards!

Now your Grafana instance should have insightful dashboards monitoring:

  • Per-Container Resource Usage Metrics
  • Host-Level OS, CPU, Memory and Docker Stats

Prometheus and Grafana now provide end-to-end data pipelines, storage and visualization for container environments. Next let’s discuss administrative functionality to maintain and scale our new monitoring stack.

Administering the Monitoring Services

Now that you have Prometheus, Node Exporter, cAdvisor and Grafana all running, here are best practices for administering these over the long term.

Persisting Prometheus Metrics History

By default Prometheus stores metrics locally on disk which limits capacity and durability. For production systems, use remote storage to retain history for longer trend analysis and capacity planning.

Popular long-term storage systems compatible with Prometheus include:

  • Thanos
  • Cortex
  • VictoriaMetrics

These backends typically persist data to object storage such as AWS S3, Azure Blob Storage or Google Cloud Storage. Depending on the backend, integration happens either through a sidecar (as with Thanos) or through the remote_write: and remote_read: sections of prometheus.yml.
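
As a sketch, a remote_write entry for a backend that accepts the Prometheus remote write protocol looks like this (the endpoint URL is a placeholder for whatever your chosen backend exposes):

remote_write:
  - url: "http://remote-storage.example.com/api/v1/write"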

Retaining Dashboard History in Grafana

By default the Grafana container stores dashboards, users and settings in a SQLite database under /var/lib/grafana, which is lost when the container is removed. To persist them, mount a volume at that path or point Grafana at an external database such as PostgreSQL or MySQL via the [database] section of grafana.ini (or the corresponding GF_DATABASE_* environment variables). Also consider exporting dashboard JSON to source control for version history.
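
A sketch of both approaches as a variant of the Step 5 run command, assuming a named Docker volume grafana-data and a hypothetical PostgreSQL instance reachable as grafana-db on the same network:

$ docker run -d \
  --name=grafana \
  --network=monitoring-net \
  -p 3000:3000 \
  -v grafana-data:/var/lib/grafana \
  -e GF_DATABASE_TYPE=postgres \
  -e GF_DATABASE_HOST=grafana-db:5432 \
  -e GF_DATABASE_NAME=grafana \
  -e GF_DATABASE_USER=grafana \
  -e GF_DATABASE_PASSWORD=changeme \
  grafana/grafana:latest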

Limiting Data Cardinality

Because of the sheer number of per-container series that can be generated, watch out for metric “cardinality explosions”. Use metric relabelling to drop or filter less valuable series and keep storage growth and query performance under control.
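
For example, a sketch of dropping a couple of high-volume cAdvisor series inside the cadvisor scrape job in prometheus.yml (the metric names here are only examples; drop whichever series you do not need):

  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']
    metric_relabel_configs:
      - source_labels: [__name__]
        regex: 'container_tasks_state|container_memory_failures_total'
        action: drop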

Horizontal Sharding

To distribute load as environments grow larger, run multiple Prometheus servers that each scrape a subset of targets (for example, sharded by a hash of the target address) or use federation, where a central Prometheus scrapes aggregated series from the others. Grafana can likewise be made highly available by running several replicas behind a load balancer, backed by a shared database.
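
A sketch of a federation job on a central Prometheus, assuming a downstream instance reachable as prometheus-shard-1:

  - job_name: 'federate'
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job=~"cadvisor|node"}'
    static_configs:
      - targets: ['prometheus-shard-1:9090']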

Summary

In this guide we have built a comprehensive monitoring stack for Docker hosts and container estates including:

  • Prometheus: Central metrics database scraping and storing container/host stats
  • cAdvisor: Gathering resource usage from running containers
  • Node Exporter: Harvesting OS and hardware metrics from servers
  • Grafana: Visualizing metrics through insightful dashboards

Together, these tools provide end-to-end visibility and alerts to detect anomalies across dynamic container environments.

Now that you have a working monitoring foundation, potential next steps include:

  • Integrating log data in Elasticsearch to correlate logs and traces with metrics
  • Building alerting rules and webhooks in Prometheus and Grafana (see the example rule after this list)
  • Autoscaling containers based on utilization with Kubernetes Horizontal Pod Autoscaler
  • Researching tools like Weave Scope or Lens for mapping containers to hosts
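
As a starting point for alerting, here is a sketch of a Prometheus rule file that fires when host CPU stays above 90% for ten minutes; it would be loaded through a rule_files: entry in prometheus.yml and routed by an Alertmanager, neither of which is set up in this guide:

groups:
  - name: host-alerts
    rules:
      - alert: HostHighCpuUsage
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 90
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"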

As you move containers and microservices to production, I hope this exploration of metrics, monitoring and visibility helps you run your infrastructure reliably! Let me know if you have any other questions.
