CockroachDB generates detailed time series metrics for each node in a cluster. This page shows you how to pull these metrics into Prometheus, an open source tool for storing, aggregating, and querying time series data. It also shows you how to connect Grafana and Alertmanager to Prometheus for flexible data visualizations and notifications.

All files used in this tutorial can be found in the monitoring directory of the CockroachDB repository.

Before You Begin

Make sure you have already started a CockroachDB cluster, either locally or in a production environment.

Step 1. Install Prometheus

  1. Download the latest Prometheus tarball for your OS.

  2. Extract the binary and add it to your PATH. This makes it easy to start Prometheus from any shell.

  3. Make sure Prometheus installed successfully:

    $ prometheus -version
    prometheus, version 1.4.1 (branch: master, revision: 2a89e8733f240d3cd57a6520b52c36ac4744ce12)
      build user:       [email protected]
      build date:       20161128-10:02:41
      go version:       go1.7.3

Step 2. Configure Prometheus

  1. Download the starter Prometheus configuration file and aggregation rules for CockroachDB:

    # Configuration file:
    $ wget \
    -O prometheus.y    
    # Aggregation rules:
    $ wget -P rules

    When you examine the configuration file, you'll see that it is set up to scrape the time series metrics of a single, insecure local node every 10 seconds:

    • scrape_interval: 10s defines the scrape interval.
    • metrics_path: '/_status/vars' defines the Prometheus-specific CockroachDB endpoint for scraping time series metrics.
    • scheme: 'http' specifies that the cluster being scraped is insecure.
    • targets: ['localhost:8080'] specifies the hostname and http-port of the CockroachDB node to collect time series metrics from.
  2. Edit the configuration file to match your deployment scenario:

    Scenario                  Config Change
    Multi-node local cluster  Expand the targets field to include 'localhost:<http-port>' for each additional node.
    Production cluster        Change the targets field to include '<hostname>:<http-port>' for each node in the cluster. Also, be sure your network configuration allows TCP communication on the specified ports.
    Secure cluster            Uncomment scheme: 'https' and comment out scheme: 'http'.
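
To make the multi-node scenario concrete, here is a minimal sketch of what the relevant part of the configuration might look like for three local nodes. It is written to a scratch file so the downloaded prometheus.yml stays untouched. The file name prometheus-sketch.yml, the job name, and the ports 8081 and 8082 are assumptions for illustration; the starter file may be laid out differently and may contain settings not shown here.

```shell
# Write an illustrative config to a scratch file (hypothetical name).
cat > prometheus-sketch.yml <<'EOF'
global:
  scrape_interval: 10s              # how often each target is scraped

scrape_configs:
  - job_name: 'cockroachdb'         # assumed job name, for illustration only
    metrics_path: '/_status/vars'   # CockroachDB endpoint for Prometheus
    scheme: 'http'                  # insecure cluster; 'https' for a secure one
    static_configs:
      # One 'localhost:<http-port>' entry per node; 8081 and 8082 are
      # hypothetical ports for the second and third local nodes.
      - targets: ['localhost:8080', 'localhost:8081', 'localhost:8082']
EOF
```

For a production cluster, the same targets list would hold one '<hostname>:<http-port>' entry per node instead.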

Step 3. Start Prometheus

  1. Start the Prometheus server, with the -config.file flag pointing to the configuration file:

    $ prometheus -config.file=prometheus.yml
    INFO[0000] Starting prometheus (version=1.4.1, branch=master, revision=2a89e8733f240d3cd57a6520b52c36ac4744ce12)  source=main.go:77
    INFO[0000] Build context (go=go1.7.3, [email protected], date=20161128-10:02:41)  source=main.go:78
    INFO[0000] Loading configuration file prometheus.yml     source=main.go:250
    INFO[0000] Loading series map and head chunks...         source=storage.go:354
    INFO[0000] 0 series loaded.                              source=storage.go:359
    INFO[0000] Listening on :9090                            source=web.go:248
    INFO[0000] Starting target manager...                    source=targetmanager.go:63
  2. Point your browser to http://<hostname of machine running prometheus>:9090, where you can use the Prometheus UI to query, aggregate, and graph CockroachDB time series metrics.

    • Prometheus auto-completes CockroachDB time series metrics for you. To see a full listing, with descriptions, point your browser to http://<hostname of a CockroachDB node>:8080/_status/vars.
    • For more details on using the Prometheus UI, see the official Prometheus documentation.
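
The same checks can be made from a shell, which is handy for confirming that scraping works before opening the UI. The sketch below assumes curl is installed and that Prometheus and the node are reachable on the default ports used in this tutorial; the PROM_HOST and NODE_HOST variables and the function names are hypothetical helpers, not part of either tool. It uses the Prometheus HTTP API's instant-query endpoint and the built-in up series, which is 1 for each target whose last scrape succeeded.

```shell
# Placeholder hostnames; override them for your deployment.
PROM_HOST="${PROM_HOST:-localhost}"
NODE_HOST="${NODE_HOST:-localhost}"

# URL of the Prometheus instant-query endpoint.
prom_query_url() {
  printf 'http://%s:9090/api/v1/query\n' "$PROM_HOST"
}

# Print the first 20 lines of a node's raw metrics listing.
show_node_metrics() {
  curl -s "http://${NODE_HOST}:8080/_status/vars" | head -n 20
}

# Run an instant query; curl URL-encodes the expression.
query_prometheus() {
  curl -s -G "$(prom_query_url)" --data-urlencode "query=$1"
}

# Example calls (they need the servers to be running):
#   show_node_metrics
#   query_prometheus 'up'
```

The response from query_prometheus is JSON, so it can be piped into any JSON tool you already use.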

Step 4. Visualize metrics in Grafana

Although Prometheus lets you graph metrics, Grafana is a much more powerful visualization tool that integrates with Prometheus easily.

  1. Install and start Grafana for your OS.

  2. Point your browser to http://<hostname of machine running grafana>:3000 and log into the Grafana UI with the default username/password, admin/admin, or create your own account.

  3. Add Prometheus as a datasource, and configure the datasource as follows:

    Field    Definition
    Name     Prometheus
    Default  True
    Type     Prometheus
    Url      http://<hostname of machine running prometheus>:9090
    Access   Direct
  4. Download the starter Grafana dashboards for CockroachDB:

    # runtime dashboard: node status, including uptime, memory, and cpu.
    $ wget
    # storage dashboard: storage availability.
    $ wget
    # sql dashboard: sql queries/transactions.
    $ wget
    # replicas dashboard: replica information and operations.
    $ wget
  5. Add the dashboards to Grafana.
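
If you prefer to script step 3 rather than fill in the form, the data source can also be created through Grafana's HTTP API. The following is a sketch under assumptions: it uses the default admin/admin credentials mentioned above, the GRAFANA_HOST and PROM_HOST variables and the function names are hypothetical, and the accepted field values may vary between Grafana versions, so check the API reference for the version you installed.

```shell
# Placeholder hostnames; override them for your deployment.
GRAFANA_HOST="${GRAFANA_HOST:-localhost}"
PROM_HOST="${PROM_HOST:-localhost}"

# Build a JSON payload mirroring the fields in the table in step 3.
datasource_payload() {
  cat <<EOF
{
  "name": "Prometheus",
  "type": "prometheus",
  "url": "http://${PROM_HOST}:9090",
  "access": "direct",
  "isDefault": true
}
EOF
}

# POST the payload to Grafana using the default admin/admin login.
add_datasource() {
  datasource_payload | curl -s -X POST \
    -H 'Content-Type: application/json' \
    -u admin:admin \
    -d @- \
    "http://${GRAFANA_HOST}:3000/api/datasources"
}

# Example call (needs Grafana to be running):
#   add_datasource
```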

Step 5. Send notifications with Alertmanager

If you like, you can connect Alertmanager to Prometheus to send alerts to email, chat clients, or other channels when specific rules are met. CockroachDB provides starter rules for your convenience.

  1. Download the latest Alertmanager tarball for your OS.

  2. Extract the binary and add it to your PATH. This makes it easy to start Alertmanager from any shell.

  3. Make sure Alertmanager installed successfully:

    $ alertmanager -version
    alertmanager, version 0.5.1 (branch: master, revision: 0ea1cac51e6a620ec09d053f0484b97932b5c902)
      build user:       [email protected]
      build date:       20161125-08:15:17
      go version:       go1.7.3
  4. Download the alerting rules for CockroachDB to the rules/ directory, where the Prometheus configuration expects to find them:

    $ wget -P rules
  5. Edit the Alertmanager configuration file that came with the binary, simple.yml, to specify the desired receivers for notifications.

  6. Start the Alertmanager server, with the -config.file flag pointing to the configuration file:

    $ alertmanager -config.file=simple.yml
  7. In the shell running Prometheus, use CTRL + C to stop Prometheus and then restart it with the -config.file flag pointing to the Prometheus configuration file and the -alertmanager.url flag pointing to the machine running Alertmanager:

    $ prometheus -config.file=prometheus.yml \
    -alertmanager.url=<hostname of machine running alertmanager>:9093
  8. Point your browser to http://<hostname of machine running alertmanager>:9093, where you can use the Alertmanager UI to define rules for silencing alerts.
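
For step 5, a receiver definition can be quite small. The sketch below writes an illustrative Alertmanager configuration with a single webhook receiver to a scratch file. The file name alertmanager-sketch.yml, the receiver name, and the webhook URL are all hypothetical, and the simple.yml shipped with the binary contains more settings than shown here; treat this as a shape to adapt, not a replacement.

```shell
# Write an illustrative Alertmanager config to a scratch file (hypothetical name).
cat > alertmanager-sketch.yml <<'EOF'
route:
  receiver: 'team-webhook'          # default receiver for all alerts

receivers:
  - name: 'team-webhook'            # hypothetical receiver name
    webhook_configs:
      # Hypothetical endpoint that accepts Alertmanager's JSON POSTs.
      - url: 'http://localhost:5001/'
EOF
```

Email, chat, and other channels are configured in the same receivers list, each with its own block of settings.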
