ClickHouse Monitoring with Instana

February 6, 2020


ClickHouse has some amazing built-in monitoring capabilities accessible through its system tables (system.trace_log, system.metrics, system.query_log, etc.). They are great for diving deep into issues, but when ClickHouse is just one of the many things you have to watch, it’s best to use a fully managed and automated monitoring solution such as Instana. It is no wonder that Instana provides excellent support for ClickHouse: Instana’s developers and operations team use ClickHouse to power Instana, and Instana to monitor ClickHouse.
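For a quick look without extra tooling, these system tables can be queried directly over ClickHouse’s HTTP interface. The sketch below is a minimal example, assuming a server reachable at http://localhost:8123 (the default HTTP port) with no authentication; the helper names build_query_url and run_query are illustrative and not part of any ClickHouse client library.

```python
import json
import urllib.parse
import urllib.request

# Assumption: default HTTP endpoint, no authentication. Adjust for your setup.
CLICKHOUSE_URL = "http://localhost:8123/"


def build_query_url(sql: str, base_url: str = CLICKHOUSE_URL) -> str:
    """Percent-encode a SQL statement as the `query` parameter of the HTTP interface."""
    return base_url + "?" + urllib.parse.urlencode({"query": sql}, quote_via=urllib.parse.quote)


def run_query(sql: str) -> list:
    """Execute `sql` and parse JSONEachRow output (one JSON object per line).

    Note: 64-bit integers may come back as JSON strings, depending on server settings.
    """
    with urllib.request.urlopen(build_query_url(sql), timeout=10) as resp:
        body = resp.read().decode("utf-8")
    return [json.loads(line) for line in body.splitlines() if line]


# Current values of instantaneous metrics, such as the number of running queries.
METRICS_SQL = "SELECT metric, value FROM system.metrics FORMAT JSONEachRow"

# Ten slowest finished queries recorded in the query log (query logging must be enabled).
SLOW_QUERIES_SQL = (
    "SELECT query_duration_ms, query FROM system.query_log "
    "WHERE type = 'QueryFinish' ORDER BY query_duration_ms DESC LIMIT 10 "
    "FORMAT JSONEachRow"
)
```

Against a live server, run_query(METRICS_SQL) would return one dictionary per metric. This is handy for ad hoc digging, but it is exactly the kind of manual polling a managed solution takes off your hands.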

Let’s see how Instana presents ClickHouse’s built-in monitoring data visually and over time, puts it into context, provides additional insights across your stack, and does it all with very little effort.

Infrastructure monitoring

To get Instana running, all you need to do is install a single agent per operating system instance. Once the Instana agent is installed on each host running a ClickHouse server, Instana automatically discovers and provides real-time information about the health and performance not only of the hosts themselves (CPU, memory, and I/O) but also of the running ClickHouse servers. This makes it a great tool for operations teams: it helps them keep clusters healthy, upgrade to newer versions smoothly, and get capacity planning right.

Each ClickHouse server gets a dedicated dashboard where you have access to most of the ClickHouse metrics and other information like the number of active parts per table or the running queries. Metrics can also easily be compared across multiple servers.
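To see what such a cross-server comparison involves when done by hand, here is a rough sketch. It assumes you have already run the PARTS_SQL query below on each server (with whatever client you use) and collected the rows as dictionaries; compare_across_servers and max_spread are hypothetical helper names.

```python
from collections import defaultdict

# Active data parts per table on one server (parts replaced by merges are inactive).
PARTS_SQL = (
    "SELECT database, table, count() AS active_parts "
    "FROM system.parts WHERE active GROUP BY database, table"
)


def compare_across_servers(results: dict) -> dict:
    """Pivot {server: rows} into {"db.table": {server: active_parts}}."""
    comparison = defaultdict(dict)
    for server, rows in results.items():
        for row in rows:
            key = f"{row['database']}.{row['table']}"
            # int() handles counts returned either as numbers or as quoted strings.
            comparison[key][server] = int(row["active_parts"])
    return dict(comparison)


def max_spread(comparison: dict) -> tuple:
    """Return (table, spread) for the table whose part count differs most between servers."""
    return max(
        ((name, max(c.values()) - min(c.values())) for name, c in comparison.items()),
        key=lambda item: item[1],
    )
```

A large spread for one table can hint at a replica lagging behind on merges, which is the kind of outlier a side-by-side dashboard surfaces at a glance.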

ClickHouse Monitoring KPI Metric Comparison

Application monitoring based on distributed tracing

Relying only on logs or infrastructure metrics does not give you the full picture, because ClickHouse servers do not live on their own. Queries to your ClickHouse cluster may come from multiple services, sometimes managed by different teams. Typical problems such as failed queries, slow queries, N+1 queries, and overly frequent inserts can only be fixed once the service causing them is identified. This is why Instana captures the SQL statement, latency, error (if any), receiving host, and caller information of every single query to ClickHouse.
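To make the N+1 and insert-rate problems concrete, here is a simplified, hypothetical sketch of how they could be spotted once every query is captured together with its caller and trace. It is not Instana’s detection logic: the Span fields, the literal-stripping normalization, and both thresholds (10 repeats per trace, 1 insert per second per caller) are assumptions; the insert threshold is only loosely inspired by ClickHouse’s general advice to batch inserts rather than send many small ones.

```python
import re
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Span:
    trace_id: str     # distributed trace the query belongs to
    caller: str       # service that issued the query
    statement: str    # SQL text as captured
    timestamp: float  # seconds since epoch


# Naive assumption: single-quoted strings and standalone numbers are the only literals.
_LITERALS = re.compile(r"'[^']*'|\b\d+(?:\.\d+)?\b")


def normalize(sql: str) -> str:
    """Collapse whitespace and replace literals with '?' so similar queries group together."""
    return _LITERALS.sub("?", " ".join(sql.split()))


def find_n_plus_one(spans, min_repeats=10):
    """Return (trace_id, caller, normalized SQL, count) for statements repeated
    at least min_repeats times within a single trace."""
    counts = Counter((s.trace_id, s.caller, normalize(s.statement)) for s in spans)
    return [(t, c, q, n) for (t, c, q), n in counts.items() if n >= min_repeats]


def find_frequent_inserters(spans, max_per_second=1.0):
    """Return {caller: inserts per second} for callers above max_per_second."""
    insert_times = defaultdict(list)
    for s in spans:
        if s.statement.lstrip().upper().startswith("INSERT"):
            insert_times[s.caller].append(s.timestamp)
    noisy = {}
    for caller, times in insert_times.items():
        # Use at least a 1-second window so a single burst is not divided by ~0.
        window = max(max(times) - min(times), 1.0)
        rate = len(times) / window
        if rate > max_per_second:
            noisy[caller] = rate
    return noisy
```

Both checks only work because the caller is recorded on every query; without it you would know that something is hammering a table, but not which service to fix.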

ClickHouse Monitoring Distributed Tracing

Instana provides this level of insight because it comes with automatic tracing (no code changes required) across your distributed systems composed of services you built (e.g. in Java, Python, or Node.js), databases, and messaging systems. It can even trace a ClickHouse query all the way back to the user or page that initiated it in a web or mobile interface, thanks to Instana’s End User Monitoring capabilities.

ClickHouse Monitoring Service Dependency Map

Not only does Instana give you access to every individual transaction that ever touched your ClickHouse cluster, it also aggregates them into higher-level concepts that everyone is familiar with: applications, services, and endpoints. Their dashboards make it easy to spot trends and outliers at a glance. For ClickHouse, you get one service representing your ClickHouse cluster and one endpoint per table. On each, you’ll find typical performance indicators such as call count, error rate, and latency, as well as top queries and error messages.
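As an illustration of this kind of aggregation, the sketch below groups captured calls into one endpoint per table and computes call count, error rate, and mean and 95th-percentile latency. The regex-based table extraction is a deliberately naive assumption (it only looks at the first identifier after FROM or INTO) and is not how Instana derives its endpoints.

```python
import math
import re
from collections import defaultdict

# Hypothetical, naive table extraction: first identifier after FROM or INTO.
_TABLE = re.compile(r"\b(?:FROM|INTO)\s+([\w.]+)", re.IGNORECASE)


def endpoint_for(sql: str) -> str:
    match = _TABLE.search(sql)
    return match.group(1) if match else "<unknown>"


def p95(values) -> float:
    """95th percentile using the nearest-rank method."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]


def summarize(calls) -> dict:
    """calls: iterable of (sql, latency_ms, error_or_None). Returns stats per table endpoint."""
    by_table = defaultdict(list)
    for sql, latency_ms, error in calls:
        by_table[endpoint_for(sql)].append((latency_ms, error))
    summary = {}
    for table, items in by_table.items():
        latencies = [lat for lat, _ in items]
        errors = sum(1 for _, err in items if err is not None)
        summary[table] = {
            "calls": len(items),
            "error_rate": errors / len(items),
            "mean_ms": sum(latencies) / len(latencies),
            "p95_ms": p95(latencies),
        }
    return summary
```

Reporting a percentile alongside the mean matters here: a single slow outlier can pull the mean up noticeably while most calls stay fast.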

ClickHouse Monitoring Summary Dashboard

From these dashboards and charts it’s easy to jump to the ClickHouse queries of interest, which can then be further filtered or grouped by all kinds of query properties, including information specific to your domain: tenant, timeframe, known query name, etc.

ClickHouse Monitoring Analytics Errors

Automatic issue detection and alerting

Dashboards are useful when looking for the root cause of a known problem (e.g. one reported by a user) or when trying to improve reliability or response times in general. The rest of the time, it’s best to let Instana do the work for you and detect issues as they arise: a disk about to run out of space, excessive load, a sudden drop in the number of requests, an error rate or latency that is too high, etc.

To prevent alert fatigue, you are not alerted on every single issue. Instead, Instana runs a root cause analysis for you, correlating events (e.g. a sudden spike in CPU usage on a ClickHouse server correlated with a change to the ClickHouse setting max_threads) into incidents, which are then reported within the product or sent to third parties such as PagerDuty or Opsgenie.
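The idea of correlating events into incidents can be sketched as a time-window grouping. This is a hypothetical simplification, not Instana’s root cause analysis: it assumes events are related only when they occur on the same host within a fixed window (300 seconds by default), whereas a real system would also follow dependencies between hosts, services, and settings.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    timestamp: float  # seconds
    host: str
    description: str


@dataclass
class Incident:
    host: str
    events: list = field(default_factory=list)


def correlate(events, window_s=300.0):
    """Group events on the same host that occur within window_s of the previous one."""
    incidents = []
    open_by_host = {}
    for ev in sorted(events, key=lambda e: e.timestamp):
        current = open_by_host.get(ev.host)
        if current is not None and ev.timestamp - current.events[-1].timestamp <= window_s:
            current.events.append(ev)  # close in time: same incident
        else:
            current = Incident(ev.host, [ev])  # start a new incident for this host
            open_by_host[ev.host] = current
            incidents.append(current)
    return incidents
```

With this grouping, a configuration change followed a minute later by a CPU spike on the same server produces one incident instead of two separate alerts.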

Instana comes with built-in knowledge and rules for all kinds of technologies, including ClickHouse, and you can extend them based on your SLAs and what you have learned from your own experience operating ClickHouse. For example, you could create an issue every time some set of queries exceeds a certain latency threshold or when insert queries are being throttled.
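A custom latency rule of this kind could look like the following sketch. The LatencyRule class, its prefix-based query matching, and the minimum-sample guard are hypothetical illustrations and do not reflect Instana’s actual rule configuration format.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class LatencyRule:
    """Hypothetical user-defined rule: raise an issue when matching queries are too slow."""
    name: str
    statement_prefix: str      # which queries the rule applies to
    max_mean_latency_ms: float
    min_calls: int = 5         # avoid raising issues on a handful of samples

    def evaluate(self, calls):
        """calls: list of (sql, latency_ms). Returns an issue message or None."""
        latencies = [lat for sql, lat in calls if sql.startswith(self.statement_prefix)]
        if len(latencies) < self.min_calls:
            return None
        avg = mean(latencies)
        if avg > self.max_mean_latency_ms:
            return f"{self.name}: mean latency {avg:.1f} ms exceeds {self.max_mean_latency_ms} ms"
        return None
```

The minimum-sample guard is a design choice aimed at the same alert-fatigue concern: a single slow query should not be enough to open an issue.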

ClickHouse Monitoring Expert Health Rules


Instana provides great monitoring capabilities for ClickHouse and beyond. Insights are readily available and actionable whether you are a developer building services on top of ClickHouse or an operator running a ClickHouse cluster. Trying it out is easy: start a free trial, install the Instana agent on your machines, and watch your cluster magically appear on the map!

ClickHouse Monitoring Infrastructure Map
