Certificate watcher with Telegraf and Grafana?
One of our customers uses self-signed certificates for some internal processes. From a technical perspective, they work as expected. Problems arise after about one year, when the certificates expire. Since they are not issued by an official certificate authority, no one tracks when they expire. I was wondering whether we could set up a simple watch and alert system with our existing monitoring tools: Telegraf, InfluxDB and Grafana.
I will show you how we solved our issue with these three technologies.
For some services in the Hadoop cluster, our customer uses self-signed certificates. To generate, revoke and regenerate them, we created several Ansible playbooks, which automate the entire creation process. The issue is that no one watches the expiration dates of these certificates, so we have run into trouble several times in the past when certificates expired. To avoid this, some colleagues created reminders in Outlook or other tools to get a warning weeks before the expiration date, but when they were on vacation or left our department, the notifications left with them.
To monitor other metrics in the cluster, we use Telegraf to push them into InfluxDB. We then use Grafana to visualize the metrics and set up alerts on top of them. I looked for a dedicated monitoring tool or another established mechanism for certificates, but could not find one that fit our use case.
Telegraf is an agent that collects metrics, such as CPU or RAM usage, and writes them to a database such as InfluxDB. It uses plugins to collect these metrics. For example, the "cpu" and "disk" input plugins collect metrics about CPU usage and disk information.
On the official website, you can find a lot more information about Telegraf and how it works.
I checked the Telegraf input plugins and found the x509_cert plugin. It looks quite promising, and its configuration is minimal and easy.
In general, the plugin just needs a list of sources and then collects metrics from them. These sources can be URLs like "https://ordix.de" or static files, like the self-signed certificates of our customer. If you want to use static files, put the path to the certificate into the source list, for example: "/etc/ssl/certs/ssl-cert-snakeoil.pem".
After configuring the sources, I recommend increasing the metrics collection interval for this plugin to at least "24h". A certificate's expiration date rarely changes, so more frequent data points add nothing.
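A minimal sketch of such a plugin configuration could look like the following. The two sources are the examples mentioned above. The explicit port ":443" and the commented-out timeout are assumptions on my side and not taken from our production setup, so check the plugin documentation for your Telegraf version.

```toml
# Sketch of an x509_cert input section in telegraf.conf
[[inputs.x509_cert]]
  ## Certificates to watch: a remote endpoint and a local certificate file.
  ## The explicit port is an assumption; adjust it to your service.
  sources = [
    "https://ordix.de:443",
    "/etc/ssl/certs/ssl-cert-snakeoil.pem",
  ]

  ## Collect once per day instead of using the agent-wide interval.
  interval = "24h"

  ## Optional connection timeout for remote sources (assumed value).
  # timeout = "5s"
```

Setting "interval" inside the plugin section overrides the agent-wide interval for this input only, so the other metrics in the cluster keep their usual collection rate.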
After reloading your Telegraf service, you should see the first metrics in InfluxDB.
The plugin provides several tags, like "source", "organization", "issuer_common_name" or "san".
We can also use the following fields:
- verification_code (int)
- verification_error (string)
- expiry (int, seconds)
- age (int, seconds)
- startdate (int, seconds)
- enddate (int, seconds)
Consultant at ORDIX