How to Implement Global View and High Availability for Prometheus
Originally published on The New Stack and Squadcast. Ensuring that systems run reliably is a critical function of a site reliability engineer. A big part of that is collecting metrics, creating alerts and graph data. It’s of the utmost importance to gather system metrics, from several locations and services, and correlate them to understand system functionality as well as to support troubleshooting. Prometheus, a Cloud Native Computing Foundation (CNCF) project, has become one of the most popular open source solutions for application and system monitoring....