Is Using a Kubernetes Service Mesh Worth It?

“A service mesh is a dedicated infrastructure layer that decouples some of the critical operational tasks of a distributed application from its business logic. Large-scale, Kubernetes-hosted microservice applications are natural candidates for service meshes due to their complex requirements for inter-service communication (e.g., retries, timeouts, traffic splitting), observability (e.g., metrics, logs, traces), and security (e.g., authentication, authorization, encryption). Service meshes can offload many operational concerns of the Kubernetes cluster, leaving developers free to focus on business logic....

December 9, 2021

How important is Observability for SRE?

Originally published on Squadcast. Observability is the practice of assessing a system’s internal state by observing its external outputs. Through instrumentation, systems can provide telemetry such as metrics, traces, and logs that help organizations better understand, debug, maintain, and evolve their platforms. SREs use many tools and practices to manage services at scale, and observability is a crucial one. Observability enhances SRE by allowing its practitioners to infer a system’s internal state....

December 3, 2021

Should You Run a Database on Kubernetes?

Historically, stateful workloads were run outside container orchestrators. Platforms built on top of orchestrators like Kubernetes were not yet proficient at dealing with data. But these systems have matured and now offer features that allow stateful workloads to be managed efficiently. If you want to find out how databases can be run on Kubernetes, what mechanisms it offers, and what types of databases and data are best suited for it, check this out....

November 17, 2021

How to Measure System Reliability

Originally published on Cprime. As businesses grow, new requirements arise for teams. The technology ecosystem becomes ever more complex, and it is important to understand each change and how it affects the overall system, as well as the service provided to users, who have high expectations. They expect systems to be up, responsive, fast, consistent, and reliable. System Reliability: Reliability for systems means that a system is doing what its users need it to do....

November 11, 2021

How to improve your influence as an SRE

Originally published on Squadcast. Balancing fast-paced business requirements with the demands of keeping production services stable is not an easy task. SRE is an opinionated implementation of DevOps and is defined by Ben Sloss, VP of Engineering at Google, as “what happens when you ask a software engineer to design an operations function”. It even comes with a completely free manual and workbook. Although SRE aims to be a “prescription” for how to run complex systems the right way, reliability can mean different things in different contexts....

November 10, 2021