San Francisco Public Library

Site reliability engineering, how Google runs production systems, edited by Betsy Beyer, Chris Jones, Jennifer Petoff, and Niall Richard Murphy

Label
Site reliability engineering, how Google runs production systems, edited by Betsy Beyer, Chris Jones, Jennifer Petoff, and Niall Richard Murphy
Language
eng
Bibliography note
Includes bibliographical references (pages 501-512) and index
Illustrations
illustrations
Index
index present
Literary Form
non fiction
Main title
Site reliability engineering
Nature of contents
bibliography
Oclc number
930683030
Responsibility statement
edited by Betsy Beyer, Chris Jones, Jennifer Petoff, and Niall Richard Murphy
Sub title
how Google runs production systems
Table Of Contents
Introduction. The production environment at Google, from the viewpoint of an SRE -- Principles. Embracing risk -- Service level objectives -- Eliminating toil -- Monitoring distributed systems -- The evolution of automation at Google -- Release engineering -- Simplicity -- Practices. Practical alerting from time-series data -- Being on-call -- Effective troubleshooting -- Emergency response -- Managing incidents -- Postmortem culture: learning from failure -- Tracking outages -- Testing for reliability -- Software engineering in SRE -- Load balancing at the frontend -- Load balancing in the datacenter -- Handling overload -- Addressing cascading failures -- Managing critical state: distributed consensus for reliability -- Distributed periodic scheduling with Cron -- Data processing pipelines -- Data integrity: what you read is what you wrote -- Reliable product launches at scale -- Management. Accelerating SREs to on-call and beyond -- Dealing with interrupts -- Embedding an SRE to recover from operational overload -- Communication and collaboration in SRE -- The evolving SRE engagement model -- Conclusions. Lessons learned from other industries