What are the best books or resources on SRE automation that are also practical?

52 points by abbadadda 4 days ago

I’m bought in on the need for automation in SRE, now how do I do it? An SRE team I’m on is making a big push to automate, and I want to make sure we do this right. I also don’t need theoretical SRE books right now; I need some books or resources with case studies of things automated and ideally some code to boot. I’m looking mostly for the “HOW”, but I’m also interested in books to help think about “WHO/WHAT/WHERE/WHEN.” I’m _not_ interested in books that delve deeply into the WHY for either SRE or automation. I’m looking for resources with practical applications, ideally with pitfalls to avoid, on how to build a world-class DevOps organization and automate the right things on my team. I’m thinking API construction, design patterns to use/avoid, fully autonomous systems vs self-service tooling, and SLI construction and measurements. Sorry, this is a lot, but I might be helping to lead this team so I have a lot of questions on the implementation of this (not to mention how to steer office politics) and I’m feeling a bit out of my depth. Any advice on good resources to pursue is most appreciated!!!

jmillikin 4 days ago

Do you want a book on SRE, or do you want one on systems administration and operations?

SRE is a specialized software engineering discipline, the sort of thing you'd do if you had a team of 20 SWEs and wanted to write something like Kubernetes or Puppet. Good books on SRE (besides the Google book that popularized it) cover UNIX and Linux operating system concepts, low-level networking, and software architecture in general.

On the other hand, if you've got a team that does a lot of operations work by hand -- runbooks, MOPs, checklists, etc -- and want to automate it, then you should be looking for books on modern systems administration. Look for references to specific technologies such as Kubernetes (or Mesos), Terraform (or CloudFormation, or Pulumi), Puppet (or Ansible).

(context: I've been an SRE for ~10 years, first at Google then Stripe)

  • abbadadda 4 days ago

    Thank you very much for the recommendations. Do you know of any good books that cover how and what to automate? Let’s say that you have strong SysAdmin fundamentals (Linux, networking, Python, etc.), which books would you recommend to learn more about automating in a large organization?

  • tn890 4 days ago

    FYI Mesos is dead.

hcrean 4 days ago

The Google SRE books are good; they cover most of the major things. There are then lots of books on specific tooling you might want to use. This is not a one-book subject!

dev_0 4 days ago

I don't believe there is a single book on this because of the difference in contexts.

You need to combine different tools, such as Terraform, to implement the principles.

  • thejosh 4 days ago

    Exactly, there are so many different contexts: how your deployment works, the size of your team, the size of your org, etc.

sreblog 4 days ago

I've definitely written a bunch for friends asking similar questions. Feel free to reach out (contact info on the website, which is in my hn bio).

LaurensBER 4 days ago

The DevOps Handbook is great: it explains the why and also includes plenty of practical case studies.

rr888 4 days ago

Is this a small company or a team in a big one? If you're a team in a large corporation, it's good to talk to other divisions and see what they're up to. There might be company-wide tools you don't know about. Usually you'll also find people sitting on a different floor with the same problems, and it's good to bounce ideas off them even if you don't use the same solutions.

Also, you'll come across the ITIL framework. It's usually horribly heavyweight if you're small or even medium-sized, but it might give you some ideas.

vira28 3 days ago

Systems Performance by Brendan Gregg. Thank me later.

  • abbadadda 2 days ago

    I’ve seen a few of his blog posts but never checked out his book! Thanks for the recommendation.