I have a blog entry on this topic that I’ve been percolating for a long, long time, but I’ve never finished it. One of the most common discussions in IT (especially system and network administration) is about achieving “5 9’s” of uptime, that is, 99.999% availability for critical services. What that means is that a system can be unavailable for only a little over five minutes in each year that it is running. This is often discussed as the “gold standard” of IT.
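To make that downtime budget concrete, here is a rough sketch of the arithmetic. The function name and the choice of a 365.25-day average year are my own assumptions for illustration, not part of any standard definition.

```python
# Assumption: an "average" year of 365.25 days, to account for leap years.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for an availability target."""
    unavailable_fraction = 1 - availability_percent / 100
    return MINUTES_PER_YEAR * unavailable_fraction


# Compare the budget at each additional "nine".
for target in (99.0, 99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_year(target)
    print(f"{target}% -> {minutes:.2f} minutes/year")
```

Under those assumptions, 99.999% leaves roughly 5.26 minutes a year, while 99.9% leaves closer to nine hours. Each extra nine cuts the room for error by a factor of ten.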
Unfortunately, what I’ve noticed is that the services held to 5 9’s are also the least agile. What they give up by becoming a 5 9’s service is any flex: the goal is so inflexible that they take no risks and no chances that could lead to improved performance. Note that this level of availability is incredibly important in some cases (911 services, life support equipment, etc.). But this is the standard to which we in IT also tend to hold things like email services and web sites.
I’ve always thought that this was much like the idea of flow states: for every type of service or task, there’s an ideal level of stress, enough to keep you engaged but not so much that you can’t stay relaxed. That optimal level of stress exists for service delivery in IT, too. The flexibility to upgrade, add performance enhancements, and so on is as much a part of achieving optimal service as an appropriate level of uptime. Pushing too far to one side (flexible service) or the other (absolute service availability) takes the service out of its “flow” state.