Note on elasticity - this isn't just about outages
Architectural Goals
- High availability
- Linear scalability
- Elasticity/Flexibility
- Redundancy/Fault Tolerance

Mikhail Panchenko, Surge 2011

This talk isn't solely about the cloud; the cloud adds its own challenges to universal problems.
Reference Allspaw and Ben Fried for context.
"Complex interactions are those of unfamiliar sequences, or unplanned and unexpected sequences, and either not visible or not immediately comprehensible."
Charles Perrow. Normal Accidents: Living with High-Risk Technologies (p. 78). Kindle Edition.
"The notion of baffling interactions is increasingly familiar to all of us. [...] As systems grow in size and in the number of diverse functions they serve, and are built to function in ever more hostile environments, increasing their ties to other systems, they experience more and more incomprehensible or unexpected interactions. They become more vulnerable to unavoidable system accidents."
Charles Perrow. Normal Accidents: Living with High-Risk Technologies (p. 72). Kindle Edition.

"The beauty of this is its simplicity. Once a plan gets too complex, everything can go wrong."
Walter Sobchak, The Big Lebowski
Linear interactions = assembly line; complex interactions = nuclear power plant, web architecture
Tight coupling = heart surgery; loose coupling = dinner (dishes, groceries, fallback to pizza)

"... they found that radioactive water was not traveling to the tank they intended, but because of complex flow and pressure interactions, was going to a different, wrong tank, which also overflowed, this time in the auxiliary building."
Charles Perrow. Normal Accidents: Living with High-Risk Technologies (pp. 22-23). Kindle Edition.
"The traffic shift was executed incorrectly and rather than routing the traffic to the other router on the primary network, the traffic was routed onto the lower capacity redundant EBS network."
"Summary of the Amazon EC2 and Amazon RDS Service Disruption in the US East Region" http://aws.amazon.com/message/65648/
Previously independent systems become coupled as a result of unanticipated interactions, leading to fundamentally surprising results

Photo by wwarby

Photo by 20after4
Shared resources - Flickr infinite recursion story

"temporarily disable a datacenter" instead of coming up with elaborate failure strategies
Risk of a stampede to the other datacenter if one goes down entirely; also, AWS itself can fail (see the EBS/RDS outage) - they are susceptible to all the same problems
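
The stampede risk is easy to put numbers on. The sketch below is a back-of-the-envelope calculation, not anything from the talk: it assumes traffic is split evenly across the surviving datacenters, and the function names and the request rates in the usage note are hypothetical.

```python
def load_after_failure(total_rps, datacenters, failed=1):
    """Per-datacenter load once `failed` datacenters are gone.

    Assumes (hypothetically) that traffic is spread evenly across survivors.
    """
    survivors = datacenters - failed
    if survivors <= 0:
        raise ValueError("no surviving datacenters")
    return total_rps / survivors


def survives_failover(total_rps, datacenters, capacity_rps, failed=1):
    """True if each surviving datacenter can absorb its share of the traffic."""
    return load_after_failure(total_rps, datacenters, failed) <= capacity_rps
```

With two datacenters each sized for 8,000 requests/s and 10,000 requests/s of total traffic, each site normally carries 5,000, but `survives_failover(10000, 2, 8000)` is false: the survivor would have to take all 10,000. Adding a third site of the same size makes the same check pass, since each survivor then carries 5,000.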


Photo by reschroederimages








202 Accepted
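
HTTP 202 Accepted means a request has been accepted for processing but the processing has not finished. The notes do not say what this slide was illustrating; assuming it points at loosening coupling by queueing work instead of doing it inline, a minimal sketch could look like the following. The in-process `queue.Queue` is a stand-in for a real message queue, and `handle_request` and `drain` are hypothetical names.

```python
import queue

# Hypothetical in-process stand-in for a real message queue.
jobs = queue.Queue()


def handle_request(payload):
    """Front end: enqueue the work and answer immediately with 202."""
    jobs.put(payload)
    return 202, "Accepted"


def drain(process):
    """Worker: process everything queued so far and return the results."""
    results = []
    while True:
        try:
            payload = jobs.get_nowait()
        except queue.Empty:
            return results
        results.append(process(payload))
```

The caller gets its 202 as soon as the job is queued, so a slow or failing worker delays the work without blocking the request path. A worker would later call something like `drain(resize_photo)`, where `resize_photo` is whatever hypothetical function does the actual job.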

