Note on elasticity - this isn't just about outages
Architectural Goals
- High availability
- Linear scalability
- Elasticity/Flexibility
- Redundancy/Fault Tolerance
Mikhail Panchenko, Surge 2011
This talk isn't solely about the cloud; the cloud adds its own challenges to universal problems
reference Allspaw, Ben Fried for context
"Complex interactions are those of unfamiliar sequences, or unplanned and unexpected sequences, and either not visible or not immediately comprehensible."
Charles Perrow. Normal Accidents: Living with High-Risk Technologies (p. 78). Kindle Edition.
"The notion of baffling interactions is increasingly familiar to all of us. [...] As systems grow in size and in the number of diverse functions they serve, and are built to function in ever more hostile environments, increasing their ties to other systems, they experience more and more incomprehensible or unexpected interactions. They become more vulnerable to unavoidable system accidents."
Charles Perrow. Normal Accidents: Living with High-Risk Technologies (p. 72). Kindle Edition.
"The beauty of this is its simplicity. Once a plan gets too complex, everything can go wrong."
Walter Sobchak, The Big Lebowski
Interactions: linear = assembly line, complex = nuclear power plant, web architecture
Coupling: tight = heart surgery, loose = dinner (dishes, groceries, fall back to pizza)
"... they found that radioactive water was not traveling to the tank they intended, but because of complex flow and pressure interactions, was going to a different, wrong tank, which also overflowed, this time in the auxiliary building."
Charles Perrow. Normal Accidents: Living with High-Risk Technologies (pp. 22-23). Kindle Edition.
"The traffic shift was executed incorrectly and rather than routing the traffic to the other router on the primary network, the traffic was routed onto the lower capacity redundant EBS network."
"Summary of the Amazon EC2 and Amazon RDS Service Disruption in the US East Region" http://aws.amazon.com/message/65648/
Previously independent systems become coupled as a result of unanticipated interactions, leading to fundamentally surprising results
Photo by mathematically_impossible
Photo by wwarby
Original photos by mathematically_impossible and miheco
Photo by 20after4
tuned mass damper from Taipei 101
Photo by erikcharlton
shared resources - flickr infinite recursion story
"temporarily disable a datacenter" instead of coming up with elaborate failure strategies
stampede to the other datacenter if one goes down entirely; also, AWS itself can screw up (see the EBS/RDS outage), since they are susceptible to all the same problems
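The two notes above (one blunt "disable a datacenter" switch, and the stampede that follows) can be sketched together. This is a minimal illustration, not anything from the talk: the `Router` class, the datacenter names, the requests-per-second figures, and the assumption that displaced traffic spreads evenly over the remaining datacenters are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Router:
    """Hypothetical traffic router with one coarse per-datacenter switch."""
    load: dict[str, float]       # current requests/sec per datacenter
    capacity: dict[str, float]   # maximum requests/sec per datacenter
    disabled: set[str] = field(default_factory=set)

    def disable(self, dc: str) -> None:
        # The whole failure strategy: stop sending traffic to this datacenter.
        if dc not in self.load:
            raise KeyError(dc)
        self.disabled.add(dc)

    def enable(self, dc: str) -> None:
        self.disabled.discard(dc)

    def live(self) -> list[str]:
        return [dc for dc in self.load if dc not in self.disabled]

    def projected_load(self) -> dict[str, float]:
        """Load per live datacenter once displaced traffic has moved over.

        Assumes displaced traffic is split evenly across live datacenters.
        """
        live = self.live()
        if not live:
            raise RuntimeError("no datacenter enabled")
        displaced = sum(self.load[dc] for dc in self.disabled)
        extra = displaced / len(live)
        return {dc: self.load[dc] + extra for dc in live}

    def overloaded(self) -> list[str]:
        """Datacenters whose projected load exceeds their capacity."""
        projected = self.projected_load()
        return sorted(dc for dc, rps in projected.items()
                      if rps > self.capacity[dc])


router = Router(
    load={"dc-east": 700, "dc-west": 600},
    capacity={"dc-east": 1000, "dc-west": 1000},
)
router.disable("dc-east")
print(router.overloaded())  # dc-west would be asked to carry 1300 req/s
```

With these made-up numbers, each datacenter is comfortable on its own, yet neither has the headroom to absorb the other. The switch is simple, but it only helps if the surviving side was provisioned for the stampede.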
Photo by reschroederimages
202 Accepted
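The slide gives only the status code. One common reading, assumed here, is that the front end accepts work, queues it, and replies 202 immediately instead of doing the work inline, which loosens the coupling between request handling and processing. A minimal sketch using an in-process queue; `handle_request`, `drain`, and the queue size are hypothetical.

```python
import queue

# Small bound chosen only for the demonstration.
jobs = queue.Queue(maxsize=2)


def handle_request(payload):
    """Return an (HTTP status, body) pair without doing the work inline."""
    try:
        jobs.put_nowait(payload)
    except queue.Full:
        # Shed load explicitly rather than blocking the caller.
        return 503, "queue full, retry later"
    return 202, "accepted for processing"


def drain():
    """Stand-in for a background worker: take everything queued so far."""
    done = []
    while not jobs.empty():
        done.append(jobs.get_nowait())
    return done


print(handle_request("resize photo 1"))
```

A real deployment would put a durable queue and separate worker processes behind the same interface. The point of the sketch is only that the caller gets an answer whether or not the work has been done yet.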
Photo by joshme17