In space, just-in-time delivery doesn't exist, which changes the rules of resiliency and backup options.
Disasters and equipment failures can happen at any
time, anywhere, and enterprise IT administrators need to properly
prepare for them. This past week, NASA fixed an equipment failure
aboard the International Space Station (ISS), and while it operates
in a very different environment from data centers here on Earth, its
operations can serve as a guide to terrestrial best practices.
NASA astronauts Rick Mastracchio and Mike Hopkins
exited the ISS on Dec. 21 for a five-and-a-half-hour spacewalk to
remove a faulty ammonia pump. On Dec. 24, the two astronauts took
another spacewalk, this time installing a new ammonia pump to restore
the ISS to full operations.
Notably, the new ammonia pump was already aboard the ISS as a spare
part. In the hostile environment that is space, redundancy isn't
optional, and spare parts aren't easily sourced from a remote
location. In the case of
the spare ammonia pump, there's also the question of how NASA and its
ISS partners could have ferried a new ammonia pump to the station.
Much of the ISS, including the ammonia pumps, was originally carried
to space by way of the NASA shuttle fleet, which was decommissioned
in 2011 with the final flight of the Shuttle Atlantis.
From a disaster recovery and redundancy perspective,
NASA and its ISS partners had to plan from the beginning to have lots
of options for repair and replacement of station components. Simply
put, without the on-board ability to deal with certain types of
equipment failure, the ISS would not be the success it is today and
lives would be at risk.
Bringing the same message down to Earth, data centers
and even branch IT and small offices can learn from NASA's example.
While humans on Earth likely don't need to keep an extra ammonia pump
onsite, it does make sense to have other types of spare equipment on
premises.
Mission-critical servers and networking components
can and should have redundant power supplies and fans for cooling.
Power supplies and fans do break down and, even here on Earth where
an extra power supply or fan can easily be sourced, it still takes
time, which a mission-critical environment likely can't afford.
Automatic failover is another commonly deployed
feature in enterprise IT today. Clustered and mirrored server
deployments that automatically take over for a failed component are a
must-have in modern data centers.
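As a rough illustration of the failover idea, the Python sketch below
picks the first healthy node from a priority-ordered list. The node
names and the `is_healthy` callback are hypothetical stand-ins for a
real health check, and production clusters typically rely on a
dedicated cluster manager rather than a loop like this.

```python
from typing import Callable, Optional, Sequence


def pick_active(nodes: Sequence[str],
                is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the first healthy node in priority order, or None if all are down."""
    for node in nodes:
        if is_healthy(node):
            return node
    return None


# Hypothetical node names; the primary is listed first.
nodes = ["primary", "mirror"]
down = {"primary"}  # simulate a failed primary

# The health check here is just a set lookup (an assumption for the demo).
active = pick_active(nodes, lambda n: n not in down)
print(active)  # mirror
```

If every node is down, the function returns `None`, which is the case
where an on-hand spare is the only remaining option.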
Actually keeping extra equipment on hand, like NASA
does, might seem like a luxury, but it also makes sense. For smaller
branch and office IT environments, simply keeping an extra (perhaps
older) WiFi access point or router on hand for emergencies isn't a
bad idea. In the modern era, where the cloud exists for backup and
application delivery, it's important to remember that you still need
access to the cloud and you still require some form of on-site or
mobile equipment to do that.
Planning for failure means
that you have options. Without redundancy and spare parts, failure is
the only option left, and a likely one.