Data Center Resiliency in the Time of COVID-19

Posted by Isaiah LaJoie on November 09, 2020


The topic of resiliency in 2020 is an interesting one for all of us. This has been a year in which the world has unwittingly come to understand, in many ways, the job that those who operate data centers perform every day. Inasmuch as every data center facility operates just one short step away from any number of system-wide outage events, the current COVID-19 pandemic serves as a useful backdrop to a conversation about data center resiliency. The concept of resiliency is different from the more commonly understood concept of redundancy.

Due to the COVID-19 crisis, communities in the US and abroad have had to address tough questions, such as: What happens when we have an outage in one part of the infrastructure? What are the related systems, and how will they be impacted? What happens when the operating system itself is attacked, and parts of the infrastructure have to be shut down? These are the kinds of questions being asked by medical and community leaders worldwide, and they are the same types of questions that data center planning teams ask when anticipating or dealing with a crisis.

As cities and countries seek to solve dilemmas brought on by COVID-19, they experience the differences between redundancy and resiliency. While redundancy may be thought of as an emergency reserve or backup capacity (like extra ventilators or personal protective equipment), resiliency is the more complex notion of solving one or more urgent problems while in the midst of crisis.

What does this mean in a data center? As we’ve established, resiliency is not the same as redundancy. True resiliency is the ability of a facility to recover, which by definition implies that there will eventually be something to recover from. The reality is that there is no perfect scenario, and no perfect system, device, staff, process, or procedure, that ensures a facility will remain one hundred percent operational at all times. There is always room for error, even in the slightest margins.

Resiliency, in many ways, is the ability of our industry to embrace the reality of imperfection. We are, after all, humans. The equipment that makes up all data center systems (mechanical, electrical, security-related, and so on) is designed and built by humans, or at least under their watch. All of these systems operate in a state of grace and are, at any point in time, closer to or further from operating improperly. Along this line, resiliency is owning the fact that something could go wrong and determining the best way to recover when it does. The best way means no downtime for the end user.

As with the pandemic, the key to resiliency is knowledge, as well as its judicious application to the situation. The same rule applies in a data center: you have to know what you are dealing with, from both a design and an operational standpoint. It’s not enough to know your systems and how they work; you also have to know what to do when they aren’t working. The key to your resiliency lies in all phases of the life of the facility: from design and commissioning, through operation and maintenance, to obsolescence and migration.

 
