What is a zombie (comatose) server, and why should I care?

Posted by Annie Paquette on October 28, 2019


Whatis.com states that "a zombie server is a physical server that is running but has no external communications or visibility and contributes no compute resources, that it consumes electricity but is serving no useful purpose."

In the late 1990s and early 2000s, when a server was typically dedicated to running a particular application or performing a specific function, this definition of a "zombie" made sense. Generally speaking, servers became zombies because they were forgotten or were never configured for the purpose for which they had been acquired (a dirty secret of the IT industry). In many cases, once plugged in, these servers were never allowed by their operating systems to enter a sleep state, for fear that they would be too slow to respond. Identifying these systems as zombies was relatively easy: their power consumption was constant (a flat line).

In today’s data center environment, declaring a server comatose is more difficult. Even in the smallest data centers, servers now support virtualized and containerized applications. The software orchestration layer (VMware, Xen, Docker, Kubernetes, Mesos) assigns and tracks workloads across physical servers. For the orchestration software to operate successfully, a pool of "idle" servers must be available to draw upon. Idle in this case means that the physical server is in a no-load or low-load state but can be rapidly configured and deployed to accept a workload. Consequently, declaring a server to be in a zombie or comatose state requires a little more effort: it involves analyzing idle time and understanding why the orchestration layer is not using the server. Asset tracking systems help the orchestration layer know what resources are present in the data center and available to be employed, and analysis of the server utilization data from the orchestration layer can point to potentially idle systems.

Modern high-end power distribution units (PDUs) support the measurement of current and voltage at the infeed and outlet level and sit on the network. These PDUs can serve as a data collection point that provides another aspect of the necessary information for determining whether or not a server is truly comatose.  By gathering server power consumption data from the PDU into a database or DCIM tool, simple analytics can show whether or not a server is in a low power or constant current consumption state over time. By integrating this information back into the orchestration layer and running some reports, the data center operator can then easily determine if the server is truly comatose or if it is just underutilized due to lack of workload. This would indicate whether or not turning off the server could be done without undue business interruption.
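The kind of "simple analytics" described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the 30 percent idle threshold, the 5 percent flatness tolerance, and the `in_orchestrator_pool` flag are all hypothetical, and the sample readings would in practice come from whatever database or DCIM tool stores the PDU outlet data.

```python
from statistics import mean, pstdev


def is_flat_low_draw(samples_w, max_rating_w, idle_fraction=0.30, max_rel_spread=0.05):
    """Return True if the outlet trace is both near idle and nearly flat.

    samples_w      -- outlet power readings in watts (hypothetical PDU export)
    max_rating_w   -- power supply rating in watts
    idle_fraction  -- assumed ceiling for "idle" as a share of the rating
    max_rel_spread -- assumed tolerance: standard deviation / mean
    """
    if not samples_w:
        return False
    avg = mean(samples_w)
    if avg <= 0:
        return False  # outlet is off, so there is nothing to reclaim
    near_idle = max(samples_w) <= idle_fraction * max_rating_w
    flat = pstdev(samples_w) / avg <= max_rel_spread
    return near_idle and flat


def classify(samples_w, max_rating_w, in_orchestrator_pool):
    """Combine the power trace with a (hypothetical) orchestration-layer flag."""
    if not samples_w:
        return "no data"
    if not is_flat_low_draw(samples_w, max_rating_w):
        return "active"
    # Flat, low draw: the orchestration layer decides which case applies.
    if in_orchestrator_pool:
        return "idle pool member"
    return "candidate zombie"


# Illustrative readings only, not measured data.
print(classify([100, 101, 99, 100], 400, in_orchestrator_pool=False))
```

A server flagged as a candidate zombie would still need a human check before it is switched off; the thresholds here are placeholders that each operator would tune to their own hardware.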

In a virtualized/containerized data center environment replete with converged systems (server, storage, and networking in a single device), the most efficient system is either on and fully loaded, or completely off. A remotely managed PDU (one having outlet level switching) can ensure that a server is fully off and drawing zero power until the server is needed. When combined with solid state storage (SSD) rather than rotating media, a converged server system can reliably be turned on and off as needed and still be responsive to the needs of the orchestration layer for compute on demand.

Assumptions

  • "Server" is a converged system of motherboard, memory, solid-state storage, networking, and power supply
  • An idle server (including its power supply) draws 25 percent of its maximum rating; with a 400 W power supply, it draws 100 W, or 0.1 kW, when idle
  • The server is truly comatose and does no useful work for an entire year (24 × 365 = 8,760 hours)
  • The cost of electricity for the data center is $0.05/kWh

Turning the idle server completely off (drawing zero watts) can save as much as 0.1 kW × 8,760 hours × $0.05/kWh = $43.80, or roughly $44 per year per server. As the cost of electric power rises, the savings per server increase commensurately. Multiply that by the potential of thousands of servers in a hyperscale data center and the savings can add up quickly indeed. And that’s why we should all care.

Power supply size (W)             400      400      400      400      400
Idle power draw (kW)              0.1      0.1      0.1      0.1      0.1
Hours per year                  8,760    8,760    8,760    8,760    8,760
Cost of electricity ($/kWh)      0.05     0.07     0.10     0.12     0.15
Savings per server per year    $43.80   $61.32   $87.60  $105.12  $131.40
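The savings figures above follow directly from the stated assumptions and can be reproduced with a short Python sketch. The function and variable names are illustrative only; the inputs (400 W supply, 25 percent idle draw, 8,760 hours, and the five electricity prices) are taken from the article.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours


def annual_savings(idle_kw, price_per_kwh, hours=HOURS_PER_YEAR):
    """Cost of the energy an idle server would otherwise consume in a year."""
    return idle_kw * hours * price_per_kwh


# 25 percent of a 400 W power supply, converted to kW.
idle_kw = 0.25 * 400 / 1000

for price in (0.05, 0.07, 0.10, 0.12, 0.15):
    savings = annual_savings(idle_kw, price)
    print(f"${price:.2f}/kWh: ${savings:.2f} per server per year")
```

Multiplying the result by a server count gives a fleet-level estimate; for example, `1000 * annual_savings(idle_kw, 0.05)` corresponds to a thousand comatose servers at the lowest price shown.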

This article was originally published on betanews on March 14, 2019.
