Testing backup power systems is crucial for ensuring that data center operations remain available through power interruptions. By cutting all power to the facility and replicating a real-world electrical grid failure, pull-the-plug testing provides the most comprehensive assessment of these systems. However, opinions differ on the best way to perform the test and on whether the electrical utility company needs to be involved.
Results from the Uptime Institute Data Center Resiliency Survey 2023 found that more than 70% of organizations perform pull-the-plug tests (Figure 1), and of this group, roughly 95% do so at least annually. At the same time, fewer than half of operators involve their utility company in the process, which raises questions about best practices and the value of some approaches to performing the test.
Operators are not required to notify the utility company of these tests in most cases. This is because it is unlikely that a sudden drop in demand, even from larger data centers, would impact an average-sized grid.
Successful pull-the-plug tests assess a range of operations, including power-loss detection, switchgear, backup generation and the controls needed to connect to on-site power production systems. Depending on the facility design, it may not be possible to fully test all these functions without coordinating with the electrical utility company.
Therefore, organizations that interrupt their power supply independently, without the involvement of the utility, are at risk of performing an incomplete test, which may give a false sense of security about the facility’s ability to ride through a power outage.
Figure 1. Most data center operators perform pull-the-plug tests
Below are three of the most common approaches for performing a pull-the-plug test and the key considerations for operators when determining which type of test is best suited to their facility.
Coordinating with the electrical utility provider
For this test, the grid provider cuts all incoming power to the data center, prompting the backup power controls to start.
Although this approach guarantees an interruption to the power supply, it can create challenges with costs and scheduling. Because it is a full test of all backup functions, it carries some risk, so staff with the necessary skills must be on-site during the test to monitor each step of the procedure and ensure it runs smoothly. The test may therefore be constrained by staff availability, including that of supplier personnel. And because utility providers typically charge fees for their technicians, costs can increase if unforeseen events, such as severe weather, force the test to be rescheduled.
Operators typically have to use this approach when they lack an isolation device, although isolation devices carry their own set of challenges.
Using an isolation device to interrupt power
A pull-the-plug test may also be carried out using an isolation device. These are circuit breakers or switches that are deployed upstream of the power transformers. Opening the isolation device cuts the power from the grid to the facility without requiring coordination with the electrical utility company. This approach can cut costs and remove some of the scheduling challenges listed in the previous section, but may not be feasible for some facility designs.
For example, if the opened hardware is monitored by a programmable logic controller (PLC), the generators may start automatically without using (and therefore testing) the controls linked to the power transformer. In this case, the testing of the power-loss detection, the processes for switching devices to on-site power use, and the controls used to coordinate these steps can be bypassed, leading to an incomplete test.
The use of an isolation device can also create new risks. Human error or hardware malfunctions of the device can result in unintended power interruptions or failures to interrupt the power when necessary. Other factors can add to these risks, such as installing the device outside a building and exposing it to extreme weather.
Data center operators that deployed isolation devices as part of the facility’s initial design are the most likely to use them to conduct pull-the-plug tests. Operators that do not already have the devices installed may not want to retrofit them because of new concerns, such as spatial challenges: some standards, such as the National Electrical Code in the US, require additional open space around such deployments. Any new installations would also require testing, which would carry all the risks and costs associated with pull-the-plug tests.
Pulling the power transformer fuses
Pulling the power transformer fuses tests all the PLC and backup system hardware required for responding to utility power failures and does not require coordination with the grid provider. However, the power loss to the facility is only simulated and not experienced. The PLC reacts as if an interruption to the power has happened, but a true loss of grid power only occurs once the generator power is active and backup systems are online.
In this case, the uninterruptible power supply (UPS) batteries only discharge for a fraction of the time that they would normally in an actual power outage and are therefore not fully tested. Depending on the PLC design, other ancillary processes may also be skipped and not tested.
However, this approach has many advantages that offset these limitations. It is widely used, particularly by operators of facilities that are the most sensitive to risk. Because the grid power is not interrupted, it can be restored quickly if the equipment malfunctions or human error occurs during the test. And because the UPS batteries are discharged only for a short time, there is less stress and impact on their overall life expectancy.
Facilities that have difficulty interrupting the power, whether because coordinating with the utility is a challenge or because their designs place staff at risk while opening breakers and switches, also benefit from this approach.
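The trade-offs between the three approaches can be summarized as a rough coverage matrix. The Python sketch below is illustrative only: the function labels, the approach names and the `plc_monitors_isolation_device` flag are hypothetical names chosen for this example, and the mapping is a simplified reading of the descriptions above, not a substitute for reviewing an actual facility design.

```python
from enum import Enum


class Function(Enum):
    """Backup-power functions a test may exercise (labels are illustrative)."""
    POWER_LOSS_DETECTION = "power-loss detection"
    TRANSFER_SWITCHING = "switching devices to on-site power"
    TRANSFER_CONTROLS = "controls coordinating the transfer"
    BACKUP_GENERATION = "backup generation"
    REAL_GRID_LOSS = "real loss of grid power from the start of the test"
    FULL_UPS_DISCHARGE = "UPS battery discharge of realistic duration"


ALL_FUNCTIONS = frozenset(Function)


def exercised(approach: str, plc_monitors_isolation_device: bool = False) -> frozenset:
    """Return the functions an approach exercises, per the simplified reading above."""
    if approach == "utility":
        # The utility cuts incoming power: treated here as a full test.
        return ALL_FUNCTIONS
    if approach == "isolation_device":
        if plc_monitors_isolation_device:
            # The PLC may start the generators directly, bypassing detection,
            # the switching processes and the coordinating controls.
            return ALL_FUNCTIONS - {
                Function.POWER_LOSS_DETECTION,
                Function.TRANSFER_SWITCHING,
                Function.TRANSFER_CONTROLS,
            }
        # Assumption: with no such bypass, the test is treated as complete.
        return ALL_FUNCTIONS
    if approach == "transformer_fuses":
        # Power loss is simulated and the UPS discharges only briefly.
        # Ancillary processes may also be skipped, depending on the PLC design.
        return ALL_FUNCTIONS - {
            Function.REAL_GRID_LOSS,
            Function.FULL_UPS_DISCHARGE,
        }
    raise ValueError(f"unknown approach: {approach!r}")


def gaps(approach: str, plc_monitors_isolation_device: bool = False) -> list:
    """List the functions an approach leaves untested, sorted by label."""
    missing = ALL_FUNCTIONS - exercised(approach, plc_monitors_isolation_device)
    return sorted(f.value for f in missing)


if __name__ == "__main__":
    for name, plc in [("utility", False), ("isolation_device", True),
                      ("transformer_fuses", False)]:
        print(name, "->", gaps(name, plc) or "no gaps in this simplified model")
```

A matrix like this is only a starting point for a review. Which functions a given test actually exercises depends on the facility's own switchgear, PLC logic and UPS configuration.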
While data center operators have options for pulling the plug, many are unwilling or unable to perform the test. For example, colocation providers and operators of facilities that process critical infrastructure workloads may be restricted by customer contracts in how and when they can pull the plug.
The Uptime Intelligence View
Uptime Intelligence data has consistently shown that power is the most common cause of the most significant data center outages, with the failure to switch from the electrical grid to on-site power a recurrent problem. At the same time, electrical grids are set to become less reliable. As a result, all operators can benefit from reviewing their pull-the-plug testing procedures with their clients, regardless of whether they involve the energy provider, to help ensure resilient backup power systems.
For more details on data center resiliency and outage prevention, see Uptime Institute’s Annual Outages Analysis 2023.