Getting to resolution on a continuity event is a challenge for any organization because of the nature and criticality of the affected applications and the potential loss of revenue. Understanding what it takes to resolve an issue requires time, a bit of luck and a lot of research and preparation.

Once you're aware of the type of outage the organization is experiencing, the next step is to triage the extent of the incident. Triage allows organizations to gather data points on what actually happened and to determine how to deal with the outage. This leads to a process for resolving the situation. In many cases, there are multiple solutions depending on the severity of the incident.

Case in point: let's take a look at the Colonial Pipeline ransomware attack, which was breaking news on Friday, May 7th, 2021. At the time of this writing, the incident was ongoing six days into the event. Many organizations treat ransomware as a cyber-security issue. This is true when you're in “Prevention” mode, which I call PLAN A. This generally involves the organization's cyber-security posture. However, once the infection has taken hold, you're in “Reaction” mode. This is a business continuity event on all systems affected by the incursion.

This is where resolution comes in. In this extreme case, you have two teams that are scrambling. One is trying to figure out what failed from a risk mitigation perspective. In other words, which security systems failed to detect the issue, how do they sterilize the environment, and who caused the security violation? The other team is in BC/DR mode. They are trying to get mission critical infrastructure back up and running, but ransomware is a particularly insidious situation. They are depending on the security team to ensure the hygiene of the environment before they can bring systems online. Therefore, the resolution stage can be very labor intensive and expensive. Systems have to be wiped clean, restored, validated to be virus free and then put back on the network. Because of the risk of reinfection, many opt to buy all new equipment with new operating systems and fresh installs of the application, which takes even longer to complete and increases the cost substantially.
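The wipe, restore, validate and reconnect sequence described above can be pictured as a strict ordering with a security gate in the middle. The following is a minimal sketch only, not any vendor's implementation: the `Stage` names, the `SystemRecovery` class and the `scan_is_clean` callback are all hypothetical and stand in for whatever tooling the security and BC/DR teams actually use.

```python
from enum import Enum
from typing import Callable


class Stage(Enum):
    """Hypothetical recovery stages, in the order they must happen."""
    INFECTED = 0
    WIPED = 1
    RESTORED = 2
    VALIDATED = 3
    ONLINE = 4


class RecoveryError(Exception):
    """Raised when a stage is attempted out of order."""


class SystemRecovery:
    def __init__(self, name: str) -> None:
        self.name = name
        self.stage = Stage.INFECTED

    def advance(self, target: Stage) -> None:
        # Only the next stage in sequence is allowed, so a system can
        # never go back on the network without being validated first.
        if target.value != self.stage.value + 1:
            raise RecoveryError(
                f"{self.name}: cannot go from {self.stage.name} to {target.name}"
            )
        self.stage = target


def recover(system: SystemRecovery, scan_is_clean: Callable[[str], bool]) -> bool:
    """Walk one system through recovery; stop before reconnecting if the scan fails."""
    system.advance(Stage.WIPED)
    system.advance(Stage.RESTORED)
    # Assumption: the security team supplies the scan result for this system.
    if not scan_is_clean(system.name):
        return False
    system.advance(Stage.VALIDATED)
    system.advance(Stage.ONLINE)
    return True
```

In this sketch a system whose scan fails stays at the RESTORED stage, which mirrors the dependency of the BC/DR team on the security team before anything is brought back online.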

Humans have to think through these steps and determine the logical sequence for resolving the incident. This again can take time and is heavily prone to error. It can add minutes to hours or more (days and weeks with ransomware) depending on the type and complexity of an outage. Strategies for addressing resolution range from an ad hoc approach to drawing up complex flow charts and “playbooks” to determine a logical resolution option.

If you take the human factor out of the mix, resolution can in most cases be handled programmatically. Done properly, recovery actions can be planned like a playbook to deal with each type of risk factor, thus giving organizations a PLAN B. During execution, a reliable process can be used that works 100% of the time for individual types of outages. This requires mature continuity products that provide automated resolution procedures that are tested and highly proven.
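To make the idea of a programmatic playbook concrete, here is a minimal sketch, assuming each outage type maps to an ordered list of recovery steps that run until one fails. The `Playbook` class, the `register` and `resolve` functions and the step names are hypothetical illustrations, not the API of any continuity product.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# A step is a label plus an action that reports success or failure.
Step = Tuple[str, Callable[[], bool]]


@dataclass
class Playbook:
    outage_type: str
    steps: List[Step] = field(default_factory=list)

    def run(self) -> List[str]:
        """Run steps in order and stop at the first failure.

        Returns the names of the steps that completed successfully.
        """
        completed: List[str] = []
        for name, action in self.steps:
            if not action():
                break
            completed.append(name)
        return completed


# Registry of pre-planned playbooks, keyed by outage type.
PLAYBOOKS: Dict[str, Playbook] = {}


def register(playbook: Playbook) -> None:
    PLAYBOOKS[playbook.outage_type] = playbook


def resolve(outage_type: str) -> List[str]:
    """Look up and execute the playbook for a triaged outage type."""
    playbook = PLAYBOOKS.get(outage_type)
    if playbook is None:
        raise KeyError(f"no playbook for outage type: {outage_type}")
    return playbook.run()


if __name__ == "__main__":
    # Placeholder actions; real ones would call recovery tooling.
    register(Playbook("ransomware", [
        ("isolate", lambda: True),
        ("fail over to clean standby", lambda: True),
        ("validate", lambda: True),
    ]))
    print(resolve("ransomware"))
```

The point of the design is that the decision logic is written and tested before the incident, so the triage result only has to select a playbook rather than prompt a fresh round of human deliberation.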

For more information on how you can recover from ransomware, please view the Neverfail global webinar entitled “How to Recover From Ransomware” hosted by Michael Wrightson. This video talks about how IT organizations can recover quickly not only from ransomware but from any type of outage that can take down mission critical systems.

If you missed the second part of the series, please read “The Real View of a Disaster – Awareness”. If you would like to read a short eBook on this topic, download “Anatomy of an Outage” by Neverfail. I’m sure you will find it interesting!
