IT service continuity management (ITSCM) is a key component of ITIL service delivery. It focuses on planning for incident prevention, prediction, and management with the goal of maintaining service availability and performance at the highest possible levels before, during, and after a disaster-level incident.
The goal of ITSCM is to reduce the downtime, costs, and business impact of incidents by putting effective, standardized processes in place for when those incidents do inevitably occur.
Because without a plan, there are a lot of factors that can slow—or stop—incident recovery. After all, your on-call expert might be responding when they’re bleary-eyed at 3 a.m. They might be out of touch with the code after working on something else for weeks or months. They might panic at the scale of the disaster-level incident. Or they might be the newest member of the disaster recovery team, without as much experience resolving issues.
Having a well-documented, clear plan for service continuity management will help minimize any delays caused by learning curves, time away from the code, disaster panic, or midnight alerts.
In ITIL 4, service continuity management is a practice meant to support business continuity management (BCM). Its goal is to make sure services are back up and running within the agreed-upon business timelines after major service disruptions.
ITIL 4 makes a distinction between incident management—which handles incidents at a variety of impact levels—and ITSCM, which is about planning for large-scale disasters.
So, what exactly constitutes a disaster? The answer may be different for each business, but the Business Continuity Institute defines it as: “A sudden unplanned event that causes great damage or serious loss to an organization. It results in an organization failing to provide critical business functions for some predetermined minimum period of time.”
The scale of what we call a disaster, the predetermined minimum time, and the definition of critical business functions are three things each business will need to define and document for themselves.
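Because each business sets its own thresholds, it helps to write the definition down in a form that responders and tooling can apply the same way every time. The sketch below is a hypothetical illustration, not an ITIL or Atlassian artifact. The `DisasterPolicy` type, the `is_disaster` function, the function names, and the 4-hour threshold are all assumptions chosen for the example. It treats an outage as a disaster only when a function the business has documented as critical has been unavailable for at least the predetermined minimum period.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class DisasterPolicy:
    """Hypothetical, business-defined criteria for declaring a disaster."""
    critical_functions: frozenset[str]  # functions the business has documented as critical
    minimum_outage: timedelta           # the "predetermined minimum period of time"


def is_disaster(policy: DisasterPolicy, function: str,
                outage_start: datetime, now: datetime) -> bool:
    """Return True if a critical function has been down for at least the minimum period."""
    if function not in policy.critical_functions:
        return False  # outages of non-critical functions never qualify
    return now - outage_start >= policy.minimum_outage


# Illustrative values only; each business must choose and document its own.
policy = DisasterPolicy(
    critical_functions=frozenset({"payments", "customer-login"}),
    minimum_outage=timedelta(hours=4),
)

start = datetime(2024, 1, 1, 2, 0)
print(is_disaster(policy, "payments", start, start + timedelta(hours=5)))      # True
print(is_disaster(policy, "payments", start, start + timedelta(hours=1)))      # False
print(is_disaster(policy, "internal-wiki", start, start + timedelta(days=1)))  # False
```

Keeping the thresholds in data rather than in people's heads means the 3 a.m. responder applies the same definition the business agreed on in advance.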
Business continuity management is a process managed outside IT that identifies risks to the business and works to mitigate those risks. Some risks may be IT-related, including disaster-level incidents, and some risks may be outside IT control, such as natural disasters or facility fires.
Since BCM encompasses ITSCM as well as other risk-mitigation processes, it makes sense for IT teams to work closely with the BCM team to create:
From a business perspective, the goal of ITSCM is to reduce the downtime, costs, and business impact of disaster-level incidents. On a more tactical level, objectives include:
Here at Atlassian, our own continuity plan is built on the assumption that the process of disaster planning is ongoing, leadership-driven, and thoroughly tested. We are determined to not #@!% our customers. Our process includes planning, communication, clear responsibilities, testing, and continuous improvement.
The planning process starts with asking high-level questions and then building a plan based on your answers. Starting questions should include:
Once you have answers to these questions, the next step is to use those answers to define:
The key to a successful ITSCM planning phase is documenting and templatizing the resulting plan to make it clear and repeatable. Assets such as an incident response playbook or other runbooks give responders a single source of truth and a way to stay organized during a high-stakes scenario.
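One way to templatize a plan is to define the sections every runbook must contain and to check each new runbook against that template before it is published. The sketch below assumes a hypothetical set of required sections (trigger, owner, escalation contacts, recovery steps, communication plan) and a hypothetical `Runbook` type; substitute whatever sections your own planning phase produces.

```python
from dataclasses import dataclass, field

# Hypothetical required sections; replace with the ones your plan defines.
REQUIRED_SECTIONS = ("trigger", "owner", "escalation_contacts",
                     "recovery_steps", "communication_plan")


@dataclass
class Runbook:
    title: str
    sections: dict[str, str] = field(default_factory=dict)

    def missing_sections(self) -> list[str]:
        """List required sections that are absent or contain only whitespace."""
        return [name for name in REQUIRED_SECTIONS
                if not self.sections.get(name, "").strip()]


# Illustrative draft; the trigger threshold and steps are made up for the example.
draft = Runbook(
    title="Primary database region failover",
    sections={
        "trigger": "Primary region unavailable for more than 15 minutes",
        "owner": "Database on-call engineer",
        "recovery_steps": "1. Confirm outage. 2. Promote replica. 3. Repoint services.",
        "communication_plan": "   ",  # left blank by the author
    },
)

print(draft.missing_sections())  # ['escalation_contacts', 'communication_plan']
```

A check like this makes gaps visible while the plan is being written, rather than in the middle of an incident.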
In the spirit of ITSCM, a solution with a built-in knowledge base, like Jira Service Management powered by Confluence, supports continuous documentation that can be revised, optimized, and improved collaboratively. That way, responders have access to documentation of previous resolutions and up-to-date resources.
Who’s responsible in case of disaster? Who’s responsible for maintaining and updating plans, processes, and documentation? ITSCM should always have a clear sense of roles and responsibilities not only for disasters themselves, but for ongoing monitoring and improvement. Using Jira Service Management, responders can tag the appropriate party or person on issues to ensure responsibilities are properly delegated and to facilitate cross-functional collaboration.
At Atlassian, part of our approach is to have regular disaster recovery meetings with our site reliability engineers and our risk and compliance team. They discuss gaps in disaster recovery and identify where additional plans, improvements, assessments, or changes need to be made.
Openness is a core value at Atlassian, and we believe the more informed your organization is about your ITSCM plans, the more effective those plans will be.
Offering flexible communication channels throughout the incident response process allows teams to stay in touch by their preferred method. Jira Service Management integrates multiple communication channels to minimize downtime, such as an embeddable status widget, a dedicated status page, email, chat tools, social media, and SMS.
Not only does communication keep stakeholders on board and help the C-suite stave off panic during a disaster-level incident, but it also allows the team to reach out for help from other teams if needed and mitigates the risk of friction caused by organizational confusion.
How do you know if your plans work unless you test them? This is a foundational question for ITSCM and the reason that testing and incident management drills are vital to the success of the practice.
Testing can help you identify weak points in your process, unforeseen issues, and where teams may need re-training or better documentation.
ITSCM is not a one-and-done process. It requires thoughtful planning up front and ongoing training, assessment, and improvement. That’s why we have regular disaster recovery meetings. It’s why we test system backups and run drills on what happens in case of a data center outage or AWS region failure. And it’s why any ITSCM plan worth its salt is a continually monitored, ever-changing thing.
Most companies represent the ITSCM process as a series of steps, but we think it’s more like a circle. Planning should lead to defined roles and responsibilities. From there, the team should communicate across the organization, test and test again, assess, monitor, and improve. Those improvements then feed back into the plan: the team updates it, further defines roles, and keeps communicating.
Again, this is where a built-in, collaborative knowledge base comes into play. Knowledge base articles are a valuable resource when it comes to assessment and documentation. Incident postmortem reports are crucial for revision and repair following an incident, but can also act as a longstanding resource for potential problems in the future. Jira Service Management, powered by Confluence, offers a powerful collaborative platform to execute assessment and improvement solutions.
In order to effectively plan and implement ITSCM practices across the organization, many businesses appoint a Service Continuity Manager and a Service Continuity Recovery Team.
As the name suggests, the Service Continuity Manager is responsible for overseeing service continuity. This person typically owns the process from A to Z, leading plan development, managing ongoing monitoring and assessment activities, and overseeing plans in action in case of disaster.
This person is typically an experienced, senior-level technical support professional, but may be in a management role and not directly involved in day-to-day technical work.
Led by the Service Continuity Manager (SCM), the Service Continuity Recovery Team is responsible for running tests and incident drills and continually improving ITSCM. The team typically includes technical staff, QA professionals or users for testing, and representatives from departments across the organization who are responsible for keeping lines of communication open between ITSCM and their teams.
Organizations with clear plans for disaster recovery will recover more quickly and more fully when disasters strike.
ITSCM isn’t about planning for everyday outages. It’s about addressing worst-case scenarios and ensuring that if they happen, they cause minimal disruption to the lives of customers and employees.
Here are three clear benefits of a good ITSCM practice:
Discover how ITSCM improves customer service quality and minimizes organizational downtime with Jira Service Management.
What are the four stages of ITSCM?
The ITSCM process comprises four stages: Initiation, Requirements and strategy, Implementation, and Ongoing operation.
What are the main benefits of proper IT service continuity management?
Benefits include controlled recovery of systems, reduced downtime (and therefore more continuous service to customers), and minimal disruption to departments' business.
What are the 7 steps of continuity management?
A business continuity plan positions your organization to survive serious disruption. It eliminates the confusion common to every disaster, providing a clear blueprint for what everyone should do. More importantly, your business continuity plan supports communication between employees and customers.
What is a service continuity plan?
The Information Technology Service Continuity Plan is the collection of policies, standards, procedures, and tools through which organisations not only improve their ability to respond when major system failures occur, but also improve their resilience to major incidents, ensuring that critical systems and services do …
Which KPIs relate to IT service continuity management?
Key Performance Indicator (KPI) | Definition |
---|---|
Implementation Duration | Duration from the identification of a disaster-related risk to the implementation of a suitable continuity mechanism |
Number of Disaster Practices | Number of disaster practices actually carried out |
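Both KPIs above can be computed from records most teams already keep. The sketch below assumes a hypothetical record format (the date a risk was identified, the date its continuity mechanism went live, and a list of drill dates) and hypothetical helper names; it reports the implementation duration for each risk and the number of disaster practices carried out in a given year.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class RiskRecord:
    """Hypothetical record of one disaster-related risk."""
    name: str
    identified_on: date
    mechanism_live_on: Optional[date]  # None while no continuity mechanism is in place


def implementation_duration(risk: RiskRecord) -> Optional[timedelta]:
    """Implementation Duration: time from identifying the risk to a live continuity mechanism."""
    if risk.mechanism_live_on is None:
        return None
    return risk.mechanism_live_on - risk.identified_on


def disaster_practices_in_year(drill_dates: list[date], year: int) -> int:
    """Number of Disaster Practices: drills actually carried out in the given year."""
    return sum(1 for d in drill_dates if d.year == year)


# Illustrative data only.
risks = [
    RiskRecord("Single-region database", date(2024, 1, 10), date(2024, 3, 1)),
    RiskRecord("Unreplicated file storage", date(2024, 2, 5), None),
]
for risk in risks:
    duration = implementation_duration(risk)
    print(risk.name, duration.days if duration is not None else "not yet implemented")

drills = [date(2023, 11, 20), date(2024, 2, 14), date(2024, 8, 30)]
print(disaster_practices_in_year(drills, 2024))  # 2
```

Returning `None` for unmitigated risks keeps them visible instead of silently dropping them from the average.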
ITIL 4 provides a digital operating model that enables organizations to co-create effective value from their IT-supported products and services. ITIL 4 builds on ITIL's decades of progress, evolving established ITSM practices for the wider context of customer experience, value streams, and digital transformation.
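What does RTO stand for in business continuity planning (BCP)?
RTO stands for Recovery Time Objective: the target maximum time within which a service must be restored after a disruption.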
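As a worked illustration (all values here are hypothetical), an RTO (Recovery Time Objective) becomes checkable once you record when a disruption began and when the service was restored: the recovery met its objective if the elapsed time does not exceed the RTO. The `met_rto` helper is an assumption for the example, not part of any standard.

```python
from datetime import datetime, timedelta


def met_rto(disruption_start: datetime, service_restored: datetime,
            rto: timedelta) -> bool:
    """True if the service was restored within the Recovery Time Objective."""
    return service_restored - disruption_start <= rto


# Hypothetical example: a 4-hour RTO for a critical service.
rto = timedelta(hours=4)
start = datetime(2024, 5, 1, 3, 0)

print(met_rto(start, datetime(2024, 5, 1, 6, 30), rto))  # True  (3.5 hours)
print(met_rto(start, datetime(2024, 5, 1, 8, 15), rto))  # False (5.25 hours)
```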
What are the objectives and activities of IT service continuity management?
ITSCM aims to maintain a set of IT service continuity and IT recovery plans that support the organization's overall business continuity plans. It should also carry out business impact analysis, risk analysis, and risk management activities on a regular basis.
The IT Service Continuity Manager is responsible for managing risks that could seriously impact IT services. This role ensures that the IT service provider can deliver the minimum agreed service levels in cases of disaster, by reducing risk to an acceptable level and planning for the recovery of IT services.