Guide to IT Service Continuity Management (ITSCM) (2024)

IT service continuity management (ITSCM) is a key component of ITIL service delivery. It focuses on planning for incident prevention, prediction, and management with the goal of maintaining service availability and performance at the highest possible levels before, during, and after a disaster-level incident.

The goal of ITSCM is to reduce the downtime, costs, and business impact of incidents by putting effective, standardized processes in place for when those incidents do inevitably occur.

Without a plan, many factors can slow or even stop incident recovery. Your on-call expert might be responding bleary-eyed at 3 a.m. They might be out of touch with the code after working on something else for weeks or months. They might panic at the scale of the disaster-level incident. Or they might be the newest member of the disaster recovery team, with less experience resolving issues.

Having a well-documented, clear plan for service continuity management will help minimize any delays caused by learning curves, time away from the code, disaster panic, or midnight alerts.

ITSCM and ITIL 4

In ITIL 4, service continuity management is a process meant to support business continuity management (BCM). The goal of the process is to make sure services are back up and running within the agreed-upon business timelines after major service disruptions.

ITSCM vs. incident management

ITIL 4 makes a distinction between incident management—which handles incidents at a variety of impact levels—and ITSCM, which is about planning for large-scale disasters.

So, what exactly constitutes a disaster? The answer may be different for each business, but the Business Continuity Institute defines it as: “A sudden unplanned event that causes great damage or serious loss to an organization. It results in an organization failing to provide critical business functions for some predetermined minimum period of time.”

The scale of what we call a disaster, the predetermined minimum time, and the definition of critical business functions are three things each business will need to define and document for themselves.
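
One way to make those three definitions concrete is to write them down as data that responders can check against. The following is a minimal Python sketch under stated assumptions: the `DisasterCriteria` class, its field names, and the example thresholds are all hypothetical and do not come from ITIL or the Business Continuity Institute.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import FrozenSet, Set


@dataclass(frozen=True)
class DisasterCriteria:
    """Hypothetical, business-specific definition of a disaster."""
    critical_functions: FrozenSet[str]  # functions the business cannot operate without
    minimum_outage: timedelta           # the "predetermined minimum period of time"
    minimum_affected_share: float       # scale: share of users affected, from 0.0 to 1.0


def is_disaster(
    criteria: DisasterCriteria,
    affected_functions: Set[str],
    affected_share: float,
    outage: timedelta,
) -> bool:
    """Return True only when all three documented thresholds are met."""
    hits_critical = bool(criteria.critical_functions & affected_functions)
    large_enough = affected_share >= criteria.minimum_affected_share
    long_enough = outage >= criteria.minimum_outage
    return hits_critical and large_enough and long_enough


# Example thresholds are assumptions for illustration only.
criteria = DisasterCriteria(
    critical_functions=frozenset({"checkout", "login"}),
    minimum_outage=timedelta(hours=4),
    minimum_affected_share=0.25,
)
```

Calling `is_disaster(criteria, {"checkout"}, 0.5, timedelta(hours=5))` would classify that outage as a disaster under these example thresholds, while a short or non-critical outage would stay with regular incident management.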

ITSCM and business continuity management (BCM)

Business continuity management is a process managed outside IT that identifies risks to the business and works to mitigate those risks. Some risks may be IT-related, including disaster-level incidents, and some risks may be outside IT control, such as natural disasters or facility fires.

Since BCM encompasses ITSCM as well as other risk-mitigation processes, it makes sense for IT teams to work closely with the BCM team to create:

A business continuity plan (BCP) that includes plans for prevention and recovery from disaster-level IT incidents
Business impact analyses (BIA) that identify the potential business impact of an IT disaster

ITSCM objectives

From a business perspective, the goal of ITSCM is to reduce the downtime, costs, and business impact of disaster-level incidents. On a more tactical level, objectives include:

Working closely with BCM to protect overall business continuity
Creating and managing plans for IT service continuity and recovery in case of disaster
Working with vendors to minimize the impact of any downtime in their products and services, as it relates to the business
Analyzing risk and impact and revising plans accordingly over time

The ITSCM process

Here at Atlassian, our own continuity plan is built on the assumption that disaster planning is ongoing, leadership-driven, and thoroughly tested. We are determined to not #@!% our customers. Our process includes planning, clear responsibilities, communication, testing, and continuous improvement.

Planning

The planning process starts with asking high-level questions and then building a plan based on your answers. Starting questions should include:

What is our incident response?
What are the values we’ll follow?
What kinds of disasters do we need to plan for? What are the risks and threats inherent to our business?
What systems do we need to support? Which are critical?
How will we respond in case of each disaster?
Where is the information we’ll need to support and restore critical systems?
How can we centralize that information and simplify restoration processes?
Is the information and process documentation collaborative and reviewable by the teams who will be managing it?

Once you have answers to these questions, the next step is to use those answers to define:

Policies for disaster recovery
Scope of IT responsibilities
Scope of business impact of each risk
Plans and processes for each risk scenario
Personnel and documentation requirements
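
As an illustration of how those definitions might be captured per risk scenario, here is a small Python sketch. The `ScenarioPlan` record, its fields, and the `missing_items` helper are hypothetical names chosen for this example, not part of any ITIL or Jira Service Management schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ScenarioPlan:
    """Hypothetical record for one risk scenario in a continuity plan."""
    scenario: str          # e.g. "data center outage"
    business_impact: str   # scope of business impact for this risk
    owner: str             # person or team responsible for the plan
    runbook_url: str       # where the recovery documentation lives
    recovery_steps: List[str] = field(default_factory=list)


def missing_items(plan: ScenarioPlan) -> List[str]:
    """List the fields that still need to be filled in before the plan is usable."""
    gaps = []
    if not plan.owner:
        gaps.append("owner")
    if not plan.runbook_url:
        gaps.append("runbook_url")
    if not plan.recovery_steps:
        gaps.append("recovery_steps")
    return gaps
```

A check like `missing_items` could run during plan reviews so that a scenario without an owner or a runbook is flagged before a disaster, not during one.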

The key to a successful ITSCM planning phase is documenting and templatizing the resulting plan to make it clear and repeatable. Having assets such as an incident response playbook or other runbooks can be a source of truth and organization to responders during a high-stakes scenario.

In the spirit of ITSCM, a solution with a built-in knowledge base, like Jira Service Management powered by Confluence, supports continuous documentation that teams can revise, optimize, and collaborate on. That way, responders have access to previous resolution documentation and up-to-date resources.

Clear responsibilities

Who’s responsible in case of disaster? Who’s responsible for maintaining and updating plans, processes, and documentation? ITSCM should always have a clear sense of roles and responsibilities not only for disasters themselves, but for ongoing monitoring and improvement. Using Jira Service Management, responders can tag the appropriate party or person on issues to ensure responsibilities are properly delegated and to facilitate cross-functional collaboration.

At Atlassian, part of our approach is to have regular disaster recovery meetings with our site reliability engineers and our risk and compliance team. They discuss gaps in disaster recovery and identify where additional plans, improvements, assessments, or changes need to be made.

Communication

Openness is a core value at Atlassian and we believe the more informed your organization is about your ITSCM plans, the more effective those plans will be.

Offering flexible communication channels throughout the incident response process allows teams to stay in touch by their preferred method. Jira Service Management integrates multiple communication channels to minimize downtime, such as an embeddable status widget, a dedicated status page, email, chat tools, social media, and SMS.

Not only does communication keep stakeholders on board and help the C-suite stave off panic during a disaster-level incident, but it also allows the team to reach out to other teams for help if needed and reduces the risk of friction caused by organizational confusion.

Testing

How do you know if your plans work unless you test them? This is a foundational question for ITSCM and the reason that testing and incident management drills are vital to the success of the practice.

Testing can help you identify weak points in your process, unforeseen issues, and where teams may need re-training or better documentation.

Assess and improve

ITSCM is not a one-and-done process. It requires thoughtful planning up front and ongoing training, assessment, and improvement. That’s why we have regular disaster recovery meetings. It’s why we test system backups and run drills on what happens in case of a data center outage or AWS region failure. And it’s why any ITSCM plan worth its salt is a continually monitored, ever-changing thing.

Most companies represent the ITSCM process as a series of steps, but we think it’s more like a circle. Planning should lead to defined roles and responsibilities. From there, the team should communicate across the organization, test and test again, assess, monitor, and improve and, in those improvements, continue to update the plan, further define roles, and continue communicating.

Again, this is where a built-in, collaborative knowledge base comes into play. Knowledge base articles are a valuable resource when it comes to assessment and documentation. Incident postmortem reports are crucial for revision and repair following an incident, but can also act as a longstanding resource for potential problems in the future. Jira Service Management, powered by Confluence, offers a powerful collaborative platform to execute assessment and improvement solutions.

ITSCM roles and responsibilities

In order to effectively plan and implement ITSCM practices across the organization, many businesses appoint a Service Continuity Manager and a Service Continuity Recovery Team.

Service Continuity Manager (SCM)

As the name suggests, the Service Continuity Manager is responsible for overseeing service continuity. This person typically owns the process from A to Z, leading plan development, managing ongoing monitoring and assessment activities, and overseeing plans in action in case of disaster.

This person is typically an experienced, senior-level technical support professional, but may be in a management role and not directly involved with the tech day to day.

Service Continuity Recovery Team

Led by the SCM, this team is responsible for running tests and incident drills and continually improving ITSCM. The team typically includes technical staff, QA professionals or users for testing, and representatives from departments across the organization who are responsible for keeping lines of communication open between ITSCM and their teams.

Why does ITSCM matter?

Organizations with clear plans for disaster recovery will recover more quickly and more fully when disasters occur.

ITSCM isn’t about planning for everyday outages. It’s about addressing worst-case scenarios and ensuring that if they happen, they cause minimal disruption to the lives of customers and employees.

Here are three clear benefits of a good ITSCM practice:

If disaster strikes, a good ITSCM plan means essential services will be back up and running quickly.
The organization is always prepared for a major disaster and can react quickly and appropriately.
Everyone across the business understands what will happen in case of disaster and how long they can expect systems to be down.

Discover how ITSCM improves customer service quality and minimizes organizational downtime with Jira Service Management.

What are the four stages of ITSCM?

The ITSCM process comprises four stages: Initiation, Requirements & strategy, Implementation, and Ongoing operation.

What are the main benefits of proper IT service continuity management?

Benefits of IT service continuity management include:

Controlled recovery of systems
Reduction of downtime and increased continuity of service to the customer
Minimal disruption to the department's business

What are the 7 steps of continuity management?

Step 1: Regulatory Review and Landscape
Step 2: Risk Assessment
Step 3: Perform a Business Impact Analysis
Step 4: Strategy and Plan Development
Step 5: Create an Incident Response Plan
Step 6: Plan Testing, Training and Maintenance
Step 7: Communication

What is the importance of creating service continuity planning?

A business continuity plan positions your organization to survive serious disruption. It eliminates confusion common to every disaster, providing a clear blueprint for what everyone should do. More importantly, your business continuity plan supports: Communication between employees and customers.

What is a service continuity plan?

The Information Technology Service Continuity Plan is the collection of policies, standards, procedures and tools through which organisations not only improve their ability to respond when major system failures occur, but also improve their resilience to major incidents, ensuring that critical systems and services do .

Which of these are KPIs relating to IT service continuity management?

Key Performance Indicator (KPI)	Definition
Implementation Duration	Duration from the identification of a disaster-related risk to the implementation of a suitable continuity mechanism
Number of Disaster Practices	Number of disaster practices actually carried out
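
Both KPIs in the table can be computed from simple records. The Python sketch below shows one possible way to do that; the function names and the sample dates are made up for illustration and are not taken from any ITIL tooling.

```python
from datetime import date
from typing import List, Tuple


def implementation_duration_days(identified: date, implemented: date) -> int:
    """Days from identifying a disaster-related risk to implementing a continuity mechanism."""
    return (implemented - identified).days


def average_implementation_duration(risks: List[Tuple[date, date]]) -> float:
    """Mean implementation duration in days across (identified, implemented) pairs."""
    durations = [implementation_duration_days(start, end) for start, end in risks]
    return sum(durations) / len(durations)


def number_of_disaster_practices(practice_dates: List[date], year: int) -> int:
    """Count the disaster practices actually carried out in a given year."""
    return sum(1 for d in practice_dates if d.year == year)


# Made-up sample data, for illustration only.
risks = [
    (date(2024, 1, 10), date(2024, 2, 9)),  # 30 days
    (date(2024, 3, 1), date(2024, 3, 11)),  # 10 days
]
practices = [date(2023, 11, 5), date(2024, 4, 2), date(2024, 9, 17)]
```

With this sample data, the average implementation duration is 20 days and two disaster practices were carried out in 2024.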

What is the ITIL 4 framework?

ITIL 4 provides a digital operating model that enables organizations to co-create effective value from their IT-supported products and services. ITIL 4 builds on ITIL's decades of progress, evolving established ITSM practices for the wider context of customer experience, value streams, and digital transformation.

What does RTO stand for in BCP?

RTO stands for Recovery Time Objective: the maximum acceptable length of time for restoring a service after a disruption.
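
As a rough illustration of how an RTO might be checked after an incident, the Python sketch below compares the actual outage length against the objective. The `met_rto` function and the example times are assumptions made for this example.

```python
from datetime import datetime, timedelta


def met_rto(outage_start: datetime, service_restored: datetime, rto: timedelta) -> bool:
    """Return True when the service came back within the recovery time objective."""
    return (service_restored - outage_start) <= rto


# Hypothetical values: a 4-hour RTO and an outage that lasted 3.5 hours.
rto = timedelta(hours=4)
start = datetime(2024, 5, 1, 3, 0)
restored = datetime(2024, 5, 1, 6, 30)
```

Here `met_rto(start, restored, rto)` would report that the objective was met, since 3.5 hours is within the 4-hour limit.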

Which of the following is an activity of IT service continuity management?

One activity of IT service continuity management is maintaining a set of IT service continuity and IT recovery plans that support the overall business continuity plans. The practice also includes performing business impact analysis, risk analysis, and management activities on a regular basis.

What is a service continuity manager?

The IT Service Continuity Manager is responsible for managing risks that could seriously impact IT services. This manager ensures that the IT service provider can provide minimum agreed service levels in cases of disaster, by reducing the risk to an acceptable level and planning for the recovery of IT services.

In which stages of the IT service continuity lifecycle does testing take place?

Stage 1: Initiation
Stage 2: Requirements Analysis and Strategy Definition
Stage 3: Implementation
Stage 4: Operational Management
Invocation: the ultimate test of the Business Continuity and ITSCM plans