ERT, RTO, RPO… all ducks in a row

 

This article will explain what the following terms mean:

  • RTO (Recovery Time Objective),
  • RPO (Recovery Point Objective),
  • ERT (Estimated Recovery Time).

The terms ERT, RTO, and RPO are used in IT “Business Continuity and Disaster Recovery” policies. They are time characteristics (measured in seconds, hours, etc.) of the disaster recovery of a data system.

This article uses the definitions of RT (Recovery Time) and RP (Recovery Point) to define RTO, RPO, and ERT. This approach may lead to a new understanding of the terms and to further discussion.

Disaster Recovery in a Nutshell

A “disaster” here means an event that results in the loss of user data and/or a period when data is unavailable.

The disaster recovery process can be described step by step:

  1. the data system is “in service” (ON),
  2. a disaster occurs and the system is now “out of service” (OFF),
  3. recovery is started by a user’s request or by an automated request,
  4. recovery is finished,
  5. the system is “in service” again (ON again).

[Figure: Disaster Recovery in a Nutshell]

The moment of the disaster and the start of the recovery are often assumed to coincide, but this is not always true.

Say a user accidentally deletes a database but does not notice. Only the next day does the user discover the loss and send the data center a request for recovery. In this example there is a big gap between the moment of the disaster and the start of the recovery, so the two moments are not the same.

We treat “Request for Recovery” and “Start of Recovery” as the same moment because the request is supposed to start the recovery immediately, even if in practice it does not.
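
The gap in the accidental-deletion example can be made concrete with a few lines of Python. This is only a sketch: the timestamps and the variable names `disaster_at`, `request_at`, and `detection_gap` are invented for illustration and do not come from any real incident.

```python
from datetime import datetime

# Hypothetical timestamps for the accidental-deletion example.
disaster_at = datetime(2018, 1, 1, 10, 0, 0)  # database is deleted
request_at = datetime(2018, 1, 2, 9, 30, 0)   # user notices and requests recovery

# Time that passed between the disaster and the request for recovery
# (which this article treats as the start of recovery).
detection_gap = request_at - disaster_at
print(detection_gap)  # 23:30:00
```

Here the system was already damaged for 23.5 hours before the recovery was even requested, which is why the two moments are kept apart.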

Recovery Time and Recovery Point

In the abbreviations ERT, RTO, and RPO:

  • RT stands for ‘Recovery Time’, which is a time interval,
  • RP stands for ‘Recovery Point’, which is a moment in time.

The terms RT and RP can be used and discussed separately from ERT, RTO, and RPO.

[Figure: RT and RP]

  • Recovery Time is the duration of the recovery process,
  • Recovery Point is the point in time from which the data system restores the data after the disaster.

Recovery Time

RT (Recovery Time) is the period of time required for a data center to resume its former condition after a request for recovery.

Recovery time can be expressed by the formula:

RT = <time of resuming functioning> – <time of request for recovery>

RT is the overall ‘out of service’ time interval. It can include time spent trying to fix the problem without specific recovery efforts, the recovery itself, testing, and communication to the users.
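
As a minimal sketch, the RT formula maps directly onto timestamp subtraction. The function name `recovery_time` and the sample timestamps are assumptions made for this illustration.

```python
from datetime import datetime, timedelta


def recovery_time(request_at: datetime, resumed_at: datetime) -> timedelta:
    """RT = <time of resuming functioning> - <time of request for recovery>."""
    if resumed_at < request_at:
        raise ValueError("service cannot resume before recovery is requested")
    return resumed_at - request_at


# Hypothetical incident: recovery requested at 09:30, service resumed at 11:45.
rt = recovery_time(
    request_at=datetime(2018, 1, 2, 9, 30, 0),
    resumed_at=datetime(2018, 1, 2, 11, 45, 0),
)
print(rt)  # 2:15:00
```

The result is a single interval, so it includes whatever happened in between: troubleshooting, the restore itself, testing, and user communication.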

Recovery Point

RP (Recovery Point) is the point in time from which the data center restores the data after the disaster.

The RP may be defined in different ways. It can be:

  • some “astronomical” time, e.g., 01-JAN-2018 10:11:33,
  • an interval measured back from the moment of the disaster, e.g., “6 seconds before the disaster”.

The “astronomical” form of RP is used when considering data unavailability.

In this case, the duration of the data unavailability interval can be expressed by the formula:
<data unavailability time> = <time of resuming functioning> – RP

The “before the disaster” form of RP is used when considering data loss.

In this case, RP is itself an interval, so the duration of the lost data interval is simply:
<lost data time> = RP
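
The two ways of expressing RP can be put side by side in a short sketch. The disaster and resume timestamps are assumptions chosen so that the two sample RP values from the list above describe the same moment; the variable names are hypothetical as well.

```python
from datetime import datetime

# Hypothetical incident timeline.
disaster_at = datetime(2018, 1, 1, 10, 11, 39)
resumed_at = datetime(2018, 1, 1, 12, 0, 0)

# RP as an "astronomical" time: the moment the restored data dates from.
rp_absolute = datetime(2018, 1, 1, 10, 11, 33)

# <data unavailability time> = <time of resuming functioning> - RP
data_unavailability_time = resumed_at - rp_absolute
print(data_unavailability_time)  # 1:48:27

# RP as an interval measured back from the disaster.
rp_before_disaster = disaster_at - rp_absolute

# <lost data time> = RP
lost_data_time = rp_before_disaster
print(lost_data_time)  # 0:00:06
```

With these numbers, six seconds of data are lost, while the data is unavailable for about an hour and 48 minutes.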

ERT, RTO & RPO Definitions

ERT, RTO, and RPO are usually defined without reference to RT and RP, but the RT and RP definitions can be used to define them.

ERT – Estimated Recovery Time 

ERT is the estimated Recovery Time.

A common definition that does not use the RT definition:

The estimated duration for the database to be fully functional after a restore/failover request.

RTO – Recovery Time Objective

RTO is the maximum targeted value for the Recovery Time.

The common definition from Wikipedia, which does not use the RT definition:

The Recovery Time Objective (RTO)… is the amount of time the business can be without the service…

RPO – Recovery Point Objective

RPO is the maximum targeted value for the Recovery Point measured from the time of a disaster.

The common definition from Wikipedia, which does not use the RP definition:
Recovery Point Objective (RPO)… is the maximum targeted period in which data might be lost from an IT service…
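
Since RTO and RPO are maximum targeted values, checking a plan or a measured incident against them is a simple comparison. The sketch below assumes made-up values for every quantity, and the helper `within_objective` is hypothetical.

```python
from datetime import timedelta


def within_objective(value: timedelta, objective: timedelta) -> bool:
    """True if the value does not exceed the maximum targeted value."""
    return value <= objective


# Hypothetical objectives and estimate from a disaster recovery policy.
rto = timedelta(hours=4)
rpo = timedelta(minutes=15)
ert = timedelta(hours=3)

# Hypothetical values measured after one incident.
actual_rt = timedelta(hours=5)
actual_rp = timedelta(seconds=6)

print(within_objective(ert, rto))        # True: the estimate fits the objective
print(within_objective(actual_rt, rto))  # False: this recovery took too long
print(within_objective(actual_rp, rpo))  # True: data loss stayed within target
```

The first check is about planning (ERT against RTO); the other two compare what actually happened with the objectives.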

Advantages of Using RT and RP Definitions

Using the RT and RP definitions makes it clearer that ERT, RTO, and RPO describe only planned, theoretical characteristics, not actual ones. Actual values of RT and RP can be measured after real disasters, and averages and other statistics can be calculated from them. Metrics such as ARP (Average Recovery Point) and ART (Average Recovery Time) would be very useful for evaluating data centers.
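
As a rough illustration of how ART and ARP might be calculated, the sketch below averages RT and RP values from three invented incidents. The incident data and the helper `average` are assumptions, and RP is taken in its “before the disaster” form.

```python
from datetime import timedelta


def average(durations: list[timedelta]) -> timedelta:
    """Arithmetic mean of a non-empty list of time intervals."""
    if not durations:
        raise ValueError("at least one measured value is required")
    return sum(durations, timedelta()) / len(durations)


# Hypothetical RT and RP values measured after three past disasters.
measured_rt = [timedelta(hours=2), timedelta(hours=3), timedelta(hours=1)]
measured_rp = [timedelta(seconds=6), timedelta(seconds=30), timedelta(seconds=24)]

art = average(measured_rt)  # Average Recovery Time
arp = average(measured_rp)  # Average Recovery Point

print(art)               # 2:00:00
print(arp)               # 0:00:20
print(max(measured_rt))  # 3:00:00, the worst recovery time observed
```

Other statistics, such as the maximum shown on the last line, can be derived from the same measured values.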

 
