As I discussed previously, the most important thing for any database is the database itself. To secure the data properly, you need to back it up regularly with a proper backup plan.
Let's say you hit a disaster and the last proper backup is 10 hours old. You are going to lose 10 hours' worth of data even if you can restore it instantly. How much data that is depends on which 10 hours you are dealing with: your busiest business hours or a relatively quiet period? The amount of data loss, measured in time, that the business is willing to accept is what we call the Recovery Point Objective (RPO).
Now come to restoring the 10-hour-old backup that you have in your hand. How much time do you need to get to the last proper backup? How much time does the restore take? How much time do you need to change the app connection string and other configurations? In short, how long are the end users going to wait before they can access the database again? This is what we call the Recovery Time Objective (RTO).
RPO and RTO are two very important topics for any database management system. Setting RPO and RTO poorly can lead to devastating business situations. Today we will try to understand RPO and RTO by answering some questions:
- What are RPO and RTO?
- How can we set RPO and RTO for our business?
- What are the technologies and tools involved in achieving the RPO and RTO we have set?
- How much cost is involved here?
- How can we automate our backups and, more importantly, our restores?
Hang on tight, we are going to cover them one by one.
What are RPO and RTO?
RPO
The recovery point objective answers the question: how much data can you lose if you hit a disaster? We often think it's measured in megabytes. No. It's measured in time. If your last good backup is 10 hours old, you are effectively running with an RPO of 10 hours. The actual amount of data you lose depends on when the disaster takes place.
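To make the "measured in time" point concrete, here is a minimal Python sketch. The function name and the timestamps are hypothetical, chosen only to mirror the 10-hour example; they are not taken from any real system.

```python
from datetime import datetime, timedelta


def data_loss_window(last_good_backup: datetime, disaster_time: datetime) -> timedelta:
    """Return the span of time whose changes are lost after restoring the backup."""
    if disaster_time < last_good_backup:
        raise ValueError("disaster_time must not be earlier than last_good_backup")
    return disaster_time - last_good_backup


# Hypothetical timestamps for illustration only.
last_good_backup = datetime(2024, 1, 15, 2, 0)   # backup finished at 02:00
disaster_time = datetime(2024, 1, 15, 12, 0)     # disaster strikes at 12:00

window = data_loss_window(last_good_backup, disaster_time)
print(window)  # 10:00:00
```

The result is a duration, not a size. How many rows or megabytes fall inside that window depends entirely on how busy those hours were.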
RTO
The recovery time objective is how much time you need to restore the last good backup and bring your database back online. It is also measured in time.
How can we set RPO and RTO for our business?
RPO
You need to ask yourself how much data your business is comfortable losing in case of a disaster. It's really something to discuss with the business stakeholders, but you need to form your own estimate before discussing it with them. I am warning you: your business stakeholders are going to say that they don't want to lose any data at all. Hold on tight, I am going to show you how to manage your business stakeholders so that they either agree to lose some data (which means declining to spend more) or agree to spend more (which means refusing to lose data). More on that in a later section.
RTO
You can calculate RTO by answering the following questions and summing up the results:
- How much time to get access to the last good backup?
- How much time to move the backup to the destination server?
- How much time is needed to restore the database?
- How much time to make app configuration changes?
Sum all these times and allow some buffer, since you never know what other challenges you will face while bringing the database back online. It's hugely stressful, trust me, since the end users are waiting for you.
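The calculation above can be sketched in a few lines of Python. The step durations and the 25% buffer are made-up numbers for illustration; measure your own by actually rehearsing a restore.

```python
from datetime import timedelta
from typing import Mapping


def estimate_rto(steps: Mapping[str, timedelta], buffer_ratio: float = 0.25) -> timedelta:
    """Sum the duration of every recovery step and add a safety buffer on top."""
    if buffer_ratio < 0:
        raise ValueError("buffer_ratio must not be negative")
    total = sum(steps.values(), timedelta())
    return total + total * buffer_ratio


# Hypothetical durations; replace them with timings from your own restore drills.
steps = {
    "get access to the last good backup": timedelta(minutes=15),
    "move the backup to the destination server": timedelta(minutes=30),
    "restore the database": timedelta(minutes=60),
    "make app configuration changes": timedelta(minutes=15),
}

rto = estimate_rto(steps)
print(rto)  # 2:30:00
```

The buffer is a ratio rather than a fixed number of minutes, so it grows with the size of the restore. That is only one possible choice; a fixed allowance works just as well if you prefer it.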
What are the tools and technologies involved in achieving the RPO and RTO we have set?
For an optimal RPO, you can use any one or a combination of the following.
If your business data is highly valuable and you need a very low RPO:
- SQL Server Always On availability groups
- Asynchronous database mirroring
- Go to the cloud, like Azure or Amazon AWS
If you are comfortable with losing some data:
- Find your current RPO with sp_BlitzBackups, an open source DBA script.
- Follow my backup plan article for planning a proper backup strategy: Backup SQL Database like Batman.
- Maintain a backup maintenance script. You can try Ola Hallengren's maintenance scripts to automate this.
- Test your backup script, and I am damn serious about it.
- Implement proper notification for backup failures. You can set this up in an SSMS maintenance plan so that the responsible people are notified whenever a backup fails.
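As a rough illustration of checking your current RPO and raising an alert, the Python sketch below flags every database whose last good backup is older than the agreed RPO. It assumes you have already collected the last backup finish time per database (for example from the backup history that SQL Server keeps in msdb.dbo.backupset). The database names, timestamps, and the 4-hour RPO are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def databases_breaching_rpo(
    last_backups: Dict[str, datetime], now: datetime, rpo: timedelta
) -> List[str]:
    """Return the names of databases whose last good backup is older than the RPO."""
    return sorted(
        name for name, finished in last_backups.items() if now - finished > rpo
    )


# Hypothetical data: last backup finish time per database.
now = datetime(2024, 1, 15, 12, 0)
last_backups = {
    "Sales": datetime(2024, 1, 15, 11, 30),
    "Inventory": datetime(2024, 1, 15, 2, 0),
    "Reporting": datetime(2024, 1, 14, 23, 0),
}

breaches = databases_breaching_rpo(last_backups, now, rpo=timedelta(hours=4))
for name in breaches:
    # In a real setup this is where an email or pager notification would go.
    print(f"ALERT: last good backup of {name} is older than the RPO")
```

This is not a replacement for sp_BlitzBackups or maintenance plan notifications. It only shows the comparison those tools boil down to: backup age versus the RPO you agreed on.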
How much cost is involved?
When you try to have near-zero RPO and RTO, it's going to cost you more. Always On availability groups can run in synchronous-commit mode. Whenever the primary node fails, you can fail over to a synchronized secondary node almost instantly. SQL Server 2019 supports up to 5 synchronous replicas in an Always On availability group, which is cool, since SQL Server 2017 supported only 3. With Always On, you need to point your app connection string to the Always On listener; see how easy it is to set up a connection string on an Availability Group.
My point is, show your management how much the tools are going to cost if you want to achieve a good RPO and RTO, so that they can decide whether to spend more money or lose more data.
Automate Backup and Restores
- First, use sp_BlitzBackups to see where you currently stand.
- Use Ola Hallengren's maintenance scripts to automate backups.
- Use dbatools.io to perform backups and restores if you are comfortable with PowerShell.