Counter of Festivals

Ashok Blog for SQL Learners and Beginners and Experts

Tuesday 18 December 2012

Backups and Its Types in SQL Server 2005

Recovery Models

In order to begin working on backups, the business needs define a database recovery model. In essence, a recovery model defines what you're going to do with the transaction log data.
There are three recovery models: Full, Simple and Bulk Logged. These are pretty easy to define:
  • Simple in simple recovery mode, the transaction log is not backed up so you can only recover to the most recent full or differential backup.
  • Full – in full recovery mode you backup the database and the transaction log so that you can recover the database to any point in time.
  • Bulk Logged – in bulk logged mode, most transactions are stored in the transaction log, but some bulk operations such as bulk loads or index creation are not logged.
The two most commonly used modes are Simple and Full. Don't necessarily assume that, of course, you always need to use Full recovery to protect your data. It is a business decision. The business is going to tell you if you need to recover to a point in time or if you simply need the last full backup. It's going to define if your data is recoverable by other means, such as manual entry, or if you have to protect as much as possible as it comes across the wire. You use Simple recovery if you can afford to lose the data stored since the last full or differential backup and/or you just don't need recovery to a point in time. In Simple mode, you must restore all secondary read/write file groups when you restore the primary. You use Simple mostly on secondary databases that are not an absolute vital part of the enterprise or reporting systems, with read only access so there isn't a transaction log to worry about anyway. You use Full if every bit of the data is vital, you need to recover to a point in time or, usually in the case of very large databases (VLDB), you need to restore individual files and file groups independently of other files and file groups.
With both Simple and full recovery models, you can now run a Copy-Only backup which allows you to copy the database to a backup file, but doesn't affect the log, differential backup schedules or impact recovery to a point in time. I'll try to drill down on as many of these topics as possible through the article, but not the files and filegroups.

Working with Simple Recovery

Enough talk. Let's get down to running backups. Let's assume that we're in Simple recovery on a small to mid-sized database. I'm going to use AdventureWorks for all the sample scripts. To set it to simple recovery:
ALTER DATABASE AdventureWorks SET RECOVERY SIMPLE
Your simplest backup strategy is to run, at regular intervals, the following SQL Server backup command, which will perform a full backup of the database:
BACKUP DATABASE AdventureWorks
TO DISK = 'C:\Backups\AdventureWorks.BAK'
What's with all the typing you ask? Don't we have GUI tools to handle the work for us? Yes, most simple backups can be performed using SQL Server Management Studio. However, if you want to learn and understand what Management Studio is doing for you, or if you want some fine grained control over what is backed up, how and where, then you're going to have to break out the keyboard and put away the mouse.
The above command will precipitate a basic backup to disk. Most DBAs I know backup to file and then scrape the files onto a tape or some other media. This is because files on disk are simple and quick to recover, whereas media can sometimes be a bit of a pain. For example, we generally have two to three days worth of backups on our file systems for immediate recovery. We only go to the tape systems if we need to run restores for older backups.
What did that command do? It made a copy of all the committed data in the database. It also copied uncommitted log entries. These are used during recovery to either commit or rollback changes that occurred to the data during the backup process.

Copy-only backups

Normally, backing up a database affects other backup and restore processes. For example after running the previous command, any differential backups (a backup that only copies data changed since the last backup) would be using this as the starting point for data changes, not the backup you ran last night. As noted earlier, SQL 2005 introduces a new concept to backups, COPY_ONLY backups, which allow us to keep from interrupting the cycle:
BACKUP DATABASE AdventureWorks
TO DISK = 'C:\Backups\AdventureWorks.bak'
WITH COPY_ONLY;
Already we've found one of those more granular moments when the Management Studio wouldn't help you. If you want a copy only backup, you have to use the command line.

Differential backups

Let's assume for a moment, that we're still in simple recovery, but we're dealing with a larger database, say something above 100 GB in size. Full backups can actually start to slow down the process a bit. Instead, after consultation with the business, we've decided to do a weekly full backup and daily differential backups. Differential backups only backup the data pages that have changed since the last full backup. Following is the SQL backup command to perform a differential backup:
BACKUP DATABASE AdventureWorks
TO DISK = 'C:\backups\AdventureWorks.bak' 
WITH DIFFERENTIAL;
Now, if we had to restore this database, we'd first go to the last full backup, restore that, and then restore the differential backups in order (more on that later).
BACKUP DATABASE Adventureworks
TO DISK = 'C:\backups\AdventureWorks.bak'
WITH INIT;
There are a number of other backup options that I won't be detailing here. Read the books online to see details on BLOCKSIZE, EXPIREDATE, RETAINDAYS, PASSWORD, NAME, STATS, and so on.
You can also run a statement that will check the integrity of a database backup. It doesn't check the integrity of the data within a backup, but it does verify that the backup is formatted correctly and accessible.
RESTORE VERIFYONLY
FROM DISK = 'C:\backups\Adventureworks.bak'

Full recovery and log backups

We've primarily been working on a database that was in Simple recovery mode (this used to be called Truncate Log on Checkpoint). In this mode, we do not backup the transaction logs for later recovery. Every backup under this mechanism is a database backup. Log backups are simply not possible.
However, you've only protected the data as of the last good backup, either full or differential. Let's change our assumptions. Now we're dealing with a large, mission critical application and database. We want to be able to recover this database up to the latest minute. This is a very important point. In theory, since the log entries are being stored and backed up, we're protected up to the point of any failure. However, some failures can cause corruption of the log, making recovery to a point in time impossible. So, we have to determine what the reasonable minimum time between log backups will be. In this case we can live with no more than 15 minutes worth of lost data.
So, let's start by putting our database in FULL recovery mode:
ALTER DATABASE AdventureWorks SET RECOVERY FULL
Then, on a scheduled basis, in this case every 15 minutes, we'll run the SQL backup command for the transaction log:
BACKUP LOG Adventureworks
TO DISK = 'C:\backups\AdventureWorks_Log.bak';
This script will backup committed transactions from the transaction log. It has markers in the file that show the start and stop time. It will truncate the log when it successfully completes, cleaning out from the transaction log the committed transactions that have been written to the backup file. If necessary, you can use the WITH NO_TRUNCATE statement to capture data from the transaction log regardless of the state of the database, assuming it's online and not in an EMERGENCY status. This is for emergencies only.
Note that we are not using the INIT statement in this case, but you can do so if you choose. When doing log backups, you've got options:
  1. Run all the backups to a single file, where they'll stack and all you have to do, on restore (covered later), is cycle through them.
  2. Name the backups uniquely, probably using date and time in the string.
In that latter case, safety says, use INIT because you're exercising maximum control over what gets backed up where, and you'll be able to know exactly what a backup is, when it was taken and from where based on the name. This is yet another place where operating backups from the command line gives you more control than the GUI. We've used both approaches in our systems for different reasons. You can decide what is best for your technology and business requirements.
Most of the options available to the database backup are included in Log backup, including COPY_ONLY. This would allow you to capture a set of transaction data without affecting the log or the next scheduled log backup. This would be handy for taking production data to another system for troubleshooting etc.
If you have your database set to FULL Recovery, you need to run log backups. Sometimes, people forget and the transaction log grows to the point that it fills up the disk drive. In this case, you can run:
BACKUP LOG Adventureworks WITH NO_LOG;
Attaching NO_LOG to the log backup, and not specifying a location for the log, causes the inactive part of the log to be removed and it does this without a log entry itself, thus defeating the full disk drive. This is absolutely not recommended because it breaks the log chain, the series of log backups from which you would recover your database to a point in time. Microsoft recommends running a full backup immediately after using this statement. Further, they're warning that this statement may be deprecated in a future release.

Restoring Databases

As important as SQL Server backups are, and they are vital, they are useless without the ability to restore the database.

Restoring a full database backup

Restoring a full database backup is as simple as it was to create:
RESTORE DATABASE Adventureworks
FROM DISK = 'C:\Backup\AdventureWorks.bak';
It's really that simple – unless, as we we are backing up everything to a file as if it were a backup device. In that case, you'll need to specify which file within the "device" you're accessing. If you don't know which file, you'll need to generate a list:
RESTORE HEADERONLY
FROM DISK = 'C:\Backup\Adventureworks.bak';
This will give you the same list as I showed above from Management Studio. So now, if we wanted to restore the second file in the group, the COPY_ONLY backup, you would issue the following command:
RESTORE DATABASE AdventureWorks
FROM DISK = 'C:\Backup\Adventureworks.bak'
WITH FILE = 2;
Unfortunately, if you're following along, you may find that you just generated this error:
Msg 3159, Level 16, State 1, Line 1
The tail of the log for the database "AdventureWorks" has not been backed up.
Use BACKUP LOG WITH NORECOVERY to backup the log if it contains work you do
not want to lose. Use the WITH REPLACE or WITH STOPAT clause of the RESTORE
statement to just overwrite the contents of the log.
Msg 3013, Level 16, State 1, Line 1
RESTORE DATABASE is terminating abnormally.
What this means is, that your database is in full recovery mode, but you haven't backed up the "tail of the log", meaning the transactions entered since the last time you ran a backup. You can override this requirement if you change the previous syntax to:
RESTORE DATABASE AdventureWorks
FROM DISK = 'C:\Backups\Adventureworks.bak'
WITH FILE = 2,
REPLACE;
That's the first time we've stacked the WITH clauses (WITH FILE=2 and WITH REPLACE is represented as WITH FILE=2, REPLACE), but it won't be the last. Read through the books online. Most of the WITH clause statements can be used in combination with the others.
What happens if we want to restore to a different database than the original? For example, we want to make a copy of our database from a separate backup. Maybe we want to move it down to a production support server where we are going to do some work on it, separate from the production copy of the database. If we take the simple approach, well, try this:
RESTORE DATABASE AdventureWorks_2
FROM DISK = 'C:\Backups\Adventureworks.bak'
WITH FILE = 2;
In this case, you should see a whole series of errors relating to files not being overwritten. You really can create new databases from backups, but if you're doing it on a server with the existing database, you'll need to change the location of the physical files using the logical names. In order to know the logical names of the files for a given database, run this prior to attempting to move the files:
RESTORE FILELISTONLY
FROM DISK = 'C:\Backups\Adventureworks.bak'
WITH FILE = 2;
This can then be used to identify the appropriate logical names in order to generate this script:
RESTORE DATABASE AdventureWorks_2
FROM DISK = 'C:\Backups\Adventureworks.bak'
WITH FILE = 2,
   MOVE 'AdventureWorks_Data' TO 'C:\backups\aw2_data.mdf',
   MOVE 'AdventureWorks_Log' TO 'C:\backups\aw2_log.ldf';

Restoring a differential backup

The last method is to apply the differential backup. This requires two steps. First, we'll restore the database, but with a twist and then we'll apply the differential backup:
RESTORE DATABASE AdventureWorks
FROM DISK = 'C:\Backups\Adventureworks.bak'
WITH FILE = ,
   NORECOVERY,
   REPLACE;

RESTORE DATABASE AdventureWorks
FROM DISK = 'C:\Backups\AdventureWorks.bak'
WITH FILE = 3;
Most of this is probably self-explanatory based on what we've already covered. The one wrinkle is the inclusion of the NORECOVERY keyword. Very simply, during a restore, transactions may have started during the backup process. Some of them complete and some don't. At the end of a restore, completed transactions are rolled forward into the database and incomplete transactions are rolled back. Setting NORECOVERY keeps transactions open. This allows for the next set of transactions to be picked up from the next backup in order.
We're mainly dealing with simple backups and restores in this article, but a more advanced restore in 2005 allows secondary file groups to be restored while the database is online. Its primary file group must be online during the operation. This will be more helpful for very large database systems.

Restoring SQL Server databases to a point in time

Restoring logs is not much more difficult than the differential database restore that we just completed. There's just quite a bit more involved in restoring to a moment in time. Assuming you're backing up your logs to a single file or device:
RESTORE HEADERONLY
FROM DISK = 'C:\Backups\Adventureworks_log.bak';
Otherwise, you simply go and get the file names you need. First run the database restore, taking care to leave it in a non-recovered state. Follow this up with a series of log restores to a point in time.
RESTORE DATABASE AdventureWorks FROM DISK = 'C:\Backups\Adventureworks.bak'
WITH FILE = 1,
   NORECOVERY,
   REPLACE,
   STOPAT = 'Oct 23, 2006 14:30:29.000';

RESTORE LOG AdventureWorks
FROM DISK = 'C:\Backups\Adventureworks_log.bak'
WITH FILE = 1,
   NORECOVERY,
   STOPAT 'Oct 23, 2006 14:30:29.000';
RESTORE LOG AdventureWorks
FROM DISK = 'C:\Backups\Adventureworks_log.bak'
WITH FILE = 2,
   NORECOVERY,
   STOPAT 'Oct 23, 2006 14:30:29.000';
RESTORE LOG AdventureWorks
FROM DISK = 'C:\Backups\Adventureworks_log.bak'
WITH FILE = 3,
   NORECOVERY,
   STOPAT 'Oct 23, 2006 14:30:29.000';
RESTORE LOG AdventureWorks
FROM DISK = 'C:\Backups\Adventureworks_log.bak'
WITH FILE = 4,
   STOPAT 'Oct 23, 2006 14:30:29.000';
Now what we have is a database that is up to the exact, last committed transaction at 14:30:29 on the 23rd of October. Remember, during multi-step restores such as this, you have to leave the database in a recovering status. That means appending NORECOVERY to each statement until you've completed the restore process. If for some reason you've added NORECOVERY to all your statements, or you simply stop in the middle, and would like to bring the database back online, you can use this statement to complete the process:
RESTORE DATABASE Adventureworks
WITH RECOVERY;

Database snapshots

SQL Server 2005 introduced the concept of a snapshot, or a read-only, static view of a database. Snapshots are primarily created in order to supply a read-only version of a database for reporting purposes. However, they do function in a similar way to backups. The one primary difference is that all uncommitted transactions are rolled back. There is no option for rolling forward, capturing logs, etc., that backups provide, nor are very many SQL Server resources used at all. Rather, disk technology is used to create a copy of the data. Because of this they are much faster than backups both to create and restore.
NOTE:
For more details on SQL 2005 Snapshot, please refer to http://www.simple-talk.com/sql/database-administration/sql-server-2005-snapshots/.
A good use of snapshots, in addition to reporting, might be to create one prior to maintenance after you've already removed all the active users (and their transactions) from the system. While snapshots don't support the volatility of live backups, their speed and ease of recovery make a great tool for quick recovery from a botched rollout. Snapshots are stored on the server, so you must make sure you've got adequate storage.
The syntax is different because you're not backing up a database; you're creating a new one:
CREATE DATABASE Adventureworks_ss1430
ON (NAME AdventureWorks_Data,
FILENAME 'C:\Backups\AdventureWorks_data_1430.ss')
AS SNAPSHOT OF AdventureWorks;
Now it will be accessible for read-only access. Since we're primarily concerned with using this as a backup mechanism, let's include the method for reverting a database to a database snapshot.
First, identify the snapshot you wish to use. If there is more than one on any database that you're going to revert, you'll need to delete all except the one you are using:

DROP DATABASE Adventureworks_ss1440;
Then you can revert the database by running a RESTORE statement (mixed metaphors, not good):

RESTORE DATABASE Adventureworks
FROM DATABASE_SNAPSHOT Adventureworks_ss1430;  
That's it. On my system, running the database snapshots of Adventureworks took 136 ms. The full backup took 5,670 ms. The restore of the snapshot took 905ms and the database restore took 13,382ms. Incorporating this into a production rollout process could result in significant benefits
Again, it's worth noting that there are some caveats to using the snapshot. You have to have enough disk space for a second copy of the database. You need to be careful dealing with snapshots since most of the syntax is similar to that used by databases themselves. Last, while there are snapshots attached to a database you can not run a restore from a database backup of that database.

Best practices

The manner in which you perform database backups should not be a technical decision. It should be dictated by the business. Small systems with low transaction rates and/or reporting systems that are loaded regularly will only ever need a full database backup. Medium sized systems and large systems become dependent on the type of data managed to determine what types of backup are required.
For a medium sized system, a daily backup with log backups during the day would probably answer most data requirements in a timely manner.
For a large database the best approach is to mix and match the backups to ensure maximum recoverability in minimum time. For example, run a weekly full backup. Twice a day during the week, run a differential backup. Every 10 minutes during the day, run a log backup. This gives you a large number of recovery mechanisms.
For very large databases, you'll need to get into running filegroup and file backups because doing a full backup or even a differential backup of the full database may not be possible. A number of additional functions are available to help out in this area, but I won't be going into them here.
You should take the time to develop some scripts for running your backups and restores. A naming convention so you know what database, from which server, from which date, in what specific backup and format will be very conducive to your sanity. A common location for backups, log, full or incremental, should be defined. Everyone responsible should be trained in both backup and recovery and troubleshooting the same. There are many ways of doing this, but you can find a few suggestions in Pop backs up and Pop Restores.
The real test is to run your backup mechanisms and then run a restore. Then try a different type of restore, and another, and another. Be sure that, not only have you done due diligence in defining how to backup the system, but that you've done the extra step of ensuring that you can recover those backups. If you haven't practiced this and documented the practice and then tested the document, in effect, you're not ready for a disaster.

Summary

Backups within your enterprise should be like voting in Chicago, early and often. Setting up basic backups is quite simple. Adding on log backups and differentials is easy as well. Explore the options to see how to add in file and file group backups and restores to increase the speed of your backups and restores both of which will increase system availability and up time. Keep a common naming standard. Be careful when using snapshots, but certainly employ them. Store your files in a standard location between servers. Practice your recoveries. Finally, to really make your backups sing,  It offers high-performance compression and network resilience to make the process of writing or copying backups across flaky networks fault-tolerant.


Note: The source code bundle contains the SQL Backup scripts described in this article. To obtain the scripts, simply click the "CODE DOWNLOAD" link in the box to the right of the article title.

No comments:

Post a Comment