Recovery of a transaction log dump takes unexpectedly long


Introduction

In this article I discuss why recovery of a transaction log dump of a SAP ASE database may sometimes take longer than expected and

It is assumed you do have a working knowledge of SAP ASE as a DBA.

An important (the most important ?) part of your job as a database administrator (DBA) is to ensure recoverability of your databases in case of damage. For important databases service level agreements (SLAs) are typically defined that specify the maximum time recovery of a database may take.
To verify that defined recovery procedures are suitable for these defined SLAs many organisations perform regular recovery of their production databases by loading database and transaction log dumps to standby or sandbox systems.
If you don’t do this start it now – at least for those databases you care about.

Sometimes recovery of transaction log dumps that are loaded into such a standby database take very long – much longer than it took to generate the log content at runtime and often too long for existing recovery SLAs.

When checked disk I/O throughput is adequate – so this is not the problem. Also, typically the problem is seen at certain times only while recovery time is adequate at other times.

ASE command REORG REBUILD (or a command ALTER TABLE that forces an internal rebuild of the table) may cause long recovery times if transaction log dumps get loaded that contain log records for these actions.
The time it takes to apply a single transaction log record depends on the type of the log record – some types of log records take much more time than others. The REORG commands generate a lot of time consuming log records, hence total recovery take long time.

The problem is aggravated if multiple REORG commands are being executed in parallel at runtime on a multi processor ASE server. The runtime actions are executed in parallel while database recovery is performed by a single ASE task serially.

To avoid undue long recovery times ins such cases so called cumulative database dumps may be taken and loaded.

Example

A test was carried out on a 16k page SAP ASE server 16.0 SP03 PL08 HF1 on Linux in a virtualised environment.
(Note: Disk I/O throughput was not sufficient for production systems in this environment, but it is assumed that this does not affect overall validity of the test. The test was conducted several times no significant deviations were observed and I/O throughput was always in the same range between all test runs.)

A command REORG REBUILD was executed on a table in a database with no other activity at the time. A full database dump and a transaction log dump was taken before the REORG command. A transaction log dump, a cumulative database dump and another, final transaction log dump was  taken after the REORG command finished. Thus the transaction log dump which was taken directly after the REORG command did contain only log records for the REORG command.

The table contained some 35 mio rows , total used space for the table was roughly 15GB.

Table reorg at runtime:

Table reorg at runtime took roughly 1 hour and 47 minutes.

1> select getutcdate() -------------------------------
Apr 22 2022 8:07AM (1 row affected)
1> reorg rebuild ZZRESB with online
Beginning REORG REBUILD of table 'ZZRESB'.
Non-clustered index (index id = 2) is being rebuilt.
Non-clustered index (index id = 3) is being rebuilt.
(35865088 rows affected)
REORG REBUILD of table 'ZZRESB' in database 'TNT' completed.
1> select getutcdate() -------------------------------
Apr 22 2022 9:54AM

Recovery of the corresponding transaction log dump

Recovery of the corresponding transaction log dump took 1 hour 54 minutes.

1> select getutcdate() -------------------------------
Apr 22 2022 11:27AM (1 row affected)
1> LOAD TRANSACTION TNT FROM '/sybase/backups/TNT.TX.TNT.2022-04-22-095451.000'
...snip Roll forward transaction '$ALTER TABLE ADM ZZRESB ID=1232004389 dbid=4 fid=65' (xactid: Rid pageid = 5880360; row num = 102).
Redo pass of recovery has processed 8 committed and 0 aborted transactions.
Completed REDO pass for database 'TNT'.
Use the ONLINE DATABASE command to bring this database online; ASE will not bring it online automatically.
1> select getutcdate() -------------------------------
Apr 22 2022 1:21PM (1 row affected)

In this case there was no other transactional activity on the database at the time the REORG command was executed. With other ordinary transactional activity of database users or other concurrent REORG commands the time to recover the log contents would have been longer and ratio between runtime and recovery would have been worse.

Recovery using a cumulative dump

A second restore was performed loading the same full database dump followed by loading a cumulative dump which was taken immediately after the REORG command on table. After the load of the cumulative dump the database was recovered to the same point in time as after the load of the transaction log dump.

1> select getutcdate() -------------------------------
Apr 22 2022 2:14PM (1 row affected)
1> LOAD DATABASE TNT CUMULATIVE FROM '/sybase/backups/TNT.CU.TNT.2022-04-22-095540.000'
...snip
Roll forward transaction '$systsflush' (xactid: Rid pageid = 6054903; row num = 184).
Roll forward transaction '$rep_xact_for_dumpdb' (xactid: Rid pageid = 6054903; row num = 193).
Redo pass of recovery has processed 2 committed and 0 aborted transactions.
Completed REDO pass for database 'TNT'.
Use the ONLINE DATABASE command to bring this database online; ASE will not bring it online automatically. 1> select getutcdate() -------------------------------
Apr 22 2022 2:19PM (1 row affected)

Loading the cumulative dump an bring the database to the same state took only 5 minutes in this case.

Conclusion

Cumulative dumps may speed up recovery in case of a database restore significantly. Consider to use cumulative database dumps not only but definitely when executing REORG commands on larger tables on a regular basis.