Disaster/Recovery using SMRC

We have traditionally used tape restores at Sungard for executing a disaster/recovery test. Our corporate TCC wants to switch to using SMRC disk copying to a new second data center. SMRC will synchronously write to the production and secondary disk packs, so any write done here will be done there as well.

The plan is that the mirror will be broken and then we will just start up the mainframe at the secondary data center and everything will be working. However, I have concerns about the state of Adabas. Knowing we use the asynchronous buffer flush capabilities of Adabas (so the process of doing physical I/O is not a serial process performed by the nucleus for performance reasons), I think we could not only lose committed transactions but also be in an inconsistent state. I want to fall back to using tape restores to be sure that Adabas has consistency and integrity. We use Adabas Delta Save and have exercised a D/R plan with this strategy successfully multiple times.

I recently posted this question to SAG-L, but since Software AG people are permitted to respond here, I was wondering if someone could let me know whether my concerns are reasonable or whether, as some responded on SAG-L, I am overly concerned and Adabas can’t get messed up by this. Some people said that Adabas is able to recover from this state and restore integrity. Someone else said that Adabas can detect that it has lost changed blocks that were marked for flushing, or flushed from the buffer pool, but not yet written to disk, and that in the worst case Adabas won’t stay up but will report its inconsistent state (telling me when I have to restore from tape).

Please advise on the best practices of utilizing SRDF for recovering Adabas in a D/R scenario.

Thanks!

Hi Brian,
I do not know SMRC in detail, but from the point of view of Adabas the following is important: if the I/O request to the I/O subsystem is acknowledged as o.k., then the I/O should be on the disc. It does not matter whether the I/O is copied to several discs (mirroring), or whether it sits in non-volatile storage and is physically moved to the disc several microseconds later, or whatever. The supplier of the disc has to guarantee this.

Adabas will have problems if there is an outage and I/Os that were already acknowledged are not on the disc for the autorestart, or are only partly there. The big problem is that Adabas might not immediately find out that the I/O was not done. For example, if a single block in the index is incorrect (for instance, only the last part of the block is incorrect), the autorestart might work without abending, but the database is inconsistent afterwards. But this problem can happen with or without SMRC, and it then affects every piece of software running on the system!

For the I/Os that were on the way, where it was not yet acknowledged that they are on disc, Adabas will take care of them and recover the status correctly. This holds with or without the asynchronous buffer flush.
For the recovery, Adabas needs to perform the autorestart, and therefore the WORK dataset is needed.
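
As a rough illustration of why unacknowledged I/Os are recoverable while a lost acknowledged I/O is not, here is a generic toy model of log-based restart recovery. It is only a sketch: the record layout, the `recover` function and the block names are hypothetical, and it does not describe how the Adabas autorestart or the WORK dataset are actually implemented.

```python
# Toy model of log-based restart recovery (NOT Adabas internals).
# Assumption: every update is logged with before/after images, and the
# log itself was acknowledged as written before the outage.

def recover(disk, log):
    """Return the block state after redoing committed and undoing uncommitted work."""
    committed = {rec["txn"] for rec in log if rec["op"] == "commit"}
    state = dict(disk)

    # Redo: reapply after-images of committed transactions in log order,
    # in case their block writes were still in flight at the outage.
    for rec in log:
        if rec["op"] == "write" and rec["txn"] in committed:
            state[rec["block"]] = rec["after"]

    # Undo: walk the log backwards and restore before-images of
    # transactions that never committed but already reached the disk.
    for rec in reversed(log):
        if rec["op"] == "write" and rec["txn"] not in committed:
            state[rec["block"]] = rec["before"]

    return state


log = [
    {"op": "write", "txn": "T1", "block": "A", "before": 1, "after": 10},
    {"op": "commit", "txn": "T1"},
    {"op": "write", "txn": "T2", "block": "B", "before": 2, "after": 20},
]

# At the outage: T1's block write was still in flight (A is old),
# while T2's uncommitted change had already been flushed (B is new).
disk_at_outage = {"A": 1, "B": 20}
recovered = recover(disk_at_outage, log)
```

In this model the result is only trustworthy if everything that was acknowledged (here, the log records) really is on the disc. If an acknowledged write is silently missing, the recovery has no way to notice it.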

So if you mirror your database, you need to mirror ASSO, DATA and WORK, and we recommend mirroring the PLOGs as well, because you will need the PLOG if, for example, the mirror turns out to be incorrect too. That should not happen, but you never know!
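
The dataset list above could be checked against a replication configuration with a small script. This is a minimal sketch only: the function name and the idea of passing in a plain list of dataset types are assumptions for illustration, not an Adabas or SRDF interface.

```python
# Hypothetical check: is every dataset type named above in the mirror set?
REQUIRED = {"ASSO", "DATA", "WORK"}   # needed for the autorestart
RECOMMENDED = {"PLOG"}                # needed if the mirror itself is bad

def check_mirror_set(mirrored):
    """Report which required or recommended dataset types are not mirrored."""
    mirrored = {name.upper() for name in mirrored}
    return {
        "ok": REQUIRED <= mirrored,
        "missing_required": sorted(REQUIRED - mirrored),
        "missing_recommended": sorted(RECOMMENDED - mirrored),
    }

report = check_mirror_set(["ASSO", "DATA", "PLOG"])
```

Here `report["ok"]` is false because WORK is missing, which is the case where an autorestart at the secondary site would not be possible.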

Theoretically, if you do not need the SAVE and PLOG for logical recovery (i.e. resetting a file to a certain status), you could abandon saving the database, BUT we would never recommend this, because there are so many imponderables in hardware and in software. It is always better to be on the safe side and to have different options in case of errors.

What you can do is think about your SAVE concept and maybe do fewer SAVE operations. In most cases the mirror should work, and then the recovery time is only the time it takes to perform the autorestart.

This was a long explanation ( :roll: ) but I hope it answers all your questions.
Regards,
Uschi

Uschi,

This is great news! Many thanks for your thorough explanation. I am thrilled that Adabas is able to handle appropriate backing out of partial asynchronous buffer flushes as part of its autorestart process!

Our mirroring includes all disk-based datasets, so ASSO, DATA, WORK, PLOG, TEMP, SORT… are all replicated, and EMC does guarantee that the committed write on the production disk will be replicated to the other site.

We do plan to continue with the ADASAV saves, especially because we often rely on them for onsite restore/recovery processes. We use Delta Save, so tape resources and backup processing are not pain points for us.

I feel more confident that this SRDF process will be sufficient for D/R purposes now, though. And I will have the offsite tapes from the vault just in case something goes wrong.

Many thanks!

Brian