Problem Configuring Parallel Services

Configuring Parallel Services has been difficult.

Response times have deteriorated badly.

So far we have adjusted the directory-to-data element ratios so that, at shutdown, we now see zero cache directory reclaims. The total CPU used by ADABAS has also been reduced by over 2 CPU hours a day.

Does anyone have any suggestions for improving performance?

Does anyone have an idea of the performance impact of setting LRDP to zero versus letting it default to the size of LFIOP?

Are ASYTVS=YES and FMXIO=8 better than the old method of buffer flushing?

If short buffer flushes are better than long buffer flushes, should LFIOP be reduced so that less data is written for each buffer flush?

Will tuning Space Reclaims in Cache Write requests improve performance? How can we reduce the number of Space Reclaims?

Attached is the shutdown report for each of the two nuclei that make up the cluster.
Shutdown.log (202 KB)

Have you considered reducing the FMXIO value? While a high FMXIO speeds up buffer flushes, it may have a negative impact on concurrent read I/Os.

I have looked over your latest statistics and some previous ones.
I would also like to see statistics from before you moved to Parallel Services.
And I would like to know whether you still consider your response times unsatisfactory; the latest statistics look okay.

Setting LRDP to zero will increase the number of blocks written to the cache space and could create a small CPU overhead.
The default LRDP size is large and can usually be reduced, but it should be kept larger than the observed high-water mark.
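
As a rough illustration of that sizing rule, here is a minimal Python sketch. The high-water-mark figure is hypothetical and would come from your nucleus shutdown report, and the 2x headroom factor is my assumption, not a product rule:

  MB = 1024 * 1024

  lfiop = 25 * MB            # LRDP defaults to the LFIOP size (25 MB in this thread)
  redo_high_water = 6 * MB   # hypothetical value; read it from the shutdown report

  # Keep LRDP above the observed high-water mark, with some headroom.
  suggested_lrdp = max(2 * redo_high_water, 8 * MB)
  assert suggested_lrdp > redo_high_water
  print(f"suggested LRDP ~ {suggested_lrdp // MB} MB (default would be {lfiop // MB} MB)")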

Your nuclei have a low update ratio. I suggest that you keep ASYTVS=YES but set FMXIO=1.
There is currently no need to reduce LFIOP (25 MB).
Under ASM, LFIOP should be kept several times smaller than CLUCACHESIZE, which is the case here.
Your buffer flushes work as intended, and there seems to be no problem with them.
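
To make that sizing relationship concrete, a small Python sketch with the values from this thread; reading "several times smaller" as at least 4x is my assumption:

  MB = 1024 * 1024

  lfiop = 25 * MB            # flush I/O pool
  clucachesize = 1000 * MB   # global cache space

  # Rule of thumb from above: under ASM, keep LFIOP several times
  # smaller than CLUCACHESIZE.
  ratio = clucachesize / lfiop
  assert ratio >= 4, "LFIOP looks too large relative to the global cache"
  print(f"CLUCACHESIZE is {ratio:.0f}x LFIOP")   # 40x here, comfortably within the rule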

Directory reclaims and space reclaims should not be confused.
Directory reclaims are a cause for concern, because valid blocks are invalidated and finding a directory element to overwrite consumes CPU.

On the other hand, space reclaims in the cache are normal and unavoidable.
You cannot really tune them. Once the cache is full, blocks get overwritten on a demand basis, and the cache will fill up sooner or later.
In fact, your cache space (1000 MB) is very large and could be reduced.
But you always have to ensure that you have enough directory elements.

This number is calculated from the number of data elements carved out of your CLUCACHESIZE (1000 MB) and the data element size (in your case 1 KB). This gives you 1,024,000 data elements.

The number of directory elements is then computed from DIRRATIO=1 and ELEMENTRATIO=1, which means you get as many directory elements as data elements.
If you reduce your cache, you have to change this ratio so that you still have enough directory elements.
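
The same calculation as a small Python sketch; it is simplified in that it ignores the space the directory elements themselves occupy in the cache:

  MB, KB = 1024 * 1024, 1024

  clucachesize = 1000 * MB   # global cache size from this thread
  element_size = 1 * KB      # data element size in this installation
  dirratio, elementratio = 1, 1

  # The cache data space is carved into fixed-size data elements.
  data_elements = clucachesize // element_size               # 1,024,000

  # DIRRATIO/ELEMENTRATIO give the directory-to-data element ratio.
  directory_elements = data_elements * dirratio // elementratio

  print(f"{data_elements:,} data elements, {directory_elements:,} directory elements")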

Rainer Herrmann
Software AG
Darmstadt

Thanks for the information and the suggestions.

We are still encountering performance problems, especially with CICS users.

CICS users get good performance for the first couple of hours of the day; then performance becomes very poor.

When user response times get bad, the CPU is not running at 100 percent; IBM Workload Manager has ADABAS at Importance Level 1 and Velocity 99, so the cause is not entirely clear.

I’ve attached an Excel spreadsheet created by our CICS SE. It shows transaction volume and response time for this morning.
CICSTransactionVolumeResponseTimesFeb21.xls (85.5 KB)

I’ve reached the limit of attachments I can upload.

If the limit is increased, I will upload:

  • selected statistics from shutdown reports before Parallel Services was installed.
  • a shutdown report from an untuned Parallel Services environment. After some tuning we reduced ADABAS CPU time by over 2 hours in a 24-hour period. The tuning involved eliminating the local cache and increasing the number of Directory Elements.

Your limit should now be increased; please try again.

We don’t have full shutdown statistics from before Parallel Services. We do have selected statistics from different periods.
ADABASStatisticsBeforeParallelServices.xls (39.5 KB)

Attached is an example of a Parallel Services environment where tuning is needed. Our tuning consisted of specifying more Directory Elements (using the directory and element ratio parameters) and turning off the local cache.

“Directory Reclaims” are really costly.

Using local cache (CACHE=YES) is probably not a good idea in a Parallel Services environment, because:

  • Each nucleus would need its own local cache (twice as much memory used with two Parallel Services nuclei).
  • Each nucleus would need to do its own I/O into its local cache (twice as many reads).
  • Ten (10) times the number of Directory Elements are needed when using local cache (a surprise to me).
ADABASShutdownStatsBeforeTuning.txt (422 KB)

Unfortunately, your last attachment arrived here in an unreadable format.
But I have looked at some length at all the other data, and with the available information these are my tentative conclusions so far:

With the ASM parameters set as in your shutdown stats from February 15th, I cannot see much wrong with ASM itself.
In other words, I am not convinced that the performance problems are caused directly by ASM.

What I can see is that many nucleus threads are occupied, which suggests that there is an I/O problem.
If this suspicion is correct (I do not have enough data), the problems are not caused by writes to PLOG, WORK, or ASSO and DATA (buffer flushes).
There are simply not enough of them.

My suspicion is that the problems come from read I/Os to ASSO, and particularly to DATA.
Note that you only get a full picture of the I/O load if you look at both nuclei's shutdown statistics.
Together they show 55 million DATA reads over 24 hours.
Roughly 15 million DATA I/Os each go to volumes ABPIC8 and ABPIF1.
Over 24 hours this averages more than 170 I/Os per second to each volume, and it will be several times higher during peak periods.
Without more data from RMF, I am not sure whether your I/O subsystem is capable of handling this load with good response times.
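
A quick back-of-the-envelope check of that rate, as a Python sketch:

  data_reads_per_volume = 15_000_000   # 24-hour total for each of ABPIC8 and ABPIF1
  seconds_per_day = 24 * 60 * 60

  avg_rate = data_reads_per_volume / seconds_per_day
  print(f"~{avg_rate:.0f} DATA I/Os per second per volume on average")   # ~174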

Some more questions:
I suspect that you run significant batch ADABAS calls during the online period.
If this assumption is true, it is possible that with ASM, batch now has a far better chance of getting its calls through.
If you have a significant batch load going to ADABAS (ASM) during the online period, I suggest the following:

  1. Put the importance of the CICS response-time goals above ASM; in other words, set the importance to 1 for CICS and to 2 for ASM. Also, what response-time goals for CICS did you specify to the Workload Manager?

  2. If my suspicions about batch during online are correct, you may be well advised to limit or reduce the number of concurrent batch jobs (e.g., by reducing initiators) during online periods.

You have currently allocated 1200 MB buffer pools, and since you have two of them, this should be plenty. Your application data just does not seem to be very cache-friendly.

The global cache (CLUCACHESIZE) of 1000 MB does not help you very much.
You could reduce it to 250 MB, provided you increase DIRRATIO from 1 to 4 so that the number of directory elements stays the same.
You are free to specify CACHE=YES; if you do, just make sure that you multiply DIRRATIO by a factor of 10.
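
The arithmetic behind that suggestion, as a standalone Python sketch; as before, the space taken by the directory elements themselves is ignored:

  MB, KB = 1024 * 1024, 1024

  def directory_elements(clucachesize, element_size=1 * KB,
                         dirratio=1, elementratio=1):
      # Number of data elements carved out of the cache, scaled by
      # the directory-to-data element ratio.
      return (clucachesize // element_size) * dirratio // elementratio

  before = directory_elements(1000 * MB, dirratio=1)   # current setup
  after = directory_elements(250 * MB, dirratio=4)     # proposed setup
  assert before == after == 1_024_000                  # same directory coverage

  # With CACHE=YES the thread reports roughly 10x the directory demand:
  print(f"{directory_elements(250 * MB, dirratio=40):,} with local cache")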

Future versions of ASM will make better use of memory.

Rainer Herrmann