Parameter WRITE_LIMIT

Hello all,

We are having performance problems on one of our Adabas databases. The bottleneck seems to be the throughput of our hard disk storage.
One idea is to try out different settings of the Adabas parameter WRITE_LIMIT. Our Software AG technician (he’s on vacation at the moment) said that we need downtime to change this parameter. After reading the ADAOPR documentation I think that we can change WRITE_LIMIT whenever we want.
So my question is: Is there any other reason for a shutdown/restart?

We’re using Adabas 6.1.10 on Solaris.

regards

Matthias

Yes, you can change WRITE_LIMIT on the fly; verify the setting with DISPLAY=DYNAMIC_PARAMETERS (or DI=DYN for short).

In that scenario there is absolutely no reason to bounce the database,
unless you also want to change any of the static buffer size parameters.

Thank you for answering that quickly.

I changed the parameter from 5 to 8. As you wrote, you can see the result on the fly via adaopr di=bp.

The parameter WRITE_LIMIT was originally set to 0 (=AUTO). AUTO chose a value of 23, resulting in a buffer flush lasting 2 minutes that slowed down the users’ dialog sessions (nearly freezing them). So we set the value to 5, and now the flush takes only a few seconds.

But now this is slowing down the night batches (because the number of physical writes has increased). So from my point of view there are two possible solutions:
1.) Make a compromise between dialog time and night batch - maybe the new value of 8
2.) Set WRITE_LIMIT to different values at different times. For example:
6am to 6pm → WRITE_LIMIT=5
6pm to 6am → WRITE_LIMIT=15

What do you think?

Hi Matthias,

I experienced the same thing in one environment. Users were getting rsp-224 (WCP timeout). Their commands were delayed because buffer flushes were blocking the disks for those commands.

This only happened in the development environment and not in production, although the databases (size and parameters) should be almost identical. I put it down to a poorer-performing I/O system in development.

This was under Windows.

You have to consider WRITE_LIMIT in relation to the size of your buffer pool, because it is a percentage.
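For illustration, that percentage-to-bytes relationship can be sketched like this (a rough Python sketch; the pool size is taken from the adaopr di=bp output posted later in this thread, and block rounding is ignored, so the figures differ slightly from what adaopr itself reports):

```python
# Rough sketch: WRITE_LIMIT is a percentage of the buffer pool, so the
# amount of modified data a single buffer flush may have to write out
# scales with the pool size.
# The pool size below is taken from the adaopr di=bp output in this thread.

buffer_pool_bytes = 536_870_912  # ~512 MB buffer pool

def write_limit_bytes(write_limit_pct: int) -> int:
    """Approximate flush trigger threshold in bytes (ignores block rounding)."""
    return buffer_pool_bytes * write_limit_pct // 100

for pct in (5, 8, 23):
    mb = write_limit_bytes(pct) / (1024 * 1024)
    print(f"WRITE_LIMIT={pct:2d}% -> ~{mb:.0f} MB to flush")
```

With WRITE_LIMIT=8 this gives about 41 MB, which is in the same ballpark as the 42,949,600 bytes that adaopr reports (adaopr rounds to whole blocks).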

Best regards,
Mogens

Hi Mogens!

Thank you for sharing your experiences.

You’re right. adaopr di=bp shows the percentage plus the size in bytes …

There’s another parameter which has an influence on buffer flushes: BFIO_PARALLEL_LIMIT. We set it to 20 (as recommended by Software AG) while running 30 Adabas threads.

regards

Matthias

BFIO_PARALLEL_LIMIT=20 doesn’t mean it’ll use 20 threads for the bufferflush :wink:

Out of curiosity, what’s your buffer pool size?

Output of adaopr di=bp

                       ADANUC Version 6.1.10.10
        Database 11    Buffer Pool Statistics   on 30-JUL-2011 23:27:59


Buffer Pool Size   :   536,870,912

Pool Allocation                        RABNs present
---------------                        -------------
Current     ( 99%) :   536,818,688     ASSO               :        19,761
Highwater   ( 99%) :   536,845,312     DATA               :        50,914
                                       WORK               :             0
                                       NUCTMP             :             0
                                       NUCSRT             :             0

I/O Statistics                         Buffer Flushes
--------------                         --------------
Logical Reads      : 2,104,394,809     Total              :         6,078
Physical Reads     :    23,791,705     To Free Space      :             0
Pool Hit Rate      :            98%
                                       Write Limit  (  8%):    42,949,600
Physical Writes    :     9,550,866     Modified     (  5%):    29,626,368


%ADAOPR-I-TERMINATED,   30-JUL-2011 23:28:01, elapsed time: 00:00:00
$ adamon db=11 interval=1
%ADAMON-I-STARTED,      30-JUL-2011 23:28:53, Version 6.1.10.10 (Solaris 64Bit)

Database 11, startup at 23-JUL-2011 18:57:27
ADANUC Version 6.1.10.10, PID 2191




Commands         I/Os per sec      Throw   Buffer pool
per sec      ASSO DATA WORK PLOG   backs   Hit  Flushs
------------------------------------------------------
       0        0    0    0    0       0   100%      0
    1201        0   20   41   41       0    99%      0
    2798        1   44  100   95       0    99%      0
    2519        0   41   91   85       0    99%      0
    2216        2   19   78   76       0    99%      0
    2370        1   21   80   79       0    99%      0
    2725        0   50   91   91       0    99%      0
    2400        1   20   80   81       0    99%      0
    1230       37 5234   43   41       0    99%      1
    1890        1   18   66   64       0    99%      0
    2730        1   35   91   94       0    99%      0
    2117        1   13   71   71       0    99%      0
    2400      107  264   85   81       0    98%      0
    2321        1   21   79   78       0    99%      0
    2744        0   43   91   91       0    99%      0
    2366        1   19   81   80       0    99%      0
^C

Summary (measurement time: 00:00:15)

                 Totals    Ratio per sec
----------------------------------------
Commands     :    34027             2268
ASSO I/Os    :      154               10
DATA I/Os    :     5862              390
WORK I/Os    :     1168               77
TEMP I/Os    :        0                0
PLOG I/Os    :     1148               76
Throwbacks   :        0                0
Buffer Hit   :       99%
Buffer flushs:        1
  Write Limit:        1

%ADAMON-I-TERMINATED,   30-JUL-2011 23:29:09, elapsed time: 00:00:16

Hi Matthias,

Trawled through my incident back in 2009:

  • We also experienced longer run times when reducing WRITE_LIMIT.
  • We also considered BFIO_PARALLEL_LIMIT back then. Originally it was not set; I tried setting it to 50, but our tests showed commands still waiting for a long time.
  • As we received an error (rsp-224 from WCP), I increased ADABAS_TIMEOUT in WCP to 180 secs (from 90) to circumvent the problem.
  • Again, the problem only happened in development and not in production.

You should be able to roughly estimate the number of I/Os in a buffer flush by multiplying the number of blocks in the buffer pool by the WRITE_LIMIT percentage. In our case it was approx. 15,000 I/Os. That is a lot.
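That estimate can be sketched in a few lines of Python (the RABN counts are taken from the adaopr di=bp output posted earlier in this thread; treating each modified RABN as one I/O is a simplifying assumption):

```python
# Sketch of the estimate above: I/Os per buffer flush
#   ~ (blocks resident in the buffer pool) * WRITE_LIMIT%
# RABN counts below are taken from the adaopr di=bp output in this thread.

asso_rabns = 19_761
data_rabns = 50_914

def flush_ios(write_limit_pct: int) -> int:
    """Rough upper bound on I/Os in one buffer flush."""
    return (asso_rabns + data_rabns) * write_limit_pct // 100

print(flush_ios(8))  # 5654 - roughly the "approx. 6000 IOs" quoted in this thread
```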

Basically, setting BFIO_PARALLEL_LIMIT should solve this kind of problem by splitting the buffer flushes into smaller chunks, letting commands that require I/O come in between these chunks.

These are the values we run with, if it is of any interest:

                                 dvlp          prod
                                 ----          ----
LBP                              600.000.000   600.000.000
WRITE_LIMIT                      0             30
BFIO_PARALLEL_LIMIT              50            300
WRITE-LIMIT calculated by nuc    70            n/a

Back then we were running ADA5.1.6 – now it is ADA6.1.10.

Hi Mogens!

Thank you for your additional information…

Our WRITE_LIMIT=8, so we get approx. 6000 IOs…

At the moment 8 seems to be a good value for batch processing. But 5 seems to be better for dialog usage…

We have 30 threads. BFIO_PARALLEL_LIMIT is set to 20, as recommended by Software AG.

Our administrator says the hard disks are still at their limit for ~20 sec. during a buffer flush. So our question is: how about setting BFIO_PARALLEL_LIMIT to a lower value, maybe 10?

I doubt this alone will help sufficiently; you will still need to write out about 40MB per bufferflush.

I’d say try lowering WRITE_LIMIT even further, thus increasing the number of
bufferflushes but making the individual “chunk” that needs to be spit out smaller,
which means the bufferflush I/Os will be distributed a bit better.

Hi Matthias,

I don’t think the number of threads has any influence here?

I think a BFIO_PARALLEL_LIMIT of 20 (or lower) is very low. It should take almost no time to perform chunks of e.g. 50 I/Os, and no commands requesting I/O should suffer from this. What do you think?

WRITE_LIMIT will not do it alone. You would have to set it very low to get the necessary small buffer flushes (can it go below 1 percent? I don’t think so). And then you would also lose the advantage of the buffer pool with regard to reducing the number of writes.

You will still have many IO’s in the buffer flush that other commands will be queued behind. You need to set BFIO_PARALLEL_LIMIT to cut it into smaller chunks.

Agreed, that’s why I said “I doubt this alone will help” :wink:

No, it can’t go below 1, as 0 means “Adabas, please tweak it yourself”. Ultimately, if a combination
of these parameters doesn’t cure it, the last resort would be to reduce the bufferpool size to further
reduce the amount of data written in a single bufferflush.

But either way, even with 40MB flushed “at once” I’d think there is a problem in the I/O subsystem
when this stalls the machine for 20 seconds, that sounds too extreme.

We calculated around 15,000 I/Os to be performed in a buffer flush (before setting BFIO_PARALLEL_LIMIT). We saw commands hanging for up to 115 secs, assuming this was the time for the buffer flush to run:

115 sec / 15,000 I/Os = 7.7 ms per I/O

Is this bad?

The documentation says: http://techcommunity.softwareag.com/ecosystem/documentation/adabas/ada6110os/utils/opr.htm#oprbfioparallellimit

I don’t know for sure, but I think “other threads” means Adabas threads…

My understanding is: BFIO_PARALLEL_LIMIT=20 tells Adabas to use a maximum of 20 Adabas threads for buffer flushing. … but I’m not sure. :wink:

You’re right. It sounds strange that a 27MB buffer (i.e. WRITE_LIMIT=8) takes almost 20 sec. to flush. During those 20 sec. the hard disks are at 100%. Our administrator sent me the following polling of the hard disk performance (in steps of 10 seconds):
The 4th column is avg. write MB/s
The last column is I/O-percentage

    6.6  283.1  838.4 31046.4  0.0  6.4    0.0   22.0   0  71
    0.0  391.6    0.0 46857.0  0.0 10.0    0.0   25.5   0 100
   27.1  129.7 3467.7 10236.1  0.0  3.7    0.0   23.4   0  42

… it was exactly during a flush. Usual values are ~1MB/s and 5%.

Can anyone explain to me where the overhead comes from? Does the buffer flush use WORK, PLOG or TEMP?

I don’t think that there is a problem in the I/O subsystem. We have another computer with separate storage: same ratio of buffer size to flush duration.

Good question. The next crazy thing is: the ratio of buffer size to flush duration is not linear for our database:

WRITE_LIMIT=5 → ~2 sec.
WRITE_LIMIT=8 → ~16 sec.
WRITE_LIMIT=27 → ~100 sec.

… all measured during dialog time.

This is not my understanding.

A buffer pool flush consists of a number of I/Os - let’s say 10,000.

BFIO_PARALLEL_LIMIT sets the number of I/Os that are written at a time. If set to 0 (zero), all 10,000 I/Os are sent at once. If set to 50, 50 I/Os are sent at a time, i.e. 200 chunks. This allows I/Os from other commands (that are executing in the Adabas threads) to be performed in between the chunks, rather than having to wait for all 10,000 I/Os to be done.
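That chunking behaviour can be sketched like this (an assumed model for illustration, not the actual nucleus implementation):

```python
# Sketch of the chunking described above (assumed model, not the actual
# Adabas nucleus code): BFIO_PARALLEL_LIMIT bounds how many buffer flush
# I/Os are issued at once, so other commands' I/Os can be scheduled
# between chunks.

def flush_chunks(total_ios: int, bfio_parallel_limit: int):
    """Yield the size of each I/O chunk in one buffer flush."""
    if bfio_parallel_limit == 0:        # 0 = no limit: one big burst
        yield total_ios
        return
    for start in range(0, total_ios, bfio_parallel_limit):
        yield min(bfio_parallel_limit, total_ios - start)

# 10,000 flush I/Os in chunks of 50 -> 200 chunks
print(sum(1 for _ in flush_chunks(10_000, 50)))   # 200
# With the limit at 0, everything goes out in one burst
print(list(flush_chunks(10_000, 0)))              # [10000]
```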

You mentioned earlier that you had 6000 I/Os in a buffer flush. In 20s this is 3.3ms per I/O. Bad … :?:
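The two back-of-the-envelope per-I/O figures in this thread can be checked with a quick sketch (both pairs of numbers come from the posts above):

```python
# Per-I/O service time implied by the two observations in this thread:
# flush duration divided by estimated flush I/O count.

def ms_per_io(flush_seconds: float, io_count: int) -> float:
    return flush_seconds * 1000 / io_count

print(round(ms_per_io(115, 15_000), 1))  # 7.7 ms - the 115s / 15,000 I/O case
print(round(ms_per_io(20, 6_000), 1))    # 3.3 ms - the 20s / 6,000 I/O case
```

For comparison, a few milliseconds per random write was unremarkable for rotating disks of that era, so neither figure on its own proves the I/O subsystem is broken; the problem is how many of these I/Os land in one burst.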

BTW - how do you measure the duration of the buffer flushes?

It seems to me like Adabas uses its threads for the buffer flush. Here’s a typical view of our thread table:

 No     Cmd Count  File  Cmd  Status
 --     ---------  ----  ---  ------
  1    39,118,455     0       Free
  2    39,126,427     0       Free
  3    39,125,781     0       Free
  4    39,123,923     0       Free
  5    39,130,794     0       Free
  6    39,157,760     0       Free
  7    39,077,116     0       Free
  8    39,149,457     0       Free
  9    39,164,027     0       Free
 10    39,106,314     0       Free
 11    39,097,442     0       Free
 12    39,099,225     0       Free
 13    39,089,970     0       Free
 14    39,128,418     0       Free
 15    39,106,669     0       Free
 16    39,131,600     0       Free
 17    39,074,500     0       Free
 18    39,129,085     0       Free
 19    39,141,983     0       Free
 20    39,123,368     0       Free
 21    39,084,364     0       Free
 22    39,107,701     0       Free
 23    39,128,941     0       Free
 24    39,118,600     0       Free
 25    39,117,627     0       Free
 26    39,172,548     0       Free
 27    39,095,743     0       Free
 28    39,115,604     0       Free
 29    39,061,775     0       Free
 30    39,112,318    29   L3  Simple , waiting for DATA / 438214

And here’s a threadtable during buffer flush (before setting BFIO_PARALLEL_LIMIT):

 No     Cmd Count  File  Cmd  Status
 --     ---------  ----  ---  ------
  1    39,113,323     0   ET  Update , active
  2    39,121,563     0   ET  Update , active
  3    39,121,002     0   ET  Update , active
  4    39,118,841     0   ET  Update , active
  5    39,125,835     0   ET  Update , active
  6    39,152,673     0       Free
  7    39,072,027     0   ET  Update , active
  8    39,144,326     0   ET  Update , active
  9    39,159,094     0       Free
 10    39,101,369     0   ET  Update , active
 11    39,092,350     0   ET  Update , active
 12    39,094,080     0   ET  Update , active
 13    39,084,893     0   ET  Update , active
 14    39,123,271     0   ET  Update , active
 15    39,101,597     0   ET  Update , active
 16    39,126,638     0   ET  Update , active
 17    39,069,486     0   ET  Update , active
 18    39,123,684     0   ET  Update , active
 19    39,137,028     0       Free
 20    39,118,211     0   ET  Update , active
 21    39,079,321     0   ET  Update , active
 22    39,102,792     0   ET  Update , active
 23    39,123,987     0       Free
 24    39,113,718     0   ET  Update , active
 25    39,112,834     0   ET  Update , active
 26    39,167,485     0   ET  Update , active
 27    39,090,611     0   ET  Update , active
 28    39,110,459     0   ET  Update , active
 29    39,056,681     0   ET  Update , active
 30    39,107,185     0   ET  Update , active

OK, that’s no proof. But I still think there’s some connection between bufferflush and thread table. Please correct me if I’m wrong.

EDIT: Until now I thought the ETs above were the buffer flush itself. But more likely they are users’ transactions waiting for the buffer flush to be done… :?:

$ adamon db=11 interval=1
%ADAMON-I-STARTED,      03-AUG-2011 14:35:04, Version 6.1.10.10 (Solaris 64Bit)

Database 11, startup at 23-JUL-2011 18:57:27
ADANUC Version 6.1.10.10, PID 2191



Commands         I/Os per sec      Throw   Buffer pool
per sec      ASSO DATA WORK PLOG   backs   Hit  Flushs
------------------------------------------------------
    2471      108  381    2    2       0    97%      0
    2564      264  368   11   10       0    96%      0
    2622      118  257    8    8       0    97%      0
    2888     1052 1627   10   10       0    91%      0
    3239      723 1302   16   14       0    94%      0
    2031      141  379    2    2       0    96%      0
    2508     5198 1248    8    7       0    95%      1  BF_ACTIVE
    2219       43  222    1    3       0    98%      0  BF_ACTIVE
    2676      134  547    5    2       0    96%      0
    2701       60  322    5    3       0    98%      0
    3901       50  311    0    0       0    98%      0
    4866       36  262    0    0       0    98%      0
    3657       66  575    0    0       0    97%      0
...

i.e. between 1.1 and 3.9 seconds.