Parameter WRITE_LIMIT

Exactly - I think this just indicates that these commands are queued, waiting for the buffer flush to finish.

And in the screenshot you showed above, no commands are waiting for I/O while the buffer flush is running. This is good. Have you set BFIO_PARALLEL_LIMIT > 0?

But the users are waiting :wink:

Yes. The current value is 20.

Yes, but they would have to wait for the complete buffer flush to finish anyway. :frowning:
But commands ‘just’ requesting read I/Os can still be executed in between these chunks. :slight_smile:
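
For readers who have not set it: BFIO_PARALLEL_LIMIT can be specified at nucleus start and, as far as I know, can also be changed online via ADAOPR. The dbid and value below are placeholders; please check the exact utility syntax for your Adabas version.

    adanuc dbid=11 bfio_parallel_limit=20    # at nucleus start
    adaopr dbid=11 bfio_parallel_limit=20    # online change, if supported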

Today I monitored the other database (different computer, different hard disk, but the same architecture and scaling).

                       ADANUC Version 6.1.10.10
        Database 1     Buffer Pool Statistics   on  4-AUG-2011 14:01:38


Buffer Pool Size   :   536,870,912

Pool Allocation                        RABNs present
---------------                        -------------
Current     ( 99%) :   536,804,352     ASSO               :        52,193
Highwater   ( 99%) :   536,858,624     DATA               :        34,959
                                       WORK               :             0
                                       NUCTMP             :             5
                                       NUCSRT             :             0

I/O Statistics                         Buffer Flushes
--------------                         --------------
Logical Reads      : 3,760,755,853     Total              :           285
Physical Reads     :    50,174,730     To Free Space      :             6
Pool Hit Rate      :            98%
                                       Write Limit  ( 27%):   150,323,216
Physical Writes    :     1,351,067     Modified     ( 27%):   149,970,944

WRITE_LIMIT is set to 27 (auto) with the same buffer pool size. A flush takes ~80 sec. My dialog wasn’t frozen for the complete 80 sec, but it was very slow, with response times of ~10 sec (normally ~0.1 sec).

So I don’t think there’s a problem with our hard disks or I/O subsystem …

My recommendation is to increase the buffer pool. The buffer pool is only 512 MB, which is not much on today’s machines with several GB of memory. You should try to get a buffer pool hit rate of more than 99%. The ADAMON output shows that several hundred, sometimes more than 2,000 I/Os are done every second, even when no buffer flush is active, and this costs performance.
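
As an illustration only: a larger pool could be configured at nucleus start roughly as sketched below. LBP is the ADANUC buffer pool size parameter; the dbid and the 1 GB value are placeholders, not a tested recommendation.

    adanuc dbid=11 lbp=1073741824    # 1 GB buffer pool (example value)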

Hello Wolfgang Obmann!

Thank you for your recommendation. Normally we get a buffer pool hit rate of more than 99%. Examples:

                       ADANUC Version 6.1.10.10
        Database 11    Buffer Pool Statistics   on 12-JUL-2011 11:45:55


Buffer Pool Size   :   536,870,912

Pool Allocation                        RABNs present
---------------                        -------------
Current     ( 99%) :   536,748,032     ASSO               :        32,668
Highwater   ( 99%) :   536,838,144     DATA               :        41,259
                                       WORK               :             0
                                       NUCTMP             :            33
                                       NUCSRT             :             0

I/O Statistics                         Buffer Flushes
--------------                         --------------
Logical Reads      : 1,644,601,452     Total              :         1,007
Physical Reads     :    16,295,330     To Free Space      :            98
Pool Hit Rate      :            99%
                                       Write Limit  ( 27%):   150,323,216
Physical Writes    :     3,581,024     Modified     (  3%):    18,100,224

...

                       ADANUC Version 6.1.10.10
        Database 11    Buffer Pool Statistics   on 19-JUL-2011 07:49:14


Buffer Pool Size   :   536,870,912

Pool Allocation                        RABNs present
---------------                        -------------
Current     ( 99%) :   536,791,040     ASSO               :        48,350
Highwater   ( 99%) :   536,841,216     DATA               :        33,105
                                       WORK               :             0
                                       NUCTMP             :            95
                                       NUCSRT             :             0

I/O Statistics                         Buffer Flushes
--------------                         --------------
Logical Reads      : 1,980,033,185     Total              :         5,992
Physical Reads     :    14,425,071     To Free Space      :             0
Pool Hit Rate      :            99%
                                       Write Limit  (  5%):    26,843,500
Physical Writes    :     9,701,513     Modified     (  1%):     8,773,632
...
                       ADANUC Version 6.1.10.10
        Database 11    Buffer Pool Statistics   on 28-JUL-2011 07:50:59


Buffer Pool Size   :   536,870,912

Pool Allocation                        RABNs present
---------------                        -------------
Current     ( 99%) :   536,269,824     ASSO               :        45,183
Highwater   ( 99%) :   536,845,312     DATA               :        38,077
                                       WORK               :             0
                                       NUCTMP             :            46
                                       NUCSRT             :             0

I/O Statistics                         Buffer Flushes
--------------                         --------------
Logical Reads      : 3,579,667,547     Total              :         3,955
Physical Reads     :    32,178,728     To Free Space      :             0
Pool Hit Rate      :            99%
                                       Write Limit  (  5%):    26,843,500
Physical Writes    :    22,529,577     Modified     (  2%):    14,225,408

But maybe we should try out your recommendation anyway.

I’ve monitored our database no. 11 for 24 hours using adamon with a 10-second interval. I got 7,784 adamon lines with no buffer flush. 802 of them contain ASSO/DATA I/Os greater than 200, and only 60 of them greater than 1,000. So the hundreds, and especially the thousands, can be treated as peak values.
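
A sketch of how such counts can be pulled out of an adamon log with awk; the column positions used here ($3 and $4 for the ASSO and DATA I/O counts) are purely hypothetical and must be adapted to the actual adamon output layout:

    # count 10-second intervals whose ASSO+DATA I/Os exceed 200 and 1000
    # ($3/$4 = assumed columns of the ASSO and DATA I/O counts)
    awk '$3 + $4 > 200  { n200++ }
         $3 + $4 > 1000 { n1000++ }
         END { print n200, n1000 }' adamon.log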

For example: if we double the buffer pool and choose a WRITE_LIMIT of 5, we will surely improve our hit rate. But I don’t think it will solve our problem that the users’ dialog freezes during the buffer flush. And that is the real problem.

We have to distinguish between the different situations:

  • Commands that can be executed during the buffer flush:
    Read/update commands can be handled in the buffer pool. If an I/O is necessary to perform such a command, it has to wait for the current chunk of the flush to finish (the flush is split into chunks as defined by BFIO_PARALLEL_LIMIT). If BFIO_PARALLEL_LIMIT is not set, it has to wait for the complete buffer flush to finish.
    I’m not completely sure whether new update commands can still be executed while a buffer flush is running.

  • Commands that have to wait for the buffer flush to finish:
    Commit (ET/BT) commands have to wait for the complete buffer flush to finish (its size defined by WRITE_LIMIT) and cannot come in between the chunks defined by BFIO_PARALLEL_LIMIT.

Please correct me if the above assumptions are not correct.

Lowering WRITE_LIMIT seems to result in longer run times for batch jobs. Online use may not necessarily suffer from this(?)

The number of threads may need to be increased so that not all threads are blocked by commit commands waiting for a buffer flush to finish.

During a buffer flush, read I/Os compete with the write I/Os of the buffer flush. Even if BFIO_PARALLEL_LIMIT is set, this means that read I/O times are relatively long; and if an application performs several commands that require physical I/Os, this can result in long response times. Therefore I would expect the response times to get better if it is possible to reduce the number of physical I/Os.

A smaller WRITE_LIMIT has the advantage that a single buffer flush is faster, and if an autorestart is necessary because of a crash, the autorestart times are shorter. On the other hand, the total number of write I/Os may increase: updates to the same block that would be written only once with a larger WRITE_LIMIT are written more than once.
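
To put rough numbers on this trade-off, using the statistics quoted earlier in this thread: the per-flush write volume is approximately WRITE_LIMIT percent of the buffer pool size.

    27% of 536,870,912 bytes  ->  150,323,216 bytes  (~143 MB per flush)
     5% of 536,870,912 bytes  ->   26,843,500 bytes  (~26 MB per flush)

Going from 27 down to 5 therefore cuts the data written per flush by a factor of about 5.6, at the price of more frequent flushes and possibly more total write I/Os.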

I think 30 threads should be enough. Please note that additional threads also mean additional overhead, and therefore the performance with more threads will usually not be better.

The documentation says about WRITE_LIMIT: 0 means that Adabas will dynamically choose an appropriate value.

Any idea what Adabas regards as an “appropriate value” and what it bases its calculation on?

I don’t know the exact algorithm, but I think the idea is to keep the number of I/Os small:

  • If the WRITE_LIMIT is big, the number of read I/Os may increase, because there are too few blocks that can be replaced.
  • If the WRITE_LIMIT is small, the number of write I/Os may increase, because blocks updated more than once may be written in more than one buffer flush.

I don’t know whether the heuristic used is really adequate; I think if the database load is not too big, it should be OK. I see the following problems:

  • Sometimes the database load suddenly changes; for example, a batch program may be started where blocks are updated faster than they can be written to disk. Then the WRITE_LIMIT must be small in order to avoid a buffer overflow.
  • For some customers it is important to keep the downtime after a crash small. This also requires a small WRITE_LIMIT, because the autorestart time depends on the number of blocks in the buffer pool that still have to be written to disk.

It seems like we have to set different WRITE_LIMITs for dialog and batch time. We have played with the parameter over the past few days: 5 is best for dialog time, 8 is good for batch runs.

The idea is to raise WRITE_LIMIT just before the online backup at 6 pm and to set it back to a lower value at 6 am. Is this OK? Are there any traps for the unwary?
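
A sketch of how such a schedule could be automated with cron, assuming your Adabas version allows changing WRITE_LIMIT online via ADAOPR; the dbid is a placeholder, and the exact utility syntax should be checked for your version.

    # crontab: raise WRITE_LIMIT for the nightly backup/batch window at 6 pm ...
    0 18 * * *  adaopr dbid=11 write_limit=8
    # ... and set it back to the dialog-time value at 6 am
    0  6 * * *  adaopr dbid=11 write_limit=5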

Thank you for the background information. So we don’t have to change our number of threads…

Very interesting. This would explain the users’ “dialog freeze” and our “thread table during buffer flush” (see above for an example). But the question is: is your assumption correct? It would be very helpful if Wolfgang Obmann or Wolfgang Winter could say something about that…

No, this is not correct. Consider that a buffer flush may have to write thousands of blocks to disk; that would cause very long execution times for ETs. For an ET it is only required that the WORK and PLOG blocks needed for an autorestart and for ADAREC REGENERATE are written to disk, and this is done independently of the buffer flush.

Thank you for your statement.

Now it dawns on me. Could it be that a buffer flush is slowing down the plogging - especially when ASSO/DATA and the PLOG are on the same device? Maybe that’s our real problem :idea:
Users are waiting for their ETs because the database is waiting for the PLOG write to complete…

Independent of the performance problem, it is generally strongly recommended not to put PLOGs (and backups) on the same device as the database containers (unless you use RAID technology, where the PLOGs are required only for handling and software errors, and where you often no longer know where the data is physically stored). If the PLOGs are on the same device as some database containers, no database recovery is possible in case of a disk crash: an autorestart of the database is not possible because a database container is stored on the corrupted disk, and a restore/regenerate is not possible because the PLOG is stored on the corrupted disk.

Understood so far. Thanks a lot.

It seems like I have to talk to our system administrator again. He described the storage as a RAID system that can do ~40 MB/s. During a buffer flush the device runs at 100% load.

But you’re right. It would be better not to narrow down the problem to a pure performance point of view.

To be honest: I’m not a database administrator, and our database operation has been outsourced. But somebody has to do the job. :wink:

Today we narrowed down the problem to END TRANSACTION.

During a buffer flush, an END TRANSACTION (for only one stored record) can take up to 12.7 sec. Normally it takes 0.0 sec.

This weekend we will switch our PLOG directory to a different device. Let’s see…

BTW: A BACKOUT TRANSACTION seems to cause no waiting time for the user.

didn’t help :frowning:

Setting the BFIO_PARALLEL_LIMIT to 4 or 40 or 50 didn’t help either.

The next thing I’ll do is write a short sample program that demonstrates the problem…

Here’s my sample Natural program. UNI-U-WKFL is a very simple Adabas file with only 4 fields and 1 superdescriptor.

define data local
01 UNI-U-WKFL view UNI-U-WKFL
  02 INIT-ID                 /*   A    8  F
  02 PROGRAM                 /*   A    8  F
  02 SORTIERFELD             /*   A   30  N
  02 DATENFELD   (EM=X(30))  /*   A  120  N
01 #timx  (T)
01 #i1    (I1)
end-define
*
for #i1 = 1 to 100
  callnat 'usr2027n' 1        /* sleep 1 second
  #timx := *TIMX              /* start timing the STORE
  init-id := "TEST"
  program := "TEST2"
  compress *TIMX "TEST3........" into sortierfeld leaving no
  compress *TIMX *DEVICE *PROGRAM into datenfeld  leaving no
  store UNI-U-WKFL
  #timx := *TIMX - #timx
  write *timx (EM=HH:II:SS.T) 'duration STORE:' #timx (EM=HH:II:SS.T)
  #timx := *TIMX              /* start timing the ET
  end transaction
  #timx := *TIMX - #timx
  write *timx (EM=HH:II:SS.T) 'duration ET   :' #timx (EM=HH:II:SS.T)
end-for
*
* cleanup: delete the test records again
read UNI-U-WKFL by WKFL-SP  /* = INIT+PROG+SORT
= 'TEST'
  if init-id ne "TEST"
    escape bottom
  end-if
  delete
end-read
end transaction
end

Output:

10:45:47.2 duration ET   : 00:00:00.0
10:45:48.2 duration STORE: 00:00:00.0
10:45:48.2 duration ET   : 00:00:00.0
10:45:49.2 duration STORE: 00:00:00.0
10:45:49.3 duration ET   : 00:00:00.1
10:45:50.3 duration STORE: 00:00:00.0
10:46:00.8 duration ET   : 00:00:10.5
10:46:01.8 duration STORE: 00:00:00.0
10:46:01.8 duration ET   : 00:00:00.0
10:46:02.8 duration STORE: 00:00:00.0
10:46:02.8 duration ET   : 00:00:00.0

… guess when the buffer flush took place.

What have you specified for the nucleus parameter UNBUFFERED? If you don’t use the default, try the default. If you use the default, specify ALL.
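
For reference, a sketch of specifying this at nucleus start; the dbid is a placeholder, and the permitted values for UNBUFFERED should be checked in the ADANUC documentation for your version.

    adanuc dbid=11 unbuffered=ALL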