This application was written more than 15 years ago.
The environment is MF with latest NAT (4.2.5) and ADABAS.
One database (#10 for example) has the FUSER and another database (#20 for example) has all the application's files.
The code goes like this (simplified):
PGM1 reads work file 1
  gets *DATX
  DECIDE ON MONTH
    VALUE 1
      include copycode READFILE 'FILE1'
    …
    …
    VALUE 12
      include copycode READFILE 'FILE12'
  END-DECIDE
END-WORK
*
FETCH 'PGM2'
END
This is copycode READFILE:
READ MULTI-FETCH OF 1000 &1& /* This file has more than 5M records
                             /* and must be read entirely
  ACCEPT IF /* several conditions
  WRITE WORK FILE 2
END-READ
The batch job is abending with 3009, subcode 3, command S1, pointing exactly to the line in PGM1 where the FETCH 'PGM2' takes place.
That indicates to me that NATURAL is doing a FIND on FUSER on database 10 to load PGM2, while PGM1 was reading FILE1 on database 20.
This production library has a TNAE parm (I believe) of 2. If I run the same job against a different library with a TNAE of 3 (same production environment, same FUSER and application files), the job runs fine.
My question is: is there any rule that forces you to create the FUSER on a separate database? Performance? Security?
The DBA claims that the code is inefficient, but the application itself does not need to access data across databases.
He insists that we must use a 15-year-old “in-house” solution: adding to PGM1 a CALLNAT to a subprogram that issues direct calls and closes database 10, where the FUSER resides.
In 20+ years of SAG experience I have never had to manage that. When I explained this to him, he said that most definitely every other installation I have worked at had DBAs using some ADABAS and/or NATURAL user exit to manage it. That is exactly my point: this is something he, the DBA, should address, not the application.
Any feedback will be greatly appreciated.
Hi Carlos;
TNAE just indicates the amount of time that can elapse without an Adabas command being issued. Three seconds versus two is quite a difference.
“That indicates to me that NATURAL is doing a FIND on FUSER on database 10 to load PGM2, while PGM1 was reading FILE1 on database 20.”
The code you have shown does not indicate this could be happening. The FETCH will not occur until the READ WORK loop is finished.
Question: I understand you simplified the code, but why would you read a work file to ascertain *DATX? Even assuming there is a date, not *DATX, on a record, you would only be doing a READ WORK FILE ONCE, not a loop.
What would be interesting to know is what has changed in the environment. The system, after all, has been running for 15 years. Whether inefficient or not, it ran. I would ask the DBA what is different with regard to settings in Natural or Adabas. Has the system run with the same versions of Natural and Adabas? There is a USR (I do not remember the number offhand) that allows you to change databases.
steve
Another suggestion is to check/change the setting of the Natural profile parameter ADAMODE. I guess that it is set to 0 or 1, while the default is 2. Setting ADAMODE to 2 or 3 will avoid the NAT3009 during FUSER access, because separate Adabas UQEs will be used for user data access and Natural nucleus calls, e.g. system file access to FUSER.
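In a batch job, a profile parameter such as ADAMODE is usually supplied as a dynamic parameter. A minimal sketch, assuming the job reads its dynamic parameters from the CMPRMIN data set; whether this site's JCL does so, and whether ADAMODE=2 suits its Adabas setup, are assumptions to confirm with the DBA:

```
//CMPRMIN  DD *
ADAMODE=2
/*
```

Because only this job's parameter input would change, the setting could be tried on the monthly job without touching the library or the global parameter module.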
Thank you all. You just confirmed what I already knew; I just needed to validate it and hopefully get suggestions.
Steve:
You misunderstood my comments, but Douglas got it. The code is pretty much this:
READ WORK FILE 1 /* JUST A FEW RECORDS ON THIS WORK FILE
  VALIDATIONS
  LOAD ARRAY
END-WORK
Get *DATX
DECIDE ON #MONTH /* FROM *DATX
  VALUE 1
    INCLUDE COPYCODE 'FILE-JAN'
  …
  …
  VALUE 12
    INCLUDE COPYCODE 'FILE-DEC'
END-DECIDE
FETCH 'PGM2'
END
The copycode:
READ &1&
  ACCEPT IF /* VALIDATIONS BASED ON THE ARRAY LOADED FROM
            /* WORK FILE 1 READ BEFORE
  …
  WRITE WORK FILE 2
END-READ
The copycode is needed because there are 12 identical files, one for each month, each with 5M records or more; that is why *DATX is used.
The abend happens EXACTLY on the FETCH 'PGM2', because the system loads PGM1 first (database 10), then reads 5M records from FILE-&1& (database 20), and then comes back to database 10 to get PGM2.
Last Friday after I left, the manager (who does not know SAG products) asked the DBA for an explanation.
Today she told me that the DBA explained that the database design was done several years ago, before he was even involved, and that it would take a lot of money and time to redesign it. This is a shop under a lot of pressure from IBM to leave SAG, and they would not spend $1.00 on SAG products even if it would save them $10.00.
Since this is a monthly job, both the manager and the DBA agreed to wait until next month and see what happens, because the last few times the job crashed they did not change anything; they just resubmitted the job at a time with less activity and more resources.
I will mention Rainer’s suggestion as well and report the results.
Thanks again,
Carlos Siqueira
Hi Carlos;
What still disturbs me is the idea that the program has been running for 15 years, presumably without any problems, then suddenly started to abend with the 3009’s.
By any chance did the problems start to occur when MULTI-FETCH was added?
MultiFetch reduces calls to Adabas. Thus, with a MultiFetch of 1000, there could be a lengthy interval without Adabas calls.
If the DBA is, for whatever reason, reluctant to change TNAE, you might try reducing the MultiFetch to, say, 100. Yes, this will add about 45,000 calls to Adabas (assuming 5 million records: 50,000 calls at a factor of 100 versus 5,000 at a factor of 1000). However, it might eliminate the 3009’s.
What threw me in your original post was the positioning of the END-WORK. You had it after the DECIDE ON construct. In your last post it is before the DECIDE.
steve
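A minimal sketch of the smaller MULTI-FETCH factor suggested above, written as a standalone program rather than as the copycode. The DDM EMPLOYEES (the Software AG sample file), the view EMP-VIEW and its fields are stand-ins for one of the monthly files, and the ACCEPT condition is only a placeholder for the real validations:

```natural
* Illustrative only: EMPLOYEES is the Software AG sample DDM, used here
* as a stand-in for one of the application's monthly files.
DEFINE DATA LOCAL
1 EMP-VIEW VIEW OF EMPLOYEES
  2 PERSONNEL-ID
  2 NAME
END-DEFINE
*
* With MULTI-FETCH OF 100, up to 100 records are returned per Adabas
* call, so calls are issued ten times as often as with a factor of 1000.
READ MULTI-FETCH OF 100 EMP-VIEW BY NAME
  ACCEPT IF NAME NE ' '            /* placeholder for the real conditions
  WRITE WORK FILE 2 PERSONNEL-ID NAME
END-READ
END
```

Note that the multi-fetch factor only changes how often calls go to the database holding the data files; it does not by itself generate any calls against the FUSER database.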
Steve:
It gets better…
The code has been “working” for years, but whenever the job abends, they just tell operations to restart it…
Since it is old code, it did not have the MULTI-FETCH when it crashed previously. I am the one who added the MF, and I tested it in an almost identical environment where it ran fine.
As you might already have guessed, the DBA (who shall remain nameless) refuses to consider ANY suggestions.
His answer when I mentioned Rainer’s suggestion?
“Software AG recommends that we stay with ADAMODE=0 since we are in an Adabas cluster environment.”
I did the right thing, found a problem, informed them and tried to provide solutions.
Carlos,
I think that changing the multifetch count will not help, because the NAT3009 occurs for the access to the FUSER system file, but not for access to the user’s data files.
If there is no chance to change either TNAE or ADAMODE, I hope that you are at least allowed to modify the Natural program. By adding an ON ERROR block you should be able to catch the error:
0010 * ...
0020 * user data processing (READ WORK, READ Adabas data, WRITE WORK)
0030 * ...
0040 FETCH 'PGM2' /* normal continuation if no error occurs
0050 * ...
0060 *
0070 ON ERROR
0080 IF *ERROR-NR = 3009 AND *ERROR-LINE = (0040) THEN
0090 /* we knew it would hit us there
0100 FETCH 'PGM2' /* try again, this time we should succeed
0110 ELSE
0120 WRITE 'Error' *ERROR-NR 'in' *PROGRAM 'in line' *ERROR-LINE
0130 STOP
0140 END-IF
0150 END-ERROR
0160 END
Great idea Rainer!
I will try tomorrow and post the results.
It has become a “me against them” kind of thing.
Now they have decided to turn off SYSRDC (which they had never heard of until I told them about it…) because they claim it is an overhead, even in batch development…
Enough said.