The setting that affects audit performance the most is:
watt.server.auditGuaranteed
Assuming you have the audit subsystem configured to write to a database (watt.server.auditStore=database), setting watt.server.auditGuaranteed=true causes every audit event to be written to a file in the file system (by default /IntegrationServer/audit/data) before it is asynchronously written to the database. Setting it to false causes every audit event to be written to an in-memory (i.e. volatile) store before it is asynchronously written to the database.
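To make the difference concrete, here is a minimal Python sketch of the two kinds of intermediate store. It is not how Integration Server implements them. The class names `MemoryAuditStore` and `FileAuditStore` are hypothetical, and the `os.fsync` call is an assumption standing in for whatever IS does to make a guaranteed write durable before the producer returns.

```python
import json
import os
import tempfile
from collections import deque


class MemoryAuditStore:
    """Volatile store (guaranteed auditing off): fast, but lost if the server dies."""

    def __init__(self):
        self._records = deque()

    def put(self, record):
        self._records.append(record)

    def take(self, n):
        # Remove and return up to n of the oldest records.
        return [self._records.popleft() for _ in range(min(n, len(self._records)))]

    def __len__(self):
        return len(self._records)


class FileAuditStore:
    """Guaranteed store (guaranteed auditing on): put() returns only after the record is on disk."""

    def __init__(self, path):
        self._path = path
        self._drained = 0  # number of records already handed to a daemon

    def _lines(self):
        if not os.path.exists(self._path):
            return []
        with open(self._path, encoding="utf-8") as f:
            return f.readlines()

    def put(self, record):
        with open(self._path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())  # force the bytes to disk: the slow, durable part

    def take(self, n):
        # Return the next n undrained records (a real store would also reclaim space).
        batch = self._lines()[self._drained:self._drained + n]
        self._drained += len(batch)
        return [json.loads(line) for line in batch]

    def __len__(self):
        return len(self._lines()) - self._drained


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "audit.log")
    for store in (FileAuditStore(path), MemoryAuditStore()):
        store.put({"service": "acme.orders:submit", "status": "success"})
        print(type(store).__name__, len(store), store.take(10))
```

Both stores expose the same `put`/`take` shape, so the rest of the pipeline does not care which one is configured; only the cost and durability of `put` change.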
To fully understand the performance issues with the audit subsystem, you need to know a little about the producer and daemon thread pools.
The Producer Pool
A thread from the producer thread pool is used any time the IS needs to write an audit record. The producer thread's job is to write the audit record to the intermediate store: either the file system (guaranteed auditing on) or memory (guaranteed auditing off). While the producer thread does this work, the calling service is blocked. So any time a service needs to write an audit record, it must first wait for a free thread in the producer pool.
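The blocking behavior can be sketched with Python's `concurrent.futures.ThreadPoolExecutor`. `ProducerPool` and `slow_put` are hypothetical names, and the `sleep` is only a stand-in for the intermediate-store write. The point is that `audit()` does not return until a producer thread has finished the write, and that no more than `max_threads` writes are ever in flight.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ProducerPool:
    """Hypothetical model of the audit producer pool."""

    def __init__(self, max_threads, store_put):
        self._pool = ThreadPoolExecutor(max_workers=max_threads,
                                        thread_name_prefix="AuditProducer")
        self._store_put = store_put

    def audit(self, record):
        # submit() queues the write if every producer thread is busy;
        # result() blocks the calling service until the write has finished.
        self._pool.submit(self._store_put, record).result()

    def shutdown(self):
        self._pool.shutdown(wait=True)


# --- usage: 6 "service" threads compete for 2 producer threads ---
written, in_flight, peak = [], 0, 0
stats_lock = threading.Lock()


def slow_put(record):
    global in_flight, peak
    with stats_lock:
        in_flight += 1
        peak = max(peak, in_flight)
    time.sleep(0.01)  # stand-in for the intermediate-store write
    with stats_lock:
        in_flight -= 1
        written.append(record)


producers = ProducerPool(max_threads=2, store_put=slow_put)
services = [threading.Thread(target=producers.audit, args=({"svc": i},))
            for i in range(6)]
for t in services:
    t.start()
for t in services:
    t.join()
producers.shutdown()
print(len(written), "records written; at most", peak, "writes ran at once")
```

Four of the six service threads spend part of their time waiting in the executor's queue, which is the "wait for an available producer thread" cost described above.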
The Daemon Pool
Threads in the daemon pool are simply responsible for “draining” the intermediate store and writing the entries to the database. This happens in the background, but because of synchronization issues (which I’ll explain in a moment) it can still hurt the performance of your services. Draining the audit store used to be a big problem in 6.0.1 in even moderately high-volume environments - the store would fill up faster than it could be drained. Fortunately, 6.1 introduced a new parameter:
watt.server.auditFetchSize
This controls how many audit records each daemon thread pulls from the intermediate audit store at a time. The default is 10 (in 6.0.1, daemons pulled only one record at a time).
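Why the fetch size matters can be shown with a small drain loop. This is a hypothetical sketch, assuming that each pull is one synchronized trip to the intermediate store; `drain`, `write_batch` and `trips_needed` are invented names, not IS APIs. Draining 1,000 records takes 1,000 trips with a fetch size of 1 but only 100 with a fetch size of 10.

```python
import math
from collections import deque


def drain(store, fetch_size, write_batch):
    """Hypothetical daemon loop: each trip to the store removes up to fetch_size records."""
    trips = 0
    while store:
        batch = [store.popleft() for _ in range(min(fetch_size, len(store)))]
        write_batch(batch)  # e.g. one batched database insert
        trips += 1
    return trips


def trips_needed(backlog, fetch_size):
    # Number of pulls required to empty a backlog of this size.
    return math.ceil(backlog / fetch_size)


if __name__ == "__main__":
    for fetch in (1, 10):  # 6.0.1 behavior vs. the 6.1 default
        db = []
        print(fetch, drain(deque(range(1000)), fetch, db.extend), len(db))
```

Fewer trips means fewer times a daemon has to take the store's lock, which matters once you see how that lock is shared.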
Controlling the Pool Size
The minimum and maximum number of threads in the Producer and Daemon Pools are controlled by the following settings:
watt.server.auditMinPool
watt.server.auditMaxPool
These settings apply to both pools, so you cannot size the Producer Pool differently from the Daemon Pool. However, they are separate pools: if Min is 1 and Max is 10 (the defaults), each pool has a minimum of 1 thread and a maximum of 10 (they don’t max out at 5 each). If you are curious how many threads are actually in use, you can view the audit producer threads in IS Administrator under System Threads, where they show up as AuditProducer0 through AuditProducerN. Note that a producer thread only shows up if there was at some point enough concurrent demand for it, and once created it will not go away until the IS is restarted (i.e. the pool is never shrunk back to the minimum size).
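Python's `ThreadPoolExecutor` happens to behave much like this: it creates worker threads on demand up to `max_workers` and keeps idle workers until shutdown. The sketch below uses that to illustrate the sizing rules, assuming one `auditMaxPool` value sizes two independent pools. `build_audit_pools` and `live_threads` are hypothetical helpers, and `auditMinPool` is not modeled because the executor has no minimum size.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

settings = {"watt.server.auditMinPool": "1", "watt.server.auditMaxPool": "10"}


def build_audit_pools(settings):
    # One max value sizes both pools, but each pool owns its own threads.
    # (ThreadPoolExecutor has no minimum size, so auditMinPool is not modeled.)
    max_threads = int(settings["watt.server.auditMaxPool"])
    producers = ThreadPoolExecutor(max_workers=max_threads, thread_name_prefix="AuditProducer")
    daemons = ThreadPoolExecutor(max_workers=max_threads, thread_name_prefix="AuditDaemon")
    return producers, daemons


def live_threads(prefix):
    return sum(1 for t in threading.enumerate() if t.name.startswith(prefix))


producers, daemons = build_audit_pools(settings)

# Four audit writes that must overlap (each waits for the other three)
# force four producer threads into existence.
burst = threading.Barrier(4)
for future in [producers.submit(burst.wait) for _ in range(4)]:
    future.result()

# The burst is over, but the idle threads are kept, not trimmed back.
counts = (live_threads("AuditProducer"), live_threads("AuditDaemon"))
print(counts)  # (4, 0)

producers.shutdown()
daemons.shutdown()
```

The thread count reflects peak concurrent demand, not current load, which is the same pattern you see with AuditProducer threads under System Threads.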
The Synchronization Issue
Okay, now to the real issue with guaranteed auditing. To summarize, we have a Producer Pool which is used by services to write audit events to the intermediate store. We have an intermediate store which is either in memory or in the file system. We have a daemon pool which is responsible for draining the intermediate store and writing the entries to a database.
Let’s suppose we have an IS with guaranteed auditing enabled, 250 server threads, and a producer/daemon pool max size of 30. Furthermore, let’s suppose that all 250 threads are trying to write audit records concurrently.
Now let’s look at the synchronization that is going on:
- The 250 service threads will all be in competition for the 30 producer threads.
- The 30 producer threads are all in competition to write to a single file in the file system - only one thread can write at a time. (In reality there can be multiple files, but all threads synchronize on a single log file.)
- The 30 daemon threads are also in competition with the producer threads, since they too are modifying the intermediate audit store (in this case by removing entries from it). This is how a daemon thread can hurt the performance of a service: even though its work seems 100% asynchronous from the execution of your service, the underlying synchronization can slow your code down.
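The three-way contention above can be reproduced in a toy model. The single lock shared by `put` (producers) and `take` (daemons) is an assumption based on the description of one log file that all threads synchronize on; `SynchronizedAuditStore` and `run` are hypothetical, and the `contended` counter (how often a thread found the lock already held) will vary from run to run.

```python
import threading
import time
from collections import deque


class SynchronizedAuditStore:
    """Hypothetical intermediate store where producers and daemons share one lock."""

    def __init__(self, write_cost=0.001):
        self._lock = threading.Lock()
        self._records = deque()
        self._write_cost = write_cost
        self._stats_lock = threading.Lock()
        self.contended = 0  # times a thread found the store lock already held

    def _acquire(self):
        if not self._lock.acquire(blocking=False):
            with self._stats_lock:
                self.contended += 1
            self._lock.acquire()  # block until the current holder is done

    def put(self, record):  # producer side
        self._acquire()
        try:
            time.sleep(self._write_cost)  # stand-in for the serialized file write
            self._records.append(record)
        finally:
            self._lock.release()

    def take(self, n):  # daemon side: removing entries needs the same lock
        self._acquire()
        try:
            return [self._records.popleft() for _ in range(min(n, len(self._records)))]
        finally:
            self._lock.release()


def run(producer_count=8, records_each=20, daemon_count=4, fetch_size=10):
    store = SynchronizedAuditStore()
    producers_done = threading.Event()
    drained, drained_lock = [], threading.Lock()

    def producer(pid):
        for i in range(records_each):
            store.put((pid, i))

    def daemon():
        while True:
            done = producers_done.is_set()  # check before take() so nothing is missed
            batch = store.take(fetch_size)
            if batch:
                with drained_lock:
                    drained.extend(batch)
            elif done:
                return
            else:
                time.sleep(0.001)  # store empty for now; poll again

    workers = [threading.Thread(target=producer, args=(p,)) for p in range(producer_count)]
    daemons = [threading.Thread(target=daemon) for _ in range(daemon_count)]
    for t in workers + daemons:
        t.start()
    for t in workers:
        t.join()
    producers_done.set()
    for t in daemons:
        t.join()
    return drained, store.contended


if __name__ == "__main__":
    drained, contended = run()
    print(len(drained), "records drained;", contended, "lock acquisitions had to wait")
```

Every daemon `take` that holds the lock is time a producer, and therefore a blocked service, cannot spend writing.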
Increasing the size of the pools may or may not help. Your service may be able to get a producer thread from the pool more quickly, but it will most likely have to wait longer to write its entry to the intermediate store since you have just increased the contention against that file.
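A back-of-the-envelope bound shows why. Assume, as a simplification, that every write to the intermediate store holds the shared lock for a time t_w and that no part of a write can overlap with another. Then N concurrent audit records need at least the following time, whatever the producer pool size P is; P only decides whether a service waits in the pool queue or at the lock. The 2 ms figure is purely illustrative, not a measurement.

```latex
T_{\text{all records}} \;\ge\; N \cdot t_w
\qquad\text{e.g. } N = 250,\; t_w = 2\,\text{ms}
\;\Rightarrow\; T \ge 500\,\text{ms} \text{ for any } P
```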
Turning guaranteed auditing off will give you vastly better performance. In my testing it is almost as fast as having no auditing at all. I’m not 100% sure whether this eliminates the synchronization issue (since I don’t know what internal mechanism is being used) or just lessens it due to the raw speed of memory versus disk I/O. But if you make the switch, be sure to monitor the Java heap, since all of your audit records now live in a memory store.
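With guaranteed auditing off, whatever the daemons have not drained yet sits on the heap. A tiny model with made-up rates in arbitrary units shows why monitoring matters: if producers add records faster than daemons remove them, the backlog (and the heap it occupies) grows every tick; if not, it stays at zero. `backlog_over_time` is a hypothetical helper, not part of IS.

```python
def backlog_over_time(arrivals_per_tick, drain_per_tick, ticks):
    """Hypothetical model: records left in the in-memory store at the end of each tick."""
    backlog, history = 0, []
    for _ in range(ticks):
        backlog += arrivals_per_tick               # producers add records
        backlog -= min(backlog, drain_per_tick)    # daemons remove what they can
        history.append(backlog)
    return history


if __name__ == "__main__":
    print(backlog_over_time(300, 250, 5))  # [50, 100, 150, 200, 250]
    print(backlog_over_time(200, 250, 5))  # [0, 0, 0, 0, 0]
```

This is the same "fills up faster than it drains" situation described for 6.0.1, except that with a volatile store the overflow costs heap rather than disk.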
However, you need to decide whether you can live without the audit records in the event of a server failure. If you are just gathering statistics, then I would definitely turn off guaranteed auditing. If you are relying on the audit records for recovery/resubmission, then you may want to think twice about turning it off. If you have a requirement for both, then I would consider setting up separate IS instances that are specifically tuned to your SLAs and/or auditing requirements.
You may also want to consider turning off logging of “unnecessary” events. Session logging may be a good example of this. Session logging can be turned off by setting watt.server.AuditLog.session=false. This will help eliminate unneeded contention within the audit subsystem.
Lastly, make sure you are only auditing what is necessary. In particular, only log pipelines on error - and only if you plan on actually resubmitting using the logged pipeline. Only log top level services and only if you care about gathering statistics about them. Never log the start of a service - success and error should suffice. For Process Models, make sure that “Enable Resubmission” is only checked where absolutely needed. Most importantly - have solid standards around the proper use of auditing and ensure that all of your developers understand them. You can run some simple queries against the audit tables to see if there are unnecessary things being audited (i.e. audit your auditing)…
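The "audit your auditing" queries might look something like this. It is a minimal sketch using Python's built-in `sqlite3` with an in-memory table so it runs anywhere; the table `audit_events`, its columns, and the service names are all invented for illustration, so adapt the SQL to the schema of your actual audit database.

```python
import sqlite3

# Hypothetical schema: the real IS audit tables have different names and columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audit_events (service TEXT, event_type TEXT)")
sample = (
    [("acme.util:formatDate", "success")] * 40
    + [("acme.orders:submit", "start")] * 3
    + [("acme.orders:submit", "success")] * 3
    + [("acme.orders:submit", "error")]
)
conn.executemany("INSERT INTO audit_events VALUES (?, ?)", sample)

# Which services generate the most audit volume? Low-level utilities near
# the top are candidates to stop auditing.
noisiest = conn.execute(
    "SELECT service, COUNT(*) AS n FROM audit_events GROUP BY service ORDER BY n DESC"
).fetchall()

# Which services are logging their start event (usually unnecessary)?
start_logged = conn.execute(
    "SELECT service, COUNT(*) FROM audit_events WHERE event_type = 'start' GROUP BY service"
).fetchall()

print(noisiest)      # [('acme.util:formatDate', 40), ('acme.orders:submit', 7)]
print(start_logged)  # [('acme.orders:submit', 3)]
```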
Hope that helps.