MWS service search Read timed out

Hi Team,

We are facing a read timeout issue on our production server, where WM 9.7 IS and MWS are installed on AIX
(version 9.7.0.0002-0346, installed fix MWS_9.7.0_Fix2). While searching for B2B services in MWS under Application>Monitoring>Integration>Services

we continuously get [RID:333553] - Read timed out: java.net.SocketTimeoutException: Read timed out

I checked the problems, error, audit and _full logs but didn’t find much information other than
[RID:333553] - Read timed out: java.net.SocketTimeoutException: Read timed out
I also checked the IS that is configured with MWS: on the IS home page of the WmTaskClient package, Socket Timeout (milliseconds) is set to 60000.

I can confirm that the IS server and the other configuration look good, and that the IS servers listed under
Administration>Systemsettings>TaskEngine>IS servers and Administration>Systemsettings>Servers are connected properly, without any issue.

I searched the forum for this issue but didn’t find much information to fix my ongoing issue.

Kindly guide me or suggest steps to resolve this issue.

Which external DB is used? Oracle/MS SQL?

What is the timeout setting at MWS sysadmin -> Folders > Administrative Folders > Administrative Dashboard > Configuration > CAF Application Runtime Configuration > wm_mon_services > Environment Entry / wsclient-socketTimeout?

Increase the wsclient-socketTimeout to a larger value. Let me know the status of the issue.

Thanks for the quick response.

My MWS is configured with Oracle 11g.

When I navigate in MWS as sysadmin to
MWS sysadmin -> Folders > Administrative Folders > Administrative Dashboard > Configuration > CAF Application Runtime Configuration >

I don’t find anything inside CAF Application Runtime Configuration; it only says “configure Global Defaulter”, that’s all.

Neither we nor our developers use or build any CAF apps, but can you please tell me
which parameter I need to add inside CAF Application Runtime Configuration?

This is missing --> wm_mon_services > Environment Entry / wsclient-socketTimeout

I have added a screenshot for more information.

Can you please tell me which parameters are missing and which are mandatory?

Go to MWS- > sysadmin -> Folders > Administrative Folders > Administrative Dashboard > Configuration > CAF Application Runtime Configuration >

Search for wm_mon_services and then go to Environment Entry / wsclient-socketTimeout

Screen shots for above please.

Hi Mahesh,

I have already attached a screenshot of this earlier in the thread; that is exactly the configuration present in my MWS in the PRD environment.

Configuration > CAF Application Runtime Configuration >

Can you please verify

Hi Rmg/Mahesh/Wang/Holger,

Do we have an MWS expert here? I am still waiting for help with the MWS PROD read timeout issue in our WM 9.7 environment.

Can anyone please provide resolution steps?

See my attachment. Do you see this?

Hi Mahesh,

I have already set wsclient-socketTimeout to 180000, but it didn’t resolve the ongoing read timeout exception. Are there any other pointers/steps you would like to suggest?

Hi Rajiv,

are IS and MWS on the same box or on separate boxes?

Can you do a traceroute (tracert on Windows) from IS to MWS and vice versa?
This will show if there are some network based latency issues.

Can you check the Home-Page of WmMonitor-package?
This might contain a timeout setting.

What is the Fix-Level for Monitor Package and MonitorUI?

Are there any entries in the IS server.log which might be related to this timeout?

Regards,
Holger

Hi Holger,

IS and MWS reside on the same AIX box. MWS has fix MWS_9.7.0_Fix2 installed, and the package-level fix is MON_9.7_Fix0001 with version 9.7.

On the WmMonitor home page we can’t see a timeout setting; it only has the following:
MWS Host,MWS Port,MWS Transport
Database Retries
Enable Data Level Security
Data Level Security Administrator
Reporting Batch Size
Enable JDBC Archive for Server, Service, Activation, Document
Add ‘My webMethods Users’ role to ‘MonitorUsers’ ACL

I have checked the MWS error/_full/problem logs and the IS server logs but didn’t find anything other than the read timeout exception:

CDT (jsf:INFO) [RID:333553] - Read timed out: java.net.SocketTimeoutException: Read timed out
(jsf:INFO) [RID:328088] - Read timed out: java.net.SocketTimeoutException: Read timed out
T (jsf:INFO) [RID:390993] - Read timed out: java.net.SocketTimeoutException: Read timed out
(Framework:INFO) [RID:391001] - Request [1n4qhvdwt87c2esbofd0i4k9r:Administrator] http://localhost:8585/webm.apps.mon.integration.service.list (POST)
(Framework:INFO) [RID:391009] - Request [1n4qhvdwt87c2esbofd0i4k9r:Administrator] http://localhost:8585/webm.apps.mon.integration.service.list (GET)
(jsf:INFO) [RID:391009] - Read timed out: java.net.SocketTimeoutException: Read timed out
(Framework:INFO) [RID:391010] - Request [1n4qhvdwt87c2esbofd0i4k9r:Administrator] http://localhost:8585(GET)

One thing I have noticed in my MWS:
Under My webMethods --> System Setting we have configured 3 ISs, including the IS that runs on the same AIX box as MWS, and all of them are up and running. Under System Setting --> Task Engine we have configured one IS, and that IS and MWS run on the same AIX box.

Can you do a traceroute (tracert on Windows) from IS to MWS and vice versa?
IS and MWS are installed on AIX (Unix), so do you know the steps or commands to perform this? Or can tracert only be done on Windows?

Hi Rajiv,

when all instances are running on the same box traceroute will not be helpful for this investigation.

You would have to log in to the AIX system with your application user (used for running the instances).

The command would be "traceroute " for *nix and "tracert " for Windows (i.e. Command Prompt).

For the Fix-Level:
Extract from MON_9.7_Fix4:

You should check for the latest Fixes:

  • IS_9.7_Core_Fix5
  • MWS_9.7_Fix8
  • MON_9.7_Fix4
  • MON_9.7_MWS_Fix6

Regards,
Holger

Hi Holger,

Thanks for the follow-up and the kind suggestions.

  1. My IS and MWS both reside on the same AIX box, so I don’t think the network is causing slowness/performance issues.
    I tried traceroute to my host name and the results/ping look excellent.

  2. I will do more analysis on the MWS fixes you suggested; this will be my last option.

  3. How about looking at the DB schema configured for MWS? Whenever my L2 team searches for services in MWS, they never provide exact info; instead they enter a field/unique identifier, e.g. invoice no./order no., so I believe this creates a complex or dynamic SQL query at the DB level.

I looked into the DB schema where the MWS data gets stored and found the table WMSERVICEACTIVITYLOG, where all service data is stored. It has an index on the columns MSGID, CONTEXTID, PROCESSSTEPCONTEXT.
I ran a count on this table and found close to 12447185 rows.

My Queries:-

  1. Do you think it’s advisable to keep this many records in the PROD DB? Any suggestions?
  2. How about indexing DB columns? When searching for service details in MWS, users never search on the already indexed columns, i.e. MSGID, CONTEXTID, PROCESSSTEPCONTEXT. Instead they use the advanced search, where the field maps to the FULL MESSAGE column with a column size of 1024 bytes, so I don’t think we can index the FULL MESSAGE column. Any suggestions on this?

Maybe this is why MWS is throwing the read timeout exception?

Hi Rajiv,

it is hard to give a final answer about how many records should be kept in the WMSERVICEACTIVITYLOG table.
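
To get a feel for how the rows are spread over time before deciding on a retention period, a rough query like the one below may help. This is only a sketch: it assumes that AUDITTIMESTAMP in WMSERVICE is a DATE or TIMESTAMP column in your schema, and it scans the whole table, so it is better run off-peak.

```sql
-- Sketch only: number of rows per calendar month in WMSERVICE.
-- Assumption: AUDITTIMESTAMP is a DATE or TIMESTAMP column.
SELECT TRUNC(AUDITTIMESTAMP, 'MM') AS MONTH_START,
       COUNT(*)                    AS ROW_COUNT
FROM   WMSERVICE
GROUP  BY TRUNC(AUDITTIMESTAMP, 'MM')
ORDER  BY MONTH_START;
```

Months that nobody searches any more are candidates for archiving.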

We have built a Flow service which does the archiving using the built-in services from WmMonitor (pub.monitor.archiving:*) and run it on a regular basis via the scheduler in IS.

The number of days to retain is configurable via a custom properties file.

You will need a separate archive schema in your database for this.
You will have to grant some rights (SELECT, DELETE) on your main schema to the archive schema.
Inside the archive schema in the table OPERATION_PARAMETERS update the records IS_SCHEMA and PROCESS_SCHEMA with the name of your main schema.
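
As a rough illustration of the grants, assuming a main schema called WMMAIN and an archive schema called WMARCHIVE (both names are made up here). Which tables actually need the grants is described in the Monitoring Guide; the two below are just examples.

```sql
-- Sketch only; run as the owner of the main schema (or as a DBA).
-- WMMAIN and WMARCHIVE are hypothetical schema names.
GRANT SELECT, DELETE ON WMMAIN.WMSERVICE            TO WMARCHIVE;
GRANT SELECT, DELETE ON WMMAIN.WMSERVICEACTIVITYLOG TO WMARCHIVE;
```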

See the Monitoring Guide for further information.

To check the SQL executed upon search set Logging Factory for “Monitor (Database Layer)” to Trace and do the search. After that you will find the SQL in the Server-Log.
When running the SQL in a standalone database client (i.e. Oracle SQL Developer for Oracle databases) there is usually a function called “Execution Plan” which will show you how the database interprets the statement.

With this information you can try to create additional indices to simplify the Execution Plan.
Most likely when joins are performed over columns not indexed there is potential for doing so.
Not all columns contain data for which indexing is recommended.

Remember to reduce the Logging back to normal value after you have captured the SQL statements as this will create large log-files.
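
If no graphical client is at hand, the same information can be obtained from any SQL prompt with EXPLAIN PLAN and DBMS_XPLAN. A minimal sketch, using a simplified stand-in for the captured statement (the real one will be longer and will use bind variables):

```sql
-- Store the plan for the statement without executing it.
EXPLAIN PLAN FOR
SELECT CONTEXTID, FULLMESSAGE
FROM   WMSERVICEACTIVITYLOG
WHERE  UPPER(FULLMESSAGE) LIKE '%ABCD%';

-- Show the plan that was just stored in PLAN_TABLE.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

A TABLE ACCESS FULL step on a large table is usually the first thing to look at in the output.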

Regards,
Holger

Hi Holger,

We have a scheduler to archive data on a regular basis, but we still need to keep up to 6 months of data in the DB.

I checked at the DB level: not all tables are indexed. As mentioned earlier in the thread, indexes are set on specific DB columns only. There is still scope to fine-tune the indexes, or we need to find a DB analysis tool that can tell us exactly which fields require indexing.
Do you know any such tool?

We do have all the settings in place at the DB level, such as the grants from the main schema to the archive schema (SELECT, DELETE), and we have a separate schema for the archive.

I have increased the logging and found the exact query issued from MWS:

SELECT DISTINCT
  t13.ROOTCONTEXTID, t13.PARENTCONTEXTID, t13.CONTEXTID, t13.SERVICENAME, t13.SERVERID, t13.USERID,
  t14.FIRSTSTATUS, t14.FIRSTSTATUS FIRSTSTATUSDECODE,
  t14.FIRSTTIME, t14.FIRSTTIME FIRSTTIMESTRING,
  t14.LASTSTATUS, t14.LASTSTATUS LASTSTATUSDECODE,
  t14.LASTTIME, t14.LASTTIME LASTTIMESTRING,
  t14.DURATION, t13.CUSTOMCONTEXTID, t11.FULLMESSAGE
FROM WMSERVICE t13, WMSERVICE_MIN_MAX t14, WMSERVICEACTIVITYLOG t11
WHERE t14.CONTEXTID = t13.CONTEXTID
  AND t13.AUDITTIMESTAMP = t14.LASTTIME
  AND t13.CONTEXTID = t11.CONTEXTID
  AND (UPPER(t11.FULLMESSAGE) LIKE '%ABCD%' AND (t14.LASTTIME >= ? AND t14.LASTTIME <= ?))
ORDER BY t14.LASTTIME DESC

A lot of joins are performed when this SQL query is issued from MWS.

Do you suggest any specific fixes for this issue, or is indexing the only way to resolve it?

Hi Rajiv,

try the statement in SQL Developer and check the execution plan for it.
This will show how the statement is interpreted and how much each part costs.

We have added the following indices to both IS- and Archive-Schema to speed up the archiving process:


create index wmservice_1ix on WMSERVICE (ROOTCONTEXTID) tablespace WEBMINDX parallel nologging;
create index wmprocess_1ix on WMPROCESS (ROOTCONTEXTID) tablespace WEBMINDX parallel nologging;
create index wmserviceactivitylog_1ix on WMSERVICEACTIVITYLOG (ROOTCONTEXTID) tablespace WEBMINDX parallel nologging;
create index WMPD_DEPLOYMENTTIME on WMPROCESSDEFINITION (DEPLOYMENTTIME) tablespace WEBMINDX parallel nologging;

You might want to add indices for the CONTEXTID the same way.

This will ease the evaluation of the joins “t14.CONTEXTID = t13.CONTEXTID” and “t13.CONTEXTID = t11.CONTEXTID”.

After doing so, recheck the statement’s execution plan and compare it with the previous one.
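
A sketch of what the CONTEXTID indices could look like, following the same pattern as above. The index names are made up, and WEBMINDX is the tablespace from our environment, so adjust both. Check first which columns are already indexed: if CONTEXTID is already the leading column of an existing index on a table, the extra index is not needed.

```sql
-- Which columns are already indexed, and in which position?
-- Run as the owner of the tables.
SELECT INDEX_NAME, COLUMN_NAME, COLUMN_POSITION
FROM   USER_IND_COLUMNS
WHERE  TABLE_NAME IN ('WMSERVICE', 'WMSERVICEACTIVITYLOG')
ORDER  BY INDEX_NAME, COLUMN_POSITION;

-- Sketch only: hypothetical index names, tablespace taken from our setup.
create index wmservice_2ix on WMSERVICE (CONTEXTID) tablespace WEBMINDX parallel nologging;
create index wmserviceactivitylog_2ix on WMSERVICEACTIVITYLOG (CONTEXTID) tablespace WEBMINDX parallel nologging;
```

These indices can only help with the joins. A condition like LIKE '%ABCD%' with a leading wildcard cannot be resolved by a range scan on a normal index.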

Regards,
Holger

Hi Holger,

I don’t have much expertise in SQL, and my DB team has thrown up their hands and refuses to touch this query. They told me these are product tables created by product scripts, so nothing can be done.

Can you analyze it and tell me whether the SQL execution plan and indexing below look good to you?

Hi Rajiv,

i have checked the statement against one of my QA environments.

Please find the execution plan attached.

The biggest pain point seems to be the full table scan for the Upper(FullMessage)-part of the statement, but I doubt that there is much which can be tweaked here.

Regards,
Holger

Hi Holger,

Is it possible for you to paste the following?

  1. The SQL query for this execution plan
  2. Please let me know what changes you made or how you tweaked this SQL query, so that I can sync with my DB team

Appreciate your response.