Hashtable size/capacity to hold data...

Hi,

In our project, we plan to use a HASHTABLE as a temporary cache during real-time updates from one db table to another db table. (Yes, we could do the update from one db table to the other through a db trigger itself, but the requirement is to use webMethods to do the update from one table to the other; it is bi-directional, and in real time. Since it is a bi-directional update, the wM notification would otherwise pick up an update that wM itself made; to avoid this we use a column in the db to indicate who made the update, i.e. whether it was done by wM or by their application.)

Could you please tell me how many records can be loaded into a HASHTABLE? (At a time there may be thousands of records being updated from one table to the other; also, we are planning to use 4 hashtables for the different table update operations, and each record will contain 40 columns.)

I guess this depends on HEAP memory. If I am right, how can we measure the total load a HASHTABLE can hold, e.g. the maximum number of records when each record has 40 columns?

I also had a thought of using a database in wM for the temporary entries, which can hold more data, but if we use a db once again in wM it may reduce the performance, right?

Your valuable views please.

Regards,
Sam

The number of entries is constrained by max int. Other than that, you’ll be constrained by available memory. The hashtable doesn’t store the object itself–just a reference. So the item “stored” in the hashtable can be anything and any size.
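
If you want a rough feel for the memory footprint, a crude measurement along these lines can help (a ballpark sketch only; the numbers depend heavily on the JVM, GC timing, and the size of your column values):

```java
import java.util.Hashtable;

// Crude ballpark of the heap used by a hashtable of String[] "records".
// Treat the result as a rough feel only; real usage varies with the JVM,
// garbage-collection timing, and object layout.
public class HeapEstimate {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.gc();
        long before = rt.totalMemory() - rt.freeMemory();

        Hashtable<String, String[]> table = new Hashtable<>();
        for (int i = 0; i < 10_000; i++) {
            String[] record = new String[40];        // 40 columns per record
            for (int c = 0; c < 40; c++) {
                record[c] = "value-" + i + "-" + c;
            }
            table.put("PK-" + i, record);
        }

        System.gc();
        long after = rt.totalMemory() - rt.freeMemory();
        System.out.println("Approx bytes for 10,000 records: " + (after - before));
    }
}
```

Scale the record count and column sizes to match your data to get a ballpark for how much heap your four hashtables would need.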

I’m not sure I understand why you’d be gathering records into any temporary structure. Can you elaborate?

Hi Rob,

Could you please clarify this: “The number of entries is constrained by max int”. What is ‘max int’?

The reason for maintaining the hashtable is to avoid a clash when an update from table X to table Y and an update from table Y to table X happen at the same time for the same record, from both directions [bi-directional concurrent data sync].

Here a startup service creates a HASHTABLE in memory with no data to begin with. Each table has its own notification. If an update happens in table X or table Y, the notification picks it up and invokes a flow service that maintains the get/put of values in the hashtable [put and get are synchronized]. The service invoked for the record makes an entry in the hashtable if that entry is not already present, along with one more status (“lock flag” = N or Y) maintained by wM in the hashtable; the default value for “lock flag” is N. Once the entry is made in the hashtable, the service starts updating table Y; once the update is done it removes the entry from the hashtable, but only if “lock flag” = N. As said earlier, if any update has taken place in table Y before wM updates it, the other notification will start the same processing as above, and if it sees the same entry already in the HASHTABLE it will change “lock flag” to Y and send a mail to the support team to do the manual sync for that record.
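
Roughly, the check-and-lock logic I have in mind looks like the sketch below (a minimal Java illustration only; SyncGuard, tryAcquire, and release are made-up names, not our actual flow services):

```java
import java.util.Hashtable;

// Minimal sketch of the lock-flag idea described above (illustrative names only).
public class SyncGuard {
    // key = record primary key, value = lock flag ("N" or "Y")
    private static final Hashtable<String, String> inFlight = new Hashtable<>();

    // Called when a notification picks up a change for a record.
    // Returns true if this direction may proceed with the update.
    public static synchronized boolean tryAcquire(String primaryKey) {
        String flag = inFlight.get(primaryKey);
        if (flag == null) {
            inFlight.put(primaryKey, "N");   // no sync in flight, claim the record
            return true;
        }
        // The same record is already being synced in the other direction:
        // flag it and let the caller mail the support team for a manual sync.
        inFlight.put(primaryKey, "Y");
        return false;
    }

    // Called after the target table has been updated successfully.
    public static synchronized void release(String primaryKey) {
        // Remove the entry only if no collision was flagged in the meantime.
        if ("N".equals(inFlight.get(primaryKey))) {
            inFlight.remove(primaryKey);
        }
    }
}
```

If tryAcquire returns false, the flow service would send the manual-sync mail instead of updating the target.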

Hope I am clear with the above. Please tell me if I am not.

By the way, I know a HASHTABLE holds key/value pairs (the value is an object), but you have mentioned a reference…

Regards,
Sam

The size of a hashtable is stored by the hashtable in an int variable. MAXINT (Integer.MAX_VALUE in Java) is the largest possible int and is a constant.

You’re right that the hashtable stores an entry for each key/value pair. The key’s hash code determines where the entry goes; the value is a reference to the stored object, not the object itself. I mentioned it because you seemed concerned about the size of objects that the hashtable could handle: the answer is that the hashtable doesn’t store the objects directly, so there is no “max size” object that it can handle. It holds a reference to them, though this is under the covers from a programming viewpoint.
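
A tiny illustration of the reference behavior (a standalone Java demo, nothing IS-specific):

```java
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.List;

public class ReferenceDemo {
    public static void main(String[] args) {
        Hashtable<String, List<String>> table = new Hashtable<>();

        List<String> record = new ArrayList<>();
        record.add("column1");
        table.put("PK-001", record);   // stores a reference, not a copy

        record.add("column2");         // mutate the original object afterwards

        // The change is visible through the table because only the reference
        // was stored; the object itself lives elsewhere on the heap.
        System.out.println(table.get("PK-001").size());   // prints 2
    }
}
```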

You may want to look at the GEAR documentation, specifically, the “Data Synchronization - XXXX” docs. GEAR material can be downloaded from Advantage. The docs describe a couple of data synchronization scenarios, including what you’re trying to do with keeping two DBs in sync. It describes a technique using “latching”, which uses logical locks to prevent contention and circular updates. There are built-in services, in pub.synchronization in WmPublic, to support this.
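
Conceptually, a latch is a logical lock keyed by the record; the idea is roughly the sketch below (made-up names, in-memory only for illustration; the pub.synchronization services are DB-backed and more complete):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Rough sketch of the latching idea: before the sync pushes a change into the
// target system, it closes a latch for that record. When the target system's
// own notification fires for that change, the closed latch marks it as an echo
// and the update is suppressed instead of being sent back to the source.
public class LatchSketch {
    private static final Map<String, Boolean> latches = new ConcurrentHashMap<>();

    // Close the latch just before applying an update to the target system.
    public static void closeLatch(String recordKey) {
        latches.put(recordKey, Boolean.TRUE);
    }

    // Called by the target side's notification handler: if the latch is
    // closed, the change was made by the sync itself, so skip it and reopen.
    public static boolean isEcho(String recordKey) {
        return latches.remove(recordKey) != null;
    }
}
```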

HTH.

Hi Rob,
I have gone through the PDFs; here follows my understanding from “DatasynchronizationXXXX.pdf” [correct me if I am wrong :)]
NativeID - the unique ID of a record in the source db table / target db table.
CorrelationID - an ID to map between the source db table record and the target db table record.
Key cross-reference table - an intermediate repository holding the [NativeID, CanonicalID] mapping (maintained in a db). Entries are made in the key cross-reference table when records are created in the source or target, and it is used during updates/deletes in both the target and source db.
In our project: records are already present in both db tables, so creating entries in the key cross-reference table is not possible, and I presume it would also be time consuming.
Smart trigger - as the name implies, a trigger that is smart enough to avoid the echo update made by wM (to make a trigger “smart”, some changes have to be made in the source/target systems where we are doing the bi-directional update).
In our project: we thought of including a column in the source/target db tables that tells who updated the record, i.e. webMethods or their own applications.

BTW, we planned to use the hashtable for temporary entries to avoid clashes in the concurrent bi-directional data sync. (If any issue comes up we can send a notification to the concerned team to take action; issues like a network problem, the db not being reachable, or the record not being present in the target db table for the update. The concerned team will then do the update manually in the target db.)
In the one-way sync we are not using a hashtable; we just update the target, and if there is any failure a NOTIFICATION will be sent.
Regards,
Sam

That sounds right. It seemed from your description that you didn’t need to worry about key cross-referencing so you can ignore that stuff.

Another possible technique to avoid update contention is to carry a last-updated field on the tables and in the event (you probably already have this?). When processing the event to update the target table, if the record in the table has an updated date that is after the event’s updated date, then there is a collision.
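
In code form the check might look roughly like this (a sketch only; the names are made up):

```java
import java.time.Instant;

// Sketch of last-updated collision detection (illustrative names only).
public class CollisionCheck {

    // eventUpdated:  when the source record was changed (carried in the event)
    // targetUpdated: the last-updated value currently on the target record
    public static boolean isCollision(Instant eventUpdated, Instant targetUpdated) {
        // If the target row was modified after the change we are about to apply,
        // someone else has touched it in the meantime: that is a collision.
        return targetUpdated.isAfter(eventUpdated);
    }

    public static void main(String[] args) {
        Instant eventTime  = Instant.parse("2024-01-01T10:00:00Z");
        Instant targetTime = Instant.parse("2024-01-01T10:05:00Z");
        System.out.println(isCollision(eventTime, targetTime));   // true -> flag for manual sync
    }
}
```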

Hi Rob,

I hope you mean this: when an update is about to happen in the target system (because of some update in the source), another update has already happened in the target system through its own application, after the update in the source, i.e. the target system’s updated time is greater than the source’s updated time. For this case we are about to use the HASHTABLE with a “lock flag” column holding Y/N in the hashtable. [If my understanding of your statement “When processing the event to update the target table if the record in the table has an updated date that is after the event updated date, then there is a collision” is wrong, please tell me again.]

When the source system is updated, the NOTIFICATION is triggered, which in turn invokes a flow service that makes an entry in the hashtable with status N before doing the update in the target, and only if the same entry (keyed by the primary key, which is unique between source and target) is not already present in the hashtable. If the entry is already present with “lock status” N, it means an update process for the same primary key is already in progress in the reverse direction; in that case we change the entry in the hashtable for that primary key to Y and send a NOTIFICATION MAIL to do the manual update, which fixes the issue.

If a collision doesn’t happen, the entry stays at status N and we do the update in the target system; once that is successful, we delete the entry from the HT, but only if the status is still N (so there won’t be an overload in the HT).

These are for Bi-directional concurrent data sync.
For one-way sync it’s very plain: no hashtable, just the update, and if there is any error a notification mail is sent.

Yes, that is what I meant.

Let’s back up a bit.

Here are the needs:

  • N-way data sync (2-way at the moment). Updates can originate from any system.
  • Detect collisions when changes are made in multiple places simultaneously or nearly simultaneously.
  • Prevent echo updates.

You mentioned that you don’t want to use a DB to track this due to performance concerns. Additional considerations:

  • A hashtable approach will work with just 1 IS, serving as both source and target for the integrations. If performance or availability demands require the addition of another IS, with source and target components hosted on different IS instances, the solution will need to be reimplemented.

  • A hashtable will lose the current state if IS fails or is taken down. Collisions may be missed when IS is restored.

The latching facilities provided by IS use DB tables. This supports the use of multiple IS instances and will maintain state over IS restarts. While there will be a performance hit, you’ll need to gauge which scenario is less desirable: slower syncs or missed collisions. You’ll also want to consider whether the performance hit is significant enough to matter.

Chapter 10 in the Publish-Subscribe Developer’s Guide describes latching/echo suppression far better than the Data Sync docs (sorry for steering you to the wrong docs). It specifically mentions near-simultaneous updates and how to address that, though the advice is a bit thin.

IMO, you’ll want to seriously evaluate what the latching services and DB table provide before rolling your own implementation. You can ignore all the stuff about key cross-references–deferring until you have two or more systems that don’t use the same keys.

HTH

Yes Rob, I agree the DB is more reliable than the HT (considering this point…)

BTW, a Hashtable can’t be used in a cluster, right? i.e., if there are 2 IS instances in a cluster and we create the HASHTABLE in a startup service, do they use only one Hashtable, or will each IS have its own Hashtable to use… Comments please.

Each IS would have its own hashtable, completely unaware of the other.

With the latch services, assuming both instances are configured to use the same DB, then each instance will be aware of all latches.

Hi Rob,

At last we are going to use the database [not the hashtable]. Thanks for your help/information on this.