Handling millions of DB records

Hello All,

In a project I have to transfer data from one DB to another DB through webMethods. Some of the tables have over 9 lakh (900,000) records that are to be transferred every day. Now I don’t want to select all of these at a stretch, as it causes a memory shortage. I am thinking of selecting a bunch of records at a time based on selection criteria (which depend on the table field types), but I feel there can be a better solution to this.
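
Roughly, what I have in mind is paging through each table on some ordered key, along the lines of the plain-JDBC sketch below. This is only to illustrate the idea; the table and column names ("src_table", "key_col", "col1") are placeholders, not the real objects.

```java
import java.sql.*;

public class ChunkedSelect {
    // Sketch only: page through the table on an ordered key instead of
    // selecting everything in one go. "src_table", "key_col" and "col1"
    // are placeholder names.
    public static void walk(Connection src, int chunkSize) throws SQLException {
        // Chunk size is embedded as a literal so the same statement also works
        // on older SQL Server versions that do not accept a parameterised TOP.
        String sql = "SELECT TOP " + chunkSize + " key_col, col1 FROM src_table "
                   + "WHERE key_col > ? ORDER BY key_col";
        try (PreparedStatement ps = src.prepareStatement(sql)) {
            long lastKey = Long.MIN_VALUE;
            while (true) {
                ps.setLong(1, lastKey);
                int rows = 0;
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        lastKey = rs.getLong("key_col");  // remember where this chunk ended
                        // ... hand rs.getString("col1") etc. to the insert/mapping side ...
                        rows++;
                    }
                }
                if (rows < chunkSize) {
                    break;  // last (possibly partial) chunk reached
                }
            }
        }
    }
}
```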

I would like to know if any of you have handled such cases. Please share your ideas, views or implementation plans.

Specifications:
Source & Target DBs: MS SQL Server.
WM: 6.5
JDBC Adapter Ver: 6.0.3.1

Cheers:)
Guna

Guna,

Is there transformation occurring to the data, or any other reason for having webMethods in the middle?

Or are you just moving data from one DB to another DB each day?

If you are just batch moving/copying the data every day, webMethods can certainly do it, but it is not the best tool. Something like EMS DB Comparer for SQL Server might be more appropriate.

Now if you need to do real-time synchronization between the two DBs, then webMethods might be more appropriate.

Phil

I don’t have much exposure to core MS SQL Server; however, I believe there is a utility called BCP (Bulk Copy Program) in SQL Server. Please investigate and see if that can be helpful. Also, SQL Server 2005 provides SQL Server Integration Services (SSIS), which is quite appropriate for ETL applications.

webMethods is definitely not the best tool for achieving this.

Hi Phil & dhruvwm,

Thanks for your suggestions. As per company norms, all data and application integrations have to go through our integration layer, so I have no option but to design a better solution for this data transfer, one that gets the work done without heavily loading the webMethods systems. Thanks for your inputs.

Cheers
Guna

OK, since this has to happen on webM…

Is this all going to happen on one IS (meaning you select from one database and insert/update on the other database without having to go through a broker)?

Is there anything that needs to happen to the data, or is it just pure transfer?

Hello Phil,

There is no transformation as such; it is a pure transfer (apart from date format changes). We were successful in negotiating an identity field for the tables that hold such huge record counts, so that we can query based on ranges of ID values. So I have a workaround here.
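
Roughly, the ID-range copy looks like the plain-JDBC sketch below; in practice this sits behind adapter services, and the connection URLs, table and column names are just placeholders for the real objects.

```java
import java.sql.*;

public class IdRangeBatchCopy {
    // Placeholder names throughout: the JDBC URLs, "src_table"/"tgt_table"
    // and the "id", "col1", "col2" columns stand in for the real objects.
    private static final long CHUNK = 10000;

    public static void main(String[] args) throws SQLException {
        try (Connection src = DriverManager.getConnection(
                     "jdbc:sqlserver://sourceHost;databaseName=srcDb", "user", "pwd");
             Connection tgt = DriverManager.getConnection(
                     "jdbc:sqlserver://targetHost;databaseName=tgtDb", "user", "pwd")) {

            // Find the upper bound of the identity column once.
            long maxId;
            try (Statement s = src.createStatement();
                 ResultSet rs = s.executeQuery("SELECT MAX(id) FROM src_table")) {
                rs.next();
                maxId = rs.getLong(1);
            }

            try (PreparedStatement sel = src.prepareStatement(
                         "SELECT id, col1, col2 FROM src_table WHERE id > ? AND id <= ?");
                 PreparedStatement ins = tgt.prepareStatement(
                         "INSERT INTO tgt_table (id, col1, col2) VALUES (?, ?, ?)")) {

                // Walk the ID range in fixed-size slices so that only one
                // slice of rows is ever held in memory at a time.
                for (long lo = 0; lo < maxId; lo += CHUNK) {
                    sel.setLong(1, lo);
                    sel.setLong(2, lo + CHUNK);
                    try (ResultSet rs = sel.executeQuery()) {
                        while (rs.next()) {
                            ins.setLong(1, rs.getLong("id"));
                            ins.setString(2, rs.getString("col1"));
                            ins.setString(3, rs.getString("col2"));
                            ins.addBatch();
                        }
                    }
                    ins.executeBatch();  // one insert round trip per slice
                }
            }
        }
    }
}
```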

It would still be good to have some insight into a strategy that does not need an additional ID field in the tables.

Cheers
Guna

Guna,

I had the same problem when having to select a large amount of data (10k+ rows with over 150 columns) from a JDE database to use in mapping.
The JDBC adapter had no chance of pulling that amount of data into the IS, and I was getting OutOfMemoryErrors even before the JDBC adapter finished.

The workaround for this could be not to use JDBC adapters at all, but to configure the incoming and outgoing connections in WmDB and then access them in a Java service.
This way you might be able to get the data into the IS as a stream (if it’s BLOBs), or otherwise tune the ResultSet object that is returned, and either store the rows on the filesystem straight away or stream them directly into another Java service that performs the insert.
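
In plain JDBC terms (leaving the WmDB API aside) the idea looks roughly like the sketch below. The table and column names are made up, and whether the driver really streams the rows depends on the driver and its connection settings.

```java
import java.sql.*;

public class StreamingCopy {
    // Sketch only: keep the fetch size small so the driver can stream rows
    // instead of buffering the whole result set, and flush the inserts in
    // matching chunks. "big_table"/"big_table_copy" are placeholder names.
    public static void copy(Connection src, Connection tgt) throws SQLException {
        src.setAutoCommit(false);  // some drivers only honour fetch-size hints without auto-commit
        tgt.setAutoCommit(false);

        try (Statement sel = src.createStatement(ResultSet.TYPE_FORWARD_ONLY,
                                                 ResultSet.CONCUR_READ_ONLY);
             PreparedStatement ins = tgt.prepareStatement(
                     "INSERT INTO big_table_copy (c1, c2) VALUES (?, ?)")) {

            sel.setFetchSize(1000);  // hint: fetch ~1000 rows at a time, not the whole result

            try (ResultSet rs = sel.executeQuery("SELECT c1, c2 FROM big_table")) {
                int pending = 0;
                while (rs.next()) {
                    ins.setString(1, rs.getString(1));
                    ins.setString(2, rs.getString(2));
                    ins.addBatch();
                    if (++pending == 1000) {
                        ins.executeBatch();  // flush inserts chunk by chunk
                        tgt.commit();
                        pending = 0;
                    }
                }
                if (pending > 0) {
                    ins.executeBatch();
                }
                tgt.commit();
            }
        }
    }
}
```

The point is that at no time is more than one chunk of rows sitting in IS memory.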

I have seen this working quite OK, but webMethods weren’t very happy about us using the WmDB connection in our own Java code.

It is a pity that the wM JDBC adapters don’t support streams as inputs/outputs.

Hope this helps

//Matt

I’ve had to deal with a similar scenario before and the solution we came up with was similar to what you suggested in your initial post:

Define a batch size (number of records) that you want to process at a time, then

  • Loop (i.e. REPEAT)
    – Flag n-number of records (i.e. by updating a certain column in the table)
    – If number of records flagged is 0, then EXIT the loop
    – Retrieve the flagged records
    – Do something
    – Update/Delete the flagged records so they are not retrieved again

Most databases will give you some mechanism to update an n-number of rows. From what I remember, for SQL Server we simply used a Custom SQL adapter service and set the Maximum Row field to the number of records we wanted to process at a time.
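
As a rough sketch of that loop in plain JDBC: the table, column and flag values below are made up, and the UPDATE TOP (n) syntax assumes SQL Server 2005 (SET ROWCOUNT did the same job on older versions).

```java
import java.sql.*;

public class FlagAndDrain {
    // Sketch of the flag-then-process loop described above, with placeholder
    // table/column names and a made-up "transfer_flag" column.
    public static void drain(Connection src, Connection tgt, int batchSize) throws SQLException {
        // Batch size embedded as a literal; on SQL Server 2005+ "UPDATE TOP (n)"
        // limits how many rows get flagged per pass.
        String flagSql = "UPDATE TOP (" + batchSize + ") src_table "
                       + "SET transfer_flag = 1 WHERE transfer_flag = 0";

        try (Statement flag = src.createStatement();
             PreparedStatement sel = src.prepareStatement(
                     "SELECT id, col1 FROM src_table WHERE transfer_flag = 1");
             PreparedStatement ins = tgt.prepareStatement(
                     "INSERT INTO tgt_table (id, col1) VALUES (?, ?)");
             PreparedStatement done = src.prepareStatement(
                     "UPDATE src_table SET transfer_flag = 2 WHERE transfer_flag = 1")) {

            while (true) {
                int flagged = flag.executeUpdate(flagSql);  // flag at most batchSize rows
                if (flagged == 0) {
                    break;                                  // nothing left to transfer
                }
                try (ResultSet rs = sel.executeQuery()) {
                    while (rs.next()) {                     // retrieve only the flagged rows
                        ins.setLong(1, rs.getLong("id"));
                        ins.setString(2, rs.getString("col1"));
                        ins.addBatch();
                    }
                }
                ins.executeBatch();
                done.executeUpdate();  // mark the batch done so the next pass skips it
            }
        }
    }
}
```

Deleting the flagged rows instead of setting a "done" value works just as well when the source is a staging table.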

Percio