JDBC : Select multiple records using NOT in query

Hi All,

We have the below scenario.

1)We need to fetch almost 5k records from one database A
2)compare this 5k to an another table in other database B which has around 3 lakhs records
3) return the list of records from table in database B which doesn’t match this 5k records from database A.

We used “not in” in select query but the performance is very poor.

Doing further R and D , we thought of creating a temporary table in the database B and inserting this 5k into that and writing nested query.

But, inserting 5 k records every time the service being called takes much time and batch insert didn’t work as we want the temporary table to delete the rows after that session(that is delete on commit).Batch insert commits automatically so we couldn’t see any records after batch insert.

Db link is suggested by few but we are not sure how far it is feasible.

Could you please suggest any alternative way to implement this.

Thanks,
Sathya

Hi Sathya,

can you describe this a bit more in detail please?
Especially the adapter services implemented so far and an outline of you flow service.

How complex is the condition you need to check to determine if the entry is in the second database or not?

Please provide some more informations about versions (+ FIxes if available):

  • IntegrationServer (incl. ART)
  • JDBC Adapter
  • Database(s)
  • Database driver(s)

Regards,
Holger

Hi Holger,

The overall idea is to fetch ID(users who have installed the app) from one table in database A and compare it with ID in a table(contains total list of users) in database B and give the list of ID(list of users who have not installed the app) other than the ID fetched from database A.

The adapter service is implemented as below.
1)dynamic sql to fetch 5k records from Database A
2)created a where clause for these records in a comma separated string
3)passed this string as "where columnname NOT IN " to another dynamic sql query for database B.

Now this NOT IN query causes performance issues as it takes so much time to retreive and sometimes get heap error too.

So , decided to create a temporary table in database B and insert this 5k and call as nested selected query but i was not able to insert as batch insert commits after insert . and data in temp table goes after it is committed.

Also,DBA team advised not to use DB link as it is not a feasible approach.

  • IntegrationServer (incl. ART) = 9.0
  • JDBC Adapter =6.5
  • Database(s) = oracle
  • Database driver(s) = you mean the jar files which we place in lib directory?

Thanks,
Sathya

Hi Sathya,

yes, I was referring to the jar files in lib directory (of JDBC-Adapter usually as long as they are not required elsewhere).

Dependent on the JVM your are running you should most likely be using an ojdcb6.jar from JDBC Driver 11g R2.
Inside the Jar there is a META-INF/MANIFEST.MF, which contains a tag for the exact implementation version.

Which version is the oracle database running at?

Any Fixes applied to IS, ART, JDBC-Adapter?

Can you try to execute the second query with the comma-separated where clause in Oracle SQL Developer to extract the execution plan?
This will show why this takes so long. Most likely there are some indices missing on the columns used for comprarison causing a full table search instead of an indexed one.

What should happen wtih the result list containing the ertries not in second db?

One option I currently think of is to query both databases independently (for the second one only the fields needed for the comparison should be retrieved). Then do the comparison in a custom java service on IS.

Maybe other members have other approaches available for this.

Regards,
Holger

Hi,

The where in clause executes faster.

We use order By in the query and I guess that must be causing the performance issue.

Moreover, it takes so much time when I use sql developer like 3 minutes but in designer it either takes 10 mins or heap error.

I will check on the details you asked and update the post by tomorrow.

The results from second db will be used to send SMS to the ID’s(ID’s who have not installed the app), where as the result from first DB will be used to send push notification to ID installed the app.

Thanks,
Sathya

As a suggestion, how about exploring the below options:

1> Using optimized join query to get the match and mismatch data from table A and table B. Take the help from DBA for optimized sql query

2> Using some third party tool and put the match and mismatch data in a table and let wM pick the records from that table to process further

3> Else ask DBA to fetch the match and mismatch data and provide you the table with records to process in wM

Hi,

Are you using a Dynamic or a static SQL adapter service?

Can you get the exact SQL sequence the IS is sending to the server and pass it through a performance analyser (EXPLAIN in Oracle, etc)?

If you can analise the time, cpu, io, memory and network of the call, are you reaching any performance ceiling?

Flow services are not the best choice for data intensive comparison/manipulation; I would suggest you bring all your data into a memory structure and send that to a Java service (depending on the free RAM you have - here I’m basically converting those databases into in-memory databases).

Profiling your code can also pinpoint performance problems.

The way you described, you are doing three data-transfer bulk operations (read A, send compacted A to B, read query of A-B); now, depending of size and complexity of your data, you may have to find a simpler way.

As in all very large data operations, you should also ask “I am doing the right questions?”. For the final application result you want, is there another, simpler, way to achieve your goal?

Good luck.

Hi,

ORDER BY should not be that expensive when used for an indexed field.
The SELECT itself usually is the most expensive part dependent on if the fields to be retrieved and/or compared are indexed or not and of the size (number of rows) of the table.

Therefore first of all we need the “Explain Plan” for the query.

When using a Dynamic SQL or Custom SQL template, you can pass a complete where clause (Custom SQL) or the complete List of IDs (Dynamic SQL) as a Parameter.

Can you provide some snapshots of your adapter services?

Regards,
Holger

Hi,

I am using a dynamic sql query and passing ID’s in where clause which I created from the ID retrieved from Database A. I did the explain plan as you said. PFA the same.

Thanks,
Sathya
testing.html (171 KB)

Hi,

Thanks for the reply.
But, I believe even that takes more time as it need to compare with 3 to 5 lakh records.

I will try and update you. I have shared the explain plan in the below reply .

Thanks,
Sathya

Apologize for the reply from Other ID as I didn’t notice that other ID is already logged in my system.

As mentioned in above replies can you please give your views on the explain plan results

thanks,
Sathya

Hi Sathya,

unfortunately the explain plan is incomplete.

First thing to check for is verifiying all the Hash Joins if the fields used for comparison do have indices.

If they are primary keys for the table they should have an index by default.
If not you should add indices for these columns.

Regards,
Holger