1)We need to fetch almost 5k records from one database A
2)compare this 5k to an another table in other database B which has around 3 lakhs records
3) return the list of records from table in database B which doesn’t match this 5k records from database A.
We used “not in” in select query but the performance is very poor.
Doing further R and D , we thought of creating a temporary table in the database B and inserting this 5k into that and writing nested query.
But, inserting 5 k records every time the service being called takes much time and batch insert didn’t work as we want the temporary table to delete the rows after that session(that is delete on commit).Batch insert commits automatically so we couldn’t see any records after batch insert.
Db link is suggested by few but we are not sure how far it is feasible.
Could you please suggest any alternative way to implement this.
The overall idea is to fetch ID(users who have installed the app) from one table in database A and compare it with ID in a table(contains total list of users) in database B and give the list of ID(list of users who have not installed the app) other than the ID fetched from database A.
The adapter service is implemented as below.
1)dynamic sql to fetch 5k records from Database A
2)created a where clause for these records in a comma separated string
3)passed this string as "where columnname NOT IN " to another dynamic sql query for database B.
Now this NOT IN query causes performance issues as it takes so much time to retreive and sometimes get heap error too.
So , decided to create a temporary table in database B and insert this 5k and call as nested selected query but i was not able to insert as batch insert commits after insert . and data in temp table goes after it is committed.
Also,DBA team advised not to use DB link as it is not a feasible approach.
IntegrationServer (incl. ART) = 9.0
JDBC Adapter =6.5
Database(s) = oracle
Database driver(s) = you mean the jar files which we place in lib directory?
yes, I was referring to the jar files in lib directory (of JDBC-Adapter usually as long as they are not required elsewhere).
Dependent on the JVM your are running you should most likely be using an ojdcb6.jar from JDBC Driver 11g R2.
Inside the Jar there is a META-INF/MANIFEST.MF, which contains a tag for the exact implementation version.
Which version is the oracle database running at?
Any Fixes applied to IS, ART, JDBC-Adapter?
Can you try to execute the second query with the comma-separated where clause in Oracle SQL Developer to extract the execution plan?
This will show why this takes so long. Most likely there are some indices missing on the columns used for comprarison causing a full table search instead of an indexed one.
What should happen wtih the result list containing the ertries not in second db?
One option I currently think of is to query both databases independently (for the second one only the fields needed for the comparison should be retrieved). Then do the comparison in a custom java service on IS.
Maybe other members have other approaches available for this.
Are you using a Dynamic or a static SQL adapter service?
Can you get the exact SQL sequence the IS is sending to the server and pass it through a performance analyser (EXPLAIN in Oracle, etc)?
If you can analise the time, cpu, io, memory and network of the call, are you reaching any performance ceiling?
Flow services are not the best choice for data intensive comparison/manipulation; I would suggest you bring all your data into a memory structure and send that to a Java service (depending on the free RAM you have - here I’m basically converting those databases into in-memory databases).
Profiling your code can also pinpoint performance problems.
The way you described, you are doing three data-transfer bulk operations (read A, send compacted A to B, read query of A-B); now, depending of size and complexity of your data, you may have to find a simpler way.
As in all very large data operations, you should also ask “I am doing the right questions?”. For the final application result you want, is there another, simpler, way to achieve your goal?
ORDER BY should not be that expensive when used for an indexed field.
The SELECT itself usually is the most expensive part dependent on if the fields to be retrieved and/or compared are indexed or not and of the size (number of rows) of the table.
Therefore first of all we need the “Explain Plan” for the query.
When using a Dynamic SQL or Custom SQL template, you can pass a complete where clause (Custom SQL) or the complete List of IDs (Dynamic SQL) as a Parameter.
Can you provide some snapshots of your adapter services?