By “how much jvm” I assume you mean CPU and memory allocated to the JVM.
While this is a factor to a degree, the bigger concern is loading a 700MB - 1GB file completely into memory, and then having the content of that file replicated two or more times within memory: once during the read, again when copied during mapping, and so on. The data can easily be in memory 2-3 times.
The flat file and batch insert services (and XML and document services) tend to lead to solutions that load everything into memory at once. For most “event-driven” solutions (using this term loosely) this isn’t a concern. But when dealing with a large amount of data, such as in this case, it may cause a failure by exhausting memory, particularly if the JVM is doing other work at the same time.
The key here is to structure the reading and writing of the data in chunks, never all at once. If you have a single document list that holds all of the records, you've read everything into memory. Using a stream to read the data is only the first of several steps:
1. Use a stream to read the data.
2. Use iteration to read X records at a time. The number can vary, depending on record size, memory available to the JVM, etc. Testing will help you determine the "optimal" batch size.
3. Write those records using a batch insert.
4. Repeat steps 2-3 until the stream reaches EOF.
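The loop above can be sketched in plain Java, outside of any specific integration platform. This is a minimal illustration of the chunking pattern, not the actual flat file or batch insert services; the names `processInChunks` and `writeBatch` are my own, and the batch consumer stands in for whatever does the actual database write (e.g. a JDBC batch insert). The point is that at no time are more than `batchSize` records held in memory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class ChunkedFileProcessor {

    // Streams records one at a time, accumulating at most batchSize records
    // before handing the chunk off to the writer and releasing it. Memory use
    // is bounded by the batch size, not the file size.
    static void processInChunks(Reader source, int batchSize,
                                Consumer<List<String>> writeBatch) throws IOException {
        List<String> batch = new ArrayList<>(batchSize);
        try (BufferedReader reader = new BufferedReader(source)) {
            String record;
            while ((record = reader.readLine()) != null) {
                batch.add(record);
                if (batch.size() == batchSize) {
                    // Hand off a copy so the writer can't see the cleared list.
                    writeBatch.accept(new ArrayList<>(batch));
                    batch.clear();
                }
            }
            if (!batch.isEmpty()) {
                writeBatch.accept(new ArrayList<>(batch)); // final partial chunk
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulate a flat file with 10 records; a real caller would pass a
        // FileReader over the 700MB - 1GB file instead of a StringReader.
        StringBuilder data = new StringBuilder();
        for (int i = 1; i <= 10; i++) {
            data.append("record-").append(i).append('\n');
        }

        processInChunks(new StringReader(data.toString()), 4,
                batch -> System.out.println("wrote batch of " + batch.size()));
        // prints: wrote batch of 4, wrote batch of 4, wrote batch of 2
    }
}
```

In a real solution the `writeBatch` consumer would build a `PreparedStatement`, call `addBatch()` per record, and `executeBatch()` once per chunk, committing as appropriate; the chunking structure stays the same regardless of how the write is done.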