Processing an Excel file retrieved via FTP

I feel like I must be missing something obvious here…

What I want to do: Have a scheduled service that logs into an FTP server, downloads all the Excel files there, and processes them one at a time.

What I’ve done:
I grabbed the EXCEL_contentHandler package from the shareware section, and the appropriate POI jars (poi-3.2-FINAL-20081019.jar, commons-logging-1.1.jar, junit-3.8.1.jar and log4j-1.2.13.jar) are in my classpath. I was able to compile the MSExcelDocumentToRecord Java services etc. successfully.

When I build up the very simple flow of:
1: pub.client.ftp.login
2: pub.client.ftp.ls
3: loop over /dirlist
–4: pub.client.ftp.get
–5: MSExcelDocumentToRecord
6: pub.client.ftp.logout

I can’t figure out what to map to the input of MSExcelDocumentToRecord. When I map the *content output of pub.client.ftp.get to the *file_stream input, I get a class cast exception. That makes sense to me, because I don’t actually have a stream: I have the whole file.

If I set the “large file threshold” on the ftp.get low enough to have *content_stream populated, I get null in my results. The content_stream appears to contain the string “com.wm.util.tspace.Reservation_FileImpl$ReservationInputStream”.

I know just enough Java to be dangerous, so I’m in over my head here. Anyone have any ideas of where I’ve gone wrong?

Greg

I’m not sure if this will help, as I’m a newbie myself, but this is the flow I use to obtain files from an FTP server:

ftp (complete service, so I just type the parameters and the service will do everything)

then I do a getFile service, which opens up a local file and outputs the bytes as a byte array or as a byte stream,

then I do a stringToDocument and finally a documentToRecord service.

So maybe you could loop over dirlist and, inside the loop, use the getFile service to produce either a byte array (and use it like I do) or a stream that you can plug into your other services.

Thanks,

I was hoping to avoid writing a file to the filesystem, with all the potential pitfalls that that involves. It is increasingly looking like I’m going to have to do that though.

Well, as far as I understand it, the FTP service will always write the file to disk, won’t it?

So if you are already doing it, you just need to read the file and pass the byte array/stream to the Excel service, no?

When you call pub.client.ftp.get, don’t give a value for the localFile parameter, and set largefilethreshold to a large enough value. By default it will then return the whole file as bytes in *content, and you can use that directly with no local file writing.
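
Since MSExcelDocumentToRecord expects a stream on *file_stream, the bytes in *content would still need to be wrapped before mapping them in, which would explain the class cast exception mentioned above. Here is a minimal sketch of that wrapping in plain Java. The class and method names (BytesToStream, toStream) are made up for illustration, and inside a real Java service you would read the bytes from the pipeline instead of a hard-coded array. Depending on your Integration Server version there may already be a built-in service that does this (I believe it is pub.io:bytesToStream), in which case no custom Java is needed.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;

// Hypothetical helper, not part of the EXCEL_contentHandler package.
class BytesToStream {

    // Wraps an in-memory byte array so stream-based code can read it.
    // No file is written; the stream reads straight from the array.
    static InputStream toStream(byte[] content) {
        if (content == null) {
            throw new IllegalArgumentException("content is null");
        }
        return new ByteArrayInputStream(content);
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the bytes that pub.client.ftp.get returns in *content.
        byte[] content = {1, 2, 3};
        InputStream in = toStream(content);
        System.out.println("bytes available: " + in.available());
    }
}
```

The resulting InputStream is what you would map to *file_stream in place of the raw byte array.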

By breaking the FTP up into separate steps, you don’t have to write the file to disk (which is what I did from the start). Unfortunately, it looks like my problem is on the Excel side of things, since even when I have an Excel file in the filesystem and specify the filename directly, I get null.

I’ll have to do some more reading on here to see if I can figure out where I’ve gone wrong.

Thanks
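
One thing that might be worth ruling out before digging into the Excel service itself is whether the bytes are really a binary .xls file. POI 3.2 only reads the older OLE2-based .xls format, and such files always start with the eight-byte signature D0 CF 11 E0 A1 B1 1A E1. A newer .xlsx file is a ZIP archive and starts with "PK" instead, and a file that was mangled in transit (for example by an ASCII-mode transfer) may match neither. The sketch below is only a rough diagnostic under those assumptions; the class and method names are hypothetical, and it does not explain what MSExcelDocumentToRecord does internally.

```java
import java.util.Arrays;

// Hypothetical diagnostic helper; names are for illustration only.
class XlsHeaderCheck {

    // Signature at the start of every OLE2 compound file (the container used by binary .xls).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // True if the content begins with the OLE2 signature.
    static boolean looksLikeXls(byte[] content) {
        if (content == null || content.length < OLE2_MAGIC.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(content, OLE2_MAGIC.length), OLE2_MAGIC);
    }

    // True if the content begins with "PK", as ZIP-based files such as .xlsx do.
    static boolean looksLikeZip(byte[] content) {
        return content != null && content.length >= 2
                && content[0] == 'P' && content[1] == 'K';
    }

    public static void main(String[] args) {
        // Stand-in for the first bytes of a downloaded file.
        byte[] sample = {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
                         (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1, 0, 0};
        System.out.println("OLE2 (.xls) header: " + looksLikeXls(sample));
        System.out.println("ZIP (.xlsx) header: " + looksLikeZip(sample));
    }
}
```

If the header check fails on a file that opens fine in Excel on the source machine, the transfer or the file format is a more likely culprit than the mapping.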