Excel file type - HTML

Hi everyone,
I have a client who sends an Excel document via email - I am using WM to save the attachment, I then use a parsing tool: Itemfield to parse the document into an XML document and then back again to WM to read the XML document into a SQL database.

I have just ran into an issue that I can’t seem to solve…I have just discovered the client is saving the Excel document as a web page…issue: the Itemfield programming was created using the Excel being saved as Microsoft Excel workbook (testing file was created slightly different than production file) - the difference is the source behind each type. I could go through the daunty task of rewritting the Itemfield code but I would like to know if anyone knows if I could use WM to convert the file from the email into the proper file type.

Any ideas?
Tracey

I think a better solution would be for the client to save the file as a workbook instead of as an web page. I don’t know of a dependable way to parse Microsoft Office HTML into an XML document and would hate to have to try.

Mark

Thanks Mark - I agree but the client isn’t quite as agreable. I am working on them but I need to explore other options.

Once you have the excel attachment(stream object ffData) to your flow servic make use of WmFlatFile adapter parsing technique.

so create a flatfile(CSV) Dictionary/FFSchema according to excel file format and parse the document to IDATA document with pub.flatfile:convertToValues and map to a canonical document (XML) finally use DocumentToXMLString to create XML file as expected.

This is bit work around,not sure you agree with it.

HTH,
RMG

I’m willing to try it but I have a question: the Excel file has headings at the top - will this interfer with saving the file as a flat file?

TT

Tracey,

There are libraries (discussed in various threads here on wMUsers) that can parse an Excel file in its native format (.xls).

If your client can export the excel workbook (assuming there is only one worksheet in the workbook) into CSV format, then you can use the WmFlatFiles package to process it including any header rows that may be present. The FF package services are discussed here in detail as well.

There is no benefit in their saving the file as HTML and, in fact, that makes your job much, much more difficult.

HTH,

Mark

Unfortunately our client was previously sending the document in CSV format and everything was great - they changed their process and are now sending the sheets from an automated web program that creates this lovely HTML excel sheets…so this is what I’m stuck with:(

I am going to continue chatting with them to enforce the importance of us at least getting the Excel as the proper Microsoft Workbook - we’ll see what happens. Thank you for your input and ideas - much appreciated!!!

TT

I just found out the client is originally saving the document as an XML spreadsheet - has anyone worked with these before? I’m going to request a copy and see if I can work with it instead of the XLS file.

Tracey,

I haven’t worked with XMLSpreadsheet data, but at least you should be able to deal with that more easily than HTML. Microsoft XML is bloated for spreadsheet formatting information (fonts and the like), but you should be able to use pub.xml:queryXMLNode to extract only the portions you care about or if things are really ugly, you could use an XSL style sheet to transform the document into one with only the data without any formatting tags.

See these links for info on Microsoft’s XMLSpreadsheet support:

Mark

XML spreadsheet?never heard about this…Thanks mark for pointing links…

I believe doing XSLT transformation to standard XML is a good option for this task,Tracey you should start with this.If it could be in CVS format life would be more easier.

HTH,
RMG

I would only fall back to XSLT, if the data from the spreadsheet could not be extracted easily from all of the font, color, border formatting tags.

It could be that the data in question might be easy to extract with pub.xml:queryXMLNode in which case there would be no need to complicate things with XLST.

I’m so excited - I had ask for some advice from the instructor I had during training with and his solution was the following:

  • Use “pub.file:getFile” (and specify the file to get)
  • Then, “pub.string:bytesToString” (mapping getFiles’ “body/bytes” to the input
  • Then, “pub.xml:xmlStringToXMLNode” (mapping the “string” to “xmldata”)
  • Then, “XMLNodeToDocument” to get the info from the file.

It works! I can read the data - I just need to read the data into the database - let’s hope the client can send this sheet!

Thanks again for all your advice!
Tracey