Could anyone offer a suggestion to get me started? Here’s the situation:
I was just given a project where I will receive a .csv (comma-delimited) file, but I was told that the headers could shift: one day when the contact downloads the report, the columns could be in a different order. For example, today he could download the report for me to pick up and map, and it is like this:
NAME ADDRESS CITY STATE ZIPCODE
Tomorrow, he could create the .csv for me and it could be like this:
ADDRESS NAME CITY ZIPCODE STATE
I was told that they will not do anything to address this, and it will be my job to make sure that when I get the .csv file, I am properly mapping the data. It looks like they will always send the header names (ADDRESS, NAME, CITY) regardless, so I would think that this will help somewhat, but since I've always dealt with flat files that had fixed positions, I'm currently confused as to how to approach this.
I'm currently thinking that perhaps I can't make a flat file schema for this and will instead have to first convert the .csv to XML, and then maybe I can make that work, but I don't see a built-in service for converting .csv to .xml. This job doesn't want us to use Java unless it's a last resort and we have no choice.
Any suggestions? Unfortunately, I was given a short timeline on this.
A flat file schema and the flat file services might be able to assist with parts of this, but my thinking is that it would be simpler to create your own custom services to handle this type of parsing.
For this to work, and for the integration to be able to manipulate the parsed fields appropriately, you’ll need to consider the following guidelines:
Either field order or field naming must be consistent. If both vary, there is no hope of automating this.
The record and field delimiters should be consistent, but depending upon the header record you may be able to determine them dynamically. For example, if the number of fields is always the same, the record delimiter will be at a constant position. Also, if the field names are always alphanumeric, you can determine the field delimiter by scanning for the first non-alphanumeric character.
The data file must use release (escape) characters or quoting if delimiters can appear within the data. For example, if the field delimiter is a comma, any field containing a comma must either be enclosed in quotes or have the comma prefixed with a release character. I mention this because address fields can contain virtually any character (including carriage returns and/or line feeds), which will break parsing if they are not escaped/released.
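The delimiter detection described in the second guideline can be sketched as a plain Java method. This is only a rough illustration under the assumption that field names are strictly alphanumeric; the class name and the comma fallback are my own, not anything webMethods provides:

```java
public class DelimiterSniffer {
    // Guess the field delimiter from the header record: if the field names
    // are purely alphanumeric, the first character that is not a letter or
    // digit must be the delimiter. Falls back to a comma if none is found
    // (an assumption for this sketch).
    public static char sniff(String headerLine) {
        for (char c : headerLine.toCharArray()) {
            if (!Character.isLetterOrDigit(c)) {
                return c;
            }
        }
        return ',';
    }

    public static void main(String[] args) {
        System.out.println(sniff("ADDRESS,NAME,CITY,ZIPCODE,STATE")); // ,
        System.out.println(sniff("ADDRESS|NAME|CITY"));               // |
    }
}
```

Note this would treat a space or underscore inside a header name as the delimiter, so it only holds if the sender's header names really are alphanumeric.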
This is the rough flow I’d create, named perhaps convertToValues:
Read the header record; determine delimiters if necessary/appropriate
…readRecord (call service to read the next record)
…extractFields (call service to get fields and return an IS document)
…appendToDocumentList (or, if iterating, return the control doc and the just-read document)
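For the header step of that flow, the key idea is to turn the header record into a name-to-position map so the rest of the integration no longer cares about column order. A rough sketch (the class and method names are mine, and it assumes a plain comma delimiter with no quoted fields, just for illustration):

```java
import java.util.HashMap;
import java.util.Map;

public class HeaderMap {
    // Build a column-name -> index map from the header record so that data
    // rows can be read by field name regardless of the column order the
    // sender happened to use that day.
    public static Map<String, Integer> fromHeader(String headerLine) {
        Map<String, Integer> map = new HashMap<>();
        String[] names = headerLine.split(",");
        for (int i = 0; i < names.length; i++) {
            map.put(names[i].trim().toUpperCase(), i);
        }
        return map;
    }

    public static void main(String[] args) {
        // Today's column order...
        Map<String, Integer> cols = fromHeader("ADDRESS,NAME,CITY,ZIPCODE,STATE");
        String[] row = "123 Main St,Jane Doe,Springfield,12345,IL".split(",");
        // ...but we still look the field up by name, not by position.
        System.out.println(row[cols.get("NAME")]); // Jane Doe
    }
}
```

If tomorrow's file arrives as NAME,ADDRESS,CITY,STATE,ZIPCODE, the same lookup still lands on the right field.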
readRecord – responsible for reading one record from the file, honoring release characters; it does not parse for fields
extractFields – responsible for reading the fields of a record and placing each into a field in an IS document; the names of the fields would be passed in, read from the header record; this too must honor release characters
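Here is one way the extractFields logic could honor quoting, assuming double quotes as the quote/release character and a doubled quote ("") as an escaped quote inside a quoted field. The class name and those conventions are assumptions for this sketch; adapt them to whatever release characters your sender actually uses:

```java
import java.util.ArrayList;
import java.util.List;

public class FieldExtractor {
    // Split one record into fields, keeping delimiters that appear inside
    // quoted fields as data rather than treating them as separators.
    // Not a complete RFC 4180 parser, just the core state machine.
    public static List<String> extractFields(String record, char delimiter) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < record.length(); i++) {
            char c = record.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < record.length() && record.charAt(i + 1) == '"') {
                        current.append('"'); // doubled quote = escaped quote
                        i++;
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);       // delimiter inside quotes is data
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == delimiter) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field has no trailing delimiter
        return fields;
    }
}
```

With this, a record like "123 Main St, Apt 4",Jane,Springfield yields three fields, with the comma in the address preserved inside the first one.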
Avoid using pub.string:tokenize; it won't do what you need (it collapses adjacent delimiters, so empty fields are silently dropped, and it knows nothing about quoting). You may need to create a Java service for extractFields, and you'll need a Java service to create an IS document with dynamic field names.
Hope this helps give you an approach you can pursue.