Can WmFlatFile handle something like this?

Hello - I need to parse a very simple comma-delimited file containing multiple orders. Each row represents one order in the format below (somewhat simplified):


OrderNum,Date,ItemA,Qty,ItemB,Qty,ItemC,Qty,... (up to 200 line items)

I need to parse this row into the usual IData structure:


Header
 - Order Number
 - Order Date
Line (array of up to 200 lines)
 - Item ID
 - Quantity

As far as I know, WmFlatFile cannot parse this file, since the delimiters for the header and line elements are identical (commas). Is my assumption correct? Am I better off writing a Java service?

The flat file services can parse this. Just not directly to the record structure you’re looking for. The flat file schema definition should reflect the structure of the flat file, not the desired structure.

Thus, the record definition would look like:

Order

  • OrderNum
  • Date
  • ItemA
  • QtyA
  • ItemB
  • QtyB
  • …and so on, one field per item/quantity pair

Then you can map these records to the desired document structure.

If you’re not inclined to define 200 repeating line-item fields, you could define the flat file schema such that the fields after Date are returned in ‘undefdata’. Then you can use a tokenize or split service to parse the item fields, and other utility services to convert them to the document list.
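Something along these lines, as a sketch only (assuming the unparsed remainder arrives in the pipeline as a single comma-separated string named ‘undefdata’ — the variable and field names here are illustrative), with the tokenize-and-convert steps collapsed into one small Java service:


// Sketch: split the unparsed remainder into (item, quantity) pairs.
// Caveat: a bare split() cannot handle quoted fields with embedded
// commas -- the limitation discussed further down the thread.
IDataCursor cursor = pipeline.getCursor();
String undefdata = IDataUtil.getString(cursor, "undefdata");

if (undefdata != null) {
	String[] tokens = undefdata.split(",", -1); // -1 keeps trailing empty fields

	IData[] lines = new IData[tokens.length / 2];
	for (int i = 0; i < lines.length; i++) {
		lines[i] = IDataFactory.create();
		IDataCursor lc = lines[i].getCursor();
		IDataUtil.put(lc, "itemId", tokens[2 * i]);
		IDataUtil.put(lc, "quantity", tokens[2 * i + 1]);
		lc.destroy();
	}
	IDataUtil.put(cursor, "lines", lines);
}
cursor.destroy();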

Hey Rob - thanks for your insight.

Yes, I was trying to avoid building that flat file schema with 200 repeating line-item fields. :slight_smile:

The alternative of mapping the unparsed data out of ‘undefdata’ with a tokenize service didn’t sit right either. I was considering ‘ps.util.string:parse’ (it handles missing fields more correctly than ‘pub.string:tokenize’, which drops empty tokens). However, tokenizers don’t handle CSV parsing natively (e.g. quoted CSV data with embedded ‘,’).

So I ended up ditching WmFlatFile altogether, and using this open-source SourceForge package:
‘CSV4180 - CSV parser based on RFC 4180’
[url]https://sourceforge.net/projects/csv4180/[/url]

I imported their JAR into my webMethods package and wrote the two Java services below. (Put the jar under the package’s /code/jars directory and import ‘com.sunsetbrew.csv4180.*’ in each service.)

My flow uses these services to process the CSV row by row, generating an order for each row.

[‘getCSVReader’ Java service - creates a CSVReader object from an inputStream]


IDataCursor cursor = pipeline.getCursor();

if (cursor.first("inputStream"))
{
  InputStream inputStream = (InputStream) cursor.getValue();
  // Wrap the stream in a BufferedReader. Note: InputStreamReader without
  // an explicit charset uses the JVM default encoding; pass a Charset if
  // the file's encoding is known.
  BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream));
  CSVReader csvReader = new CSVReader(reader);
  IDataUtil.put(cursor, "CSVReader", csvReader);
}
cursor.destroy();

[‘parseRow’ Java service - takes a ‘CSVReader’ input, returns a CSV row as a string list]


// Input pipeline
IDataCursor pipelineCursor = pipeline.getCursor();
CSVReader csvReader = (CSVReader) IDataUtil.get(pipelineCursor, "CSVReader");
pipelineCursor.destroy();

// Temporary ArrayList to hold the parsed row's fields
ArrayList<String> fieldsArrayList = new ArrayList<String>();

try {
	// Read the current line's fields into the ArrayList
	csvReader.readFields(fieldsArrayList);
	if (csvReader.isEOF()) {
		return; // don't populate the 'field' output once EOF is reached
	}
} catch (IOException e) {
	// Wrap in a ServiceException so the calling flow can handle it
	throw new ServiceException(e);
}

// Build the output string list
String[] field = fieldsArrayList.toArray(new String[fieldsArrayList.size()]);

// Output pipeline
IDataCursor outputCursor = pipeline.getCursor();
IDataUtil.put(outputCursor, "field", field);
outputCursor.destroy();

Another Java array-slicing utility service (using System.arraycopy) extracts specific parts of the string list for mapping.

I was also considering String.split(), as it handles empty values as desired, but it too does not deal with release characters/quotes, which is absolutely necessary for correctly parsing delimited data. :frowning:
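For example, with a hypothetical row:


String row = "1001,2009-06-15,\"Widget, large\",5";

// A naive split treats the comma inside the quoted field as a delimiter:
String[] naive = row.split(",", -1);
// naive -> { "1001", "2009-06-15", "\"Widget", " large\"", "5" } -- 5 fields, not 4

// An RFC 4180-aware parser like CSV4180 returns the quoted field intact:
// { "1001", "2009-06-15", "Widget, large", "5" }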

Anyway, sounds like a reasonable approach! The flat file support in IS is decent, but man, editing flat file schemas is a real PITA. They need to build a better editor for those. Indeed, on a prior project it was easier for me to write a service that read a CSV of the fields and generated the FF schema using built-in services. I fear editing them because it is too easy to mess up.

:slight_smile: Thanks Rob – flat file schemas scare me. They work brilliantly in many instances, but the documentation is dense and it’s not clear that they’re unsuitable for certain cases (like this one).

Thanks again for that gem of a comment above:

“The flat file schema definition should reflect the structure of the flat file, not the desired structure.”

In case anyone else was interested, here’s the Java array slicer utility service.

[Java service ‘getStringArraySlice’ - takes a string array as input and copies a specified part of it into a new string array in the output]


// Input pipeline
IDataCursor pipelineCursor = pipeline.getCursor();
String[] sourceArray = IDataUtil.getStringArray(pipelineCursor, "sourceArray");
String sourceIndex = IDataUtil.getString(pipelineCursor, "sourceIndex");
String length = IDataUtil.getString(pipelineCursor, "length");
pipelineCursor.destroy();

String[] arraySlice = new String[Integer.parseInt(length)];

System.arraycopy(sourceArray,                   // source array
                 Integer.parseInt(sourceIndex), // source index
                 arraySlice,                    // destination array
                 0,                             // destination index
                 Integer.parseInt(length));     // copy length

// Output pipeline
IDataCursor outputCursor = pipeline.getCursor();
IDataUtil.put(outputCursor, "arraySlice", arraySlice);
outputCursor.destroy();

This flow code pulls it all together:


'getCSVReader'
REPEAT
    'parseRow'
    BRANCH on 'field' output
          $null: EXIT with success (EOF reached)
          $default: process directly with 'getStringArraySlice' or add to an output list
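To round it out, here’s a rough sketch of the per-row conversion in the $default branch (the column positions and the output names ‘header’ and ‘lines’ are just my assumptions for illustration): it takes the ‘field’ string list from ‘parseRow’ and builds the order document.


// Sketch: turn one parsed CSV row (the 'field' string list) into
// the Header/Line order document described at the top of the thread.
IDataCursor cursor = pipeline.getCursor();
String[] field = IDataUtil.getStringArray(cursor, "field");

// Header: the first two columns are order number and order date (assumed layout)
IData header = IDataFactory.create();
IDataCursor hc = header.getCursor();
IDataUtil.put(hc, "orderNumber", field[0]);
IDataUtil.put(hc, "orderDate", field[1]);
hc.destroy();

// Lines: the remaining columns arrive in (itemId, quantity) pairs
IData[] lines = new IData[(field.length - 2) / 2];
for (int i = 0; i < lines.length; i++) {
	lines[i] = IDataFactory.create();
	IDataCursor lc = lines[i].getCursor();
	IDataUtil.put(lc, "itemId", field[2 + 2 * i]);
	IDataUtil.put(lc, "quantity", field[3 + 2 * i]);
	lc.destroy();
}

IDataUtil.put(cursor, "header", header);
IDataUtil.put(cursor, "lines", lines);
cursor.destroy();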

“The flat file schema definition should reflect the structure of the flat file, not the desired structure.”

That line was missing a key word. It should be:

“The flat file schema definition should reflect the structure of the flat file, not the desired target structure.”

Thanks Rob - absolutely precise as usual!