Can WmFlatFile handle something like this?

sonam · September 8, 2011, 6:55am

Hello - I need to parse a very simple comma-delimited file containing multiple orders. Each row represents one order in the format below (somewhat simplified):


OrderNum,Date,ItemA,Qty,ItemB,Qty,ItemC,Qty,... (up to 200 line items)

I need to parse this row into the usual IData structure:


Header
 - Order Number
 - Order Date
Line (array of up to 200 lines)
 - Item ID
 - Quantity

As far as I know, WmFlatFile cannot parse this file because the delimiters for the header and line elements are identical (a comma). Is my assumption correct? Am I better off writing a Java service?

reamon · September 8, 2011, 7:03pm

The flat file services can parse this. Just not directly to the record structure you’re looking for. The flat file schema definition should reflect the structure of the flat file, not the desired structure.

Thus, the record definition would look like:

Order

OrderNum
Date
ItemA
QtyA
ItemB
QtyB
etc.

Then you can map these records to the desired document structure.

If you’re not inclined to define 200 repeating line item fields, you could define the flat file schema so that the fields after Date are returned in undefdata. You can then use a tokenize or split service to parse the item fields, and other utility services to convert them to the document list.
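
A minimal sketch of that last conversion step, assuming the values after Date have already been split into a string array. The names `LineItemPairer`, `Line`, and `toLines` are hypothetical, and plain Java objects stand in for the IData document list:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of WmFlatFile or any webMethods package.
class LineItemPairer {

    // One order line: an item ID and its quantity, both kept as strings.
    static final class Line {
        final String itemId;
        final String quantity;

        Line(String itemId, String quantity) {
            this.itemId = itemId;
            this.quantity = quantity;
        }
    }

    // Pairs up fields of the form ItemA,Qty,ItemB,Qty,... into Line objects.
    // 'fields' is assumed to hold only the values that follow OrderNum and Date.
    static List<Line> toLines(String[] fields) {
        if (fields.length % 2 != 0) {
            throw new IllegalArgumentException(
                "Expected item/quantity pairs but got " + fields.length + " fields");
        }
        List<Line> lines = new ArrayList<>();
        for (int i = 0; i < fields.length; i += 2) {
            lines.add(new Line(fields[i], fields[i + 1]));
        }
        return lines;
    }

    public static void main(String[] args) {
        List<Line> lines = toLines(new String[] {"ItemA", "2", "ItemB", "5"});
        for (Line line : lines) {
            System.out.println(line.itemId + " x " + line.quantity);
        }
    }
}
```

In a real Java service, each `Line` would presumably become an IData document in the output list rather than a plain object.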

sonam · September 9, 2011, 10:44am

Hey Rob - thanks for your insight.

Yes, I was trying to avoid building that 200-record flat file schema.

The alternative of mapping out unparsed data from ‘undefdata’ using a tokenize service didn’t sit right either. I was considering ‘ps.util.string:parse’ (it has a more correct approach to missing fields, compared to ‘pub.string:tokenize’.) However, tokenizers don’t handle CSV parsing natively (e.g. parsing quoted CSV data with embedded ‘,’).
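
To illustrate what "handling quoted CSV data" involves, here is a rough sketch of a quote-aware splitter for a single line. It is a hypothetical illustration only: `QuotedCsvLine` is a made-up name, this is not the csv4180 API, and it is not a complete RFC 4180 parser (for example, it does not handle quoted fields that span lines).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the csv4180 API and not a full RFC 4180 parser.
class QuotedCsvLine {

    // Splits one CSV line on commas, honoring double-quoted fields and "" escapes.
    // Does not handle quoted fields that span multiple lines.
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // doubled quote is a literal quote
                        i++;
                    } else {
                        inQuotes = false; // closing quote
                    }
                } else {
                    current.append(c); // commas inside quotes are data
                }
            } else if (c == '"') {
                inQuotes = true; // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString()); // last field, possibly empty
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(split("1001,2011-09-08,\"Widget, large\",3,,"));
    }
}
```

The embedded comma in the quoted item name stays inside one field, and the two missing values at the end of the row come back as empty strings rather than being dropped.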

So I ended up ditching WmFlatFile altogether, and using this open-source SourceForge package:
‘CSV4180 - CSV parser based on RFC 4180’
https://sourceforge.net/projects/csv4180/

I imported their JAR into my webMethods package and wrote the two Java services below. (Put the jar under /code/jars. Import ‘com.sunsetbrew.csv4180.*’)

My flow uses these services to process the CSV row by row, generating an order for each row.

[‘getCSVReader’ Java Service - create CSVReader object from inputStream]


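// Requires imports: java.io.BufferedReader, java.io.InputStream,
// java.io.InputStreamReader, com.sunsetbrew.csv4180.*
IDataCursor cursor = pipeline.getCursor();

if (cursor.first("inputStream"))
{
  InputStream inputStream = (InputStream) cursor.getValue();
  // Wrap the input stream in a BufferedReader (uses the platform default charset)
  BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream));
  CSVReader csvReader = new CSVReader(reader);
  // Return the reader to the pipeline for use by 'parseRow'
  IDataUtil.put(cursor, "CSVReader", csvReader);
}
// Release the cursor once the pipeline has been read and updated
cursor.destroy();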

[ ‘parseRow’ Java service - takes ‘CSVReader’ input, returns a CSV row as string list ]


// Requires imports: java.io.IOException, java.util.ArrayList,
// com.sunsetbrew.csv4180.*

// Input pipeline
IDataCursor pipelineCursor = pipeline.getCursor();
CSVReader csvReader = (CSVReader) IDataUtil.get(pipelineCursor, "CSVReader");
pipelineCursor.destroy();

// Temporary list to hold the parsed row data
ArrayList<String> fieldsArrayList = new ArrayList<String>();

try {
	// Read the current line's fields into the list
	csvReader.readFields(fieldsArrayList);
	if (csvReader.isEOF()) {
		return; // Don't populate the 'field' output once EOF is reached
	}
} catch (IOException e) {
	// Wrap in a webMethods ServiceException so the calling flow sees the failure
	throw new ServiceException(e);
}

// Build the output string list
String[] field = fieldsArrayList.toArray(new String[fieldsArrayList.size()]);

// Output pipeline
IDataCursor pipelineCursor_1 = pipeline.getCursor();
IDataUtil.put(pipelineCursor_1, "field", field);
pipelineCursor_1.destroy();

Another Java ‘array-slicing’ utility service (using System.arraycopy) extracts specific parts of the string list for mapping.

reamon · September 10, 2011, 12:26am

I was thinking of String.split(), as it handles empty values as desired, but it too does not deal with release characters or quotes, which is absolutely necessary for correctly parsing delimited data.
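
A quick sketch of both behaviors, assuming the two-argument form of `String.split` with a limit of -1 (the one-argument form drops trailing empty strings). `SplitDemo` and `naiveSplit` are hypothetical names:

```java
import java.util.Arrays;

// Hypothetical demo class showing what String.split does and does not handle.
class SplitDemo {

    // Splits on every comma; a limit of -1 keeps trailing empty strings.
    static String[] naiveSplit(String row) {
        return row.split(",", -1);
    }

    public static void main(String[] args) {
        // Missing values survive as empty strings, including the trailing one
        System.out.println(Arrays.toString(naiveSplit("1001,,ItemA,")));
        // A comma inside quotes is still treated as a delimiter,
        // so the quoted item name is broken into two fields
        System.out.println(Arrays.toString(naiveSplit("1001,\"Widget, large\",3")));
    }
}
```

The first row yields four fields with the empty ones preserved; the second yields four fields instead of the intended three, with the quotes left in place.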

Anyway, sounds like a reasonable approach! The flat file support in IS is decent, but man, editing flat file schemas is a real PITA. They need to build a better editor for those. Indeed, on a prior project it was easier for me to write a service to read a CSV of the fields and generate the FF schema using built-in services. I fear editing them because it is too easy to mess up.

sonam · September 13, 2011, 7:38am

Thanks Rob - flat file schemas scare me. They work brilliantly in many instances, but the documentation is way dense and it’s not clear that they’re unsuitable for certain cases (like this one).

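Thanks again for that gem of a comment above:

“The flat file schema definition should reflect the structure of the flat file, not the desired structure.”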

In case anyone else is interested, here’s the Java array slicer utility service.

[Java service ‘getStringArraySlice’ - takes a string array as input and copies a specified part of it into a new string array in the output]


// Input pipeline
IDataCursor pipelineCursor = pipeline.getCursor();
String[] sourceArray = IDataUtil.getStringArray(pipelineCursor, "sourceArray");
String sourceIndex = IDataUtil.getString(pipelineCursor, "sourceIndex");
String length = IDataUtil.getString(pipelineCursor, "length");
pipelineCursor.destroy();

// Pipeline values arrive as strings, so parse them once up front
int start = Integer.parseInt(sourceIndex);
int count = Integer.parseInt(length);

String[] arraySlice = new String[count];

System.arraycopy(sourceArray,  // source array
                 start,        // source index
                 arraySlice,   // destination array
                 0,            // destination index
                 count);       // copy length

// Output pipeline
IDataCursor pipelineCursor_1 = pipeline.getCursor();
IDataUtil.put(pipelineCursor_1, "arraySlice", arraySlice);
pipelineCursor_1.destroy();
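
For anyone adapting this, here is a possible variant of the slicing logic as a plain Java method, sketched with `Arrays.copyOfRange` and an explicit bounds check. The class and method names (`ArraySlice`, `slice`) are hypothetical, and the check is an addition that the service above does not have:

```java
import java.util.Arrays;

// Hypothetical standalone version of the slicing logic, outside the webMethods pipeline.
class ArraySlice {

    // Returns a copy of source[start .. start + length), rejecting out-of-range requests.
    static String[] slice(String[] source, int start, int length) {
        if (start < 0 || length < 0 || start > source.length - length) {
            throw new IllegalArgumentException(
                "Slice [" + start + ", +" + length + ") is outside an array of "
                + source.length + " elements");
        }
        // Without the check above, copyOfRange would pad with nulls
        // when the end index runs past the source array
        return Arrays.copyOfRange(source, start, start + length);
    }

    public static void main(String[] args) {
        String[] row = {"1001", "2011-09-08", "ItemA", "2", "ItemB", "5"};
        // Header fields are the first two values; the rest are item/quantity pairs
        System.out.println(Arrays.toString(slice(row, 0, 2)));
        System.out.println(Arrays.toString(slice(row, 2, row.length - 2)));
    }
}
```

The explicit check turns a bad index into a descriptive error, where `System.arraycopy` would throw an `IndexOutOfBoundsException` with less context.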

This flow code pulls it all together:


'getCSVReader'
REPEAT
    'parseRow'
    BRANCH on 'field' output
          $null: EXIT with success
          $default: process directly with 'getStringArraySlice' or add to an output list

reamon · September 13, 2011, 6:49pm

“The flat file schema definition should reflect the structure of the flat file, not the desired structure.”

That line was missing a key word. It should be:

“The flat file schema definition should reflect the structure of the flat file, not the desired target structure.”

sonam · September 19, 2011, 10:17am

Thanks Rob - absolutely precise as usual!
