I have a situation in which I need to read and process a tab-delimited flat file available over the web, perform some business processing on the data, and then insert it into the database. I am aware that if it were an XML file I could use built-in services such as loadDocument and documentToRecord and then process the data.
What would be the best way of reading data from a tab-delimited flat file using webMethods Integration Server? Are there any built-in services already available in webMethods Integration Server that I can readily use to make my life much simpler? Any input in this regard will be greatly appreciated.
Hi, Aravind. webMethods Integration Server offers a string tokenizer in the WmPublic package.
The name of the service is pub.string:tokenize. The Service In variables for this service are inString and delim. The Service Out variable is a String List.
In your particular example, you can leave the Service In variable delim empty because the default settings for the pub.string:tokenize service tell it to look for one of the following: tab (\t), carriage return (\r), or new line (\n). If your flat file delimiter was “xxx”, though, you would need to specify “xxx” as the value of delim.
Thanks Dan.
I have found that the default also includes the space character, which is really annoying.
In addition, specifying “abc” will not give you “abc” as the delimiter but “a”, “b” and “c” all as delimiters. If you actually put the string in ‘delim’ in quotes, then the quote becomes a delimiter as well.
At the moment I’m playing around to see what wm will do so I may have missed something very simple in the documentation somewhere. So if you know different then I’d be more than happy to find out.
Hello,
Unfortunately, that’s how the StringTokenizer works (a delimiter can only be a single character, not a string).
We had the same problem, and we wrote our own StringTokenizer which accepts a string delimiter as well.
Regards
Dalibor
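For anyone following along, here is a minimal sketch of the difference being described. It is not Dalibor's code and not the webMethods implementation; the class and method names (DelimiterDemo, splitOnChars, splitOnString) are made up for illustration. It only assumes, as stated above, that the tokenizing behaves like java.util.StringTokenizer, where every character of the delimiter argument is a delimiter on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper class, for illustration only.
final class DelimiterDemo {

    // java.util.StringTokenizer treats EACH character of delimiterChars
    // as a separate delimiter and silently drops empty tokens.
    static List<String> splitOnChars(String input, String delimiterChars) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(input, delimiterChars);
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    // Splits on the whole delimiter string instead; empty tokens are kept.
    static List<String> splitOnString(String input, String delimiter) {
        if (delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> tokens = new ArrayList<>();
        int start = 0;
        int hit;
        while ((hit = input.indexOf(delimiter, start)) >= 0) {
            tokens.add(input.substring(start, hit));
            start = hit + delimiter.length();
        }
        tokens.add(input.substring(start)); // remainder after the last delimiter
        return tokens;
    }

    public static void main(String[] args) {
        String line = "backabcdoor";
        System.out.println(splitOnChars(line, "abc"));  // prints [k, door]
        System.out.println(splitOnString(line, "abc")); // prints [back, door]
    }
}
```

In the first call the "b", "a" and "c" inside "back" are eaten as delimiters too, which is the behaviour described earlier in the thread.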
Dalibor, have you seen a performance difference when using your StringTokenizer on character-delimited strings versus using the webMethods pub.string:tokenize service?
Thanks for your feedback.
When using pub.string:tokenize, is there a way to manually specify the newline character as the only delim on the inbound, so that it stops using the space character as a default as well? Or would I have to write my own Java service using StringTokenizer to get around this?
In the meantime, I created the following java service and mapped charNewLine var into the delim var of pub.string:tokenize, and it does the job for now.
// value to expose: a single newline character
String charNewLine = "\n";
// pipeline
IDataHashCursor pipelineCursor = pipeline.getHashCursor();
pipelineCursor.last();
// add charNewLine to the pipeline so it can be mapped to delim
pipelineCursor.insertAfter( "charNewLine", charNewLine );
pipelineCursor.destroy();
You can set “special” characters in input parameters by using the “larger editor” in the input dialog. Right click on the text box and select “Use larger editor”. There you put a newline or tab. I believe that will work but it may actually put CR/LF if you’re on a Win box so you’ll need to test it.
Another technique that is a variation on what you put together but supports more special characters is the service I posted in another message. Review the code at wmusers.com to see if that might be useful.
Thanks Rob, that works great. FYI, I’m working on a Wintel box and didn’t get a line feed.
Dan, sorry, I didn’t perform this kind of comparison.
How do I skip a line when reading a flat file? It is a comma-separated file, and the first line is the header description. I want to skip the header description before reading the data. I would appreciate it if somebody could help me out immediately on this. Thanks in advance.
I do not know of any way in standard WmFlatFile schemas and parsing to skip a line. I think your best bet would be to write some Java code to drop a line and then send this stream to the flat file process, or to create a custom content handler, depending on where the file is coming from. If you read it from disk, it would be best to use Java services to read the file.
How about building the header record as part of your schema/flat file record structure?
When looping thru your records, skip the first record. You can implement this logic using the BRANCH statement.
If it is mandatory that you do it before reading the data, then you have to write a preprocessor like Igor suggested.
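A preprocessor along those lines could be as small as the sketch below. It assumes the file content is already in memory as a String and that lines end in \n or \r\n; the class and method names (HeaderSkipper, dropFirstLine) are hypothetical, and wiring it into a webMethods Java service is left out.

```java
// Hypothetical helper class, for illustration only.
final class HeaderSkipper {

    // Returns the content with everything up to and including the first
    // line feed removed. A file with only a header line yields "".
    // Assumes lines end in \n or \r\n (a lone \r is not treated as a line end).
    static String dropFirstLine(String content) {
        int firstLineFeed = content.indexOf('\n');
        if (firstLineFeed < 0) {
            return ""; // no line feed: the whole content is the header
        }
        return content.substring(firstLineFeed + 1);
    }

    public static void main(String[] args) {
        String file = "id,name\r\n1,Ann\r\n2,Bob\r\n";
        // Only the two data lines remain; the header line is gone.
        System.out.print(dropFirstLine(file));
    }
}
```

The remaining string can then be handed to whatever does the flat file parsing. For large files a streaming version would be a better fit than holding everything in one String.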
Any idea how to read a blank value as a field value (maybe insert a space or something)?
e.g. value1|value2||value4
Between value2 and value4 there is a value3, but in the flat file it is blank (not even a space). After tokenizing, the field is skipped, so the remaining values end up mapped to the wrong columns.
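One possible workaround, sketched here in plain Java rather than with any built-in webMethods service, is to do the split in a small Java service using String.split with a negative limit, which keeps empty fields. The class and method names (EmptyFieldSplit, splitKeepingEmpty) are made up for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class EmptyFieldSplit {

    // Splits on the literal delimiter and keeps empty fields.
    // Pattern.quote stops "|" from being read as a regex alternation;
    // the limit of -1 keeps trailing empty fields as well.
    static List<String> splitKeepingEmpty(String record, String delimiter) {
        return Arrays.asList(record.split(Pattern.quote(delimiter), -1));
    }

    public static void main(String[] args) {
        List<String> fields = splitKeepingEmpty("value1|value2||value4", "|");
        System.out.println(fields.size());           // prints 4
        System.out.println(fields.get(2).isEmpty()); // prints true
    }
}
```

The blank value3 comes back as an empty string in position 2, so value4 stays in the fourth column. If a space is wanted instead of an empty string, it can be substituted after the split.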