Tokenize Problems Regex

Hi,

I’m having a problem tokenizing a flat file using the webMethods services.

The file contains a number of fields (some may be empty):

field0| field1| field2||field4|||field7

I would expect pub.string:tokenize (splitting on |) to give me this list:
0 field0
1 field1
2 field2
3 null or blank
4 field4
5 null or blank
6 null or blank
7 field7

However, it skips the empty fields and gives me:
0 field0
1 field1
2 field2
3 field4
4 field7

I considered using pub.string:replace to replace all “||” with “| |”, but I also need to check for “|||”, “||||”, etc. I am having difficulty using a regex search to solve this problem.

Any help would be greatly appreciated.

Hello,
Your issue is that pub.string:tokenize is probably based on the Java StringTokenizer class, which does the same annoying thing: it treats consecutive delimiters as one, so empty fields never come back as tokens. I have a simple tokenize flow service, written 100% in Flow using only built-in WM services in transformers.

Good day.
Yemi Bedu

Attachment: makeTokens.zip (2.3 k)
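The StringTokenizer behaviour described in the reply above can be checked in plain Java. This is a minimal sketch, not the contents of makeTokens.zip; it assumes (as the reply does) that pub.string:tokenize behaves like java.util.StringTokenizer, and the class and method names are made up for illustration. It compares StringTokenizer with String.split using a limit of -1, which keeps empty fields.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class: compares two ways of splitting a pipe-delimited line.
public class TokenizeDemo {
    // StringTokenizer treats consecutive delimiters as one separator,
    // so empty fields between them are silently skipped.
    static List<String> withStringTokenizer(String line) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line, "|");
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    // String.split takes a regex, so the pipe must be escaped.
    // A limit of -1 keeps every empty field, including trailing ones.
    static List<String> withSplit(String line) {
        return Arrays.asList(line.split("\\|", -1));
    }

    public static void main(String[] args) {
        String line = "field0| field1| field2||field4|||field7";
        System.out.println(withStringTokenizer(line).size()); // prints 5
        System.out.println(withSplit(line).size());           // prints 8
    }
}
```

If a Java service is an option, split("\\|", -1) is the shortest route. The -1 matters: with the default split(regex), trailing empty fields are discarded.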

Loop it.

Do a pub.string:replace, then a REPEAT that checks pub.string:indexOf (substring = “||”). Inside the REPEAT, add a BRANCH on the indexOf result and exit the loop when value is -1. The $default branch of the BRANCH does another pub.string:replace.

The structure looks like this:

replace (searchString = "||", replaceString = "| |")
REPEAT (repeat-on SUCCESS)
    indexOf (substring = "||")
    BRANCH (switch = "/value")
        -1: SEQUENCE
            EXIT "$loop"
        $default: SEQUENCE
            replace (searchString = "||", replaceString = "| |")
tokenize (delim = "|")

Try pub.string:replace with the following inputs:

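searchString: "\|"   (the backslash is required: an unescaped | is the regex alternation operator and matches the empty string)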
replaceString: "| "
useRegex: true

  • Don’t type the quotation marks; they are only there to show that replaceString ends with a space.

It will convert:
field0| field1| field2||field4|||field7
into
field0| field1| field2| | field4| | | field7

Now pub.string:tokenize will give you 8 elements (but you’ll have to trim each one to remove the added space).
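A plain Java sketch of this regex variant, assuming pub.string:replace with useRegex = true behaves like String.replaceAll and pub.string:tokenize like StringTokenizer. Every pipe gets a trailing space, so no two delimiters are adjacent and every field, empty or not, is at least one character long. Class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical Java equivalent of regex replace + tokenize + trim.
public class RegexReplaceDemo {
    // Escaped pipe as the regex; every "|" becomes "| ".
    static String addSpaceAfterPipes(String line) {
        return line.replaceAll("\\|", "| ");
    }

    // Tokenize on "|" and trim the space that was added after each pipe.
    static List<String> tokenizeAndTrim(String line) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(addSpaceAfterPipes(line), "|");
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken().trim());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [field0, field1, field2, , field4, , , field7]
        System.out.println(tokenizeAndTrim("field0| field1| field2||field4|||field7"));
    }
}
```

Because the space goes after each pipe, a trailing empty field survives ("a|" becomes "a| "), but a leading one does not: "|a" becomes "| a", and StringTokenizer skips the leading delimiter.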