Splitting Flat Files

Hi Experts,

I hope you can help. I have a flat file that comes in from the mainframe. The layout looks like this:

H<header>
E<email>
S<subject>
B<body>
B<body>
A<attachment>
A<attachment>
A<attachment>
E<email>
S<subject>
B<body>
B<body>
A<attachment>
A<attachment>
A<attachment>
T<transid>

I need to split the file into multiple files so that I get something like this:

File1:
E<email>
S<subject>
B<body>
B<body>
A<attachment>
A<attachment>
A<attachment>

File2:
E<email>
S<subject>
B<body>
B<body>
A<attachment>
A<attachment>
A<attachment>

I then need to save the files into another directory. Does anyone know if this is possible?

Thank You,
Sebastian

Sebastian,

After getting the file from the mainframe, use pub.string:tokenize and specify 'E' as the delimiter; you'll get a value list. In your case, you'll have 3 entries in the value list. Check the starting character of each entry in the value list, and if it starts with 'E', write that entry to a file.

HTH
ramesh.

Hmm, it didn't work too well, because when I put in "E" as the delimiter, an email address that contains an "E" is not processed correctly. How can I make the delimiter match only the first character of a line?

I don't think the flat file adapter can accomplish what you want. Why is the email coming in as a flat file? Can you not send the email directly to Wm and then use the SMTP services to get the contents and so forth?

Actually, what Ramesh proposed would work well, because I am trying to divide the content of the flat file into pieces and then save each piece back into a text file. I guess what I am looking for is a regular expression I can use as the delimiter that will search for an "E" only at the start of each line. Can we use a regular expression as a delimiter? Does anyone know what this regular expression should look like? Many thanks!

Correction: it should search for "E" as the first letter of the first string on the line.

I tried /[Ee]/ and ^E, but these are no good because they look into all the strings on the given line.

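For reference, this is what "E only at the start of a line" looks like in a general-purpose regex engine. It is a minimal sketch in Python, not a webMethods service: later replies in this thread suggest pub.string:tokenize does not accept a regex, so something like this would have to live in custom code. The key is the multiline flag, which makes `^` match at the start of every line instead of only at the start of the whole string. The function name and sample data are hypothetical.

```python
import re

# With re.MULTILINE, "^" matches at the start of every line, and the lookahead
# "(?=E)" makes the split point zero-width, so the "E" itself is kept.
EMAIL_LINE_START = re.compile(r"^(?=E)", re.MULTILINE)


def split_on_email_lines(text: str) -> list[str]:
    """Split the mainframe file into one chunk per email (hypothetical helper)."""
    # Drop the header (H) and trailer (T) lines so they don't end up in a chunk.
    kept = [line for line in text.splitlines() if line[:1] not in ("H", "T")]
    body = "\n".join(kept)
    # Splitting on an empty (zero-width) match requires Python 3.7 or later.
    chunks = EMAIL_LINE_START.split(body)
    # Empty chunks (e.g. before the first "E" line) are discarded.
    return [chunk.rstrip("\n") for chunk in chunks if chunk.strip()]


sample = "H<header>\nEbob@EXAMPLE.com\nSHello\nBLine one\nAreport.pdf\nEann@example.org\nSHi\nT12345"
for chunk in split_on_email_lines(sample):
    print(chunk)
    print("---")
```

Note that the "E"s inside `bob@EXAMPLE.com` do not cause a split, because they are not at the start of a line.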
This may not be the most elegant approach, but it will work. You will need to fill in the details yourself. Read in the file contents, convert them to a string and tokenize the string. Loop over the resulting list of values and look at the first character of each one (you can use pub.string:substring). Do nothing for H and T, and append all other values to a new string list. If the first character is E, increment a counter, and if it is not the first iteration, use pub.string:makeString on the new list, write the resulting string to a unique file name and drop the new list variable before appending the E line. Do one more write after your loop to catch the last email. Hope this helps,

Tim
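Tim's loop, sketched in Python for illustration. The comments point to the webMethods services he names as rough equivalents, not exact translations. The output directory and the `email_<n>.txt` naming scheme are hypothetical; here the counter is used to make the file names unique, which is one plausible use for it.

```python
from pathlib import Path


def split_emails(lines: list[str]) -> list[list[str]]:
    """Group record lines into one list per email, skipping H and T lines."""
    emails: list[list[str]] = []
    current: list[str] = []
    for line in lines:
        kind = line[:1]  # first character, as with pub.string:substring
        if kind in ("H", "T"):
            continue  # do nothing for the header and trailer
        if kind == "E" and current:
            # A new email starts: close off the previous one first.
            emails.append(current)
            current = []
        current.append(line)
    if current:
        emails.append(current)  # the extra write after the loop
    return emails


def write_emails(emails: list[list[str]], out_dir: Path) -> list[Path]:
    """Write each email to its own file (hypothetical naming scheme)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for counter, email_lines in enumerate(emails, start=1):
        path = out_dir / f"email_{counter}.txt"
        # Joining with newlines plays the role of pub.string:makeString.
        path.write_text("\n".join(email_lines) + "\n")
        written.append(path)
    return written
```

Splitting and writing are kept in separate functions so the grouping logic can be checked without touching the file system.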

You could also try

E<

as your expression. It should work.

I. Monzon,

That is almost perfect, but here is what happened: in the valueList, the white space between the strings was removed, and the capital "E"s were removed from other strings. Other than that, it would work perfectly. Should I add something to it?

Thanks!!
Sebastian

It appears from your original post that the file comes in with some kind of carriage return or newline character between each line. If this is true, you can use the tokenize service without specifying a delimiter and it will give you a string list with each line as a separate element. You can then manipulate this list in a variety of ways to get what you want. If this is not the case, and the lines are not of fixed length, then it will be more difficult to parse the text.

Tim
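Whether "no delimiter" means line breaks only or all white space matters here. If the service's default delimiters include spaces and tabs as well as line breaks (worth checking in the WmPublic documentation for pub.string:tokenize), any subject or body line that contains a space would be split into several values. This small Python comparison, with made-up sample data, shows the difference; `str.splitlines()` splits on line boundaries only and handles both `\r\n` and `\n`.

```python
# Hypothetical sample: mainframe-style lines ending in carriage return + newline.
text = "H<header>\r\nEbob@example.com\r\nSQuarterly report\r\nBSee attached.\r\nT42\r\n"

# Splitting on any white space breaks lines that contain spaces.
by_whitespace = text.split()

# Splitting on line boundaries keeps every record line intact.
by_line = text.splitlines()

print(by_whitespace)
# ['H<header>', 'Ebob@example.com', 'SQuarterly', 'report', 'BSee', 'attached.', 'T42']
print(by_line)
# ['H<header>', 'Ebob@example.com', 'SQuarterly report', 'BSee attached.', 'T42']
```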

Tim,

Yes, the flat file has a carriage return at the end of each line. If I tokenize without specifying a delimiter, how can I then divide the file?

Thanks,
Sebastian

Sebastian,

I don't think you can use a regex as a delimiter. I'm sending a sample flow.

HTH
ramesh.


splitFile.zip (5.8 k)

Sebastian,

I forgot to mention one point: edit the PSUtilities config file and specify the allowed path string. Let me know if you have any problems with the flow I sent.

ramesh.

You can loop over the list of values and do several different things. In the service I described earlier, I took the substring of the first character of each value, stored it in a variable and branched on its possible values. I was able to produce the output you require.

Tim

I don't think you can use a regular expression as the delimiter for the tokenize service. I tried using E< as Gordon suggested, but that also matches an 'E' in the middle of the string. I think what Sebastian wanted was to check only at the beginning of the string.
Correct me if I'm wrong.

ramesh.

Any text specified as the delimiter will have all occurrences of this text removed by the tokenize service. If this is undesirable, as appears to be the case here, then it’s better not to specify a delimiter and let tokenize split the file into values that can be manipulated.

Tim
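One way to see why the capital E's vanished from other strings: Java's `java.util.StringTokenizer` treats its delimiter string as a set of single characters rather than one multi-character string, and it drops the delimiters from its output. Whether pub.string:tokenize behaves exactly like that class is an assumption here, not something confirmed in this thread, but it would match the symptom Sebastian reported with E<. The Python function below emulates that behaviour; it is not a webMethods API.

```python
import re


def tokenize_like_string_tokenizer(text: str, delims: str) -> list[str]:
    """Emulate java.util.StringTokenizer: every character in `delims` is a
    separate delimiter, delimiters are dropped, and empty tokens are skipped."""
    if not delims:
        # No delimiter characters: the whole (non-empty) string is one token.
        return [text] if text else []
    pattern = "[" + re.escape(delims) + "]+"
    return [token for token in re.split(pattern, text) if token]


print(tokenize_like_string_tokenizer("Ebob@EXAMPLE.com", "E<"))
# ['bob@', 'XAMPL', '.com']
```

Every "E" acts as a delimiter on its own, so the address is cut apart wherever a capital E appears, not only at the record type.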

Try this,

pub.string:replace: replace E< with ***E<
pub.string:tokenize: tokenize on ***

You can use anything like *** as the replacement separator.
Just make sure that you make it as unique as possible, something like
SEBASTIANSEPARATOR or SSSSSSEBASTIANSSSSS; you get the idea.
It is the simpler way, even though it is not very polished.

You could also try working with pub.string:indexOf to get the positions of your occurrences and then use pub.string:substring before writing to file, but I think the above approach is a good workaround.

hth
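The replace-then-tokenize workaround above can be sketched in Python. Two deliberate deviations from the post, both assumptions: the marker goes in front of a line break followed by "E" rather than in front of "E<" (the `<...>` in the layout look like placeholders, not literal characters), and the header and trailer lines are dropped afterwards. The marker value is hypothetical; the check at the top guards against it occurring in the data.

```python
def split_with_marker(text: str, marker: str = "~~SPLIT~~") -> list[str]:
    """Split the file into one chunk per email using a temporary marker."""
    if marker in text:
        raise ValueError("marker occurs in the data; pick a different one")
    # Put the marker in front of every line that starts with "E". Only a line
    # break followed by "E" matches, so an "E" inside an address is untouched.
    # (Assumes the file starts with the H line, so every E line follows a
    # line break.)
    marked = text.replace("\nE", "\n" + marker + "E")
    pieces = marked.split(marker)[1:]  # pieces[0] is just the header line
    chunks = []
    for piece in pieces:
        # The trailer (T) line ends up at the end of the last piece; drop it.
        lines = [ln for ln in piece.splitlines() if not ln.startswith("T")]
        chunks.append("\n".join(lines))
    return chunks


sample = "H<header>\nEbob@EXAMPLE.com\nSHi\nAfile1\nEann@x.com\nSYo\nT123"
print(split_with_marker(sample))
```

Python's `str.split` splits on the whole marker string, so a long, unusual marker is safe here; a tokenizer that treats its delimiter as a set of characters would behave differently.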

Try replacing E<email> with ***E<email>

and then proceed as I explained.
This should get rid of your capital-letter problem.

I used ~E as the separator and it works perfectly. Thank you, guys!