HELP - commas in the data of a comma delimited text file

Needing assistance please. I’ve spent the entire day trying to figure this out and need some direction.
Let’s say I have a comma delimited .txt file like so (real file would have hundreds of lines, but for brevity, I mentioned just two below):

Joe,Johnson,240 Benson Drive,Pittsburgh,PA,Lily,Jessica,Rich
Richard,Smith,19 Slide Street,Charlotte,NC,George,Jimmy

First line shows basic info about Joe and (Lily,Jessica,Rich) are his children. Children field.
Second line shows basic info about Richard whereas George and Jimmy are his children. Children field.

The customer is sending the information like this, whereas I complained about the fact that commas were being shown in the children field, instead of being presented like this “Lily,Jessica,Rich” or Lily,Jessica,Rich. I feel strongly they need to fix the file, but they are saying that they can’t help me.

Because I don’t know any better and don’t know if I should take this issue to a higher level, I looked to see if there’s anything I can do to help. These children fields will always be at the end of each record. I cannot figure out how / what the correct regular expression would be to perhaps say (after I do a getFile/bytesToString - find all text after the 5th comma and wrap the text that follows to the end of the record in double quotes" or "find all text after the 5th comma and change all commas after that to a different delimiter like a pipe.

Joe,Johnson,240 Benson Drive,Pittsburgh,PA,“Lily,Jessica,Rich”
Richard,Smith,19 Slide Street,Charlotte,NC,“George,Jimmy” or…

Joe,Johnson,240 Benson Drive,Pittsburgh,PA,Lily|Jessica|Rich
Richard,Smith,19 Slide Street,Charlotte,NC,George|Jimmy or…

I would think that maybe the string replace command could help but can’t think of the correct regular expression.

Please advise if I’m thinking about this the right way or if you have other suggestion. I have already looked at the guide on regular expressions and still can’t figure this out.

Thanks.

Hi Joe,

primarily you are right with the assumption that the file format should be changed/corrected to mark anything behind the 5th comma as one field regardless if it contains further field delimiters or not.

But if I remember right it should be possible in the FlatFile Dictionary and/or FlatFile Schema to define a field which should take the rest of the line after a certain number of fields were separated.

This last field of such a record can than be tokenized to split the name of the children into a list of names for further processing.

Have a look at the FlatFile-Developers Guides (for working with flat files) and the IS Built-In-Services Guide (for tokenize).

Regards,
Holger