webMethods parsing flat file with comma in the field

webMethods 10.15 Integration Server on premise

I need to parse a delimited flat file in webMethods using a flat file schema, where the delimiter character (a comma) can also appear inside a field as a literal comma rather than as a delimiter, like this:

"Shipping date","End Point","Plant","Ship-to"
"2025-03-12","ES-VIL","SUF","9347 Martinrea Honsel Spain, S.L.U."

As you can see, there is a comma here: Spain, S.L.U.

How can I specify that this kind of comma (inside the surrounding quotes) is NOT a delimiter? I have tried the Quoted Release Character and also the Release Character, but to my understanding those apply only when creating a flat file, not when parsing one.

Kind regards Mikael from DFDS

I found the solution: setting the Quoted Release Character to " did the trick.
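For readers who want to see the effect outside of webMethods, here is a minimal sketch using Python's standard `csv` module. It is only an analogy, not the Integration Server parser: `quotechar` is assumed to play the same role as the Quoted Release Character, and the sample data is the one from the question with plain ASCII double quotes.

```python
import csv
import io

# Sample data from the question, using plain ASCII double quotes (0x22).
sample = (
    '"Shipping date","End Point","Plant","Ship-to"\r\n'
    '"2025-03-12","ES-VIL","SUF","9347 Martinrea Honsel Spain, S.L.U."\r\n'
)

# quotechar stands in for the quoted release character (an analogy only).
reader = csv.reader(io.StringIO(sample, newline=""), delimiter=",", quotechar='"')
rows = list(reader)

for row in rows:
    # Each record yields four fields; the comma inside the quotes stays in the data.
    print(len(row), row)
```

The last field of the second record comes back as one value, including the comma in "Spain, S.L.U.".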


Hi @Mikael_Lund ,

Thanks for sharing the info. Could you please share which variations of the Quoted Release Character you tried that failed to parse values containing the delimiter comma?


People may find RFC 4180 (https://www.ietf.org/rfc/rfc4180.txt) helpful. It describes a commonly used way to structure delimited files so that all data is preserved. The flat file parser of wM IS is very flexible and supports the approach described.

This summary may be helpful:

Definitions

A field is an atomic piece of data. For example, a field might be an account number or a person’s last name.

A record consists of one or more fields that are logically related.

The structure of a record defines its record type, which can be named using a record identifier. For example, an invoice header, invoice line item, header and trailer records, etc.

A field delimiter separates fields within a record.

A quoted release character is used to surround field data so that if a delimiter appears within the field data it is treated as its literal value. Any delimiter characters that appear within a field surrounded by the quoted release character are not treated as delimiters. This is usually the " character (0x22).

A named record has a record identifier present as a field within the record.

An anonymous record does not have a record identifier present as a field within the record. Only one record type that is anonymous can exist within a single flat file.

A flat file consists of a list of records.

A record delimiter separates records within a flat file.

Flat File Rules

The format rules below assure that any data content can be successfully transmitted between systems.

Records are separated by the record delimiter. For example:

aaa|bbb|ccc CRLF
xxx|yyy|zzz CRLF

The last record should be terminated by the record delimiter.

Each record contains one or more fields separated by the field delimiter. Empty trailing fields may be omitted. “Extra” fields beyond the last defined field (specified separately) can be ignored by read processes (parser dependent; the wM IS flat file parser can do this). For example, if 3 fields are defined, any fields beyond the third (mmm and nnn in the second record below) are ignored:

aaa|bbb CRLF
xxx|yyy|zzz|mmm|nnn CRLF
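A rough sketch of what such a read process might do with these two records is shown here. The `FIELD_COUNT` constant and the `normalize` helper are hypothetical names for illustration, and the simple `split` deliberately ignores quoting.

```python
FIELD_COUNT = 3  # assumed number of fields defined for the record


def normalize(record, delimiter="|"):
    """Return exactly FIELD_COUNT fields for one record (quoting is ignored here)."""
    fields = record.split(delimiter)
    # Drop "extra" fields beyond the last defined one.
    fields = fields[:FIELD_COUNT]
    # Treat omitted trailing fields as empty.
    fields += [""] * (FIELD_COUNT - len(fields))
    return fields


print(normalize("aaa|bbb"))
print(normalize("xxx|yyy|zzz|mmm|nnn"))
```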

Leading and trailing spaces of a field can be ignored by read processes (parser dependent; the wM IS flat file parser can do this).

Each field may be enclosed in double quotes. The double quotes are not considered part of the field data. For example:

aaa|"bbb"|ccc CRLF

If data within a field contains a record delimiter, field delimiter or double quote, the field must be enclosed in double quotes. For example, the first two lines below form a single record:

aaa|"b a line CRLF
break in field 2"|ccc CRLF
xxx|"y|yy"|zzz CRLF

A double quote within the data must be preceded by another double quote. For example:

"aaa"|"b""bb"|"ccc" CRLF
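The quoting rules above can be tried out with Python's standard `csv` module, which follows the same RFC 4180 conventions. This is a sketch for illustration, not the wM IS parser; the field delimiter is assumed to be `|` and the record delimiter CRLF, as in the examples.

```python
import csv
import io

# Three records: an embedded line break, an embedded field delimiter,
# and a doubled double quote inside a quoted field.
data = (
    'aaa|"b a line\r\n'
    'break in field 2"|ccc\r\n'
    'xxx|"y|yy"|zzz\r\n'
    '"aaa"|"b""bb"|"ccc"\r\n'
)

reader = csv.reader(
    io.StringIO(data, newline=""),  # newline="" keeps embedded line breaks intact
    delimiter="|",
    quotechar='"',
    doublequote=True,  # "" inside a quoted field means one literal "
)
records = list(reader)

for record in records:
    print(record)
```

The four physical lines are read as three records of three fields each, and the surrounding quotes are not part of the returned field values.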

Common challenges

More often than not, people forget to specify the record delimiter. It is very common for the record delimiter to be CRLF or LF (Unix-driven), and people assume it is always one of those. But it does not have to be. It can be any byte or byte sequence. Some parsers treat CRLF and LF the same. Some will look for LF and treat the CR as field data. So it is important to explicitly specify the record delimiter, not just the field delimiter.
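A small Python sketch of the CR-as-field-data problem, assuming a file that really uses CRLF as its record delimiter and `|` as its field delimiter (plain string splitting, no quoting involved):

```python
raw = "aaa|bbb|ccc\r\nxxx|yyy|zzz\r\n"

# Splitting on LF only: the CR stays attached to the last field of each record.
lf_records = [r.split("|") for r in raw.split("\n") if r]

# Splitting on the actual record delimiter (CRLF): the fields come out clean.
crlf_records = [r.split("|") for r in raw.split("\r\n") if r]

print(lf_records)
print(crlf_records)
```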

Surrounding all columns with quotes is something that is done more often than necessary. Unless the field contains a delimiter (field or record) or a quoted release character, surrounding the field in quotes is not needed. For large files, unnecessary quoting can add significant size to the file. Systems generating delimited files often don’t provide control at this level, unfortunately. But if you can control it, place quotes around a field only when needed. wM IS flat file generation supports this.
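To illustrate the size difference, here is a sketch with Python's standard `csv` writer (not wM IS flat file generation), comparing quoting only when needed with quoting every field. The `render` helper is a hypothetical name used for this example.

```python
import csv
import io

rows = [
    ["2025-03-12", "ES-VIL", "SUF", "9347 Martinrea Honsel Spain, S.L.U."],
]


def render(rows, quoting):
    """Write rows as comma-delimited text with the given quoting policy."""
    buf = io.StringIO(newline="")
    writer = csv.writer(
        buf, delimiter=",", quotechar='"', quoting=quoting, lineterminator="\r\n"
    )
    writer.writerows(rows)
    return buf.getvalue()


minimal = render(rows, csv.QUOTE_MINIMAL)  # quotes only the field containing a comma
quoted_all = render(rows, csv.QUOTE_ALL)   # quotes every field

print(minimal, end="")
print(quoted_all, end="")
print(len(minimal), len(quoted_all))
```

With minimal quoting only the last field is wrapped in quotes, so each record is a few bytes shorter; over a large file that adds up.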

Avoid using Excel to edit delimited files. The “help” it provides changes things in undesired ways more often than not.
