People may find the info at https://www.ietf.org/rfc/rfc4180.txt helpful. It describes a commonly used way to structure delimited files so that all data is preserved. The flat file parser of wM IS is very flexible and supports the approach described.
This summary may be helpful:
Definitions
A field is an atomic piece of data. For example, a field might be an account number or a person’s last name.
A record consists of one or more fields that are logically related.
The structure of a record defines its record type, which can be named using a record identifier. Examples of record types include an invoice header, an invoice line item, and header and trailer records.
A field delimiter separates fields within a record.
A quoted release character surrounds field data so that any delimiter appearing within that data is treated as its literal value rather than as a delimiter. It is usually the " character (0x22).
A named record has a record identifier present as a field within the record.
An anonymous record does not have a record identifier present as a field within the record. Only one anonymous record type can exist within a single flat file.
A flat file consists of a list of records.
A record delimiter separates records within a flat file.
Flat File Rules
The format rules below ensure that any data content can be successfully transmitted between systems.
Records are separated by the record delimiter. For example:
aaa|bbb|ccc CRLF
xxx|yyy|zzz CRLF
The last record should be terminated by the record delimiter.
Each record contains one or more fields separated by the field delimiter. Empty trailing fields may be omitted. “Extra” fields beyond the last defined field (specified separately) can be ignored by read processes (parser dependent – wM IS flat file parser can do this). For example, if 3 fields are defined, any fields beyond the third (mmm and nnn below) are ignored:
aaa|bbb CRLF
xxx|yyy|zzz|mmm|nnn CRLF
Leading and trailing spaces of a field can be ignored by read processes (parser dependent – wM IS flat file parser can do this).
Each field may be enclosed in double quotes. The double quotes are not considered part of the field data. For example:
aaa|"bbb"|ccc CRLF
If data within a field contains a record delimiter, field delimiter or double quote, the field must be enclosed in double quotes. For example, the first two lines below form one record:
aaa|"b a line CRLF
break in field 2"|ccc CRLF
xxx|"y|yy"|zzz CRLF
A double quote within the data must be preceded by another double quote. For example:
"aaa"|"b""bb"|"ccc" CRLF
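The rules above can be tried out with a small sketch. This uses Python's standard csv module, not the wM IS flat file parser, and it assumes | as the field delimiter, CRLF as the record delimiter and " as the quoted release character. The function name parse_flat_file and its field_count parameter are made up for illustration.

```python
import csv
import io


def parse_flat_file(text: str, field_count: int) -> list[list[str]]:
    """Parse pipe-delimited, double-quoted records and fit each to field_count fields."""
    # newline="" keeps CRLF untranslated so line breaks inside quoted fields survive.
    source = io.StringIO(text, newline="")
    reader = csv.reader(source, delimiter="|", quotechar='"', doublequote=True)
    records = []
    for fields in reader:
        fields = fields[:field_count]                    # ignore "extra" fields
        fields += [""] * (field_count - len(fields))     # restore omitted trailing fields
        records.append(fields)
    return records


sample = (
    'aaa|"b a line\r\nbreak in field 2"|ccc\r\n'   # record delimiter inside quotes
    'xxx|"y|yy"|zzz|mmm|nnn\r\n'                   # field delimiter inside quotes, 2 extra fields
    '"aaa"|"b""bb"\r\n'                            # doubled quote, third field omitted
)
records = parse_flat_file(sample, field_count=3)
for record in records:
    print(record)
```

The first record keeps the line break inside its second field, the second record drops mmm and nnn, and the third record gets an empty third field.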
Common challenges
More often than not, people forget to specify the record delimiter. It is very common for the record delimiter to be CRLF or LF (Unix-driven), and people assume it is always one of those. But it does not have to be. It can be any byte or byte sequence. Some parsers treat CRLF and LF the same. Others look only for LF and treat the CR as field data. So it is important to explicitly specify the record delimiter, not just the field delimiter.
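To see why the record delimiter matters, here is a minimal sketch that splits the same bytes with two different delimiters. The helper split_records is hypothetical and ignores quoting on purpose, so it only illustrates the delimiter choice.

```python
def split_records(data: bytes, record_delimiter: bytes) -> list[bytes]:
    """Split raw bytes on an explicitly specified record delimiter (no quote handling)."""
    records = data.split(record_delimiter)
    # A file whose last record is terminated yields one empty trailing piece; drop it.
    if records and records[-1] == b"":
        records.pop()
    return records


data = b"aaa|bbb|ccc\r\nxxx|yyy|zzz\r\n"

as_crlf = split_records(data, b"\r\n")   # CR is consumed as part of the delimiter
as_lf = split_records(data, b"\n")       # CR is left behind as field data

print(as_crlf)
print(as_lf)

# The delimiter does not have to be a line break at all.
as_tilde = split_records(b"aaa|bbb~xxx|yyy~", b"~")
print(as_tilde)
```

Splitting on LF alone leaves a stray CR at the end of the last field of every record, which is exactly the kind of surprise an explicit delimiter specification avoids.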
Surrounding all columns with quotes is done more often than necessary. Unless a field contains a delimiter (field or record) or a quoted release character, it does not need to be quoted. For large files, unnecessary quoting can add significant size to the file. Unfortunately, systems generating delimited files often don’t provide control at this level. But if you can control it, place quotes around a field only when needed. wM IS flat file generation supports this.
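As a rough illustration of the size difference, the sketch below writes the same records twice with Python's standard csv module, once quoting only when needed and once quoting everything. This is not wM IS flat file generation. The helper name write_records is made up, and | with CRLF are assumed as the delimiters.

```python
import csv
import io


def write_records(records: list[list[str]], quoting: int) -> str:
    """Write records as pipe-delimited text using the given csv quoting mode."""
    buffer = io.StringIO(newline="")   # newline="" keeps the CRLF terminator as written
    writer = csv.writer(
        buffer,
        delimiter="|",
        quotechar='"',
        quoting=quoting,
        lineterminator="\r\n",
    )
    writer.writerows(records)
    return buffer.getvalue()


records = [["aaa", "y|yy", 'b"bb'], ["xxx", "yyy", "zzz"]]

minimal = write_records(records, csv.QUOTE_MINIMAL)   # quote only fields that need it
everything = write_records(records, csv.QUOTE_ALL)    # quote every field

print(repr(minimal))
print(repr(everything))
print(len(minimal), len(everything))
```

With minimal quoting only the two fields containing a delimiter or a quote are wrapped. Quoting everything adds two bytes to every other field as well, which adds up over millions of records.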
Avoid using Excel to edit delimited files. The “help” it provides changes things in undesired ways more often than not.